Hi
Can someone help me with the following error that I faced while setting up a single-node spark framework?

karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077 sbin/spark-shell
bash: sbin/spark-shell: No such file or directory
karthik@karthik-OptiPlex-9020:~/spark-1.0.0$
Hi,
I am a postgraduate student, new to Spark. I want to understand how the
Spark scheduler works; so far I have only a theoretical understanding of the
DAG scheduler and the underlying task scheduler.
I want to know: given a job submitted to the framework, how does the
scheduling happen after the DAG scheduler phase?
Hi
I have this doubt: I understand that each Java process runs in its own JVM
instance. So, if I have a single executor on my machine and run several
Java processes, there will be several JVM instances running.
Now, PROCESS_LOCAL means the data is located in the same JVM as the task
Hi,
Can someone help me with the following error:

scala> val rdd = sc.parallelize(Array(1,2,3,4))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12

scala> rdd.persist(StorageLevel.MEMORY_ONLY)
<console>:15: error: not found: value StorageLevel
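For reference: StorageLevel lives in org.apache.spark.storage and has to be imported into the shell first, so the snippet becomes:

import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(Array(1, 2, 3, 4))
rdd.persist(StorageLevel.MEMORY_ONLY)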
Hi
I have a three-node spark cluster. I restricted the resources per
application by setting the appropriate parameters, and I could run two
applications simultaneously. Now, I want to replicate an rdd and run two
applications simultaneously. Can someone suggest how to go about doing this?
I replicated
Hi,
An rdd replicated by an application is owned by only that application; no
other application can share it. What, then, is the motive behind providing
the rdd replication feature? What operations can be performed on a
replicated rdd?
Thank you!
-karthik
Hi,
Can someone tell me what kinds of operations can be performed on a
replicated rdd? What are the use cases of a replicated rdd?
One basic doubt has been bothering me for a long time: what is the difference
between an application and a job in Spark parlance? I am confused because
of Hadoop
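For reference, the usual distinction: an application is one SparkContext (one driver plus its executors), while a job is the unit of work that a single action submits. A minimal sketch, with hypothetical names:

import org.apache.spark.{SparkConf, SparkContext}

// one *application*: a single SparkContext (a driver plus its executors)
val sc = new SparkContext(new SparkConf().setAppName("demo"))
val data = sc.parallelize(1 to 1000)

// each action submits one *job* to the scheduler
data.count()          // job 0
data.reduce(_ + _)    // job 1

sc.stop()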
-- Forwarded message --
From: rapelly kartheek kartheek.m...@gmail.com
Date: Thu, Sep 4, 2014 at 11:49 AM
Subject: Re: RDDs
To: Liu, Raymond raymond@intel.com
Thank you Raymond.
I am clearer now. So, if an rdd is replicated over multiple nodes (i.e.
say two sets of nodes
Hi,

var cachedPeers: Seq[BlockManagerId] = null

private def replicate(blockId: String, data: ByteBuffer, level: StorageLevel) {
  // forward with replication = 1 so the receiving peers do not
  // propagate the block any further
  val tLevel = StorageLevel(level.useDisk, level.useMemory, level.deserialized, 1)
  if (cachedPeers == null) {
    cachedPeers = master.getPeers(blockManagerId,
Hi,
Whenever I replicate an rdd, I find that the rdd gets replicated on only
one node. I have a 3-node cluster.
I set rdd.persist(StorageLevel.MEMORY_ONLY_2) in my application.
The webUI shows that it is replicated twice, but the rdd storage details
show that it is replicated only once and only in
Hi,
Can someone tell me how to profile a Spark application?
-Karthik
Thank you Ted.
regards
Karthik
On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu yuzhih...@gmail.com wrote:
See
https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit
On Sep 8, 2014, at 2:48 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi,
Can someone
Hi Ted,
Where do I find the licence keys that I need to copy to the licences
directory?
Thank you!
On Mon, Sep 8, 2014 at 8:25 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Thank you Ted.
regards
Karthik
On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu yuzhih...@gmail.com wrote:
See
Hi,
Can someone please tell me how to compile the spark source code so that my
changes to the source take effect? I was trying to ship the jars to all the
slaves, but in vain.
-Karthik
I have been doing that, but my modifications to the code are not being
picked up by the build.
On Thu, Sep 11, 2014 at 10:45 PM, Daniil Osipov daniil.osi...@shazam.com
wrote:
In the spark source folder, execute `sbt/sbt assembly`
On Thu, Sep 11, 2014 at 8:27 AM, rapelly kartheek kartheek.m...@gmail.com
Hi
I am trying to perform read/write file operations in spark by creating a
Writable object. But, I am not able to write to a file. The data concerned
is not an rdd.
Can someone please tell me how to perform read/write file operations on
non-rdd data in spark.
Regards
karthik
Hi
I am trying to perform some read/write file operations in spark. Somehow I
am neither able to write to a file nor to read from one.

import java.io._
val writer = new PrintWriter(new File("test.txt"))
writer.write("Hello Scala")

Can someone please tell me how to perform file I/O in spark.
to make sure the file is
accessible on ALL executors. One way to do that is to use a distributed
filesystem like HDFS or GlusterFS.
On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi
I am trying to perform some read/write file operations in spark. Somehow
I am
The file gets created on the fly, so I don't know how to make sure that it's
accessible to all nodes.
On Mon, Sep 15, 2014 at 10:10 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I
see that the file gets created
I came across these APIs in one of the scala tutorials on the net.
On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi mohitja...@gmail.com wrote:
But the above APIs are not for HDFS.
On Mon, Sep 15, 2014 at 9:40 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Yes. I have HDFS. My cluster
Can you please direct me to the right way of doing this?
On Mon, Sep 15, 2014 at 10:18 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
I came across these APIs in one of the scala tutorials on the net.
On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi mohitja...@gmail.com
wrote
Hi,
I'd made some modifications to the spark source code on the master and
reflected them to the slaves using rsync. I used this command:

rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:path/to/destdirectory

This worked perfectly. But, I wanted to simultaneously
Hi Tobias,
I've copied the files from master to all the slaves.
On Fri, Sep 19, 2014 at 1:37 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Fri, Sep 19, 2014 at 5:02 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
This worked perfectly. But, I wanted to simultaneously rsync all
* you have copied a lot of files from various hosts to username@slave3:path*
only from one node to all the other nodes...
On Fri, Sep 19, 2014 at 1:45 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi Tobias,
I've copied the files from master to all the slaves.
On Fri, Sep 19, 2014
-- Forwarded message --
From: rapelly kartheek kartheek.m...@gmail.com
Date: Fri, Sep 19, 2014 at 1:51 PM
Subject: Re: rsync problem
To: Tobias Pfeiffer t...@preferred.jp
Any idea why the cluster is dying down?
On Fri, Sep 19, 2014 at 1:47 PM, rapelly kartheek kartheek.m
directory $SPARK_HOME/work is rsynced as well.
Try emptying the contents of the work folder on each node and try again.
On Fri, Sep 19, 2014 at 4:53 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
I followed this command: rsync -avL --progress path/to/spark-1.0.0 username
Pfeiffer t...@preferred.jp wrote:
Hi,
I assume you unintentionally did not reply to the list, so I'm adding it
back to CC.
How do you submit your job to the cluster?
Tobias
On Thu, Sep 25, 2014 at 2:21 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
How do I find out whether
Hi,
I was facing GC overhead errors while executing an application on 570MB of
data (with rdd replication). In order to fix the heap errors, I
repartitioned the rdd into 10 partitions:

val logData = sc.textFile("hdfs:/text_data/textdata.txt").persist(StorageLevel.MEMORY_ONLY_2)
val
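For reference, a sketch of the presumable repartitioning step (a reconstruction: the path is the one quoted above, and repartitioning before persisting is an assumption about the intent):

import org.apache.spark.storage.StorageLevel

// sketch: split into 10 partitions before persisting, so each replicated
// block is smaller and easier on the GC
val logData = sc.textFile("hdfs:/text_data/textdata.txt")
  .repartition(10)
  .persist(StorageLevel.MEMORY_ONLY_2)
logData.count()  // forces materialization, and hence replication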
Hi,
I am trying to write a String that is not an rdd to HDFS. This data is a
variable in the Spark scheduler code. None of the spark file operations
work because my data is not an rdd.
So, I tried using SparkContext.parallelize(data), but it throws an error:
[error]
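Since the data is not an rdd, one option is to bypass Spark entirely and use the Hadoop FileSystem API from the scheduler code (a sketch: the namenode URI, output path, and the variable name are placeholders):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val data = "some scheduler state to dump"  // hypothetical String variable

// write the plain String straight to HDFS, no RDD involved
val fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration())
val out = fs.create(new Path("/tmp/scheduler-debug.txt"))
out.writeBytes(data)
out.close()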
Regards
Sanjiv Singh
Mob : +091 9990-447-339
On Sun, Oct 12, 2014 at 11:45 AM, rapelly kartheek [hidden email] wrote:
Hi,
I am trying to write a String that is not an rdd to HDFS. This data is a
variable in Spark Scheduler code. None of the spark
Hi,
I am trying to understand the rdd replication code. In the process, I
frequently execute one spark application, whenever I make a change to the
code, to see the effect.
My problem is that, after a set of repeated executions of the same application,
I find that my cluster behaves unusually.
Ideally, when
Hi
I am trying to access a file in HDFS from the spark source code. Basically, I
am tweaking the spark source code, and I need to access a file in HDFS from
within it. I am really not understanding how to go about doing this.
Can someone please help me out in this regard.
Thank you!
at 11:26 AM, Samarth Mailinglist
mailinglistsama...@gmail.com wrote:
Instead of a file path, use an HDFS URI. For example (in Python):

data = sc.textFile("hdfs://localhost/user/someuser/data")
On Wed, Nov 12, 2014 at 10:12 AM, rapelly kartheek
kartheek.m...@gmail.com wrote:
Hi
I am
Hi,
I am trying to read an HDFS file from the Spark scheduler code. I could
find how to do HDFS reads/writes in Java, but I need to access HDFS from
Spark using Scala. Can someone please help me in this regard.
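For reference, a minimal Scala sketch using the Hadoop FileSystem API directly (the namenode URI and the file path are placeholders):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

// open the file on HDFS and print it line by line
val fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration())
val in = fs.open(new Path("/sigmoid/input.txt"))
Source.fromInputStream(in).getLines().foreach(println)
in.close()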
, Tri
tri@verizonwireless.com.invalid wrote:
It should be

val file = sc.textFile("hdfs:///localhost:9000/sigmoid/input.txt")

with 3 "///" (three slashes).
Thanks
Tri
*From:* rapelly kartheek [mailto:kartheek.m...@gmail.com]
*Sent:* Friday, November 14, 2014 9:42 AM
*To:* Akhil Das; user
Hi Akhil,
I get the error: not found: value URI
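For reference: that error just means URI is not in scope; assuming the snippet in question uses java.net.URI, the one-line fix is:

import java.net.URI  // brings URI into scope, e.g. for FileSystem.get(URI.create(...), conf)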
On Fri, Nov 14, 2014 at 9:29 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
I'll just try it out with the object Akhil provided.
There was no problem working in the shell with sc.textFile.
Thank you Akhil and Tri.
On Fri, Nov 14, 2014 at 9:21 PM
Hi,
When I submit a spark application like this:

./bin/spark-submit --class org.apache.spark.examples.SparkKMeans --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar /k-means 4 0.001

Which part of the spark framework code deals with the name of
Hi,
I've been fiddling with the spark/*/storage/BlockManagerMasterActor.getPeers()
definition, in the context of blockManagerMaster.askDriverWithReply()
sending a GetPeers() request.
1) I couldn't understand what 'selfIndex' is used for.
2) Also, I tried modifying the 'peers' array by just
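For context, roughly what the 1.0-era getPeers() does (a paraphrase from memory, not the verbatim source; blockManagerInfo is the actor's registry of live block managers):

// 'selfIndex' is the position of the requesting BlockManager in the peers
// array; the next 'size' entries after it, in ring order, are returned as
// the replication targets, so the requester never picks itself
private def getPeers(blockManagerId: BlockManagerId, size: Int): Seq[BlockManagerId] = {
  val peers: Array[BlockManagerId] = blockManagerInfo.keySet.toArray
  val selfIndex = peers.indexOf(blockManagerId)
  if (selfIndex == -1) {
    throw new SparkException("Self index for " + blockManagerId + " not found")
  }
  // walks the ring starting just after selfIndex, wrapping around if needed
  Array.tabulate[BlockManagerId](size) { i => peers((selfIndex + i + 1) % peers.length) }.toSeq
}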
Hi,
I face the following exception when I submit a spark application. The log
file shows:
14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener
threw an exception
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:689)
at
Regards
On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi,
I face the following exception when I submit a spark application. The log
file shows:
14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener
threw an exception
java.io.IOException
it.
Thanks
Best Regards
On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek
kartheek.m...@gmail.com wrote:
Hi,
I face the following exception when I submit a spark application. The log
file shows:
14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener
threw an exception
It could be because those threads are finishing quickly.
Thanks
Best Regards
On Tue, Dec 2, 2014 at 2:19 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
But, somehow, if I run this application for the second time, I find that
the application gets executed and the results are out
Hi,
I was just thinking about the necessity of rdd replication. One use case could
be a large number of threads requiring the same rdd. Even though
a single rdd can be shared by multiple threads belonging to the same
application, I believe we can extract better parallelism if the rdd is
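A sketch of that scenario: two threads of one application submitting jobs over the same cached rdd (SparkContext accepts job submissions from multiple threads):

// two threads of the *same* application sharing one cached rdd
val shared = sc.parallelize(1 to 1000000).persist()
val t1 = new Thread(new Runnable {
  def run(): Unit = println("count: " + shared.count())
})
val t2 = new Thread(new Runnable {
  def run(): Unit = println("sum: " + shared.reduce(_ + _))
})
t1.start(); t2.start()
t1.join(); t2.join()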
Hi,
I want to find the time taken to replicate an rdd in a spark cluster, along
with the computation time on the replicated rdd.
Can someone please suggest some ideas?
Thank you
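A crude way to bound these times from the driver (a sketch; the first action pays for computation plus replication, the second reads the cached replicas):

import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("hdfs:///text_data/textdata.txt").persist(StorageLevel.MEMORY_ONLY_2)

val t0 = System.nanoTime()
rdd.count()                      // first action: compute + replicate the blocks
val computeAndReplicate = (System.nanoTime() - t0) / 1e9

val t1 = System.nanoTime()
rdd.count()                      // second action: served from the cached replicas
val computeOnCached = (System.nanoTime() - t1) / 1e9

println(s"materialize+replicate: $computeAndReplicate s, on cached replicas: $computeOnCached s")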
Hi,
I need to find the storage locations (node IDs) of each partition of a
replicated rdd in spark. I mean, if an rdd is replicated twice, I want to
find, for each partition, the two nodes where it is stored.
The Spark WebUI has a page that depicts the data distribution of each
rdd. But, I need
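One way to inspect this programmatically on Spark 1.x (a sketch; getExecutorStorageStatus is a DeveloperApi that was removed in later releases, and 'rdd' stands for the persisted, replicated rdd):

import org.apache.spark.storage.RDDBlockId

// prints every (partition, host) pair for this rdd's blocks; a
// twice-replicated partition should show up on two different hosts
val id = rdd.id
sc.getExecutorStorageStatus.foreach { status =>
  status.blocks.foreach {
    case (RDDBlockId(rddId, part), _) if rddId == id =>
      println("partition " + part + " -> " + status.blockManagerId.host)
    case _ => // a block belonging to some other rdd; skip it
  }
}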
Hi,
I want to find the time taken to replicate an rdd in a spark cluster, along
with the computation time on the replicated rdd.
Can someone please suggest a suitable spark profiler?
Thank you
Hi,
I get the following exception when I submit a spark application that
calculates the frequency of characters in a file. Especially when I
increase the size of the data, I face this problem:

Exception in thread "Thread-47" org.apache.spark.SparkException: Job
aborted due to stage failure: Task
spark-1.0.0
On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek
kartheek.m...@gmail.com wrote:
Hi,
I get the following exception when I submit a spark application that
calculates
-- Forwarded message --
From: rapelly kartheek kartheek.m...@gmail.com
Date: Thu, Jan 1, 2015 at 12:05 PM
Subject: Re: NullPointerException
To: Josh Rosen rosenvi...@gmail.com, user@spark.apache.org
spark-1.0.0
On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com
error?
On Wed, Dec 31, 2014 at 10:35 PM, rapelly kartheek
kartheek.m...@gmail.com wrote:
spark-1.0.0
On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek
kartheek.m...@gmail.com
Hi,
I get the following error when I build spark using sbt:
[error] Nonzero exit code (128): git clone
https://github.com/ScrapCodes/sbt-pom-reader.git
/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
[error] Use 'last' for the full log.
Any help please?
directory.
On Mon, Jan 19, 2015 at 9:33 AM, Rapelly Kartheek
kartheek.m...@gmail.com wrote:
Hi,
I get the following exception when I run my application:
karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class
org.apache.spark.examples.SimpleApp001 --deploy-mode client --master
spark
Hi,
I get the following exception when I run my application:

karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class org.apache.spark.examples.SimpleApp001 --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar out1.txt
log4j:WARN No such
your local machine, add an entry like the following in your /etc/hosts file
and then run the program again (use sudo to edit the file):

127.0.0.1 home
On Mon, Jan 19, 2015 at 3:03 PM, Rapelly Kartheek
kartheek.m...@gmail.com wrote:
Hi,
I get the following exception when I run my application
; there is an empty host between the 2nd and 3rd slashes. This is true
of most URI schemes with a host.
On Mon, Jan 19, 2015 at 9:56 AM, Rapelly Kartheek
kartheek.m...@gmail.com wrote:
Yes, yes.. the hadoop/etc/hadoop/hdfs-site.xml file has a path like:
hdfs://home/...
On Mon, Jan 19, 2015 at 3:21 PM, Sean Owen
Yes, this proxy problem is resolved.
*How does your build refer to https://github.com/ScrapCodes/sbt-pom-reader.git?
I don't see this repo in the project code base.*
I manually downloaded the sbt-pom-reader directory and moved it into
.sbt/0.13/staging/*/
to access github.com for
cloning some dependencies, as github is blocked in India. What are the other
possible ways around this problem?
Thank you!
On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek [hidden email] wrote:
Hi,
I get the following error