hi
Hi,
Can someone help me with the following error that I faced while setting up a single-node Spark framework?

karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077 sbin/spark-shell
bash: sbin/spark-shell: No such file or directory
karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077 bin/spark-shell
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
14/06/23 10:44:53 INFO spark.SecurityManager: Changing view acls to: karthik
14/06/23 10:44:53 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(karthik)
14/06/23 10:44:53 INFO spark.HttpServer: Starting HTTP Server
14/06/23 10:44:53 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/06/23 10:44:53 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:39588
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.0.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_05)
Type in expressions to have them evaluated.
Type :help for more information.
14/06/23 10:44:55 INFO spark.SecurityManager: Changing view acls to: karthik
14/06/23 10:44:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(karthik)
14/06/23 10:44:55 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/06/23 10:44:55 INFO Remoting: Starting remoting
14/06/23 10:44:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@karthik-OptiPlex-9020:50294]
14/06/23 10:44:55 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@karthik-OptiPlex-9020:50294]
14/06/23 10:44:55 INFO spark.SparkEnv: Registering MapOutputTracker
14/06/23 10:44:55 INFO spark.SparkEnv: Registering BlockManagerMaster
14/06/23 10:44:55 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140623104455-3297
14/06/23 10:44:55 INFO storage.MemoryStore: MemoryStore started with capacity 294.6 MB.
14/06/23 10:44:55 INFO network.ConnectionManager: Bound socket to port 60264 with id = ConnectionManagerId(karthik-OptiPlex-9020,60264)
14/06/23 10:44:55 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/06/23 10:44:55 INFO storage.BlockManagerInfo: Registering block manager karthik-OptiPlex-9020:60264 with 294.6 MB RAM
14/06/23 10:44:55 INFO storage.BlockManagerMaster: Registered BlockManager
14/06/23 10:44:55 INFO spark.HttpServer: Starting HTTP Server
14/06/23 10:44:55 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/06/23 10:44:55 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38307
14/06/23 10:44:55 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.0.1.61:38307
14/06/23 10:44:55 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-082a44f6-e877-48cc-8ab7-1bcbcf8136b0
14/06/23 10:44:55 INFO spark.HttpServer: Starting HTTP Server
14/06/23 10:44:55 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/06/23 10:44:55 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:58745
14/06/23 10:44:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/06/23 10:44:56 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/06/23 10:44:56 INFO ui.SparkUI: Started SparkUI at http://karthik-OptiPlex-9020:4040
14/06/23 10:44:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/23 10:44:56 INFO client.AppClient$ClientActor: Connecting to master spark://localhost:7077...
14/06/23 10:44:56 INFO repl.SparkILoop: Created spark context..
14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@localhost:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@localhost:7077]
Spark context available as sc.
scala>
14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@localhost:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@localhost:7077]
14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@localhost:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@localhost:7077]
14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@localhost:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@localhost:7077]
14/06/23 10:45:16 INFO client.AppClient$ClientActor: Connecting to master spark://localhost:7077...
14/06/23 10:45:16 WARN client.AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@localhost:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@localhost:7077]
14/06/23 10:45:16 WARN client.AppClient$ClientActor: Could
Scheduling in spark
Hi, I am a postgraduate student, new to Spark. I want to understand how the Spark scheduler works. I have only a theoretical understanding of the DAG scheduler and the underlying task scheduler. I want to know: given a job submitted to the framework, how does the scheduling happen after the DAG scheduler phase? Can someone help me with how to proceed along these lines? I have some exposure to Hadoop schedulers and tools like the Mumak simulator for experiments. Can someone please tell me how to perform scheduler simulations on Spark? Thanks in advance
Hi
Hi, I have this doubt: I understand that each Java process runs on a different JVM instance. Now, if I have a single executor on my machine and run several Java processes, then there will be several JVM instances running. PROCESS_LOCAL means the data is located on the same JVM as the task that is launched. But the memory associated with the entire executor is the same. Then, how does this memory get distributed across the JVMs? I mean, how is this memory associated with multiple JVMs? Thank you!!! -karthik
StorageLevel error.
Hi,
Can someone help me with the following error:

scala> val rdd = sc.parallelize(Array(1,2,3,4))
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12

scala> rdd.persist(StorageLevel.MEMORY_ONLY)
<console>:15: error: not found: value StorageLevel
              rdd.persist(StorageLevel.MEMORY_ONLY)
                          ^

Thank you!!!
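The "not found: value StorageLevel" error in the shell usually just means the StorageLevel object has not been imported. A minimal sketch of the same session with the import added (assuming a plain spark-shell, where sc is already defined):

import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(Array(1, 2, 3, 4))
rdd.persist(StorageLevel.MEMORY_ONLY)   // or StorageLevel.MEMORY_ONLY_2 for two in-memory replicas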
Replicate RDDs
Hi, I have a three-node Spark cluster. I restricted the resources per application by setting the appropriate parameters, and I could run two applications simultaneously. Now, I want to replicate an RDD and run two applications simultaneously. Can someone help me with how to go about doing this? I replicated an RDD of size 1354MB over this cluster. The web UI shows that it is replicated twice. But when I go to the storage details, the two partitions, each of size ~677MB, are stored on the same node. All the other nodes do not contain any partitions. Can someone tell me where I am going wrong? Thank you!! -karthik
operations on replicated RDD
Hi, An RDD replicated by an application is owned by only that application; no other application can share it. Then, what is the motive behind providing the RDD replication feature? What operations can be performed on the replicated RDD? Thank you!!! -karthik
RDDs
Hi, Can someone tell me what kind of operations can be performed on a replicated RDD? What are the use cases of a replicated RDD? One basic doubt that has been bothering me for a long time: what is the difference between an application and a job in Spark parlance? I am confused because of the Hadoop jargon. Thank you
Fwd: RDDs
-- Forwarded message --
From: rapelly kartheek kartheek.m...@gmail.com
Date: Thu, Sep 4, 2014 at 11:49 AM
Subject: Re: RDDs
To: Liu, Raymond raymond@intel.com

Thank you Raymond. I am clearer now. So, if an RDD is replicated over multiple nodes (say, two sets of nodes, since it is a collection of chunks), can we run two jobs concurrently and separately on these two sets of nodes?

On Thu, Sep 4, 2014 at 11:38 AM, Liu, Raymond raymond@intel.com wrote:
Actually, a replicated RDD and a parallel job on the same RDD are two unrelated concepts. A replicated RDD just stores its data on multiple nodes; it helps with HA and provides a better chance for data locality. It is still one RDD, not two separate RDDs. As for running two jobs on the same RDD, it doesn't matter whether the RDD is replicated or not: you can always do it if you wish to.
Best Regards,
Raymond Liu

-Original Message-
From: Kartheek.R [mailto:kartheek.m...@gmail.com]
Sent: Thursday, September 04, 2014 1:24 PM
To: u...@spark.incubator.apache.org
Subject: RE: RDDs

Thank you Raymond and Tobias. Yeah, I am very clear about what I was asking; I was talking about a replicated RDD only. Now that I've got my understanding about jobs and applications validated, I wanted to know whether we can replicate an RDD and run two jobs (that need the same RDD) of an application in parallel.
-Karthik

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p13416.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
question on replicate() in blockManager.scala
Hi,

var cachedPeers: Seq[BlockManagerId] = null
private def replicate(blockId: String, data: ByteBuffer, level: StorageLevel) {
  val tLevel = StorageLevel(level.useDisk, level.useMemory, level.deserialized, 1)
  if (cachedPeers == null) {
    cachedPeers = master.getPeers(blockManagerId, level.replication - 1)
  }
  for (peer: BlockManagerId <- cachedPeers) {
    val start = System.nanoTime
    data.rewind()
    logDebug("Try to replicate BlockId " + blockId + " once; The size of the data is " + data.limit() + " Bytes. To node: " + peer)
    // syncPutBlock ships a PutBlock message (the block bytes plus the target storage level)
    // to the peer's ConnectionManager; this is the call that actually writes the replica
    // on the remote node.
    if (!BlockManagerWorker.syncPutBlock(PutBlock(blockId, data, tLevel),
        new ConnectionManagerId(peer.host, peer.port))) {
      logError("Failed to call syncPutBlock to " + peer)
    }
    logDebug("Replicated BlockId " + blockId + " once used " + (System.nanoTime - start) / 1e6 + " s; The size of the data is " + data.limit() + " bytes.")
  }
}

I get the flow of this code, but I don't find any method being called for actually writing the data into the set of peers chosen for replication. Where exactly is the replication happening?

Thank you!!
-Karthik
replicated rdd storage problem
Hi, Whenever I replicate an RDD, I find that it gets replicated only on one node. I have a 3-node cluster. I set rdd.persist(StorageLevel.MEMORY_ONLY_2) in my application. The web UI shows that it is replicated twice, but the RDD storage details show that it is replicated only once, and only on one node. Can someone tell me where I am going wrong??? regards -Karthik
How to profile a spark application
Hi, Can someone tell me how to profile a Spark application? -Karthik
Re: How to profile a spark application
Thank you Ted. regards Karthik On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu yuzhih...@gmail.com wrote: See https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit On Sep 8, 2014, at 2:48 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, Can someone tell me how to profile a spark application. -Karthik
Re: How to profile a spark application
hi Ted, Where do I find the licence keys that I need to copy to the licences directory. Thank you!! On Mon, Sep 8, 2014 at 8:25 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Thank you Ted. regards Karthik On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu yuzhih...@gmail.com wrote: See https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit On Sep 8, 2014, at 2:48 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, Can someone tell me how to profile a spark application. -Karthik
compiling spark source code
Hi, Can someone please tell me how to compile the Spark source code so that my changes to the source take effect? I was trying to ship the jars to all the slaves, but in vain. -Karthik
Re: compiling spark source code
I have been doing that, but none of my modifications to the code are being reflected after compiling.

On Thu, Sep 11, 2014 at 10:45 PM, Daniil Osipov daniil.osi...@shazam.com wrote:
In the spark source folder, execute `sbt/sbt assembly`

On Thu, Sep 11, 2014 at 8:27 AM, rapelly kartheek kartheek.m...@gmail.com wrote:
Hi, Can someone please tell me how to compile the Spark source code so that my changes to the source take effect? I was trying to ship the jars to all the slaves, but in vain.
-Karthik
File operations on spark
Hi, I am trying to perform read/write file operations in Spark by creating a Writable object, but I am not able to write to a file. The data concerned is not an RDD. Can someone please tell me how to perform read/write file operations on non-RDD data in Spark? Regards, karthik
File I/O in spark
Hi,
I am trying to perform some read/write file operations in Spark. Somehow I am neither able to write to a file nor read.

import java.io._
val writer = new PrintWriter(new File("test.txt"))
writer.write("Hello Scala")

Can someone please tell me how to perform file I/O in Spark.
Re: File I/O in spark
Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I see that the file gets created in the master node. But, there wont be any data written to it. On Mon, Sep 15, 2014 at 10:06 PM, Mohit Jaggi mohitja...@gmail.com wrote: Is this code running in an executor? You need to make sure the file is accessible on ALL executors. One way to do that is to use a distributed filesystem like HDFS or GlusterFS. On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi I am trying to perform some read/write file operations in spark. Somehow I am neither able to write to a file nor read. import java.io._ val writer = new PrintWriter(new File(test.txt )) writer.write(Hello Scala) Can someone please tell me how to perform file I/O in spark.
Re: File I/O in spark
The file gets created on the fly. So I dont know how to make sure that its accessible to all nodes. On Mon, Sep 15, 2014 at 10:10 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I see that the file gets created in the master node. But, there wont be any data written to it. On Mon, Sep 15, 2014 at 10:06 PM, Mohit Jaggi mohitja...@gmail.com wrote: Is this code running in an executor? You need to make sure the file is accessible on ALL executors. One way to do that is to use a distributed filesystem like HDFS or GlusterFS. On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi I am trying to perform some read/write file operations in spark. Somehow I am neither able to write to a file nor read. import java.io._ val writer = new PrintWriter(new File(test.txt )) writer.write(Hello Scala) Can someone please tell me how to perform file I/O in spark.
Re: File I/O in spark
I came across these APIs in one the scala tutorials over the net. On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi mohitja...@gmail.com wrote: But the above APIs are not for HDFS. On Mon, Sep 15, 2014 at 9:40 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I see that the file gets created in the master node. But, there wont be any data written to it. On Mon, Sep 15, 2014 at 10:06 PM, Mohit Jaggi mohitja...@gmail.com wrote: Is this code running in an executor? You need to make sure the file is accessible on ALL executors. One way to do that is to use a distributed filesystem like HDFS or GlusterFS. On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi I am trying to perform some read/write file operations in spark. Somehow I am neither able to write to a file nor read. import java.io._ val writer = new PrintWriter(new File(test.txt )) writer.write(Hello Scala) Can someone please tell me how to perform file I/O in spark.
Re: File I/O in spark
Can you please direct me to the right way of doing this. On Mon, Sep 15, 2014 at 10:18 PM, rapelly kartheek kartheek.m...@gmail.com wrote: I came across these APIs in one the scala tutorials over the net. On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi mohitja...@gmail.com wrote: But the above APIs are not for HDFS. On Mon, Sep 15, 2014 at 9:40 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I see that the file gets created in the master node. But, there wont be any data written to it. On Mon, Sep 15, 2014 at 10:06 PM, Mohit Jaggi mohitja...@gmail.com wrote: Is this code running in an executor? You need to make sure the file is accessible on ALL executors. One way to do that is to use a distributed filesystem like HDFS or GlusterFS. On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi I am trying to perform some read/write file operations in spark. Somehow I am neither able to write to a file nor read. import java.io._ val writer = new PrintWriter(new File(test.txt )) writer.write(Hello Scala) Can someone please tell me how to perform file I/O in spark.
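For what it's worth, two hedged observations on the snippet above: java.io.PrintWriter buffers its output, so without writer.flush() or writer.close() the file can stay empty even locally; and in any case it writes to the local filesystem of whichever node runs the code, not to HDFS. A minimal sketch of writing a small string to HDFS with the Hadoop FileSystem API instead (the namenode URI and path here are assumptions, not from the thread):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Connect to the (assumed) HDFS namenode and create a file visible to every node.
val fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration())
val out = fs.create(new Path("/user/karthik/test.txt"))
out.writeBytes("Hello Scala")
out.close()   // closing flushes the data to HDFS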
rsync problem
Hi,
I'd made some modifications to the Spark source code on the master and propagated them to the slaves using rsync. I used this command:

rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:path/to/destdirectory

This worked perfectly. But I wanted to rsync to all the slaves simultaneously, so I appended the other slaves like this:

rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:path/to/destdirectory username@slave2:path username@slave3:path

and so on. But this didn't work. Anyway, for now, I did it individually for each node. Can someone give me the right syntax?

Secondly, after this rsync, I find that my cluster has become tremendously slow!!! Sometimes the cluster just shuts down and no job execution happens. Can someone throw some light on this aspect?

thank you
Karthik
Re: rsync problem
Hi Tobias, I've copied the files from master to all the slaves. On Fri, Sep 19, 2014 at 1:37 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, On Fri, Sep 19, 2014 at 5:02 PM, rapelly kartheek kartheek.m...@gmail.com wrote: This worked perfectly. But, I wanted to simultaneously rsync all the slaves. So, added the other slaves as following: rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname :path/to/destdirectory username@slave2:path username@slave3:path and so on. The rsync man page says rsync [OPTION...] SRC... [USER@]HOST:DEST so as I understand your command, you have copied a lot of files from various hosts to username@slave3:path. I don't think rsync can copy to various locations at once. Tobias
Re: rsync problem
Regarding "you have copied a lot of files from various hosts to username@slave3:path": no, only from one node to all the other nodes...

On Fri, Sep 19, 2014 at 1:45 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
Hi Tobias, I've copied the files from the master to all the slaves.

On Fri, Sep 19, 2014 at 1:37 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Fri, Sep 19, 2014 at 5:02 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
This worked perfectly. But I wanted to rsync to all the slaves simultaneously, so I added the other slaves as follows: rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:path/to/destdirectory username@slave2:path username@slave3:path and so on.
The rsync man page says rsync [OPTION...] SRC... [USER@]HOST:DEST, so as I understand your command, you have copied a lot of files from various hosts to username@slave3:path. I don't think rsync can copy to various locations at once.
Tobias
Fwd: rsync problem
-- Forwarded message -- From: rapelly kartheek kartheek.m...@gmail.com Date: Fri, Sep 19, 2014 at 1:51 PM Subject: Re: rsync problem To: Tobias Pfeiffer t...@preferred.jp any idea why the cluster is dying down??? On Fri, Sep 19, 2014 at 1:47 PM, rapelly kartheek kartheek.m...@gmail.com wrote: , * you have copied a lot of files from various hosts to username@slave3:path* only from one node to all the other nodes... On Fri, Sep 19, 2014 at 1:45 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi Tobias, I've copied the files from master to all the slaves. On Fri, Sep 19, 2014 at 1:37 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, On Fri, Sep 19, 2014 at 5:02 PM, rapelly kartheek kartheek.m...@gmail.com wrote: This worked perfectly. But, I wanted to simultaneously rsync all the slaves. So, added the other slaves as following: rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname :path/to/destdirectory username@slave2:path username@slave3:path and so on. The rsync man page says rsync [OPTION...] SRC... [USER@]HOST:DEST so as I understand your command, you have copied a lot of files from various hosts to username@slave3:path. I don't think rsync can copy to various locations at once. Tobias
Re: rsync problem
Thank you Soumya Simantha and Tobias. I've deleted the contents of the work folder in all the nodes. Now its working perfectly as it was before. Thank you Karthik On Fri, Sep 19, 2014 at 4:46 PM, Soumya Simanta soumya.sima...@gmail.com wrote: One possible reason is maybe that the checkpointing directory $SPARK_HOME/work is rsynced as well. Try emptying the contents of the work folder on each node and try again. On Fri, Sep 19, 2014 at 4:53 AM, rapelly kartheek kartheek.m...@gmail.com wrote: I * followed this command:rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:* *path/to/destdirectory. Anyway, for now, I did it individually for each node.* I have copied to each node at a time individually using the above command. So, I guess the copying may not contain any mixture of files. Also, as of now, I am not facing any MethodNotFound exceptions. But, there is no job execution taking place. After sometime, one by one, each goes down and the cluster shuts down. On Fri, Sep 19, 2014 at 2:15 PM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, On Fri, Sep 19, 2014 at 5:17 PM, rapelly kartheek kartheek.m...@gmail.com wrote: , * you have copied a lot of files from various hosts to username@slave3:path* only from one node to all the other nodes... I don't think rsync can do that in one command as you described. My guess is that now you have a wild mixture of jar files all across your cluster which will lead to fancy exceptions like MethodNotFound etc., that's maybe why your cluster is not working correctly. Tobias
Re: rsync problem
Hi,
This is the command I am using for submitting my application, SimpleApp:

./bin/spark-submit --class org.apache.spark.examples.SimpleApp --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar /text-data

On Thu, Sep 25, 2014 at 6:52 AM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi, I assume you unintentionally did not reply to the list, so I'm adding it back to CC. How do you submit your job to the cluster?
Tobias

On Thu, Sep 25, 2014 at 2:21 AM, rapelly kartheek kartheek.m...@gmail.com wrote:
How do I find out whether a node in the cluster is a master or a slave? Till now I was thinking that the slaves file under the conf folder makes the difference, along with SPARK_MASTER_IP in the spark-env.sh file. What else differentiates a slave from the master?

On Wed, Sep 24, 2014 at 10:46 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
The job execution is taking place perfectly. Previously, all my print statements used to be stored in the spark/work/*/stdout file. But now, after doing the rsync, I find that none of the print statements are getting reflected in the stdout file under the work folder. When I go to the code, I find the statements in the code, but they are not reflected in the stdout file as before. Can you please tell me where I went wrong? All I want is to see my modification in the code getting reflected in the output.

On Wed, Sep 24, 2014 at 10:22 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
Hi, I have a very important and fundamental doubt: I have rsynced the entire spark folder from the master to all slaves in the cluster. When I execute a job, it works perfectly. But when I rsync the entire spark folder of the master to all the slaves, am I not sending the master configurations to all the slaves and making the slaves behave like the master? First of all, is it correct to rsync the entire spark folder? And if I change only one file, then how do I rsync it to all?

On Fri, Sep 19, 2014 at 8:44 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
Thank you Soumya Simanta and Tobias. I've deleted the contents of the work folder on all the nodes. Now it is working perfectly, as it was before.
Thank you
Karthik

On Fri, Sep 19, 2014 at 4:46 PM, Soumya Simanta soumya.sima...@gmail.com wrote:
One possible reason is maybe that the checkpointing directory $SPARK_HOME/work is rsynced as well. Try emptying the contents of the work folder on each node and try again.

On Fri, Sep 19, 2014 at 4:53 AM, rapelly kartheek kartheek.m...@gmail.com wrote:
"I followed this command: rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:path/to/destdirectory. Anyway, for now, I did it individually for each node."
I have copied to each node individually, one at a time, using the above command. So I guess the copying may not contain any mixture of files. Also, as of now, I am not facing any MethodNotFound exceptions. But there is no job execution taking place. After some time, one by one, each node goes down and the cluster shuts down.

On Fri, Sep 19, 2014 at 2:15 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
On Fri, Sep 19, 2014 at 5:17 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
Regarding "you have copied a lot of files from various hosts to username@slave3:path": only from one node to all the other nodes...
I don't think rsync can do that in one command as you described.
My guess is that now you have a wild mixture of jar files all across your cluster which will lead to fancy exceptions like MethodNotFound etc., that's maybe why your cluster is not working correctly. Tobias
Rdd repartitioning
Hi,
I was facing GC overhead errors while executing an application with 570MB of data (with RDD replication). In order to fix the heap errors, I repartitioned the RDD to 10 partitions:

val logData = sc.textFile("hdfs:/text_data/text data.txt").persist(StorageLevel.MEMORY_ONLY_2)
val parts = logData.coalesce(10, true)
println(parts.partitions.length)

But the problem is, the web UI still shows the number of partitions as 5, while the print statement outputs 10. I even tried repartition(), but I face the same problem. Also, does the web UI show the storage details of each partition twice when I replicate the RDD? Because I see that the web UI displays each partition only once while it says 2x replicated. Can someone help me out with this!!! -Karthik
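One thing that may explain the web UI showing 5 partitions: coalesce and repartition return a new RDD, and the RDD that was persisted (and is therefore the one described on the storage page) is logData, which still has its original partitioning. A hedged sketch that persists the repartitioned RDD instead (same path as above; sc is assumed to be the application's SparkContext):

import org.apache.spark.storage.StorageLevel

val logData = sc.textFile("hdfs:/text_data/text data.txt")
// repartition(10) is equivalent to coalesce(10, shuffle = true)
val parts = logData.repartition(10).persist(StorageLevel.MEMORY_ONLY_2)
println(parts.partitions.length)   // 10; the storage tab should now describe this RDD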
How to convert a non-rdd data to rdd.
Hi,
I am trying to write a String that is not an RDD to HDFS. This data is a variable in the Spark scheduler code. None of the Spark file operations are working because my data is not an RDD. So, I tried using SparkContext.parallelize(data). But it throws an error:

[error] /home/karthik/spark-1.0.0/core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala:265: not found: value SparkContext
[error] SparkContext.parallelize(result)
[error] ^
[error] one error found

I realized that this data is part of the scheduler, so the SparkContext would not have been created yet. Any help in writing scheduler variable data to HDFS is appreciated!!
-Karthik
Re: How to convert a non-rdd data to rdd.
It's a variable in the spark-1.0.0/*/storage/BlockManagerMaster.scala class: the data returned by the askDriverWithReply() method for the getPeers() request. Basically, it is a Seq[ArrayBuffer]: ArraySeq(ArrayBuffer(BlockManagerId(1, s1, 47006, 0), BlockManagerId(0, s1, 34625, 0)), ArrayBuffer(BlockManagerId(1, s1, 47006, 0), BlockManagerId(0, s2, 34625, 0)), ArrayBuffer(BlockManagerId(1, s1, 47006, 0), BlockManagerId(0, s2, 34625, 0)), ArrayBuffer(BlockManagerId(1, s1, 47006, 0), BlockManagerId(0, s2, 34625, 0)), ArrayBuffer(BlockManagerId(driver, karthik, 51051, 0), BlockManagerId(1, s1, 47006, 0)))

On Sun, Oct 12, 2014 at 12:59 PM, @Sanjiv Singh [via Apache Spark User List] ml-node+s1001560n16231...@n3.nabble.com wrote:
Hi Karthik, Can you provide us more detail on the dataset 'data' that you wanted to parallelize with SparkContext.parallelize(data)?
Regards, Sanjiv Singh
Mob : +091 9990-447-339

On Sun, Oct 12, 2014 at 11:45 AM, rapelly kartheek [hidden email] wrote:
Hi, I am trying to write a String that is not an RDD to HDFS. This data is a variable in the Spark scheduler code. None of the Spark file operations are working because my data is not an RDD. So, I tried using SparkContext.parallelize(data). But it throws an error: [error] /home/karthik/spark-1.0.0/core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala:265: not found: value SparkContext [error] SparkContext.parallelize(result) [error] ^ [error] one error found. I realized that this data is part of the scheduler, so the SparkContext would not have been created yet. Any help in writing scheduler variable data to HDFS is appreciated!!
-Karthik

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-convert-a-non-rdd-data-to-rdd-tp16230p16231.html
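A hedged aside on the compile error itself: parallelize is an instance method on SparkContext, not a companion-object method, so SparkContext.parallelize(result) cannot compile anywhere; it needs a live context, which does not exist inside BlockManagerMaster. A minimal sketch of what the call looks like when a context is available (the app name, placeholder data, and output path here are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}

// assumes the master URL is supplied externally, e.g. by spark-submit
val sc = new SparkContext(new SparkConf().setAppName("peer-dump"))
// stand-in for the stringified Seq[ArrayBuffer[BlockManagerId]] described above
val data = Seq("BlockManagerId(1, s1, 47006, 0)", "BlockManagerId(0, s1, 34625, 0)")
sc.parallelize(data).saveAsTextFile("hdfs://localhost:9000/tmp/peer-dump")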
Rdd replication
Hi, I am trying to understand the RDD replication code. In the process, I frequently execute one Spark application whenever I make a change to the code, to see the effect. My problem is that after a set of repeated executions of the same application, I find that my cluster behaves unusually. Ideally, when I replicate an RDD twice, the web UI displays each partition twice in the RDD storage info tab. But sometimes I find that it displays each partition only once. Also, when it is replicated only once, each partition gets displayed twice. This happens frequently. Can someone throw some light in this regard?
Read a HDFS file from Spark source code
Hi, I am trying to access a file in HDFS from the Spark source code. Basically, I am tweaking the Spark source code and need to access a file in HDFS from within it. I am really not able to figure out how to go about doing this. Can someone please help me out in this regard. Thank you!! Karthik
Re: Read a HDFS file from Spark source code
Hi Sean, I was following this link; http://mund-consulting.com/Blog/Posts/file-operations-in-HDFS-using-java.aspx But, I was facing FileSystem ambiguity error. I really don't have any idea as to how to go about doing this. Can you please help me how to start off with this? On Wed, Nov 12, 2014 at 11:26 AM, Samarth Mailinglist mailinglistsama...@gmail.com wrote: Instead of a file path, use a HDFS URI. For example: (In Python) data = sc.textFile(hdfs://localhost/user/someuser/data) On Wed, Nov 12, 2014 at 10:12 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi I am trying to access a file in HDFS from spark source code. Basically, I am tweaking the spark source code. I need to access a file in HDFS from the source code of the spark. I am really not understanding how to go about doing this. Can someone please help me out in this regard. Thank you!! Karthik
Read a HDFS file from Spark using HDFS API
Hi, I am trying to read an HDFS file from the Spark scheduler code. I could find how to do HDFS reads/writes in Java, but I need to access HDFS from Spark using Scala. Can someone please help me in this regard.
Re: Read a HDFS file from Spark using HDFS API
I'll just try out with object Akhil provided. There was no problem working in shell with sc.textFile. Thank you Akhil and Tri. On Fri, Nov 14, 2014 at 9:21 PM, Akhil Das ak...@sigmoidanalytics.com wrote: [image: Inline image 1] Thanks Best Regards On Fri, Nov 14, 2014 at 9:18 PM, Bui, Tri tri@verizonwireless.com.invalid wrote: It should be val file = sc.textFile(hdfs:///localhost:9000/sigmoid/input.txt) 3 “///” Thanks Tri *From:* rapelly kartheek [mailto:kartheek.m...@gmail.com] *Sent:* Friday, November 14, 2014 9:42 AM *To:* Akhil Das; user@spark.apache.org *Subject:* Re: Read a HDFS file from Spark using HDFS API No. I am not accessing hdfs from either shell or a spark application. I want to access from spark Scheduler code. I face an error when I use sc.textFile() as SparkContext wouldn't have been created yet. So, error says: sc not found. On Fri, Nov 14, 2014 at 9:07 PM, Akhil Das ak...@sigmoidanalytics.com wrote: like this? val file = sc.textFile(hdfs://localhost:9000/sigmoid/input.txt) Thanks Best Regards On Fri, Nov 14, 2014 at 9:02 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I am trying to read a HDFS file from Spark scheduler code. I could find how to write hdfs read/writes in java. But I need to access hdfs from spark using scala. Can someone please help me in this regard.
Re: Read a HDFS file from Spark using HDFS API
Hi Akhil, I face error: not found : value URI On Fri, Nov 14, 2014 at 9:29 PM, rapelly kartheek kartheek.m...@gmail.com wrote: I'll just try out with object Akhil provided. There was no problem working in shell with sc.textFile. Thank you Akhil and Tri. On Fri, Nov 14, 2014 at 9:21 PM, Akhil Das ak...@sigmoidanalytics.com wrote: [image: Inline image 1] Thanks Best Regards On Fri, Nov 14, 2014 at 9:18 PM, Bui, Tri tri@verizonwireless.com.invalid wrote: It should be val file = sc.textFile(hdfs:///localhost:9000/sigmoid/input.txt) 3 “///” Thanks Tri *From:* rapelly kartheek [mailto:kartheek.m...@gmail.com] *Sent:* Friday, November 14, 2014 9:42 AM *To:* Akhil Das; user@spark.apache.org *Subject:* Re: Read a HDFS file from Spark using HDFS API No. I am not accessing hdfs from either shell or a spark application. I want to access from spark Scheduler code. I face an error when I use sc.textFile() as SparkContext wouldn't have been created yet. So, error says: sc not found. On Fri, Nov 14, 2014 at 9:07 PM, Akhil Das ak...@sigmoidanalytics.com wrote: like this? val file = sc.textFile(hdfs://localhost:9000/sigmoid/input.txt) Thanks Best Regards On Fri, Nov 14, 2014 at 9:02 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I am trying to read a HDFS file from Spark scheduler code. I could find how to write hdfs read/writes in java. But I need to access hdfs from spark using scala. Can someone please help me in this regard.
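The "not found: value URI" error suggests a missing import java.net.URI. For reading an HDFS file from scheduler code, where no SparkContext exists yet, a heavily hedged sketch using the Hadoop FileSystem API directly (the namenode URI and path are assumptions taken from the example in this thread):

import java.net.URI
import scala.io.Source
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration())
val in = fs.open(new Path("/sigmoid/input.txt"))   // returns an FSDataInputStream
Source.fromInputStream(in).getLines().foreach(println)
in.close()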
How to access application name in the spark framework code.
Hi,
When I submit a Spark application like this:

./bin/spark-submit --class org.apache.spark.examples.SparkKMeans --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar /k-means 4 0.001

which part of the Spark framework code deals with the name of the application? Basically, I want to access the name of the application in the Spark scheduler code. Can someone please tell me where I should look for the code that deals with the name of the currently executing application (say, SparkKMeans)? Thank you.
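A hedged pointer: the application name (set by setAppName inside the application, or by spark-submit's --name option) is stored in the application's SparkConf under the key spark.app.name, and SparkContext.appName reads it from there. So scheduler-side code that already has a SparkConf in scope could read it as in the sketch below; the function name and the idea that a SparkConf is available at that point are assumptions.

import org.apache.spark.SparkConf

// `conf` stands for whatever SparkConf instance the surrounding scheduler class holds.
def currentAppName(conf: SparkConf): String =
  conf.get("spark.app.name", "<unknown>")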
[no subject]
Hi,
I've been fiddling with the spark/*/storage/BlockManagerMasterActor.getPeers() definition, in the context of blockManagerMaster.askDriverWithReply() sending a GetPeers() request.

1) I couldn't understand what 'selfIndex' is used for.

2) Also, I tried modifying the 'peers' array by eliminating some BlockManagerIds and passed the modified array to the tabulate method. The application gets executed, but I find that blockManagerMaster.askDriverWithReply() receives a sequence of BlockManagerIds that includes the ones I had eliminated. For example, my original 'peers' array contained 5 BlockManagerIds: BlockManagerId(2, s2, 39997, 0), BlockManagerId(1, s4, 35874, 0), BlockManagerId(3, s1, 33738, 0), BlockManagerId(0, s3, 38207, 0), BlockManagerId(driver, karthik, 34388, 0). I modified it to peers1, having 3 BlockManagerIds: BlockManagerId(2, s2, 39997, 0), BlockManagerId(1, s4, 35874, 0), BlockManagerId(3, s1, 33738, 0). Then I passed this modified peers1 array for the sequence conversion:

Array.tabulate[BlockManagerId](size) { i => peers1((selfIndex + i + 1) % peers1.length) }.toSeq

But when /storage/blockManagerMaster.askDriverWithReply() gets the result, it contains the BlockManagerIds that I eliminated on purpose. Can someone please make me understand how this Seq[BlockManagerId] is constructed? Thank you!
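On question (1), here is a hedged reconstruction (paraphrased from memory of the 1.0 code base, not verbatim) of what getPeers in BlockManagerMasterActor roughly does. selfIndex is simply the position of the requesting block manager's own id in the peers array, and the tabulate starts at selfIndex + 1 so that a node never picks itself as a replication target:

private def getPeers(blockManagerId: BlockManagerId, size: Int): Seq[BlockManagerId] = {
  val peers: Array[BlockManagerId] = blockManagerInfo.keySet.toArray
  val selfIndex = peers.indexOf(blockManagerId)   // where the caller itself sits in the array
  if (selfIndex == -1) {
    throw new SparkException("Self index for " + blockManagerId + " not found")
  }
  // return `size` peers, walking the array circularly starting just after the caller
  Array.tabulate[BlockManagerId](size) { i => peers((selfIndex + i + 1) % peers.length) }.toSeq
}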
java.io.IOException: Filesystem closed
Hi,
I face the following exception when I submit a Spark application. The log file shows:

14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:689)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1668)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1629)
    at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1614)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:120)
    at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158)
    at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.util.FileLogger.flush(FileLogger.scala:158)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:87)
    at org.apache.spark.scheduler.EventLoggingListener.onJobEnd(EventLoggingListener.scala:112)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
    at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79)
    at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:52)
    at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)

Someone please help me resolve this!!
Thanks
Re: java.io.IOException: Filesystem closed
Sorry for the delayed response. Please find my application attached. On Tue, Dec 2, 2014 at 12:04 PM, Akhil Das ak...@sigmoidanalytics.com wrote: What is the application that you are submitting? Looks like you might have invoked fs inside the app and then closed it within it. Thanks Best Regards On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I face the following exception when submit a spark application. The log file shows: 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:689) at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1668) at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1629) at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1614) at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:120) at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158) at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158) at scala.Option.foreach(Option.scala:236) at org.apache.spark.util.FileLogger.flush(FileLogger.scala:158) at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:87) at org.apache.spark.scheduler.EventLoggingListener.onJobEnd(EventLoggingListener.scala:112) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79) at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:52) at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46) Someone please help me resolve this!! Thanks SimpleApp001.scala Description: Binary data - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: java.io.IOException: Filesystem closed
But, somehow, if I run this application for the second time, I find that the application gets executed and the results are out regardless of the same errors in logs. On Tue, Dec 2, 2014 at 2:08 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Your code seems to have a lot of threads and i think you might be invoking sc.stop before those threads get finished. Thanks Best Regards On Tue, Dec 2, 2014 at 12:04 PM, Akhil Das ak...@sigmoidanalytics.com wrote: What is the application that you are submitting? Looks like you might have invoked fs inside the app and then closed it within it. Thanks Best Regards On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I face the following exception when submit a spark application. The log file shows: 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:689) at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1668) at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1629) at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1614) at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:120) at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158) at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158) at scala.Option.foreach(Option.scala:236) at org.apache.spark.util.FileLogger.flush(FileLogger.scala:158) at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:87) at org.apache.spark.scheduler.EventLoggingListener.onJobEnd(EventLoggingListener.scala:112) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79) at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:52) at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46) Someone please help me resolve this!! Thanks
Re: java.io.IOException: Filesystem closed
Does the sparkContext shuts down itself by default even if I dont mention specifically in my code?? Because, I ran the application without sc.context(), still I get file system closed error along with correct output. On Tue, Dec 2, 2014 at 2:20 PM, Akhil Das ak...@sigmoidanalytics.com wrote: It could be because those threads are finishing quickly. Thanks Best Regards On Tue, Dec 2, 2014 at 2:19 PM, rapelly kartheek kartheek.m...@gmail.com wrote: But, somehow, if I run this application for the second time, I find that the application gets executed and the results are out regardless of the same errors in logs. On Tue, Dec 2, 2014 at 2:08 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Your code seems to have a lot of threads and i think you might be invoking sc.stop before those threads get finished. Thanks Best Regards On Tue, Dec 2, 2014 at 12:04 PM, Akhil Das ak...@sigmoidanalytics.com wrote: What is the application that you are submitting? Looks like you might have invoked fs inside the app and then closed it within it. Thanks Best Regards On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I face the following exception when submit a spark application. The log file shows: 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:689) at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1668) at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1629) at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1614) at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:120) at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158) at org.apache.spark.util.FileLogger$$anonfun$flush$2.apply(FileLogger.scala:158) at scala.Option.foreach(Option.scala:236) at org.apache.spark.util.FileLogger.flush(FileLogger.scala:158) at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:87) at org.apache.spark.scheduler.EventLoggingListener.onJobEnd(EventLoggingListener.scala:112) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$4.apply(SparkListenerBus.scala:52) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81) at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:79) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:79) at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:52) at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46) Someone please help me resolve this!! Thanks
Necessity for rdd replication.
Hi, I was just thinking about the necessity for RDD replication. One category could be something like a large number of threads requiring the same RDD. Even though a single RDD can be shared by multiple threads belonging to the same application, I believe we can extract better parallelism if the RDD is replicated; am I right? I am eager to know if there are any real-life applications or any other scenarios which force an RDD to be replicated. Can someone please throw some light on the necessity for RDD replication. Thank you
Profiling a spark application.
Hi, I want to find the time taken for replicating an RDD in a Spark cluster, along with the computation time on the replicated RDD. Can someone please suggest some ideas? Thank you
Storage Locations of an rdd
Hi, I need to find the storage locations (node IDs) of each partition of a replicated RDD in Spark. I mean, if an RDD is replicated twice, I want to find, for each partition, the two nodes where it is stored. The Spark web UI has a page that depicts the data distribution of each RDD, but I cannot really make out what it displays. Can someone please throw some light in this regard? Thank you Karthik
Storage Locations of an rdd
Hi, I need to find the storage locations (node IDs) of each partition of a replicated RDD in Spark. I mean, if an RDD is replicated twice, I want to find, for each partition, the two nodes where it is stored. The Spark web UI has a page that depicts the data distribution of each RDD, but I need to know the first and second location of each partition of the replicated RDD. Can someone please throw some light in this regard? Thank you Karthik
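For completeness, a heavily hedged sketch of one way to list per-partition block locations from the driver, using internal Spark 1.x classes (RDDBlockId, BlockManagerMaster); these are not public, stable interfaces, and the names are assumptions based on that code base. `rdd` stands for an RDD that has already been persisted and materialized (e.g. with persist(...) followed by count()).

import org.apache.spark.SparkEnv
import org.apache.spark.storage.RDDBlockId

val master = SparkEnv.get.blockManager.master
(0 until rdd.partitions.length).foreach { i =>
  // one BlockManagerId per stored replica of this partition
  val locs = master.getLocations(RDDBlockId(rdd.id, i))
  println("partition " + i + " -> " + locs.mkString(", "))
}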
Spark profiler
Hi, I want to find the time taken for replicating an RDD in a Spark cluster, along with the computation time on the replicated RDD. Can someone please suggest a suitable Spark profiler? Thank you
NullPointerException
Hi,
I get the following exception when I submit a Spark application that calculates the frequency of characters in a file. Especially when I increase the size of the data, I face this problem.

Exception in thread Thread-47 org.apache.spark.SparkException: Job aborted due to stage failure: Task 11.0:10 failed 4 times, most recent failure: Exception failure in TID 295 on host s1: java.lang.NullPointerException
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:786)
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:752)
org.apache.spark.storage.BlockManager.put(BlockManager.scala:574)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:108)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
org.apache.spark.scheduler.Task.run(Task.scala:51)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Any help? Thank you!
Re: NullPointerException
spark-1.0.0 On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote: Which version of Spark are you using? On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com wrote: Hi, I get this following Exception when I submit spark application that calculates the frequency of characters in a file. Especially, when I increase the size of data, I face this problem. Exception in thread Thread-47 org.apache.spark.SparkException: Job aborted due to stage failure: Task 11.0:10 failed 4 times, most recent failure: Exception failure in TID 295 on host s1: java.lang.NullPointerException org.apache.spark.storage.BlockManager.org $apache$spark$storage$BlockManager$$replicate(BlockManager.scala:786) org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:752) org.apache.spark.storage.BlockManager.put(BlockManager.scala:574) org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:108) org.apache.spark.rdd.RDD.iterator(RDD.scala:227) org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) org.apache.spark.rdd.RDD.iterator(RDD.scala:229) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) org.apache.spark.scheduler.Task.run(Task.scala:51) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Any help? Thank you!
Fwd: NullPointerException
-- Forwarded message --
From: rapelly kartheek kartheek.m...@gmail.com
Date: Thu, Jan 1, 2015 at 12:05 PM
Subject: Re: NullPointerException
To: Josh Rosen rosenvi...@gmail.com, user@spark.apache.org

spark-1.0.0

On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using?

On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
Hi, I get the following exception when I submit a Spark application that calculates the frequency of characters in a file. In particular, it shows up when I increase the size of the data.
Re: NullPointerException
Ok. Let me try it out on a newer version. Thank you!!

On Thu, Jan 1, 2015 at 12:17 PM, Josh Rosen rosenvi...@gmail.com wrote:
It looks like 'null' might be selected as a block replication peer?
https://github.com/apache/spark/blob/v1.0.0/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L786
I know that we fixed some replication bugs in newer versions of Spark (such as https://github.com/apache/spark/pull/2366), so it's possible that this issue would be resolved by updating. Can you try re-running your job with a newer Spark version to see whether you still see the same error?

On Wed, Dec 31, 2014 at 10:35 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
spark-1.0.0

On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using?

On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com wrote:
Hi, I get the following exception when I submit a Spark application that calculates the frequency of characters in a file. In particular, it shows up when I increase the size of the data.
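If upgrading is not immediately possible, one workaround to consider, suggested here as an assumption drawn from the stack trace rather than anything confirmed in the thread, is to cache without replication so that BlockManager.replicate is never invoked. A minimal sketch (hypothetical input path):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object NoReplicationWorkaround {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("NoReplicationWorkaround"))

    // Hypothetical path, for illustration only.
    val data = sc.textFile("hdfs:///user/karthik/input.txt")

    // MEMORY_ONLY has replication factor 1 and so never enters the
    // BlockManager.replicate() code path, unlike MEMORY_ONLY_2.
    val cached = data.persist(StorageLevel.MEMORY_ONLY)
    println(cached.count())

    sc.stop()
  }
}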
Spark-1.2.0 build error
Hi, I get the following error when I build Spark using sbt:

[error] Nonzero exit code (128): git clone https://github.com/ScrapCodes/sbt-pom-reader.git /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
[error] Use 'last' for the full log.

Any help please?
Re: UnknownhostException : home
Yes yes.. the hadoop/etc/hadoop/hdfs-site.xml file has a path like: hdfs://home/...

On Mon, Jan 19, 2015 at 3:21 PM, Sean Owen so...@cloudera.com wrote:
I bet somewhere you have a path like hdfs://home/... which would suggest that 'home' is a hostname, when I imagine you mean it as a root directory.

On Mon, Jan 19, 2015 at 9:33 AM, Rapelly Kartheek kartheek.m...@gmail.com wrote:
Hi, I get the following exception when I run my application:

karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class org.apache.spark.examples.SimpleApp001 --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar out1.txt
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: home
UnknownhostException : home
Hi, I get the following exception when I run my application:

karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class org.apache.spark.examples.SimpleApp001 --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar out1.txt
log4j:WARN No such property [target] in org.apache.log4j.FileAppender.
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: home
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:569)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:512)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:142)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:366)
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352)
    at org.apache.spark.examples.SimpleApp001$.main(SimpleApp001.scala:13)
    at org.apache.spark.examples.SimpleApp001.main(SimpleApp001.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: home
    ... 20 more

I couldn't trace the cause of this exception. Any help in this regard? Thanks
Re: UnknownhostException : home
Actually, I don't have any entry in my /etc/hosts file with the hostname "home". In fact, I didn't use this hostname anywhere. Then why is it trying to resolve it?

On Mon, Jan 19, 2015 at 3:15 PM, Ashish paliwalash...@gmail.com wrote:
It's not able to resolve "home" to an IP. Assuming it's your local machine, add an entry like the following to your /etc/hosts file (use sudo to edit the file) and then run the program again:

127.0.0.1 home

On Mon, Jan 19, 2015 at 3:03 PM, Rapelly Kartheek kartheek.m...@gmail.com wrote:
Hi, I get the following exception when I run my application:

karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class org.apache.spark.examples.SimpleApp001 --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar out1.txt
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: home

--
thanks
ashish
Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
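As a small illustration of what "not able to resolve home to an IP" means (this snippet is not from the thread, just a local check), resolving the literal hostname "home" fails with the same UnknownHostException unless something like the /etc/hosts entry above maps it:

import java.net.InetAddress

object ResolveCheck {
  def main(args: Array[String]): Unit = {
    // Throws java.net.UnknownHostException unless "home" is mapped,
    // e.g. via an /etc/hosts line such as "127.0.0.1 home".
    println(InetAddress.getByName("home").getHostAddress)
  }
}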
Re: UnknownhostException : home
Yeah... I made that mistake in spark/conf/spark-defaults.conf when setting spark.eventLog.dir. Now it works. Thank you!
Karthik

On Mon, Jan 19, 2015 at 3:29 PM, Sean Owen so...@cloudera.com wrote:
Sorry, to be clear, you need to write hdfs:///home/ Note three slashes; there is an empty host between the 2nd and 3rd. This is true of most URI schemes with a host.

On Mon, Jan 19, 2015 at 9:56 AM, Rapelly Kartheek kartheek.m...@gmail.com wrote:
Yes yes.. the hadoop/etc/hadoop/hdfs-site.xml file has a path like: hdfs://home/...

On Mon, Jan 19, 2015 at 3:21 PM, Sean Owen so...@cloudera.com wrote:
I bet somewhere you have a path like hdfs://home/... which would suggest that 'home' is a hostname, when I imagine you mean it as a root directory.

On Mon, Jan 19, 2015 at 9:33 AM, Rapelly Kartheek kartheek.m...@gmail.com wrote:
Hi, I get the following exception when I run my application:
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: home
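To see why the two-slash form treats "home" as a hostname while the three-slash form treats it as part of the path, a quick check with java.net.URI behaves as follows. This is an illustration only; the paths and the spark-defaults.conf line in the final comment are hypothetical, not taken from the thread.

import java.net.URI

object HdfsUriCheck {
  def main(args: Array[String]): Unit = {
    val twoSlashes   = new URI("hdfs://home/karthik/spark-events")
    val threeSlashes = new URI("hdfs:///home/karthik/spark-events")

    // With two slashes, "home" is parsed as the authority (hostname),
    // which is what Hadoop then tries, and fails, to resolve.
    println(s"host=${twoSlashes.getHost}, path=${twoSlashes.getPath}")     // host=home, path=/karthik/spark-events
    println(s"host=${threeSlashes.getHost}, path=${threeSlashes.getPath}") // host=null, path=/home/karthik/spark-events

    // A corrected spark-defaults.conf entry would therefore look like
    // (hypothetical path):
    //   spark.eventLog.dir hdfs:///home/karthik/spark-events
  }
}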
Re: Problem with building spark-1.2.0
Yes, this proxy problem is resolved.

Regarding "how your build refers to https://github.com/ScrapCodes/sbt-pom-reader.git - I don't see this repo in the project code base": I manually downloaded the sbt-pom-reader directory and moved it into the .sbt/0.13/staging/*/ directory. But now I face the following:

karthik@s4:~/spark-1.2.0$ SPARK_HADOOP_VERSION = 2.3.0 sbt/sbt assembly
Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
[info] Loading project definition from /home/karthik/spark-1.2.0/project/project
[info] Loading project definition from /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project
[warn] Multiple resolvers having different access mechanism configured with same name 'sbt-plugin-releases'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
[info] Updating {file:/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader/project/}sbt-pom-reader-build...
[info] Resolving com.typesafe.sbt#sbt-ghpages;0.5.2 ...

Could you please tell me how to build stand-alone spark-1.2.0 with sbt correctly?

On Mon, Jan 12, 2015 at 4:21 PM, Sean Owen so...@cloudera.com wrote:
The problem is there in the logs. When it went to clone some code, something went wrong with the proxy:

Received HTTP code 407 from proxy after CONNECT

Probably you have an HTTP proxy and you have not authenticated. It's specific to your environment. Although it's unrelated, I'm curious how your build refers to https://github.com/ScrapCodes/sbt-pom-reader.git - I don't see this repo in the project code base.

On Mon, Jan 12, 2015 at 9:09 AM, Kartheek.R kartheek.m...@gmail.com wrote:
Hi, This is what I am trying to do:

karthik@s4:~/spark-1.2.0$ SPARK_HADOOP_VERSION=2.3.0 sbt/sbt clean
Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
[info] Loading project definition from /home/karthik/spark-1.2.0/project/project
Cloning into '/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader'...
fatal: unable to access 'https://github.com/ScrapCodes/sbt-pom-reader.git/': Received HTTP code 407 from proxy after CONNECT
java.lang.RuntimeException: Nonzero exit code (128): git clone https://github.com/ScrapCodes/sbt-pom-reader.git
Re: Problem with building spark-1.2.0
yeah.. but none of the sites open.

On Sun, Jan 4, 2015 at 10:35 PM, Ted Yu yuzhih...@gmail.com wrote:
Have you used Google to find some way of accessing github :-)

On Jan 4, 2015, at 8:46 AM, Kartheek.R kartheek.m...@gmail.com wrote:
The problem is that my network is not able to access github.com for cloning some dependencies, as github is blocked in India. What are the other possible ways around this problem? Thank you!

On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek wrote:
Hi, I get the following error when I build spark-1.2.0 using sbt:

[error] Nonzero exit code (128): git clone https://github.com/ScrapCodes/sbt-pom-reader.git /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
[error] Use 'last' for the full log.

Any help please? Thanks