Re: Specify node where driver should run

2016-06-06 Thread Saiph Kappa
ever node that > is as specified in your yarn conf. > On Jun 5, 2016 4:54 PM, "Saiph Kappa" <saiph.ka...@gmail.com> wrote: > >> Hi, >> >> In yarn-cluster mode, is there any way to specify on which node I want >> the driver to run? >> >> Thanks. >> >
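The reply above points at the YARN configuration: with YARN node labels (YARN 2.6+), the application master, and hence the driver in yarn-cluster mode, can be restricted to labelled nodes. A minimal sketch, assuming the cluster admin has already created and assigned a label `driver-nodes`; verify the property name against the Spark-on-YARN docs for the installed version:

```shell
# In yarn-cluster mode the driver runs inside the application master,
# so restricting where the AM is scheduled restricts the driver.
# Assumes a YARN node label "driver-nodes" exists and is assigned to
# the target machine (yarn rmadmin -addToClusterNodeLabels ...).
spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.am.nodeLabelExpression=driver-nodes \
  --class App app.jar
```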

Specify node where driver should run

2016-06-05 Thread Saiph Kappa
Hi, In yarn-cluster mode, is there any way to specify on which node I want the driver to run? Thanks.

Specify number of executors in standalone cluster mode

2016-02-21 Thread Saiph Kappa
Hi, I'm running a spark streaming application onto a spark cluster that spans 6 machines/workers. I'm using spark cluster standalone mode. Each machine has 8 cores. Is there any way to specify that I want to run my application on all 6 machines and just use 2 cores on each machine? Thanks
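The standalone master spreads an application's cores across workers by default (`spark.deploy.spreadOut` is `true`), and in Spark 1.x it starts at most one executor per worker per application, so capping the total at 12 cores should yield roughly 2 cores on each of the 6 machines. A sketch; the master URL and jar name are placeholders:

```shell
# 12 cores total, spread across 6 workers => ~2 cores per machine,
# relying on the default spread-out scheduling of the standalone master.
spark-submit \
  --master spark://master:7077 \
  --total-executor-cores 12 \
  --class App app.jar
```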

Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Saiph Kappa
> > On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa <saiph.ka...@gmail.com> > wrote: > >> Hi, >> >> I'm submitting a spark job like this: >> >> ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master >>> spark://machine1:6066 --de

ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-25 Thread Saiph Kappa
Hi, I'm submitting a spark job like this: ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master > spark://machine1:6066 --deploy-mode cluster --jars > target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar > /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
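In standalone cluster mode the driver is launched on one of the workers, so the jar path passed to spark-submit must resolve on that machine rather than on the submitting host — a frequent cause of this ClassNotFoundException. A sketch of the usual fix; the HDFS URL is a placeholder:

```shell
# Either copy the application jar to the same path on every node, or
# reference a location all workers can read, e.g. HDFS:
spark-submit --class Benchmark \
  --master spark://machine1:6066 --deploy-mode cluster \
  hdfs://namenode:8020/jars/benchmark-app_2.10-0.1-SNAPSHOT.jar
```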

Re: Spark Streaming: How to specify deploy mode through configuration parameter?

2015-12-17 Thread Saiph Kappa
PM, Ted Yu <yuzhih...@gmail.com> wrote: > Since both scala and java files are involved in the PR, I don't see an > easy way around without building yourself. > > Cheers > > On Wed, Dec 16, 2015 at 10:18 AM, Saiph Kappa <saiph.ka...@gmail.com> > wrote: > >>

How to submit spark job to YARN from scala code

2015-12-17 Thread Saiph Kappa
Hi, Since it is not currently possible to submit a spark job to a spark cluster running in standalone mode (cluster mode - it's not currently possible to specify this deploy mode within the code), can I do it with YARN? I tried to do something like this (but in scala): « ... // Client object -
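Since Spark 1.4 the `spark-launcher` module offers a supported way to start a job on YARN from code without touching `org.apache.spark.deploy.yarn.Client` directly. A hedged sketch, assuming the `spark-launcher` artifact is on the classpath and `SPARK_HOME`/`HADOOP_CONF_DIR` are set in the environment; the jar path and class name are placeholders:

```scala
import org.apache.spark.launcher.SparkLauncher

// Spawns a spark-submit process programmatically and waits for it.
object YarnSubmit {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("/path/to/app.jar") // placeholder jar path
      .setMainClass("Benchmark")          // placeholder class name
      .setMaster("yarn-cluster")
      .launch()
    process.waitFor()
  }
}
```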

Re: Spark Streaming: How to specify deploy mode through configuration parameter?

2015-12-16 Thread Saiph Kappa
> > On Wed, Dec 16, 2015 at 7:31 AM, Saiph Kappa <saiph.ka...@gmail.com> > wrote: > >> Hi, >> >> I have a client application running on host0 that is launching multiple >> drivers on multiple remote standalone spark clusters (each cluster is >> runni

Spark Streaming: How to specify deploy mode through configuration parameter?

2015-12-16 Thread Saiph Kappa
Hi, I have a client application running on host0 that is launching multiple drivers on multiple remote standalone spark clusters (each cluster is running on a single machine): « ... List("host1", "host2", "host3").foreach(host => { val sparkConf = new SparkConf() sparkConf.setAppName("App")
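Later releases let spark-submit pick the deploy mode up from configuration via the `spark.submit.deployMode` property, as an alternative to the `--deploy-mode` flag; as the thread notes, the release discussed here may have required a patched build. A sketch with placeholder host and jar names:

```shell
# Deploy mode taken from configuration instead of --deploy-mode;
# the same property can also live in conf/spark-defaults.conf.
spark-submit \
  --master spark://host1:6066 \
  --conf spark.submit.deployMode=cluster \
  --class App app.jar
```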

Re: Spark Streaming - stream between 2 applications

2015-11-20 Thread Saiph Kappa
I think my problem persists whether I use Kafka or sockets. Or am I wrong? How would you use Kafka here? On Fri, Nov 20, 2015 at 7:12 PM, Christian <engr...@gmail.com> wrote: > Have you considered using Kafka? > > On Fri, Nov 20, 2015 at 6:48 AM Saiph Kappa <saiph.ka...@gmail.c

Re: DataGenerator for streaming application

2015-09-21 Thread Saiph Kappa
the data? I believe > rawSocketStream waits for a big chunk of data before it can start > processing it. I think what you are writing is a String and you should use > socketTextStream which reads the data on a per line basis. > > On Sun, Sep 20, 2015 at 9:56 AM, Saiph Kappa <saiph.ka...@gmail.com

DataGenerator for streaming application

2015-09-19 Thread Saiph Kappa
Hi, I am trying to build a data generator that feeds a streaming application. This data generator just reads a file and sends its lines through a socket. I get no errors on the logs, and the benchmark below always prints "Received 0 records". Am I doing something wrong? object MyDataGenerator {
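For context, "Received 0 records" with a plain-text feed is what one would expect from `rawSocketStream`, which reads framed, serialized blocks; a newline-writing generator pairs with `socketTextStream` instead, as the reply above suggests. A self-contained sketch of such a generator; the object name is a placeholder:

```scala
import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source

// Serves a file line by line over a socket, one record per line,
// in the newline-delimited format socketTextStream consumes.
object LineGenerator {
  def serve(port: Int, path: String): Unit = {
    val server = new ServerSocket(port)
    val socket = server.accept() // block until the receiver connects
    val out = new PrintWriter(socket.getOutputStream, true) // autoflush on println
    Source.fromFile(path).getLines().foreach(out.println)
    out.close(); socket.close(); server.close()
  }
}
```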

Re: Why can't I allocate more than 4 executors with 2 machines on YARN?

2015-06-22 Thread Saiph Kappa
. On Mon, Jun 22, 2015 at 10:42 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: 1) Can you try with yarn-cluster 2) Does your queue have enough capacity On Mon, Jun 22, 2015 at 11:10 AM, Saiph Kappa saiph.ka...@gmail.com wrote: Hi, I am running a simple spark streaming application on hadoop 2.7.0

Why can't I allocate more than 4 executors with 2 machines on YARN?

2015-06-22 Thread Saiph Kappa
Hi, I am running a simple spark streaming application on hadoop 2.7.0/YARN (master: yarn-client) cluster with 2 different machines (12GB RAM with 8 CPU cores each). I am launching my application like this: ~/myapp$ ~/my-spark/bin/spark-submit --class App --master yarn-client --driver-memory 4g
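The container count YARN grants is bounded per node by `yarn.nodemanager.resource.memory-mb` (and `yarn.nodemanager.resource.cpu-vcores`) divided by the requested container size (executor memory plus overhead), so large executors on 12 GB machines leave little room. Requesting smaller executors usually lets YARN place more of them; a sketch with placeholder names:

```shell
# Smaller containers => more of them fit per NodeManager.
spark-submit --class App --master yarn-client \
  --num-executors 8 --executor-memory 2g --executor-cores 2 \
  app.jar
```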

Re: Unable to use more than 1 executor for spark streaming application with YARN

2015-06-17 Thread Saiph Kappa
How can I get more information regarding this exception? On Wed, Jun 17, 2015 at 1:17 AM, Saiph Kappa saiph.ka...@gmail.com wrote: Hi, I am running a simple spark streaming application on hadoop 2.7.0/YARN (master: yarn-client) with 2 executors in different machines. However, while the app

Unable to use more than 1 executor for spark streaming application with YARN

2015-06-16 Thread Saiph Kappa
Hi, I am running a simple spark streaming application on hadoop 2.7.0/YARN (master: yarn-client) with 2 executors in different machines. However, while the app is running, I can see on the app web UI (tab executors) that only 1 executor keeps completing tasks over time, the other executor only

Re: How to run spark streaming application on YARN?

2015-06-05 Thread Saiph Kappa
. On Thu, Jun 4, 2015 at 7:20 PM, Saiph Kappa saiph.ka...@gmail.com wrote: Additionally, I think this document ( https://spark.apache.org/docs/latest/building-spark.html ) should mention that the protobuf.version might need to be changed to match the one used in the chosen hadoop version

Re: How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
it with spark-submit or using org.apache.spark.deploy.yarn.Client. 2015-06-04 20:30 GMT+03:00 Saiph Kappa saiph.ka...@gmail.com: No, I am not. I run it with sbt «sbt run-main Branchmark». I thought it was the same thing since I am passing all the configurations through the application code

Re: How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
to be able to run my application. On Thu, Jun 4, 2015 at 7:14 PM, Sandy Ryza sandy.r...@cloudera.com wrote: That might work, but there might also be other steps that are required. -Sandy On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa saiph.ka...@gmail.com wrote: Thanks! It is working fine now

Dynamic Allocation with Spark Streaming

2015-05-22 Thread Saiph Kappa
Hi, 1. Dynamic allocation is currently only supported with YARN, correct? 2. In spark streaming, it is possible to change the number of executors while an application is running? If so, can the allocation be controlled by the application, instead of using any already defined automatic policy?
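For question 1, dynamic allocation in the 1.x line indeed requires YARN plus the external shuffle service; for question 2, executor counts then follow the built-in policy, though `SparkContext.requestExecutors`/`killExecutors` (developer API since 1.2) allow some application-driven control. A configuration sketch with placeholder bounds:

```shell
# Dynamic allocation on YARN; the external shuffle service must be
# running on every NodeManager so executors can be removed safely.
spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=8 \
  --class App app.jar
```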

Re: Dynamic Allocation with Spark Streaming

2015-05-22 Thread Saiph Kappa
wrote: For #1, the answer is yes. For #2, See TD's comments on SPARK-7661 Cheers On Fri, May 22, 2015 at 6:58 PM, Saiph Kappa saiph.ka...@gmail.com wrote: Hi, 1. Dynamic allocation is currently only supported with YARN, correct? 2. In spark streaming, it is possible to change the number

Re: Dynamic Allocation with Spark Streaming

2015-05-22 Thread Saiph Kappa
Or should I shutdown the streaming context gracefully and then start it again with a different number of executors? On Sat, May 23, 2015 at 4:00 AM, Saiph Kappa saiph.ka...@gmail.com wrote: Sorry, but I can't see on TD's comments how to allocate executors on demand. It seems to me that he's

Re: Could not compute split, block not found in Spark Streaming Simple Application

2015-04-13 Thread Saiph Kappa
[Receiver stats table: records/sec and Last Error; Receiver-0: 10, 10, 10, 9; Receiver-1: …] On Thu, Apr 9, 2015 at 7:55 PM, Tathagata Das t...@databricks.com wrote: Are you running # of receivers = # machines? TD On Thu, Apr 9, 2015 at 9:56 AM, Saiph Kappa saiph.ka...@gmail.com wrote: Sorry, I was getting

Re: Could not compute split, block not found in Spark Streaming Simple Application

2015-04-09 Thread Saiph Kappa
the driver and the workers and give it to me? Basically I want to trace through what is happening to the block that is not being found. And can you tell what Cluster manager are you using? Spark Standalone, Mesos or YARN? On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa saiph.ka...@gmail.com wrote

Could not compute split, block not found in Spark Streaming Simple Application

2015-03-27 Thread Saiph Kappa
Hi, I am just running this simple example with machineA: 1 master + 1 worker machineB: 1 worker « val ssc = new StreamingContext(sparkConf, Duration(1000)) val rawStreams = (1 to numStreams).map(_ => ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray val

Why different numbers of partitions give different results for the same computation on the same dataset?

2015-03-03 Thread Saiph Kappa
Hi, I have a spark streaming application, running on a single node, consisting mainly of map operations. I perform repartitioning to control the number of CPU cores that I want to use. The code goes like this: val ssc = new StreamingContext(sparkConf, Seconds(5)) val distFile =
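The repartition-per-batch pattern described above can be sketched as follows; the path and names are placeholders, and the partition count is an example:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RepartitionApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("App")
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    val distFile = ssc.textFileStream("/home/myuser/twitter-dump")
    // Repartitioning each batch caps the number of concurrent map
    // tasks, and hence the number of cores effectively in use.
    val words = distFile.repartition(4).flatMap(_.split(" "))
    words.map((_, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```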

Re: throughput in the web console?

2015-03-03 Thread Saiph Kappa
Sorry I made a mistake. Please ignore my question. On Tue, Mar 3, 2015 at 2:47 AM, Saiph Kappa saiph.ka...@gmail.com wrote: I performed repartitioning and everything went fine with respect to the number of CPU cores being used (and respective times). However, I noticed something very strange

Re: Why different numbers of partitions give different results for the same computation on the same dataset?

2015-03-03 Thread Saiph Kappa
Sorry I made a mistake in my code. Please ignore my question number 2. Different numbers of partitions give *the same* results! On Tue, Mar 3, 2015 at 7:32 PM, Saiph Kappa saiph.ka...@gmail.com wrote: Hi, I have a spark streaming application, running on a single node, consisting mainly

Re: throughput in the web console?

2015-03-02 Thread Saiph Kappa
the received streams (DStream.repartition) with sufficient partitions to load balance across more machines. TD On Thu, Feb 26, 2015 at 9:52 AM, Saiph Kappa saiph.ka...@gmail.com wrote: One more question: while processing the exact same batch I noticed that giving more CPUs to the worker

Re: throughput in the web console?

2015-02-26 Thread Saiph Kappa
, whether I was using 4 or 6 or 8 CPUs. On Thu, Feb 26, 2015 at 5:35 PM, Saiph Kappa saiph.ka...@gmail.com wrote: By setting spark.eventLog.enabled to true it is possible to see the application UI after the application has finished its execution, however the Streaming tab is no longer visible

Re: throughput in the web console?

2015-02-26 Thread Saiph Kappa
By setting spark.eventLog.enabled to true it is possible to see the application UI after the application has finished its execution; however, the Streaming tab is no longer visible. For measuring the duration of batches in the code I am doing something like this: «wordCharValues.foreachRDD(rdd => {
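A sketch of that measurement, continuing the quoted code: transformations are lazy, so the timed region must include an action; `wordCharValues` is the DStream from the message above.

```scala
wordCharValues.foreachRDD { rdd =>
  val start = System.currentTimeMillis()
  val n = rdd.count() // action: forces the batch to be computed
  val elapsedMs = System.currentTimeMillis() - start
  println(s"processed $n records in $elapsedMs ms")
}
```

Alternatively, a registered `StreamingListener` reports each completed batch's `processingDelay` without touching the job code.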

How can I measure the time an RDD takes to execute?

2015-01-10 Thread Saiph Kappa
Hi, How can I measure the time an RDD takes to execute? In particular, I want to do it for the following piece of code: « val ssc = new StreamingContext(sparkConf, Seconds(5)) val distFile = ssc.textFileStream("/home/myuser/twitter-dump") val words = distFile.flatMap(_.split(" ")).filter(_.length

Re: Question about textFileStream

2014-11-12 Thread Saiph Kappa
What if the window is of 5 seconds, and the file takes longer than 5 seconds to be completely scanned? It will still attempt to load the whole file? On Mon, Nov 10, 2014 at 6:24 PM, Soumitra Kumar kumar.soumi...@gmail.com wrote: Entire file in a window. On Mon, Nov 10, 2014 at 9:20 AM, Saiph

Question about textFileStream

2014-11-10 Thread Saiph Kappa
Hi, In my application I am doing something like this new StreamingContext(sparkConf, Seconds(10)).textFileStream("logs/"), and I get some unknown exceptions when I copy a file with about 800 MB to that folder (logs/). I have a single worker running with 512 MB of memory. Can anyone tell me if
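An 800 MB file delivered in a single batch cannot be held by a 512 MB worker, which by itself explains failures of this kind. Raising the worker's memory, or splitting the input into smaller files moved into `logs/` one at a time, is the usual remedy; a sketch with placeholder master URL and jar:

```shell
spark-submit --master spark://master:7077 \
  --executor-memory 2g \
  --class App app.jar
```

Note also that `textFileStream` only picks up files moved atomically into the watched directory; copying a large file in place can expose a partially written file to the stream.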

Re: ERROR UserGroupInformation: PriviledgedActionException

2014-11-05 Thread Saiph Kappa
/dependency On Wed, Nov 5, 2014 at 6:32 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Its more like you are having different versions of spark Thanks Best Regards On Wed, Nov 5, 2014 at 3:05 AM, Saiph Kappa saiph.ka...@gmail.com wrote: I set the host and port of the driver and now

Re: Spark Streaming Applications

2014-10-23 Thread Saiph Kappa
On Tue, Oct 21, 2014 at 4:33 PM, Saiph Kappa saiph.ka...@gmail.com wrote: Hi, I have been trying to find a fairly complex application that makes use of the Spark Streaming framework. I checked public github repos but the examples I found were too simple, only comprising simple operations

Spark Streaming Applications

2014-10-21 Thread Saiph Kappa
Hi, I have been trying to find a fairly complex application that makes use of the Spark Streaming framework. I checked public github repos but the examples I found were too simple, only comprising simple operations like counters and sums. On the Spark summit website, I could find very interesting

Simple Question: Spark Streaming Applications

2014-09-29 Thread Saiph Kappa
Hi, Do all spark streaming applications use the map operation, or at least the majority of them? Thanks.