Re: Specify node where driver should run

2016-06-06 Thread Saiph Kappa
specified in your yarn conf. > On Jun 5, 2016 4:54 PM, "Saiph Kappa" wrote: > >> Hi, >> >> In yarn-cluster mode, is there any way to specify on which node I want >> the driver to run? >> >> Thanks. >> >
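The reply points at the YARN configuration; a minimal sketch of what that likely means, assuming Spark 1.6+ (where spark.yarn.am.nodeLabelExpression exists) and a hypothetical YARN node label "driver-nodes" already assigned to the target machine. In yarn-cluster mode the driver runs inside the application master, so pinning the AM pins the driver:

«
import org.apache.spark.SparkConf

// Assumption: the label "driver-nodes" exists in the YARN cluster and is
// attached to the node that should host the driver (the AM in yarn-cluster mode).
val sparkConf = new SparkConf()
  .setAppName("App")
  .set("spark.yarn.am.nodeLabelExpression", "driver-nodes")
»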

Specify node where driver should run

2016-06-05 Thread Saiph Kappa
Hi, In yarn-cluster mode, is there any way to specify on which node I want the driver to run? Thanks.

Specify number of executors in standalone cluster mode

2016-02-21 Thread Saiph Kappa
Hi, I'm running a spark streaming application on a spark cluster that spans 6 machines/workers, using spark standalone cluster mode. Each machine has 8 cores. Is there any way to specify that I want to run my application on all 6 machines and just use 2 cores on each machine? Thanks
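A minimal sketch of one way to get this layout, assuming the standalone master's default spreadOut scheduling (which spreads an application's cores across as many workers as possible): capping the application at 12 cores total yields 2 per machine on 6 workers. The master URL is a placeholder:

«
import org.apache.spark.SparkConf

// Assumption: spark.deploy.spreadOut is left at its default (true) on the
// master, so a cap of 12 total cores spreads as 2 cores on each of 6 workers.
val sparkConf = new SparkConf()
  .setAppName("App")
  .setMaster("spark://master:7077")   // placeholder master URL
  .set("spark.cores.max", "12")
»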

Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Saiph Kappa
Not entirely sure why, but everything goes fine without that line. Thanks! On Tue, Dec 29, 2015 at 1:39 PM, Prem Spark wrote: > you need to make sure this class is accessible to all servers since it's > cluster mode and the driver can be on any of the worker nodes. > > > On Fri, Dec 25, 20

ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-25 Thread Saiph Kappa
Hi, I'm submitting a spark job like this: ~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master spark://machine1:6066 --deploy-mode cluster --jars target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar /home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1

How to submit spark job to YARN from scala code

2015-12-17 Thread Saiph Kappa
Hi, Since it is not currently possible to submit a spark job to a standalone spark cluster in cluster deploy mode from within the code (this deploy mode cannot currently be specified programmatically), can I do it with YARN? I tried to do something like this (but in scala): « ... // Client object -
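One route that does work from Scala code in this era is the spark-launcher module (available since Spark 1.4), which forks a spark-submit under the hood; a minimal sketch, with the jar path and class name as placeholders:

«
import org.apache.spark.launcher.SparkLauncher

// Sketch: submit a job to YARN programmatically. SPARK_HOME must be set in the
// environment (or supplied via setSparkHome). Paths and names are placeholders.
object SubmitToYarn {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("/path/to/app.jar")
      .setMainClass("Benchmark")
      .setMaster("yarn-cluster")
      .setConf("spark.executor.memory", "2g")
      .launch()            // forks spark-submit as a child process
    process.waitFor()      // block until the submission exits
  }
}
»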

Re: Spark Streaming: How to specify deploy mode through configuration parameter?

2015-12-17 Thread Saiph Kappa
PM, Ted Yu wrote: > Since both scala and java files are involved in the PR, I don't see an > easy way around without building yourself. > > Cheers > > On Wed, Dec 16, 2015 at 10:18 AM, Saiph Kappa > wrote: > >> Exactly, but it's only fixed for the next sp

Re: Spark Streaming: How to specify deploy mode through configuration parameter?

2015-12-16 Thread Saiph Kappa
Exactly, but it's only fixed for the next spark version. Is there any workaround for version 1.5.2? On Wed, Dec 16, 2015 at 4:36 PM, Ted Yu wrote: > This seems related: > [SPARK-10123][DEPLOY] Support specifying deploy mode from configuration > > FYI > > On Wed, Dec 16,

Spark Streaming: How to specify deploy mode through configuration parameter?

2015-12-16 Thread Saiph Kappa
Hi, I have a client application running on host0 that is launching multiple drivers on multiple remote standalone spark clusters (each cluster is running on a single machine): « ... List("host1", "host2", "host3").foreach(host => { val sparkConf = new SparkConf() sparkConf.setAppName("App") s
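For 1.5.2, where SparkConf cannot carry the deploy mode, one workaround sketch is to drive each remote cluster through SparkLauncher instead, which does expose it. Hosts follow the thread; the port and jar path are only illustrative:

«
import org.apache.spark.launcher.SparkLauncher

// Sketch: one cluster-mode driver per remote standalone master. Assumes the
// REST submission gateway on port 6066 and a placeholder application jar.
List("host1", "host2", "host3").foreach { host =>
  new SparkLauncher()
    .setAppResource("/path/to/app.jar")
    .setMainClass("App")
    .setMaster(s"spark://$host:6066")
    .setDeployMode("cluster")   // the setting SparkConf could not express in 1.5.2
    .launch()
}
»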

Re: Spark Streaming - stream between 2 applications

2015-11-20 Thread Saiph Kappa
I think my problem persists whether I use Kafka or sockets. Or am I wrong? How would you use Kafka here? On Fri, Nov 20, 2015 at 7:12 PM, Christian wrote: > Have you considered using Kafka? > > On Fri, Nov 20, 2015 at 6:48 AM Saiph Kappa wrote: > >> Hi, >> >>

Spark Streaming - stream between 2 applications

2015-11-20 Thread Saiph Kappa
Hi, I have a basic spark streaming application like this: « ... val ssc = new StreamingContext(sparkConf, Duration(batchMillis)) val rawStreams = (1 to numStreams).map(_ => ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray val union = ssc.union(rawStreams) union.f
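Kafka can decouple the two applications: the first publishes each record from foreachRDD, and the second consumes the same topic with a direct stream. A rough sketch reusing the ssc/union names from the snippet; the broker address and the "events" topic are placeholders:

«
import java.util.Properties
import kafka.serializer.StringDecoder
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.kafka.KafkaUtils

// Application 1: publish the stream's records to a Kafka topic.
union.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")   // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    records.foreach(r => producer.send(new ProducerRecord[String, String]("events", r)))
    producer.close()
  }
}

// Application 2: consume the same topic (spark-streaming-kafka, Spark 1.x API).
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "broker:9092"), Set("events"))
»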

Spark Streaming + SparkSQL, time based windowing queries

2015-11-13 Thread Saiph Kappa
Hi, Does SparkSQL support time based windowing queries over streams like the following one (from Intel/StreamingSQL): « sql( """|SELECT t.word, COUNT(t.word)|FROM (SELECT * FROM test) OVER (WINDOW '9' SECONDS, SLIDE '3' SECONDS) AS t|GROUP BY t.word """.stripMargin) » What are my
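Spark SQL at this point had no stream-aware windowing of its own; a common approximation is to let the DStream do the windowing (9s window, 3s slide) and run a plain aggregation per windowed RDD. A sketch, assuming a DStream[String] of words named `words`:

«
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.Seconds

// Sketch: DStream windowing plus per-window SQL, approximating the
// StreamingSQL query from the thread.
words.window(Seconds(9), Seconds(3)).foreachRDD { rdd =>
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._
  rdd.toDF("word").registerTempTable("test")
  sqlContext.sql("SELECT word, COUNT(word) FROM test GROUP BY word").show()
}
»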

Re: DataGenerator for streaming application

2015-09-21 Thread Saiph Kappa
rawSocketStream waits for a big chunk of data before it can start > processing it. I think what you are writing is a String and you should use > socketTextStream which reads the data on a per line basis. > > On Sun, Sep 20, 2015 at 9:56 AM, Saiph Kappa > wrote: > >> Hi, >>

DataGenerator for streaming application

2015-09-19 Thread Saiph Kappa
Hi, I am trying to build a data generator that feeds a streaming application. This data generator just reads a file and sends its lines through a socket. I get no errors on the logs, and the benchmark below always prints "Received 0 records". Am I doing something wrong? object MyDataGenerator {
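Following the reply above — rawSocketStream expects Spark's framed, serialized blocks, while plain text belongs on socketTextStream — here is a minimal sketch of a generator that serves newline-terminated lines a socketTextStream receiver can read; the file path and port are placeholders:

«
import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source

// Sketch: serve a file line by line over a socket so the streaming app can
// consume it with ssc.socketTextStream(host, 9999).
object MyDataGenerator {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)
    val socket = server.accept()   // block until the receiver connects
    val out = new PrintWriter(socket.getOutputStream, true)
    Source.fromFile("/path/to/data.txt").getLines().foreach(out.println)
    out.close(); socket.close(); server.close()
  }
}
»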

Re: Why can't I allocate more than 4 executors with 2 machines on YARN?

2015-06-22 Thread Saiph Kappa
On Mon, Jun 22, 2015 at 10:42 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > 1) Can you try with yarn-cluster > 2) Does your queue have enough capacity > > On Mon, Jun 22, 2015 at 11:10 AM, Saiph Kappa > wrote: > >> Hi, >> >> I am running a simple spark streaming application o

Why can't I allocate more than 4 executors with 2 machines on YARN?

2015-06-22 Thread Saiph Kappa
Hi, I am running a simple spark streaming application on hadoop 2.7.0/YARN (master: yarn-client) cluster with 2 different machines (12GB RAM with 8 CPU cores each). I am launching my application like this: ~/myapp$ ~/my-spark/bin/spark-submit --class App --master yarn-client --driver-memory 4g -
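YARN grants an executor container only where a NodeManager still has the requested memory (executor memory plus overhead) and vcores free, so with 2 machines the per-node capacity (yarn.nodemanager.resource.memory-mb and vcores) bounds the executor count. A sketch of sizing requests so 6 executors can fit on 2 such nodes; the figures are illustrative, not taken from the thread:

«
import org.apache.spark.SparkConf

// Sketch: six modest executors instead of a few large ones. Each container
// needs executor memory + spark.yarn.executor.memoryOverhead free on a node.
val sparkConf = new SparkConf()
  .setAppName("App")
  .setMaster("yarn-client")
  .set("spark.executor.instances", "6")   // same effect as --num-executors 6
  .set("spark.executor.memory", "2g")     // 3 x (2g + overhead) fits a 12GB node
  .set("spark.executor.cores", "2")
»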

Re: Unable to use more than 1 executor for spark streaming application with YARN

2015-06-17 Thread Saiph Kappa
How can I get more information regarding this exception? On Wed, Jun 17, 2015 at 1:17 AM, Saiph Kappa wrote: > Hi, > > I am running a simple spark streaming application on hadoop 2.7.0/YARN > (master: yarn-client) with 2 executors in different machines. However, > while the ap

Unable to use more than 1 executor for spark streaming application with YARN

2015-06-16 Thread Saiph Kappa
Hi, I am running a simple spark streaming application on hadoop 2.7.0/YARN (master: yarn-client) with 2 executors in different machines. However, while the app is running, I can see on the app web UI (Executors tab) that only 1 executor keeps completing tasks over time, while the other executor only wor

Re: How to run spark streaming application on YARN?

2015-06-05 Thread Saiph Kappa
Thanks. On Thu, Jun 4, 2015 at 7:20 PM, Saiph Kappa wrote: > Additionally, I think this document ( > https://spark.apache.org/docs/latest/building-spark.html ) should mention > that the protobuf.version might need to be changed to match the one used in > the chosen hadoop version

Re: How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
able to run my application. On Thu, Jun 4, 2015 at 7:14 PM, Sandy Ryza wrote: > That might work, but there might also be other steps that are required. > > -Sandy > > On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa > wrote: > >> Thanks! It is working fine now with spark-su

Re: How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
or using org.apache.spark.deploy.yarn.Client. > > 2015-06-04 20:30 GMT+03:00 Saiph Kappa : > >> No, I am not. I run it with sbt «sbt "run-main Branchmark"». I thought it >> was the same thing since I am passing all the configurations through the >> applicat

Re: How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
using spark-submit? > > -Sandy > > On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa > wrote: > >> Hi, >> >> I've been running my spark streaming application in standalone mode >> without any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0) &g

How to run spark streaming application on YARN?

2015-06-04 Thread Saiph Kappa
Hi, I've been running my spark streaming application in standalone mode without any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0) but I am having some problems. Here are the config parameters of my application: « val sparkConf = new SparkConf() sparkConf.setMaster("yarn-client"
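Getting yarn-client to work from code usually also requires the Hadoop/YARN client configuration to be visible (HADOOP_CONF_DIR or YARN_CONF_DIR exported) and, on Spark 1.x, a spark.yarn.jar pointing at the Spark assembly; a sketch with placeholder paths:

«
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: yarn-client from code. Assumes HADOOP_CONF_DIR is exported so the
// ResourceManager can be located; the assembly path below is a placeholder.
val sparkConf = new SparkConf()
  .setAppName("App")
  .setMaster("yarn-client")
  .set("spark.yarn.jar", "hdfs:///spark/spark-assembly.jar")
val ssc = new StreamingContext(sparkConf, Seconds(5))
»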

Re: Dynamic Allocation with Spark Streaming

2015-05-22 Thread Saiph Kappa
Or should I shut down the streaming context gracefully and then start it again with a different number of executors? On Sat, May 23, 2015 at 4:00 AM, Saiph Kappa wrote: > Sorry, but I can't see on TD's comments how to allocate executors on > demand. It seems to me that he

Re: Dynamic Allocation with Spark Streaming

2015-05-22 Thread Saiph Kappa
> For #1, the answer is yes. > > For #2, See TD's comments on SPARK-7661 > > Cheers > > > On Fri, May 22, 2015 at 6:58 PM, Saiph Kappa > wrote: > >> Hi, >> >> 1. Dynamic allocation is currently only supported with YARN, correct? >>

Dynamic Allocation with Spark Streaming

2015-05-22 Thread Saiph Kappa
Hi, 1. Dynamic allocation is currently only supported with YARN, correct? 2. In spark streaming, is it possible to change the number of executors while an application is running? If so, can the allocation be controlled by the application, instead of using any already defined automatic policy? Tha
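At the time, dynamic allocation required YARN plus the external shuffle service on each NodeManager, and scaling follows the built-in backlog/idle policy rather than application commands (though SparkContext.requestExecutors / killExecutors existed as developer APIs). A configuration sketch with illustrative bounds:

«
import org.apache.spark.SparkConf

// Sketch: built-in dynamic allocation policy (scale up on task backlog, down
// on idle executors). Requires the external shuffle service on the NodeManagers.
val sparkConf = new SparkConf()
  .setAppName("App")
  .setMaster("yarn-client")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")    // illustrative bounds
  .set("spark.dynamicAllocation.maxExecutors", "12")
»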

Re: "Could not compute split, block not found" in Spark Streaming Simple Application

2015-04-13 Thread Saiph Kappa
[Streaming UI receiver-rate table: minimum, median and maximum rates of about 10 records/sec (last reading 9) for Receiver-0; the Receiver-1 row is truncated.] On Thu, Apr 9, 2015 at 7:55 PM, Tathagata Das wrote: > Are you running # of receivers = # machines? > > TD > > On Thu, Apr 9, 2015 at 9:56 A

Re: "Could not compute split, block not found" in Spark Streaming Simple Application

2015-04-09 Thread Saiph Kappa
driver and the workers and give it to me? Basically I > want to trace through what is happening to the block that is not being > found. > And can you tell me which cluster manager you are using? Spark Standalone, > Mesos or YARN? > > > On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa

"Could not compute split, block not found" in Spark Streaming Simple Application

2015-03-27 Thread Saiph Kappa
Hi, I am just running this simple example with machineA: 1 master + 1 worker machineB: 1 worker « val ssc = new StreamingContext(sparkConf, Duration(1000)) val rawStreams = (1 to numStreams).map(_ =>ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray val uni
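Per the advice elsewhere in this archive (explicitly repartition the received streams) and the usual guidance for this error, two hedged mitigations: store received blocks with replication and disk spill instead of a single in-memory copy, and repartition so processing is not pinned to the receiver machines. A sketch reusing the snippet's names (sparkConf, numStreams, host, port are assumed to exist):

«
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, StreamingContext}

// Sketch: replicated, spillable block storage plus an explicit repartition.
val ssc = new StreamingContext(sparkConf, Duration(1000))
val rawStreams = (1 to numStreams).map(_ =>
  ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_AND_DISK_SER_2))
val union = ssc.union(rawStreams).repartition(16)   // illustrative partition count
»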

Re: throughput in the web console?

2015-03-03 Thread Saiph Kappa
Sorry I made a mistake. Please ignore my question. On Tue, Mar 3, 2015 at 2:47 AM, Saiph Kappa wrote: > I performed repartitioning and everything went fine with respect to the > number of CPU cores being used (and respective times). However, I noticed > something very strange: ins

Re: Why different numbers of partitions give different results for the same computation on the same dataset?

2015-03-03 Thread Saiph Kappa
Sorry I made a mistake in my code. Please ignore my question number 2. Different numbers of partitions give *the same* results! On Tue, Mar 3, 2015 at 7:32 PM, Saiph Kappa wrote: > Hi, > > I have a spark streaming application, running on a single node, consisting > mainly of map o

Why different numbers of partitions give different results for the same computation on the same dataset?

2015-03-03 Thread Saiph Kappa
Hi, I have a spark streaming application, running on a single node, consisting mainly of map operations. I perform repartitioning to control the number of CPU cores that I want to use. The code goes like this: val ssc = new StreamingContext(sparkConf, Seconds(5)) val distFile = ssc.text
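A sketch of how that repartitioning pattern typically continues, with numCores standing in for the desired parallelism (a placeholder, not from the thread):

«
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: repartition the input stream so each batch runs with numCores tasks.
val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile = ssc.textFileStream("/home/myuser/twitter-dump")
val words = distFile.repartition(numCores).flatMap(_.split(" "))
»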

Re: throughput in the web console?

2015-03-02 Thread Saiph Kappa
on by explicitly repartitioning the received streams > (DStream.repartition) with sufficient partitions to load balance across > more machines. > > TD > > On Thu, Feb 26, 2015 at 9:52 AM, Saiph Kappa > wrote: > >> One more question: while processing the exact same batch I not

Re: throughput in the web console?

2015-02-26 Thread Saiph Kappa
, whether I was using 4 or 6 or 8 CPUs. On Thu, Feb 26, 2015 at 5:35 PM, Saiph Kappa wrote: > By setting spark.eventLog.enabled to true it is possible to see the > application UI after the application has finished its execution, however > the Streaming tab is no longer visible. > > Fo

Re: throughput in the web console?

2015-02-26 Thread Saiph Kappa
By setting spark.eventLog.enabled to true it is possible to see the application UI after the application has finished its execution, however the Streaming tab is no longer visible. For measuring the duration of batches in the code I am doing something like this: «wordCharValues.foreachRDD(rdd => {

How can I measure the time an RDD takes to execute?

2015-01-10 Thread Saiph Kappa
Hi, How can I measure the time an RDD takes to execute? In particular, I want to do it for the following piece of code: « val ssc = new StreamingContext(sparkConf, Seconds(5)) val distFile = ssc.textFileStream("/home/myuser/twitter-dump") val words = distFile.flatMap(_.split(" ")).filter(_.leng
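Because transformations are lazy, wall-clock timing around them measures nothing; per-batch processing time is better read from a StreamingListener (or timed around an action). A sketch, attached before ssc.start():

«
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Sketch: log each batch's processing time as reported by the scheduler.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    println(s"Batch ${info.batchTime}: ${info.processingDelay.getOrElse(-1L)} ms")
  }
})
»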

Re: Question about textFileStream

2014-11-12 Thread Saiph Kappa
What if the window is of 5 seconds, and the file takes longer than 5 seconds to be completely scanned? Will it still attempt to load the whole file? On Mon, Nov 10, 2014 at 6:24 PM, Soumitra Kumar wrote: > Entire file in a window. > > On Mon, Nov 10, 2014 at 9:20 AM, Saiph Kappa

Question about textFileStream

2014-11-10 Thread Saiph Kappa
Hi, In my application I am doing something like this "new StreamingContext(sparkConf, Seconds(10)).textFileStream("logs/")", and I get some unknown exceptions when I copy a file of about 800 MB to that folder ("logs/"). I have a single worker running with 512 MB of memory. Can anyone tell me if

Re: ERROR UserGroupInformation: PriviledgedActionException

2014-11-05 Thread Saiph Kappa
<groupId>org.apache.spark</groupId> > <artifactId>spark-examples_2.10</artifactId> > <version>1.1.0</version> > On Wed, Nov 5, 2014 at 6:32 AM, Akhil Das wrote: > It's more like you are having different versions of spark > > Thanks > Best Regards > > On Wed, Nov 5, 2014 at 3:05 AM,

Re: ERROR UserGroupInformation: PriviledgedActionException

2014-11-04 Thread Saiph Kappa
(make sure you are able to ping this from the cluster) > > *spark.driver.port* - set it to a port number which is accessible from > the spark cluster. > > You can look at more configuration options over here. > <http://spark.apache.org/docs/latest/configuration.html#networking> > >

ERROR UserGroupInformation: PriviledgedActionException

2014-11-03 Thread Saiph Kappa
Hi, I am trying to submit a job to a spark cluster running on a single machine (1 master + 1 worker) with hadoop 1.0.4. I configure the submission in code: «val sparkConf = new SparkConf().setMaster("spark://myserver:7077").setAppName("MyApp").setJars(Array("target/my-app-1.0-SNAPSHOT.jar"))». When I run th
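Following the advice quoted above (spark.driver.host / spark.driver.port), a sketch of the same configuration with the callback address pinned; the host and port values are placeholders:

«
import org.apache.spark.SparkConf

// Sketch: make the driver reachable from the cluster by fixing the address
// and port that executors connect back to. Values are placeholders.
val sparkConf = new SparkConf()
  .setMaster("spark://myserver:7077")
  .setAppName("MyApp")
  .setJars(Array("target/my-app-1.0-SNAPSHOT.jar"))
  .set("spark.driver.host", "192.168.1.10")   // pingable from the cluster
  .set("spark.driver.port", "51000")          // open in any firewall
»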

Re: Spark Streaming Applications

2014-10-23 Thread Saiph Kappa
her/killrweather > > > On Tue, Oct 21, 2014 at 4:33 PM, Saiph Kappa > wrote: > >> Hi, >> >> I have been trying to find a fairly complex application that makes use of >> the Spark Streaming framework. I checked public github repos but the >> examples I fou

Spark Streaming Applications

2014-10-21 Thread Saiph Kappa
Hi, I have been trying to find a fairly complex application that makes use of the Spark Streaming framework. I checked public github repos but the examples I found were too simple, only comprising simple operations like counters and sums. On the Spark Summit website, I could find very interesting

Re: Simple Question: Spark Streaming Applications

2014-09-30 Thread Saiph Kappa
. > > Thanks, > Liquan > > On Mon, Sep 29, 2014 at 10:15 AM, Saiph Kappa > wrote: > >> Hi, >> >> Do all spark streaming applications use the map operation? or the >> majority of them? >> >> Thanks. >> > > > > -- > Liquan Pei > Department of Physics > University of Massachusetts Amherst >

Simple Question: Spark Streaming Applications

2014-09-29 Thread Saiph Kappa
Hi, Do all spark streaming applications use the map operation, or at least the majority of them? Thanks.