> ever node that is as specified in your yarn conf.
> On Jun 5, 2016 4:54 PM, "Saiph Kappa" <saiph.ka...@gmail.com> wrote:
>
>> Hi,
>>
>> In yarn-cluster mode, is there any way to specify on which node I want
>> the driver to run?
>>
>> Thanks.
>>
>
Hi,
I'm running a Spark Streaming application on a Spark cluster that spans 6
machines/workers, in standalone mode. Each machine has 8 cores. Is there
any way to specify that I want to run my application on all 6 machines and
use just 2 cores on each machine?
Thanks
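(A minimal sketch of one way to get that layout, assuming a standalone
master at spark://master:7077: cap the total cores so the master's default
spread-out scheduling lands 2 cores on each of the 6 workers.)

val sparkConf = new SparkConf()
  .setMaster("spark://master:7077") // hypothetical master URL
  .setAppName("App")
  .set("spark.cores.max", "12")      // 6 machines x 2 cores in total
  .set("spark.executor.cores", "2")  // at most 2 cores per executor (Spark 1.4+)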
>
> On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa <saiph.ka...@gmail.com>
> wrote:
>
Hi,
I'm submitting a spark job like this:
~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
spark://machine1:6066 --deploy-mode cluster --jars
target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
/home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Since both Scala and Java files are involved in the PR, I don't see an
> easy way around it without building it yourself.
>
> Cheers
>
> On Wed, Dec 16, 2015 at 10:18 AM, Saiph Kappa <saiph.ka...@gmail.com>
> wrote:
>
>>
Hi,
Since it is not currently possible to submit a Spark job to a standalone
cluster in cluster deploy mode from within the code (this deploy mode
cannot be specified programmatically), can I do it with YARN?
I tried to do something like this (but in Scala):
«
... // Client object -
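(A rough sketch of the Spark 1.x pattern presumably being attempted, not
the original code: ClientArguments/Client come from the spark-yarn module,
are private[spark], and pick up the YARN config from HADOOP_CONF_DIR.)

package org.apache.spark.examples // hypothetical, to reach the private[spark] API

import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.{Client, ClientArguments}

object YarnSubmit {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("Benchmark")
    // --jar/--class mirror the spark-submit flags used earlier in the thread
    val clientArgs = new ClientArguments(Array(
      "--jar", "target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar",
      "--class", "Benchmark"), sparkConf)
    new Client(clientArgs, sparkConf).run()
  }
}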
>
> On Wed, Dec 16, 2015 at 7:31 AM, Saiph Kappa <saiph.ka...@gmail.com>
> wrote:
>
Hi,
I have a client application running on host0 that is launching multiple
drivers on multiple remote standalone spark clusters (each cluster is
running on a single machine):
«
...
List("host1", "host2" , "host3").foreach(host => {
val sparkConf = new SparkConf()
sparkConf.setAppName("App")
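(A fuller sketch of what this loop presumably looks like; the completion is
assumed. Note that Spark 1.x allows only one active SparkContext per JVM,
so each context must be stopped before the next one is created.)

import org.apache.spark.{SparkConf, SparkContext}

List("host1", "host2", "host3").foreach { host =>
  val sparkConf = new SparkConf()
    .setAppName("App")
    .setMaster(s"spark://$host:7077") // assumed standalone master port
  val sc = new SparkContext(sparkConf)
  // ... build and run the job against this cluster ...
  sc.stop() // required before creating the next SparkContext
}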
I think my problem persists whether I use Kafka or sockets. Or am I wrong?
How would you use Kafka here?
On Fri, Nov 20, 2015 at 7:12 PM, Christian <engr...@gmail.com> wrote:
> Have you considered using Kafka?
>
> On Fri, Nov 20, 2015 at 6:48 AM Saiph Kappa <saiph.ka...@gmail.c
> the data? I believe
> rawSocketStream waits for a big chunk of data before it can start
> processing it. I think what you are writing is a String, so you should use
> socketTextStream, which reads the data on a per-line basis.
>
> On Sun, Sep 20, 2015 at 9:56 AM, Saiph Kappa <saiph.ka...@gmail.com
Hi,
I am trying to build a data generator that feeds a streaming application.
The data generator just reads a file and sends its lines through a socket.
I get no errors in the logs, and the benchmark below always prints
"Received 0 records". Am I doing something wrong?
object MyDataGenerator {
.
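(A self-contained sketch of such a generator; the port and file path are
assumed. It writes newline-terminated lines, which is what socketTextStream
expects on the receiving side; rawSocketStream instead expects serialized
blocks, which matches the advice above.)

import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source

object MyDataGenerator {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)      // assumed port
    val socket = server.accept()             // wait for the receiver to connect
    val out = new PrintWriter(socket.getOutputStream, true)
    Source.fromFile("/home/myuser/data.txt") // assumed input file
      .getLines()
      .foreach(line => out.println(line))    // println appends the newline
    out.close(); socket.close(); server.close()
  }
}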
On Mon, Jun 22, 2015 at 10:42 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
1) Can you try with yarn-cluster?
2) Does your queue have enough capacity?
On Mon, Jun 22, 2015 at 11:10 AM, Saiph Kappa saiph.ka...@gmail.com
wrote:
Hi,
I am running a simple spark streaming application on hadoop 2.7.0/YARN
(master: yarn-client) cluster with 2 different machines (12GB RAM with 8
CPU cores each).
I am launching my application like this:
~/myapp$ ~/my-spark/bin/spark-submit --class App --master yarn-client
--driver-memory 4g
How can I get more information regarding this exception?
On Wed, Jun 17, 2015 at 1:17 AM, Saiph Kappa saiph.ka...@gmail.com wrote:
Hi,
I am running a simple spark streaming application on hadoop 2.7.0/YARN
(master: yarn-client) with 2 executors on different machines. However,
while the app is running, I can see in the app web UI (Executors tab) that
only one executor keeps completing tasks over time; the other executor only
.
On Thu, Jun 4, 2015 at 7:20 PM, Saiph Kappa saiph.ka...@gmail.com wrote:
Additionally, I think this document
(https://spark.apache.org/docs/latest/building-spark.html) should mention
that protobuf.version might need to be changed to match the one used in
the chosen Hadoop version.
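(For example, an assumed invocation; protobuf.version is a property in
Spark's pom.xml that can be overridden on the command line:)

build/mvn -Phadoop-2.6 -Dhadoop.version=2.7.0 -Dprotobuf.version=2.5.0 -DskipTests clean package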
it with spark-submit or using org.apache.spark.deploy.yarn.Client.
2015-06-04 20:30 GMT+03:00 Saiph Kappa saiph.ka...@gmail.com:
No, I am not. I run it with sbt («sbt run-main Benchmark»). I thought it
was the same thing, since I am passing all the configurations through the
application code to be able to run my application.
On Thu, Jun 4, 2015 at 7:14 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
That might work, but there might also be other steps that are required.
-Sandy
On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa saiph.ka...@gmail.com
wrote:
Thanks! It is working fine now.
Hi,
1. Dynamic allocation is currently only supported with YARN, correct?
2. In Spark Streaming, is it possible to change the number of executors
while an application is running? If so, can the allocation be controlled
by the application, instead of using an already defined automatic policy?
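(For reference, a sketch of the @DeveloperApi that SparkContext gained in
Spark 1.2 for this; it only takes effect on cluster managers that support
it, i.e. YARN at the time. ssc is assumed to be the StreamingContext.)

val sc = ssc.sparkContext
sc.requestExecutors(2)          // ask the cluster manager for 2 more executors
sc.killExecutors(Seq("1", "2")) // release specific executors by ID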
wrote:
For #1, the answer is yes.
For #2, see TD's comments on SPARK-7661.
Cheers
On Fri, May 22, 2015 at 6:58 PM, Saiph Kappa saiph.ka...@gmail.com
wrote:
Or should I shut down the streaming context gracefully and then start it
again with a different number of executors?
On Sat, May 23, 2015 at 4:00 AM, Saiph Kappa saiph.ka...@gmail.com wrote:
Sorry, but I can't see in TD's comments how to allocate executors on
demand. It seems to me that he's
[Streaming UI receiver table (records/sec, Last Error): Receiver-0 shows
~10 records/sec; Receiver-1 shows none.]
On Thu, Apr 9, 2015 at 7:55 PM, Tathagata Das t...@databricks.com wrote:
Are you running # of receivers = # machines?
TD
On Thu, Apr 9, 2015 at 9:56 AM, Saiph Kappa saiph.ka...@gmail.com wrote:
Sorry, I was getting
the driver and the workers and give it to me? Basically I
want to trace through what is happening to the block that is not being
found.
And can you tell me which cluster manager you are using: Spark Standalone,
Mesos, or YARN?
On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa saiph.ka...@gmail.com
wrote:
Hi,
I am just running this simple example with
machineA: 1 master + 1 worker
machineB: 1 worker
«
val ssc = new StreamingContext(sparkConf, Duration(1000))
val rawStreams = (1 to numStreams).map(_ =>
  ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray
val
Hi,
I have a spark streaming application, running on a single node, consisting
mainly of map operations. I perform repartitioning to control the number of
CPU cores that I want to use. The code goes like this:
val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile =
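(The snippet is cut off; a hedged reconstruction of the pattern described,
with the path and partition count assumed:)

val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile = ssc.textFileStream("/home/myuser/twitter-dump") // assumed path
  .repartition(4) // 4 partitions -> roughly 4 cores used by the map stages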
Sorry I made a mistake. Please ignore my question.
On Tue, Mar 3, 2015 at 2:47 AM, Saiph Kappa saiph.ka...@gmail.com wrote:
I performed repartitioning and everything went fine with respect to the
number of CPU cores being used (and respective times). However, I noticed
something very strange
Sorry I made a mistake in my code. Please ignore my question number 2.
Different numbers of partitions give *the same* results!
On Tue, Mar 3, 2015 at 7:32 PM, Saiph Kappa saiph.ka...@gmail.com wrote:
the received streams
(DStream.repartition) with sufficient partitions to load balance across
more machines.
TD
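(A minimal sketch of that suggestion, assuming the raw streams from the
earlier snippet are first unioned; the partition count is illustrative.)

val unified = ssc.union(rawStreams)    // combine the receiver streams
val balanced = unified.repartition(16) // spread partitions across machines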
On Thu, Feb 26, 2015 at 9:52 AM, Saiph Kappa saiph.ka...@gmail.com
wrote:
One more question: while processing the exact same batch I noticed that
giving more CPUs to the worker
whether I was using 4, 6, or 8 CPUs.
On Thu, Feb 26, 2015 at 5:35 PM, Saiph Kappa saiph.ka...@gmail.com wrote:
By setting spark.eventLog.enabled to true it is possible to see the
application UI after the application has finished its execution, however
the Streaming tab is no longer visible.
To measure the duration of batches, I am doing something like this in the
code:
«wordCharValues.foreachRDD(rdd => {
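(Presumably continuing along these lines; everything past the first line
is assumed. The count() forces the batch's RDD to be computed so the
timing is meaningful.)

wordCharValues.foreachRDD { rdd =>
  val start = System.currentTimeMillis()
  rdd.count() // force evaluation of this batch
  val elapsedMs = System.currentTimeMillis() - start
  println(s"Batch processed in $elapsedMs ms")
}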
Hi,
How can I measure the time an RDD takes to execute?
In particular, I want to do it for the following piece of code:
«
val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile = ssc.textFileStream("/home/myuser/twitter-dump")
val words = distFile.flatMap(_.split(" ")).filter(_.length
What if the window is 5 seconds and the file takes longer than 5 seconds
to be completely scanned? Will it still attempt to load the whole file?
On Mon, Nov 10, 2014 at 6:24 PM, Soumitra Kumar kumar.soumi...@gmail.com
wrote:
Entire file in a window.
On Mon, Nov 10, 2014 at 9:20 AM, Saiph
Hi,
In my application I am doing something like this: new
StreamingContext(sparkConf, Seconds(10)).textFileStream("logs/"), and I
get some unknown exceptions when I copy a file of about 800 MB to that
folder (logs/). I have a single worker running with 512 MB of memory.
Can anyone tell me if
On Wed, Nov 5, 2014 at 6:32 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It's more likely that you are running different versions of Spark
Thanks
Best Regards
On Wed, Nov 5, 2014 at 3:05 AM, Saiph Kappa saiph.ka...@gmail.com wrote:
I set the host and port of the driver and now
On Tue, Oct 21, 2014 at 4:33 PM, Saiph Kappa saiph.ka...@gmail.com
wrote:
Hi,
I have been trying to find a fairly complex application that makes use of
the Spark Streaming framework. I checked public github repos but the
examples I found were too simple, only comprising simple operations like
counters and sums. On the Spark Summit website, I could find very
interesting
Hi,
Do all Spark Streaming applications use the map operation, or at least the
majority of them?
Thanks.