pecified in your yarn conf.
> On Jun 5, 2016 4:54 PM, "Saiph Kappa" wrote:
>
>> Hi,
>>
>> In yarn-cluster mode, is there any way to specify on which node I want
>> the driver to run?
>>
>> Thanks.
>>
>
Hi,
In yarn-cluster mode, is there any way to specify on which node I want the
driver to run?
Thanks.
Hi,
I'm running a spark streaming application on a spark cluster that spans 6
machines/workers. I'm using the Spark standalone cluster manager. Each machine has
8 cores. Is there any way to specify that I want to run my application on
all 6 machines and just use 2 cores on each machine?
Thanks
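A hedged sketch of how this is often expressed in standalone mode (the master URL
and values below are assumptions, not taken from the original message):
spark.cores.max caps the total cores for the application and spark.executor.cores
caps the cores used on each machine.
«
import org.apache.spark.SparkConf

// Hypothetical configuration; master URL and values are placeholders.
val sparkConf = new SparkConf()
  .setAppName("App")
  .setMaster("spark://machine1:7077")   // assumed standalone master
  .set("spark.cores.max", "12")         // 6 machines x 2 cores in total
  .set("spark.executor.cores", "2")     // at most 2 cores per executor
»
With spark.deploy.spreadOut left at its default (true), the standalone master
spreads those cores across all workers instead of packing them onto a few machines.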
tirely sure why, but
everything goes fine without that line.
Thanks!
On Tue, Dec 29, 2015 at 1:39 PM, Prem Spark wrote:
> you need to make sure this class is accessible to all servers, since it's
> cluster mode and the driver can be on any of the worker nodes.
>
>
> On Fri, Dec 25, 20
Hi,
I'm submitting a spark job like this:
~/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class Benchmark --master
spark://machine1:6066 --deploy-mode cluster --jars
target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar
/home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar 1
Hi,
Since it is not currently possible to submit a spark job from within the code
to a spark cluster running in standalone mode with the cluster deploy mode
(this deploy mode cannot currently be specified in the code), can I do it with YARN?
I tried to do something like this (but in scala):
«
... // Client object -
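For reference, a hedged sketch of a different way to submit programmatically than
the yarn Client object started above: org.apache.spark.launcher.SparkLauncher,
available since Spark 1.4. The main class, jar path and argument below simply
reuse the ones from the spark-submit command earlier in this thread and are
otherwise assumptions.
«
import org.apache.spark.launcher.SparkLauncher

// Hypothetical programmatic submission to YARN in cluster mode.
object SubmitToYarn {
  def main(args: Array[String]): Unit = {
    val process = new SparkLauncher()
      .setAppResource("/home/user/bench/target/scala-2.10/benchmark-app_2.10-0.1-SNAPSHOT.jar")
      .setMainClass("Benchmark")
      .setMaster("yarn-cluster")   // 1.x master string for YARN cluster mode
      .addAppArgs("1")
      .launch()                    // returns a java.lang.Process
    process.waitFor()
  }
}
»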
PM, Ted Yu wrote:
> Since both scala and java files are involved in the PR, I don't see an
> easy way around without building yourself.
>
> Cheers
>
> On Wed, Dec 16, 2015 at 10:18 AM, Saiph Kappa
> wrote:
>
>> Exactly, but it's only fixed for the next sp
Exactly, but it's only fixed for the next spark version. Is there any work
around for version 1.5.2?
On Wed, Dec 16, 2015 at 4:36 PM, Ted Yu wrote:
> This seems related:
> [SPARK-10123][DEPLOY] Support specifying deploy mode from configuration
>
> FYI
>
> On Wed, Dec 16,
Hi,
I have a client application running on host0 that is launching multiple
drivers on multiple remote standalone spark clusters (each cluster is
running on a single machine):
«
...
List("host1", "host2" , "host3").foreach(host => {
val sparkConf = new SparkConf()
sparkConf.setAppName("App")
s
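The snippet above is cut off; a hedged guess at how such a loop could continue
(the master port, batch interval and job body are assumptions, not the original
code):
«
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical completion: one driver/StreamingContext per remote standalone
// master. Note that Spark 1.x allows only one active SparkContext per JVM
// unless spark.driver.allowMultipleContexts is set, so in practice each driver
// may need its own process.
List("host1", "host2", "host3").foreach { host =>
  val sparkConf = new SparkConf()
    .setAppName("App")
    .setMaster(s"spark://$host:7077")   // assumed standalone master port
  val ssc = new StreamingContext(sparkConf, Seconds(1))
  // ... define the streaming job for this cluster here ...
  ssc.start()
}
»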
I think my problem persists whether I use Kafka or sockets. Or am I wrong?
How would you use Kafka here?
On Fri, Nov 20, 2015 at 7:12 PM, Christian wrote:
> Have you considered using Kafka?
>
> On Fri, Nov 20, 2015 at 6:48 AM Saiph Kappa wrote:
>
>> Hi,
>>
>>
Hi,
I have a basic spark streaming application like this:
«
...
val ssc = new StreamingContext(sparkConf, Duration(batchMillis))
val rawStreams = (1 to numStreams).map(_ =>
ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER)).toArray
val union = ssc.union(rawStreams)
union.f
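The last line above is cut off; a hedged sketch of how such a pipeline is
commonly finished, continuing from the union and ssc values above (the
word-count body is an assumption):
«
// Hypothetical continuation of the truncated code above.
val counts = union.flatMap(_.split(" ")).map(word => (word, 1L)).reduceByKey(_ + _)
counts.print()
ssc.start()
ssc.awaitTermination()
»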
Hi,
Does SparkSQL support time-based windowing queries over streams like the
following one (from Intel/StreamingSQL):
«
sql(
  """|SELECT t.word, COUNT(t.word)
     |FROM (SELECT * FROM test) OVER (WINDOW '9' SECONDS, SLIDE '3' SECONDS) AS t
     |GROUP BY t.word
  """.stripMargin)
»
What are my
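If SparkSQL cannot express this directly, a hedged sketch of a rough Spark
Streaming equivalent (the words DStream is an assumption): a 9-second window
sliding every 3 seconds, counting words.
«
// Hypothetical DStream-based equivalent of the windowed word count above.
val counts = words.map(word => (word, 1L))
  .reduceByKeyAndWindow(_ + _, Seconds(9), Seconds(3))
counts.print()
»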
cketStream waits for a big chunk of data before it can start
> processing it. I think what you are writing is a String and you should use
> socketTextStream which reads the data on a per line basis.
>
> On Sun, Sep 20, 2015 at 9:56 AM, Saiph Kappa
> wrote:
>
>> Hi,
>&g
Hi,
I am trying to build a data generator that feeds a streaming application.
This data generator just reads a file and sends its lines through a socket.
I get no errors in the logs, and the benchmark below always prints
"Received 0 records". Am I doing something wrong?
object MyDataGenerator {
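The object body is cut off above; a hedged sketch of one possible generator (the
file path and port are assumptions). It writes plain text lines, which is what
socketTextStream expects on the Spark side, whereas rawSocketStream expects
Spark-serialized blocks, as noted in the reply quoted earlier.
«
import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source

// Hypothetical data generator: serve a file's lines over a plain text socket.
object MyDataGenerator {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999)           // assumed port
    while (true) {
      val socket = server.accept()
      val out = new PrintWriter(socket.getOutputStream, true)
      Source.fromFile("/path/to/input.txt").getLines().foreach(line => out.println(line))
      out.close()
      socket.close()
    }
  }
}
»
On the Spark side, ssc.socketTextStream(host, 9999) would then receive one record
per line.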
Mon, Jun 22, 2015 at 10:42 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> 1) Can you try with yarn-cluster
> 2) Does your queue have enough capacity
>
> On Mon, Jun 22, 2015 at 11:10 AM, Saiph Kappa
> wrote:
>
>> Hi,
>>
>> I am running a simple spark streaming application o
Hi,
I am running a simple spark streaming application on a hadoop 2.7.0/YARN
(master: yarn-client) cluster with 2 different machines (12 GB RAM and 8
CPU cores each).
I am launching my application like this:
~/myapp$ ~/my-spark/bin/spark-submit --class App --master yarn-client
--driver-memory 4g -
How can I get more information regarding this exception?
On Wed, Jun 17, 2015 at 1:17 AM, Saiph Kappa wrote:
> Hi,
>
> I am running a simple spark streaming application on hadoop 2.7.0/YARN
> (master: yarn-client) with 2 executors in different machines. However,
> while the ap
Hi,
I am running a simple spark streaming application on hadoop 2.7.0/YARN
(master: yarn-client) with 2 executors on different machines. However,
while the app is running, I can see in the app web UI (Executors tab) that
only 1 executor keeps completing tasks over time, while the other executor only
wor
Thanks.
On Thu, Jun 4, 2015 at 7:20 PM, Saiph Kappa wrote:
> Additionally, I think this document (
> https://spark.apache.org/docs/latest/building-spark.html ) should mention
> that the protobuf.version might need to be changed to match the one used in
> the chosen hadoop version
able to run my application.
On Thu, Jun 4, 2015 at 7:14 PM, Sandy Ryza wrote:
> That might work, but there might also be other steps that are required.
>
> -Sandy
>
> On Thu, Jun 4, 2015 at 11:13 AM, Saiph Kappa
> wrote:
>
>> Thanks! It is working fine now with spark-su
or using org
> .apache.spark.deploy.yarn.Client.
>
> 2015-06-04 20:30 GMT+03:00 Saiph Kappa :
>
>> No, I am not. I run it with sbt: «sbt "run-main Benchmark"». I thought it
>> was the same thing since I am passing all the configurations through the
>> applicat
using spark-submit?
>
> -Sandy
>
> On Thu, Jun 4, 2015 at 10:20 AM, Saiph Kappa
> wrote:
>
>> Hi,
>>
>> I've been running my spark streaming application in standalone mode
>> without any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0)
&g
Hi,
I've been running my spark streaming application in standalone mode without
any worries. Now, I've been trying to run it on YARN (hadoop 2.7.0) but I
am having some problems.
Here are the config parameters of my application:
«
val sparkConf = new SparkConf()
sparkConf.setMaster("yarn-client"
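The configuration above is cut off; a hedged sketch of what a yarn-client
SparkConf of that era might look like (all values are assumptions, not the
original configuration):
«
import org.apache.spark.SparkConf

// Hypothetical yarn-client configuration; values are placeholders.
val sparkConf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("App")
  .set("spark.executor.instances", "2")   // number of executors requested from YARN
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "4")
»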
Or should I shutdown the streaming context gracefully and then start it
again with a different number of executors?
On Sat, May 23, 2015 at 4:00 AM, Saiph Kappa wrote:
> Sorry, but I can't see in TD's comments how to allocate executors on
> demand. It seems to me that he
> For #1, the answer is yes.
>
> For #2, See TD's comments on SPARK-7661
>
> Cheers
>
>
> On Fri, May 22, 2015 at 6:58 PM, Saiph Kappa
> wrote:
>
>> Hi,
>>
>> 1. Dynamic allocation is currently only supported with YARN, correct?
>>
Hi,
1. Dynamic allocation is currently only supported with YARN, correct?
2. In spark streaming, is it possible to change the number of executors
while an application is running? If so, can the allocation be controlled by
the application, instead of using a predefined automatic policy?
Tha
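A hedged sketch of both options (the configuration keys and the SparkContext
calls exist in Spark 1.x, but the values and usage below are assumptions):
«
import org.apache.spark.{SparkConf, SparkContext}

// Option 1 (hypothetical values): let Spark scale executors automatically.
// Dynamic allocation requires the external shuffle service.
val conf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("App")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
val sc = new SparkContext(conf)

// Option 2 (developer API, typically used without dynamic allocation): the
// application itself asks the cluster manager for executors.
// sc.requestExecutors(2)
// sc.killExecutors(Seq("executor-id"))
»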
[Streaming tab receiver table: rate columns in records/sec (including median
and maximum rate) and a Last Error column. Receiver-0 reports rates of 10, 10,
10 and 9 records/sec; Receiver-1 reports no data.]
On Thu, Apr 9, 2015 at 7:55 PM, Tathagata Das wrote:
> Are you running # of receivers = # machines?
>
> TD
>
> On Thu, Apr 9, 2015 at 9:56 A
ver and the workers and give it to me? Basically I
> want to trace through what is happening to the block that is not being
> found.
> And can you tell me which cluster manager you are using? Spark Standalone,
> Mesos or YARN?
>
>
> On Fri, Mar 27, 2015 at 10:09 AM, Saiph Kappa
&
Hi,
I am just running this simple example with
machineA: 1 master + 1 worker
machineB: 1 worker
«
val ssc = new StreamingContext(sparkConf, Duration(1000))
val rawStreams = (1 to numStreams).map(_
=>ssc.rawSocketStream[String](host, port,
StorageLevel.MEMORY_ONLY_SER)).toArray
val uni
Sorry I made a mistake. Please ignore my question.
On Tue, Mar 3, 2015 at 2:47 AM, Saiph Kappa wrote:
> I performed repartitioning and everything went fine with respect to the
> number of CPU cores being used (and the corresponding execution times). However, I noticed
> something very strange: ins
Sorry I made a mistake in my code. Please ignore my question number 2.
Different numbers of partitions give *the same* results!
On Tue, Mar 3, 2015 at 7:32 PM, Saiph Kappa wrote:
> Hi,
>
> I have a spark streaming application, running on a single node, consisting
> mainly of map o
Hi,
I have a spark streaming application, running on a single node, consisting
mainly of map operations. I perform repartitioning to control the number of
CPU cores that I want to use. The code goes like this:
val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile = ssc.text
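A hedged sketch of the repartitioning pattern described above, continuing from
the (truncated) distFile stream and assuming it is a DStream[String]; the number
of partitions and the map body are assumptions:
«
// Hypothetical continuation: spread the map work over the desired number of cores.
val numCores = 8                                    // assumed number of cores to use
val repartitioned = distFile.repartition(numCores)
val lengths = repartitioned.map(line => line.length)
lengths.print()
»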
on by explicitly repartitioning the received streams
> (DStream.repartition) with sufficient partitions to load balance across
> more machines.
>
> TD
>
> On Thu, Feb 26, 2015 at 9:52 AM, Saiph Kappa
> wrote:
>
>> One more question: while processing the exact same batch I not
,
whether I was using 4 or 6 or 8 CPUs.
On Thu, Feb 26, 2015 at 5:35 PM, Saiph Kappa wrote:
> By setting spark.eventLog.enabled to true it is possible to see the
> application UI after the application has finished its execution, however
> the Streaming tab is no longer visible.
>
> Fo
By setting spark.eventLog.enabled to true, it is possible to see the
application UI after the application has finished its execution; however,
the Streaming tab is no longer visible.
To measure the duration of batches, I am doing something like this in the
code:
«wordCharValues.foreachRDD(rdd => {
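The block above is cut off; a hedged sketch of measuring per-batch processing
time inside foreachRDD (the count() action is an assumption standing in for the
real work):
«
// Hypothetical timing wrapper around each micro-batch.
wordCharValues.foreachRDD { rdd =>
  val start = System.currentTimeMillis()
  rdd.count()                                       // forces evaluation of the RDD
  val elapsedMs = System.currentTimeMillis() - start
  println(s"Batch processed in $elapsedMs ms")
}
»
Alternatively, a StreamingListener registered on the StreamingContext reports the
same per-batch processing delays that the Streaming tab shows.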
Hi,
How can I measure the time an RDD takes to execute?
In particular, I want to do it for the following piece of code:
«
val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile = ssc.textFileStream("/home/myuser/twitter-dump")
val words = distFile.flatMap(_.split(" ")).filter(_.leng
What if the window is 5 seconds and the file takes longer than 5 seconds to
be completely scanned? Will it still attempt to load the whole file?
On Mon, Nov 10, 2014 at 6:24 PM, Soumitra Kumar
wrote:
> Entire file in a window.
>
> On Mon, Nov 10, 2014 at 9:20 AM, Saiph Kappa
Hi,
In my application I am doing something like this "new
StreamingContext(sparkConf, Seconds(10)).textFileStream("logs/")", and I
get some unknown exceptions when I copy a file with about 800 MB to that
folder ("logs/"). I have a single worker running with 512 MB of memory.
Can anyone tell me if
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-examples_2.10</artifactId>
>   <version>1.1.0</version>
> </dependency>
On Wed, Nov 5, 2014 at 6:32 AM, Akhil Das
wrote:
> Its more like you are having different versions of spark
>
> Thanks
> Best Regards
>
> On Wed, Nov 5, 2014 at 3:05 AM,
ake sure you are able to ping this from the cluster)
>
> *spark.driver.port* - set it to a port number which is accessible from
> the spark cluster.
>
> You can look at more configuration options over here.
> <http://spark.apache.org/docs/latest/configuration.html#networking>
>
>
Hi,
I am trying to submit a job to a spark cluster running on a single machine
(1 master + 1 worker) with hadoop 1.0.4. I submit it from within the code:
«val sparkConf = new
SparkConf().setMaster("spark://myserver:7077").setAppName("MyApp").setJars(Array("target/my-app-1.0-SNAPSHOT.jar"))».
When I run th
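Following the networking advice quoted earlier in this thread, a hedged sketch of
the same submission with the driver's address and port pinned so the cluster can
reach back to it (the host name and port are assumptions):
«
import org.apache.spark.SparkConf

// Hypothetical configuration; the driver host/port values are placeholders.
val sparkConf = new SparkConf()
  .setMaster("spark://myserver:7077")
  .setAppName("MyApp")
  .setJars(Array("target/my-app-1.0-SNAPSHOT.jar"))
  .set("spark.driver.host", "my-client-host")   // must be reachable from the cluster
  .set("spark.driver.port", "51000")            // a port the cluster can connect to
»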
her/killrweather
>
>
> On Tue, Oct 21, 2014 at 4:33 PM, Saiph Kappa
> wrote:
>
>> Hi,
>>
>> I have been trying to find a fairly complex application that makes use of
>> the Spark Streaming framework. I checked public github repos but the
>> examples I fou
Hi,
I have been trying to find a fairly complex application that makes use of
the Spark Streaming framework. I checked public github repos but the
examples I found were too simple, only comprising simple operations like
counters and sums. On the Spark summit website, I could find very
interesting
.
>
> Thanks,
> Liquan
>
> On Mon, Sep 29, 2014 at 10:15 AM, Saiph Kappa
> wrote:
>
>> Hi,
>>
>> Do all spark streaming applications use the map operation? or the
>> majority of them?
>>
>> Thanks.
>>
>
>
>
> --
> Liquan Pei
> Department of Physics
> University of Massachusetts Amherst
>
Hi,
Do all spark streaming applications use the map operation? Or the majority
of them?
Thanks.