Re: can't kill spark job in supervise mode

2016-01-30 Thread PhuDuc Nguyen
Hi Tim,

Yes, we are running Spark on Mesos in cluster mode with the supervise flag.
The submit script looks like this:

spark-submit \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+UseCompressedOops
-XX:-UseGCOverheadLimit" \
--supervise \
--deploy-mode cluster \
--class  \
--master mesos://:7077 

Mesos version = 0.26.0
Spark version = 1.5.2


thanks,
Duc

On Sat, Jan 30, 2016 at 9:48 AM, Tim Chen <t...@mesosphere.io> wrote:

> Hi Duc,
>
> Are you running Spark on Mesos in cluster mode? What does your cluster-mode
> submission look like, and which version of Spark are you running?
>
> Tim
>
> On Sat, Jan 30, 2016 at 8:19 AM, PhuDuc Nguyen <duc.was.h...@gmail.com>
> wrote:
>
>> I have a spark job running on Mesos in multi-master and supervise mode.
>> If I kill it, it is resilient as expected and respawns on another node.
>> However, I cannot kill it when I need to. I have tried 2 methods:
>>
>> 1) ./bin/spark-class org.apache.spark.deploy.Client kill
>> <masterIp:port???> 
>>
>> 2) ./bin/spark-submit --master mesos:// --kill 
>>
>> Method 2 accepts the kill request, but the job is respawned on another node.
>> Ultimately, I can't get either method to kill the job. I suspect I have
>> the wrong port in the master URL for the kill request in method 1. I've
>> tried every combination of IP and port I can think of; is there one I am
>> missing?
>>
>> Ports I've tried:
>> 5050 = mesos UI
>> 8080 = marathon
>> 7077 = spark dispatcher
>> 8081 = spark drivers UI
>> 4040 = spark job UI
>>
>> thanks,
>> Duc
>>
>
>


can't kill spark job in supervise mode

2016-01-30 Thread PhuDuc Nguyen
I have a spark job running on Mesos in multi-master and supervise mode. If
I kill it, it is resilient as expected and respawns on another node.
However, I cannot kill it when I need to. I have tried 2 methods:

1) ./bin/spark-class org.apache.spark.deploy.Client kill 


2) ./bin/spark-submit --master mesos:// --kill 

Method 2 accepts the kill request, but the job is respawned on another node.
Ultimately, I can't get either method to kill the job. I suspect I have the
wrong port in the master URL for the kill request in method 1. I've tried
every combination of IP and port I can think of; is there one I am missing?

Ports I've tried:
5050 = mesos UI
8080 = marathon
7077 = spark dispatcher
8081 = spark drivers UI
4040 = spark job UI

thanks,
Duc


Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread PhuDuc Nguyen
Vivek,

Did you say you have 8 Spark jobs consuming from the same topic,
and all jobs are using the same consumer group name? If so, each job would
get only a subset of the messages from that Kafka topic, i.e., each job would
get roughly 1 out of every 8 messages from that topic. Is that your intent?
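
If the intent is for every job to see every message, each job needs its own
consumer group name. A minimal sketch of the receiver-based API (the topic,
ZooKeeper quorum, and group name below are placeholders, not from this thread):

import org.apache.spark.streaming.kafka.KafkaUtils

// ssc is an existing StreamingContext; one receiver thread for the topic.
val topicMap = Map("mytopic" -> 1)
// A distinct group id per job means each job receives the full message stream.
val stream = KafkaUtils.createStream(ssc, "zk-host:2181", "job-1-group", topicMap)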

regards,
Duc






On Thu, Dec 24, 2015 at 7:20 AM,  wrote:

> We are using the older receiver-based approach, the number of partitions
> is 1 (we have a single-node Kafka), and we use a single thread per topic, yet
> we still have the problem. Please see the API we use. All 8 Spark jobs use the
> same group name – is that a problem?
>
>
>
> val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap  // number of threads used here is 1
>
> val searches = KafkaUtils.createStream(ssc, zkQuorum, group,
> topicMap).map(line => parse(line._2).extract[Search])
>
>
>
>
>
> Regards,
> Vivek M
>
> *From:* Bryan [mailto:bryan.jeff...@gmail.com]
> *Sent:* 24 December 2015 17:20
> *To:* Vivek Meghanathan (WT01 - NEP) ;
> user@spark.apache.org
> *Subject:* RE: Spark Streaming + Kafka + scala job message read issue
>
>
>
> Are you using a direct stream consumer, or the older receiver-based
> consumer? If the latter, does the number of partitions you’ve specified for
> your topic match the number of partitions in the topic on Kafka?
>
>
>
> That would be a possible cause – as you might receive all data from a
> given partition while missing data from other partitions.
>
>
>
> Regards,
>
>
>
> Bryan Jeffrey
>
>
>
>
>
>
>
> *From: *vivek.meghanat...@wipro.com
> *Sent: *Thursday, December 24, 2015 5:22 AM
> *To: *user@spark.apache.org
> *Subject: *Spark Streaming + Kafka + scala job message read issue
>
>
>
> Hi All,
>
>
>
> We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform.
> Our Spark Streaming job (consumer) is not receiving all the messages sent to
> the specific topic: it receives about 1 out of ~50 messages (identified by
> adding logging to the job's stream). We are not seeing any errors in the
> Kafka logs and are unable to debug further from the Kafka layer. The console
> consumer shows the INPUT topic's messages arriving, but they are not reaching
> the Spark-Kafka integration stream. Any thoughts on how to debug this issue?
> Another topic is working fine in the same setup.
>
> We also tried Spark 1.3.0 with Kafka 0.8.1.1, which has the same issue.
> All these jobs work fine on our local lab servers.
>
> Regards,
> Vivek M
>
>


Re: Preventing an RDD from shuffling

2015-12-16 Thread PhuDuc Nguyen
There is a way, and it's called a "map-side join". To be clear, there is no
explicit function call/API to execute a map-side join; you have to code it
using a local/broadcast value combined with the map() function. A caveat
for this to work is that one side of the join must be small-ish, so it can exist
as a local/broadcast value. What you're trying to achieve is a partition-local
join via the map function. The result is equivalent to a join but avoids a
cluster-wide shuffle.

Read the pdf below and look for "Example: Join". This will explain how
joins work in Spark and how you can try to optimize it with a map-side-join
(if your use case fits).
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
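
As a rough sketch (smallRDD and largeRDD are placeholder names, not from this
thread), a map-side join looks roughly like this:

// Collect the small side to the driver and broadcast it, then join each
// partition of the large side locally - no cluster-wide shuffle of largeRDD.
val smallLookup = sc.broadcast(smallRDD.collectAsMap())

val joined = largeRDD.mapPartitions { iter =>
  val lookup = smallLookup.value
  iter.flatMap { case (k, v) =>
    lookup.get(k).map(w => (k, (v, w)))  // keep only keys present on both sides (inner join)
  }
}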

HTH,
Duc



On Wed, Dec 16, 2015 at 3:23 AM, sparkuser2345 
wrote:

> Is there a way to prevent an RDD from shuffling in a join operation without
> repartitioning it?
>
> I'm reading an RDD from sharded MongoDB, joining that with an RDD of
> incoming data (+ some additional calculations), and writing the resulting
> RDD back to MongoDB. It would make sense to shuffle only the incoming data
> RDD so that the joined RDD would already be partitioned correctly according
> to the MongoDB shard key.
>
> I know I can prevent an RDD from shuffling in a join operation by
> partitioning it beforehand but partitioning would already shuffle the RDD.
> In addition, I'm only doing the join once per RDD read from MongoDB. Is
> there a way to tell Spark to shuffle only the incoming data RDD?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Preventing-an-RDD-from-shuffling-tp25717.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: [mesos][docker] addFile doesn't work properly

2015-12-10 Thread PhuDuc Nguyen
Have you tried setting the spark.mesos.uris property, like

val conf = new SparkConf().set("spark.mesos.uris", ...)
val sc = new SparkContext(conf)
...

http://spark.apache.org/docs/latest/running-on-mesos.html

HTH,
Duc







On Thu, Dec 10, 2015 at 1:04 PM, PHELIPOT, REMY 
wrote:

> Hello!
>
> I'm using Apache Spark with Mesos, and I've launched a job with
> coarse-mode=true. In my job, I must download a file from the internet, so
> I'm using:
>
> import org.apache.spark.SparkFiles
> sc.addFile("http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv")
> val path = SparkFiles.get("Sacramentorealestatetransactions.csv")
> val textRDD = sc.textFile(path)
> ... some stuff
>
> But the job failed with the following error:
>
> Job aborted due to stage failure: Task 1 in stage 8.0 failed 4 times, most 
> recent failure: Lost task 1.3 in stage 8.0 (TID 58, slave-1): 
> java.io.FileNotFoundException: File 
> file:/tmp/spark-5dde1847-b433-4282-a535-57ba5e2c9b81/userFiles-0885c136-9df1-44b9-a531-343268edfb6c/Sacramentorealestatetransactions.csv
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
>   at 
> org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
>   at 
> org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:239)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>
> Indeed, the file is not downloaded inside the executor containers; however, it
> is downloaded in the driver container.
>
> It seems Spark doesn't copy this file to the executor containers. Can someone
> confirm this issue? Am I doing something wrong?
>
> Kind regards,
>
> Rémy
>
> 

Re: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread PhuDuc Nguyen
Kafka Receiver-based approach:
This will maintain the consumer offsets in ZK for you.

Kafka Direct approach:
You can use checkpointing and that will maintain consumer offsets for you.
You'll want to checkpoint to a highly available file system like HDFS or S3.
http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing
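
A minimal sketch of the checkpoint-based recovery pattern with the direct
stream (Spark 1.5 / Kafka 0.8 APIs; the checkpoint directory, broker list, and
topic name below are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val checkpointDir = "hdfs:///checkpoints/my-app"  // must be highly available (HDFS, S3, ...)

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-direct-checkpointing")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)

  val kafkaParams = Map("metadata.broker.list" -> "broker-1:9092")
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("my-topic"))

  stream.map(_._2).count().print()  // any real processing goes here
  ssc
}

// On a clean start this builds a new context; on restart it is rebuilt from the
// checkpoint, including the Kafka offsets that were already processed.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()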

You don't have to maintain your own offsets if you don't want to. If the 2
solutions above don't satisfy your requirements, then consider writing your
own; otherwise I would recommend using the supported features in Spark.

HTH,
Duc



On Tue, Dec 8, 2015 at 5:05 AM, Tao Li  wrote:

> I am using the Spark Streaming Kafka direct approach these days. I found that
> when I start the application, it always starts consuming from the latest
> offset. I would like the application, on startup, to consume from the offset
> where the last run with the same Kafka consumer group left off. Does that mean
> I have to maintain the consumer offsets myself, for example by recording them
> in ZooKeeper and reloading the last offsets from ZooKeeper when restarting the
> application?
>
> I see the following discussion:
> https://github.com/apache/spark/pull/4805
> https://issues.apache.org/jira/browse/SPARK-6249
>
> Is there any conclusion? Do we need to maintain the offsets ourselves, or
> will Spark Streaming support a feature to simplify the offset maintenance work?
>
>
> https://forums.databricks.com/questions/2936/need-to-maintain-the-consumer-offset-by-myself-whe.html
>


Re: Spark UI - Streaming Tab

2015-12-04 Thread PhuDuc Nguyen
I believe the "Streaming" tab is dynamic - it appears once you have a
streaming job running, not when the cluster is simply up. It does not
depend on 1.6 and has been in there since at least 1.0.

HTH,
Duc

On Fri, Dec 4, 2015 at 7:28 AM, patcharee  wrote:

> Hi,
>
> We tried to get the streaming tab interface on Spark UI -
> https://databricks.com/blog/2015/07/08/new-visualizations-for-understanding-spark-streaming-applications.html
>
> We tested versions 1.5.1 and 1.6.0-SNAPSHOT, but there is no such interface for
> streaming applications at all. Any suggestions? Do we need to configure the
> history UI somehow to get this interface?
>
> Thanks,
> Patcharee
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Parallelizing operations using Spark

2015-11-17 Thread PhuDuc Nguyen
You should try passing your solr writer into rdd.foreachPartition() for max
parallelism - each partition on each executor will execute the function
passed in.
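
A rough sketch of that pattern (SolrJ 5.x HttpSolrClient assumed; the Solr URL,
field names, and batch size are placeholders, not from this thread):

import scala.collection.JavaConverters._
import org.apache.solr.client.solrj.impl.HttpSolrClient
import org.apache.solr.common.SolrInputDocument

// rdd: RDD[(file path, file content)], e.g. from sc.wholeTextFiles(...)
rdd.foreachPartition { records =>
  // One client per partition, reused for every record in that partition.
  val solr = new HttpSolrClient("http://solr-host:8983/solr/mycollection")
  records.grouped(500).foreach { batch =>  // post in batches for throughput
    val docs = batch.map { case (path, json) =>
      val doc = new SolrInputDocument()
      doc.addField("id", path)             // illustrative field mapping
      doc.addField("content", json)
      doc
    }
    solr.add(docs.asJava)
  }
  solr.commit()
  solr.close()
}

Compared to rdd.foreach(), this avoids creating a client per record and lets
each executor post its partition independently.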

HTH,
Duc

On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar 
wrote:

> Any input/suggestions on parallelizing the operations below using Spark
> rather than Java thread pooling?
> - reading 100 thousand JSON files from the local file system
> - processing each file's content and submitting it to Solr as an input document
>
> Thanks,
> Susheel
>
> On Mon, Nov 16, 2015 at 5:44 PM, Susheel Kumar 
> wrote:
>
>> Hello Spark Users,
>>
>> My first email to spark mailing list and looking forward. I have been
>> working on Solr and in the past have used Java thread pooling to
>> parallelize Solr indexing using SolrJ.
>>
>> Now I am again working on indexing data, this time from JSON files (in the
>> 100 thousands), and before I try parallelizing the operations using
>> Spark (reading each JSON file and posting its content to Solr) I wanted to
>> confirm my understanding.
>>
>>
>> Reading the JSON files using wholeTextFiles and then posting the content
>> to Solr:
>>
>> - would this be similar to what I would achieve using Java multi-threading /
>> thread pooling with the Executor framework?
>> - what additional advantages would I get by using Spark (less code, ...)?
>> - how can we parallelize/batch this further? E.g., in my Java
>> multi-threaded version I parallelize not only the reading / data acquisition
>> but also the posting, in batches, in parallel.
>>
>>
>> Below is a code snippet to give you an idea of what I am thinking of
>> starting with. Please feel free to suggest corrections to my understanding
>> and to the code structure below.
>>
>> SparkConf conf = new SparkConf().setAppName(appName).setMaster("local[8]");
>>
>> JavaSparkContext sc = new JavaSparkContext(conf);
>>
>> JavaPairRDD<String, String> rdd = sc.wholeTextFiles("/../*.json");
>>
>> rdd.foreach(new VoidFunction<Tuple2<String, String>>() {
>>
>>     @Override
>>     public void call(Tuple2<String, String> arg0) throws Exception {
>>         // post content (arg0._2) to Solr
>>         ...
>>     }
>> });
>>
>>
>> Thanks,
>>
>> Susheel
>>
>
>


Re: dynamic allocation w/ spark streaming on mesos?

2015-11-11 Thread PhuDuc Nguyen
Dean,

Thanks for the reply. I'm searching (via spark mailing list archive and
google) and can't find the previous thread you mentioned. I've stumbled
upon a few but may not be the thread you're referring to. I'm very
interested in reading that discussion and any links/keywords would be
greatly appreciated.

I can see it's a non-trivial problem to solve for every use case in
streaming and thus not yet supported in general. However, I think (maybe
naively) it can be solved for specific use cases. If I use the available
features to create a fault tolerant design - i.e. failures/dead nodes can
occur on master nodes, driver node, or executor nodes without data loss and
"at-least-once" semantics is acceptable - then can't I safely scale down in
streaming by killing executors? If this is not true, then are we saying
that streaming is not fault tolerant?

I know it won't be as simple as setting a config like
spark.dynamicAllocation.enabled=true and magically we'll have elastic
streaming, but I'm interested if anyone else has attempted to solve this
for their specific use case with extra coding involved? Pitfalls? Thoughts?

thanks,
Duc




On Wed, Nov 11, 2015 at 8:36 AM, Dean Wampler <deanwamp...@gmail.com> wrote:

> Dynamic allocation doesn't work yet with Spark Streaming in any cluster
> scenario. There was a previous thread on this topic which discusses the
> issues that need to be resolved.
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen <duc.was.h...@gmail.com>
> wrote:
>
>> I'm trying to get Spark Streaming to scale up/down its number of
>> executors within Mesos based on workload. It's not scaling down. I'm using
>> Spark 1.5.1 reading from Kafka using the direct (receiver-less) approach.
>>
>> Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287
>> with the right configuration, I have a simple example working with the
>> spark-shell connected to a Mesos cluster. By working I mean the number of
>> executors scales up/down based on workload. However, the spark-shell is not
>> a streaming example.
>>
>> What is the status of dynamic resource allocation with Spark Streaming
>> on Mesos? Is it supported at all? Or supported but with some caveats to
>> ensure no data loss?
>>
>> thanks,
>> Duc
>>
>
>


dynamic allocation w/ spark streaming on mesos?

2015-11-11 Thread PhuDuc Nguyen
I'm trying to get Spark Streaming to scale up/down its number of executors
within Mesos based on workload. It's not scaling down. I'm using Spark
1.5.1 reading from Kafka using the direct (receiver-less) approach.

Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287 with
the right configuration, I have a simple example working with the
spark-shell connected to a Mesos cluster. By working I mean the number of
executors scales up/down based on workload. However, the spark-shell is not
a streaming example.

What is the status of dynamic resource allocation with Spark Streaming on
Mesos? Is it supported at all? Or supported but with some caveats to ensure
no data loss?

thanks,
Duc


Re: dynamic allocation w/ spark streaming on mesos?

2015-11-11 Thread PhuDuc Nguyen
Awesome, thanks for the tip!



On Wed, Nov 11, 2015 at 2:25 PM, Tathagata Das <t...@databricks.com> wrote:

> The reason the existing dynamic allocation does not work out of the box
> for spark streaming is because the heuristics used for decided when to
> scale up/down is not the right one for micro-batch workloads. It works
> great for typical batch workloads. However you can use the underlying
> developer API to add / remove executors to implement your own scaling
> logic.
>
> 1. Use SparkContext.requestExecutors and SparkContext.killExecutors
>
> 2. Use StreamingListener to get the scheduling delay and processing times,
> and use that to request or kill executors.
>
> TD
>
> On Wed, Nov 11, 2015 at 9:48 AM, PhuDuc Nguyen <duc.was.h...@gmail.com>
> wrote:
>
>> Dean,
>>
>> Thanks for the reply. I'm searching (via spark mailing list archive and
>> google) and can't find the previous thread you mentioned. I've stumbled
>> upon a few but may not be the thread you're referring to. I'm very
>> interested in reading that discussion and any links/keywords would be
>> greatly appreciated.
>>
>> I can see it's a non-trivial problem to solve for every use case in
>> streaming and thus not yet supported in general. However, I think (maybe
>> naively) it can be solved for specific use cases. If I use the available
>> features to create a fault tolerant design - i.e. failures/dead nodes can
>> occur on master nodes, driver node, or executor nodes without data loss and
>> "at-least-once" semantics is acceptable - then can't I safely scale down in
>> streaming by killing executors? If this is not true, then are we saying
>> that streaming is not fault tolerant?
>>
>> I know it won't be as simple as setting a config like
>> spark.dynamicAllocation.enabled=true and magically we'll have elastic
>> streaming, but I'm interested if anyone else has attempted to solve this
>> for their specific use case with extra coding involved? Pitfalls? Thoughts?
>>
>> thanks,
>> Duc
>>
>>
>>
>>
>> On Wed, Nov 11, 2015 at 8:36 AM, Dean Wampler <deanwamp...@gmail.com>
>> wrote:
>>
>>> Dynamic allocation doesn't work yet with Spark Streaming in any cluster
>>> scenario. There was a previous thread on this topic which discusses the
>>> issues that need to be resolved.
>>>
>>> Dean Wampler, Ph.D.
>>> Author: Programming Scala, 2nd Edition
>>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>>> Typesafe <http://typesafe.com>
>>> @deanwampler <http://twitter.com/deanwampler>
>>> http://polyglotprogramming.com
>>>
>>> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen <duc.was.h...@gmail.com>
>>> wrote:
>>>
>>>> I'm trying to get Spark Streaming to scale up/down its number of
>>>> executors within Mesos based on workload. It's not scaling down. I'm using
>>>> Spark 1.5.1 reading from Kafka using the direct (receiver-less) approach.
>>>>
>>>> Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287
>>>> with the right configuration, I have a simple example working with the
>>>> spark-shell connected to a Mesos cluster. By working I mean the number of
>>>> executors scales up/down based on workload. However, the spark-shell is not
>>>> a streaming example.
>>>>
>>>> What is the status of dynamic resource allocation with Spark Streaming
>>>> on Mesos? Is it supported at all? Or supported but with some caveats to
>>>> ensure no data loss?
>>>>
>>>> thanks,
>>>> Duc
>>>>
>>>
>>>
>>
>
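
A minimal sketch of the approach TD describes above, assuming the Spark
1.5-era developer APIs (SparkContext.requestExecutors / killExecutors and
StreamingListener); the delay threshold and the way an executor is picked for
removal are illustrative assumptions only:

import org.apache.spark.SparkContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

class ElasticityListener(sc: SparkContext) extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val delayMs = batch.batchInfo.schedulingDelay.getOrElse(0L)
    if (delayMs > 30000L) {
      // Batches are queueing up: ask the cluster manager for one more executor.
      sc.requestExecutors(1)
    } else if (delayMs == 0L) {
      // Keeping up comfortably: release one executor. Executor IDs can be read
      // from the storage status of the running block managers.
      val ids = sc.getExecutorStorageStatus.map(_.blockManagerId.executorId)
        .filterNot(id => id == "driver" || id == "<driver>")
      ids.headOption.foreach(id => sc.killExecutors(Seq(id)))
    }
  }
}

// Registered on the streaming context, e.g.:
//   ssc.addStreamingListener(new ElasticityListener(ssc.sparkContext))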