Thanks TD.
On Tue, Mar 14, 2017 at 4:37 PM, Tathagata Das wrote:
> This setting allows multiple Spark jobs generated through multiple
> foreachRDD to run concurrently, even if they are across batches. So output
> op2 from batch X can run concurrently with op1 of batch X+1.
This setting allows multiple Spark jobs generated through multiple
foreachRDD to run concurrently, even if they are across batches. So output
op2 from batch X can run concurrently with op1 of batch X+1.
This is not safe because it breaks the checkpointing logic in subtle ways.
Note that this was
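For concreteness, here is a minimal DStreams sketch (hypothetical socket source and output operations, Spark 2.x Scala API) of the setup being described: two foreachRDD output operations with spark.streaming.concurrentJobs raised to 2, i.e. exactly the configuration TD advises against:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object ConcurrentJobsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("concurrent-jobs-sketch")
          // With 2, the job scheduler may run output op 2 of batch X while
          // output op 1 of batch X+1 is still running (the behaviour, and
          // the checkpointing risk, described above).
          .set("spark.streaming.concurrentJobs", "2")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Hypothetical input; any DStream source would do.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Each foreachRDD is an output operation and produces one job per batch.
        lines.foreachRDD(rdd => println(s"op1 count = ${rdd.count()}"))
        lines.foreachRDD(rdd => println(s"op2 distinct = ${rdd.distinct().count()}"))

        ssc.start()
        ssc.awaitTermination()
      }
    }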
Thanks TD for the response. Can you please provide more explanation? I have
multiple streams in my Spark Streaming application (Spark 2.0.2, using
DStreams). I know many people are using this setting, so your explanation
will help a lot of people.
Thanks
On Fri, Mar 10, 2017 at 6:24 PM,
That config is not safe. Please do not use it.
On Mar 10, 2017 10:03 AM, "shyla deshpande"
wrote:
> I have a spark streaming application which processes 3 kafka streams and
> has 5 output operations.
>
> Not sure what should be the setting for
I have a spark streaming application which processes 3 kafka streams and
has 5 output operations.
I am not sure what the setting for spark.streaming.concurrentJobs should be.
1. If the concurrentJobs setting is 4, does that mean 2 output operations
will run sequentially?
2. If I had 6 cores what
to launch multiple Spark jobs via spark-submit and
let YARN/Spark's dynamic executor allocation take care of fair scheduling.
In practice, this doesn't seem to yield very fast computation perhaps due
to some additional overhead with YARN.
Is there any safe way to launch concurrent jobs like this using a single
PySpark context?
--
Mike Sukmanowsky
Aspiring Digital Carpenter
*e*: mike.sukmanow...@gmail.com
LinkedIn http://www.linkedin.com/profile/view?id=10897143 | github
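One pattern that is often suggested for this (not from this thread, and sketched here in Scala; the same idea works from PySpark with Python threads) is to keep a single context and submit the independent jobs from separate driver threads, optionally with FAIR scheduling so the concurrently submitted jobs share the executors:

    import org.apache.spark.{SparkConf, SparkContext}

    object ThreadedJobsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("threaded-jobs-sketch")
          // FAIR mode lets jobs submitted from different threads share
          // executors instead of running strictly FIFO.
          .set("spark.scheduler.mode", "FAIR")
        val sc = new SparkContext(conf)

        // Each thread submits an independent job on the shared SparkContext.
        val threads = (1 to 3).map { i =>
          new Thread(new Runnable {
            override def run(): Unit = {
              // Optional: give each thread its own scheduler pool.
              sc.setLocalProperty("spark.scheduler.pool", s"pool-$i")
              val total = sc.parallelize(1L to 100000L).map(_ * i).sum()
              println(s"job $i total = $total")
            }
          })
        }
        threads.foreach(_.start())
        threads.foreach(_.join())
        sc.stop()
      }
    }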
Looking at the code of JobScheduler, I found the parameter below:

  private val numConcurrentJobs =
    ssc.conf.getInt("spark.streaming.concurrentJobs", 1)
  private val jobExecutor = Executors.newFixedThreadPool(numConcurrentJobs)

Does that mean each app can have only one active stage?
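As far as I understand, that thread pool bounds the number of concurrently running streaming jobs (one job per output operation per batch), not stages; stages within a single job are still scheduled by the DAGScheduler as usual. A small, hypothetical helper for checking the effective value from user code:

    import org.apache.spark.streaming.StreamingContext

    // Hypothetical helper: with the default of 1, the JobScheduler runs one
    // output operation's job at a time.
    def effectiveConcurrentJobs(ssc: StreamingContext): Int =
      ssc.sparkContext.getConf.getInt("spark.streaming.concurrentJobs", 1)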
app has a number of jars that
I don't particularly want to have to upload each time I want to run a small
ad-hoc spark-shell session.
Thanks,
Ishaaq
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/launching-concurrent-jobs-programmatically-tp4990p5033.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.