Change parallelism number in Spark Streaming

2019-06-26 Thread Rong, Jialei
Hi Dear Spark Expert

I’m curious about a question regarding Spark Streaming/Structured Streaming: 
whether it allows to change parallelism number(the default one or the one 
specified in particular operator) in a stream having stateful 
transform/operator? Whether this will cause my checkpointed state get messed up?


Regards
Jialei



Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Rong, Jialei
Thank you for your quick reply!
Is there any plan to improve this?
I asked this question due to some investigation on comparing those state of art 
streaming systems, among which Flink and DataFlow allow changing parallelism 
number, and by my knowledge of Spark Streaming, it seems it is also able to do 
that: if some “key interval” concept is used, then state can somehow decoupled 
from partition number by consistent hashing.


Regards
Jialei

From: Jacek Laskowski 
Date: Wednesday, June 26, 2019 at 11:00 AM
To: "Rong, Jialei" 
Cc: "user @spark" 
Subject: Re: Change parallelism number in Spark Streaming

Hi,

It's not allowed to change the numer of partitions after your streaming query 
is started.

The reason is exactly the number of state stores which is exactly the number of 
partitions (perhaps multiplied by the number of stateful operators).

I think you'll even get a warning or an exception when you change it after 
restarting the query.

The number of partitions is stored in a checkpoint location.

Jacek

On Wed, 26 Jun 2019, 19:30 Rong, Jialei,  wrote:
Hi Dear Spark Expert

I’m curious about a question regarding Spark Streaming/Structured Streaming: 
whether it allows to change parallelism number(the default one or the one 
specified in particular operator) in a stream having stateful 
transform/operator? Whether this will cause my checkpointed state get messed up?


Regards
Jialei



Re: Change parallelism number in Spark Streaming

2019-06-26 Thread Rong, Jialei
Fantastic, thanks!


From: Jungtaek Lim 
Date: Wednesday, June 26, 2019 at 2:59 PM
To: "Rong, Jialei" 
Cc: Jacek Laskowski , "user @spark" 
Subject: Re: Change parallelism number in Spark Streaming

Hi,

you could consider state operator's partition numbers as "max parallelism", as 
parallelism can be reduced via applying coalesce. It would be effectively 
working similar as key groups.

If you're also considering offline query, there's a tool to manipulate state 
which enables reading and writing state in structured streaming, achieving 
rescaling and schema evolution.

https://github.com/HeartSaVioR/spark-state-tools
(DISCLAIMER: I'm an author of this tool.)

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Jun 27, 2019 at 4:48 AM Rong, Jialei  wrote:
Thank you for your quick reply!
Is there any plan to improve this?
I asked this question due to some investigation on comparing those state of art 
streaming systems, among which Flink and DataFlow allow changing parallelism 
number, and by my knowledge of Spark Streaming, it seems it is also able to do 
that: if some “key interval” concept is used, then state can somehow decoupled 
from partition number by consistent hashing.


Regards
Jialei

From: Jacek Laskowski mailto:ja...@japila.pl>>
Date: Wednesday, June 26, 2019 at 11:00 AM
To: "Rong, Jialei" 
Cc: "user @spark" mailto:user@spark.apache.org>>
Subject: Re: Change parallelism number in Spark Streaming

Hi,

It's not allowed to change the numer of partitions after your streaming query 
is started.

The reason is exactly the number of state stores which is exactly the number of 
partitions (perhaps multiplied by the number of stateful operators).

I think you'll even get a warning or an exception when you change it after 
restarting the query.

The number of partitions is stored in a checkpoint location.

Jacek

On Wed, 26 Jun 2019, 19:30 Rong, Jialei,  wrote:
Hi Dear Spark Expert

I’m curious about a question regarding Spark Streaming/Structured Streaming: 
whether it allows to change parallelism number(the default one or the one 
specified in particular operator) in a stream having stateful 
transform/operator? Whether this will cause my checkpointed state get messed up?


Regards
Jialei



--
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior


unsubscribe

2020-04-16 Thread Rong, Jialei