Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
Thanks Bartosz and Anish for your support!

I'll wait for a couple more days to see whether we can hear more voices on
this. We could probably look for initiating a VOTE thread if there is no
objection.

On Tue, Oct 17, 2023 at 5:48 AM Anish Shrigondekar <
anish.shrigonde...@databricks.com> wrote:

> Hi Jungtaek,
>
> Thanks for putting this together. +1 from me and looks good overall.
> Posted some minor comments/questions to the doc.
>
> Thanks,
> Anish
>
> On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny <
> bartkoniec...@gmail.com> wrote:
>
>> Thank you, Jungtaek, for your answers! It's clear now.
>>
>> +1 for me. It seems like a prerequisite for further ops-related
>> improvements for the state store management. I mean especially here the
>> state rebalancing that could rely on this read+write state store API. I
>> don't mean here the dynamic state rebalancing that could probably be
>> implemented with a lower latency directly in the stateful API. Instead I'm
>> thinking more of an offline job to rebalance the state and later restart
>> the stateful pipeline with the changed number of shuffle partitions.
>>
>> Best,
>> Bartosz.
>>
>> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> bump for better reach
>>>
>>> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Sorry, please use this link instead for SPIP doc:
>>>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>>>
>>>>
>>>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hi dev,
>>>>>
>>>>> I'd like to start a discussion on "State Data Source - Reader".
>>>>>
>>>>> This proposal aims to introduce a new data source "statestore" which
>>>>> enables reading the state rows from existing checkpoint via offline (batch)
>>>>> query. This will enable users to 1) create unit tests against stateful
>>>>> query verifying the state value (especially flatMapGroupsWithState), 2)
>>>>> gather more context on the status when an incident occurs, especially for
>>>>> incorrect output.
>>>>>
>>>>> *SPIP*:
>>>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>>>
>>>>> Looking forward to your feedback!
>>>>>
>>>>> Thanks,
>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>
>>>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>>>> since the writer requires us to consider more cases. We are planning on it.
>>>>>
>>>>
>>
>> --
>> Bartosz Konieczny
>> freelance data engineer
>> https://www.waitingforcode.com
>> https://github.com/bartosz25/
>> https://twitter.com/waitingforcode
>>
>>


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Anish Shrigondekar
Hi Jungtaek,

Thanks for putting this together. +1 from me and looks good overall. Posted
some minor comments/questions to the doc.

Thanks,
Anish

On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny 
wrote:

> Thank you, Jungtaek, for your answers! It's clear now.
>
> +1 for me. It seems like a prerequisite for further ops-related
> improvements for the state store management. I mean especially here the
> state rebalancing that could rely on this read+write state store API. I
> don't mean here the dynamic state rebalancing that could probably be
> implemented with a lower latency directly in the stateful API. Instead I'm
> thinking more of an offline job to rebalance the state and later restart
> the stateful pipeline with the changed number of shuffle partitions.
>
> Best,
> Bartosz.
>
> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim 
> wrote:
>
>> bump for better reach
>>
>> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Sorry, please use this link instead for SPIP doc:
>>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>>
>>>
>>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi dev,
>>>>
>>>> I'd like to start a discussion on "State Data Source - Reader".
>>>>
>>>> This proposal aims to introduce a new data source "statestore" which
>>>> enables reading the state rows from existing checkpoint via offline (batch)
>>>> query. This will enable users to 1) create unit tests against stateful
>>>> query verifying the state value (especially flatMapGroupsWithState), 2)
>>>> gather more context on the status when an incident occurs, especially for
>>>> incorrect output.
>>>>
>>>> *SPIP*:
>>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>>
>>>> Looking forward to your feedback!
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>>> since the writer requires us to consider more cases. We are planning on it.
>>>>
>>>
>
> --
> Bartosz Konieczny
> freelance data engineer
> https://www.waitingforcode.com
> https://github.com/bartosz25/
> https://twitter.com/waitingforcode
>
>


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Bartosz Konieczny
Thank you, Jungtaek, for your answers! It's clear now.

+1 for me. It seems like a prerequisite for further ops-related
improvements for the state store management. I mean especially here the
state rebalancing that could rely on this read+write state store API. I
don't mean here the dynamic state rebalancing that could probably be
implemented with a lower latency directly in the stateful API. Instead I'm
thinking more of an offline job to rebalance the state and later restart
the stateful pipeline with the changed number of shuffle partitions.

Best,
Bartosz.

On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim 
wrote:

> bump for better reach
>
> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim 
> wrote:
>
>> Sorry, please use this link instead for SPIP doc:
>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>
>>
>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Hi dev,
>>>
>>> I'd like to start a discussion on "State Data Source - Reader".
>>>
>>> This proposal aims to introduce a new data source "statestore" which
>>> enables reading the state rows from existing checkpoint via offline (batch)
>>> query. This will enable users to 1) create unit tests against stateful
>>> query verifying the state value (especially flatMapGroupsWithState), 2)
>>> gather more context on the status when an incident occurs, especially for
>>> incorrect output.
>>>
>>> *SPIP*:
>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>
>>> Looking forward to your feedback!
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>> since the writer requires us to consider more cases. We are planning on it.
>>>
>>

-- 
Bartosz Konieczny
freelance data engineer
https://www.waitingforcode.com
https://github.com/bartosz25/
https://twitter.com/waitingforcode


Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
bump for better reach

On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim 
wrote:

> Sorry, please use this link instead for SPIP doc:
> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>
>
> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim 
> wrote:
>
>> Hi dev,
>>
>> I'd like to start a discussion on "State Data Source - Reader".
>>
>> This proposal aims to introduce a new data source "statestore" which
>> enables reading the state rows from existing checkpoint via offline (batch)
>> query. This will enable users to 1) create unit tests against stateful
>> query verifying the state value (especially flatMapGroupsWithState), 2)
>> gather more context on the status when an incident occurs, especially for
>> incorrect output.
>>
>> *SPIP*:
>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>
>> Looking forward to your feedback!
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> ps. The scope of the project is narrowed to the reader in this SPIP,
>> since the writer requires us to consider more cases. We are planning on it.
>>
>