Re: trigger once (batch job with streaming semantics)

2022-05-10 Thread Martijn Visser
Hi Georg,

I'm not aware of those examples being available publicly.

Best regards,

Martijn

On Mon, 9 May 2022 at 23:04, Georg Heiler  wrote:

> Hi Martijn,
>
> many thanks for this clarification. Do you know of any example anywhere
> that showcases such an approach?
>
> Best,
> Georg
>
> On Mon, 9 May 2022 at 14:45, Martijn Visser <
> martijnvis...@apache.org> wrote:
>
>> Hi Georg,
>>
>> No, they wouldn't. There is no out-of-the-box capability that lets you
>> start Flink in streaming mode, process everything that's available at that
>> moment, and then stop when there's no data anymore. You would need to
>> trigger the stop yourself.
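Triggering the stop yourself can be done from outside the job with the Flink CLI, for example from the scheduler that periodically spins the cluster up. A sketch (the job id and savepoint path are placeholders):

```shell
# Gracefully stop the running streaming job, taking a savepoint first,
# so the next periodic run can resume where this one left off.
flink stop --savepointPath /tmp/flink-savepoints <job-id>
```
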
>>
>> Best regards,
>>
>> Martijn
>>
>> On Fri, 6 May 2022 at 13:37, Georg Heiler 
>> wrote:
>>
>>> Hi,
>>>
>>> I would disagree:
>>> In the case of Spark, it is a streaming application offering full
>>> streaming semantics (but at lower cost and higher latency), since it
>>> triggers less often. In particular, windowing, stateful semantics, and
>>> late-arriving data are handled automatically using the regular
>>> streaming features.
>>>
>>> Would these features be available in a Flink Batch job as well?
>>>
>>> Best,
>>> Georg
>>>
>>> On Fri, 6 May 2022 at 13:26, Martijn Visser <
>>> martijnvis...@apache.org> wrote:
>>>
 Hi Georg,

 Flink batch applications run until all their input is processed. When
 that's the case, the application finishes. You can read more about this in
 the documentation for the DataStream [1] or Table API [2]. I think this
 matches what Spark describes in its documentation.
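For the batch behaviour described above, the linked execution-mode docs let you select the mode explicitly rather than relying on source boundedness alone. For example on the command line (the jar name is a placeholder):

```shell
# Run a job with bounded sources in batch execution mode; it processes
# all available input and then finishes on its own.
flink run -Dexecution.runtime-mode=BATCH my-bounded-job.jar
```
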

 Best regards,

 Martijn

 [1]
 https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/
 [2]
 https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/common/

 On Mon, 2 May 2022 at 16:46, Georg Heiler 
 wrote:

> Hi,
>
> Spark offers a variety of triggers:
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
>
> In particular, it also has the "once" mode:
>
> *One-time micro-batch* The query will execute *only one* micro-batch
> to process all the available data and then stop on its own. This is useful
> in scenarios where you want to periodically spin up a cluster, process
> everything that is available since the last period, and then shut down the
> cluster. In some cases, this may lead to significant cost savings.
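In Spark itself this mode is selected with `.trigger(once=True)` on the stream writer; the semantics quoted above (drain whatever has accumulated, then stop) can be illustrated with a small library-free toy in which the offset stands in for a checkpoint:

```python
from collections import deque

def run_once(source: deque, process, offset: int = 0) -> int:
    """Drain everything currently available from `source`, apply
    `process` to each record, and return the new offset so a later run
    can resume where this one stopped. The loop exits on its own once
    the backlog is empty, mirroring a one-time micro-batch trigger."""
    while source:
        record = source.popleft()
        process(record)
        offset += 1
    return offset  # the job stops here instead of waiting for more data

# Toy usage: records that accumulated "since the last period".
backlog = deque(["a", "b", "c"])
seen = []
new_offset = run_once(backlog, seen.append)
```

A later periodic run would pass `new_offset` back in, analogous to resuming from the previous run's checkpoint.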
>
> Does Flink have a similar possibility?
>
> Best,
> Georg
>


