Re: trigger once (batch job with streaming semantics)
Hi Georg,

I'm not aware of those examples being available publicly.

Best regards,
Martijn
Re: trigger once (batch job with streaming semantics)
Hi Martijn,

Many thanks for this clarification. Do you know of any example somewhere which would showcase such an approach?

Best,
Georg
Re: trigger once (batch job with streaming semantics)
Hi Georg,

No, they wouldn't. There is no out-of-the-box capability that lets you start Flink in streaming mode, run everything that's available at that moment, and then stop when there's no data anymore. You would need to trigger the stop yourself.

Best regards,
Martijn
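Since there is no built-in stop-on-idle, the job driver has to implement it. A minimal framework-free Python sketch of the idea (names and the idle-timeout heuristic are illustrative, not a Flink API): consume from a streaming-style source until nothing new arrives within a timeout, then stop on our own.

```python
import queue

def run_until_idle(source, process, idle_timeout_s=1.0):
    """Consume records from `source` until no new record arrives for
    `idle_timeout_s` seconds, then return. This mimics "run everything
    that's currently available, then stop" on top of a streaming consumer."""
    while True:
        try:
            record = source.get(timeout=idle_timeout_s)
        except queue.Empty:
            return  # source looks drained: we trigger the stop ourselves
        process(record)

# Usage: pre-fill a queue, then watch the consumer stop by itself.
q = queue.Queue()
for i in range(5):
    q.put(i)

seen = []
run_until_idle(q, seen.append, idle_timeout_s=0.1)
print(seen)  # [0, 1, 2, 3, 4]
```

The obvious caveat of any idle-timeout heuristic applies: a slow upstream can look identical to a drained one, which is presumably why this is left to the user rather than built in.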
Re: trigger once (batch job with streaming semantics)
Hi,

I would disagree: in the case of Spark, it is a streaming application offering full streaming semantics (but at lower cost and higher latency), since it triggers less often. In particular, windowing, stateful semantics, and late-arriving data are handled automatically using the regular streaming features.

Would these features be available in a Flink batch job as well?

Best,
Georg
Re: trigger once (batch job with streaming semantics)
Hi Georg,

Flink batch applications run until all their input is processed. When that's the case, the application finishes. You can read more about this in the documentation for the DataStream API [1] or the Table API [2]. I think this matches what Spark describes in its documentation.

Best regards,
Martijn

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/common/
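In batch (bounded) execution, the job's lifetime is tied to its input: once the bounded source is exhausted, the pipeline emits its results and finishes, with no explicit stop signal. A framework-free Python sketch of that semantics (names are illustrative, this is not Flink code):

```python
def batch_word_count(bounded_source):
    """A 'batch job': reads a *finite* input, aggregates, and simply
    returns when the input is exhausted -- termination is automatic."""
    counts = {}
    for word in bounded_source:  # iteration ends when the source does
        counts[word] = counts.get(word, 0) + 1
    return counts

result = batch_word_count(["flink", "spark", "flink"])
print(result)  # {'flink': 2, 'spark': 1}
```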
trigger once (batch job with streaming semantics)
Hi,

Spark offers a variety of triggers:
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers

In particular, it also has the "once" mode:

*One-time micro-batch*: The query will execute *only one* micro-batch to process all the available data and then stop on its own. This is useful in scenarios where you want to periodically spin up a cluster, process everything that is available since the last period, and then shut down the cluster. In some cases, this may lead to significant cost savings.

Does Flink have a similar possibility?

Best,
Georg
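Conceptually, a "once" trigger is a single micro-batch over everything that arrived since the last run, with the read position checkpointed between runs so a later run picks up only new data. A framework-free Python sketch of that semantics (the offset-based checkpoint here is illustrative, not Spark's actual mechanism):

```python
def run_once(log, checkpoint):
    """Process every record appended to `log` since the offset stored in
    `checkpoint`, then stop. Re-running later sees only new records."""
    start = checkpoint.get("offset", 0)
    batch = log[start:]              # all data available right now
    checkpoint["offset"] = len(log)  # persist progress for the next run
    return batch

log = ["a", "b"]
ckpt = {}
print(run_once(log, ckpt))  # ['a', 'b']  first run drains everything
log.append("c")
print(run_once(log, ckpt))  # ['c']       next run sees only new data
```

The periodic "spin up, drain, shut down" workflow from the Spark docs is then just an external scheduler invoking such a run repeatedly against the persisted checkpoint.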