Spark multiple iterations in batch processing

2022-12-23 Thread Suparn Lele (sulele)
Hi, Basically I am running a Flink batch job. My requirement is the following: I have 10 tables holding raw data in PostgreSQL. I want to aggregate that data by creating a tumbling window of 10 minutes, and I need to store the aggregated data into aggregated PostgreSQL tables. My pseudo code somewhat looks
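At its core, the 10-minute tumbling-window aggregation asked about above is bucketing records by their timestamp floored to the window boundary. A minimal language-agnostic sketch in plain Python (the record shape and sum aggregate are illustrative assumptions, not taken from the thread):

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute tumbling window

def tumbling_window_sum(rows, window=WINDOW_SECONDS):
    """Group (timestamp, value) rows into fixed, non-overlapping windows
    and sum the values per window, as a tumbling-window aggregate would."""
    buckets = defaultdict(float)
    for ts, value in rows:
        window_start = (ts // window) * window  # floor to the window boundary
        buckets[window_start] += value
    return dict(buckets)

rows = [(0, 1.0), (599, 2.0), (600, 5.0), (1250, 3.0)]
print(tumbling_window_sum(rows))
# windows: [0, 600) -> 3.0, [600, 1200) -> 5.0, [1200, 1800) -> 3.0
```

In Flink itself this bucketing is what a 10-minute tumble window expresses declaratively; the sketch only shows the grouping semantics.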

Re: Flink Batch Processing

2020-09-29 Thread Timo Walther
like to use the CEP API, you can use the Table API (StreamTableEnvironment) to read from HBase and call `toAppendStream` directly afterwards to further process in the DataStream API. This also works for bounded streams, thus you can do "batch" processing. Regards, Timo On 29.09.20 09:56, Til

Re: Flink Batch Processing

2020-09-29 Thread Till Rohrmann
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741 On Mon, 28 Sep 2020 at 15:14, s_penakalap...@yahoo.com <s_penakalap...@yahoo.com> wrote: Hi All, Need your help in Flink Batch processing: scenario described below:

Re: Flink Batch Processing

2020-09-29 Thread s_penakalap...@yahoo.com
p here. Piotrek [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741 On Mon, 28 Sep 2020 at 15:14, s_penakalap...@yahoo.com wrote: Hi All, Need your help in Flink Batch processing: scenario described below: we have multiple vehicles, we get data from each vehicle at a ver

Re: Flink Batch Processing

2020-09-28 Thread Piotr Nowojski
On Mon, 28 Sep 2020 at 15:14, s_penakalap...@yahoo.com wrote: Hi All, Need your help in Flink Batch processing: scenario described below: we have multiple vehicles, we get data from each vehicle at a very high speed, 1 record per minute. Thresholds can be set

Flink Batch Processing

2020-09-28 Thread s_penakalap...@yahoo.com
Hi All, Need your help in Flink Batch processing: scenario described below: we have multiple vehicles, we get data from each vehicle at a very high speed, 1 record per minute. Thresholds can be set by the owner for each vehicle. Say: we have 3 vehicles, threshold is set for 2 vehicles. Vehicle 1
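The scenario above boils down to comparing each vehicle's reading against its owner-configured threshold, skipping vehicles with no threshold set. A small sketch (vehicle names and values are hypothetical):

```python
def vehicles_over_threshold(readings, thresholds):
    """Return vehicles whose latest reading exceeds their configured
    threshold. Vehicles without a threshold are skipped, matching the
    scenario where only 2 of 3 vehicles have one set."""
    alerts = []
    for vehicle, value in readings.items():
        limit = thresholds.get(vehicle)
        if limit is not None and value > limit:
            alerts.append(vehicle)
    return sorted(alerts)

readings = {"v1": 120, "v2": 80, "v3": 200}
thresholds = {"v1": 100, "v2": 100}  # v3 has no threshold configured
print(vehicles_over_threshold(readings, thresholds))  # ['v1']
```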

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
On Tue, Jul 7, 2020 at 10:53 AM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hey Xiaolong, > > Thanks for the suggestions. Just to make sure I understand, are you saying > to run the download and decompression in the Job Manager before executing > the job? > > I think another way to

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
Hey Chesnay, Thanks for the advice, and easy enough to do it in a separate process. Best, Austin On Tue, Jul 7, 2020 at 10:29 AM Chesnay Schepler wrote: > I would probably go with a separate process. > > Downloading the file could work with Flink if it is already present in > some supported

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Chesnay Schepler
I would probably go with a separate process. Downloading the file could work with Flink if it is already present in some supported filesystem. Decompressing the file is supported for selected formats (deflate, gzip, bz2, xz), but this seems to be an undocumented feature, so I'm not sure how

Decompressing Tar Files for Batch Processing

2020-07-06 Thread Austin Cawley-Edwards
Hey all, I need to ingest a tar file containing ~1GB of data in around 10 CSVs. The data is fairly connected and needs some cleaning, which I'd like to do with the Batch Table API + SQL (but have never used before). I've got a small prototype loading the uncompressed CSVs and applying the
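The "decompress in a separate step" approach recommended later in this thread can be done entirely with the standard library before handing the CSVs to the batch job. A self-contained sketch (file names and contents are made up for the demo):

```python
import csv
import io
import os
import tarfile
import tempfile

def read_csvs_from_tar(tar_path):
    """Extract every .csv member of a tar archive and return its parsed
    rows, doing the decompression outside the processing framework."""
    rows = []
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                data = tar.extractfile(member).read().decode("utf-8")
                rows.extend(csv.reader(io.StringIO(data)))
    return rows

# Build a tiny gzipped tar with one CSV member to demonstrate the round trip.
with tempfile.TemporaryDirectory() as d:
    csv_path = os.path.join(d, "data.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("id,value\n1,10\n2,20\n")
    tar_path = os.path.join(d, "data.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(csv_path, arcname="data.csv")
    parsed = read_csvs_from_tar(tar_path)
print(parsed)  # [['id', 'value'], ['1', '10'], ['2', '20']]
```

The extracted files can then be loaded with the framework's ordinary CSV input, keeping the tar handling out of the job itself.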

Configuration Best Practices for Batch Processing

2019-10-20 Thread Micah Whitacre
I was curious suggested best practices as it relates to running batch processes on Flink. Does anyone have any good guides on good default settings and configuration? One question I'm really curious about is what suggestions there might be for the relationship of memory of TaskManagers? Number

Batch Processing

2018-07-05 Thread Gaurav Sehgal
Hello, I am looking for a batch processing framework which will read data in batches from MongoDB, enrich it using another data source, and then upload it into Elasticsearch. Is Flink a good framework for such a use case? Regards, Gaurav

Re: Flink batch processing fault tolerance

2017-02-17 Thread Aljoscha Krettek
Sent: Friday, 17 February 2017 11:22 To: user <user@flink.apache.org> Subject: Re: Flink batch processing fault tolerance Hi, It's the reason why I gave up using Flink for my current project and picked up the traditional Hadoop framework again. 2017-02-17 10:56 GMT+08:00 Renjie

Re: Flink batch processing fault tolerance

2017-02-16 Thread wangzhijiang999
From: Si-li Liu <unix...@gmail.com> Sent: Friday, 17 February 2017 11:22 To: user <user@flink.apache.org> Subject: Re: Flink batch processing fault tolerance Hi, It's the reason why I gave up using Flink for my current project and picked up the traditional Hadoop framework again. 2017-02-17 10:56 GMT+08:0

Re: Flink batch processing fault tolerance

2017-02-16 Thread Si-li Liu
From: Aljoscha Krettek [mailto:aljos...@apache.org] Sent: Thursday, February 16, 2017 2:48 PM To: user@flink.apache.org Subject: Re: Flink batch processing fault tolerance Hi,

Re: Flink batch processing fault tolerance

2017-02-16 Thread Renjie Liu
Best, Anton From: Aljoscha Krettek [mailto:aljos...@apache.org] Sent: Thursday, February 16, 2017 2:48 PM To: user@flink.apache.org Subject: Re: Flink batch processing fault tolerance Hi, yes, t

RE: Flink batch processing fault tolerance

2017-02-16 Thread Anton Solovev
Hi Aljoscha, Could you share your plans for resolving it? Best, Anton From: Aljoscha Krettek [mailto:aljos...@apache.org] Sent: Thursday, February 16, 2017 2:48 PM To: user@flink.apache.org Subject: Re: Flink batch processing fault tolerance Hi, yes, this is indeed true. We had some plans

Re: Flink batch processing fault tolerance

2017-02-16 Thread Aljoscha Krettek
Hi, yes, this is indeed true. We had some plans for how to resolve this but they never materialised because of the focus on Stream Processing. We might unite the two in the future and then you will get fault-tolerant batch/stream processing in the same API. Best, Aljoscha On Wed, 15 Feb 2017 at

Flink batch processing fault tolerance

2017-02-15 Thread Renjie Liu
Hi, all: I'm learning flink's doc and curious about the fault tolerance of batch process jobs. It seems that when one of task execution fails, the whole job will be restarted, is it true? If so, isn't it impractical to deploy large flink batch jobs? -- Liu, Renjie Software Engineer, MVAD

Re: Flink Batch Processing with Kafka

2016-08-03 Thread Prabhu V
If your environment is not kerberized (or if you can afford to restart the job every 7 days), a checkpoint-enabled Flink job with windowing and a count trigger would be ideal for your requirement. Check the APIs on Flink windows. I had something like this that worked
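The count trigger suggested above fires a window every N elements. Its batching effect can be sketched independently of any framework (the batch size and input are illustrative):

```python
def count_trigger_batches(stream, count):
    """Emit a batch every `count` elements, mimicking the effect of a
    count trigger firing on a window; a trailing partial batch is kept."""
    batch, out = [], []
    for element in stream:
        batch.append(element)
        if len(batch) == count:  # trigger fires: emit and clear the window
            out.append(batch)
            batch = []
    if batch:
        out.append(batch)
    return out

print(count_trigger_batches(range(7), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```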

Flink Batch Processing with Kafka

2016-08-03 Thread Alam, Zeeshan
Hi, Flink works very well with Kafka if you wish to stream data. Following is how I am streaming data with Kafka and Flink. FlinkKafkaConsumer08 kafkaConsumer = new FlinkKafkaConsumer08<>(KAFKA_AVRO_TOPIC, avroSchema, properties); DataStream messageStream = env.addSource(kafkaConsumer); Is

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-21 Thread Suneel Marthi
a separate Batch process. A similar architecture using Spark Streaming (for both batch and streaming) is demonstrated by Cloudera's Oryx 2.0 project - see http://oryx.io On Thu, Jul 21, 2016 at 12:41 PM, milind parikh <milindspar...@gmail.com> wrote: > At this point in time, im

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-21 Thread milind parikh
At this point in time, IMO, batch processing is not why you should be considering Flink. That said, I predict that stream processing (and event processing) will become the dominant methodology as we begin to gravitate towards the "I can't wait; I want it now" phenomenon. In that metho

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-20 Thread Leith Mudge
Thanks Milind & Till, This is what I thought from my reading of the documentation but it is nice to have it confirmed by people more knowledgeable. Supplementary to this question is whether Flink is the best choice for batch processing at this point in time or whether I would be better to

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-20 Thread Till Rohrmann
Milind On Jul 19, 2016 9:37 PM, "Leith Mudge" <lei...@palamir.com> wrote: I am currently working on an architecture for a big data streaming and batch processing platform. I am planning on using Apache Kafka for a distributed messaging s

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-20 Thread milind parikh
ture for a big data streaming and batch processing platform. I am planning on using Apache Kafka for a distributed messaging system to handle data from streaming data sources and then pass on to Apache Flink for stream processing. I would also like to use Flink's batch processing cap

Using Kafka and Flink for batch processing of a batch data source

2016-07-19 Thread Leith Mudge
I am currently working on an architecture for a big data streaming and batch processing platform. I am planning on using Apache Kafka for a distributed messaging system to handle data from streaming data sources and then pass on to Apache Flink for stream processing. I would also like to use

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-11 Thread Robert Metzger
Flink's DataStream API also allows reading files from disk (local, hdfs, etc.). So you don't have to set up Kafka to make this work (If you have it already, you can of course use it). On Mon, Apr 11, 2016 at 11:08 AM, Ufuk Celebi wrote: > On Mon, Apr 11, 2016 at 10:26 AM, Raul

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-11 Thread Ufuk Celebi
On Mon, Apr 11, 2016 at 10:26 AM, Raul Kripalani wrote: > Would appreciate the feedback of the community. Even if it's to inform that > currently this iterative, batch, windowed approach is not possible, that's > ok! Hey Raul! What you describe should work with Flink. This is

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-11 Thread Raul Kripalani
asically I have dumps of timeseries data (10y in ticks) which I need to calculate many metrics in an exploratory manner based on event time. NOTE: I don't have the metrics beforehand, it's gonna be an exploratory and iterative data analytics effort. Flink do

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-06 Thread Christophe Salperwyck
't have the metrics beforehand, it's gonna be an exploratory and iterative data analytics effort. Flink doesn't seem to support windows on batch processing, so I'm thinking about emulating batch by using the Kafka stream connector and rewinding the data stream for every new

Possible use case: Simulating iterative batch processing by rewinding source

2016-04-06 Thread Raul Kripalani
have the metrics beforehand, it's gonna be an exploratory and iterative data analytics effort. Flink doesn't seem to support windows on batch processing, so I'm thinking about emulating batch by using the Kafka stream connector and rewinding the data stream for every new metric that I calculate
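The rewind-per-metric idea in this thread amounts to replaying the same recorded event stream once for each newly defined metric, whether the source is a rewound Kafka topic or a file re-read from disk. A sketch of that replay loop (the tick data and metric definitions are made up):

```python
def replay_metrics(events, metric_fns):
    """Replay the same recorded event stream once per metric, as one would
    by rewinding a Kafka topic (or re-reading a file) for each new
    exploratory metric. `metric_fns` maps metric name -> fold over events."""
    results = {}
    for name, fn in metric_fns.items():
        results[name] = fn(iter(events))  # fresh pass over the source
    return results

ticks = [10.0, 12.5, 11.0, 13.5]
metrics = {
    "max": lambda ev: max(ev),
    "mean": lambda ev: (lambda xs: sum(xs) / len(xs))(list(ev)),
}
print(replay_metrics(ticks, metrics))  # {'max': 13.5, 'mean': 11.75}
```

Each new metric added later simply triggers another pass, which is the exploratory workflow the original poster describes.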

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Till Rohrmann
At the moment, the system can only deal with lost slots (nodes) if either there are some excess slots which have not been used before, or the failed node is restarted. The latter is the case for YARN applications, for example. There the application master will restart containers which have died.

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
Thank you, Till! The current (in progress) implementation is considering also the problem related to losing the task's slots of the failed node(s), something related to [2] ? [2] https://issues.apache.org/jira/browse/FLINK-3047 Best, Ovidiu > On 22 Feb 2016, at 18:13, Till Rohrmann

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Till Rohrmann
Hi Ovidiu, at the moment Flink's batch fault tolerance restarts the whole job in case of a failure. However, parts of the logic to do partial backtracking such as intermediate result partitions and the backtracking algorithm are already implemented or exist as a PR [1]. So we hope to complete the

Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
Hi In case of failure of a node, what does 'Fault tolerance for programs in the DataSet API works by retrying failed executions' [1] mean? Either work already done by the rest of the nodes is not lost, only work of the lost node is recomputed, and job execution will continue; or the entire job execution

Re: Checkpoints in batch processing & JDBC Output Format

2015-11-18 Thread Stephan Ewen
Maximilian Bode <maximilian.b...@tngtech.com> wrote: Hi Stephan, thank you very much for your answer. I was happy to meet Robert in Munich last week and he proposed that for our problem, batch processing is the way to go. We also talked about how exactly t

Checkpoints in batch processing & JDBC Output Format

2015-11-09 Thread Maximilian Bode
be the natural candidate for this problem. My first question is about the checkpointing system. Apparently (e.g. [1] and [2]) it does not apply to batch processing. So how does Flink handle failures during batch processing? For the use case described above, 'at least once' semantics would suffice – still
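'At least once' output to a relational database is commonly made safe by writing idempotently, so that a replayed batch overwrites rather than duplicates rows. A sketch using the stdlib sqlite3 module as a stand-in for a JDBC sink (the table name and key are hypothetical; this is not the Flink JDBCOutputFormat API):

```python
import sqlite3

def upsert_batch(conn, rows):
    """Idempotent batch write: replaying the same rows after a failure
    leaves the table unchanged, which makes at-least-once delivery safe."""
    conn.executemany(
        "INSERT INTO agg(id, total) VALUES(?, ?) "
        "ON CONFLICT(id) DO UPDATE SET total = excluded.total",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg (id INTEGER PRIMARY KEY, total REAL)")
batch = [(1, 10.0), (2, 20.0)]
upsert_batch(conn, batch)
upsert_batch(conn, batch)  # simulated replay after a failure: no duplicates
print(conn.execute("SELECT id, total FROM agg ORDER BY id").fetchall())
# [(1, 10.0), (2, 20.0)]
```

With idempotent writes, a batch job that is retried from scratch after a failure converges to the same table contents, which is exactly the at-least-once guarantee asked about above.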

Re: Batch Processing as Streaming

2015-07-02 Thread Welly Tambunan
are long lived. They are started once and live to the end of the stream, or until a machine failure. Greetings, Stephan On Thu, Jul 2, 2015 at 11:48 AM, tambunanw if05...@gmail.com wrote: Hi All, I see that the way batch processing works in Flink is quite different from Spark. It's all about