I meant to respond to this thread yesterday, but got busy with work and it slipped my mind.
This should be doable using Flink Streaming; others can correct me here.

*Assumption:* Both the batch and streaming processes read from a single Kafka topic, and by "batched data" I am assuming it's the same data that's being fed to streaming, but aggregated over a longer time period.

This could be done using a Lambda-like architecture:

1. A Kafka topic that ingests the data to be distributed to the various consumers.
2. A Flink Streaming process with a small time window (seconds/minutes) that ingests from Kafka and handles the data over this small window.
3. Another Flink Streaming process with a very long time window (a few hours?) that also ingests from Kafka and munges over large time periods of data (think mini-batch that extends streaming).

This should work, and you don't need a separate batch process. A similar architecture using Spark Streaming (for both batch and streaming) is demonstrated by Cloudera's Oryx 2.0 project; see http://oryx.io

On Thu, Jul 21, 2016 at 12:41 PM, milind parikh <milindspar...@gmail.com> wrote:

> At this point in time, imo, batch processing is not why you should be
> considering Flink.
>
> That said, I predict that the stream processing (and event processing)
> will become the dominant methodology; as we begin to gravitate towards "I
> can't wait; I want it now" phenomenon. In that methodology, I believe
> Flink represents the cutting edge of what is possible; at this point in
> time.
>
> Regards
> Milind
>
> On Jul 20, 2016 4:57 PM, "Leith Mudge" <lei...@palamir.com> wrote:
>
> Thanks Milind & Till,
>
> This is what I thought from my reading of the documentation but it is nice
> to have it confirmed by people more knowledgeable.
>
> Supplementary to this question is whether Flink is the best choice for
> batch processing at this point in time or whether I would be better to look
> at a more mature and dedicated batch processing engine such as Spark?
> I do like the choices that adopting the unified programming model outlined
> in Apache Beam/Google Cloud Dataflow SDK and this purports to have runners
> for both Flink and Spark.
>
> Regards,
>
> Leith
>
> *From: *Till Rohrmann <trohrm...@apache.org>
> *Date: *Wednesday, 20 July 2016 at 5:05 PM
> *To: *<user@flink.apache.org>
> *Subject: *Re: Using Kafka and Flink for batch processing of a batch data
> source
>
> At the moment there is also no batch source for Kafka. I'm also not so
> sure how you would define a batch given a Kafka stream. Only reading till a
> certain offset? Or maybe until one has read n messages?
>
> I think it's best to write the batch data to HDFS or another batch data
> store.
>
> Cheers,
> Till
>
> On Wed, Jul 20, 2016 at 8:08 AM, milind parikh <milindspar...@gmail.com>
> wrote:
>
> It likely does not make sense to publish a file ( "batch data") into
> Kafka; unless the file is very small.
>
> An improvised pub-sub mechanism for Kafka could be to (a) write the file
> into a persistent store outside of kafka (b) publishing of a message into
> Kafka about that write so as to enable processing of that file.
>
> If you really needed to have provenance around processing, you could route
> data processing through Nifi before Flink.
>
> Regards
> Milind
>
> On Jul 19, 2016 9:37 PM, "Leith Mudge" <lei...@palamir.com> wrote:
>
> I am currently working on an architecture for a big data streaming and
> batch processing platform. I am planning on using Apache Kafka for a
> distributed messaging system to handle data from streaming data sources and
> then pass on to Apache Flink for stream processing. I would also like to
> use Flink's batch processing capabilities to process batch data.
>
> Does it make sense to pass the batched data through Kafka on a periodic
> basis as a source for Flink batch processing (is this even possible?) or
> should I just write the batch data to a data store and then process by
> reading into Flink?
>
> ------------------------------
>
> | All rights in this email and any attached documents or files are
> expressly reserved. This e-mail, and any files transmitted with it,
> contains confidential information which may be subject to legal privilege.
> If you are not the intended recipient, please delete it and notify Palamir
> Pty Ltd by e-mail. Palamir Pty Ltd does not warrant this transmission or
> attachments are free from viruses or similar malicious code and does not
> accept liability for any consequences to the recipient caused by opening or
> using this e-mail. For the legal protection of our business, any email sent
> or received by us may be monitored or intercepted. | Please consider the
> environment before printing this email. |
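
P.S. To make steps 2 and 3 of my reply at the top a bit more concrete, here is a minimal sketch of the idea in plain Python. This is not Flink code: it only simulates two consumers applying tumbling windows of different sizes to the same events. The event list, the 1-minute and 1-hour window sizes, and the function name tumbling_window_counts are all made up for illustration; in a real job, Flink's own window operators would do the grouping and Kafka would supply the records.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key inside fixed-size, non-overlapping windows.

    events: iterable of (timestamp_in_seconds, key) pairs.
    Returns {window_start: {key: count}} ordered by window start.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {start: dict(per_key) for start, per_key in sorted(counts.items())}


# Hypothetical records standing in for what both jobs read from the Kafka topic.
events = [(5, "a"), (42, "b"), (70, "a"), (3599, "a"), (3700, "b")]

# Step 2: the "fast" job with a small window (assumed 1 minute here).
short_view = tumbling_window_counts(events, 60)

# Step 3: the "mini-batch" job with a long window (assumed 1 hour here).
long_view = tumbling_window_counts(events, 3600)
```

Both views come from the same input, which is the point: the long-window job plays the role of the batch layer without a separate batch source.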