Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-27 Thread Rafi Aroch
Thanks Piotr & Kostas. Really looking forward to this :) Rafi On Wed, Mar 27, 2019 at 10:58 AM Piotr Nowojski wrote: > Hi Rafi, > > There is also an ongoing effort to support bounded streams in DataStream > API [1], which might provide the backbone for the functionality that you > need. > >

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-27 Thread Piotr Nowojski
Hi Rafi, There is also an ongoing effort to support bounded streams in DataStream API [1], which might provide the backbone for the functionality that you need. Piotrek [1] https://issues.apache.org/jira/browse/FLINK-11875 > On 25 Mar 2019

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-25 Thread Rafi Aroch
Hi Kostas, Thank you. I'm currently testing my job against a small file, so it finishes before checkpointing starts. But even if it were a larger file and a checkpoint did happen, there would always be trailing events that arrive after the last checkpoint and before the source finishes. So woul

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-21 Thread Rafi Aroch
Hi Kostas, Yes I have. Rafi On Thu, Mar 21, 2019, 20:47 Kostas Kloudas wrote: > Hi Rafi, > > Have you enabled checkpointing for your job? > > Cheers, > Kostas > > On Thu, Mar 21, 2019 at 5:18 PM Rafi Aroch wrote: > >> Hi Piotr and Kostas, >> >> Thanks for your reply. >> >> The issue is that I

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-21 Thread Kostas Kloudas
Hi Rafi, Have you enabled checkpointing for your job? Cheers, Kostas On Thu, Mar 21, 2019 at 5:18 PM Rafi Aroch wrote: > Hi Piotr and Kostas, > > Thanks for your reply. > > The issue is that I don't see any committed files, only in-progress. > I tried to debug the code for more details. I see t

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-21 Thread Rafi Aroch
Hi Piotr and Kostas, Thanks for your reply. The issue is that I don't see any committed files, only in-progress. I tried to debug the code for more details. I see that in *BulkPartWriter* I do reach the *write* methods and see events getting written, but I never reach the *closeForCommit*. I reac

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-21 Thread Kostas Kloudas
Hi Rafi, Piotr is correct. In-progress files are not necessarily readable. The valid files are the ones that are "committed" or finalized. Cheers, Kostas On Thu, Mar 21, 2019 at 2:53 PM Piotr Nowojski wrote: > Hi, > > I’m not sure, but shouldn’t you be just reading committed files and ignore >
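The in-progress versus committed distinction discussed in this thread can be illustrated with a toy model. This is a minimal sketch in plain Java, not Flink code: the class PartFileLifecycle and its methods are hypothetical names invented for illustration, and the model assumes that a part file is rolled when a checkpoint is taken and finalized only once that checkpoint is reported complete.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of checkpoint-driven part-file commits. Hypothetical names; NOT Flink code. */
public class PartFileLifecycle {
    public enum State { IN_PROGRESS, PENDING, FINISHED }

    static final class PartFile {
        final List<String> records = new ArrayList<>();
        State state = State.IN_PROGRESS;
    }

    private final List<PartFile> parts = new ArrayList<>();
    private PartFile current; // the file currently being written, if any

    /** Appends a record, opening a new in-progress part file when none is open. */
    public void write(String record) {
        if (current == null) {
            current = new PartFile();
            parts.add(current);
        }
        current.records.add(record);
    }

    /** Assumed behaviour: taking a checkpoint rolls the open file to "pending". */
    public void snapshot() {
        if (current != null) {
            current.state = State.PENDING;
            current = null;
        }
    }

    /** Assumed behaviour: pending files are finalized once the checkpoint completes. */
    public void notifyCheckpointComplete() {
        for (PartFile p : parts) {
            if (p.state == State.PENDING) {
                p.state = State.FINISHED;
            }
        }
    }

    /** Counts the part files currently in the given state. */
    public long count(State s) {
        return parts.stream().filter(p -> p.state == s).count();
    }

    public static void main(String[] args) {
        PartFileLifecycle sink = new PartFileLifecycle();
        sink.write("e1");
        sink.write("e2");
        sink.snapshot();
        sink.notifyCheckpointComplete();
        sink.write("e3"); // arrives after the last checkpoint; the source then ends
        System.out.println("finished=" + sink.count(State.FINISHED)
                + " inProgress=" + sink.count(State.IN_PROGRESS));
        // prints "finished=1 inProgress=1"
    }
}
```

Under these assumptions, a job whose input ends before the first checkpoint leaves every file in-progress, and any records written after the last completed checkpoint stay in an in-progress file as well.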

Re: Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-21 Thread Piotr Nowojski
Hi, I’m not sure, but shouldn’t you just be reading committed files and ignoring in-progress ones? Maybe Kostas could add more insight to this topic. Piotr Nowojski > On 20 Mar 2019, at 12:23, Rafi Aroch wrote: > > Hi, > > I'm trying to stream events in Protobuf format into a parquet file. > I look

Streaming Protobuf into Parquet file not working with StreamingFileSink

2019-03-20 Thread Rafi Aroch
Hi, I'm trying to stream events in Protobuf format into a parquet file. I looked into both streaming-file options: BucketingSink & StreamingFileSink. I first tried using the newer *StreamingFileSink* with the *forBulkFormat* API. I noticed there's currently support only for the Avro format with th