Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread aj
Thanks, Timo. I have not used or explored the Table API until now; I have only used the DataSet and DataStream APIs. I will read about the Table API.

On Wed, Feb 19, 2020 at 4:33 PM Timo Walther wrote:
> Hi Anuj,
>
> another option would be to use the new Hive connectors. Have you looked
> into those?

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread aj
Thanks, Rafi. I will try this, but if partitioning is not possible then I will also have to look for some other solution.

On Wed, Feb 19, 2020 at 3:44 PM Rafi Aroch wrote:
> Hi Anuj,
>
> It's been a while since I wrote this (Flink 1.5.2). There could be a
> better/newer way, but this is how I

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread Timo Walther
Hi Anuj,

another option would be to use the new Hive connectors. Have you looked into those? They might work on SQL internal data types, which is why you would need to use the Table API then. Maybe Bowen (in CC) can help you here.

Regards,
Timo

On 19.02.20 11:14, Rafi Aroch wrote:
> Hi Anuj,
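For readers of the archive: a minimal sketch of the Table API route Timo points at, assuming Flink 1.10 with the blink planner, a registered HiveCatalog, and an existing partitioned Hive table. The catalog name, conf dir, and table/column names are illustrative assumptions, not something from the thread.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveConnectorSketch {

  public static void main(String[] args) throws Exception {
    EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inBatchMode()
        .build();
    TableEnvironment tEnv = TableEnvironment.create(settings);

    // Register a Hive catalog; the conf dir must contain hive-site.xml.
    HiveCatalog hive = new HiveCatalog("myhive", "default", "/etc/hive/conf");
    tEnv.registerCatalog("myhive", hive);
    tEnv.useCatalog("myhive");

    // Read, transform and write entirely through SQL; the Hive connector
    // takes care of the tables' Parquet storage format and partitioning.
    tEnv.sqlUpdate(
        "INSERT INTO events_partitioned " +
        "SELECT user_id, payload, `year`, `month`, `day` FROM events_raw");
    tEnv.execute("hive-insert-sketch");
  }
}

With this route the year/month/day layout comes from the Hive table's partition columns, so no manual bucketing logic is needed.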

Re: BucketingSink capabilities for DataSet API

2020-02-19 Thread Rafi Aroch
Hi Anuj,

It's been a while since I wrote this (Flink 1.5.2). There could be a better/newer way, but this is how I read & write Parquet with hadoop-compatibility:

// imports
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
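The archived snippet is cut off here. A minimal read-side sketch along these lines, assuming Hadoop's AvroParquetInputFormat and an illustrative input path (this is a reconstruction, not Rafi's original code):

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetReadSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // AvroParquetInputFormat produces (Void, GenericRecord) pairs; Flink's
    // hadoop-compatibility HadoopInputFormat wraps it for the DataSet API.
    Job job = Job.getInstance();
    AvroParquetInputFormat.addInputPath(job, new Path("hdfs:///data/events")); // illustrative path
    DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(
        new HadoopInputFormat<>(new AvroParquetInputFormat<GenericRecord>(),
            Void.class, GenericRecord.class, job));

    records.first(10).print(); // print() triggers execution of this sketch
  }
}

Writing back goes through the matching HadoopOutputFormat wrapper; see the write-side sketch after Andrey's reply below.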

Re: BucketingSink capabilities for DataSet API

2020-02-15 Thread aj
Hi Rafi, I have a similar use case where I want to read Parquet files into a DataSet, perform some transformations, and similarly write the result partitioned by year, month, and day. I am stuck at the first step: how to read and write Parquet files using the DataSet API.

Re: BucketingSink capabilities for DataSet API

2018-10-25 Thread Andrey Zagrebin
Hi Rafi,

At the moment I do not see any support for Parquet in the DataSet API except HadoopOutputFormat, mentioned in the Stack Overflow question. I have cc’ed Fabian and Aljoscha; maybe they can provide more information.

Best,
Andrey

> On 25 Oct 2018, at 13:08, Rafi Aroch wrote:
>
> Hi,
>
> I'm
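For illustration, a minimal write-side sketch of the HadoopOutputFormat route mentioned above, assuming parquet-avro's AvroParquetOutputFormat; the method name and output path are illustrative, not code from the thread:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetOutputFormat;

public class ParquetWriteSketch {

  // Writes a DataSet of (Void, GenericRecord) pairs as Parquet files.
  static void writeAsParquet(DataSet<Tuple2<Void, GenericRecord>> records,
                             Schema schema, String outputPath) throws Exception {
    Job job = Job.getInstance();
    AvroParquetOutputFormat.setSchema(job, schema);            // Avro schema of the records
    AvroParquetOutputFormat.setOutputPath(job, new Path(outputPath));
    records.output(new HadoopOutputFormat<>(
        new AvroParquetOutputFormat<GenericRecord>(), job));
  }
}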

BucketingSink capabilities for DataSet API

2018-10-25 Thread Rafi Aroch
Hi,

I'm writing a batch job which reads Parquet, does some aggregations, and writes back as Parquet files. I would like the output to be partitioned by year, month, and day of the event time, similar to the functionality of the BucketingSink. I was able to achieve the reading/writing to/from Parquet by using the hadoop-compatibility HadoopInputFormat/HadoopOutputFormat.
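The archive does not preserve a finished answer to the partitioning part. One workaround sketch, offered here as an assumption rather than something proposed in the thread, is to derive a BucketingSink-style year/month/day path from the event time and write each distinct partition as its own output:

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionPath {

  // Formats an epoch-millis event time as a Hive-style partition path (UTC).
  static final DateTimeFormatter PARTITION_FMT =
      DateTimeFormatter.ofPattern("'year='yyyy'/month='MM'/day='dd")
          .withZone(ZoneOffset.UTC);

  // e.g. 1540468080000L -> "year=2018/month=10/day=25"
  static String partitionFor(long eventTimeMillis) {
    return PARTITION_FMT.format(Instant.ofEpochMilli(eventTimeMillis));
  }
}

Each distinct partition string then serves as the base output path for a filtered sub-DataSet, at the cost of one Parquet output per day. This emulates the BucketingSink directory layout in batch, but it is not equivalent to a real partitioned sink.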