Re: DataStream API: Parquet File Format with Scala Case Classes

2022-02-24 Thread Fabian Paul
Hi Ryan,

I guess the ticket you are looking for is the following [1]. AFAIK the work on it hasn't started yet, so we would still appreciate initial designs or ideas.

Best,
Fabian

[1] https://issues.apache.org/jira/browse/FLINK-25416

Re: DataStream API: Parquet File Format with Scala Case Classes

2022-02-22 Thread Ryan van Huuksloot
Hi Fabian,

Thanks for the response! I'll take a look at CsvReaderFormat. Our team is interested in contributing to Parquet. However, our capacity for the current sprint is fully committed to other workstreams. I'll put this issue onto the backlog and see how it stacks against our internal
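For anyone following the thread: the restructured CsvReaderFormat mentioned above plugs into the DataStream API's FileSource via the StreamFormat abstraction introduced around Flink 1.15, and a contributed Parquet format would presumably follow the same pattern. A minimal sketch, assuming the Flink 1.15 API; the POJO, path, and job name are placeholders, not anything from the thread:

```scala
// Sketch (not verified against a running cluster): how CsvReaderFormat
// feeds a DataStream FileSource. A Parquet StreamFormat would slot into
// the same place. SensorReading and the path are illustrative only.
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.connector.file.src.FileSource
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.csv.CsvReaderFormat
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

// A mutable POJO; plain Scala case classes need extra TypeInformation
// plumbing, which is part of what makes the Scala story harder here.
class SensorReading {
  var id: String = _
  var value: Double = _
}

object CsvFileSourceSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Build the record-oriented format and wrap it in a FileSource.
    val format = CsvReaderFormat.forPojo(classOf[SensorReading])
    val source = FileSource
      .forRecordStreamFormat(format, new Path("/tmp/readings.csv"))
      .build()

    env
      .fromSource(source, WatermarkStrategy.noWatermarks(), "csv-source")
      .print()
    env.execute("csv-sketch")
  }
}
```

The point of the restructuring is that the same format object works for both bounded (batch) and unbounded (streaming) file reading, which is what makes it usable inside a HybridSource.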

Re: DataStream API: Parquet File Format with Scala Case Classes

2022-02-21 Thread Fabian Paul
Hi Ryan,

Thanks for bringing up this topic. Your analysis is correct: reading Parquet files outside the Table API is currently rather difficult. The community started an effort in Flink 1.15 to restructure some of the formats to make them more readily usable from both the DataStream and Table APIs.
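Until a restructured Parquet format lands, one workaround consistent with the above is to read the Parquet files through the Table API's filesystem connector and bridge the result back into the DataStream API. A hedged sketch; the table name, schema, and path are placeholders invented for illustration:

```scala
// Workaround sketch: read Parquet via the Table API filesystem
// connector, then bridge to a DataStream[Row]. Schema and path are
// placeholders; mapping Row -> case class is left to the caller.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment

object ParquetViaTableApiSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = StreamTableEnvironment.create(env)

    // Declare the Parquet-backed table with SQL DDL.
    tEnv.executeSql(
      """CREATE TABLE readings (
        |  id STRING,
        |  `value` DOUBLE
        |) WITH (
        |  'connector' = 'filesystem',
        |  'path' = '/tmp/readings.parquet',
        |  'format' = 'parquet'
        |)""".stripMargin)

    // Bridge back into the DataStream world as a stream of Rows.
    tEnv.toDataStream(tEnv.from("readings")).print()
    env.execute("parquet-via-table-api")
  }
}
```

The downside, as noted in the thread, is that you get `Row` records rather than a typed `FileSource[T]`, so this does not compose directly into a `HybridSource` of a case-class type.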

DataStream API: Parquet File Format with Scala Case Classes

2022-02-18 Thread Ryan van Huuksloot
Hello,

Context: We are working on integrating Hybrid Sources with different Sources and Sinks. I have been working on a Parquet source that lets users build a FileSource[T], so that the source can be used within Hybrid Sources where the HybridSource is of type T. The environment is Scala
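For context on the composition being described: a HybridSource chains several sources that all produce the same element type T, typically a bounded historical source followed by an unbounded live one. A hedged sketch of that shape, assuming the Flink 1.14+ HybridSource and Kafka connector APIs; the topic, path, and broker address are placeholders, and the text format stands in for the Parquet StreamFormat the thread is asking about:

```scala
// Sketch: a bounded FileSource chained with an unbounded KafkaSource
// inside a HybridSource[String]. A Parquet StreamFormat producing a
// case class would replace the text format once it exists.
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.base.source.hybrid.HybridSource
import org.apache.flink.connector.file.src.FileSource
import org.apache.flink.connector.file.src.reader.TextLineInputFormat
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

object HybridSourceSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Bounded historical source; this is where a typed Parquet
    // FileSource[T] would go.
    val fileSource: FileSource[String] = FileSource
      .forRecordStreamFormat(new TextLineInputFormat(), new Path("/tmp/history"))
      .build()

    // Unbounded live source of the same element type.
    val kafkaSource: KafkaSource[String] = KafkaSource
      .builder[String]()
      .setBootstrapServers("localhost:9092")
      .setTopics("events")
      .setValueOnlyDeserializer(new SimpleStringSchema())
      .build()

    // Chain them: file backlog first, then switch over to Kafka.
    val hybrid = HybridSource
      .builder(fileSource)
      .addSource(kafkaSource)
      .build()

    env
      .fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid-source")
      .print()
    env.execute("hybrid-sketch")
  }
}
```

The constraint driving the original question is visible here: every chained source must emit the same T, which is why a Parquet FileSource[T] over Scala case classes is needed rather than the Row-based Table API path.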