Re: Plan for a Parquet new release and writing Parquet file with outputstream

2018-04-21 Thread Jean-Baptiste Onofré
Yup, that's great. I will update the PR when back from vacation. Regards JB On 20 Apr 2018 at 02:26, Eugene Kirpichov wrote: >Very cool! JB, time to update your PR? > >On Thu, Apr 19, 2018 at 9:17 AM Alexey Romanenko > >wrote: > >>

Re: Plan for a Parquet new release and writing Parquet file with outputstream

2018-04-19 Thread Eugene Kirpichov
Very cool! JB, time to update your PR? On Thu, Apr 19, 2018 at 9:17 AM Alexey Romanenko wrote: > FYI: Apache Parquet 1.10.0 was released recently. > It contains *org.apache.parquet.io.OutputFile* and an updated > *org.apache.parquet.hadoop.ParquetFileWriter* > > WBR, >

Re: Plan for a Parquet new release and writing Parquet file with outputstream

2018-04-19 Thread Alexey Romanenko
FYI: Apache Parquet 1.10.0 was released recently. It contains org.apache.parquet.io.OutputFile and an updated org.apache.parquet.hadoop.ParquetFileWriter WBR, Alexey > On 14 Feb 2018, at 20:10, Jean-Baptiste Onofré wrote: > > Great !! > > In the meantime, I started to PoC

Re: Plan for a Parquet new release and writing Parquet file with outputstream

2018-02-14 Thread Jean-Baptiste Onofré
Great!! In the meantime, I started a PoC directly on top of parquet-common to see if I can implement a BeamParquetReader and a BeamParquetWriter. I might also propose some PRs. I will continue on that tomorrow. Thanks again! Regards JB On 02/14/2018 08:04 PM, Ryan Blue wrote: > Additions

Re: Plan for a Parquet new release and writing Parquet file with outputstream

2018-02-13 Thread Jean-Baptiste Onofré
Hi Ryan, Thanks for the update. Ideally for Beam, it would be great to have the AvroParquetReader and AvroParquetWriter use the InputFile/OutputFile interfaces. That would allow me to directly leverage Beam FileIO. Do you have a rough date for the Parquet release that will include this? Thanks Regards JB
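
For readers following the thread, here is a rough sketch of the piece an OutputFile adapter over Beam would most likely need: an output stream that writes to a java.nio WritableByteChannel (the kind of channel Beam's filesystem layer hands out) while tracking the current byte position. It is not taken from the PR. The class name ChannelPositionOutputStream is hypothetical, and the sketch deliberately depends only on the JDK; a real adapter would presumably extend Parquet's org.apache.parquet.io.PositionOutputStream (which requires getPos()) and be returned from an OutputFile implementation. It also assumes a blocking channel.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical helper (not part of Beam or Parquet): an OutputStream over a
 * WritableByteChannel that counts the bytes written so far. Assumes the
 * channel is in blocking mode, so each write() call makes progress.
 */
class ChannelPositionOutputStream extends OutputStream {
    private final WritableByteChannel channel;
    private long position = 0;

    ChannelPositionOutputStream(WritableByteChannel channel) {
        this.channel = channel;
    }

    /** Number of bytes written through this stream so far. */
    public long getPos() {
        return position;
    }

    @Override
    public void write(int b) throws IOException {
        // Delegate single-byte writes to the bulk path so counting stays in one place.
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] data, int off, int len) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(data, off, len);
        // A channel may accept fewer bytes than offered, so loop until drained.
        while (buffer.hasRemaining()) {
            position += channel.write(buffer);
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Wrapping a channel obtained from java.nio.channels.Channels.newChannel(someOutputStream) is enough to try it locally; the position matters because a Parquet writer records byte offsets in the file footer.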

Re: Plan for a Parquet new release and writing Parquet file with outputstream

2018-02-13 Thread Ryan Blue
Jean-Baptiste, We're planning a release that will include the new OutputFile class, which I think you should be able to use. Is there anything you'd change to make this work more easily with Beam? rb On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré wrote: > Hi guys, >

Re: Plan for a Parquet new release and writing Parquet file with outputstream

2018-02-13 Thread Eugene Kirpichov
Thanks for raising this, JB! To clarify for people on the Parquet mailing list who are not familiar with Beam: Beam supports multiple filesystems (currently: local, HDFS, Google Cloud, S3) via a pluggable interface (that among other things can give you a Channel for reading/writing the given path),

Plan for a Parquet new release and writing Parquet file with outputstream

2018-02-13 Thread Jean-Baptiste Onofré
Hi guys, I'm working on the Apache Beam ParquetIO: https://github.com/apache/beam/pull/1851 In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...). While I was able to implement the read part using AvroParquetReader on top of Beam FileIO, I'm struggling with the writing part.