Hi Ryan,
sorry to have been quiet, but I was busy traveling recently :)
Just a quick update about this one:
- I asked a guy from my team to work with me on the Beam ParquetIO. We're also
seeing several users expecting this new IO.
- I will update my current PR to use Parquet SNAPSHOT and verify t
Great !!
In the meantime, I started a PoC working directly with parquet-common to see if I
can implement a BeamParquetReader and a BeamParquetWriter.
I might also propose some PRs.
I will continue working on that tomorrow.
Thanks again !
Regards
JB
On 02/14/2018 08:04 PM, Ryan Blue wrote:
Additions to the builders are easy enough that we can get that in. There's
a PR out there that needs to be fixed:
https://github.com/apache/parquet-mr/pull/446
I've asked the author for just the builder changes. If we don't hear back,
we can add another PR but I'd like to give the author some time
Hi Ryan,
Thanks for the update.
Ideally for Beam, it would be great to have the AvroParquetReader and
AvroParquetWriter using the InputFile/OutputFile interfaces. It would allow me
to directly leverage Beam FileIO.
Do you have a rough date for the Parquet release with that ?
Thanks
Regards
JB
Jean-Baptiste,
We're planning a release that will include the new OutputFile class, which
I think you should be able to use. Is there anything you'd change to make
this work more easily with Beam?
rb
On Tue, Feb 13, 2018 at 12:31 PM, Jean-Baptiste Onofré
wrote:
Hi guys,
I'm working on the Apache Beam ParquetIO:
https://github.com/apache/beam/pull/1851
In Beam, thanks to FileIO, we support several filesystems (HDFS, S3, ...).
While I was able to implement the read part using AvroParquetReader with Beam
FileIO, I'm struggling with the write part.
I