You might consider using Avro with Java classes. That would reduce the
amount of code you need because it can use reflection to work with your
classes. We don’t recommend building to the object model APIs unless you
need tighter integration with an existing processing engine. Here’s an
example of h
Sorry for the long-delayed reply.
I finally have time to work on this again, and yes, after studying the
parquet-hadoop source code more closely, I was able to simply write a custom
ParquetWriter using java.io.FileOutputStream for my use case.
We do not use Avro. All the data is in flat java class
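
For readers following the thread, here is a minimal sketch of one piece such a writer typically needs: an output stream that tracks how many bytes have been written, since a Parquet footer records byte offsets into the file. This is not the code from the message above. The class name CountingOutputStream is hypothetical, and the sketch uses only the JDK. To plug something like it into parquet-java, one would presumably extend org.apache.parquet.io.PositionOutputStream (which declares getPos()) instead of java.io.OutputStream.

```java
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical helper: wraps any OutputStream (for example a
 * java.io.FileOutputStream) and counts the bytes written through it.
 */
final class CountingOutputStream extends OutputStream {
    private final OutputStream out;
    private long pos = 0;

    CountingOutputStream(OutputStream out) {
        this.out = out;
    }

    /** Number of bytes written through this wrapper so far. */
    long getPos() {
        return pos;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pos++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        pos += len;
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

Wrapping a FileOutputStream as new CountingOutputStream(new FileOutputStream(path)) and writing the 4-byte "PAR1" magic would leave getPos() at 4.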
Yeah, sounds like something went wrong. What is your data model? Parquet
can handle Avro records pretty seamlessly if you already have them.
On Wed, Mar 14, 2018 at 9:20 AM, ALeX Wang wrote:
> Hi Ryan,
>
> Thanks for the reply,
>
> We are using samza for streaming,
>
> Regarding parquet java, th
Hi Ryan,
Thanks for the reply,
We are using samza for streaming,
Regarding parquet java, I must not have used the APIs correctly, since the
last time we tried, 7 hadoop processes were spawned to write a single file,
and it was much slower than our parquet c++ alternative.
Thanks,
On 14
Hi Alex,
I don't think what you're trying to do makes sense. If you're using Scala,
then your data is already in the JVM and it is probably much easier to
write it to Parquet using the Java library. While that library depends on
Hadoop, you don't have to use it with HDFS. The Hadoop FileSystem int
Also, could I get a pointer to an example that writes a parquet file directly
from an arrow memory buffer?
The part I'm currently missing is how to derive the repetition level and
definition level.
Thanks,
On 13 March 2018 at 17:52, ALeX Wang wrote:
> hi,
>
> i know it is may not be the best place to a
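
On the repetition/definition level question above, here is a minimal sketch under simplifying assumptions: only top-level columns of a flat schema are handled, either an optional int32 (max definition level 1, max repetition level 0) or a plain repeated int32 (max definition level 1, max repetition level 1). Nested groups and the three-level LIST layout need a more general traversal. The class and method names are hypothetical and are not part of parquet-java or parquet-cpp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of level derivation for top-level columns. */
final class LevelSketch {

    /** Optional column: one value (possibly null) per record. Returns {rep, def} pairs. */
    static List<int[]> optionalLevels(List<Integer> column) {
        List<int[]> out = new ArrayList<>();
        for (Integer v : column) {
            // Repetition level is always 0; definition level is 1 only when the value exists.
            out.add(new int[] {0, v == null ? 0 : 1});
        }
        return out;
    }

    /** Repeated column: one list of values per record. Returns {rep, def} pairs. */
    static List<int[]> repeatedLevels(List<List<Integer>> column) {
        List<int[]> out = new ArrayList<>();
        for (List<Integer> record : column) {
            if (record.isEmpty()) {
                // An empty list still emits one entry so the record boundary is kept.
                out.add(new int[] {0, 0});
                continue;
            }
            for (int i = 0; i < record.size(); i++) {
                // The first value starts a new record (rep 0); later values repeat at level 1.
                out.add(new int[] {i == 0 ? 0 : 1, 1});
            }
        }
        return out;
    }

    /** Renders the pairs as text, e.g. "(r=0,d=1)". */
    static String format(List<int[]> levels) {
        StringBuilder sb = new StringBuilder();
        for (int[] l : levels) {
            sb.append("(r=").append(l[0]).append(",d=").append(l[1]).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints (r=0,d=1)(r=0,d=0)(r=0,d=1)
        System.out.println(format(optionalLevels(Arrays.asList(7, null, 9))));
        List<List<Integer>> records = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Collections.<Integer>emptyList(),
                Arrays.asList(4));
        // prints (r=0,d=1)(r=1,d=1)(r=1,d=1)(r=0,d=0)(r=0,d=1)
        System.out.println(format(repeatedLevels(records)));
    }
}
```

For required top-level fields both levels are always 0, so a writer for fully required flat classes would not need to store levels at all.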
hi,
I know this may not be the best place to ask, but I'd like to try anyway,
as it is quite hard for me to find a good example of this online.
My use case:
I'd like to convert streaming data (using Scala) into arrow format in a
memory-mapped file and then have my parquet-cpp program writing