Re: Question about my use case.

2018-04-20 Thread Ryan Blue
You might consider using Avro with Java classes. That would reduce the amount of code you need because it can use reflection to work with your classes. We don’t recommend building to the object model APIs unless you need tighter integration with an existing processing engine. Here’s an example of h

Re: Question about my use case.

2018-04-19 Thread ALeX Wang
Sorry for the long-delayed reply. I finally have time to work on this again, and yes, after studying the parquet-hadoop source code more closely, I was able to simply write a custom ParquetWriter using java.io.FileOutputStream for my use case. We do not use Avro. All the data is in flat Java class

Re: Question about my use case.

2018-03-14 Thread Ryan Blue
Yeah, sounds like something went wrong. What is your data model? Parquet can handle Avro records pretty seamlessly if you already have them. On Wed, Mar 14, 2018 at 9:20 AM, ALeX Wang wrote: > Hi Ryan, > > Thanks for the reply, > > We are using samza for streaming, > > Regarding parquet java, th

Re: Question about my use case.

2018-03-14 Thread ALeX Wang
Hi Ryan, Thanks for the reply. We are using Samza for streaming. Regarding Parquet Java, I must not have used the APIs correctly: the last time we tried, 7 Hadoop processes were spawned to write a single file, and it was much slower than our Parquet C++ alternative. Thanks, On 14

Re: Question about my use case.

2018-03-14 Thread Ryan Blue
Hi Alex, I don't think what you're trying to do makes sense. If you're using Scala, then your data is already in the JVM and it is probably much easier to write it to Parquet using the Java library. While that library depends on Hadoop, you don't have to use it with HDFS. The Hadoop FileSystem int

Re: Question about my use case.

2018-03-13 Thread ALeX Wang
Also, could I get a pointer to an example that writes a Parquet file directly from an Arrow memory buffer? The part I'm currently missing is how to derive the repetition level and definition level. Thanks, On 13 March 2018 at 17:52, ALeX Wang wrote: > hi, > > I know this may not be the best place to a
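On the repetition and definition level question in the message above: the following is a minimal sketch, not taken from parquet-mr or parquet-cpp, of how the levels come out for two simple top-level columns under the Dremel-style encoding that Parquet uses. The names LevelsSketch, Entry, optionalColumn and repeatedColumn are hypothetical and exist only for illustration. For a flat class whose fields are all required, both levels are always 0. An optional field only needs a definition level of 0 (null) or 1 (present). A repeated field additionally needs a repetition level: 0 for the first entry of a record, 1 for each further value in the same record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of any Parquet library.
class LevelsSketch {
    /** One column entry: repetition level, definition level, and the value (null if none is stored). */
    static final class Entry {
        final int rep;
        final int def;
        final Integer value;

        Entry(int rep, int def, Integer value) {
            this.rep = rep;
            this.def = def;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(r=" + rep + ", d=" + def + ", v=" + value + ")";
        }
    }

    /** Top-level `optional int32` column: max repetition level 0, max definition level 1. */
    static List<Entry> optionalColumn(List<Integer> records) {
        List<Entry> out = new ArrayList<>();
        for (Integer v : records) {
            // Nothing is repeated, so rep is always 0; def records whether the value is present.
            out.add(new Entry(0, v == null ? 0 : 1, v));
        }
        return out;
    }

    /** Top-level `repeated int32` column: max repetition level 1, max definition level 1. */
    static List<Entry> repeatedColumn(List<List<Integer>> records) {
        List<Entry> out = new ArrayList<>();
        for (List<Integer> rec : records) {
            if (rec.isEmpty()) {
                // An empty list still emits one entry so the record boundary is kept.
                out.add(new Entry(0, 0, null));
                continue;
            }
            for (int i = 0; i < rec.size(); i++) {
                // rep 0 starts a new record; rep 1 continues the list in the current record.
                out.add(new Entry(i == 0 ? 0 : 1, 1, rec.get(i)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> flat = Arrays.asList(7, null, 9);
        List<List<Integer>> nested = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Collections.<Integer>emptyList(),
                Arrays.asList(4));
        System.out.println(optionalColumn(flat));
        System.out.println(repeatedColumn(nested));
    }
}
```

Deeper nesting follows the same idea: each optional or repeated ancestor raises the maximum definition level by one, and each repeated ancestor raises the maximum repetition level by one.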

Question about my use case.

2018-03-13 Thread ALeX Wang
Hi, I know this may not be the best place to ask, but I would like to try anyway, as it is quite hard for me to find a good example of this online. My use case: I'd like to turn streaming data (using Scala) into Arrow format in a memory-mapped file and then have my parquet-cpp program writing