I don't think there is an official Apache library in Java that supports writing Arrow data to Parquet or reading it back.
If you are looking to interchange Arrow data between Java and Python, your best bet is to use the native Arrow file format (Java doesn't support compression options yet).

-Micah

On Thu, Nov 12, 2020 at 3:23 PM Chris Nuernberger wrote:

> We use Clojure and have a dataframe library that does this:
>
> https://github.com/techascent/tech.ml.dataset/
>
> On Thu, Nov 12, 2020 at 2:44 PM Jason Sachs wrote:
>
>> The Python examples in https://arrow.apache.org/docs/python/parquet.html
>> are wonderful and really easy to get started; in particular this one:
>>
>>     writer = pq.ParquetWriter('example2.parquet', table.schema)
>>     for i in range(3):
>>         writer.write_table(table)
>>     writer.close()
>>
>> How would I do something similar in Java? Arrow and Parquet libraries
>> don't seem to know about one another.
>>
>> I have looked a little bit at the Javadocs at
>> https://www.javadoc.io/doc/org.apache.parquet/parquet-column/1.10.0/index.html
>> but my head is spinning. (although for the record most of my work is in
>> Python and a coworker is handling the Java side... he is only slightly less
>> confused, though)
