Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
OK, sorry for all the messages, but I have this working now:
On 4/13/18, 12:59 PM, "Andy Grove" wrote:
Immediately after sending this I realized that I also needed to pass the projection message type in the following lines:
    val columnIO = new

Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Immediately after sending this I realized that I also needed to pass the projection message type in the following lines:
    val columnIO = new ColumnIOFactory().getColumnIO(projectionType)
    val recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(projectionType))
I

Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Thanks. I tried this.
    val projection: Seq[column.ColumnDescriptor] = // filter the columns I want from the schema
    val projectionBuilder = Types.buildMessage()
    for (col <- projection) {
      projectionBuilder.addField(Types.buildMessage().named(col.getPath.head))
    }

Re: Specifying a projection in Java API

2018-04-13 Thread Ryan Blue
I'd suggest using the Types builders to create your projection schema (MessageType), then passing that schema to the ParquetFileReader.setRequestedSchema method you found.
On Fri, Apr 13, 2018 at 10:40 AM, Andy Grove wrote:
> Hi Ryan,
>
> I'm writing some low-level

Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Hi Ryan, I'm writing some low-level performance tests to try to find a bottleneck on our platform. I have intentionally excluded Spark/Thrift/Presto, etc., and want to test Parquet directly, both with local files and against our HDFS cluster, to get performance metrics. Our Parquet files were

Re: Specifying a projection in Java API

2018-04-13 Thread Ryan Blue
Andy, what object model are you using to read? Usually you don't have a list of column descriptors; you have an Avro read schema or a Thrift class or something.
On Fri, Apr 13, 2018 at 10:31 AM, Andy Grove wrote:
> Hi,
>
> I’m trying to read a parquet file with a projection

Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Hi, I’m trying to read a parquet file with a projection from Scala and I can’t find docs or examples for the correct way to do this. I have the file schema and have filtered for the list of columns I need, so I have a List of ColumnDescriptors. It looks like I should call