Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Thanks. I tried this.
val projection: Seq[column.ColumnDescriptor] = // filter the columns I want from the schema
val projectionBuilder = Types.buildMessage()
for (col <- projection) {
  projectionBuilder.addField(Types.buildMessage().named(col.getPath.head))
}

Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
OK, sorry for all the messages, but I have this working now:
On 4/13/18, 12:59 PM, "Andy Grove" wrote:
Immediately after sending this I realized that I also needed to pass the projection message type in the following lines: val columnIO = new

Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Hi, I’m trying to read a parquet file with a projection from Scala and I can’t find docs or examples for the correct way to do this. I have the file schema and have filtered for the list of columns I need, so I have a List of ColumnDescriptors. It looks like I should call

Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Hi Ryan, I'm writing some low-level performance tests to try to find a bottleneck on our platform. I have intentionally excluded Spark/Thrift/Presto etc. and want to test Parquet directly, both with local files and against our HDFS cluster, to get performance metrics. Our parquet files were

Re: Specifying a projection in Java API

2018-04-13 Thread Ryan Blue
Andy, what object model are you using to read? Usually you don't have a list of column descriptors; you have an Avro read schema or a Thrift class or something.
On Fri, Apr 13, 2018 at 10:31 AM, Andy Grove wrote:
> Hi,
>
> I’m trying to read a parquet file with a projection

Re: Specifying a projection in Java API

2018-04-13 Thread Ryan Blue
I'd suggest using the Types builders to create your projection schema (MessageType), then passing that schema to the ParquetFileReader.setRequestedSchema method you found.
On Fri, Apr 13, 2018 at 10:40 AM, Andy Grove wrote:
> Hi Ryan,
>
> I'm writing some low-level

Re: Specifying a projection in Java API

2018-04-13 Thread Andy Grove
Immediately after sending this I realized that I also needed to pass the projection message type in the following lines:
val columnIO = new ColumnIOFactory().getColumnIO(projectionType)
val recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(projectionType))
I
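
Editor's sketch (not part of the thread): the pieces discussed in this thread might fit together roughly as follows. It builds the projection with the Types builder, passes it to ParquetFileReader.setRequestedSchema, and uses the same projection type for the ColumnIOFactory and GroupRecordConverter calls. It assumes parquet-hadoop 1.10 or later and the Hadoop client libraries on the classpath, and it only handles top-level column names. The names ProjectionSketch, projectSchema, and readProjected are hypothetical and do not come from the messages.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.column.page.PageReadStore
import org.apache.parquet.example.data.Group
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import org.apache.parquet.io.ColumnIOFactory
import org.apache.parquet.schema.{MessageType, Types}

object ProjectionSketch {

  // Build a projection schema by copying the wanted top-level fields
  // (with their original types) out of the file schema.
  // getType throws if a requested name is not in the file schema.
  def projectSchema(fileSchema: MessageType, wanted: Seq[String]): MessageType = {
    val builder = Types.buildMessage()
    wanted.foreach(name => builder.addField(fileSchema.getType(name)))
    builder.named(fileSchema.getName)
  }

  // Read every record of the file, materializing only the wanted columns.
  def readProjected(path: String, wanted: Seq[String]): Seq[Group] = {
    val conf = new Configuration()
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(path), conf))
    try {
      val fileSchema = reader.getFooter.getFileMetaData.getSchema
      val projection = projectSchema(fileSchema, wanted)

      // Tell the file reader which columns to load from each row group.
      reader.setRequestedSchema(projection)

      // The same projection type goes to the column IO and the converter.
      val columnIO = new ColumnIOFactory().getColumnIO(projection)
      val out = Seq.newBuilder[Group]

      var pages: PageReadStore = reader.readNextRowGroup()
      while (pages != null) {
        val recordReader =
          columnIO.getRecordReader(pages, new GroupRecordConverter(projection))
        var i = 0L
        while (i < pages.getRowCount) {
          out += recordReader.read()
          i += 1
        }
        pages = reader.readNextRowGroup()
      }
      out.result()
    } finally {
      reader.close()
    }
  }
}
```

A call such as ProjectionSketch.readProjected("/tmp/data.parquet", Seq("id", "score")) would then return Group records containing only those two columns; the path and column names here are placeholders.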

[jira] [Updated] (PARQUET-1244) Documentation link to logical types broken

2018-04-13 Thread Antoine Pitrou (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated PARQUET-1244: Labels: beginner (was: ) > Documentation link to logical types broken >

[jira] [Created] (PARQUET-1269) [C++] Scanning fails with list columns

2018-04-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created PARQUET-1269: --- Summary: [C++] Scanning fails with list columns Key: PARQUET-1269 URL: https://issues.apache.org/jira/browse/PARQUET-1269 Project: Parquet Issue Type:

[jira] [Created] (PARQUET-1270) [C++] Executable tools do not get installed

2018-04-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created PARQUET-1270: --- Summary: [C++] Executable tools do not get installed Key: PARQUET-1270 URL: https://issues.apache.org/jira/browse/PARQUET-1270 Project: Parquet Issue

[jira] [Commented] (PARQUET-1270) [C++] Executable tools do not get installed

2018-04-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437192#comment-16437192 ] ASF GitHub Bot commented on PARQUET-1270: - pitrou opened a new pull request #455: PARQUET-1270:

[jira] [Created] (PARQUET-1271) [C++] "parquet_reader" should be "parquet-reader"

2018-04-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created PARQUET-1271: --- Summary: [C++] "parquet_reader" should be "parquet-reader" Key: PARQUET-1271 URL: https://issues.apache.org/jira/browse/PARQUET-1271 Project: Parquet

[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437369#comment-16437369 ] ASF GitHub Bot commented on PARQUET-968: costimuraru commented on issue #411: PARQUET-968 Add

[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-13 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437294#comment-16437294 ] ASF GitHub Bot commented on PARQUET-968: costimuraru commented on issue #411: PARQUET-968 Add