OK sorry for all the messages but I have this working now:
On 4/13/18, 12:59 PM, "Andy Grove" wrote:
Immediately after sending this I realized that I also needed to pass the
projection message type in the following lines:
val columnIO = new ColumnIOFactory().getColumnIO(projectionType)
val recordReader = columnIO.getRecordReader(pages,
  new GroupRecordConverter(projectionType))
I
Thanks. I tried this.
val projection: Seq[column.ColumnDescriptor] = // filter the columns I want from the schema
val projectionBuilder = Types.buildMessage()
for (col <- projection) {
projectionBuilder.addField(Types.buildMessage().named(col.getPath.head))
}
I'd suggest using the Types builders to create your projection schema
(MessageType), then passing that schema to the
ParquetFileReader.setRequestedSchema method you found.
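
A minimal sketch of that suggestion, not a definitive implementation. It assumes parquet-mr's example Group object model, that the projected columns are top-level fields of the file schema (nested paths would need more work), and that ParquetFileReader.open(Configuration, Path) is available in the version in use. The names ProjectionSketch, project, readProjected and handle are hypothetical and exist only for illustration.

```scala
import scala.collection.JavaConverters._

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.column.page.PageReadStore
import org.apache.parquet.example.data.Group
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.io.ColumnIOFactory
import org.apache.parquet.schema.{MessageType, Types}

object ProjectionSketch {

  // Build a projection schema by copying the wanted top-level fields
  // out of the file schema, so types and repetition stay unchanged.
  def project(fileSchema: MessageType, wanted: Set[String]): MessageType = {
    val builder = Types.buildMessage()
    fileSchema.getFields.asScala
      .filter(f => wanted.contains(f.getName))
      .foreach(f => builder.addField(f))
    builder.named(fileSchema.getName)
  }

  // Read every row group using only the projected columns and pass
  // each materialized record to the caller-supplied handler.
  def readProjected(file: Path, wanted: Set[String])(handle: Group => Unit): Unit = {
    val reader = ParquetFileReader.open(new Configuration(), file)
    try {
      val fileSchema = reader.getFooter.getFileMetaData.getSchema
      val projection = project(fileSchema, wanted)

      // Tell the file reader to load only the projected column chunks.
      reader.setRequestedSchema(projection)

      // The same projection must also be used for record assembly.
      val columnIO = new ColumnIOFactory().getColumnIO(projection)

      var pages: PageReadStore = reader.readNextRowGroup()
      while (pages != null) {
        val recordReader =
          columnIO.getRecordReader(pages, new GroupRecordConverter(projection))
        var i = 0L
        while (i < pages.getRowCount) {
          handle(recordReader.read())
          i += 1
        }
        pages = reader.readNextRowGroup()
      }
    } finally {
      reader.close()
    }
  }
}
```

With a hypothetical file and hypothetical column names, this could be called as ProjectionSketch.readProjected(new Path("/tmp/example.parquet"), Set("id", "name"))(println). Copying the fields from the file schema avoids having to restate each column's primitive type by hand.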
On Fri, Apr 13, 2018 at 10:40 AM, Andy Grove wrote:
Hi Ryan,
I'm writing some low-level performance tests to find a bottleneck on our
platform. I have intentionally excluded Spark, Thrift, Presto, etc., and want
to test Parquet directly, against both local files and our HDFS cluster, to
get performance metrics. Our parquet files were
Andy, what object model are you using to read? Usually you don't have a
list of column descriptors, you have an Avro read schema or a Thrift class
or something.
On Fri, Apr 13, 2018 at 10:31 AM, Andy Grove wrote:
Hi,
I’m trying to read a parquet file with a projection from Scala and I can’t find
docs or examples for the correct way to do this.
I have the file schema and have filtered for the list of columns I need, so I
have a List of ColumnDescriptors.
It looks like I should call