Raghu Angadi commented on PIG-833:
Benchmark results will be attached either to this jira or to a subsequent one.
I would like to compare against SequenceFiles and the new format in Hive. We
should see on-par performance.
Major performance benefits come from commonly used projections (through column
groups) and map-side joins of sorted tables. An important part of the motivation
is also features such as column security and the ability to delete entire
columns.
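As a rough illustration of why projections through column groups help, here is
a minimal Python sketch. It is not Zebra's API: the class ColumnGroupTable, the
method project, and the counter groups_read are hypothetical names, and the
"storage" is just in-memory dicts. It only shows that a projection needs to
touch the groups holding the requested columns and can skip the rest.

```python
# Hypothetical in-memory model of a table split into column groups.
# The names here are illustrative assumptions, not part of Zebra or Pig.

class ColumnGroupTable:
    def __init__(self, column_groups):
        # column_groups: list of dicts mapping column name -> list of values.
        # Each dict stands in for one separately stored group of columns.
        self.column_groups = column_groups
        self.groups_read = 0  # how many groups a projection had to touch

    def project(self, wanted):
        """Return rows (tuples) containing only the wanted columns, in order."""
        columns = {}
        for group in self.column_groups:
            if not any(name in group for name in wanted):
                continue  # no requested column here, so this group is skipped
            self.groups_read += 1
            for name in wanted:
                if name in group:
                    columns[name] = group[name]
        missing = [name for name in wanted if name not in columns]
        if missing:
            raise KeyError(missing)
        return list(zip(*(columns[name] for name in wanted)))


table = ColumnGroupTable([
    {"user": ["a", "b"], "age": [30, 41]},   # group 1
    {"url": ["x.com", "y.org"]},             # group 2
    {"payload": ["...", "..."]},             # group 3, never read below
])
rows = table.project(["user", "url"])
```

Here rows holds the two (user, url) pairs and table.groups_read is 2, since the
group containing only payload is never touched.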
We are running some larger-scale benchmarks internally, but these run on
Yahoo's internal data sources.
> Storage access layer
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
> Issue Type: New Feature
> Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz
> A layer is needed to provide a high level data access abstraction and a
> tabular view of data in Hadoop, and could free Pig users from implementing
> their own data storage/retrieval code. This layer should also include a
> columnar storage format in order to provide fast data projection,
> CPU/space-efficient data serialization, and a schema language to manage
> physical storage metadata. Eventually it could also support predicate
> pushdown for further performance improvement. Initially, this layer could be
> a contrib project in Pig and become a Hadoop subproject later on.