Chao Wang commented on PIG-1098:

Ideally, there would be a better structure for paired methods such as 
advance()/advanceCG(), getKey()/getCGKey(), and getValue()/getCGValue() 
(ColumnGroup.java). The only difference in the new *CG* methods is that they 
skip the "if (atEnd())" check. This gives some performance gain at some cost 
to code readability.
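
For illustration only, one possible shape for such paired methods is to have each checked method perform the atEnd() test and then delegate to its unchecked *CG* counterpart. This is a minimal sketch, not the actual ColumnGroup.java code: the class SketchScanner, its array-backed storage, and the method signatures and return types are all assumptions made for the example.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for a scanner; NOT the real Zebra ColumnGroup code.
final class SketchScanner {
    private final String[] keys;
    private final String[] values;
    private int pos = 0;

    SketchScanner(String[] keys, String[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        this.keys = keys;
        this.values = values;
    }

    boolean atEnd() {
        return pos >= keys.length;
    }

    // Checked variants: safe to call in any state.
    boolean advance() {
        if (atEnd()) {
            return false;          // nothing left; cursor is not moved
        }
        advanceCG();
        return true;
    }

    String getKey() {
        if (atEnd()) {
            throw new NoSuchElementException("scanner is at end");
        }
        return getCGKey();
    }

    String getValue() {
        if (atEnd()) {
            throw new NoSuchElementException("scanner is at end");
        }
        return getCGValue();
    }

    // Unchecked variants: the caller must already know that !atEnd().
    void advanceCG() {
        pos++;
    }

    String getCGKey() {
        return keys[pos];
    }

    String getCGValue() {
        return values[pos];
    }
}
```

With this layout, an internal loop of the form `while (!s.atEnd()) { s.getCGKey(); s.getCGValue(); s.advanceCG(); }` tests atEnd() once per row, while external callers keep the checked methods.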

Considering that this is a first cut at performance improvement, and that all 
of the above changes are inside the ColumnGroup class, which is package 
private, these are Zebra-internal implementation details that we can safely 
improve in the future. Overall, +1.

> [zebra] Zebra Performance Optimizations
> ---------------------------------------
>                 Key: PIG-1098
>                 URL: https://issues.apache.org/jira/browse/PIG-1098
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>            Priority: Minor
>             Fix For: 0.6.0, 0.7.0
>         Attachments: PIG-1098.patch
> Many in-core performance optimization opportunities exist in Zebra, such as 
> removing redundant precautionary checks, using better collection types to 
> reduce levels of indirection to in-memory objects, and ordering input splits 
> by descending rather than ascending size. Observed wall-clock improvements 
> for some Pig LOAD queries are around 10%.
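
As a rough illustration of the split reordering mentioned in the description, the sketch below sorts a list of splits so that the largest comes first. It is not taken from the patch: the Split class, its fields, and the largestFirst() helper are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SplitOrderSketch {
    // Hypothetical split descriptor; real input splits carry more state.
    static final class Split {
        final String name;
        final long length;

        Split(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    // Returns a new list ordered by descending length; the input is not modified.
    static List<Split> largestFirst(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return sorted;
    }
}
```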
