[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776760#action_12776760 ]
Ying He commented on PIG-979:
-----------------------------
Performance tests don't show a noticeable difference between trunk and
the accumulator patch when calling non-accumulator UDFs.
The script used to test performance is:
register /homes/yinghe/pig_test/pigperf.jar;
register /homes/yinghe/pig_test/string.jar;
register /homes/yinghe/pig_test/piggybank.jar;
A = load '/user/pig/tests/data/pigmix_large/page_views'
    using org.apache.pig.test.utils.datagen.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user,
    org.apache.pig.piggybank.evaluation.string.STRINGCAT(user, ip_addr) as id;
C = group B by id parallel 10;
D = foreach C {
    generate group, string.BagCount2(B)*string.ColumnLen2(B, 0);
}
store D into 'test2';
The input data has 100M rows and the output has 57M rows, so the UDFs are
called 57M times.
The results, a 3-second difference (about 1%), are:
with patch: 5 min 14 sec
without patch: 5 min 17 sec
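For readers unfamiliar with the distinction being measured, here is a minimal sketch of the two calling styles: a UDF that receives the entire bag in one call versus one that is fed a set number of records at a time. The `BatchAccumulator` interface, the class names, and the batch size below are hypothetical stand-ins written for illustration only; they are not the interface defined in the patch and do not depend on Pig.

```java
import java.util.Arrays;
import java.util.List;

public class AccumulatorSketch {

    /** Hypothetical accumulator-style contract (illustration only, not Pig's API). */
    interface BatchAccumulator<T> {
        void accumulate(List<String> batch); // called once per chunk of the bag
        T getValue();                        // result after all chunks were seen
        void cleanup();                      // reset state before the next group
    }

    /** Counts records while only ever seeing one chunk at a time. */
    static class CountAccumulator implements BatchAccumulator<Long> {
        private long count = 0;

        @Override
        public void accumulate(List<String> batch) {
            count += batch.size();
        }

        @Override
        public Long getValue() {
            return count;
        }

        @Override
        public void cleanup() {
            count = 0;
        }
    }

    /** Non-accumulator style: the function is handed the whole bag at once. */
    static long countWholeBag(List<String> bag) {
        return bag.size();
    }

    /** Driver that feeds the bag to the accumulator in chunks of batchSize. */
    static long countInBatches(List<String> bag, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        CountAccumulator acc = new CountAccumulator();
        for (int start = 0; start < bag.size(); start += batchSize) {
            int end = Math.min(start + batchSize, bag.size());
            acc.accumulate(bag.subList(start, end));
        }
        long result = acc.getValue();
        acc.cleanup();
        return result;
    }

    public static void main(String[] args) {
        List<String> bag = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(countWholeBag(bag));
        System.out.println(countInBatches(bag, 2));
    }
}
```

Both paths compute the same count; the difference is only in how much of the bag the function has to hold at once. The test above exercises the first style, since the script calls non-accumulator UDFs.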
> Acummulator Interface for UDFs
> ------------------------------
>
> Key: PIG-979
> URL: https://issues.apache.org/jira/browse/PIG-979
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Assignee: Ying He
> Attachments: PIG-979.patch, PIG-979.patch
>
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.