[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776760#action_12776760 ]
Ying He commented on PIG-979:
-----------------------------

Performance tests show no noticeable difference between trunk and the accumulator patch when calling non-accumulator UDFs. The script used to test performance is:

register /homes/yinghe/pig_test/pigperf.jar;
register /homes/yinghe/pig_test/string.jar;
register /homes/yinghe/pig_test/piggybank.jar;

A = load '/user/pig/tests/data/pigmix_large/page_views'
    using org.apache.pig.test.utils.datagen.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = foreach A generate user,
    org.apache.pig.piggybank.evaluation.string.STRINGCAT(user, ip_addr) as id;
C = group B by id parallel 10;
D = foreach C {
    generate group, string.BagCount2(B)*string.ColumnLen2(B, 0);
}
store D into 'test2';

The input data has 100M rows and the output has 57M rows, so the UDFs are called 57M times.

The results are:

with patch: 5min 14sec
w/o patch:  5min 17sec

> Acummulator Interface for UDFs
> ------------------------------
>
>                 Key: PIG-979
>                 URL: https://issues.apache.org/jira/browse/PIG-979
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: PIG-979.patch, PIG-979.patch
>
>
> Add an accumulator interface for UDFs that would allow them to take a set
> number of records at a time instead of the entire bag.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.