[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin
[ https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755808#action_12755808 ]

Ashutosh Chauhan commented on PIG-865:
--

Ouch... It should have been at least on par, if not better! Reading the code, I can see more opportunities to optimize here. I am currently trying to get access to M45; once I have it, I will run a few benchmarks and report back if I see improvements.

Performance: Unnnecessary computation in FRJoin
---

Key: PIG-865
URL: https://issues.apache.org/jira/browse/PIG-865
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: pig-865.patch, pig-865_v2.patch

In the POFRJoin implementation, POLocalRearrange is used to extract join keys from the input tuples. If the keys match, the input tuples are fed to a Foreach, which performs the actual join by doing a cross on its inputs. After the keys are extracted from the POLocalRearrange output, the function getValueTuple(POLocalRearrange lr, Tuple tuple) is called to reconstruct the input tuple. This call seems unnecessary, since the input tuple is already available at that point. This is not a bug, but because the function is called for every tuple, eliminating it should certainly help to improve performance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
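The idea in the issue description above can be illustrated with a small sketch. It does not use Pig's classes: rows are plain Java lists, and the names FrJoinSketch, buildTable, and join are hypothetical, chosen only for this example. It assumes a hash table built from the small (replicated) input and shows the probe loop reading the join key from the incoming row and then joining with that same row object, rather than rebuilding the row from the extracted key and value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the POFRJoin code.
final class FrJoinSketch {

    /** Builds join key -> rows for the small (replicated) input. */
    static Map<Object, List<List<Object>>> buildTable(List<List<Object>> replicated, int keyCol) {
        Map<Object, List<List<Object>>> table = new HashMap<>();
        for (List<Object> row : replicated) {
            table.computeIfAbsent(row.get(keyCol), k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    /** Probes the table with each fragment row and crosses it with its matches. */
    static List<List<Object>> join(List<List<Object>> fragment, int keyCol,
                                   Map<Object, List<List<Object>>> table) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : fragment) {
            Object key = row.get(keyCol);          // only the key is extracted
            List<List<Object>> matches = table.get(key);
            if (matches == null) {
                continue;                          // no match, nothing to emit
            }
            for (List<Object> match : matches) {
                // The original row is still in hand, so it is used directly;
                // there is no step that reconstructs it from key and value.
                List<Object> joined = new ArrayList<>(row);
                joined.addAll(match);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> small = Arrays.asList(
                Arrays.<Object>asList(1, "a"), Arrays.<Object>asList(2, "b"));
        List<List<Object>> big = Arrays.asList(
                Arrays.<Object>asList(1, "x"), Arrays.<Object>asList(3, "y"));
        System.out.println(join(big, 0, buildTable(small, 0)));
    }
}
```

Whether skipping the reconstruction gives a measurable gain depends on how much work the reconstruction does per tuple, which this sketch does not model.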
[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin
[ https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727128#action_12727128 ]

Hadoop QA commented on PIG-865:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12412450/pig-865_v2.patch
against trunk revision 790735.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/112/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/112/console

This message is automatically generated.

Performance: Unnnecessary computation in FRJoin
---

Key: PIG-865
URL: https://issues.apache.org/jira/browse/PIG-865
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
Fix For: 0.4.0
Attachments: pig-865.patch, pig-865_v2.patch
[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin
[ https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725361#action_12725361 ]

Pradeep Kamath commented on PIG-865:

Looks good. Minor comment: since the constant expression has type Tuple for the fragment input, the following code in getNext() should change to be consistent:

{code}
ce.setValue(inp.result);
to
ce.setValue((Tuple)(inp.result));
{code}

Not part of the patch, but also related to performance: the following code

long time1 = System.currentTimeMillis();
long time2 = System.currentTimeMillis();
log.debug("Hash Table built. Time taken: " + (time2-time1));

appears to be left over from debugging in the initial checkin and can be removed.

Performance: Unnnecessary computation in FRJoin
---

Key: PIG-865
URL: https://issues.apache.org/jira/browse/PIG-865
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Priority: Minor
Fix For: 0.4.0
Attachments: pig-865.patch
[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin
[ https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725368#action_12725368 ]

Ashutosh Chauhan commented on PIG-865:
--

Thanks for the review, Pradeep. While looking into the code, I also found that the bags used to hold the replicate contents are recreated every time. Instead, the same bag object can be cleared and reused, which minimizes object overhead. In the extreme case where every tuple of the replicate has a different join key value but each matches tuples of the fragment, we end up creating as many bags as there are tuples, where one bag would do. I will include this change and upload a new patch.
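The bag-reuse idea in the comment above can be sketched as follows. This is a minimal illustration with plain Java lists standing in for bags, not Pig's DataBag API; BagReuseSketch and its methods are hypothetical names. It compares allocating a fresh bag for each probe with clearing and refilling a single bag, under the assumption that the consumer is finished with the bag's contents before the next probe begins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; lists stand in for bags.
final class BagReuseSketch {

    /** Allocates a new bag for every probe; returns the number of bags created. */
    static int freshBagPerProbe(List<List<String>> matchesPerProbe,
                                Consumer<List<String>> consumer) {
        int allocated = 0;
        for (List<String> matches : matchesPerProbe) {
            List<String> bag = new ArrayList<>(matches);
            allocated++;
            consumer.accept(bag);
        }
        return allocated;
    }

    /** Clears and refills one bag for every probe; returns the number of bags created. */
    static int reusedBag(List<List<String>> matchesPerProbe,
                         Consumer<List<String>> consumer) {
        List<String> bag = new ArrayList<>();
        int allocated = 1;
        for (List<String> matches : matchesPerProbe) {
            bag.clear();              // drop the previous probe's contents
            bag.addAll(matches);
            // Assumption: the consumer does not keep a reference to the bag
            // past this call, since the next iteration overwrites it.
            consumer.accept(bag);
        }
        return allocated;
    }

    public static void main(String[] args) {
        List<List<String>> probes = Arrays.asList(
                Arrays.asList("a"), Arrays.asList("b", "c"), Arrays.asList("d"));
        System.out.println("fresh:  " + freshBagPerProbe(probes, bag -> { }));
        System.out.println("reused: " + reusedBag(probes, bag -> { }));
    }
}
```

The reuse variant is only safe when nothing downstream holds on to the bag between probes; if a consumer needs the contents later, it has to copy them.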
[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin
[ https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724929#action_12724929 ]

Hadoop QA commented on PIG-865:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12412006/pig-865.patch
against trunk revision 788174.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/105/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/105/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/105/console

This message is automatically generated.
[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin
[ https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724921#action_12724921 ]

Ashutosh Chauhan commented on PIG-865:
--

The patch contains no new unit tests, as it neither introduces new functionality nor modifies existing functionality.