[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin

2009-09-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755808#action_12755808
 ] 

Ashutosh Chauhan commented on PIG-865:
--

Ouch... It should have been at least on par, if not better! Reading the code, 
I can see there are more opportunities to optimize here. I am currently trying 
to get access to M45; once I have it, I will run a few benchmarks and report 
back if I see improvements.

 Performance: Unnnecessary computation in FRJoin
 ---

 Key: PIG-865
 URL: https://issues.apache.org/jira/browse/PIG-865
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Attachments: pig-865.patch, pig-865_v2.patch


 In the POFRJoin implementation, POLocalRearrange is used to extract join keys 
 from the input tuples. If the keys match, the input tuples are fed to a 
 Foreach, which performs a cross on its inputs, to carry out the actual join. 
 After the keys are extracted from the POLocalRearrange output, the function 
 getValueTuple(POLocalRearrange lr, Tuple tuple) is called to reconstruct the 
 input tuple. This call seems unnecessary, since we already have the input 
 tuple at that point. This is not a bug, but because the function is called 
 for every tuple, eliminating it should help improve performance. 
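
The redundancy described above can be sketched outside Pig as follows. This is 
a minimal illustration under stated assumptions, not Pig's code: tuples are 
modeled as java.util.List<Object>, and FRJoinKeySketch, Rearranged, rearrange 
and reconstruct are hypothetical names standing in for the POLocalRearrange 
output and getValueTuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FRJoinKeySketch {

    /** Hypothetical rearranged record: the join key plus the remaining fields. */
    static final class Rearranged {
        final Object key;
        final List<Object> rest;

        Rearranged(Object key, List<Object> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    /** Splits a tuple into its key (at keyIndex) and the other fields. */
    static Rearranged rearrange(List<Object> tuple, int keyIndex) {
        List<Object> rest = new ArrayList<>(tuple);
        rest.remove(keyIndex); // int argument, so this removes by position
        return new Rearranged(tuple.get(keyIndex), rest);
    }

    /** Rebuilds the original tuple: the per-tuple work the issue questions. */
    static List<Object> reconstruct(Rearranged r, int keyIndex) {
        List<Object> tuple = new ArrayList<>(r.rest);
        tuple.add(keyIndex, r.key);
        return tuple;
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.<Object>asList("k1", 10, "x");
        Rearranged r = rearrange(input, 0);

        // Path described in the issue: rebuild the tuple after key extraction.
        List<Object> rebuilt = reconstruct(r, 0);

        // Proposed path: keep using the input tuple that is already in hand.
        List<Object> reused = input;

        System.out.println("key = " + r.key);
        System.out.println("same contents: " + rebuilt.equals(reused));
    }
}
```

In this sketch the rebuilt tuple has the same contents as the input, so the 
extra allocation and copy per tuple buy nothing. Whether that holds for every 
plan in Pig is for the patch to establish.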

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin

2009-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727128#action_12727128
 ] 

Hadoop QA commented on PIG-865:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12412450/pig-865_v2.patch
  against trunk revision 790735.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/112/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/112/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/112/console

This message is automatically generated.

 Performance: Unnnecessary computation in FRJoin
 ---

 Key: PIG-865
 URL: https://issues.apache.org/jira/browse/PIG-865
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
 Fix For: 0.4.0

 Attachments: pig-865.patch, pig-865_v2.patch





[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin

2009-06-29 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725361#action_12725361
 ] 

Pradeep Kamath commented on PIG-865:


Looks good.
Minor comment: since the constant expression for the fragment input has type 
Tuple, the following code in getNext() should change to be consistent:
{code}
 ce.setValue(inp.result);
to
 ce.setValue((Tuple)(inp.result));
{code}


Unrelated to the patch, but also relevant to performance: the following code
long time1 = System.currentTimeMillis(); 
long time2 = System.currentTimeMillis(); 
log.debug("Hash Table built. Time taken: " + (time2-time1));
can be removed. It appears to be left over from debugging in the initial 
check-in.

 Performance: Unnnecessary computation in FRJoin
 ---

 Key: PIG-865
 URL: https://issues.apache.org/jira/browse/PIG-865
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Priority: Minor
 Fix For: 0.4.0

 Attachments: pig-865.patch





[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin

2009-06-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725368#action_12725368
 ] 

Ashutosh Chauhan commented on PIG-865:
--

Thanks for the review, Pradeep. 
While looking into the code, I also found that the bags used to hold the 
replicated contents are recreated every time. The same bag object could be 
cleared and reused instead, which would reduce object overhead. In the extreme 
case where every tuple of the replicated input has a different join key but 
each one matches tuples of the fragment, we end up creating as many bags as 
there are tuples where one bag would do. I will include this change and upload 
a new patch.
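
The bag-reuse idea can be sketched as follows. This is a minimal illustration 
under stated assumptions, not the patch itself: a java.util.ArrayList stands 
in for Pig's bag, and BagReuseSketch, probeWithNewBags, probeWithReusedBag and 
consume are hypothetical names. Each method returns the number of bag objects 
it allocated and the total number of values it handed to the consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BagReuseSketch {

    /** Stand-in for downstream processing; returns how many values it saw. */
    static int consume(List<String> bag) {
        return bag.size();
    }

    /** Allocates a new bag for every key's matches. */
    static int[] probeWithNewBags(List<List<String>> matchesPerKey) {
        int allocations = 0;
        int consumed = 0;
        for (List<String> matches : matchesPerKey) {
            List<String> bag = new ArrayList<>(); // one object per key
            allocations++;
            bag.addAll(matches);
            consumed += consume(bag);
        }
        return new int[] {allocations, consumed};
    }

    /** Allocates one bag up front, then clears and refills it per key. */
    static int[] probeWithReusedBag(List<List<String>> matchesPerKey) {
        int allocations = 0;
        int consumed = 0;
        List<String> bag = new ArrayList<>();
        allocations++;
        for (List<String> matches : matchesPerKey) {
            bag.clear(); // drop the previous key's values, keep the object
            bag.addAll(matches);
            consumed += consume(bag);
        }
        return new int[] {allocations, consumed};
    }

    public static void main(String[] args) {
        List<List<String>> matchesPerKey = Arrays.asList(
                Arrays.asList("a"),
                Arrays.asList("b", "c"),
                Arrays.asList("d"));

        int[] fresh = probeWithNewBags(matchesPerKey);
        int[] reused = probeWithReusedBag(matchesPerKey);

        System.out.println("new bags:   allocations=" + fresh[0] + ", values=" + fresh[1]);
        System.out.println("reused bag: allocations=" + reused[0] + ", values=" + reused[1]);
    }
}
```

Reuse like this is only safe when the consumer has finished with the bag 
before the next clear(); if downstream code keeps a reference to the bag, a 
fresh object is still needed.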

   




[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin

2009-06-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724929#action_12724929
 ] 

Hadoop QA commented on PIG-865:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12412006/pig-865.patch
  against trunk revision 788174.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/105/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/105/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/105/console

This message is automatically generated.




[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin

2009-06-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724921#action_12724921
 ] 

Ashutosh Chauhan commented on PIG-865:
--

The patch contains no new unit tests, as it neither introduces new 
functionality nor modifies existing functionality.
