[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-12 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Patch Available  (was: Open)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those decisions and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-12 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Attachment: pig_1178_3.patch

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, 
 pig_1178_3.patch





[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-24 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837954#action_12837954
 ] 

Ankit Modi commented on PIG-1178:
-

The core tests are failing due to some issue with Hudson or the framework.

I ran the core tests again last night and they passed.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch





[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-22 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Open  (was: Patch Available)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch





[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-22 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Patch Available  (was: Open)

Resubmitting the patch due to core test failures.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch





[jira] Reopened: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2010-02-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi reopened PIG-965:


  Assignee: (was: Ankit Modi)

I couldn't find the poregex2.patch patch applied in the code.

automaton.jar is present in trunk, but the files that the patch modifies or 
adds are unchanged or missing.

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
 Attachments: automaton.jar, poregex2.patch


 Some frequently seen use cases of the 'matches' comparison operator have the 
 following properties:
 1. The rhs is a constant string, e.g. c1 matches 'abc%'.
 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
 e.g. 'abc%', '%abc', '%abc%'.
 To optimize for these common cases, PORegex.java can be changed to:
 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
 string has not changed.
 2. Use string comparisons for simple common regexes (as in 2 above).
 The implementation of the Hive 'like' clause uses similar optimizations.
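
 The two optimizations described above could look roughly like the sketch 
 below. This is not the code from poregex2.patch: the class name 
 MatchesSketch and its methods are hypothetical, it uses the Java regex 
 forms (.*ABCD.*, .*ABCD, ABCD.*) rather than the '%' notation, and it 
 assumes inputs contain no line terminators, because '.' does not match 
 them by default and the string fast path would otherwise disagree with 
 the regex.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; hypothetical class, not the actual PORegex code. */
final class MatchesSketch {
    // Regex metacharacters; a string containing none of them is a plain literal.
    private static final String META = "\\[](){}.*+?^$|";

    private String cachedRegex;    // last pattern string that was compiled
    private Pattern cachedPattern; // compiled form of cachedRegex

    private static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (META.indexOf(s.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    /** Full-string match, with the same intent as String.matches(regex). */
    boolean matches(String input, String regex) {
        // Optimization 2: peel off a leading and/or trailing ".*" and, if what
        // remains is a plain literal, answer with a string comparison.
        String core = regex;
        boolean leading = core.startsWith(".*");
        if (leading) {
            core = core.substring(2);
        }
        boolean trailing = core.endsWith(".*");
        if (trailing) {
            core = core.substring(0, core.length() - 2);
        }
        if (isLiteral(core)) {
            if (leading && trailing) return input.contains(core);
            if (leading) return input.endsWith(core);
            if (trailing) return input.startsWith(core);
            return input.equals(core);
        }

        // Optimization 1: recompile only when the pattern string has changed.
        if (!regex.equals(cachedRegex)) {
            cachedPattern = Pattern.compile(regex);
            cachedRegex = regex;
        }
        return cachedPattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        MatchesSketch m = new MatchesSketch();
        System.out.println(m.matches("xxABCDyy", ".*ABCD.*")); // string fast path
        System.out.println(m.matches("AB12", "AB[0-9]+"));     // cached regex path
    }
}
```

 Anything that is not a literal wrapped in optional ".*" falls through to 
 the cached Pattern, so the fast path only needs to be correct for the 
 simple prefix, suffix, and substring cases.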




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-05 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Attachment: pig_1178.patch

This is a new patch that can be applied to SVN trunk.

It includes the ForEach, InnerLoad, and Generate operators along with some 
LogicalExpressions.
It also includes a new optimizer rule, FilterAboveForeach, for pushing a 
filter above a foreach.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch





[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828822#action_12828822
 ] 

Ankit Modi commented on PIG-1154:
-

It looks like the problem is caused by the value of mapred.system.dir from 
mapred-default.xml being overwritten, and the path mentioned above, 
/mapredsystem/hadoop/mapredsystem/, may not exist.

This cannot be solved in local mode, as it is not possible to change the 
classpath at runtime.

I'll provide a patch that will:
   * Provide a warning whenever the classpath contains mapred-site.xml or 
hdfs-site.xml.
   * Exit Pig with an error message if the above case is encountered.

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0


 In local mode, the Hadoop configuration should not be taken from the 
 classpath.




[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828831#action_12828831
 ] 

Ankit Modi commented on PIG-1154:
-

It will provide a warning whenever the files are encountered in local mode.

On top of that, it will exit with an error if mapred.system.dir differs from 
the default and the directory does not exist.
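
The two checks described in this comment could be sketched as follows. This 
is only an illustration and not the contents of the patch: the class 
LocalModeConfigCheck and both method names are hypothetical, it uses plain 
JDK calls instead of the Hadoop Configuration API, and the default value of 
mapred.system.dir is passed in by the caller rather than assumed.

```java
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; hypothetical class, not taken from the patch. */
final class LocalModeConfigCheck {
    // Site files whose presence on the classpath should trigger a warning.
    private static final String[] SITE_FILES = {"mapred-site.xml", "hdfs-site.xml"};

    /** Returns one warning line per Hadoop site file visible to the loader. */
    static List<String> siteFileWarnings(ClassLoader loader) {
        List<String> warnings = new ArrayList<>();
        for (String name : SITE_FILES) {
            URL url = loader.getResource(name);
            if (url != null) {
                warnings.add("Found " + name + " on the classpath in local mode: " + url);
            }
        }
        return warnings;
    }

    /**
     * Returns an error message if mapred.system.dir was changed from its
     * default and the configured directory does not exist, otherwise null.
     */
    static String systemDirError(String configured, String defaultValue) {
        if (configured == null || configured.equals(defaultValue)) {
            return null; // untouched default: nothing to report
        }
        if (new File(configured).exists()) {
            return null; // overridden, but the directory is there
        }
        return "mapred.system.dir is set to '" + configured + "', which does not exist";
    }

    public static void main(String[] args) {
        for (String w : siteFileWarnings(LocalModeConfigCheck.class.getClassLoader())) {
            System.err.println("WARN: " + w);
        }
        // Both arguments here are made-up example values.
        String error = systemDirError("/no/such/dir/for/this/example", "/example/default");
        if (error != null) {
            System.err.println("ERROR: " + error);
        }
    }
}
```

Returning messages instead of calling System.exit directly keeps the checks 
testable; the caller decides whether an error should terminate Pig.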

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0





[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1154:


Attachment: pig_1154.patch

Patch implementing the behavior described in the comments above.

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: pig_1154.patch





[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1154:


Status: Patch Available  (was: Open)

This patch affects only local mode in Pig.

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: pig_1154.patch





[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Open  (was: Patch Available)

I found a bug in the code, so I'll be releasing another patch for this issue.

I'll keep this patch in the JIRA until I replace it with a new one, so 
everyone can review it.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, PIG_1178.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Assignee: Benjamin Francisoud  (was: Ankit Modi)
  Status: Patch Available  (was: Open)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Benjamin Francisoud
 Attachments: automaton.jar, poregex2.patch





[jira] Assigned: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi reassigned PIG-965:
--

Assignee: Ankit Modi  (was: Benjamin Francisoud)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791096#action_12791096
 ] 

Ankit Modi commented on PIG-965:


Here are numbers comparing optimizations 1 & 2 against optimization 1 & 
dk.brics.

The dk.brics RunAutomaton is as fast as optimization 2 and also provides 
similar speeds for a set of additional expressions.

|| Query || svn_trunk || std_dev || Optimization 1 & 2 || std_dev || Optimization 1 & brics.RunAutomaton || std_dev ||
| .\*ABCD.\* | 33.87 | 0.71 | 18.77 | 0.71 | 18.94 | 0.02 |
| .\*ABCD | 30.06 | 2.91 | 18.44 | 0.05 | 18.94 | 0.03 |
| ABCD.\* | 21.93 | 2.91 | 18.35 | 0.1 | 18.85 | 0.04 |

Values are averaged over 3 runs.

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch


 Some frequently seen use cases of the 'matches' comparison operator have the 
 following properties:
 1. The rhs is a constant string, e.g. c1 matches 'abc%'
 2. Regexes that look for a matching prefix, suffix, etc. are very common,
 e.g. 'abc%', '%abc', '%abc%'
 To optimize for these common cases, PORegex.java can be changed to:
 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
 string has not changed. 
 2. Use string comparisons for simple common regexes (in 2 above).
 The implementation of the Hive LIKE clause uses similar optimizations.
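
A minimal Java sketch of optimization 2 above, assuming the '%' wildcards in the examples correspond to '.*' in the Java regexes that matches accepts (as in the queries benchmarked in this thread). SimpleMatcher and its methods are hypothetical names for illustration; they are not taken from PORegex.java or from the attached patches. The sketch also assumes the input values contain no line terminators, because '.' does not match those by default in java.util.regex while the string comparisons do.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not Pig's PORegex): uses a string test when the regex is a plain literal wrapped in ".*". */
final class SimpleMatcher {
    private enum Kind { EQUALS, PREFIX, SUFFIX, CONTAINS, REGEX }

    private final Kind kind;
    private final String literal;  // used by the string-comparison paths
    private final Pattern pattern; // used only by the REGEX fallback

    SimpleMatcher(String regex) {
        boolean lead = regex.startsWith(".*");
        // A lone ".*" counts as a leading wildcard only, so require room for both.
        boolean trail = regex.endsWith(".*") && regex.length() >= (lead ? 4 : 2);
        String core = regex.substring(lead ? 2 : 0, regex.length() - (trail ? 2 : 0));
        if (isPlainLiteral(core)) {
            kind = lead ? (trail ? Kind.CONTAINS : Kind.SUFFIX)
                        : (trail ? Kind.PREFIX : Kind.EQUALS);
            literal = core;
            pattern = null;
        } else {
            // Anything with regex metacharacters is compiled once and reused.
            kind = Kind.REGEX;
            literal = null;
            pattern = Pattern.compile(regex);
        }
    }

    /** Conservative check: only letters and digits are treated as literal text. */
    private static boolean isPlainLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    boolean matches(String value) {
        switch (kind) {
            case EQUALS:   return value.equals(literal);
            case PREFIX:   return value.startsWith(literal);
            case SUFFIX:   return value.endsWith(literal);
            case CONTAINS: return value.contains(literal);
            default:       return pattern.matcher(value).matches();
        }
    }

    boolean usesStringComparison() {
        return kind != Kind.REGEX;
    }
}
```

The literal check is deliberately strict, so a pattern such as A.B.C.D falls through to the compiled regex rather than risking a wrong string comparison.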




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

I have included changes suggested by Thejas.

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790545#action_12790545
 ] 

Ankit Modi commented on PIG-965:


* NonConstantRegex - I did not think of equals. I added a length check first 
because it detects a change in length faster and, to the best of my knowledge, 
length() is a simple getter. And yes, as you mentioned, equals also checks for 
the same object and does an instanceof test, which is not useful in our case.

* The numbers published above use dk.brics.automaton.RunAutomaton. Do you 
want me to publish numbers for a larger set of regexes?

I'll create a patch for the rest of the comments.
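
The non-constant case discussed above can be sketched roughly as follows: the compiled pattern is kept and recompiled only when the incoming pattern string differs from the previous one. CachedPatternMatcher is a hypothetical name, not the NonConstantRegex class from the patch. The explicit length comparison mirrors the check described in the comment; String.equals in OpenJDK compares lengths itself, so the extra check may save little.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: remembers the last compiled pattern for a non-constant rhs. */
final class CachedPatternMatcher {
    private String lastRegex;
    private Pattern lastPattern;
    private int compileCount; // kept only so the caching effect can be observed

    boolean matches(String value, String regex) {
        // Cheap length comparison first, then the full string comparison.
        if (lastRegex == null
                || regex.length() != lastRegex.length()
                || !regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        return lastPattern.matcher(value).matches();
    }

    int compileCount() {
        return compileCount;
    }
}
```

Only the most recent pattern is cached, so a column that alternates between two pattern strings would recompile on every row.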

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: automaton.jar
poregex2.patch

New patch with comments removed and automaton.jar added from 
http://www.brics.dk/~amoeller/automaton/automaton.jar.

The patch fails findBugs because of missing symbols. I ran findBugs after 
adding the jar to the build, and it reported no warnings in the modified and 
added files.

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

One small change to JarManager.java is missing. Will add a new patch with it.

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-1106) FR join should not spill

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1106:


Status: Patch Available  (was: Open)

This patch does not have any unit tests.

 FR join should not spill
 

 Key: PIG-1106
 URL: https://issues.apache.org/jira/browse/PIG-1106
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: frjoin-nonspill.patch


 Currently, the values for the replicated side of the data are placed in a 
 spillable bag (POFRJoin near line 275). This does not make sense because the 
 whole point of the optimization is that the data on one side fits into 
 memory. We already have a non-spillable bag implemented 
 (NonSpillableDataBag.java), and we need to change the FRJoin code to use it. 
 We also need to do lots of testing to make sure that we don't spill but die 
 instead when we run out of memory.
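
As a simplified model of why the replicated side has to fit in memory, here is a plain-Java sketch of a fragment-replicate join. It is not the POFRJoin code: rows are modelled as String[]{key, value}, ordinary ArrayLists stand in for Pig's bags, and ReplicatedJoinSketch with its method names is hypothetical. The in-memory lists never spill, so an oversized replicated input ends in an OutOfMemoryError, which is the behaviour the issue asks for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a fragment-replicate join; each row is {key, value}. */
final class ReplicatedJoinSketch {

    /** Loads the whole replicated input into an in-memory hash table keyed by join key. */
    static Map<String, List<String[]>> buildTable(List<String[]> replicated) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : replicated) {
            // The per-key list lives entirely on the heap; nothing is written to disk.
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    /** Streams the fragmented input and probes the table (inner join). */
    static List<String[]> join(List<String[]> fragment, Map<String, List<String[]>> table) {
        List<String[]> out = new ArrayList<>();
        for (String[] left : fragment) {
            List<String[]> matches = table.get(left[0]);
            if (matches == null) {
                continue; // no partner on the replicated side
            }
            for (String[] right : matches) {
                out.add(new String[] { left[0], left[1], right[0], right[1] });
            }
        }
        return out;
    }
}
```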




[jira] Commented: (PIG-1106) FR join should not spill

2009-12-11 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789294#action_12789294
 ] 

Ankit Modi commented on PIG-1106:
-

I ran the tests using two files.

File format:
f1: random chararray(100)
f2: random int

The left-side file contained 100 tuples and the right-side file contained 3 
million tuples.

Code
{noformat}
A = load 'leftsidefrjoin.txt' as (key, value);
B = load 'rightsidefrjoin.txt' as (key, value);
C = join A by key left, B by key using repl;
-- A is the fragmented input and B is the replicated input
store C into 'output';
{noformat}

This generated the following error:
{noformat}
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.ArrayList.<init>(ArrayList.java:112)
at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:369)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:288)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:351)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:211)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:250)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:241)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
{noformat}

I ran the same job with the same records on the left-hand side and 100K 
records on the right-hand side. The job completed successfully.

 FR join should not spill
 

 Key: PIG-1106
 URL: https://issues.apache.org/jira/browse/PIG-1106
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: frjoin-nonspill.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: automaton.jar)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: automaton.jar
poregex2.patch

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: automaton.jar, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-10 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

Attaching one more patch file. This one has all the changes except the changes 
to build.xml. I am still trying to find a Maven repo for dk.brics.automaton.

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: poregex.patch, poregex2.patch, poregex2.patch





[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-07 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch
poregex.patch

These are patches for two implementations.

The first (poregex.patch) applies the optimizations mentioned above in the 
JIRA.
The second (poregex2.patch) applies optimization 1 and uses 
dk.brics.automaton for running simple regular expressions. Otherwise it falls 
back to java.util.regex.

In the first patch, the choice between optimization 2 and java.util.regex is 
made by the getSimpleString method.

In the second patch, the choice to use dk.brics.automaton is made by 
determineBestRegexMethod. (The changes to build.xml in this patch are 
temporary.)

Both patches use RegexInit as an implementation that makes the decision (by 
calling the decision functions mentioned above) and then sets the 
implementation to the one chosen by the decision function.

In the second patch, I wrote the decision function by looking at the operators 
that dk.brics.automaton supports and at its grammar. I tried out the classes 
supported and not supported by dk.brics.automaton and based the decision on 
that.

I could not find any page documenting the differences between the regex 
languages of java.util.regex and dk.brics.automaton.
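
The RegexInit arrangement described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in either patch: the interface and class names other than RegexInit are hypothetical, the decision function choose is a stand-in for getSimpleString or determineBestRegexMethod, the regex is assumed to be constant after the first call, and the dk.brics.automaton branch is left out so the sketch needs only the JDK.

```java
import java.util.regex.Pattern;

/** Hypothetical common interface for the interchangeable matching strategies. */
interface RegexImpl {
    boolean match(String value, String regex);
}

/** Fallback: full java.util.regex matching with a pattern compiled once. */
final class JavaRegexImpl implements RegexImpl {
    private final Pattern pattern;

    JavaRegexImpl(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    public boolean match(String value, String regex) {
        return pattern.matcher(value).matches();
    }
}

/** Fast path for regexes of the form ".*literal.*" (values assumed to be single-line). */
final class ContainsImpl implements RegexImpl {
    private final String literal;

    ContainsImpl(String literal) {
        this.literal = literal;
    }

    public boolean match(String value, String regex) {
        return value.contains(literal);
    }
}

/** Stand-in for the operator that owns the current implementation. */
final class RegexOperator {
    RegexImpl impl = new RegexInit(this);

    boolean matches(String value, String regex) {
        return impl.match(value, regex);
    }
}

/** Initial implementation: decides on the first call, then replaces itself. */
final class RegexInit implements RegexImpl {
    private final RegexOperator owner;

    RegexInit(RegexOperator owner) {
        this.owner = owner;
    }

    public boolean match(String value, String regex) {
        owner.impl = choose(regex);            // later calls bypass RegexInit
        return owner.impl.match(value, regex);
    }

    /** Hypothetical decision function: literal wrapped in ".*" uses the string path. */
    static RegexImpl choose(String regex) {
        if (regex.length() >= 4 && regex.startsWith(".*") && regex.endsWith(".*")) {
            String core = regex.substring(2, regex.length() - 2);
            if (core.chars().allMatch(Character::isLetterOrDigit)) {
                return new ContainsImpl(core);
            }
        }
        return new JavaRegexImpl(regex);
    }
}
```

Because the decision is made once and the operator's field is overwritten, the per-row cost after the first call is a single interface dispatch.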

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Attachments: poregex.patch, poregex2.patch





[jira] Created: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.

2009-12-06 Thread Ankit Modi (JIRA)
In pig local ( hadoop local mode ) mode the counting of number of tuples and 
bytes is incorrect if data is more than one local split.
-

 Key: PIG-1130
 URL: https://issues.apache.org/jira/browse/PIG-1130
 Project: Pig
  Issue Type: Bug
Reporter: Ankit Modi
Priority: Minor


If the job produces more than one part file, the current code only reports the 
stats of the first part file, i.e. part-0
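
A rough sketch of the intended behaviour, assuming the output directory holds plain-text part files whose names start with "part-" and that one line corresponds to one tuple. PartFileStats is a hypothetical helper, not Pig's statistics code; it only shows the counts being summed over every part file instead of the first one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: totals records and bytes over all part files of one output directory. */
final class PartFileStats {

    /** Returns {recordCount, byteCount} summed over every part-* file in outputDir. */
    static long[] total(Path outputDir) {
        long records = 0;
        long bytes = 0;
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                bytes += Files.size(part);
                // One line is assumed to be one tuple.
                try (Stream<String> lines = Files.lines(part)) {
                    records += lines.count();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { records, bytes };
    }
}
```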




[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-01 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784596#action_12784596
 ] 

Ankit Modi commented on PIG-965:


I implemented one patch with optimizations 1 and 2 mentioned above and another 
patch with optimizations 1 and 2 plus dk.brics.automaton.

dk.brics.automaton does not support all features of java.util.regex, so the 
second patch takes that into account and switches to java.util.regex if the 
regex can only be handled by java.util.regex.

Here are the numbers:

||Regex|| svn_trunk ||Optimization 1 and 2|| dk.brics.automaton || comments ||
| .\*ABCD.\* | 92.74 | 50.92 | 49.32 | Here only optimization 2 is used |
| .\*[A-F]{2,3}.\* | 152.3 | 133.48 | 105.93 | dk.brics.automaton is used |
| A.B.C.D | 54.492 | 44.46 | 44.66 | dk.brics.automaton is used |
| .\*([A-F]{4})\w\*\1.\* | 129.29 | 112.89 | 109.43 | java.util.regex used in all cases |
| .\*\[A-F\]\{4\}\w\*[N-Z]\{3\}.\* | 129.63 | 108.11 | 54.42 | dk.brics.automaton used |

These results were obtained in local mode on 1 billion lines of data in the 
following format:
f1: chararray(100) of random chars from [A-Z]
f2: random int

dk.brics.automaton provides good performance for complex regexes.


 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi




[jira] Updated: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data

2009-11-25 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1107:


Status: Patch Available  (was: Open)

 PigLineRecordReader bails out on an empty line for compressed data
 --

 Key: PIG-1107
 URL: https://issues.apache.org/jira/browse/PIG-1107
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankit Modi
Assignee: Ankit Modi
 Fix For: 0.6.0

 Attachments: pig_piglinerecordreader_bug.patch


 PigLineRecordReader bails out with an exception when it encounters an empty 
 line in a compressed file:
 java.lang.ArrayIndexOutOfBoundsException: -1
 at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
 at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
 at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
 at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)




[jira] Updated: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data

2009-11-25 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1107:


Attachment: pig_piglinerecordreader_bug.patch

Submitting a small patch. It includes two new unit tests for the fix.

 PigLineRecordReader bails out on an empty line for compressed data
 --

 Key: PIG-1107
 URL: https://issues.apache.org/jira/browse/PIG-1107
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankit Modi
Assignee: Ankit Modi
 Fix For: 0.6.0

 Attachments: pig_piglinerecordreader_bug.patch





[jira] Created: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data

2009-11-24 Thread Ankit Modi (JIRA)
PigLineRecordReader bails out on an empty line for compressed data
--

 Key: PIG-1107
 URL: https://issues.apache.org/jira/browse/PIG-1107
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankit Modi
Assignee: Ankit Modi
 Fix For: 0.6.0


PigLineRecordReader bails out with an exception when it encounters an empty 
line in a compressed file

java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
    at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
    at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
    at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
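
The report does not say what the failing line in `PigLineRecordReader` does, so the following is only a guess at the general shape of such a bug, not the actual code. One common way to get an index of -1 on an empty line is to inspect the last byte of a line buffer without first checking that the line has any bytes. The class `LineTrim` and its method are hypothetical names used for illustration.

```java
/** Illustrative only; not taken from PigLineRecordReader. */
final class LineTrim {
    /**
     * Returns the line length with one trailing '\r' removed.
     * The len > 0 guard is the point of the sketch: without it, an empty
     * line would read buf[-1] and throw ArrayIndexOutOfBoundsException.
     */
    static int trimmedLength(byte[] buf, int len) {
        if (len > 0 && buf[len - 1] == '\r') {
            return len - 1;
        }
        return len;
    }
}
```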







[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Attachment: hadoopLocal.patch

This patch fails in releaseAudit for two new html files.

 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ankit Modi
 Attachments: hadoopLocal.patch


 We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Status: Patch Available  (was: Open)

 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ankit Modi
 Attachments: hadoopLocal.patch


 We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779179#action_12779179
 ] 

Ankit Modi commented on PIG-1053:
-

This patch has an issue with custom comparators (ORDER BY) in local mode. 
MapReduce mode is not affected.

Details:
Pig uses custom comparators by setting OutputKeyComparator to the custom 
comparator class and passing the jar path to the JVM when starting the task.
In the new local mode no new JVM is started, so Hadoop does not have the custom 
comparator on its classpath and fails.

One solution is to pass the jar path of the custom comparator in the classpath 
argument of the JVM running Pig.

e.g. CustomComparatorUse.pig
register custom.jar;

A = load 'file';
B = order A by * using custompackage.customclass; -- Hadoop bails out here with ClassNotFoundException

 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ankit Modi
 Attachments: hadoopLocal.patch


 We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779181#action_12779181
 ] 

Ankit Modi commented on PIG-1053:
-

This patch has an issue with custom comparators (ORDER BY) in local mode. 
MapReduce mode is not affected.

Details:
Pig uses custom comparators by setting OutputKeyComparator to the custom 
comparator class and passing the jar path to the JVM when starting the task.
In the new local mode no new JVM is started, so Hadoop does not have the custom 
comparator on its classpath and fails.

One solution is to pass the jar path of the custom comparator in the classpath 
argument of the JVM running Pig.

e.g.
{code:title=CustomComparatorUse.pig}
register custom.jar;
A = load 'file';
B = order A by * using custompackage.customclass; -- Hadoop bails out here with ClassNotFoundException
store B into 'file2';
{code}

JVM command (this does not work):
{{java -cp pig.jar org.apache.pig.Main -x local CustomComparatorUse.pig}}

Use this instead:
{{java -cp pig.jar:{color:red}custom.jar{color} org.apache.pig.Main -x local 
CustomComparatorUse.pig}}
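
The failure described in this comment comes down to whether the comparator class is visible to the class loader of the JVM that runs the job. A minimal sketch of that check, using only `Class.forName`, is given here. The class name `ClasspathCheck` is hypothetical, and `custompackage.customclass` is the placeholder name from the example script, assumed not to be on the classpath when the sketch runs.

```java
/** Illustrative only; shows why a missing jar surfaces as ClassNotFoundException. */
final class ClasspathCheck {
    /** Returns true if the named class can be found by this JVM's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder class name from the example script; assumed to be absent
        // unless custom.jar was passed via -cp.
        System.out.println(isLoadable("custompackage.customclass"));
    }
}
```

Running the sketch with and without `custom.jar` in `-cp` would mirror the two JVM commands in the comment: the lookup can only succeed when the jar is on the classpath of the JVM doing the lookup.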

 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ankit Modi
 Attachments: hadoopLocal.patch


 We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779252#action_12779252
 ] 

Ankit Modi commented on PIG-1053:
-

In local mode, the PhysicalPlan had a POCounter operator before every POStore. 
This operator was used for collecting stats.

Now that local mode runs on Hadoop, this operator is no longer used. The plan 
size therefore changed, and the numbers changed with it.

 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ankit Modi
 Attachments: hadoopLocal.patch


 We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Attachment: hadoopLocal.patch

 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ankit Modi
 Attachments: hadoopLocal.patch


 We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: LeftOuterFRJoin.patch

Attaching a new patch.

The join now supports only a two-way left join. 
A schema must be present on the right side; it is used to determine the number 
of null fields/columns in nullTuple.

As it's a two-way join, we use a single nullBag instead of an array of nullBags. 
A DataBag is used instead of a Tuple to keep the result type of 
ConstantExpression consistent.
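
The idea of padding unmatched left rows with a null tuple sized from the right-side schema can be sketched as follows. This is a simplified illustration using plain Java lists, not Pig's Tuple, DataBag, or POFRJoin classes. The class name `LeftOuterReplicatedJoin`, the method signature, and the `rightArity` parameter (standing in for the field count taken from the right-side schema) are all assumptions. Null join keys are treated as never matching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; a two-way left outer join with a replicated right side. */
final class LeftOuterReplicatedJoin {
    static List<List<Object>> join(List<List<Object>> left,
                                   List<List<Object>> right,
                                   int leftKey, int rightKey, int rightArity) {
        // Build an in-memory hash table over the (small) replicated right input.
        Map<Object, List<List<Object>>> table = new HashMap<>();
        for (List<Object> r : right) {
            Object key = r.get(rightKey);
            if (key != null) {
                table.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
            }
        }
        // One null tuple, sized from the right-side schema, shared by all misses.
        List<Object> nullTuple = Collections.<Object>nCopies(rightArity, null);
        List<List<Object>> nullBag = Collections.singletonList(nullTuple);

        List<List<Object>> out = new ArrayList<>();
        for (List<Object> l : left) {
            Object key = l.get(leftKey);
            List<List<Object>> matches = (key == null) ? null : table.get(key);
            if (matches == null) {
                matches = nullBag;   // left outer: keep the row, pad with nulls
            }
            for (List<Object> r : matches) {
                List<Object> row = new ArrayList<>(l);
                row.addAll(r);
                out.add(row);
            }
        }
        return out;
    }
}
```

Without a right-side schema there would be no way to know `rightArity`, which is why the sketch takes it as an explicit argument.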

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Attachments: LeftOuterFRJoin.patch







[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: (was: LeftOuterFRJoin.patch)

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi






[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Open  (was: Patch Available)

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Attachments: LeftOuterFRJoin.patch







[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Patch Available  (was: Open)

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Attachments: LeftOuterFRJoin.patch







[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-29 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Open  (was: Patch Available)

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Attachments: LeftOuterFRJoin.patch







[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-29 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Patch Available  (was: Open)

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Attachments: LeftOuterFRJoin.patch







[jira] Assigned: (PIG-1036) Fragment-replicate left outer join

2009-10-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi reassigned PIG-1036:
---

Assignee: Ankit Modi

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi






[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: LeftOuterFRJoin.patch

This patch fails findBugs because I modified a line that already carried a 
findBugs warning.

It also fails ReleaseAudit for the HTML (doc) file for POFRJoin.

 Fragment-replicate left outer join
 --

 Key: PIG-1036
 URL: https://issues.apache.org/jira/browse/PIG-1036
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Attachments: LeftOuterFRJoin.patch







[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-10-01 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: pig_rlr.patch

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch


 PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
 {{LineRecordReader}}.
 This can help in the following areas:
 - Improving the performance of reading Tuples (lines) in {{PigStorage}}
 - Any future improvements to line reading in Hadoop's 
 {{LineRecordReader}} are automatically carried over to Pig
 Issues that are handled by this patch:
 - BZip uses internal buffers and positioning to determine the number of 
 bytes read. Hence the buffering done by {{LineRecordReader}} has to be turned off.
 - The current implementation of {{LocalSeekableInputStream}} does not implement 
 the {{available}} method. This method has to be implemented.
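
To make the last point concrete, here is a rough sketch of what an `available` implementation for a seekable, file-backed stream could look like. It is not Pig's `LocalSeekableInputStream`: the class name `SeekableFileInputStream` is hypothetical, and backing the stream with `java.io.RandomAccessFile` is an assumption made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/** Illustrative only; a seekable file stream that reports remaining bytes. */
final class SeekableFileInputStream extends InputStream {
    private final RandomAccessFile file;

    SeekableFileInputStream(File f) throws IOException {
        this.file = new RandomAccessFile(f, "r");
    }

    @Override
    public int read() throws IOException {
        return file.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return file.read(b, off, len);
    }

    /** Moves the read position to an absolute byte offset. */
    void seek(long pos) throws IOException {
        file.seek(pos);
    }

    /** Bytes left between the current position and end of file, capped at int range. */
    @Override
    public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        return (int) Math.max(0L, Math.min(Integer.MAX_VALUE, remaining));
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The default `InputStream.available()` returns 0, so a reader that relies on it to decide whether more data exists would stop early unless the method is overridden along these lines.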




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-10-01 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: (was: pig_rlr.patch)

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch





[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-10-01 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761376#action_12761376
 ] 

Ankit Modi commented on PIG-960:


Added the latest patch making PigLineRecordReader a wrapper only.

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch





[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-30 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: (was: pig_rlr.patch)

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch





[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-29 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760822#action_12760822
 ] 

Ankit Modi commented on PIG-960:


Thanks for the comments, Daniel.

Answers:
1. PigLineRecordReader (PLRR) needs to know the type of InputStream it is 
handling: BZip2 or uncompressed. Depending on the type of input stream, it 
chooses which reader to use. BPIS (BufferedPositionedInputStream) stores 
the input stream as a protected member. PLRR can access it in one of the 
following ways: making the member public, adding a get method, or inheriting 
from BPIS. I implemented the last one as it requires the fewest changes to BPIS.
2. Good one. Will be fixed in the next patch.
3. Will be added in the next patch.
4. Corrected in the next patch.
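
The inheritance option from answer 1 can be sketched as follows. All class names here are hypothetical stand-ins: `PositionedStream` plays the role of BufferedPositionedInputStream, `FakeBzipStream` stands in for a BZip2 stream type, and `LineReaderChooser` plays the role of PLRR. The sketch only shows how a subclass can inspect a protected member to pick a reader; it does not reproduce the patch.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for BufferedPositionedInputStream. */
class PositionedStream extends InputStream {
    protected final InputStream in;   // wrapped stream, visible to subclasses only

    PositionedStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }
}

/** Hypothetical marker type standing in for a BZip2 input stream. */
class FakeBzipStream extends FilterInputStream {
    FakeBzipStream(InputStream in) {
        super(in);
    }
}

/** Hypothetical stand-in for PLRR: inherits to reach the protected member. */
final class LineReaderChooser extends PositionedStream {
    LineReaderChooser(InputStream in) {
        super(in);
    }

    /** Picks a reader kind from the runtime type of the wrapped stream. */
    String readerKind() {
        return (in instanceof FakeBzipStream) ? "bzip" : "plain";
    }
}
```

Compared with making the member public or adding a getter, this leaves the base class untouched, which matches the stated reason for choosing inheritance.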

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch





[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: pig_rlr.patch

Added a new patch with Apache license and SVN Trunk Revision 819662

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch





[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Status: Open  (was: Patch Available)

This patch failed in release audit

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Patch Info: [Patch Available]

Adding a patch file

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi




[jira] Created: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-14 Thread Ankit Modi (JIRA)
Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi


PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
{{LineRecordReader}}.

This can help in the following areas:
- Improving the performance of reading Tuples (lines) in {{PigStorage}}
- Any future improvements to line reading in Hadoop's {{LineRecordReader}} 
are automatically carried over to Pig

Issues that are handled by this patch:
- BZip uses internal buffers and positioning to determine the number of 
bytes read. Hence the buffering done by {{LineRecordReader}} has to be turned off.
- The current implementation of {{LocalSeekableInputStream}} does not implement 
the {{available}} method. This method has to be implemented.




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Patch Info:   (was: [Patch Available])

Performance improvement numbers obtained by running PigMix

||Script||svn Trunk||LineRecordReader Patch||
||L1|186|147|
||L2|73|33|
||L3|195|165|
||L4|116|76|
||L5|93|59|
||L6|102|63|
||L7|91|69|
||L8|84|44|
||L9|189|148|
||L10|285|268|
||L11|108|51|
||L12|112|73|
||Sum|1634|1196|
||% Improvement| ||26.81|
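
The Sum and % Improvement rows can be re-derived from the per-script numbers with a few lines of arithmetic. In this sketch the class name `PigMixImprovement` is hypothetical, and the formula 100 * (trunk - patch) / trunk is an assumption about how the percentage was computed; it does reproduce 26.81 when rounded to two decimals. The source does not state the unit of the timings.

```java
/** Illustrative only; recomputes the totals from the PigMix table. */
final class PigMixImprovement {
    // Per-script values for L1..L12, copied from the table (unit not stated).
    static final int[] TRUNK = {186, 73, 195, 116, 93, 102, 91, 84, 189, 285, 108, 112};
    static final int[] PATCH = {147, 33, 165, 76, 59, 63, 69, 44, 148, 268, 51, 73};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Relative reduction from before to after, as a percentage of before. */
    static double improvementPercent(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int trunk = sum(TRUNK);
        int patch = sum(PATCH);
        System.out.printf("trunk=%d patch=%d improvement=%.2f%%%n",
                trunk, patch, improvementPercent(trunk, patch));
    }
}
```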



 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
