[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-29 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: (was: LeftOuterFRJoin.patch)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-29 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: LeftOuterFRJoin.patch

Updated against the current SVN trunk. The Findbugs warnings are resolved automatically by Olga's 
changes.
The ReleaseAudit warnings are resolved as well.

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-29 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Open  (was: Patch Available)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-29 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Patch Available  (was: Open)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: LeftOuterFRJoin.patch

Attaching a new patch.

The join now supports only a two-way left outer join. 
The join requires a schema on the right-side input; the schema is 
used to determine the number of null fields/columns in nullTuple.

Because it is a two-way join, we use a single nullBag instead of an array of nullBags. 
A DataBag is used instead of a Tuple to keep the result type 
of ConstantExpression consistent.
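
For readers unfamiliar with the technique, here is a minimal sketch of a two-way fragment-replicate left outer join. It is not the code in LeftOuterFRJoin.patch: the class name FRLeftOuterJoinSketch, the use of plain Java lists as rows, and joining on the first field are all assumptions made for illustration. It shows why the right-side schema is needed (its size fixes the width of the null tuple) and how a single null bag stands in when a left row has no match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and row representation are illustrative only.
final class FRLeftOuterJoinSketch {

    // Joins on the first field of each row. The right input is the replicated
    // (in-memory) side; rightArity is the field count from the right schema.
    static List<List<Object>> join(List<List<Object>> left,
                                   List<List<Object>> right,
                                   int rightArity) {
        // Build phase: hash the replicated input by join key.
        Map<Object, List<List<Object>>> table = new HashMap<>();
        for (List<Object> r : right) {
            table.computeIfAbsent(r.get(0), k -> new ArrayList<>()).add(r);
        }

        // The null tuple is built once; its width comes from the right schema.
        List<Object> nullTuple = new ArrayList<>();
        for (int i = 0; i < rightArity; i++) {
            nullTuple.add(null);
        }
        // One null bag is enough because there is exactly one right input.
        List<List<Object>> nullBag = new ArrayList<>();
        nullBag.add(nullTuple);

        // Probe phase: stream the left (fragmented) input.
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> l : left) {
            Object key = l.get(0);
            // A null key never matches, so it also gets the null bag.
            List<List<Object>> matches =
                    key == null ? nullBag : table.getOrDefault(key, nullBag);
            for (List<Object> r : matches) {
                List<Object> joined = new ArrayList<>(l);
                joined.addAll(r);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> left = Arrays.asList(
                Arrays.<Object>asList(1, "a"), Arrays.<Object>asList(2, "b"));
        List<List<Object>> right = Arrays.asList(Arrays.<Object>asList(1, "x"));
        System.out.println(join(left, right, 2));
    }
}
```

Without a schema on the right side, rightArity is unknown and the padded row cannot be built, which is the reason for the schema requirement above.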

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: (was: LeftOuterFRJoin.patch)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Open  (was: Patch Available)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Commented: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773149#action_12773149
 ] 

Ankit Modi commented on PIG-1036:
-

Also, the patch fixes two wrong error codes in 
{code}LogToPhyTranslationVisitor.updateWithEmptyBagCheck{code}

{code}
int errCode = 1109;  // was 1105
String msg = "Input (" + joinInput.getAlias() + ") " +
        "on which outer join is desired should have a valid schema";

} catch (FrontendException e) {
    int errCode = 2104;  // was 2014

{code}

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-11-03 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Patch Available  (was: Open)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Assigned: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi reassigned PIG-965:
--

Assignee: Ankit Modi

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%' "
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%' 
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has 
> not changed. 
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive 'like' clause uses similar optimizations.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Attachment: hadoopLocal.patch

This patch fails in releaseAudit for two new html files.

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Status: Patch Available  (was: Open)

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779179#action_12779179
 ] 

Ankit Modi commented on PIG-1053:
-

This patch has an issue with custom comparators ( OrderBy) in Local Mode ( Does 
not affect MapReduce mode ).

Details:
Pig uses custom Comparators by setting OutputKeyComparator to the 
customComparator.class, and passing the jar path to JVM while starting the task.
In this new local mode a new JVM is not started. So hadoop does not have the 
classpath of customComparator and fails.

A solution for the above problem would be to pass jarpath of customComparator 
in the "classpath" argument to JVM running pig.

e.g. CustomComparatorUse.pig
register custom.jar;

A = load 'file';
B = order A by * using custompackage.customclass; -- Here hadoop bails out giving ClassNotFoundException

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779181#action_12779181
 ] 

Ankit Modi commented on PIG-1053:
-

This patch has an issue with custom comparators ( OrderBy) in Local Mode ( Does 
not affect MapReduce mode ).

Details:
Pig uses custom Comparators by setting OutputKeyComparator to the 
customComparator.class, and passing the jar path to JVM while starting the task.
In this new local mode a new JVM is not started. So hadoop does not have the 
classpath of customComparator and fails.

A solution for the above problem would be to pass jarpath of customComparator 
in the "classpath" argument to JVM running pig.

e.g.
{code:title=CustomComparatorUse.pig}
register custom.jar;
A = load 'file';
B = order A by * using custompackage.customclass; -- Here hadoop bails out giving ClassNotFoundException
store B into 'file2';
{code}

JVM command (this does not work):
{{java -cp pig.jar org.apache.pig.Main -x local CustomComparatorUse.pig}}

Use this instead:
{{java -cp pig.jar:{color:red}custom.jar{color} org.apache.pig.Main -x local CustomComparatorUse.pig}}
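
The underlying behaviour can be illustrated outside Pig with a small sketch. When no new JVM is started, a class looked up by name must already be visible on the classpath of the JVM doing the lookup. The class name SameJvmClassLookup is hypothetical, and custompackage.customclass is only the placeholder name from the script above.

```java
// Hypothetical sketch; it only demonstrates by-name class lookup in the running JVM.
final class SameJvmClassLookup {

    // Returns true if the named class is visible to the JVM running this code.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // Same failure Hadoop reports when the comparator jar is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("java.lang.String"));
        System.out.println(isLoadable("custompackage.customclass"));
    }
}
```

Running this with custom.jar added to -cp would make the second lookup succeed, provided the jar really contains that class.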

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779252#action_12779252
 ] 

Ankit Modi commented on PIG-1053:
-

PhysicalPlan in local mode had POCounter Operator before every POStore. This 
operator was used for getting stats.

As we moved to Hadoop this operator is no longer used. Hence the plan size 
changed. So the numbers changed. 

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Attachment: hadoopLocal.patch

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Attachment: (was: hadoopLocal.patch)

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Status: Patch Available  (was: Open)

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1053:


Status: Open  (was: Patch Available)

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Created: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data

2009-11-24 Thread Ankit Modi (JIRA)
PigLineRecordReader bails out on an empty line for compressed data
--

 Key: PIG-1107
 URL: https://issues.apache.org/jira/browse/PIG-1107
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankit Modi
Assignee: Ankit Modi
 Fix For: 0.6.0


PigLineRecordReader bails out with an exception when it encounters an empty 
line in a compressed file

java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
    at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
    at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
    at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
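
An index of -1 on an empty line is what one would see if a reader inspected the last byte of a zero-length line, for example to strip a trailing '\r'. That cause is an assumption; the report does not state it. The sketch below is not the PigLineRecordReader code. SafeLineReaderSketch and its methods are hypothetical names, and it only shows the length guard that avoids the exception, exercised on gzip-compressed input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; illustrates the guard only.
final class SafeLineReaderSketch {

    // Reads '\n'-terminated lines and drops a trailing '\r' if present.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean pending = false; // bytes seen since the last newline
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines.add(finish(buf));
                    buf.reset();
                    pending = false;
                } else {
                    buf.write(b);
                    pending = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (pending) {
            lines.add(finish(buf)); // final line without a newline
        }
        return lines;
    }

    static String finish(ByteArrayOutputStream buf) {
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        // Guard: for an empty line len is 0, so bytes[len - 1] would be index -1.
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write("a\n\nb\r\n".getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz.toByteArray()));
        System.out.println(readLines(in));
    }
}
```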







[jira] Updated: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data

2009-11-25 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1107:


Status: Patch Available  (was: Open)

> PigLineRecordReader bails out on an empty line for compressed data
> --
>
> Key: PIG-1107
> URL: https://issues.apache.org/jira/browse/PIG-1107
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankit Modi
>Assignee: Ankit Modi
> Fix For: 0.6.0
>
> Attachments: pig_piglinerecordreader_bug.patch
>
>
> PigLineRecordReader bails out with an exception when it encounters an empty 
> line in a compressed file
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
>     at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
>     at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
>     at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)




[jira] Updated: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data

2009-11-25 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1107:


Attachment: pig_piglinerecordreader_bug.patch

Submitting a small patch. It has 2 new unit tests for the patch applied.

> PigLineRecordReader bails out on an empty line for compressed data
> --
>
> Key: PIG-1107
> URL: https://issues.apache.org/jira/browse/PIG-1107
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankit Modi
>Assignee: Ankit Modi
> Fix For: 0.6.0
>
> Attachments: pig_piglinerecordreader_bug.patch
>
>
> PigLineRecordReader bails out with an exception when it encounters an empty 
> line in a compressed file
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
>     at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
>     at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
>     at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)




[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-01 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784596#action_12784596
 ] 

Ankit Modi commented on PIG-965:


I implemented one patch with optimizations 1 and 2 mentioned above, and another 
patch with optimizations 1 and 2 plus dk.brics.automaton.

dk.brics.automaton does not support all features of java.util.regex, so the 
second patch checks for that and falls back to java.util.regex if the regex can 
only be handled by java.util.regex.

Here are the numbers

||Regex||svn_trunk||Optimization 1 and 2||dk.brics.automaton||comments||
| .\*ABCD.\* | 92.74 | 50.92 | 49.32 | Here only optimization 2 is used |
| .\*[A-F]{2,3}.\* | 152.3 | 133.48 | 105.93 | dk.brics.automaton is used |
| A.B.C.D | 54.492 | 44.46 | 44.66 | dk.brics.automaton is used |
| .\*([A-F]{4})\w\*\1.\* | 129.29 | 112.89 | 109.43 | java.util.regex used in all cases |
| .\*\[A-F\]\{4\}\w\*[N-Z]\{3\}.\* | 129.63 | 108.11 | 54.42 | dk.brics.automaton used |


These results were obtained in local mode on 1 billion lines of data in the 
following format:
f1: chararray(100) of random chars from [A-Z]
f2: int, a random integer

dk.brics.automaton performs well on complex regexes. 


> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
>
> Some frequently seen use cases of 'matches' comparison operator have follow 
> properties -
> 1. The rhs is a constant string . eg "c1 matches 'abc%' "
> 2. Regexes such that look for matching prefix , suffix etc are very common. 
> eg - "abc%', "%abc", '%abc%' 
> To optimize for these common cases , PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) re-use it if the pattern string has 
> not changed. 
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of Hive like clause uses similar optimizations.




[jira] Created: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.

2009-12-06 Thread Ankit Modi (JIRA)
In pig local ( hadoop local mode ) mode the counting of number of tuples and 
bytes is incorrect if data is more than one local split.
-

 Key: PIG-1130
 URL: https://issues.apache.org/jira/browse/PIG-1130
 Project: Pig
  Issue Type: Bug
Reporter: Ankit Modi
Priority: Minor


If the output generates more than one part file, the current code only gives 
stats of the first part file. ie. part-0




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-07 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch
poregex.patch

These are patches for two implementations.

The first (poregex.patch) applies the optimizations mentioned above 
in the JIRA.
The second (poregex2.patch) applies optimization 1 and uses 
dk.brics.automaton to run simple regular expressions; otherwise it falls 
back to java.util.regex.

In the first patch, the getSimpleString method decides whether to use 
optimization 2 or java.util.regex.

In the second patch, determineBestRegexMethod decides whether to use 
dk.brics.automaton. (The changes to build.xml in this patch are temporary.)

Both patches use RegexInit as the initial implementation: it calls the 
decision function mentioned above and then sets the implementation to the 
one that function chose.

In the second patch, the decision function was written by looking at the 
operators and grammar that dk.brics.automaton supports. I tried out the classes 
supported and not supported in dk.brics.automaton and based the decision on that.

I could not find any page that specifically documents the differences between 
the regex languages of java.util.regex and dk.brics.automaton.
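
As an illustration of the decide-once approach, here is a minimal sketch of optimizations 1 and 2. It is not the code in either patch: MatchesChooserSketch, isLiteral and choose are hypothetical names, and the dk.brics.automaton branch is left out to keep the example free of external dependencies. One caveat is stated as an assumption: the string shortcuts treat ".*" as matching any characters, whereas java.util.regex's dot does not match line terminators by default, so the two agree only on single-line input.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch; covers plain string shortcuts plus a compiled-pattern fallback.
final class MatchesChooserSketch {

    // Characters that turn a fragment into a real regex rather than a literal.
    private static final String META = "\\[](){}.*+?^$|";

    private String lastRegex;
    private Predicate<String> lastMatcher;

    static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (META.indexOf(s.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Decides once per pattern string; the result has whole-string 'matches' semantics.
    // Assumes single-line input for the string shortcuts (see the caveat above).
    static Predicate<String> choose(String regex) {
        boolean open = regex.startsWith(".*");
        boolean close = regex.endsWith(".*") && regex.length() >= (open ? 4 : 2);
        String core = regex.substring(open ? 2 : 0, regex.length() - (close ? 2 : 0));
        if (isLiteral(core)) {
            if (open && close) return s -> s.contains(core);
            if (open) return s -> s.endsWith(core);
            if (close) return s -> s.startsWith(core);
            return s -> s.equals(core);
        }
        // Fallback: compile once and reuse the compiled pattern.
        Pattern compiled = Pattern.compile(regex);
        return s -> compiled.matcher(s).matches();
    }

    // Re-runs the decision only when the pattern string changes (optimization 1).
    boolean matches(String value, String regex) {
        if (!regex.equals(lastRegex)) {
            lastMatcher = choose(regex);
            lastRegex = regex;
        }
        return lastMatcher.test(value);
    }

    public static void main(String[] args) {
        MatchesChooserSketch m = new MatchesChooserSketch();
        System.out.println(m.matches("xxABCDyy", ".*ABCD.*"));
        System.out.println(m.matches("xxGHyy", ".*[A-F]{2,3}.*"));
    }
}
```

A third branch for dk.brics.automaton would slot into choose in the same way, before the java.util.regex fallback.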

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: poregex.patch, poregex2.patch
>
>
> Some frequently seen use cases of 'matches' comparison operator have follow 
> properties -
> 1. The rhs is a constant string . eg "c1 matches 'abc%' "
> 2. Regexes such that look for matching prefix , suffix etc are very common. 
> eg - "abc%', "%abc", '%abc%' 
> To optimize for these common cases , PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) re-use it if the pattern string has 
> not changed. 
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of Hive like clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-10 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

Attaching one more patch file. This one has all the changes except the changes 
to build.xml. Still trying to find a Maven repo for dk.brics.automaton.

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: poregex.patch, poregex2.patch, poregex2.patch
>
>
> Some frequently seen use cases of 'matches' comparison operator have follow 
> properties -
> 1. The rhs is a constant string . eg "c1 matches 'abc%' "
> 2. Regexes such that look for matching prefix , suffix etc are very common. 
> eg - "abc%', "%abc", '%abc%' 
> To optimize for these common cases , PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) re-use it if the pattern string has 
> not changed. 
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of Hive like clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
>
> Some frequently seen use cases of 'matches' comparison operator have follow 
> properties -
> 1. The rhs is a constant string . eg "c1 matches 'abc%' "
> 2. Regexes such that look for matching prefix , suffix etc are very common. 
> eg - "abc%', "%abc", '%abc%' 
> To optimize for these common cases , PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) re-use it if the pattern string has 
> not changed. 
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of Hive like clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
>
> Some frequently seen use cases of 'matches' comparison operator have follow 
> properties -
> 1. The rhs is a constant string . eg "c1 matches 'abc%' "
> 2. Regexes such that look for matching prefix , suffix etc are very common. 
> eg - "abc%', "%abc", '%abc%' 
> To optimize for these common cases , PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) re-use it if the pattern string has 
> not changed. 
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of Hive like clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: automaton.jar
poregex2.patch

New patch with comments removed and automaton.jar added from 
http://www.brics.dk/~amoeller/automaton/automaton.jar.

It fails findBugs due to missing symbols. I ran findBugs again after adding the 
jar to the build, and it did not report any warnings in the modified and 
added files.

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

One small change to JarManager.java is missing. Will add a new patch with it.

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-1106) FR join should not spill

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1106:


Attachment: frjoin-nonspill.patch

This patch does not have any tests. A test would require creating a big 
file of about 250 MB and running the join against it.

I have run some tests in a similar fashion.


> FR join should not spill
> 
>
> Key: PIG-1106
> URL: https://issues.apache.org/jira/browse/PIG-1106
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
> Attachments: frjoin-nonspill.patch
>
>
> Currently, the values for the replicated side of the data are placed in a 
> spillable bag (POFRJoin near line 275). This does not make sense because the 
> whole point of the optimization is that the data on one side fits into 
> memory. We already have a non-spillable bag implemented 
> (NonSpillableDataBag.java) and we need to change the FRJoin code to use it. 
> We also need to do lots of testing to make sure that we don't spill but die 
> instead when we run out of memory.




[jira] Updated: (PIG-1106) FR join should not spill

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1106:


Status: Patch Available  (was: Open)

This patch does not have any unit tests.

> FR join should not spill
> 
>
> Key: PIG-1106
> URL: https://issues.apache.org/jira/browse/PIG-1106
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
> Attachments: frjoin-nonspill.patch
>
>




[jira] Commented: (PIG-1106) FR join should not spill

2009-12-11 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789294#action_12789294
 ] 

Ankit Modi commented on PIG-1106:
-

I ran the tests using two files.

File format:
f1: random chararray(100)
f2: random int

The left-side file contained 100 tuples and the right-side file contained 3 million tuples.

Code
{noformat}
A = load 'leftsidefrjoin.txt' as ( key, value);
B = load 'rightsidefrjoin.txt' as (key, value);
C = join A by key left, B by key using "repl";
--- Fragmented input and replicated input
store C into 'output';
{noformat}

This generated the following error:
{noformat}
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.ArrayList.<init>(ArrayList.java:112)
at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:369)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:288)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:351)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:211)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:250)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:241)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
{noformat}

I ran the same job with the same records on the left-hand side and 100K records 
on the right-hand side. The job completed successfully.
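
The shape of the join exercised by this test can be pictured with a small in-memory sketch. It is not the POFRJoin implementation: the class name ReplicatedLeftJoinSketch, the two-column String[] rows, and the assumption of non-null keys are all made up for illustration. What it shows is that the whole replicated (right) input is loaded into a hash table before any left row is processed, which is why the size of the right side decides whether the task fits in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a fragment-replicate left outer join.
final class ReplicatedLeftJoinSketch {

    // Rows are {key, value}; keys are assumed to be non-null.
    static List<String[]> leftOuterJoin(List<String[]> fragment,
                                        List<String[]> replicated) {
        // The entire replicated input is held in memory, keyed by join key.
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : replicated) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }

        // The fragmented input is streamed and probed against the table.
        List<String[]> out = new ArrayList<>();
        for (String[] left : fragment) {
            List<String[]> matches = table.get(left[0]);
            if (matches == null) {
                // Left outer join: keep the left row, pad the right side with nulls.
                out.add(new String[] {left[0], left[1], null, null});
            } else {
                for (String[] right : matches) {
                    out.add(new String[] {left[0], left[1], right[0], right[1]});
                }
            }
        }
        return out;
    }
}
```

In this sketch nothing is ever written to disk, so an oversized replicated input ends in an OutOfMemoryError rather than a spill, which is the behaviour the issue asks for.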

> FR join should not spill
> 
>
> Key: PIG-1106
> URL: https://issues.apache.org/jira/browse/PIG-1106
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
> Attachments: frjoin-nonspill.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: automaton.jar)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-11 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: automaton.jar
poregex2.patch

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

I have included changes suggested by Thejas.

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-14 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790545#action_12790545
 ] 

Ankit Modi commented on PIG-965:


* NonConstantRegex - I had not thought of using equals. I added a length check 
first because it detects a change in length faster and, to the best of my 
knowledge, length() is just a getter. And yes, as you mentioned, equals also 
checks for the same object and instanceof, which is not useful in our case.

* The numbers published above use dk.brics.automaton.RunAutomaton. Do you 
want me to publish numbers for a larger set of regexes?

I'll create a patch for the rest of the comments.
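
For the non-constant case discussed in the first point, the re-use check could look roughly like the sketch below. The class name CachedPatternSketch and its fields are hypothetical and are not taken from the patch; compileCount exists only so the behaviour can be observed. The sketch relies on String.equals alone, on the assumption (true of OpenJDK, as far as I know) that equals already returns quickly when the two strings differ in length.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: recompile only when the pattern string changes.
final class CachedPatternSketch {
    private String lastRegex;     // pattern string seen on the previous call
    private Pattern lastPattern;  // its compiled form
    int compileCount;             // demo-only counter of Pattern.compile calls

    boolean matches(String input, String regex) {
        if (lastPattern == null || !regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        // Whole-string match, as with String.matches.
        return lastPattern.matcher(input).matches();
    }
}
```

Calling matches repeatedly with the same pattern string compiles it once; a different pattern string triggers one more compilation.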

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-15 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791096#action_12791096
 ] 

Ankit Modi commented on PIG-965:


Here are numbers comparing optimizations 1 & 2 against optimization 1 & 
dk.brics.

dk.brics.automaton.RunAutomaton is as fast as optimization 2 and also provides 
similar speeds for a set of additional expressions.

|| Query || svn_trunk || std_dev || Optimization 1 & 2 || std_dev || Optimization 1 & brics.RunAutomaton || std_dev ||
| .\*ABCD.\* | 33.87 | 0.71 | 18.77 | 0.71 | 18.94 | 0.02 |
| .\*ABCD | 30.06 | 2.91 | 18.44 | 0.05 | 18.94 | 0.03 |
| ABCD.\* | 21.93 | 2.91 | 18.35 | 0.1 | 18.85 | 0.04 |

Values are averaged over 3 runs.
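
To make the table easier to read, the relative speedup over svn_trunk can be computed as the trunk time divided by the optimized time. The snippet below only restates the averages from the table; the class name SpeedupSketch is made up, and since the table gives no unit, only the ratios are meaningful.

```java
// Hypothetical helper: speedup = trunk time / optimized time, from the table above.
final class SpeedupSketch {

    static double speedup(double baseline, double optimized) {
        return baseline / optimized;
    }

    public static void main(String[] args) {
        String[] queries = {".*ABCD.*", ".*ABCD", "ABCD.*"};
        double[] trunk = {33.87, 30.06, 21.93};   // svn_trunk column
        double[] opt12 = {18.77, 18.44, 18.35};   // Optimization 1 & 2 column
        double[] brics = {18.94, 18.94, 18.85};   // Optimization 1 & brics.RunAutomaton column

        for (int i = 0; i < queries.length; i++) {
            System.out.printf("%-10s opt 1&2: %.2fx  opt 1&brics: %.2fx%n",
                    queries[i],
                    speedup(trunk[i], opt12[i]),
                    speedup(trunk[i], brics[i]));
        }
    }
}
```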

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar
>
>




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Assignee: Benjamin Francisoud  (was: Ankit Modi)
  Status: Patch Available  (was: Open)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Benjamin Francisoud
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

Rewrote some logic in cases 1 and 3 of determineBestRegex. Also found a bug in 
case 1 and fixed it.

Added Thejas's recommendation.

Also added a few unit test patterns.

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Assigned: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi reassigned PIG-965:
--

Assignee: Ankit Modi  (was: Benjamin Francisoud)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Attachment: poregex2.patch

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Open  (was: Patch Available)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-965:
---

Status: Patch Available  (was: Open)

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Attachment: pig_1178.patch

Attaching another patch with end-to-end functionality for load, filter, join, 
and store, along with a few expression operators.

This patch is self-sufficient and can be applied directly to SVN trunk.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828822#action_12828822
 ] 

Ankit Modi commented on PIG-1154:
-

It looks like the problem is caused by the value of mapred.system.dir from 
mapred-default.xml being overwritten; the path mentioned above, 
"/mapredsystem/hadoop/mapredsystem/", may not exist.

This cannot be solved in local mode, as it is not possible to change the 
classpath at runtime.

I'll provide a patch which would
   * Provide a warning whenever the classpath contains mapred-site.xml or 
hdfs-site.xml.
   * Exit Pig with an error message if the above case is encountered.
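
As a rough illustration of the first point, the check for site files on the
classpath could look something like the sketch below. It is not taken from the
patch: the class LocalModeConfigCheck and the method findSiteFiles are
hypothetical names, and the only real API used is ClassLoader.getResource.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the code in the patch. */
class LocalModeConfigCheck {
    // Hadoop site files that should not influence local mode.
    static final String[] SITE_FILES = {"mapred-site.xml", "hdfs-site.xml"};

    /** Returns a description of every site file visible through the given class loader. */
    static List<String> findSiteFiles(ClassLoader loader) {
        List<String> found = new ArrayList<>();
        for (String name : SITE_FILES) {
            URL url = loader.getResource(name);  // null when the file is not on the classpath
            if (url != null) {
                found.add(name + " (" + url + ")");
            }
        }
        return found;
    }
}
```

A caller in local mode could log one warning per returned entry, for example by
passing Thread.currentThread().getContextClassLoader() as the loader.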

> local mode fails when hadoop config directory is specified in classpath
> ---
>
> Key: PIG-1154
> URL: https://issues.apache.org/jira/browse/PIG-1154
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
>
> In local mode, the hadoop configuration should not be taken from the 
> classpath.




[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828831#action_12828831
 ] 

Ankit Modi commented on PIG-1154:
-

It will provide a warning whenever the files are encountered in Local Mode.

On top of that, it will exit with an error if mapred.system.dir is different 
from the default one and the directory does not exist.
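
The exit condition described here can be written down as a small predicate. The
sketch below is an assumption-laden illustration rather than the patch itself:
SystemDirCheck and check are hypothetical names, and the directory is tested on
the local filesystem with java.io.File, which may not be how Pig resolves it.

```java
import java.io.File;

/** Hypothetical helper for illustration; not the code in the patch. */
class SystemDirCheck {
    /**
     * Returns an error message when local mode should stop, or null when it may proceed.
     * configuredDir is the effective mapred.system.dir, defaultDir the built-in default.
     */
    static String check(String configuredDir, String defaultDir) {
        if (configuredDir == null || configuredDir.equals(defaultDir)) {
            return null;  // value was not overridden
        }
        if (new File(configuredDir).isDirectory()) {
            return null;  // overridden, but the directory exists
        }
        return "mapred.system.dir '" + configuredDir + "' does not exist";
    }
}
```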

> local mode fails when hadoop config directory is specified in classpath
> ---
>
> Key: PIG-1154
> URL: https://issues.apache.org/jira/browse/PIG-1154
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
>
> In local mode, the hadoop configuration should not be taken from the 
> classpath.




[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1154:


Attachment: pig_1154.patch

Patch according to comments mentioned above.

> local mode fails when hadoop config directory is specified in classpath
> ---
>
> Key: PIG-1154
> URL: https://issues.apache.org/jira/browse/PIG-1154
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
> Attachments: pig_1154.patch
>
>
> In local mode, the hadoop configuration should not be taken from the 
> classpath.




[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1154:


Status: Patch Available  (was: Open)

This patch affects only Local Mode in Pig.

> local mode fails when hadoop config directory is specified in classpath
> ---
>
> Key: PIG-1154
> URL: https://issues.apache.org/jira/browse/PIG-1154
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
> Attachments: pig_1154.patch
>
>
> In local mode, the hadoop configuration should not be taken from the 
> classpath.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Open  (was: Patch Available)

I found a bug in the code, so I'll be releasing another patch for this issue.

I'll keep this patch in the JIRA until I replace it with a new one, so everyone 
can review it.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-05 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Attachment: pig_1178.patch

This is a new patch that can be applied to SVN trunk.

It includes the ForEach, InnerLoad, and Generate operators along with some 
LogicalExpressions.
It also includes a new optimizer rule for pushing Filter above Foreach.
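
To illustrate what pushing a filter above a foreach means, here is a toy sketch
on a linear plan. It does not use Pig's plan classes: PushFilterSketch, Op, and
pushFilterAboveForeach are hypothetical, and the safety condition (the filter
reads only columns that the foreach passes through unchanged) is an assumed
simplification of what a real rule has to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model for illustration; not Pig's logical plan or optimizer API. */
class PushFilterSketch {
    /** A plan operator; kind is "load", "foreach", "filter" or "store". */
    static final class Op {
        final String kind;
        // foreach: columns passed through unchanged; filter: columns the predicate reads.
        final Set<String> columns;

        Op(String kind, Set<String> columns) {
            this.kind = kind;
            this.columns = columns;
        }
    }

    /** Single pass over a plan ordered from load to store; returns a rewritten copy. */
    static List<Op> pushFilterAboveForeach(List<Op> plan) {
        List<Op> out = new ArrayList<>(plan);
        for (int i = 1; i < out.size(); i++) {
            Op upstream = out.get(i - 1);
            Op current = out.get(i);
            boolean safe = "filter".equals(current.kind)
                    && "foreach".equals(upstream.kind)
                    && upstream.columns.containsAll(current.columns);
            if (safe) {
                // Swap so the filter runs first and the foreach sees fewer rows.
                out.set(i - 1, current);
                out.set(i, upstream);
            }
        }
        return out;
    }
}
```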

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Reopened: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2010-02-16 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi reopened PIG-965:


  Assignee: (was: Ankit Modi)

I couldn't see poregex2.patch applied in the code.

automaton.jar is present in trunk, but the files modified or added by the above 
patch have not been modified or added.

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
> Attachments: automaton.jar, poregex2.patch
>
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
> string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-22 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Patch Available  (was: Open)

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-22 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Attachment: pig_1178_2.patch

Another patch with a few more LogicalExpressions and some more unit tests using 
the foreach operator.

It also has a rudimentary planPrinter to print the new logical plan.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-22 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Open  (was: Patch Available)

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-22 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Patch Available  (was: Open)

Resubmitting the patch due to core test failures.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-24 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837954#action_12837954
 ] 

Ankit Modi commented on PIG-1178:
-

The core tests are failing due to some issue with Hudson or the framework.

I ran the core tests again last night and they passed.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-12 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Patch Available  (was: Open)

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, 
> pig_1178_3.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven to not be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause for these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those proposals and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-12 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Attachment: pig_1178_3.patch

> LogicalPlan and Optimizer are too complex and hard to work with
> ---
>
> Key: PIG-1178
> URL: https://issues.apache.org/jira/browse/PIG-1178
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ying He
> Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, 
> pig_1178_3.patch
>
>



[jira] Created: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-14 Thread Ankit Modi (JIRA)
Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi


PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
{{LineRecordReader}}.

This can help in the following areas:
- Improving the performance of reading Tuples (lines) in {{PigStorage}}
- Any future improvements to line reading in Hadoop's {{LineRecordReader}} 
are automatically carried over to Pig

Issues that are handled by this patch:
- BZip uses internal buffers and positioning to determine the number of 
bytes read. Hence the buffering done by {{LineRecordReader}} has to be turned off.
- The current implementation of {{LocalSeekableInputStream}} does not implement 
the {{available}} method. This method has to be implemented.
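
To illustrate the second issue: java.io.InputStream.available() returns 0 unless a subclass overrides it, so a stream backed by a local file has to compute the remaining bytes itself. The following is only a minimal sketch under that assumption. SeekableFileStream is a hypothetical class, not Pig's LocalSeekableInputStream, and the patch may implement the method differently.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

// Hypothetical stand-in for a seekable local stream (not Pig's class).
final class SeekableFileStream extends InputStream {
    private final RandomAccessFile file;

    SeekableFileStream(File path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    // Moves the read position to an absolute byte offset.
    void seek(long position) throws IOException {
        file.seek(position);
    }

    @Override
    public int read() throws IOException {
        return file.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return file.read(buf, off, len);
    }

    // Bytes left between the current position and the end of the file,
    // clamped to the int range required by the InputStream contract.
    @Override
    public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        return (int) Math.max(0L, Math.min(remaining, Integer.MAX_VALUE));
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Without the override, a caller that checks available() before reading would always see 0 for this stream.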




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Patch Info:   (was: [Patch Available])

Performance improvement numbers obtained by running PigMix

||Script||svn Trunk||LineRecordReader Patch||
||L1|186|147|
||L2|73|33|
||L3|195|165|
||L4|116|76|
||L5|93|59|
||L6|102|63|
||L7|91|69|
||L8|84|44|
||L9|189|148|
||L10|285|268|
||L11|108|51|
||L12|112|73|
||Sum|1634|1196|
||% Improvement| ||26.81|
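
The Sum and % Improvement rows can be rechecked from the per-script numbers. The sketch below only redoes that arithmetic. The class and method names are hypothetical, and it assumes the improvement is measured as the reduction in total time relative to the trunk total.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the last two rows of the table above.
final class PigMixImprovement {
    // Values for scripts L1..L12, copied from the table.
    static final int[] TRUNK = {186, 73, 195, 116, 93, 102, 91, 84, 189, 285, 108, 112};
    static final int[] PATCH = {147, 33, 165, 76, 59, 63, 69, 44, 148, 268, 51, 73};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Reduction of the patched total relative to the trunk total, in percent.
    static double percentImprovement(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int trunk = sum(TRUNK);
        int patch = sum(PATCH);
        System.out.println(String.format(Locale.ROOT,
                "trunk=%d patch=%d improvement=%.2f%%",
                trunk, patch, percentImprovement(trunk, patch)));
    }
}
```

This gives totals of 1634 and 1196 and an improvement of 26.81%, which agrees with the table.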



> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Patch Info: [Patch Available]

Adding a patch file

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-15 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: pig_rlr.patch

This patch contains all the changes for the LineRecordReader improvement.

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: (was: pig_rlr.patch)

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Status: Open  (was: Patch Available)

This patch failed the release audit.

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: pig_rlr.patch

Added a new patch with the Apache license, based on SVN trunk revision 819662.

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Status: Patch Available  (was: Open)

This update adds three new warnings because it uses org.apache.hadoop.mapred 
classes, which have been deprecated.

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-29 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760822#action_12760822
 ] 

Ankit Modi commented on PIG-960:


Thanks for the comments, Daniel.

Answers:
1. PigLineRecordReader (PLRR) needs to know which type of InputStream it is 
handling: BZip2 or uncompressed. Depending on the type of input stream, it 
chooses which reader to use. BPIS (BufferedPositionedInputStream) stores 
the input stream as a protected member. PLRR can access it in one of three 
ways: making the member public, adding a get method, or inheriting from BPIS.
I implemented the last one, as it requires the fewest changes to BPIS.
2. Good one. This will be fixed in the next patch.
3. This will be added in the next patch.
4. This is corrected in the next patch.
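
The inheritance option from answer 1 can be sketched as follows. This is an illustration only: PositionedStream, FakeBzip2Stream and StreamTypeProbe are hypothetical stand-ins, not the classes in the patch, and the real check on the stream type may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical stand-in for BufferedPositionedInputStream (BPIS): the wrapped
// stream is a protected member and the class itself is left unchanged.
class PositionedStream {
    protected final InputStream in;

    PositionedStream(InputStream in) {
        this.in = in;
    }
}

// Hypothetical marker type standing in for a BZip2 input stream.
class FakeBzip2Stream extends FilterInputStream {
    FakeBzip2Stream(InputStream in) {
        super(in);
    }
}

// Third option from the list: inherit, so the protected member is visible
// without making it public or adding a getter to the parent.
final class StreamTypeProbe extends PositionedStream {
    StreamTypeProbe(InputStream in) {
        super(in);
    }

    // Chooses a reader kind from the runtime type of the wrapped stream.
    String readerKind() {
        return (in instanceof FakeBzip2Stream) ? "bzip2" : "uncompressed";
    }

    public static void main(String[] args) {
        InputStream plain = new ByteArrayInputStream(new byte[0]);
        System.out.println(new StreamTypeProbe(plain).readerKind());
        System.out.println(new StreamTypeProbe(new FakeBzip2Stream(plain)).readerKind());
    }
}
```

Running main prints "uncompressed" and then "bzip2". The parent class needs no edits, which is the trade-off described in answer 1.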

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-30 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: (was: pig_rlr.patch)

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-30 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: pig_rlr.patch

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-10-01 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: pig_rlr.patch

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-10-01 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: (was: pig_rlr.patch)

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-10-01 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761376#action_12761376
 ] 

Ankit Modi commented on PIG-960:


Added the latest patch, which makes PigLineRecordReader a wrapper only.

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
> Attachments: pig_rlr.patch
>
>



[jira] Assigned: (PIG-1036) Fragment-replicate left outer join

2009-10-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi reassigned PIG-1036:
---

Assignee: Ankit Modi

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Attachment: LeftOuterFRJoin.patch

This patch fails findBugs because I modified a line that already contained 
findBugs warnings.

It also fails ReleaseAudit for the HTML (doc) file for POFRJoin.

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Updated: (PIG-1036) Fragment-replicate left outer join

2009-10-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1036:


Status: Patch Available  (was: Open)

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>





[jira] Commented: (PIG-1036) Fragment-replicate left outer join

2009-10-28 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771175#action_12771175
 ] 

Ankit Modi commented on PIG-1036:
-

This patch fails findBugs because I modified lines (4 constructor lines) 
that already contained findBugs warnings.

> Fragment-replicate left outer join
> --
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
>
>

