[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Attachment: (was: LeftOuterFRJoin.patch)

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Attachment: LeftOuterFRJoin.patch

Updated against the new SVN trunk. The Findbugs warnings are resolved automatically by Olga's changes, and the ReleaseAudit warnings are gone as well.

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Status: Open (was: Patch Available)

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Status: Patch Available (was: Open)

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Attachment: LeftOuterFRJoin.patch

Attaching a new patch.

The join now supports only a two-way left join.
A schema must be present on the right side of the join; it is used to determine the number of null fields (columns) in nullTuple.
Because it is a two-way join, we use a single nullBag instead of an array of nullBags. A DataBag is used instead of a Tuple to keep the result type of ConstantExpression consistent.

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
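The padding rule described in this comment can be illustrated outside Pig. The following is only a minimal sketch in plain Java collections, assuming rows are lists joined on their first column; the class name `LeftOuterFRJoinSketch`, the method `join`, and the `rightArity` parameter are hypothetical and are not taken from LeftOuterFRJoin.patch. `rightArity` stands in for the right-side schema size, which is why a schema is needed: without it the width of the all-null row is unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, Pig-independent sketch of a two-way fragment-replicate left outer join. */
public class LeftOuterFRJoinSketch {

    /**
     * Joins every left row against an in-memory (replicated) right side on column 0.
     * rightArity plays the role of the right-side schema: it fixes how many nulls
     * pad a left row that has no match.
     */
    public static List<List<Object>> join(List<List<Object>> left,
                                          List<List<Object>> right,
                                          int rightArity) {
        // Build the replicated hash table once: key -> all right rows with that key.
        Map<Object, List<List<Object>>> replicated = new HashMap<>();
        for (List<Object> row : right) {
            Object key = row.get(0);
            if (key != null) { // null keys never match, so they are not indexed
                replicated.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
            }
        }

        // A single shared "null bag": one all-null row sized from the right schema.
        List<Object> nullTuple = Collections.nCopies(rightArity, null);
        List<List<Object>> nullBag = Collections.singletonList(nullTuple);

        List<List<Object>> out = new ArrayList<>();
        for (List<Object> l : left) {
            Object key = l.get(0);
            // Unmatched (or null-keyed) left rows fall back to the null bag.
            List<List<Object>> matches =
                    key == null ? nullBag : replicated.getOrDefault(key, nullBag);
            for (List<Object> r : matches) {
                List<Object> joined = new ArrayList<>(l);
                joined.addAll(r);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> left = Arrays.asList(
                Arrays.<Object>asList(1, "a"),
                Arrays.<Object>asList(2, "b"));
        List<List<Object>> right = Arrays.asList(
                Arrays.<Object>asList(1, "x", 10));
        System.out.println(join(left, right, 3));
    }
}
```

Using one shared null bag for every unmatched row mirrors the single-nullBag choice in the comment: with only two inputs there is exactly one side that can be missing.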
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Attachment: (was: LeftOuterFRJoin.patch)

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Status: Open (was: Patch Available)

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
[jira] Commented: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773149#action_12773149 ]

Ankit Modi commented on PIG-1036:

The patch also fixes two wrong error codes in {code}LogToPhyTranslationVisitor.updateWithEmptyBagCheck{code}

{code}
    int errCode = 1109; // was 1105
    String msg = "Input (" + joinInput.getAlias() + ") " +
        "on which outer join is desired should have a valid schema";
} catch (FrontendException e) {
    int errCode = 2104; // was 2014
{code}

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1036:

    Status: Patch Available (was: Open)

> Fragment-replicate left outer join
>
> Key: PIG-1036
> URL: https://issues.apache.org/jira/browse/PIG-1036
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Ankit Modi
> Attachments: LeftOuterFRJoin.patch
[jira] Assigned: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi reassigned PIG-965:

    Assignee: Ankit Modi

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
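The two optimizations proposed in the issue can be sketched in a few lines of standalone Java. This is an illustration only, not the PORegex code: the class `CachedMatcher` and its helper `isLiteral` are hypothetical names, the patterns are written with `.*` rather than the `%` used in the issue text, and the fast path assumes single-line input (without DOTALL, `.` does not match line terminators, so a substring test would disagree with the regex on multi-line strings).

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the two optimizations; not the PORegex code from any patch. */
public class CachedMatcher {
    private String lastRegex;   // pattern string seen on the previous regex-path call
    private Pattern compiled;   // compiled form of lastRegex

    /** Full-string match, in the style of the 'matches' operator. Assumes single-line input. */
    public boolean matches(String input, String regex) {
        // Optimization 2: literal prefix/suffix/substring patterns need no regex engine.
        String body = regex;
        boolean openStart = body.startsWith(".*");
        if (openStart) {
            body = body.substring(2);
        }
        boolean openEnd = body.endsWith(".*");
        if (openEnd) {
            body = body.substring(0, body.length() - 2);
        }
        if (isLiteral(body)) {
            if (openStart && openEnd) return input.contains(body);
            if (openStart) return input.endsWith(body);
            if (openEnd) return input.startsWith(body);
            return input.equals(body);
        }

        // Optimization 1: recompile only when the pattern string changes.
        if (!regex.equals(lastRegex)) {
            compiled = Pattern.compile(regex);
            lastRegex = regex;
        }
        return compiled.matcher(input).matches();
    }

    /** True if s holds only letters and digits, i.e. no regex metacharacters or escapes. */
    private static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CachedMatcher m = new CachedMatcher();
        System.out.println(m.matches("xxABCDyy", ".*ABCD.*"));   // string-comparison path
        System.out.println(m.matches("AAB", ".*[A-F]{2,3}.*"));  // regex path, compiled once
    }
}
```

The literal check is deliberately conservative: anything containing a backslash, bracket, or quantifier falls through to `java.util.regex`, so the fast path never changes the result for the patterns it accepts.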
[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1053:

    Attachment: hadoopLocal.patch

This patch fails releaseAudit because of two new HTML files.

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1053:

    Status: Patch Available (was: Open)

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779179#action_12779179 ]

Ankit Modi commented on PIG-1053:

This patch has an issue with custom comparators (ORDER BY) in local mode (MapReduce mode is not affected).

Details:
Pig uses a custom comparator by setting OutputKeyComparator to the custom comparator class and passing the jar path to the JVM when starting the task. In the new local mode no new JVM is started, so Hadoop does not have the custom comparator on its classpath and fails.

A workaround is to pass the jar path of the custom comparator in the "classpath" argument of the JVM running Pig.

e.g. CustomComparatorUse.pig

    register custom.jar
    A = load 'file';
    B = order A by * using custompackage.customclass; -- Here hadoop bails out giving ClassNotFoundException

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779181#action_12779181 ]

Ankit Modi commented on PIG-1053:

This patch has an issue with custom comparators (ORDER BY) in local mode (MapReduce mode is not affected).

Details:
Pig uses a custom comparator by setting OutputKeyComparator to the custom comparator class and passing the jar path to the JVM when starting the task. In the new local mode no new JVM is started, so Hadoop does not have the custom comparator on its classpath and fails.

A workaround is to pass the jar path of the custom comparator in the "classpath" argument of the JVM running Pig.

e.g.
{code:title=CustomComparatorUse.pig}
register custom.jar
A = load 'file';
B = order A by * using custompackage.customclass; -- Here hadoop bails out giving ClassNotFoundException
store B into 'file2'
{code}

JVM command:
{{java -cp pig.jar org.apache.pig.Main -x local CustomComparatorUse.pig # This does not work}}

Use this instead, with custom.jar added to the classpath:
{{java -cp pig.jar:custom.jar org.apache.pig.Main -x local CustomComparatorUse.pig}}

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779252#action_12779252 ]

Ankit Modi commented on PIG-1053:

In local mode, the PhysicalPlan had a POCounter operator before every POStore. This operator was used to collect stats. Now that local mode runs on Hadoop, the operator is no longer used, so the plan size changed, and the numbers changed with it.

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1053:

    Attachment: hadoopLocal.patch

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1053:

    Attachment: (was: hadoopLocal.patch)

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1053:

    Status: Patch Available (was: Open)

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Updated: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1053:

    Status: Open (was: Patch Available)

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
[jira] Created: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data
PigLineRecordReader bails out on an empty line for compressed data

Key: PIG-1107
URL: https://issues.apache.org/jira/browse/PIG-1107
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankit Modi
Assignee: Ankit Modi
Fix For: 0.6.0

PigLineRecordReader bails out with an exception when it encounters an empty line in a compressed file:

java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
    at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
    at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
    at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
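The trace does not show the offending statement, but an index of -1 is what results from reading the last byte of a zero-length line, for example while stripping a trailing carriage return. The sketch below illustrates that class of bug and its guard. It is an assumption about the cause, not the code from PigLineRecordReader or from the attached patch, and `stripCarriageReturn` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of guarding a "last byte" read against empty lines. */
public class EmptyLineGuard {

    /** Returns the first length bytes of line without a trailing '\r'; safe for empty lines. */
    public static byte[] stripCarriageReturn(byte[] line, int length) {
        // Without the length > 0 check, line[length - 1] would read index -1 on an
        // empty line and throw ArrayIndexOutOfBoundsException.
        if (length > 0 && line[length - 1] == '\r') {
            length--;
        }
        return Arrays.copyOf(line, length);
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        System.out.println(stripCarriageReturn(empty, 0).length);

        byte[] crlf = "abc\r".getBytes(StandardCharsets.US_ASCII);
        System.out.println(
                new String(stripCarriageReturn(crlf, crlf.length), StandardCharsets.US_ASCII));
    }
}
```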
[jira] Updated: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data
[ https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1107:

    Status: Patch Available (was: Open)

> PigLineRecordReader bails out on an empty line for compressed data
>
> Key: PIG-1107
> URL: https://issues.apache.org/jira/browse/PIG-1107
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Ankit Modi
> Assignee: Ankit Modi
> Fix For: 0.6.0
> Attachments: pig_piglinerecordreader_bug.patch
>
> PigLineRecordReader bails out with an exception when it encounters an empty line in a compressed file
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
>     at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
>     at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
>     at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
[jira] Updated: (PIG-1107) PigLineRecordReader bails out on an empty line for compressed data
[ https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-1107:

    Attachment: pig_piglinerecordreader_bug.patch

Submitting a small patch. It includes 2 new unit tests for the fix.

> PigLineRecordReader bails out on an empty line for compressed data
>
> Key: PIG-1107
> URL: https://issues.apache.org/jira/browse/PIG-1107
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Ankit Modi
> Assignee: Ankit Modi
> Fix For: 0.6.0
> Attachments: pig_piglinerecordreader_bug.patch
>
> PigLineRecordReader bails out with an exception when it encounters an empty line in a compressed file
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.pig.impl.io.PigLineRecordReader$LineReader.getNext(PigLineRecordReader.java:136)
>     at org.apache.pig.impl.io.PigLineRecordReader.next(PigLineRecordReader.java:57)
>     at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:121)
>     at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:164)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:140)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784596#action_12784596 ]

Ankit Modi commented on PIG-965:

I implemented one patch with optimizations 1 and 2 mentioned above, and another patch with optimizations 1 and 2 plus dk.brics.automaton.
dk.brics.automaton does not support all features of java.util.regex, so the second patch checks for that and switches to java.util.regex when the regex can only be handled by java.util.regex.

Here are the numbers:

||Regex|| svn_trunk ||Optimization 1 and 2|| dk.brics.automaton|| comments ||
| .\*ABCD.\* | 92.74 | 50.92 | 49.32 | Here only optimization 2 is used |
| .\*[A-F]{2,3}.\* | 152.3 | 133.48 | 105.93 | dk.brics.automaton is used |
| A.B.C.D | 54.492 | 44.46 | 44.66 | dk.brics.automaton is used |
| .\*([A-F]{4})\w\*\1.\* | 129.29 | 112.89 | 109.43 | java.util.regex used in all cases |
| .\*\[A-F\]\{4\}\w\*[N-Z]\{3\}.\* | 129.63 | 108.11 | 54.42 | dk.brics.automaton used |

These results were obtained in local mode on 1 billion lines of data in the following format:
f1: Chararray(100) of random chars from [A-Z]
f2: int, a random integer

dk.brics.automaton performs well on complex regexes.

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
[jira] Created: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.
In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.

Key: PIG-1130
URL: https://issues.apache.org/jira/browse/PIG-1130
Project: Pig
Issue Type: Bug
Reporter: Ankit Modi
Priority: Minor

If the output consists of more than one part file, the current code reports stats only for the first part file, i.e. part-0.
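One way to picture the expected behavior is to aggregate over every part file rather than only the first. The sketch below does this for byte counts on the local file system with `java.nio.file`; it is only an illustration under that assumption and does not use Pig's stats code or the Hadoop FileSystem API. The class name `PartFileStats` and the `part-*` glob are hypothetical choices.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: sum output bytes over all part files, not only the first. */
public class PartFileStats {

    /** Returns the combined size in bytes of every file in outputDir named part-*. */
    public static long totalBytes(Path outputDir) throws IOException {
        long total = 0;
        // The glob skips non-data entries such as log directories.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                total += Files.size(part);
            }
        }
        return total;
    }
}
```

A tuple count could be aggregated the same way, by summing a per-file record count inside the loop.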
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-965:

    Attachment: poregex2.patch
                poregex.patch

These are patches for two implementations.

The first (poregex.patch) applies the optimizations mentioned above in the JIRA.
The second (poregex2.patch) applies optimization 1 and uses dk.brics.automaton for running simple regular expressions. Otherwise it falls back to java.util.regex.

In the first, the choice between optimization 2 and java.util.regex is made by the getSimpleString method.
In the second, the decision to use dk.brics.automaton is made by determineBestRegexMethod. (The changes to build.xml in this patch are temporary.)

Both patches use RegexInit as the initial implementation. It makes the decision (by calling the decision functions mentioned above) and then sets the implementation to the one chosen by the decision function.

In the second patch, the decision function was written by looking at which operators dk.brics.automaton supports and at its grammar. I tried out the character classes that are and are not supported by dk.brics.automaton and based the decision on that. I could not find any specific page describing the differences between the regex languages of java.util.regex and dk.brics.automaton.

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
> Attachments: poregex.patch, poregex2.patch
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
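The RegexInit arrangement described in this comment, an initial implementation that decides once and then replaces itself, might look roughly like the sketch below. It borrows only the name RegexInit from the comment. Everything else is hypothetical: the class `RegexDispatch`, the interface `RegexImpl`, and the rule inside `chooseImpl`, which here merely separates plain literals from everything else and does not reproduce getSimpleString or determineBestRegexMethod. dk.brics.automaton is left out so the example runs on the JDK alone, and the pattern is assumed to be constant for the lifetime of the operator.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of a self-replacing "init" strategy for a constant pattern. */
public class RegexDispatch {

    /** Common interface for all matching strategies. */
    interface RegexImpl {
        boolean match(String input);
    }

    private final String regex;
    private RegexImpl impl = new RegexInit(); // starts in the undecided state

    public RegexDispatch(String regex) {
        this.regex = regex;
    }

    /** Initial state: picks the real implementation on first use, then delegates to it. */
    private final class RegexInit implements RegexImpl {
        @Override
        public boolean match(String input) {
            impl = chooseImpl(regex);
            return impl.match(input);
        }
    }

    /** Stand-in decision function; the rules used by the actual patches are not shown here. */
    static RegexImpl chooseImpl(String regex) {
        for (int i = 0; i < regex.length(); i++) {
            if (!Character.isLetterOrDigit(regex.charAt(i))) {
                // Anything with metacharacters goes to java.util.regex, compiled once.
                Pattern compiled = Pattern.compile(regex);
                return input -> compiled.matcher(input).matches();
            }
        }
        // Plain literal: a string comparison is enough.
        return input -> input.equals(regex);
    }

    public boolean matches(String input) {
        return impl.match(input);
    }

    /** True once the init state has been replaced by a concrete implementation. */
    public boolean isInitialized() {
        return !(impl instanceof RegexInit);
    }

    public static void main(String[] args) {
        RegexDispatch d = new RegexDispatch("A.B.C.D");
        System.out.println(d.matches("AxByCzD"));
        System.out.println(d.isInitialized());
    }
}
```

The appeal of this shape is that the decision cost is paid on the first tuple only; every later call goes straight to the chosen implementation without re-checking the pattern.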
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-965:

    Attachment: poregex2.patch

Attaching one more patch file. This one has all the changes except the changes to build.xml. Still trying to find a Maven repo for dk.brics.automaton.

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
> Attachments: poregex.patch, poregex2.patch, poregex2.patch
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-965:

    Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-965:

    Attachment: (was: poregex2.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-965:

    Attachment: (was: poregex.patch)

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankit Modi updated PIG-965:

    Status: Patch Available (was: Open)

> PERFORMANCE: optimize common case in matches (PORegex)
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Ankit Modi
> Attachments: automaton.jar, poregex2.patch
>
> Some frequently seen use cases of the 'matches' comparison operator have the following properties -
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'"
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 'abc%', '%abc', '%abc%'
> To optimize for these common cases, PORegex.java can be changed to -
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive like clause uses similar optimizations.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: automaton.jar poregex2.patch New patch with comments removed and automaton.jar added from http://www.brics.dk/~amoeller/automaton/automaton.jar. It fails FindBugs because of missing symbols. I ran FindBugs after adding the jar to the build, and it reported no warnings in the modified and added files. > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) One small change to JarManager.java is missing. Will add a new patch with it. > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1106) FR join should not spill
[ https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1106: Attachment: frjoin-nonspill.patch This patch does not have any tests. A test would require creating and processing a large file of about 250 MB. I have run some tests in a similar fashion. > FR join should not spill > > > Key: PIG-1106 > URL: https://issues.apache.org/jira/browse/PIG-1106 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: frjoin-nonspill.patch > > > Currently, the values for the replicated side of the data are placed in a > spillable bag (POFRJoin near line 275). This does not make sense because the > whole point of the optimization is that the data on one side fits into > memory. We already have a non-spillable bag implemented > (NonSpillableDataBag.java) and we need to change FRJoin code to use it. And > of course need to do lots of testing to make sure that we don't spill but die > instead when we run out of memory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
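As background for the issue description, the general shape of a fragment-replicate left outer join can be sketched as follows. This is a simplified illustration, not POFRJoin: the class, method names and the two-column String[] row layout are assumptions. The replicated input is loaded into an in-memory hash table that is never spilled, so an oversized input fails with OutOfMemoryError instead of degrading silently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the technique; rows are {key, value} pairs.
public class FrJoinSketch {

    // Loads the whole replicated input into a heap-resident hash table.
    // Nothing here spills to disk: if the input does not fit, the JVM runs out of memory.
    static Map<String, List<String>> buildReplicated(List<String[]> replicated) {
        Map<String, List<String>> table = new HashMap<>();
        for (String[] row : replicated) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return table;
    }

    // Streams the fragmented input and probes the table for each row.
    // Left outer semantics: an unmatched row is emitted once with null right-side fields.
    static List<String[]> leftOuterJoin(List<String[]> fragment,
                                        Map<String, List<String>> table) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : fragment) {
            List<String> matches = table.get(row[0]);
            if (matches == null) {
                out.add(new String[] {row[0], row[1], null, null});
            } else {
                for (String rightValue : matches) {
                    out.add(new String[] {row[0], row[1], row[0], rightValue});
                }
            }
        }
        return out;
    }
}
```

Only the replicated side is materialized; the fragmented side is consumed one row at a time, which is why the size of the replicated input alone decides whether the job fits in memory.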
[jira] Updated: (PIG-1106) FR join should not spill
[ https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1106: Status: Patch Available (was: Open) This patch does not have any unit tests. > FR join should not spill > > > Key: PIG-1106 > URL: https://issues.apache.org/jira/browse/PIG-1106 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: frjoin-nonspill.patch > > > Currently, the values for the replicated side of the data are placed in a > spillable bag (POFRJoin near line 275). This does not make sense because the > whole point of the optimization is that the data on one side fits into > memory. We already have a non-spillable bag implemented > (NonSpillableDataBag.java) and we need to change FRJoin code to use it. And > of course need to do lots of testing to make sure that we don't spill but die > instead when we run out of memory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1106) FR join should not spill
[ https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789294#action_12789294 ] Ankit Modi commented on PIG-1106: -
The tests I ran used two files with the same format: f1: random chararray(100), f2: random int. The left-side file contained 100 tuples and the right-side file contained 3 million tuples.
Code
{noformat}
A = load 'leftsidefrjoin.txt' as ( key, value);
B = load 'rightsidefrjoin.txt' as (key, value);
C = join A by key left, B by key using "repl"; --- Fragmented input and replicated input
store C into 'output';
{noformat}
This generated the following error:
{noformat}
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.ArrayList.<init>(ArrayList.java:112)
at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:63)
at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:369)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:288)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:351)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:211)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:250)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:241)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
{noformat}
I ran the same job with the same records on the left-hand side and 100K records on the right-hand side. The job completed successfully.
> FR join should not spill > > > Key: PIG-1106 > URL: https://issues.apache.org/jira/browse/PIG-1106 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: frjoin-nonspill.patch > > > Currently, the values for the replicated side of the data are placed in a > spillable bag (POFRJoin near line 275). This does not make sense because the > whole point of the optimization is that the data on one side fits into > memory. We already have a non-spillable bag implemented > (NonSpillableDataBag.java) and we need to change FRJoin code to use it. And > of course need to do lots of testing to make sure that we don't spill but die > instead when we run out of memory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
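Input files in the format described in the comment (a random 100-character chararray and a random int per tuple) could be produced with something like the generator below. It is only a sketch: the class name, the tab delimiter (the PigStorage default), the lowercase alphabet and the fixed seed are assumptions, and the comment does not say how the original files were created. With fully random keys the two sides will almost never share a key, so a real test would probably reuse some keys across the files.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Random;

// Hypothetical test-data generator; not part of the attached patch.
public class FrJoinTestData {

    // Writes `count` lines of the form <100 random letters> TAB <random int>.
    static void writeTuples(Writer out, int count, long seed) throws IOException {
        Random rnd = new Random(seed);
        StringBuilder key = new StringBuilder(100);
        for (int i = 0; i < count; i++) {
            key.setLength(0);
            for (int c = 0; c < 100; c++) {
                key.append((char) ('a' + rnd.nextInt(26)));
            }
            out.write(key.toString());
            out.write('\t');
            out.write(Integer.toString(rnd.nextInt()));
            out.write('\n');
        }
    }

    // Convenience wrapper for small samples held in memory.
    static String sample(int count, long seed) {
        StringWriter sw = new StringWriter();
        try {
            writeTuples(sw, count, seed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }
}
```

For the sizes mentioned in the comment, writeTuples would be called with a buffered file writer and counts of 100, 100,000 or 3,000,000.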
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: automaton.jar) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: automaton.jar poregex2.patch > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) I have included changes suggested by Thejas. > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790545#action_12790545 ] Ankit Modi commented on PIG-965:
* NonConstantRegex - I did not think of equals. I added a length check first because it detects a change in length faster and, to the best of my knowledge, it is a simple getter method. And yes, as you mentioned, equals also checks for the same object and instanceof, which is not useful in our case.
* The numbers published above use dk.brics.automaton.RunAutomaton. Do you want me to publish numbers for a larger set of regexes?
I'll create a patch for the rest of the comments.
> PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
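The non-constant case discussed in the first bullet (recompile only when the pattern string changes, with a cheap length comparison before the full string comparison) could be sketched like this. The class and field names are hypothetical and this is not the NonConstantRegex code from the patch; compileCount exists only so the caching behaviour can be observed.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of reusing a compiled pattern across calls.
public class CachedRegexSketch {
    private String lastRegex;
    private Pattern lastPattern;
    int compileCount; // for demonstration only

    // Whole-string match; the pattern is recompiled only when the regex text changes.
    boolean matches(String input, String regex) {
        boolean changed = lastPattern == null
                || regex.length() != lastRegex.length() // cheap check first
                || !regex.equals(lastRegex);            // full comparison only if lengths agree
        if (changed) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        return lastPattern.matcher(input).matches();
    }
}
```

An instance keeps mutable state, so a sketch like this is only safe when each instance is used from a single thread.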
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791096#action_12791096 ] Ankit Modi commented on PIG-965:
Here are numbers comparing optimizations 1 & 2 against optimization 1 & dk.brics. dk.brics.RunAutomaton is as fast as optimization 2 and also provides similar speeds for a set of additional expressions.
|| Query || svn_trunk || std_dev || Optimization 1 & 2 || std_dev || Optimization 1 & brics.RunAutomaton || std_dev ||
| .\*ABCD.\* | 33.87 | 0.71 | 18.77 | 0.71 | 18.94 | 0.02 |
| .\*ABCD | 30.06 | 2.91 | 18.44 | 0.05 | 18.94 | 0.03 |
| ABCD.\* | 21.93 | 2.91 | 18.35 | 0.1 | 18.85 | 0.04 |
Values are averaged over 3 runs.
> PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
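The relative improvement implied by the figures in this comment can be worked out as the ratio of the svn_trunk time to each optimized time. The snippet below is a small illustrative calculation using only the averages from the comment; the class name is hypothetical, and the comment does not state the unit of the measurements, so only the ratios are meaningful.

```java
// Hypothetical helper that derives speedup ratios from the reported averages.
public class SpeedupSketch {

    // Ratio of baseline time to optimized time; values above 1 mean the optimized run was faster.
    static double speedup(double baseline, double optimized) {
        return baseline / optimized;
    }

    public static void main(String[] args) {
        String[] queries = {".*ABCD.*", ".*ABCD", "ABCD.*"};
        double[] trunk = {33.87, 30.06, 21.93};  // svn_trunk
        double[] opt12 = {18.77, 18.44, 18.35};  // Optimization 1 & 2
        double[] brics = {18.94, 18.94, 18.85};  // Optimization 1 & brics.RunAutomaton
        for (int i = 0; i < queries.length; i++) {
            System.out.printf("%-10s opt1&2: %.2fx  opt1&brics: %.2fx%n",
                    queries[i],
                    speedup(trunk[i], opt12[i]),
                    speedup(trunk[i], brics[i]));
        }
    }
}
```

The averages cover only 3 runs each, and the svn_trunk standard deviations are comparatively large, so small differences between the two optimized columns should be read with caution.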
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Assignee: Benjamin Francisoud (was: Ankit Modi) Status: Patch Available (was: Open) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Benjamin Francisoud > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch Rewrote some logic in case 1 and 3 of determineBestRegex. Also found a bug in case1 so updated that. Added Thejas's recommendation. Also added a few unit test patterns. > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi reassigned PIG-965: -- Assignee: Ankit Modi (was: Benjamin Francisoud) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch > > > Some frequently seen use cases of 'matches' comparison operator have follow > properties - > 1. The rhs is a constant string . eg "c1 matches 'abc%' " > 2. Regexes such that look for matching prefix , suffix etc are very common. > eg - "abc%', "%abc", '%abc%' > To optimize for these common cases , PORegex.java can be changed to - > 1. Compile the pattern (rhs of matches) re-use it if the pattern string has > not changed. > 2. Use string comparisons for simple common regexes (in 2 above). > The implementation of Hive like clause uses similar optimizations. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: (was: poregex2.patch) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Attachment: poregex2.patch > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Open (was: Patch Available) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch
[jira] Updated: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-965: --- Status: Patch Available (was: Open) > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair >Assignee: Ankit Modi > Attachments: automaton.jar, poregex2.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Attachment: pig_1178.patch Attaching another patch with end-to-end functionality for load, filter, join, store and a few other expression operators. This patch is self-sufficient and can be applied directly to SVN trunk. > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, PIG_1178.patch > > > The current implementation of the logical plan and the logical optimizer in > Pig has proven to not be easily extensible. Developer feedback has indicated > that adding new rules to the optimizer is quite burdensome. In addition, the > logical plan has been an area of numerous bugs, many of which have been > difficult to fix. Developers also feel that the logical plan is difficult to > understand and maintain. The root cause for these issues is that a number of > design decisions that were made as part of the 0.2 rewrite of the front end > have now proven to be sub-optimal. The heart of this proposal is to revisit a > number of those proposals and rebuild the logical plan with a simpler design > that will make it much easier to maintain the logical plan as well as extend > the logical optimizer. > See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full > details.
[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath
[ https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828822#action_12828822 ] Ankit Modi commented on PIG-1154: - It looks like the problem is caused by the mapred.system.dir value from mapred-default.xml being overwritten, and the path mentioned above, "/mapredsystem/hadoop/mapredsystem/", may not exist. This cannot be solved in local mode, as it is not possible to change the classpath at runtime. I'll provide a patch that will * Provide a warning whenever the classpath contains mapred-site.xml or hdfs-site.xml. * Exit pig with an error message if the above case is encountered. > local mode fails when hadoop config directory is specified in classpath > --- > > Key: PIG-1154 > URL: https://issues.apache.org/jira/browse/PIG-1154 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Thejas M Nair >Assignee: Ankit Modi > Fix For: 0.7.0 > > > In local mode, the hadoop configuration should not be taken from the > classpath.
[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath
[ https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828831#action_12828831 ] Ankit Modi commented on PIG-1154: - It will provide a warning whenever the files are encountered in local mode. On top of that, it will exit with an error if mapred.system.dir is different from the default one and does not exist. > local mode fails when hadoop config directory is specified in classpath > --- > > Key: PIG-1154 > URL: https://issues.apache.org/jira/browse/PIG-1154 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Thejas M Nair >Assignee: Ankit Modi > Fix For: 0.7.0
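The behaviour described in the two comments above might look roughly like the sketch below. It is not taken from pig_1154.patch: `LocalModeConfigCheck`, `findSiteFiles` and `shouldAbort` are hypothetical names, and the default value of mapred.system.dir is passed in as a parameter rather than assumed.

```java
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the checks described in the comments; not the actual patch.
class LocalModeConfigCheck {
    static final String[] SITE_FILES = {"mapred-site.xml", "hdfs-site.xml"};

    // Returns a description of every Hadoop site file visible through the given class loader.
    static List<String> findSiteFiles(ClassLoader loader) {
        List<String> found = new ArrayList<>();
        for (String name : SITE_FILES) {
            URL url = loader.getResource(name);
            if (url != null) found.add(name + " at " + url);
        }
        return found;
    }

    // Abort only if the configured directory differs from the default and does not exist.
    static boolean shouldAbort(String configuredDir, String defaultDir) {
        return !configuredDir.equals(defaultDir) && !new File(configuredDir).exists();
    }

    public static void main(String[] args) {
        for (String hit : findSiteFiles(ClassLoader.getSystemClassLoader())) {
            System.err.println("WARN: local mode found " + hit + " on the classpath");
        }
        // Both paths here are placeholders chosen for the demonstration.
        if (shouldAbort("/no/such/dir/pig-1154-sketch", "/placeholder/default")) {
            System.err.println("ERROR: mapred.system.dir was overridden and does not exist");
        }
    }
}
```

Looking the files up through the class loader mirrors how they would be picked up from the classpath, so the warning fires exactly when they could affect the configuration.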
[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath
[ https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1154: Attachment: pig_1154.patch Patch according to comments mentioned above. > local mode fails when hadoop config directory is specified in classpath > --- > > Key: PIG-1154 > URL: https://issues.apache.org/jira/browse/PIG-1154 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Thejas M Nair >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: pig_1154.patch
[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath
[ https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1154: Status: Patch Available (was: Open) This patch affects only local mode in Pig. > local mode fails when hadoop config directory is specified in classpath > --- > > Key: PIG-1154 > URL: https://issues.apache.org/jira/browse/PIG-1154 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Thejas M Nair >Assignee: Ankit Modi > Fix For: 0.7.0 > > Attachments: pig_1154.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Status: Open (was: Patch Available) I found a bug in the code so I'll be releasing another patch for the same. I'll keep this patch in the JIRA until I replace it with a new one so everyone can review it. > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, PIG_1178.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Attachment: pig_1178.patch This is a new patch that can be applied to SVN trunk. It includes the ForEach, InnerLoad, and Generate operators along with some LogicalExpressions. It also includes a new optimizer rule for pushing a filter above a foreach (FilterAboveForeach). > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch
[jira] Reopened: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)
[ https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi reopened PIG-965: Assignee: (was: Ankit Modi) I could not find poregex2.patch applied in the code. automaton.jar is present in the trunk, but the files that the patch modifies or adds have not been modified or added. > PERFORMANCE: optimize common case in matches (PORegex) > -- > > Key: PIG-965 > URL: https://issues.apache.org/jira/browse/PIG-965 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Thejas M Nair > Attachments: automaton.jar, poregex2.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Status: Patch Available (was: Open) > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Attachment: pig_1178_2.patch Another patch with a few more LogicalExpressions and some more unit tests using the foreach operator. It also has a rudimentary planPrinter to print the new logical plan. > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Status: Open (was: Patch Available) > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Status: Patch Available (was: Open) Resubmitting patch again due to core test failures > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837954#action_12837954 ] Ankit Modi commented on PIG-1178: - The core tests are failing due to some issue with Hudson or the framework. I ran the core tests again last night and they passed. > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Status: Patch Available (was: Open) > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, > pig_1178_3.patch
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1178: Attachment: pig_1178_3.patch > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, > pig_1178_3.patch
[jira] Created: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage --- Key: PIG-960 URL: https://issues.apache.org/jira/browse/PIG-960 Project: Pig Issue Type: Improvement Components: impl Reporter: Ankit Modi PigStorage's reading of Tuples (lines) can be optimized using Hadoop's {{LineRecordReader}}. This can help in the following areas: - Improving the performance of reading Tuples (lines) in {{PigStorage}} - Any future improvements to line reading in Hadoop's {{LineRecordReader}} are automatically carried over to Pig Issues that are handled by this patch: - BZip uses internal buffers and positioning to determine the number of bytes read. Hence the buffering done by {{LineRecordReader}} has to be turned off - The current implementation of {{LocalSeekableInputStream}} does not implement the {{available}} method. This method has to be implemented.
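As an illustration of the last point, one plausible way to implement {{available}} on a seekable file-backed stream is to report the bytes remaining between the current position and the end of the file. The class below is a hypothetical stand-in built on `java.io.RandomAccessFile`; it is not Pig's {{LocalSeekableInputStream}} and makes no claim about how the patch implements the method.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical stand-in for a seekable local stream; not Pig's LocalSeekableInputStream.
class SeekableFileStream extends InputStream {
    private final RandomAccessFile file;

    SeekableFileStream(RandomAccessFile file) { this.file = file; }

    @Override public int read() throws IOException { return file.read(); }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        return file.read(b, off, len);
    }

    void seek(long pos) throws IOException { file.seek(pos); }

    // Bytes left between the current position and end of file, clamped to the int range.
    @Override public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        return (int) Math.max(0, Math.min(remaining, Integer.MAX_VALUE));
    }

    @Override public void close() throws IOException { file.close(); }

    // Small demonstration on a 10-byte temporary file; returns three available() readings.
    static int[] demo() {
        try {
            File tmp = File.createTempFile("seekable-sketch", ".bin");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), new byte[10]);
            try (SeekableFileStream in = new SeekableFileStream(new RandomAccessFile(tmp, "r"))) {
                int atStart = in.available();
                in.read();
                int afterOneByte = in.available();
                in.seek(10);
                int atEnd = in.available();
                return new int[] {atStart, afterOneByte, atEnd};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);   // 10 9 0
    }
}
```

The default `InputStream.available()` returns 0, which is why a reader that relies on it needs the method to be overridden.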
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Patch Info: (was: [Patch Available]) Performance improvement numbers obtained by running PigMix
||Script||svn Trunk||LineRecordReader Patch||
||L1|186|147|
||L2|73|33|
||L3|195|165|
||L4|116|76|
||L5|93|59|
||L6|102|63|
||L7|91|69|
||L8|84|44|
||L9|189|148|
||L10|285|268|
||L11|108|51|
||L12|112|73|
||Sum|1634|1196|
||% Improvement| ||26.81|
> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi
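For reference, the Sum and % Improvement rows can be reproduced from the per-script numbers in the PigMix table. The small check below is only a worked calculation; the class name `PigMixImprovement` is hypothetical, and the improvement is taken as (trunk total - patch total) / trunk total.

```java
import java.util.Locale;

// Recomputes the totals and the percentage improvement from the PigMix table.
class PigMixImprovement {
    static long sum(int[] values) {
        long total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Relative reduction of the total, in percent.
    static double percentImprovement(int[] before, int[] after) {
        long b = sum(before);
        long a = sum(after);
        return 100.0 * (b - a) / b;
    }

    public static void main(String[] args) {
        int[] trunk = {186, 73, 195, 116, 93, 102, 91, 84, 189, 285, 108, 112};
        int[] patch = {147, 33, 165, 76, 59, 63, 69, 44, 148, 268, 51, 73};
        System.out.println(sum(trunk) + " " + sum(patch));   // 1634 1196
        System.out.println(String.format(Locale.ROOT, "%.2f", percentImprovement(trunk, patch)));   // 26.81
    }
}
```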
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Patch Info: [Patch Available] Adding a patch file > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: pig_rlr.patch This is a patch of all the changes for improvement done with LineRecordReader > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: (was: pig_rlr.patch) > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Status: Open (was: Patch Available) This patch failed the release audit. > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: pig_rlr.patch Added a new patch with the Apache license added, based on SVN trunk revision 819662. > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Status: Patch Available (was: Open) This update adds three new warnings because it uses org.apache.mapred classes, which have been deprecated. > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760822#action_12760822 ] Ankit Modi commented on PIG-960: Thanks for the comments, Daniel. Answers: 1. PigLineRecordReader (PLRR) needs to know which type of InputStream it is handling: BZip2 or uncompressed. Depending on the type of input stream, it chooses which reader to use. BPIS (BufferedPositionedInputStream) stores the input stream as a protected member. PLRR can access it in one of the following ways: - making the member public, - adding a get method to access it, or - inheriting from BPIS. I implemented the last one as it requires the fewest changes to BPIS. 2. Good one. Will be fixed in the next patch. 3. Will be added in the next patch. 4. Corrected in the next patch. > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
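The first answer in the comment above weighs three ways of reaching a protected member and settles on inheritance. A rough sketch of that option follows. All class names here are hypothetical stand-ins (PositionedStreamSketch for BPIS, LineReaderSketch for PLRR, Bzip2StreamSketch for a BZip2-backed stream), and the type check is only an assumption about how the stream kind might be detected; the actual patch may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical stand-in for BufferedPositionedInputStream (BPIS): it keeps
// the wrapped stream in a protected field, as the comment describes.
class PositionedStreamSketch {
    protected final InputStream in;

    PositionedStreamSketch(InputStream in) {
        this.in = in;
    }
}

// Hypothetical marker type for a BZip2-backed stream (assumption for this sketch).
class Bzip2StreamSketch extends ByteArrayInputStream {
    Bzip2StreamSketch(byte[] data) {
        super(data);
    }
}

// Subclassing gives access to the protected field without changing the parent.
class LineReaderSketch extends PositionedStreamSketch {
    LineReaderSketch(InputStream in) {
        super(in);
    }

    // Picks a reader strategy from the runtime type of the wrapped stream.
    String readerKind() {
        return (in instanceof Bzip2StreamSketch) ? "bzip2" : "uncompressed";
    }
}
```

The other two options (a public field or a getter) would both require edits to the parent class, which is the cost the comment says inheritance avoids.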
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: (was: pig_rlr.patch) > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: pig_rlr.patch > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: pig_rlr.patch > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: (was: pig_rlr.patch) > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761376#action_12761376 ] Ankit Modi commented on PIG-960: Added the latest patch, which makes PigLineRecordReader a wrapper only. > Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage > --- > > Key: PIG-960 > URL: https://issues.apache.org/jira/browse/PIG-960 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Ankit Modi > Attachments: pig_rlr.patch > > > PigStorage's reading of Tuples ( lines ) can be optimized using Hadoop's > {{LineRecordReader}}. > This can help in following areas > - Improving performance reading of Tuples (lines) in {{PigStorage}} > - Any future improvements in line reading done in Hadoop's > {{LineRecordReader}} is automatically carried over to Pig > Issues that are handled by this patch > - BZip uses internal buffers and positioning for determining the number of > bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off > - Current implementation of {{LocalSeekableInputStream}} does not implement > {{available}} method. This method has to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi reassigned PIG-1036: --- Assignee: Ankit Modi > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Attachment: LeftOuterFRJoin.patch This patch fails FindBugs because I modified a line that already contained FindBugs warnings. It also fails ReleaseAudit because of the HTML (doc) file for POFRJoin. > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-1036: Status: Patch Available (was: Open) > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1036) Fragment-replicate left outer join
[ https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771175#action_12771175 ] Ankit Modi commented on PIG-1036: - This patch fails FindBugs because I modified ***lines (4 lines of constructors)*** that already contained FindBugs warnings. > Fragment-replicate left outer join > -- > > Key: PIG-1036 > URL: https://issues.apache.org/jira/browse/PIG-1036 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich >Assignee: Ankit Modi > Attachments: LeftOuterFRJoin.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.