Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
---
Key: PIG-960
URL: https://issues.apache.org/jira/browse/PIG-960
Project: Pig
Issue Type:
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-960:
---
Patch Info: (was: [Patch Available])
Performance improvement numbers obtained by running PigMix
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-960:
---
Patch Info: [Patch Available]
Adding a patch file
Using Hadoop's optimized LineRecordReader for reading
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-960:
---
Status: Open (was: Patch Available)
This patch failed in release audit
Using Hadoop's optimized
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-960:
---
Attachment: pig_rlr.patch
Added a new patch with the Apache license, against SVN trunk revision 819662
Using Hadoop's
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760822#action_12760822
]
Ankit Modi commented on PIG-960:
Thanks for the comments, Daniel.
Answers:
1. PigLineRecordReader
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-960:
---
Attachment: (was: pig_rlr.patch)
Using Hadoop's optimized LineRecordReader for reading Tuples in
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-960:
---
Attachment: pig_rlr.patch
Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-960:
---
Attachment: (was: pig_rlr.patch)
Using Hadoop's optimized LineRecordReader for reading Tuples in
[
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761376#action_12761376
]
Ankit Modi commented on PIG-960:
Added the latest patch making PigLineRecordReader a wrapper
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi reassigned PIG-1036:
---
Assignee: Ankit Modi
Fragment-replicate left outer join
--
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1036:
Attachment: LeftOuterFRJoin.patch
This patch fails the FindBugs check because I had modified the line that contained
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1036:
Status: Open (was: Patch Available)
Fragment-replicate left outer join
--
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1036:
Status: Patch Available (was: Open)
Fragment-replicate left outer join
--
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1036:
Attachment: LeftOuterFRJoin.patch
Attaching a new patch.
The join now supports only two-way left joins.
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1036:
Attachment: (was: LeftOuterFRJoin.patch)
Fragment-replicate left outer join
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1036:
Status: Open (was: Patch Available)
Fragment-replicate left outer join
--
[
https://issues.apache.org/jira/browse/PIG-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1036:
Status: Patch Available (was: Open)
Fragment-replicate left outer join
--
[
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1053:
Attachment: hadoopLocal.patch
This patch fails the release audit because of two new HTML files.
Consider moving to
[
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1053:
Status: Patch Available (was: Open)
Consider moving to Hadoop for local mode
[
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779179#action_12779179
]
Ankit Modi commented on PIG-1053:
-
This patch has an issue with custom comparators (OrderBy)
[
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779181#action_12779181
]
Ankit Modi commented on PIG-1053:
-
This patch has an issue with custom comparators (OrderBy)
[
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779252#action_12779252
]
Ankit Modi commented on PIG-1053:
-
PhysicalPlan in local mode had POCounter Operator before
[
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1053:
Attachment: hadoopLocal.patch
Consider moving to Hadoop for local mode
PigLineRecordReader bails out on an empty line for compressed data
--
Key: PIG-1107
URL: https://issues.apache.org/jira/browse/PIG-1107
Project: Pig
Issue Type: Bug
Affects
[
https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1107:
Status: Patch Available (was: Open)
PigLineRecordReader bails out on an empty line for compressed data
[
https://issues.apache.org/jira/browse/PIG-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1107:
Attachment: pig_piglinerecordreader_bug.patch
Submitting a small patch. It has 2 new unit tests for the
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784596#action_12784596
]
Ankit Modi commented on PIG-965:
I implemented a patch with optimizations 1 and 2 mentioned
In Pig local (Hadoop local) mode, the counting of the number of tuples and
bytes is incorrect if the data spans more than one local split.
-
Key:
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: poregex2.patch
poregex.patch
These are patches for two implementations
One
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: poregex2.patch
Attaching one more patch file. This one has all the changes except the changes
to
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: poregex2.patch)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: poregex2.patch)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Patch Available (was: Open)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: automaton.jar
poregex2.patch
New patch with removed comments and added
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Open (was: Patch Available)
One small change to JarManager.java is missing. Will add a new patch with
[
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1106:
Status: Patch Available (was: Open)
This patch does not have any unit tests.
FR join should not spill
[
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789294#action_12789294
]
Ankit Modi commented on PIG-1106:
-
The tests I ran used two files.
file format
f1: random
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: automaton.jar)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: poregex2.patch)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Patch Available (was: Open)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: automaton.jar
poregex2.patch
PERFORMANCE: optimize common case in matches
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: poregex2.patch)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Open (was: Patch Available)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: poregex2.patch
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Patch Available (was: Open)
I have included the changes suggested by Thejas.
PERFORMANCE: optimize
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Open (was: Patch Available)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790545#action_12790545
]
Ankit Modi commented on PIG-965:
* NonConstantRegex - I did not think of equals. But I added a
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: poregex2.patch)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: poregex2.patch
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Patch Available (was: Open)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791096#action_12791096
]
Ankit Modi commented on PIG-965:
Here are numbers comparing optimizations 1 and 2 against
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: poregex2.patch)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Open (was: Patch Available)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Assignee: Benjamin Francisoud (was: Ankit Modi)
Status: Patch Available (was: Open)
PERFORMANCE:
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi reassigned PIG-965:
--
Assignee: Ankit Modi (was: Benjamin Francisoud)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Attachment: (was: poregex2.patch)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Open (was: Patch Available)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-965:
---
Status: Patch Available (was: Open)
PERFORMANCE: optimize common case in matches (PORegex)
[
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828822#action_12828822
]
Ankit Modi commented on PIG-1154:
-
It looks like the problem is caused by an overwritten value
[
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828831#action_12828831
]
Ankit Modi commented on PIG-1154:
-
It will provide a warning whenever the files are encountered
[
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1154:
Attachment: pig_1154.patch
Patch addressing the comments above.
local mode fails when hadoop
[
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1154:
Status: Patch Available (was: Open)
This patch affects only local mode in Pig.
local mode fails when
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1178:
Status: Open (was: Patch Available)
I found a bug in the code, so I'll be releasing another patch for the
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1178:
Attachment: pig_1178.patch
This is a new patch that can be applied to SVN Trunk.
It includes ForEach,
[
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi reopened PIG-965:
Assignee: (was: Ankit Modi)
I couldn't find poregex2.patch applied in the code.
automaton.jar
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1178:
Status: Open (was: Patch Available)
LogicalPlan and Optimizer are too complex and hard to work with
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1178:
Status: Patch Available (was: Open)
Resubmitting the patch due to core test failures
LogicalPlan and
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837954#action_12837954
]
Ankit Modi commented on PIG-1178:
-
The core tests are failing due to some issue with Hudson.
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1178:
Status: Patch Available (was: Open)
LogicalPlan and Optimizer are too complex and hard to work with
[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ankit Modi updated PIG-1178:
Attachment: pig_1178_3.patch
LogicalPlan and Optimizer are too complex and hard to work with