[jira] Updated: (PIG-570) Large BZip files Seem to loose data in Pig

2008-12-30 Thread Benjamin Reed (JIRA)
[ https://issues.apache.org/jira/browse/PIG-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated PIG-570: -- Attachment: PIG-570.patch I believe the problem is due to bad position tracking. In the current version o

[jira] Updated: (PIG-570) Large BZip files Seem to loose data in Pig

2008-12-30 Thread Benjamin Reed (JIRA)
[ https://issues.apache.org/jira/browse/PIG-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated PIG-570: -- Attachment: bzipTest.bz2 This is the test data for the bzip unit test. It should go under test/org/apache

[jira] Updated: (PIG-570) Large BZip files Seem to loose data in Pig

2008-12-30 Thread Benjamin Reed (JIRA)
[ https://issues.apache.org/jira/browse/PIG-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Reed updated PIG-570: -- Status: Patch Available (was: Open)

Re: [jira] Updated: (PIG-570) Large BZip files Seem to loose data in Pig

2008-12-30 Thread Mridul
A similar problem existed with PigStorage, IIRC (at least the last time I checked, a while back, unless I missed something). If a record boundary aligned with an HDFS block boundary, the subsequent record would get dropped by Pig. To illustrate, map1 would read until the end of its block or last rec
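To make the boundary case concrete, here is a minimal, hypothetical sketch (not PigStorage's actual code) of a line-record reader over byte-range splits. The class `SplitBoundaryDemo`, its methods, and the `buggySkip` flag are invented for illustration, and an in-memory byte array stands in for HDFS blocks. The assumed convention is that a record belongs to the split containing its first byte: each reader emits every record that starts before its split end, and a reader that does not start at offset 0 skips forward only if it landed mid-record. The buggy variant always discards through the next newline, so when a record starts exactly on a split boundary the previous reader stops before it and the next reader throws it away.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {

    // Reads the newline-terminated records owned by the byte range [start, end).
    // Assumed convention: a record belongs to the split containing its first byte.
    static List<String> readSplit(byte[] data, int start, int end, boolean buggySkip) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start > 0) {
            if (buggySkip) {
                // Hypothetical bug: always discard through the next newline,
                // even when `start` is already the first byte of a record.
                while (pos < data.length && data[pos] != '\n') pos++;
                pos++;
            } else {
                // Skip only if we landed mid-record, i.e. the previous byte is not a newline.
                while (pos < data.length && data[pos - 1] != '\n') pos++;
            }
        }
        // Emit every record that starts before `end`, even if it runs past `end`.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') lineEnd++;
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1; // step past the newline
        }
        return records;
    }

    // Simulates one reader per fixed-size split and concatenates their output.
    static List<String> readAll(byte[] data, int splitSize, boolean buggySkip) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, start, end, buggySkip));
        }
        return all;
    }

    public static void main(String[] args) {
        // "aaa\n" is 4 bytes, so with 4-byte splits every record starts exactly on a boundary.
        byte[] data = "aaa\nbbb\nccc\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(data, 4, true));  // [aaa]
        System.out.println(readAll(data, 4, false)); // [aaa, bbb, ccc]
    }
}
```

With 5-byte splits both variants return all three records; only the aligned case loses data, which matches the symptom described above. Other conventions work too; what matters is that adjacent readers agree on who owns a record that starts exactly at the boundary.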

Re: Pig performance

2008-12-30 Thread Kevin Weil
Hi Olga, I am eagerly awaiting not having to re-read all data each time I store part of a split! As far as timelines go, I imagine this will be a larger fix that will come in after the merge from types -> trunk? And is PIG-273 the proper bug for trac

[jira] Updated: (PIG-580) PERFORMANCE: Combiner should also be used when there are distinct aggregates in a foreach following a group provided there are no non-algebraics in the foreach

2008-12-30 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-580: --- Patch Info: [Patch Available] Summary: PERFORMANCE: Combiner should also be used when there are d

[jira] Updated: (PIG-580) PERFORMANCE: Combiner should also be used when there are distinct aggregates in a foreach following a group provided there are no non-algebraics in the foreach

2008-12-30 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-580: --- Status: Patch Available (was: Open)

[jira] Updated: (PIG-580) PERFORMANCE: Combiner should also be used when there are distinct aggregates in a foreach following a group provided there are no non-algebraics in the foreach

2008-12-30 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-580: --- Attachment: PIG-580.patch
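The PIG-580 summary rests on the fact that a distinct aggregate can be split into a combiner-side step and a reducer-side step. The following is a minimal sketch of that idea only, not Pig's combiner or the attached patch; `DistinctCombinerDemo`, `combine`, and `reduceDistinctCount` are hypothetical names. Each combiner collapses one map task's values for a group into a set, and the reducer unions the partial sets before counting. Summing per-combiner distinct counts instead would double-count values seen by more than one map task.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctCombinerDemo {

    // Hypothetical combiner step: collapse one map task's values for a group to a distinct set.
    static Set<String> combine(List<String> mapOutput) {
        return new HashSet<>(mapOutput);
    }

    // Hypothetical reducer step: merge the partial sets, then count the distinct values.
    static int reduceDistinctCount(List<Set<String>> partials) {
        Set<String> merged = new HashSet<>();
        for (Set<String> partial : partials) {
            merged.addAll(partial);
        }
        return merged.size();
    }

    public static void main(String[] args) {
        // Values for a single group key, as emitted by two map tasks.
        List<String> map1 = List.of("a", "b", "a");
        List<String> map2 = List.of("b", "c");

        List<Set<String>> partials = List.of(combine(map1), combine(map2));
        System.out.println(reduceDistinctCount(partials)); // 3

        // Adding per-combiner distinct counts overcounts "b": 2 + 2 = 4.
        System.out.println(combine(map1).size() + combine(map2).size()); // 4
    }
}
```

The combiner still pays off because duplicates are dropped before the shuffle. This only works when every other aggregate in the foreach can be decomposed the same way, which is presumably why the summary excludes non-algebraic functions.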

[jira] Updated: (PIG-592) schema inferred incorrectly

2008-12-30 Thread Christopher Olston (JIRA)
[ https://issues.apache.org/jira/browse/PIG-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Olston updated PIG-592: --- Description: A simple pig script, that never introduces any schema information: A = load 'foo';

[jira] Created: (PIG-592) schema inferred incorrectly

2008-12-30 Thread Christopher Olston (JIRA)
schema inferred incorrectly --- Key: PIG-592 URL: https://issues.apache.org/jira/browse/PIG-592 Project: Pig Issue Type: Bug Affects Versions: types_branch Reporter: Christopher Olston A simple pig s

[jira] Updated: (PIG-593) RegExLoader stops an non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Zaliva updated PIG-593: - Status: Patch Available (was: Open) deco /Users/lord/java/pig-0.1.1/contrib/piggybank/java/src> svn diff In

[jira] Created: (PIG-593) RegExLoader stops an non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)
RegExLoader stops an non-matching line -- Key: PIG-593 URL: https://issues.apache.org/jira/browse/PIG-593 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.1.0 Rep
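The excerpt does not show how piggybank's RegExLoader is written, so the following is a hypothetical line loader, not its actual code. It illustrates the behaviour the issue title asks for. `RegexLineLoaderDemo` and its `getNext` method are invented names, and the whole-line `Matcher.matches()` semantics are an assumption. The key point is that a non-matching line is skipped with `continue`. Returning `null` at that point would look like end of input to the caller and silently stop the load.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLineLoaderDemo {

    private final Pattern pattern;
    private final BufferedReader in;

    RegexLineLoaderDemo(String regex, BufferedReader in) {
        this.pattern = Pattern.compile(regex);
        this.in = in;
    }

    // Returns the capture groups of the next matching line, or null at end of input.
    List<String> getNext() throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = pattern.matcher(line);
            if (!m.matches()) {
                // Skip this line; returning null here would end the whole load.
                continue;
            }
            List<String> fields = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
            return fields;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        String input = "k1=v1\n# comment\nk2=v2\n";
        RegexLineLoaderDemo loader =
            new RegexLineLoaderDemo("(\\w+)=(\\w+)", new BufferedReader(new StringReader(input)));
        List<String> tuple;
        while ((tuple = loader.getNext()) != null) {
            System.out.println(tuple); // [k1, v1] then [k2, v2]
        }
    }
}
```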

[jira] Updated: (PIG-593) RegExLoader stops an non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Zaliva updated PIG-593: - Attachment: PIG-593.diff Attaching patch.

[jira] Updated: (PIG-593) RegExLoader stops an non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Zaliva updated PIG-593: - Comment: was deleted

[jira] Created: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements

2008-12-30 Thread Viraj Bhat (JIRA)
Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements --- Key: PIG-594 URL: https://issues.apache.org/jira/browse/PIG-594 Project: P

[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements

2008-12-30 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-594: --- Description: I have a UDF known as INSETFROMFILE, which matches data against a set of values stored in an HDFS

[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements

2008-12-30 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-594: --- Attachment: myurldata.txt Input data for Pig Script

[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements

2008-12-30 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-594: --- Attachment: insetfilterfile FilterFile in HDFS

[jira] Updated: (PIG-594) Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements

2008-12-30 Thread Viraj Bhat (JIRA)
[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated PIG-594: --- Attachment: INSETFROMFILE.java INSETFROMFILE UDF which uses FilterFunc

[jira] Commented: (PIG-570) Large BZip files Seem to loose data in Pig

2008-12-30 Thread Olga Natkovich (JIRA)
[ https://issues.apache.org/jira/browse/PIG-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659994#action_12659994 ] Olga Natkovich commented on PIG-570: Ben, thanks. This is great! I tried to apply your pa

Use of Minicluster in unit tests

2008-12-30 Thread Pradeep Kamath
Hi, MiniCluster is used to create a Hadoop cluster on the machine running the unit tests, so that scripts can be tested end to end. Currently, the unit tests that use MiniCluster create a temporary file on the local file system instead of in the DFS of the MiniCluster and supply it to the load state

[jira] Updated: (PIG-593) RegExLoader stops an non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Zaliva updated PIG-593: - Priority: Minor (was: Major)