[jira] Commented: (PIG-1292) Interface Refinements

2010-03-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845336#action_12845336
 ] 

Hadoop QA commented on PIG-1292:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438638/pig-1292.patch
  against trunk revision 923043.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 531 release audit warnings 
(more than the trunk's current 530 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/console

This message is automatically generated.

 Interface Refinements
 -

 Key: PIG-1292
 URL: https://issues.apache.org/jira/browse/PIG-1292
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1292.patch, pig-interfaces.patch


 A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both 
 are abstract classes instead of being interfaces.
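The constraint comes from Java's single inheritance of classes: a loader can extend only one abstract class, but can implement any number of interfaces. A minimal sketch of the refinement, using simplified stand-in interfaces (not the actual Pig signatures):

```java
// Simplified stand-ins for the two contracts; the real Pig methods differ.
interface OrderedLoadFunc {
    // Return something comparable so splits can be ordered.
    Comparable<String> getSplitComparable(String split);
}

interface IndexableLoadFunc {
    // Position the reader near the given key.
    void seekNear(String key);
}

// With interfaces, one class can satisfy both contracts; with abstract
// classes, Java's single-inheritance rule forces a choice of one.
class DualLoader implements OrderedLoadFunc, IndexableLoadFunc {
    public Comparable<String> getSplitComparable(String split) { return split; }
    public void seekNear(String key) { /* seek logic would go here */ }
}

public class InterfaceRefinementSketch {
    public static void main(String[] args) {
        DualLoader loader = new DualLoader();
        System.out.println(loader instanceof OrderedLoadFunc);   // true
        System.out.println(loader instanceof IndexableLoadFunc); // true
    }
}
```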

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Broken build

2010-03-15 Thread Dmitriy Ryaboy
Hi guys,
Trunk has been broken for a while. A bunch of tests in the test-commit
target fail, mostly due to "The import
org.apache.pig.experimental.logical.optimizer.PlanPrinter cannot be
resolved". Could someone check in the missing file?

-D


[jira] Updated: (PIG-1296) Skewed join fail due to negative partition index

2010-03-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1296:


Status: Patch Available  (was: Open)

 Skewed join fail due to negative partition index
 

 Key: PIG-1296
 URL: https://issues.apache.org/jira/browse/PIG-1296
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1296-1.patch


 Skewed join throws the following stack trace:
 java.io.IOException: Illegal partition for Partition: -1 Null: false index: 0 
 (fc52di95l6m3j,20100210) (-3648)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
 at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$MapWithPartitionIndex.collect(PigMapReduce.java:187)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$MapWithPartitionIndex.runPipeline(PigMapReduce.java:206)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:159)
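The "Illegal partition ... -1" above is typical of a partitioner whose arithmetic can go negative: deriving a partition index from a hash without clamping yields a negative value, which Hadoop's collector rejects. A hedged sketch of the failure mode and a common guard (the actual PIG-1296 fix may differ):

```java
// Illustration of the failure mode behind "Illegal partition": Java's %
// keeps the sign of the dividend, so a negative hash gives a negative
// partition index. Masking the sign bit first keeps the result in range.
public class PartitionIndexSketch {
    static int badPartition(int hash, int numReduceTasks) {
        return hash % numReduceTasks; // negative whenever hash < 0
    }

    static int safePartition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks; // always >= 0
    }

    public static void main(String[] args) {
        int hash = -3648; // sign taken from the exception message above
        System.out.println(badPartition(hash, 7));  // -1: the rejected index
        System.out.println(safePartition(hash, 7) >= 0); // true: in [0, 7)
    }
}
```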

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized

2010-03-15 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845458#action_12845458
 ] 

Pradeep Kamath commented on PIG-1285:
-

Couple of comments:
 * I think that instead of the code below, the implementation of write() should 
be inlined into SingleTupleBag.write() (I guess DefaultDataBag.write() and 
SingleTupleBag.write() could call a common method to implement write()).
{noformat}
+DataBag bag = bagFactory.newDefaultBag();
+bag.addAll(this);
+bag.write(out);
{noformat}

The reason is that bagFactory.newDefaultBag() registers the bag with the 
SpillableMemoryManager, which in turn puts a weak reference to the bag on a 
linked list. In the past we have seen this list grow in size and cause memory 
issues, which was one of the main motivations for creating SingleTupleBag.

 * There is an implementation for write() but not read(). Reading through the 
code, I guess this is because during deserialization SingleTupleBag.read() will 
not be called; DefaultDataBag.read() would be called instead. I am wondering 
whether leaving SingleTupleBag.read() as-is is confusing, since it throws an 
exception with the message that a SingleTupleBag should never be serialized or 
deserialized.
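The shared-method refactor suggested in the first point could be sketched as below. The types are simplified stand-ins (strings instead of Tuples), and BagSerializer is a hypothetical name; the point is that both bag classes delegate to one helper, so SingleTupleBag.write() never allocates a DefaultDataBag and therefore never registers anything with the SpillableMemoryManager:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical shared serializer both DefaultDataBag.write() and
// SingleTupleBag.write() could call; real code would write Tuple objects.
final class BagSerializer {
    // Serialize a bag as a count followed by its tuples, with no
    // intermediate spillable-bag allocation.
    static void writeBag(DataOutput out, long size, Iterator<String> tuples)
            throws IOException {
        out.writeLong(size);
        while (tuples.hasNext()) {
            out.writeUTF(tuples.next()); // real code: tuple.write(out)
        }
    }
}

public class SharedWriteSketch {
    public static void main(String[] args) throws IOException {
        // A single-tuple bag serializes through the same helper as a
        // multi-tuple bag would.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BagSerializer.writeBag(new DataOutputStream(bytes), 1,
                Arrays.asList("(a,1)").iterator());
        System.out.println(bytes.size() > 8); // true: 8-byte count + tuple data
    }
}
```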


 Allow SingleTupleBag to be serialized
 -

 Key: PIG-1285
 URL: https://issues.apache.org/jira/browse/PIG-1285
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: PIG-1285.patch


 Currently, Pig uses a SingleTupleBag for efficiency when a full-blown 
 spillable bag implementation is not needed in the Combiner optimization.
 Unfortunately this can create problems. The below Initial.exec() code fails 
 at run-time with the message that a SingleTupleBag cannot be serialized:
 {code}
 @Override
 public Tuple exec(Tuple in) throws IOException {
   // single record. just copy.
   if (in == null) return null;
   try {
     Tuple resTuple = tupleFactory_.newTuple(in.size());
     for (int i = 0; i < in.size(); i++) {
       resTuple.set(i, in.get(i));
     }
     return resTuple;
   } catch (IOException e) {
     log.warn(e);
     return null;
   }
 }
 {code}
 The code below can fix the problem in the UDF, but it seems like something 
 that should be handled transparently, not requiring UDF authors to know about 
 SingleTupleBags.
 {code}
 @Override
 public Tuple exec(Tuple in) throws IOException {
   // single record. just copy.
   if (in == null) return null;

   /*
    * Unfortunately SingleTupleBags are not serializable. We cache whether
    * a given index contains a bag in the map below, and copy all bags into
    * DefaultBags before returning to avoid serialization exceptions.
    */
   Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();

   try {
     Tuple resTuple = tupleFactory_.newTuple(in.size());
     for (int i = 0; i < in.size(); i++) {
       Object obj = in.get(i);
       if (!isBagAtIndex.containsKey(i)) {
         isBagAtIndex.put(i, obj instanceof SingleTupleBag);
       }
       if (isBagAtIndex.get(i)) {
         DataBag newBag = bagFactory_.newDefaultBag();
         newBag.addAll((DataBag) obj);
         obj = newBag;
       }
       resTuple.set(i, obj);
     }
     return resTuple;
   } catch (IOException e) {
     log.warn(e);
     return null;
   }
 }
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1297) algebraic interface of udf does not get used if the foreach with udf projects column within group

2010-03-15 Thread Thejas M Nair (JIRA)
algebraic interface of udf does not get used if the foreach with udf projects 
column within group
-

 Key: PIG-1297
 URL: https://issues.apache.org/jira/browse/PIG-1297
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair


grunt> l = load 'file' as (a,b,c);
grunt> g = group l by (a,b);
grunt> f = foreach g generate SUM(l.c), group.a;
grunt> explain f;
...
...
#--
# Map Reduce Plan
#--
MapReduce node 1-752
Map Plan
Local Rearrange[tuple]{tuple}(false) - 1-742
|   |
|   Project[bytearray][0] - 1-743
|   |
|   Project[bytearray][1] - 1-744
|
|---Load(file:///Users/tejas/pig/trunk/file:org.apache.pig.builtin.PigStorage) 
- 1-739
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-751
|
|---New For Each(false,false)[bag] - 1-750
|   |
|   POUserFunc(org.apache.pig.builtin.SUM)[double] - 1-747
|   |
|   |---Project[bag][2] - 1-746
|   |
|   |---Project[bag][1] - 1-745
|   |
|   Project[bytearray][0] - 1-749
|   |
|   |---Project[tuple][0] - 1-748
|
|---Package[tuple]{tuple} - 1-741
Global sort: false



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1298) Restore file traveral behavior to Pig loaders

2010-03-15 Thread Richard Ding (JIRA)
Restore file traveral behavior to Pig loaders
-

 Key: PIG-1298
 URL: https://issues.apache.org/jira/browse/PIG-1298
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0


Given a location, a Pig loader is expected to recursively load all the 
files under that location (i.e., all the files returned by the ls -R command). 
However, after the transition to the Hadoop 20 API, only the files returned by 
the ls command are loaded.
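The two behaviors can be illustrated with standard java.nio types standing in for Hadoop's FileSystem API; this is only a sketch of the semantics, not the Pig loader code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Flat listing (the `ls` behavior) misses files nested in subdirectories;
// a recursive walk (the `ls -R` behavior this issue wants restored) finds them.
public class TraversalSketch {
    // Immediate regular-file children only, like `ls`.
    static List<Path> flat(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) if (Files.isRegularFile(p)) files.add(p);
        }
        return files;
    }

    // Every regular file under the location, like `ls -R`.
    static List<Path> recursive(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> files = new ArrayList<>();
            walk.filter(Files::isRegularFile).forEach(files::add);
            return files;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("loader");
        Files.createFile(root.resolve("part-00000"));
        Path sub = Files.createDirectory(root.resolve("2010-03-15"));
        Files.createFile(sub.resolve("part-00001"));
        System.out.println(flat(root).size());      // 1: nested file missed
        System.out.println(recursive(root).size()); // 2: both files found
    }
}
```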

 

  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1298) Restore file traversal behavior to Pig loaders

2010-03-15 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1298:
--

Summary: Restore file traversal behavior to Pig loaders  (was: Restore file 
traveral behavior to Pig loaders)

 Restore file traversal behavior to Pig loaders
 --

 Key: PIG-1298
 URL: https://issues.apache.org/jira/browse/PIG-1298
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0


 Given a location, a Pig loader is expected to recursively load all the 
 files under that location (i.e., all the files returned by the ls -R 
 command). However, after the transition to the Hadoop 20 API, only the 
 files returned by the ls command are loaded.
  
   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1299) Implement Pig counter to track number of output rows for each output files

2010-03-15 Thread Richard Ding (JIRA)
Implement Pig counter  to track number of output rows for each output files 


 Key: PIG-1299
 URL: https://issues.apache.org/jira/browse/PIG-1299
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0


When running a multi-store query, the Hadoop job tracker often displays only 0 
for the Reduce output records or Map output records counters. This is incorrect 
and misleading. Pig should implement an output records counter for each 
output file in the query.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1279) Make sample loaders interchangeable

2010-03-15 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-1279:
-

Assignee: Richard Ding

 Make sample loaders interchangeable 
 

 Key: PIG-1279
 URL: https://issues.apache.org/jira/browse/PIG-1279
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding

 In Pig 0.6 one can use the random sample loader in place of the Poisson sample 
 loader for skewed join, but this isn't the case in trunk (PIG-1264).
 In general, the sample loaders should be interchangeable (even though their 
 sampling characteristics differ).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-200) Pig Performance Benchmarks

2010-03-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-200:
---

Attachment: perf-0.6.patch

Hi Duncan,
perf.patch is a little bit old; I have attached the newer perf-0.6.patch. The 
instructions to generate input data for PigMix are:
1. Apply perf-0.6.patch on the Pig 0.6 release
2. ant jar compile-test
3. export PIG_HOME=.
4. test/utils/pigmix/datagen/generate_data.sh

 Pig Performance Benchmarks
 --

 Key: PIG-200
 URL: https://issues.apache.org/jira/browse/PIG-200
 Project: Pig
  Issue Type: Task
Reporter: Amir Youssefi
Assignee: Alan Gates
 Attachments: generate_data.pl, perf-0.6.patch, perf.hadoop.patch, 
 perf.patch


 To benchmark Pig performance, we need to have a TPC-H like Large Data Set 
 plus Script Collection. This is used in comparison of different Pig releases, 
 Pig vs. other systems (e.g. Pig + Hadoop vs. Hadoop Only).
 Here is Wiki for small tests: http://wiki.apache.org/pig/PigPerformance
 I am currently running long-running Pig scripts over data-sets in the order 
 of tens of TBs. Next step is hundreds of TBs.
 We need to have an open large-data set (open source scripts which generate 
 data-set) and detailed scripts for important operations such as ORDER, 
 AGGREGATION etc.
 We can call those the Pig Workouts: Cardio (short processing), Marathon (long 
 running scripts) and Triathlon (Mix). 
 I will update this JIRA with more details of current activities soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1296) Skewed join fail due to negative partition index

2010-03-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845584#action_12845584
 ] 

Hadoop QA commented on PIG-1296:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438842/PIG-1296-1.patch
  against trunk revision 923043.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/238/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/238/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/238/console

This message is automatically generated.

 Skewed join fail due to negative partition index
 

 Key: PIG-1296
 URL: https://issues.apache.org/jira/browse/PIG-1296
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1296-1.patch


 Skewed join throws the following stack trace:
 java.io.IOException: Illegal partition for Partition: -1 Null: false index: 0 
 (fc52di95l6m3j,20100210) (-3648)
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
 at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$MapWithPartitionIndex.collect(PigMapReduce.java:187)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$MapWithPartitionIndex.runPipeline(PigMapReduce.java:206)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:159)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized

2010-03-15 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845606#action_12845606
 ] 

Pradeep Kamath commented on PIG-1285:
-

SingleTupleBag did not go the route of extending DefaultAbstractBag for a 
couple of reasons:
1) The object would have a few more members (like the mMemSize* fields, mSize, 
etc., which are present in DefaultAbstractBag). This would make the object 
bigger in memory, and SingleTupleBag was designed to be used in the map/combine 
phase with minimal memory overhead.
2) The first point in my previous comment: we don't want this bag to register 
with the SpillableMemoryManager, which in turn puts a weak reference to the bag 
on a linked list. In the past we have seen this list grow in size and itself 
cause memory issues.

 Allow SingleTupleBag to be serialized
 -

 Key: PIG-1285
 URL: https://issues.apache.org/jira/browse/PIG-1285
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: PIG-1285.patch


 Currently, Pig uses a SingleTupleBag for efficiency when a full-blown 
 spillable bag implementation is not needed in the Combiner optimization.
 Unfortunately this can create problems. The below Initial.exec() code fails 
 at run-time with the message that a SingleTupleBag cannot be serialized:
 {code}
 @Override
 public Tuple exec(Tuple in) throws IOException {
   // single record. just copy.
   if (in == null) return null;
   try {
     Tuple resTuple = tupleFactory_.newTuple(in.size());
     for (int i = 0; i < in.size(); i++) {
       resTuple.set(i, in.get(i));
     }
     return resTuple;
   } catch (IOException e) {
     log.warn(e);
     return null;
   }
 }
 {code}
 The code below can fix the problem in the UDF, but it seems like something 
 that should be handled transparently, not requiring UDF authors to know about 
 SingleTupleBags.
 {code}
 @Override
 public Tuple exec(Tuple in) throws IOException {
   // single record. just copy.
   if (in == null) return null;

   /*
    * Unfortunately SingleTupleBags are not serializable. We cache whether
    * a given index contains a bag in the map below, and copy all bags into
    * DefaultBags before returning to avoid serialization exceptions.
    */
   Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();

   try {
     Tuple resTuple = tupleFactory_.newTuple(in.size());
     for (int i = 0; i < in.size(); i++) {
       Object obj = in.get(i);
       if (!isBagAtIndex.containsKey(i)) {
         isBagAtIndex.put(i, obj instanceof SingleTupleBag);
       }
       if (isBagAtIndex.get(i)) {
         DataBag newBag = bagFactory_.newDefaultBag();
         newBag.addAll((DataBag) obj);
         obj = newBag;
       }
       resTuple.set(i, obj);
     }
     return resTuple;
   } catch (IOException e) {
     log.warn(e);
     return null;
   }
 }
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

2010-03-15 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1257:


Status: Open  (was: Patch Available)

 PigStorage per the new load-store redesign should support splitting of bzip 
 files
 -

 Key: PIG-1257
 URL: https://issues.apache.org/jira/browse/PIG-1257
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1257-2.patch, PIG-1257.patch


 PigStorage implemented per new load-store-redesign (PIG-966) is based on 
 TextInputFormat for reading data. TextInputFormat has support for reading 
 bzip data but without support for splitting bzip files. In pig 0.6, splitting 
 was enabled for bzip files - we should attempt to enable that feature.
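For background on why bzip splitting is feasible at all: each bzip2 compressed block starts with the 48-bit magic number 0x314159265359, so a reader handed an arbitrary split offset can scan forward for that marker and begin decompressing at the next block boundary. Below is a deliberately simplified, byte-aligned sketch of that scan; real bzip2 blocks are bit-aligned, and a real implementation works on a stream and handles markers straddling buffer boundaries:

```java
// Simplified byte-aligned scan for the bzip2 block-start magic; illustrates
// the splitting idea only, not the actual Pig/Hadoop implementation.
public class BzipBlockScanSketch {
    static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    // Return the index of the first block magic at or after 'from', or -1.
    static int nextBlockStart(byte[] data, int from) {
        outer:
        for (int i = from; i + BLOCK_MAGIC.length <= data.length; i++) {
            for (int j = 0; j < BLOCK_MAGIC.length; j++) {
                if (data[i + j] != BLOCK_MAGIC[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 0x31, 0x41, 0x59, 0x26, 0x53, 0x59, 9};
        System.out.println(nextBlockStart(data, 0)); // 2
        System.out.println(nextBlockStart(data, 3)); // -1: no later block
    }
}
```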

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

2010-03-15 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1257:


Attachment: blockHeaderEndsAt136500.txt.bz2
blockEndingInCR.txt.bz2
PIG-1257-3.patch

Since the last patch, I uncovered some issues in the code while testing 
boundary conditions. I have fixed those in the new patch PIG-1257-3.patch and 
covered those boundary conditions with test cases in TestBZip.

 PigStorage per the new load-store redesign should support splitting of bzip 
 files
 -

 Key: PIG-1257
 URL: https://issues.apache.org/jira/browse/PIG-1257
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: blockEndingInCR.txt.bz2, 
 blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, 
 PIG-1257.patch


 PigStorage implemented per new load-store-redesign (PIG-966) is based on 
 TextInputFormat for reading data. TextInputFormat has support for reading 
 bzip data but without support for splitting bzip files. In pig 0.6, splitting 
 was enabled for bzip files - we should attempt to enable that feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

2010-03-15 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1257:


Status: Patch Available  (was: Open)

 PigStorage per the new load-store redesign should support splitting of bzip 
 files
 -

 Key: PIG-1257
 URL: https://issues.apache.org/jira/browse/PIG-1257
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: blockEndingInCR.txt.bz2, 
 blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, 
 PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2


 PigStorage implemented per new load-store-redesign (PIG-966) is based on 
 TextInputFormat for reading data. TextInputFormat has support for reading 
 bzip data but without support for splitting bzip files. In pig 0.6, splitting 
 was enabled for bzip files - we should attempt to enable that feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

2010-03-15 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1257:


Attachment: recordLossblockHeaderEndsAt136500.txt.bz2

The .bz2 files attached to this issue should be put in 
test/org/apache/pig/test/data for this patch to pass unit tests.

 PigStorage per the new load-store redesign should support splitting of bzip 
 files
 -

 Key: PIG-1257
 URL: https://issues.apache.org/jira/browse/PIG-1257
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: blockEndingInCR.txt.bz2, 
 blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, 
 PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2


 PigStorage implemented per new load-store-redesign (PIG-966) is based on 
 TextInputFormat for reading data. TextInputFormat has support for reading 
 bzip data but without support for splitting bzip files. In pig 0.6, splitting 
 was enabled for bzip files - we should attempt to enable that feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

2010-03-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845627#action_12845627
 ] 

Hadoop QA commented on PIG-1257:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12438883/recordLossblockHeaderEndsAt136500.txt.bz2
  against trunk revision 923043.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/239/console

This message is automatically generated.

 PigStorage per the new load-store redesign should support splitting of bzip 
 files
 -

 Key: PIG-1257
 URL: https://issues.apache.org/jira/browse/PIG-1257
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: blockEndingInCR.txt.bz2, 
 blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, 
 PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2


 PigStorage implemented per new load-store-redesign (PIG-966) is based on 
 TextInputFormat for reading data. TextInputFormat has support for reading 
 bzip data but without support for splitting bzip files. In pig 0.6, splitting 
 was enabled for bzip files - we should attempt to enable that feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1300) PigStorage does not load tuples with large #s.

2010-03-15 Thread Brian Donaldson (JIRA)
PigStorage does not load tuples with large #s.
--

 Key: PIG-1300
 URL: https://issues.apache.org/jira/browse/PIG-1300
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Brian Donaldson


Say I have a file 'a' with the following entry:
(30010401402)

grunt> A = LOAD 'a' AS (t:tuple(a:chararray));
grunt> DUMP A;
2010-03-15 17:37:23,333 [main] WARN  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - 
org.apache.pig.builtin.PigStorage: Unable to interpret value [...@353c375 in 
field being converted to type tuple, caught Exception For input string: 
30010401402 field discarded
2010-03-15 17:37:23,335 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
stored result in: file:/tmp/temp-1345435162/tmp-308780808
2010-03-15 17:37:23,335 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written 
: 1
2010-03-15 17:37:23,335 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2010-03-15 17:37:23,335 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2010-03-15 17:37:23,336 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
()

If I have another file 'b' with the following entry:
(30010401402L)

grunt> B = LOAD 'b' AS (t:tuple(a:chararray));
grunt> DUMP B;
2010-03-15 17:39:10,051 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
stored result in: file:/tmp/temp-1630850555/tmp1316256240
2010-03-15 17:39:10,051 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written 
: 1
2010-03-15 17:39:10,051 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2010-03-15 17:39:10,051 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2010-03-15 17:39:10,052 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
((30010401402L))

Is there a way to get the load in the first example to work?  Or do I need to 
start affixing an L to all my #s? 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1300) PigStorage does not load tuples with large #s.

2010-03-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845628#action_12845628
 ] 

Daniel Dai commented on PIG-1300:
-

Which version of Pig are you using? Can you try it on trunk? Looks like it 
should be fixed in PIG-613.
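The fix Daniel points to presumably amounts to an int-then-long parsing fallback, so an unsuffixed literal that overflows int is promoted rather than discarded. A hypothetical sketch of that idea, not the actual Pig parser code:

```java
// Hypothetical illustration of the PIG-613-style fix: try int first, fall
// back to long when the literal is too large for 32 bits.
public class IntegralParseSketch {
    static Object parseIntegral(String s) {
        try {
            return Integer.parseInt(s);   // fits in 32 bits
        } catch (NumberFormatException e) {
            return Long.parseLong(s);     // too large for int: promote to long
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntegral("42").getClass().getSimpleName());
        // Integer
        System.out.println(parseIntegral("30010401402").getClass().getSimpleName());
        // Long: 30010401402 exceeds Integer.MAX_VALUE, no 'L' suffix needed
    }
}
```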

 PigStorage does not load tuples with large #s.
 --

 Key: PIG-1300
 URL: https://issues.apache.org/jira/browse/PIG-1300
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Brian Donaldson

 Say I have a file 'a' with the following entry:
 (30010401402)
 grunt> A = LOAD 'a' AS (t:tuple(a:chararray));
 grunt> DUMP A;
 2010-03-15 17:37:23,333 [main] WARN  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger 
 - org.apache.pig.builtin.PigStorage: Unable to interpret value [...@353c375 
 in field being converted to type tuple, caught Exception For input string: 
 30010401402 field discarded
 2010-03-15 17:37:23,335 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
 stored result in: file:/tmp/temp-1345435162/tmp-308780808
 2010-03-15 17:37:23,335 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
 written : 1
 2010-03-15 17:37:23,335 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
 : 0
 2010-03-15 17:37:23,335 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2010-03-15 17:37:23,336 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 ()
 If I have another file 'b' with the following entry:
 (30010401402L)
 grunt> B = LOAD 'b' AS (t:tuple(a:chararray));
 grunt> DUMP B;
 2010-03-15 17:39:10,051 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
 stored result in: file:/tmp/temp-1630850555/tmp1316256240
 2010-03-15 17:39:10,051 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
 written : 1
 2010-03-15 17:39:10,051 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
 : 0
 2010-03-15 17:39:10,051 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2010-03-15 17:39:10,052 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 ((30010401402L))
 Is there a way to get the load in the first example to work?  Or do I need to 
 start affixing an L to all my #s? 
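 (Illustration, not Pig code: the log line above, "caught Exception For input string: 30010401402", suggests an int-width parse is being attempted on the unsuffixed literal. 30010401402 exceeds Integer.MAX_VALUE (2147483647), so a 32-bit parse fails while a 64-bit one succeeds — which is consistent with the 'L'-suffixed file loading fine. A minimal plain-Java sketch of that mechanism, class name made up:)

```java
public class LargeNumberParse {
    public static void main(String[] args) {
        String raw = "30010401402"; // larger than Integer.MAX_VALUE (2147483647)

        // Without the 'L' suffix, a 32-bit parse is attempted and fails:
        try {
            Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // same "For input string: ..." message seen in the Pig warning
            System.out.println("int parse failed: " + e.getMessage());
        }

        // A 64-bit parse (the 'L'-suffixed case) fits comfortably:
        System.out.println("long parse: " + Long.parseLong(raw));
    }
}
```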

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1300) PigStorage does not load tuples with large #s.

2010-03-15 Thread Brian Donaldson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845636#action_12845636
 ] 

Brian Donaldson commented on PIG-1300:
--

This is with version 0.5+11.1 (cloudera), and with the recently released 0.6.

 PigStorage does not load tuples with large #s.
 --

 Key: PIG-1300
 URL: https://issues.apache.org/jira/browse/PIG-1300
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Brian Donaldson


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1300) PigStorage does not load tuples with large #s.

2010-03-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845641#action_12845641
 ] 

Daniel Dai commented on PIG-1300:
-

I just tried it; it works in trunk. The fix will come with the next release (0.7).

 PigStorage does not load tuples with large #s.
 --

 Key: PIG-1300
 URL: https://issues.apache.org/jira/browse/PIG-1300
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Brian Donaldson
 Fix For: 0.7.0



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1300) PigStorage does not load tuples with large #s.

2010-03-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1300.
-

   Resolution: Fixed
Fix Version/s: 0.7.0

 PigStorage does not load tuples with large #s.
 --

 Key: PIG-1300
 URL: https://issues.apache.org/jira/browse/PIG-1300
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Brian Donaldson
 Fix For: 0.7.0



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1292) Interface Refinements

2010-03-15 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845717#action_12845717
 ] 

Pradeep Kamath commented on PIG-1292:
-

As Xuefu mentioned, we can get rid of the splitIdx argument in public 
WritableComparable<?> getSplitComparable(InputSplit split, int splitIdx).

Otherwise the changes look good, +1 for commit with the above change.
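(For illustration, the simplified signature being asked for would look roughly like the sketch below. All types here are local stand-ins so the snippet compiles without Hadoop on the classpath — the real method uses org.apache.hadoop.mapreduce.InputSplit and WritableComparable, and the implementation shown is purely hypothetical:)

```java
// Stand-in for org.apache.hadoop.mapreduce.InputSplit, illustration only.
class InputSplit {
    final long startOffset;
    InputSplit(long startOffset) { this.startOffset = startOffset; }
}

interface OrderedLoadFuncSketch {
    // splitIdx dropped per the review comment: the split itself is
    // expected to carry enough information to derive an ordering key.
    Comparable<Long> getSplitComparable(InputSplit split);
}

class FileOffsetOrder implements OrderedLoadFuncSketch {
    public Comparable<Long> getSplitComparable(InputSplit split) {
        return split.startOffset; // e.g. order splits by starting byte offset
    }

    public static void main(String[] args) {
        FileOffsetOrder order = new FileOffsetOrder();
        System.out.println(order.getSplitComparable(new InputSplit(64L)));
    }
}
```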

 Interface Refinements
 -

 Key: PIG-1292
 URL: https://issues.apache.org/jira/browse/PIG-1292
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1292.patch, pig-interfaces.patch


 A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both 
 are abstract classes instead of being interfaces.
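 (The constraint follows from Java's single-inheritance rule: a class may extend only one class, abstract or not, but may implement any number of interfaces. A self-contained sketch with made-up stand-in names, not the real org.apache.pig types:)

```java
// Stand-in interfaces, not the real OrderedLoadFunc/IndexableLoadFunc.
interface OrderedLoad { long splitOrder(); }
interface IndexableLoad { void seekNear(String key); }

// One loader can implement both contracts. If both were abstract
// classes, "extends OrderedLoad, IndexableLoad" would not compile.
class MyLoader implements OrderedLoad, IndexableLoad {
    public long splitOrder() { return 0L; }
    public void seekNear(String key) { /* position the reader near key */ }

    public static void main(String[] args) {
        MyLoader loader = new MyLoader();
        System.out.println(loader instanceof OrderedLoad);   // true
        System.out.println(loader instanceof IndexableLoad); // true
    }
}
```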

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.