[jira] Commented: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851489#action_12851489 ] Pradeep Kamath commented on PIG-1338: - I haven't done a full review but had a comment on one of the changes which is pretty important:
{noformat}
Index: src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java
===================================================================
--- src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java (revision 928370)
+++ src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java (working copy)
@@ -30,7 +30,9 @@
 public static Configuration toConfiguration(Properties properties) {
     assert properties != null;
-    final Configuration config = new Configuration();
+    final Configuration config = new Configuration(false);
+    config.addResource("core-default.xml");
+    config.addResource("mapred-default.xml");
     final Enumeration<Object> iter = properties.keys();
     while (iter.hasMoreElements()) {
         final String key = (String) iter.nextElement();
{noformat}
Looking at the Configuration class's implementation I found the following code:
{noformat}
static {
    // print deprecation warning if hadoop-site.xml is found in classpath
    ClassLoader cL = Thread.currentThread().getContextClassLoader();
    if (cL == null) {
        cL = Configuration.class.getClassLoader();
    }
    if (cL.getResource("hadoop-site.xml") != null) {
        LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
            "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, " +
            "mapred-site.xml and hdfs-site.xml to override properties of " +
            "core-default.xml, mapred-default.xml and hdfs-default.xml " +
            "respectively");
    }
    addDefaultResource("core-default.xml");
    addDefaultResource("core-site.xml");
}

private void loadResources(Properties properties, ArrayList resources, boolean quiet) {
    if (loadDefaults) {
        for (String resource : defaultResources) {
            loadResource(properties, resource, quiet);
        }
        // support the hadoop-site.xml as a deprecated case
        if (getResource("hadoop-site.xml") != null) {
            loadResource(properties, "hadoop-site.xml", quiet);
        }
    }
    for (Object resource : resources) {
        loadResource(properties, resource, quiet);
    }
}
{noformat}
There are two questions about the code in Configuration vs. the change in this patch:
1) In the patch, core-default.xml and mapred-default.xml are added as resources, while in Configuration, core-default.xml and core-site.xml are added by default.
2) In the patch, hadoop-site.xml is not considered, while in Configuration it is - so if a hadoop 20.x cluster is installed with hadoop-site.xml configured and without the other .xml files (like core-default.xml etc.), then Pig would not get the cluster config information, right?
Pig should exclude hadoop conf in local mode Key: PIG-1338 URL: https://issues.apache.org/jira/browse/PIG-1338 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Attachments: PIG-1338-1.patch, PIG-1338-2.patch Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will go wrong
We should change this to a more intuitive behavior:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
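The proposed behavior above amounts to a small decision table. A minimal sketch, assuming nothing about Pig's actual launcher code (the class name, method name, and returned strings here are all hypothetical, invented purely to illustrate the policy):

```java
// Hypothetical sketch of the conf-lookup policy proposed in PIG-1338.
// None of these names exist in Pig; this only encodes the decision
// table from the issue description.
public class ConfLookupPolicy {
    static String decide(String execMode, boolean hadoopConfFound) {
        if (execMode.equals("local")) {
            // local mode: always launch locally, regardless of any hadoop conf
            return "launch local mode";
        }
        // hadoop mode: a cluster conf is mandatory
        return hadoopConfFound
                ? "launch with cluster conf"
                : "bail out with a meaningful message";
    }

    public static void main(String[] args) {
        System.out.println(decide("local", true));      // launch local mode
        System.out.println(decide("mapreduce", false)); // bail out with a meaningful message
    }
}
```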
[jira] Commented: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850266#action_12850266 ] Pradeep Kamath commented on PIG-1316: - test-patch ant target results from running locally:
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 13 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec] Finished build.
Am also running the unit tests locally - the tests require some manual data setup (the data file for the test needs to be created before the test run - the patch cannot handle these actions) - will update with results. TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850386#action_12850386 ] Pradeep Kamath commented on PIG-1316: - All unit tests passed - will commit shortly:
...
[junit] Running org.apache.pig.test.TestUnion
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 49.523 sec
test-contrib:
BUILD SUCCESSFUL
Total time: 278 minutes 11 seconds
TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1316: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1317.patch In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Attachment: PIG-1316.patch Attached patch implements the change to cache the results of LoadMetadata.getSchema for use in future calls. LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Status: Open (was: Patch Available) Attached wrong patch file LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Status: Patch Available (was: Open) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Attachment: (was: PIG-1316.patch) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Status: Patch Available (was: Open) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1317.patch In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Attachment: PIG-1317.patch Attached correct patch file now. LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1317.patch In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1316: Attachment: PIG-1316.patch Attached patch makes the required changes in TextLoader to use BZip2TextInputFormat if the load location ends with the extension .bz or .bz2, like PigStorage. Also, for non-bzip data, TextLoader will now use PigTextInputFormat rather than TextInputFormat so that input directories can be recursively traversed. I have also changed BZip2TextInputFormat to extend PigFileInputFormat instead of FileInputFormat for the same reason. TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
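The extension-based dispatch described above (mirroring PigStorage's handling of .bz/.bz2 locations) can be sketched as follows. This is only an illustration; the class and method names are hypothetical and the real selection happens inside TextLoader's getInputFormat logic:

```java
// Hypothetical sketch of choosing an input format by file extension,
// in the spirit of the PIG-1316 change. Names are illustrative, not
// Pig's actual code; the returned strings stand in for the classes.
public class InputFormatChooser {
    static String chooseFormat(String location) {
        // PigStorage-style check: .bz and .bz2 locations go to the
        // bzip2-aware input format, which can split compressed files
        if (location.endsWith(".bz") || location.endsWith(".bz2")) {
            return "Bzip2TextInputFormat";
        }
        // everything else uses PigTextInputFormat, which can also
        // recursively traverse input directories
        return "PigTextInputFormat";
    }

    public static void main(String[] args) {
        System.out.println(chooseFormat("/data/logs.bz2")); // Bzip2TextInputFormat
        System.out.println(chooseFormat("/data/logs.txt")); // PigTextInputFormat
    }
}
```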
[jira] Updated: (PIG-1308) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1308: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1308.patch Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker. This occurs with trunk and not with Pig 0.6 branch.
{code}
data = load 'binstoragesample' using BinStorage() as (s, m, l);
A = foreach ULT generate s#'key' as value;
X = limit A 20;
dump X;
{code}
When this script is submitted to the Jobtracker, we found the following error:
2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
Stack trace revealed:
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115)
at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404)
at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167)
at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263)
at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112)
at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildProjectionMaps(LogicalTransformer.java:76)
at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:216)
at org.apache.pig.PigServer.compileLp(PigServer.java:883)
at org.apache.pig.PigServer.store(PigServer.java:564)
The binstorage data was generated from 2 datasets using limit and union:
{code}
Large1 = load 'input1' using PigStorage();
Large2 = load 'input2' using PigStorage();
V = limit Large1 1;
C = limit Large2 1;
U = union V, C;
store U into 'binstoragesample' using BinStorage();
{code}
-- This message is
automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
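The caching described above is a standard lazy-memoization pattern. A minimal sketch, assuming nothing about the actual LOLoad internals (the class name, field, and schema string below are all hypothetical stand-ins for the result of LoadMetadata.getSchema()):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of caching an expensive schema lookup, in the
// spirit of PIG-1317. The fetch method stands in for reading the
// input file or contacting a metadata service.
public class SchemaCacheDemo {
    static final AtomicInteger expensiveCalls = new AtomicInteger();

    static String fetchSchemaFromMetadata() {
        // potentially expensive: file read or metadata-service call
        expensiveCalls.incrementAndGet();
        return "s:chararray, m:map[], l:long";
    }

    private String cachedSchema;

    String determineSchema() {
        // only pay the cost once; later calls return the cached copy
        if (cachedSchema == null) {
            cachedSchema = fetchSchemaFromMetadata();
        }
        return cachedSchema;
    }

    public static void main(String[] args) {
        SchemaCacheDemo load = new SchemaCacheDemo();
        load.determineSchema();
        load.determineSchema(); // served from the cache
        System.out.println(expensiveCalls.get());
    }
}
```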
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Assignee: Pradeep Kamath LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
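The mechanism described above (a marker in the job configuration, set by the framework and read by the loader) can be sketched with plain java.util.Properties as a stand-in for Hadoop's Configuration. This is a hedged illustration only: the class, methods, and in particular the property key are invented, not the key the actual patch defines:

```java
import java.util.Properties;

// Hypothetical sketch of the PIG-1323 idea: the framework marks the
// configuration before calling setLocation() on the frontend, and the
// loader checks the marker. The key name is invented for illustration.
public class FrontendBackendFlag {
    static final String KEY = "example.setlocation.in.frontend";

    // what the framework would do on the client machine
    static void markFrontend(Properties conf) {
        conf.setProperty(KEY, "true");
    }

    // what a loader's setLocation() could check
    static boolean isFrontend(Properties conf) {
        // an absent key means we are in a backend (map task) context
        return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        markFrontend(conf);
        System.out.println(isFrontend(conf));             // true
        System.out.println(isFrontend(new Properties())); // false
    }
}
```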
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1323: Status: Patch Available (was: Open) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1323.patch Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1323: Attachment: PIG-1323.patch Attached patch addresses the issue in the description by setting state in the Configuration depending on where in PigInputFormat the LoadFunc.setLocation() method is called. No tests are included since testing this in a unit test framework is not feasible - I have manually tested this. Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1323.patch Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1325) Provide a way to exclude a testcase when running ant test
[ https://issues.apache.org/jira/browse/PIG-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1325: Attachment: PIG-1325.patch The patch allows excluding a particular test case from the ant test run. I am not submitting this to go through Hadoop QA since this is a build.xml change with nothing that can be tested by the Hadoop QA process. Provide a way to exclude a testcase when running ant test --- Key: PIG-1325 URL: https://issues.apache.org/jira/browse/PIG-1325 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1325.patch Provide a way to exclude a testcase when running ant test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
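A build.xml change of this kind is typically an `<exclude>` element driven by a command-line property. The fragment below is only a hedged sketch of that pattern; the property name `test.exclude` and the fileset layout are hypothetical, not taken from the attached patch:

```xml
<!-- Hypothetical sketch: skip one test class named via -Dtest.exclude.
     The property name and paths are illustrative, not the patch's. -->
<batchtest fork="yes" todir="${test.log.dir}">
  <fileset dir="test">
    <include name="**/Test*.java"/>
    <exclude name="**/${test.exclude}.java"/>
  </fileset>
</batchtest>
```

With such a setup, a run like `ant test -Dtest.exclude=TestUnion` would skip that one test class while running the rest.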
[jira] Commented: (PIG-1325) Provide a way to exclude a testcase when running ant test
[ https://issues.apache.org/jira/browse/PIG-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848446#action_12848446 ] Pradeep Kamath commented on PIG-1325: - I have tested locally that the change enables the feature requested. Provide a way to exclude a testcase when running ant test --- Key: PIG-1325 URL: https://issues.apache.org/jira/browse/PIG-1325 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1325.patch Provide a way to exclude a testcase when running ant test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1325) Provide a way to exclude a testcase when running ant test
[ https://issues.apache.org/jira/browse/PIG-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1325. - Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Patch committed. Provide a way to exclude a testcase when running ant test --- Key: PIG-1325 URL: https://issues.apache.org/jira/browse/PIG-1325 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1325.patch Provide a way to exclude a testcase when running ant test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847721#action_12847721 ] Pradeep Kamath commented on PIG-1285: - yes Allow SingleTupleBag to be serialized - Key: PIG-1285 URL: https://issues.apache.org/jira/browse/PIG-1285 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: PIG-1285.patch Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The below Initial.exec() code fails at run-time with the message that a SingleTupleBag cannot be serialized:
{code}
@Override
public Tuple exec(Tuple in) throws IOException {
    // single record. just copy.
    if (in == null) return null;
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            resTuple.set(i, in.get(i));
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}
The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags.
{code}
@Override
public Tuple exec(Tuple in) throws IOException {
    // single record. just copy.
    if (in == null) return null;
    /*
     * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag
     * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions.
     */
    Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            Object obj = in.get(i);
            if (!isBagAtIndex.containsKey(i)) {
                isBagAtIndex.put(i, obj instanceof SingleTupleBag);
            }
            if (isBagAtIndex.get(i)) {
                DataBag newBag = bagFactory_.newDefaultBag();
                newBag.addAll((DataBag) obj);
                obj = newBag;
            }
            resTuple.set(i, obj);
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1308) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1308: Attachment: PIG-1308.patch The root cause of the issue is that the OpLimitOptimizer has a relaxed check() implementation which only checks whether the node matched by RuleMatcher is an LOLimit - true any time there is an LOLimit in the plan. This results in the optimizer running 500 (the current max) iterations of all rules since the OpLimitOptimizer always matches. The attached patch fixes the issue by tightening the implementation of OpLimitOptimizer.check() to return false in cases where LOLimit cannot be pushed up. Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1308.patch Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker. This occurs with trunk and not with Pig 0.6 branch.
{code} data = load 'binstoragesample' using BinStorage() as (s, m, l); A = foreach ULT generate s#'key' as value; X = limit A 20; dump X; {code} When this script is submitted to the JobTracker, we found the following error: 2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 Stack trace revealed: at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144) at
org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115) at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404) at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167) at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263) at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildProjectionMaps(LogicalTransformer.java:76) at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:216) at org.apache.pig.PigServer.compileLp(PigServer.java:883) at org.apache.pig.PigServer.store(PigServer.java:564) The binstorage data was generated from 2 datasets using limit and union: {code} Large1 = load 'input1' using PigStorage(); Large2 = load 'input2' using PigStorage(); V = limit Large1 1; C = limit Large2 1; U = union V, C; store U into 'binstoragesample' using BinStorage(); {code} -- This message
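The failure mode described in the fix above - a check() that matches whenever an LOLimit exists, even when nothing can be rewritten - can be illustrated with a standalone sketch. All class and method names below are hypothetical stand-ins, not Pig's actual optimizer API:

```java
import java.util.List;

// Minimal model of a rule-based optimizer loop with an iteration cap,
// showing why a check() that always matches forces the optimizer to run
// until the cap. Names are illustrative, not Pig's real classes.
public class OptimizerLoopSketch {
    interface Rule {
        boolean check(List<String> plan);    // should be false when no rewrite applies
        void transform(List<String> plan);
    }

    // Relaxed rule: matches whenever "LOLimit" is present, even when it
    // cannot actually be pushed up -- transform() is then a no-op.
    static final Rule RELAXED = new Rule() {
        public boolean check(List<String> plan) { return plan.contains("LOLimit"); }
        public void transform(List<String> plan) { /* cannot push up: no-op */ }
    };

    static int runUntilFixedPoint(Rule rule, List<String> plan, int maxIterations) {
        int i = 0;
        while (i < maxIterations && rule.check(plan)) {
            rule.transform(plan);
            i++;
        }
        return i; // iterations actually executed
    }

    public static void main(String[] args) {
        List<String> plan = new java.util.ArrayList<>(
                List.of("LOLoad", "LOForEach", "LOLimit"));
        // With the relaxed check, the loop only stops at the cap
        // (500 in Pig's optimizer at the time of this issue).
        int iters = runUntilFixedPoint(RELAXED, plan, 500);
        System.out.println(iters); // 500: every iteration "matched" but changed nothing
    }
}
```

Tightening check() to return false when the limit cannot be pushed makes the loop terminate on the first iteration where no rewrite applies, which is exactly what the patch does for OpLimitOptimizer.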
[jira] Updated: (PIG-1308) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1308: Status: Patch Available (was: Open) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1308.patch Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker. This occurs with trunk and not with Pig 0.6 branch. {code} data = load 'binstoragesample' using BinStorage() as (s, m, l); A = foreach ULT generate s#'key' as value; X = limit A 20; dump X; {code} When this script is submitted to the JobTracker, we found the following error: 2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 2 2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 Stack trace revealed: at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144) at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115) at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404) at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167) at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263) at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildProjectionMaps(LogicalTransformer.java:76) at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:216) at org.apache.pig.PigServer.compileLp(PigServer.java:883) at org.apache.pig.PigServer.store(PigServer.java:564) The binstorage data was generated from 2 datasets using limit and union: {code} Large1 = load 'input1' using PigStorage(); Large2 = load 'input2' using PigStorage(); V = limit Large1 1; C = limit Large2 1; U = union V, C; store U into 'binstoragesample' using BinStorage(); {code} -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846637#action_12846637 ] Pradeep Kamath commented on PIG-1287: - The unit test failures are because the Hadoop QA process is not using the hadoop.jar attached in this patch - I ran tests locally on my machine with the new jar and they all passed. Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846038#action_12846038 ] Pradeep Kamath commented on PIG-1257: - In the following case in inputData, the record will end with \r, won't it? (notice the \r in the middle, after the 2) {code} 1\t2\r3\t4, // '\r' case - this will be split into two tuples {code} PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
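The point of the comment above is that line-oriented readers (such as TextInputFormat's line reader) accept \r, \n, or \r\n as record delimiters, so a lone \r in the middle of the input splits it into two records. A quick standalone illustration in plain Java, independent of Pig's actual bzip record reader:

```java
// Illustrates why "1\t2\r3\t4" is read as two records: a lone \r is a
// valid record delimiter for line-oriented readers. Plain Java, not
// Pig's actual reader code.
public class CrRecordSketch {
    static String[] splitRecords(String data) {
        // \r\n first so a CRLF pair is consumed as one delimiter,
        // then a lone \r or \n
        return data.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        String[] records = splitRecords("1\t2\r3\t4");
        System.out.println(records.length);  // 2: the \r after the 2 ends a record
        for (String r : records) System.out.println(r);
    }
}
```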
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846080#action_12846080 ] Pradeep Kamath commented on PIG-1257: - I ran all unit tests on my local machines and also the test-patch ant target: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 12 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1302) Include zebra's
Include zebra's Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1302: Description: There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 Summary: Include zebra's pigtest ant target as a part of pig's ant test target (was: Include zebra's ) Include zebra's pigtest ant target as a part of pig's ant test target --- Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.7.0 There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1287: Attachment: PIG-1287-2.patch The new patch also fixes warning aggregation in PigHadoopLogger to use the counter support now available in hadoop 0.20.2 Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1287: Status: Patch Available (was: Open) Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846224#action_12846224 ] Pradeep Kamath commented on PIG-1205: - Jeff, if the only issue blocking the commit is the javac warning - unless the warning is due to use of a deprecated hadoop API, we should fix it - if it is due to a deprecated hadoop API then it's ok to ignore. Very soon trunk will be branched for Pig 0.7.0 - so if this feature is to make it into Pig 0.7.0, we should get it committed soon. Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0 Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845458#action_12845458 ] Pradeep Kamath commented on PIG-1285: - A couple of comments: * I think instead of the code below, the implementation of write should be inlined into SingleTupleBag.write() (I guess DefaultDataBag.write() and SingleTupleBag.write() could call a common method to implement write()). {noformat} +DataBag bag = bagFactory.newDefaultBag(); +bag.addAll(this); +bag.write(out) {noformat} The reason is that bagFactory.newDefaultBag() registers the bag with the SpillableMemoryManager, which in turn puts a weak reference to the bag on a linked list - in the past we have seen this list grow in size and cause memory issues, which was one of the main motivations for creating SingleTupleBag. * There is an implementation for write() but not read() - reading through the code I guess this is because during deserialization SingleTupleBag.read() will not be called but DefaultDataBag.read() would be called. I am wondering if leaving SingleTupleBag.read() as-is is confusing since it throws an exception with the message - SingleTupleBag should never be serialized or deserialized. Allow SingleTupleBag to be serialized - Key: PIG-1285 URL: https://issues.apache.org/jira/browse/PIG-1285 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: PIG-1285.patch Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The below Initial.exec() code fails at run-time with the message that a SingleTupleBag cannot be serialized: {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy.
if (in == null) return null; try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { resTuple.set(i, in.get(i)); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags. {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy. if (in == null) return null; /* * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions. */ Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap(); try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { Object obj = in.get(i); if (!isBagAtIndex.containsKey(i)) { isBagAtIndex.put(i, obj instanceof SingleTupleBag); } if (isBagAtIndex.get(i)) { DataBag newBag = bagFactory_.newDefaultBag(); newBag.addAll((DataBag)obj); obj = newBag; } resTuple.set(i, obj); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
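The review suggestion above - inlining write() via a common method shared by DefaultDataBag and SingleTupleBag, rather than copying into a fresh DefaultBag (which registers with the SpillableMemoryManager) - could look roughly like this standalone sketch. The types and the wire format here are simplified stand-ins, not Pig's actual bag serialization:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Sketch of a shared serialization helper that both a default bag and a
// single-tuple bag could call, so SingleTupleBag.write() needs no
// intermediate DefaultBag (and thus no SpillableMemoryManager
// registration). All names and the format are illustrative.
public class BagWriteSketch {
    // Stand-in for Pig's Tuple: here a tuple is just a list of strings.
    static void writeTuples(DataOutput out, long size, Iterator<List<String>> tuples)
            throws IOException {
        out.writeLong(size);                  // tuple count first, as a bag header
        while (tuples.hasNext()) {
            List<String> t = tuples.next();
            out.writeInt(t.size());           // field count per tuple
            for (String field : t) out.writeUTF(field);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // A "single tuple bag" serializes directly through the shared helper,
        // without allocating a spillable bag just to call its write().
        List<String> only = List.of("key", "value");
        writeTuples(new DataOutputStream(bytes), 1, List.of(only).iterator());
        System.out.println(bytes.size()); // 24: 8 (long) + 4 (int) + 5 + 7 (UTF fields)
    }
}
```

The design point is that the helper takes only an iterator and a count, so the single-tuple case pays no extra allocation and no weak-reference registration.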
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845606#action_12845606 ] Pradeep Kamath commented on PIG-1285: - SingleTupleBag did not go the route of extending DefaultAbstractBag for a couple of reasons: 1) The object would have a few more members (like the mMemSize* fields, mSize etc. which are present in DefaultAbstractBag) - this would make the object bigger in memory, and SingleTupleBag was designed to be used in the map/combine phase with minimal memory overhead 2) The first point in my previous comment - we don't want this bag to register with the SpillableMemoryManager, which in turn puts a weak reference to the bag on a linked list - in the past we have seen this list grow in size and itself cause memory issues Allow SingleTupleBag to be serialized - Key: PIG-1285 URL: https://issues.apache.org/jira/browse/PIG-1285 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: PIG-1285.patch Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The below Initial.exec() code fails at run-time with the message that a SingleTupleBag cannot be serialized: {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy. if (in == null) return null; try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { resTuple.set(i, in.get(i)); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags. {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy.
if (in == null) return null; /* * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions. */ Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap(); try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { Object obj = in.get(i); if (!isBagAtIndex.containsKey(i)) { isBagAtIndex.put(i, obj instanceof SingleTupleBag); } if (isBagAtIndex.get(i)) { DataBag newBag = bagFactory_.newDefaultBag(); newBag.addAll((DataBag)obj); obj = newBag; } resTuple.set(i, obj); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Open (was: Patch Available) PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1257-2.patch, PIG-1257.patch PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: blockHeaderEndsAt136500.txt.bz2 blockEndingInCR.txt.bz2 PIG-1257-3.patch Since the last patch, I uncovered some issues with the code while testing some boundary conditions. I have fixed those in the new patch PIG-1257-3.patch and included those boundary conditions as test cases in TestBZip. PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Patch Available (was: Open) PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: recordLossblockHeaderEndsAt136500.txt.bz2 The .bz2 files attached to this issue should be put in test/org/apache/pig/test/data for this patch to pass unit tests. PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1292) Interface Refinements
[ https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845717#action_12845717 ] Pradeep Kamath commented on PIG-1292: - As Xuefu mentioned, we can get rid of the splitIdx argument in public WritableComparable<?> getSplitComparable(InputSplit split, int splitIdx). Otherwise the changes look good, +1 for commit with the above change. Interface Refinements - Key: PIG-1292 URL: https://issues.apache.org/jira/browse/PIG-1292 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1292.patch, pig-interfaces.patch A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both are abstract classes instead of being interfaces. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Core tests ran successfully on my machine and looking at the test report the failures seem transient. I haven't included new tests in this patch since an existing test covers the change in this patch. Patch committed. WeightedRangePartitioner should not check if input is empty if quantile file is empty - Key: PIG-1290 URL: https://issues.apache.org/jira/browse/PIG-1290 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1290.patch Currently WeightedRangePartitioner checks if the input is also empty if the quantile file is empty. For this it tries to read the input (which under the covers will result in creating splits for the input etc). If the input is a directory with many files, this could result in many calls to the namenode from each task - this can be avoided. If the input is non empty and quantile file is empty, then we would error out anyway (this should be confirmed). Also while fixing this jira we should ensure that pig can still do order by on empty input. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
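The optimization resolved above - skipping the input-emptiness probe when the quantile file is empty - amounts to an early return before any input listing happens. A rough standalone sketch; the names and structure are illustrative, not the actual WeightedRangePartitioner code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the fix: if the quantile file is empty, fall back to a single
// partition immediately, without listing the input (which, for a directory
// with many files, would otherwise mean many namenode calls from each
// task). Names are illustrative, not Pig's actual partitioner code.
public class QuantilePartitionerSketch {
    static final AtomicInteger namenodeCalls = new AtomicInteger();

    // Stand-in for the expensive "is the input empty?" probe, which under
    // the covers creates splits and hits the namenode per input file.
    static boolean inputIsEmpty(List<String> inputFiles) {
        namenodeCalls.addAndGet(inputFiles.size()); // one listing per file
        return inputFiles.isEmpty();
    }

    static int numPartitions(List<String> quantiles, List<String> inputFiles) {
        if (quantiles.isEmpty()) {
            return 1; // early return: the input is never probed
        }
        if (inputIsEmpty(inputFiles)) return 1;
        return quantiles.size() + 1;
    }

    public static void main(String[] args) {
        int n = numPartitions(List.of(), List.of("part-0", "part-1", "part-2"));
        System.out.println(n);                   // 1
        System.out.println(namenodeCalls.get()); // 0: input was never listed
    }
}
```

This also preserves the behavior noted in the issue: an order-by over empty input still gets a valid (single-partition) plan rather than an error.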
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Open (was: Patch Available) WeightedRangePartitioner should not check if input is empty if quantile file is empty - Key: PIG-1290 URL: https://issues.apache.org/jira/browse/PIG-1290 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1290.patch Currently WeightedRangePartitioner checks if the input is also empty if the quantile file is empty. For this it tries to read the input (which under the covers will result in creating splits for the input etc). If the input is a directory with many files, this could result in many calls to the namenode from each task - this can be avoided. If the input is non empty and quantile file is empty, then we would error out anyway (this should be confirmed). Also while fixing this jira we should ensure that pig can still do order by on empty input. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Patch Available (was: Open) Looks like the unit test failure was due to another check-in which has now been fixed - resubmitting.
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Patch Available (was: Open) Again there seem to be transient unrelated test failures - am resubmitting one more time - will also kick off a unit test run on my machine.
[jira] Assigned: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath reassigned PIG-1290: --- Assignee: Pradeep Kamath
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Attachment: PIG-1290.patch Attached patch removes the check in WeightedRangePartitioner that the input is empty when the quantile file is empty. There is already a test - testEmptyStore in TestEvalPipeline2 - which verifies that pig handles order by on empty files, so this patch does not include any new tests.
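The behavior the patch describes can be sketched in isolation: if the quantile file yielded no cut points, the partitioner simply sends everything to partition 0 instead of probing the input (which would create splits and hit the namenode from every task). This is a simplified, hypothetical stand-in, not Pig's actual WeightedRangePartitioner class; the class name, int keys, and binary-search lookup are illustrative assumptions.

```java
// Simplified sketch of the partitioning logic after the PIG-1290 change
// (hypothetical class, not Pig's actual implementation).
import java.util.Arrays;

class RangePartitionSketch {
    private final int[] quantiles; // sorted cut points loaded from the quantile file

    RangePartitionSketch(int[] quantiles) {
        this.quantiles = quantiles;
    }

    int getPartition(int key) {
        // Empty quantile file: no range information, so do not touch the
        // input (no split creation, no namenode calls) - use one partition.
        if (quantiles.length == 0) {
            return 0;
        }
        // Otherwise route the key to the range it falls into.
        int idx = Arrays.binarySearch(quantiles, key);
        return idx < 0 ? -(idx + 1) : idx;
    }
}
```

If the input turns out to be non-empty while the quantile file is empty, the job errors out elsewhere, which is why the extra check was safe to drop.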
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843949#action_12843949 ] Pradeep Kamath commented on PIG-1205: - Jeff, unless the warning is due to use of a deprecated hadoop API, we should fix it - if it is due to a deprecated hadoop API then it's ok to ignore. Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0 Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch
[jira] Commented: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843381#action_12843381 ] Pradeep Kamath commented on PIG-1287: - 0.20.2 is supposed to be backward compatible with 0.20.1 - I am also running some tests on a 0.20.1 cluster to ensure that there are no failures due to incompatibilities. Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release
[jira] Updated: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1205: Status: Patch Available (was: Open)
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841937#action_12841937 ] Pradeep Kamath commented on PIG-1205: - Review comments: 1) The top level comment in HBaseStorage reads "A Hbase loader" - I am wondering if it is worth keeping it a loader (maybe change the name to HBaseLoader) and creating a separate Storer which extends StoreFunc rather than having HBaseStorage implement StoreFuncInterface - by extending StoreFunc, if new methods with default implementations are added then the Storer will not need to change. The disadvantage is that if we call the loader HBaseLoader, existing users of HBaseStorage would have to change their scripts to use HBaseLoader instead. This is just a suggestion - I am fine if HBaseStorage does both load and store and implements StoreFuncInterface - Jeff, I will let you decide which is better. If you choose to do both load and store in HBaseStorage, change the top level comment accordingly. 2) The following method implementation should change from:
{code}
@Override
public String relToAbsPathForStoreLocation(String location, Path curDir) throws IOException {
    // TODO Auto-generated method stub
    return null;
}
{code}
to
{code}
@Override
public String relToAbsPathForStoreLocation(String location, Path curDir) throws IOException {
    return location;
}
{code}
Also, do address the javadoc/javac issues reported above. If the above are addressed, +1 for the patch (I don't have enough HBase knowledge to review the HBase specific code - I have only reviewed the use of the load/store API).
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Status: Patch Available (was: Open) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface - Key: PIG-1265 URL: https://issues.apache.org/jira/browse/PIG-1265 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1265-2.patch, PIG-1265.patch Speaking to the hadoop team folks, the direction in hadoop is to use Job instead of Configuration - for example, InputFormat/OutputFormat implementations use Job to store the input/output location. So pig should also do the same in LoadMetadata and StoreMetadata to be closer to hadoop. Currently when a job fails, pig assumes the output locations (corresponding to the stores in the job) are hdfs locations and attempts to delete them. Since output locations could be non-hdfs locations, this cleanup should be delegated to the StoreFuncInterface implementation - hence a new method - cleanupOnFailure() - should be introduced in StoreFuncInterface, and a default implementation should be provided in the StoreFunc abstract class which checks if the location exists on hdfs and deletes it if so.
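The default cleanupOnFailure contract described above - check whether the store location exists and delete it if so - can be sketched as follows. This is a hedged illustration, not Pig's actual StoreFunc: the real default would go through Hadoop's FileSystem API to talk to hdfs, while java.io.File stands in here purely so the sketch is self-contained.

```java
// Hypothetical stand-in for the StoreFunc default cleanup described in the
// jira; java.io.File is used in place of Hadoop's FileSystem for illustration.
import java.io.File;

abstract class StoreFuncSketch {
    // Default failure cleanup: delete the output location if it exists.
    // Returns true if the location is gone afterwards.
    public boolean cleanupOnFailure(String location) {
        File out = new File(location);
        if (!out.exists()) {
            return true; // nothing to clean up
        }
        return deleteRecursively(out);
    }

    private static boolean deleteRecursively(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) {
                    if (!deleteRecursively(c)) return false;
                }
            }
        }
        return f.delete();
    }
}
```

A StoreFuncInterface implementation writing to a non-hdfs target (an HBase table, say) would override this with its own notion of "delete the partial output", which is the point of delegating the cleanup.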
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Attachment: PIG-1265-2.patch There were some failures in zebra nightly tests which are addressed in the new patch.
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Status: Open (was: Patch Available)
[jira] Commented: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839530#action_12839530 ] Pradeep Kamath commented on PIG-1265: - All unit tests succeeded on a local run on my machine.
[jira] Created: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface - Key: PIG-1265 URL: https://issues.apache.org/jira/browse/PIG-1265 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Assignee: Pradeep Kamath (was: Pradeep Kamath) Status: Patch Available (was: Open)
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Attachment: PIG-1259-2.patch Patch to address unit test failures - some tests had a missing try-catch block. ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields) - Key: PIG-1259 URL: https://issues.apache.org/jira/browse/PIG-1259 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1259-2.patch, PIG-1259.patch Currently Schema.getPigSchema(ResourceSchema) does not allow a bag field in the ResourceSchema with a subschema containing anything other than a tuple. The tuple itself can have a schema with 1 or more subfields. This check should also be enforced in ResourceFieldSchema.setSchema()
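The invariant being enforced - a bag field's subschema must consist of exactly one tuple field, while the tuple itself may carry any number of subfields - can be sketched independently of Pig's actual ResourceFieldSchema class. The class and enum below are illustrative stand-ins, not Pig's real types.

```java
// Hypothetical sketch of the bag-schema check from PIG-1259
// (not Pig's actual ResourceFieldSchema).
class BagSchemaCheck {
    enum Type { BAG, TUPLE, INT, CHARARRAY }

    // A BAG field's subschema must be exactly one TUPLE field.
    static void validateBagSchema(Type[] subFields) {
        if (subFields.length != 1 || subFields[0] != Type.TUPLE) {
            throw new IllegalArgumentException(
                "bag schema must have a single tuple field as its subschema");
        }
    }

    static boolean isValidBagSchema(Type[] subFields) {
        try {
            validateBagSchema(subFields);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Putting the check in setSchema() itself means every path that constructs a ResourceFieldSchema rejects a malformed bag schema at construction time, not only the Schema.getPigSchema(ResourceSchema) conversion path.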
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Status: Open (was: Patch Available)
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838524#action_12838524 ] Pradeep Kamath commented on PIG-1205: - Jeff, the patch no longer applies cleanly on trunk - looks like we missed reviewing this earlier - sorry about that - can you regenerate this patch against trunk?
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Open (was: Patch Available) PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1257-2.patch, PIG-1257.patch PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: PIG-1257-2.patch Attached new patch to address unit test failures.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: PIG-1257.patch Attached patch builds an InputFormat (Bzip2TextInputFormat) on top of the existing CBZip2InputStream.
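What makes bzip files splittable at all is that each bzip2 compressed block begins with the 48-bit magic number 0x314159265359, so a record reader handed an arbitrary byte range can scan forward to the next block boundary and start decompressing there. The sketch below shows that scan in its simplest form; it is a byte-aligned simplification of the idea (real bzip2 blocks are not byte-aligned, so an actual reader such as the one built on CBZip2InputStream must scan at the bit level), and the class is hypothetical, not the patch's Bzip2TextInputFormat.

```java
// Byte-aligned sketch of finding the next bzip2 block boundary inside a
// split (illustrative only; real blocks are bit-aligned).
class BzipBlockScan {
    // 48-bit bzip2 compressed-block magic: 0x314159265359.
    static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    // Return the offset of the first block magic at or after 'from', or -1
    // if no block starts in the remaining bytes.
    static int nextBlockStart(byte[] data, int from) {
        outer:
        for (int i = from; i + BLOCK_MAGIC.length <= data.length; i++) {
            for (int j = 0; j < BLOCK_MAGIC.length; j++) {
                if (data[i + j] != BLOCK_MAGIC[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

A split whose start falls mid-block skips ahead to the next such boundary, and a split reads past its nominal end until the boundary after it, so every block is processed exactly once across tasks.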
[jira] Created: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields) - Key: PIG-1259 URL: https://issues.apache.org/jira/browse/PIG-1259 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.7.0
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Assignee: Pradeep Kamath Status: Patch Available (was: Open)
[jira] Commented: (PIG-1079) Modify merge join to use distributed cache to maintain the index
[ https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837408#action_12837408 ] Pradeep Kamath commented on PIG-1079: - +1 Modify merge join to use distributed cache to maintain the index Key: PIG-1079 URL: https://issues.apache.org/jira/browse/PIG-1079 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1079.patch, PIG-1079.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1250: Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface Key: PIG-1250 URL: https://issues.apache.org/jira/browse/PIG-1250 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface Key: PIG-1250 URL: https://issues.apache.org/jira/browse/PIG-1250 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1250: Status: Patch Available (was: Open) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface Key: PIG-1250 URL: https://issues.apache.org/jira/browse/PIG-1250 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1250.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
COMPLETED merge of load-store-redesign branch to trunk
The merge from load-store-redesign branch to trunk is now completed. New commits can now proceed on trunk. The load-store-redesign branch is deprecated with this merge and no more commits should be done on that branch. Pradeep From: Pradeep Kamath Sent: Thursday, February 18, 2010 11:20 AM To: Pradeep Kamath; 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: BEGINNING merge of load-store-redesign branch to trunk - hold off commits! Hi, I will begin this activity now - a request to all committers to not commit to trunk or load-store-redesign till I send an all clear message - I am anticipating this will hopefully be completed by end of day (Pacific time) tomorrow. Thanks, Pradeep From: Pradeep Kamath Sent: Tuesday, February 16, 2010 11:34 AM To: 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: Plan to merge load-store-redesign branch to trunk Hi, We would like to merge the load-store-redesign branch to trunk tentatively on Thursday. To do this, I would like to request all committers to not commit anything to load-store-redesign branch or trunk during the period of the merge. I will send out a mail to indicate begin and end of this activity - tentatively I am expecting this to be a day's period between 9 AM PST Thursday to 9AM PST Friday so I can resolve any conflicts and run all tests. Pradeep
[jira] Created: (PIG-1245) Remove the connection to namenode in HExecutionEngine.init()
Remove the connection to namenode in HExecutionEngine.init() Key: PIG-1245 URL: https://issues.apache.org/jira/browse/PIG-1245 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Fix For: 0.7.0 PigContext.connect() calls HExecutionEngine.init(). The former is called from the backend map/reduce tasks in DefaultIndexableLoader used in merge join. It is not clear that a connection to the namenode is required in HExecutionEngine.init(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836035#action_12836035 ] Pradeep Kamath commented on PIG-966: LoadFunc is now an abstract class with default implementations for some of the methods - we hope this will aid implementers. I would like to make the same change for StoreFunc. Since PigStorage currently does both load and store, we would need to also introduce an interface - StoreFuncInterface - so that PigStorage can extend LoadFunc and implement StoreFuncInterface. To be symmetrical, we would need to also introduce a LoadFuncInterface. This interface can be used by implementers if they want their LoadFunc implementation to extend some other class. We can document and strongly recommend that users only use our abstract classes since that would make them less vulnerable to incompatible additions in the future (hopefully when we add new methods into these abstract classes we will give a default implementation). I will upload a patch for this unless anyone has strong objections. Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1218) Use distributed cache to store samples
[ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1218: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed patch PIG-1218_2.patch since the merge join changes need to be re-worked and will be handled in a different patch. Thanks Richard! Use distributed cache to store samples -- Key: PIG-1218 URL: https://issues.apache.org/jira/browse/PIG-1218 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1218.patch, PIG-1218_2.patch, PIG-1218_3.patch Currently, in the case of skew join and order by we use a sample that is just written to the dfs (not the distributed cache) and, as a result, gets opened and copied around more than necessary. This impacts query performance and also places unnecessary load on the name node -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836080#action_12836080 ] Pradeep Kamath commented on PIG-966: In retrospect, I think we can skip creating a LoadFuncInterface since currently there is no real use case for an interface - we are adding it to keep symmetry with StoreFuncInterface and to allow implementations which extend other classes to implement this interface. The first motivation is not very strong, and the second can also be achieved through composition rather than inheritance - it is unclear how inheriting a different class would benefit a Loader implementation over using composition to delegate functionality. By introducing a LoadFuncInterface we would be exposing users who implement it to backward incompatible additions in the future. So I think we should not add a LoadFuncInterface now and add it ONLY if a real need arises. The rest of my proposal (making StoreFunc an abstract class and adding a new StoreFuncInterface) still holds. Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
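The tradeoff being discussed - an abstract class can later gain methods with default bodies without breaking subclasses, while a mirror interface serves implementations that must extend another class - can be sketched as follows. The names echo the proposal, but this is an illustrative model, not the real Pig API:

```java
// Illustrative sketch, NOT the actual Pig classes: an abstract base class
// with a default implementation, plus a mirror interface for classes that
// already extend something else (Java allows single inheritance only).
public class AbstractVsInterface {

    static abstract class LoadFunc {
        abstract String getNext();
        // Default implementation: a method added later with a body like this
        // keeps every existing subclass compiling.
        String relativeToAbsolutePath(String location) {
            return location;
        }
    }

    // Mirror interface: any addition here forces changes on all implementers,
    // which is the backward-compatibility risk mentioned above.
    interface StoreFuncInterface {
        void putNext(String tuple);
    }

    // A PigStorage-like class doing both load and store: it extends the
    // abstract loader and implements the store interface.
    static class MyStorage extends LoadFunc implements StoreFuncInterface {
        private String last;
        @Override
        String getNext() { return "record"; }
        @Override
        public void putNext(String tuple) { last = tuple; }
        String lastStored() { return last; }
    }
}
```

This is why the comment recommends the abstract classes to users: a new abstract-class method shipped with a default body is source-compatible, whereas a new interface method is not (in the Java version of that era, interfaces had no default methods).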
BEGINNING merge of load-store-redesign branch to trunk - hold off commits!
Hi, I will begin this activity now - a request to all committers to not commit to trunk or load-store-redesign till I send an all clear message - I am anticipating this will hopefully be completed by end of day (Pacific time) tomorrow. Thanks, Pradeep From: Pradeep Kamath Sent: Tuesday, February 16, 2010 11:34 AM To: 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: Plan to merge load-store-redesign branch to trunk Hi, We would like to merge the load-store-redesign branch to trunk tentatively on Thursday. To do this, I would like to request all committers to not commit anything to load-store-redesign branch or trunk during the period of the merge. I will send out a mail to indicate begin and end of this activity - tentatively I am expecting this to be a day's period between 9 AM PST Thursday to 9AM PST Friday so I can resolve any conflicts and run all tests. Pradeep
[jira] Updated: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front
[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1216: Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to load-store-redesign branch - Thanks Ashutosh! Note that only outputs will be validated up front (in line with Pig 0.6.0) - inputs will not be validated up front since for the following case validating inputs is not easy: {code} ... store into 'foo'... load 'foo'... ... {code} New load store design does not allow Pig to validate inputs and outputs up front Key: PIG-1216 URL: https://issues.apache.org/jira/browse/PIG-1216 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Alan Gates Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1216.patch, pig-1216_1.patch In Pig 0.6 and before, Pig attempts to verify existence of inputs and non-existence of outputs during parsing to avoid run time failures when inputs don't exist or outputs can't be overwritten. The downside to this was that Pig assumed all inputs and outputs were HDFS files, which made implementation harder for non-HDFS based load and store functions. In the load store redesign (PIG-966) this was delegated to InputFormats and OutputFormats to avoid this problem and to make use of the checks already being done in those implementations. Unfortunately, for Pig Latin scripts that run more than one MR job, this does not work well. MR does not do input/output verification on all the jobs at once. It does them one at a time. So if a Pig Latin script results in 10 MR jobs and the file to store to at the end already exists, the first 9 jobs will be run before the 10th job discovers that the whole thing was doomed from the beginning. To avoid this a validate call needs to be added to the new LoadFunc and StoreFunc interfaces.
Pig needs to pass this method enough information that the load function implementer can delegate to InputFormat.getSplits() and the store function implementer to OutputFormat.checkOutputSpecs() if s/he decides to. Since 90% of all load and store functions use HDFS and PigStorage will also need to, the Pig team should implement a default file existence check on HDFS and make it available as a static method to other Load/Store function implementers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
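The fail-fast behavior described above - checking every job's output spec before the first job is submitted, instead of letting MR validate one job at a time - can be sketched generically. This is a hypothetical model with a simple path set standing in for HDFS; the real implementation delegates to OutputFormat.checkOutputSpecs():

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UpfrontValidation {
    // Check the outputs of ALL jobs in a plan before submitting any of them,
    // so a pre-existing final output fails the script up front instead of
    // after 9 of 10 jobs have already run.
    static String findInvalidOutput(List<String> plannedOutputs, Set<String> existingPaths) {
        for (String out : plannedOutputs) {
            if (existingPaths.contains(out)) {
                return out; // this output would doom the plan later; report now
            }
        }
        return null; // every planned output is safe to write
    }

    public static void main(String[] args) {
        List<String> outputs = Arrays.asList("/tmp/job1.out", "/user/final.out");
        Set<String> existing = new HashSet<>(Arrays.asList("/user/final.out"));
        System.out.println(findInvalidOutput(outputs, existing)); // /user/final.out
    }
}
```

Note this also shows why inputs cannot be validated the same way, per the comment above: an input such as 'foo' may be produced by an earlier store in the same script, so it legitimately does not exist at validation time.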
[jira] Updated: (PIG-1079) Modify merge join to use distributed cache to maintain the index
[ https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1079: Fix Version/s: 0.7.0 Assignee: Richard Ding Modify merge join to use distributed cache to maintain the index Key: PIG-1079 URL: https://issues.apache.org/jira/browse/PIG-1079 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Assignee: Richard Ding Fix For: 0.7.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1218) Use distributed cache to store samples
[ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834957#action_12834957 ] Pradeep Kamath commented on PIG-1218: - +1 Patch mostly looks good - a couple of comments: * In a couple of places, instead of using Configuration and JobConf based on PigMapReduce.sJobConf, you should create a new Configuration(false) and new JobConf(false) so we create fresh data structures without any properties coming from the Map reduce based data structures. * Since partitionFile is no longer used in POPartitionRearrange.java we should remove it. You can make these changes and go ahead and commit it if it passes tests Use distributed cache to store samples -- Key: PIG-1218 URL: https://issues.apache.org/jira/browse/PIG-1218 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1218.patch, PIG-1218_2.patch Currently, in the case of skew join and order by we use a sample that is just written to the dfs (not the distributed cache) and, as a result, gets opened and copied around more than necessary. This impacts query performance and also places unnecessary load on the name node -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issues in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1213) Schema serialization is broken
[ https://issues.apache.org/jira/browse/PIG-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1213: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch was committed to trunk and branch-0.6 on 01 Feb 2010 Schema serialization is broken -- Key: PIG-1213 URL: https://issues.apache.org/jira/browse/PIG-1213 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0 Attachments: PIG-1213.patch Consider a udf which needs to know the schema of its input in the backend while executing. To achieve this, the udf needs to store the schema into the UDFContext. Internally the UDFContext will serialize the schema into the jobconf. However this currently is broken and gives a Serialization exception -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Plan to merge load-store-redesign branch to trunk
Hi, We would like to merge the load-store-redesign branch to trunk tentatively on Thursday. To do this, I would like to request all committers to not commit anything to load-store-redesign branch or trunk during the period of the merge. I will send out a mail to indicate begin and end of this activity - tentatively I am expecting this to be a day's period between 9 AM PST Thursday to 9AM PST Friday so I can resolve any conflicts and run all tests. Pradeep
[jira] Updated: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
[ https://issues.apache.org/jira/browse/PIG-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1239: Attachment: PIG-1239-load-store-redesign-branch.patch PIG-1239-branch-0.6.patch Attached patches for branch-0.6 and load-store-redesign branch. Changes are: * PigContext.connect() does not create a JobClient - instead it creates and holds a JobConf object - callers have been changed to use the JobConf and create a JobClient * On the load-store-redesign branch, POMergeJoin no longer does a pc.connect since it is no longer needed PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 Attachments: PIG-1239-branch-0.6.patch, PIG-1239-load-store-redesign-branch.patch PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issue in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
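The change described in this patch is essentially lazy initialization: connect() holds only the cheap JobConf, and the JobClient - whose construction contacts the jobtracker - is built on first use. A generic sketch, with stub classes standing in for the real Hadoop types (this is not Pig's actual code):

```java
// Lazy-initialization sketch of the PIG-1239 change. JobConfStub and
// JobClientStub are hypothetical stand-ins for Hadoop's JobConf/JobClient.
public class LazyClient {
    static class JobConfStub { }

    static class JobClientStub {
        final JobConfStub conf;
        // In the real JobClient, construction is where the jobtracker
        // connection happens - the expensive step we want to defer.
        JobClientStub(JobConfStub conf) { this.conf = conf; }
    }

    private final JobConfStub conf = new JobConfStub(); // held from connect()
    private JobClientStub client;                       // not created yet

    // Build the client only when a caller actually needs it, then reuse it.
    JobClientStub getJobClient() {
        if (client == null) {
            client = new JobClientStub(conf);
        }
        return client;
    }
}
```

With this shape, map tasks that call connect() (e.g. via POMergeJoin/POFRJoin) never trigger a jobtracker connection unless they explicitly ask for the client.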
[jira] Commented: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front
[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834411#action_12834411 ] Pradeep Kamath commented on PIG-1216: - Review comments: * Is it ok to call outputSpecs multiple times (since we will now be calling it in the visitor and Hadoop will be calling it later when the job is launched) - hope that does not break the contract per Hadoop's OutputFormat interface * The test case for validation failure should ensure that PlanValidationException is indeed thrown (through some boolean flag?) - currently the code has: {code} } catch (PlanValidationException pve){ + // We expect this to happen. +} {code} * import org.omg.PortableInterceptor.SUCCESSFUL; in TestStore.java seems accidental - if you will be submitting a new patch for the above comment, you can remove this import also. Otherwise looks good. New load store design does not allow Pig to validate inputs and outputs up front Key: PIG-1216 URL: https://issues.apache.org/jira/browse/PIG-1216 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Alan Gates Assignee: Ashutosh Chauhan Attachments: pig-1216.patch In Pig 0.6 and before, Pig attempts to verify existence of inputs and non-existence of outputs during parsing to avoid run time failures when inputs don't exist or outputs can't be overwritten. The downside to this was that Pig assumed all inputs and outputs were HDFS files, which made implementation harder for non-HDFS based load and store functions. In the load store redesign (PIG-966) this was delegated to InputFormats and OutputFormats to avoid this problem and to make use of the checks already being done in those implementations. Unfortunately, for Pig Latin scripts that run more than one MR job, this does not work well. MR does not do input/output verification on all the jobs at once. It does them one at a time.
So if a Pig Latin script results in 10 MR jobs and the file to store to at the end already exists, the first 9 jobs will be run before the 10th job discovers that the whole thing was doomed from the beginning. To avoid this a validate call needs to be added to the new LoadFunc and StoreFunc interfaces. Pig needs to pass this method enough information that the load function implementer can delegate to InputFormat.getSplits() and the store function implementer to OutputFormat.checkOutputSpecs() if s/he decides to. Since 90% of all load and store functions use HDFS and PigStorage will also need to, the Pig team should implement a default file existence check on HDFS and make it available as a static method to other Load/Store function implementers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
[ https://issues.apache.org/jira/browse/PIG-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834443#action_12834443 ] Pradeep Kamath commented on PIG-1239: - * No unit tests are included in either patch since this is difficult to capture in a unit test - manual tests were done to ensure that connections to the JobTracker no longer happen from a script using replicated join. * Release audit warnings are due to diffs in html docs * The extra javac warnings are due to use of JobConf which is deprecated - I have added suppressWarning tags which don't seem to help. We need to use JobConf here and there is no way around the warning. Results from running test-patch ant target for branch-0.6 [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 391 release audit warnings (more than the trunk's current 389 warnings). [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == Results from running test-patch ant target for load-store-redesign branch: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] -1 javac.
The applied patch generated 105 javac compiler warnings (more than the trunk's current 103 warnings). [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 Attachments: PIG-1239-branch-0.6.patch, PIG-1239-load-store-redesign-branch.patch PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issue in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
[ https://issues.apache.org/jira/browse/PIG-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1239. - Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to branch-0.6 and load-store-redesign branch. PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 Attachments: PIG-1239-branch-0.6.patch, PIG-1239-load-store-redesign-branch.patch PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issue in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1234: Fix Version/s: 0.7.0 Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Fix For: 0.7.0 Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1234: Attachment: PIG-1234.patch Patch against load-store-redesign branch which fixes this - the code was trying to validate the scheme supplied in the load location vs. the scheme of the current directory path (which is always hdfs). The patch changes this so the check is not done if the location is a valid url with an authority. Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
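The scheme-and-authority test described in the patch can be modeled with java.net.URI. This is an illustrative sketch, not Pig's actual code; the method name is hypothetical:

```java
import java.net.URI;

public class SchemeCheck {
    // Return true when the load location is a full URL carrying both a
    // scheme and an authority (e.g. har://hdfs-namenode/...); per the fix
    // described above, such locations should skip the comparison against
    // the current directory's scheme (which is always hdfs).
    static boolean hasSchemeAndAuthority(String location) {
        URI uri = URI.create(location);
        return uri.getScheme() != null && uri.getAuthority() != null;
    }

    public static void main(String[] args) {
        System.out.println(hasSchemeAndAuthority("har://hdfs-namenode/user/tsz/t20.har/t20")); // true
        System.out.println(hasSchemeAndAuthority("/user/tsz/t20")); // false: plain path
    }
}
```

For the failing example in this issue, the `har` scheme and `hdfs-namenode` authority are both present, so the check would be bypassed and the location handed through to the InputFormat.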
[jira] Updated: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1234: Assignee: Pradeep Kamath Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833221#action_12833221 ] Pradeep Kamath commented on PIG-1234: - Results from running test-patch ant target: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. I am currently running unit tests against this patch on load-store-redesign branch. Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833231#action_12833231 ] Pradeep Kamath commented on PIG-1234: - A couple of observations about the load statement in the script posted above: 1) '\n' is not a valid argument for PigStorage - the argument is meant to be the field delimiter and '\n' cannot be used for a field delimiter (since it is considered to be the record delimiter by PigStorage) 2) The patch here only fixes the incorrect checking of the scheme in the url - whether har://.. resources can be read by PigStorage or not will depend on whether TextInputFormat can read har://.. resources. PigStorage simply passes the location on to TextInputFormat which does the real reading. Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832202#action_12832202 ]

Pradeep Kamath commented on PIG-1234:
-------------------------------------

Can you try using a pig.jar compiled from the load-store-redesign branch - http://svn.apache.org/repos/asf/hadoop/pig/branches/load-store-redesign ?

> Unable to create input slice for har:// files
> Key: PIG-1234  URL: https://issues.apache.org/jira/browse/PIG-1234
[jira] Resolved: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath resolved PIG-1090.
---------------------------------

    Resolution: Fixed

+1 for PIG-1090-22.patch, patch committed. Closing this jira as resolved since all changes to accommodate the new load-store interfaces have now been checked in.

> Update sources to reflect recent changes in load-store interfaces
> -----------------------------------------------------------------
>
>                 Key: PIG-1090
>                 URL: https://issues.apache.org/jira/browse/PIG-1090
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>         Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, PIG-1090-20.patch, PIG-1090-21.patch, PIG-1090-22.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
>
> There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub-section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Resolved: (PIG-1228) \
[ https://issues.apache.org/jira/browse/PIG-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath resolved PIG-1228.
---------------------------------

    Resolution: Invalid

Seems like a jira created by accident.

> \
> Key: PIG-1228  URL: https://issues.apache.org/jira/browse/PIG-1228
> Project: Pig  Issue Type: Sub-task  Reporter: Pradeep Kamath
[jira] Created: (PIG-1228) \
> \
> Key: PIG-1228  URL: https://issues.apache.org/jira/browse/PIG-1228
> Project: Pig  Issue Type: Sub-task  Reporter: Pradeep Kamath
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1090:
--------------------------------

    Attachment: PIG-1090-21.patch

Attached patch to handle calling storeSchema of the StoreMetadata interface in local mode (currently there is a hadoop bug, https://issues.apache.org/jira/browse/MAPREDUCE-1447, which prevents the current code from making this call in local mode). The patch is a workaround till hadoop fixes the bug - in MapReduceLauncher, we explicitly call this method for successful stores.

> Update sources to reflect recent changes in load-store interfaces
> Key: PIG-1090  URL: https://issues.apache.org/jira/browse/PIG-1090
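The shape of the workaround described above - the launcher walking the successful stores after a local-mode job and invoking storeSchema on any store function that implements StoreMetadata - can be sketched as follows. The interface and class bodies here are simplified stand-ins (the real StoreMetadata.storeSchema takes Pig's schema and job objects), not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Pig's store interfaces; not the actual Pig source.
interface StoreFunc { }

interface StoreMetadata extends StoreFunc {
    // In Pig this receives the result schema and job context; simplified here.
    void storeSchema(String schema, String location);
}

class LocalModeSchemaWorkaround {
    // After a job succeeds in local mode, explicitly call storeSchema for
    // each successful store that implements StoreMetadata, since the
    // local-mode output committer never makes the call (MAPREDUCE-1447).
    static int commitSchemas(List<StoreFunc> successfulStores,
                             String schema, String location) {
        int committed = 0;
        for (StoreFunc store : successfulStores) {
            if (store instanceof StoreMetadata) {
                ((StoreMetadata) store).storeSchema(schema, location);
                committed++;
            }
        }
        return committed;
    }
}
```

Stores that do not implement StoreMetadata are simply skipped, so the workaround is a no-op for plain store functions.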
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829242#action_12829242 ]

Pradeep Kamath commented on PIG-1090:
-------------------------------------

test-patch results for PIG-1090-21.patch:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included. The patch appears to include 6 new or modified tests.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.

> Update sources to reflect recent changes in load-store interfaces
> Key: PIG-1090  URL: https://issues.apache.org/jira/browse/PIG-1090