[jira] Commented: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851489#action_12851489 ] Pradeep Kamath commented on PIG-1338: - I haven't done a full review but had a comment on one of the changes which is pretty important:
{noformat}
Index: src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java
===================================================================
--- src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java (revision 928370)
+++ src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java (working copy)
@@ -30,7 +30,9 @@
 public static Configuration toConfiguration(Properties properties) {
     assert properties != null;
-    final Configuration config = new Configuration();
+    final Configuration config = new Configuration(false);
+    config.addResource("core-default.xml");
+    config.addResource("mapred-default.xml");
     final Enumeration<Object> iter = properties.keys();
     while (iter.hasMoreElements()) {
         final String key = (String) iter.nextElement();
{noformat}
Looking at the Configuration class's implementation I found the following code:
{noformat}
static {
    // print deprecation warning if hadoop-site.xml is found in classpath
    ClassLoader cL = Thread.currentThread().getContextClassLoader();
    if (cL == null) {
        cL = Configuration.class.getClassLoader();
    }
    if (cL.getResource("hadoop-site.xml") != null) {
        LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
            "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, " +
            "mapred-site.xml and hdfs-site.xml to override properties of " +
            "core-default.xml, mapred-default.xml and hdfs-default.xml " +
            "respectively");
    }
    addDefaultResource("core-default.xml");
    addDefaultResource("core-site.xml");
}

private void loadResources(Properties properties, ArrayList resources, boolean quiet) {
    if (loadDefaults) {
        for (String resource : defaultResources) {
            loadResource(properties, resource, quiet);
        }
        // support the hadoop-site.xml as a deprecated case
        if (getResource("hadoop-site.xml") != null) {
            loadResource(properties, "hadoop-site.xml", quiet);
        }
    }
    for (Object resource : resources) {
        loadResource(properties, resource, quiet);
    }
}
{noformat}
There are two questions about the code in Configuration vs. the change in this patch:
1) In the patch, core-default.xml and mapred-default.xml are added as resources, while in Configuration, core-default.xml and core-site.xml are added by default.
2) In the patch, hadoop-site.xml is not considered, while in Configuration it is - so if a hadoop 20.x cluster is installed with hadoop-site.xml configured and without the other .xml files (like core-default.xml etc.), then Pig would not get the cluster config information, right?
Pig should exclude hadoop conf in local mode Key: PIG-1338 URL: https://issues.apache.org/jira/browse/PIG-1338 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Attachments: PIG-1338-1.patch, PIG-1338-2.patch Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will go wrong
We should change this to a more intuitive behavior:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
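The proposed behavior above amounts to a small decision table. A minimal sketch, assuming nothing about Pig's actual launcher code (the class name, method name, and returned strings here are all hypothetical, invented purely to illustrate the policy):

```java
// Hypothetical sketch of the conf-lookup policy proposed in PIG-1338.
// None of these names exist in Pig; this only encodes the decision
// table from the issue description.
public class ConfLookupPolicy {
    static String decide(String execMode, boolean hadoopConfFound) {
        if (execMode.equals("local")) {
            // local mode: always launch locally, regardless of any hadoop conf
            return "launch local mode";
        }
        // hadoop mode: a cluster conf is mandatory
        return hadoopConfFound
                ? "launch with cluster conf"
                : "bail out with a meaningful message";
    }

    public static void main(String[] args) {
        System.out.println(decide("local", true));      // launch local mode
        System.out.println(decide("mapreduce", false)); // bail out with a meaningful message
    }
}
```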
[jira] Commented: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850266#action_12850266 ] Pradeep Kamath commented on PIG-1316: - test-patch ant target results from running locally:
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 13 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec] Finished build.
Am also running the unit tests locally - the tests require some manual data setup (the data file for the test needs to be created before the test run - the patch cannot handle these actions) - will update with results. TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850386#action_12850386 ] Pradeep Kamath commented on PIG-1316: - All unit tests passed - will commit shortly:
...
[junit] Running org.apache.pig.test.TestUnion
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 49.523 sec
test-contrib:
BUILD SUCCESSFUL
Total time: 278 minutes 11 seconds
TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1316: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1317.patch In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Attachment: PIG-1316.patch Attached patch implements the change to cache the results of LoadMetadata.getSchema for use in future calls. LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Status: Open (was: Patch Available) Attached wrong patch file LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Status: Patch Available (was: Open) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Attachment: (was: PIG-1316.patch) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Status: Patch Available (was: Open) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1317.patch In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Attachment: PIG-1317.patch Attached correct patch file now. LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1317.patch In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1316: Attachment: PIG-1316.patch Attached patch makes the required changes in TextLoader to use BZip2TextInputFormat if the load location ends with the extension .bz or .bz2, like PigStorage. Also, for non-bzip data, TextLoader will now use PigTextInputFormat rather than TextInputFormat so that input directories can be recursively traversed. I have also changed BZip2TextInputFormat to extend PigFileInputFormat instead of FileInputFormat for the same reason. TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1316.patch Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
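The extension-based dispatch described above (mirroring PigStorage's handling of .bz/.bz2 locations) can be sketched as follows. This is only an illustration; the class and method names are hypothetical and the real selection happens inside TextLoader's getInputFormat logic:

```java
// Hypothetical sketch of choosing an input format by file extension,
// in the spirit of the PIG-1316 change. Names are illustrative, not
// Pig's actual code; the returned strings stand in for the classes.
public class InputFormatChooser {
    static String chooseFormat(String location) {
        // PigStorage-style check: .bz and .bz2 locations go to the
        // bzip2-aware input format, which can split compressed files
        if (location.endsWith(".bz") || location.endsWith(".bz2")) {
            return "Bzip2TextInputFormat";
        }
        // everything else uses PigTextInputFormat, which can also
        // recursively traverse input directories
        return "PigTextInputFormat";
    }

    public static void main(String[] args) {
        System.out.println(chooseFormat("/data/logs.bz2")); // Bzip2TextInputFormat
        System.out.println(chooseFormat("/data/logs.txt")); // PigTextInputFormat
    }
}
```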
[jira] Updated: (PIG-1308) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1308: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1308.patch Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker. This occurs with trunk and not with Pig 0.6 branch.
{code}
data = load 'binstoragesample' using BinStorage() as (s, m, l);
A = foreach ULT generate s#'key' as value;
X = limit A 20;
dump X;
{code}
When this script is submitted to the Jobtracker, we found the following error:
2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
Stack trace revealed:
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115)
at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404)
at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167)
at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263)
at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112)
at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210)
at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildProjectionMaps(LogicalTransformer.java:76)
at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:216)
at org.apache.pig.PigServer.compileLp(PigServer.java:883)
at org.apache.pig.PigServer.store(PigServer.java:564)
The binstorage data was generated from 2 datasets using limit and union:
{code}
Large1 = load 'input1' using PigStorage();
Large2 = load 'input2' using PigStorage();
V = limit Large1 1;
C = limit Large2 1;
U = union V, C;
store U into 'binstoragesample' using BinStorage();
{code}
-- This message is
automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files Key: PIG-1316 URL: https://issues.apache.org/jira/browse/PIG-1316 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Currently TextLoader uses TextInputFormat which does not split bzip files - this can be fixed by using Bzip2TextInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
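The caching described above is a standard lazy-memoization pattern. A minimal sketch, assuming nothing about the actual LOLoad internals (the class name, field, and schema string below are all hypothetical stand-ins for the result of LoadMetadata.getSchema()):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of caching an expensive schema lookup, in the
// spirit of PIG-1317. The fetch method stands in for reading the
// input file or contacting a metadata service.
public class SchemaCacheDemo {
    static final AtomicInteger expensiveCalls = new AtomicInteger();

    static String fetchSchemaFromMetadata() {
        // potentially expensive: file read or metadata-service call
        expensiveCalls.incrementAndGet();
        return "s:chararray, m:map[], l:long";
    }

    private String cachedSchema;

    String determineSchema() {
        // only pay the cost once; later calls return the cached copy
        if (cachedSchema == null) {
            cachedSchema = fetchSchemaFromMetadata();
        }
        return cachedSchema;
    }

    public static void main(String[] args) {
        SchemaCacheDemo load = new SchemaCacheDemo();
        load.determineSchema();
        load.determineSchema(); // served from the cache
        System.out.println(expensiveCalls.get());
    }
}
```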
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1317: Assignee: Pradeep Kamath LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() - Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema() - the latter call could potentially be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema - determineSchema() can cache the schema it gets so that subsequent calls use the cached version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
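The mechanism described above (a marker in the job configuration, set by the framework and read by the loader) can be sketched with plain java.util.Properties as a stand-in for Hadoop's Configuration. This is a hedged illustration only: the class, methods, and in particular the property key are invented, not the key the actual patch defines:

```java
import java.util.Properties;

// Hypothetical sketch of the PIG-1323 idea: the framework marks the
// configuration before calling setLocation() on the frontend, and the
// loader checks the marker. The key name is invented for illustration.
public class FrontendBackendFlag {
    static final String KEY = "example.setlocation.in.frontend";

    // what the framework would do on the client machine
    static void markFrontend(Properties conf) {
        conf.setProperty(KEY, "true");
    }

    // what a loader's setLocation() could check
    static boolean isFrontend(Properties conf) {
        // an absent key means we are in a backend (map task) context
        return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        markFrontend(conf);
        System.out.println(isFrontend(conf));             // true
        System.out.println(isFrontend(new Properties())); // false
    }
}
```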
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1323: Status: Patch Available (was: Open) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1323.patch Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1323: Attachment: PIG-1323.patch Attached patch addresses the issue in the description by setting state in the Configuration depending on where in PigInputFormat the LoadFunc.setLocation() method is called. No tests are included since testing this in a unit test framework is not feasible - I have manually tested this. Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1323.patch Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1325) Provide a way to exclude a testcase when running ant test
[ https://issues.apache.org/jira/browse/PIG-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1325: Attachment: PIG-1325.patch The patch allows excluding a particular test case from the ant test run. I am not submitting this to go through Hadoop QA since this is a build.xml change with nothing that can be tested by the Hadoop QA process. Provide a way to exclude a testcase when running ant test --- Key: PIG-1325 URL: https://issues.apache.org/jira/browse/PIG-1325 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1325.patch Provide a way to exclude a testcase when running ant test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
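A build.xml change of this kind is typically an `<exclude>` element driven by a command-line property. The fragment below is only a hedged sketch of that pattern; the property name `test.exclude` and the fileset layout are hypothetical, not taken from the attached patch:

```xml
<!-- Hypothetical sketch: skip one test class named via -Dtest.exclude.
     The property name and paths are illustrative, not the patch's. -->
<batchtest fork="yes" todir="${test.log.dir}">
  <fileset dir="test">
    <include name="**/Test*.java"/>
    <exclude name="**/${test.exclude}.java"/>
  </fileset>
</batchtest>
```

With such a setup, a run like `ant test -Dtest.exclude=TestUnion` would skip that one test class while running the rest.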
[jira] Commented: (PIG-1325) Provide a way to exclude a testcase when running ant test
[ https://issues.apache.org/jira/browse/PIG-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848446#action_12848446 ] Pradeep Kamath commented on PIG-1325: - I have tested locally that the change enables the feature requested. Provide a way to exclude a testcase when running ant test --- Key: PIG-1325 URL: https://issues.apache.org/jira/browse/PIG-1325 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1325.patch Provide a way to exclude a testcase when running ant test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1325) Provide a way to exclude a testcase when running ant test
[ https://issues.apache.org/jira/browse/PIG-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1325. - Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Patch committed. Provide a way to exclude a testcase when running ant test --- Key: PIG-1325 URL: https://issues.apache.org/jira/browse/PIG-1325 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1325.patch Provide a way to exclude a testcase when running ant test -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847721#action_12847721 ] Pradeep Kamath commented on PIG-1285: - yes Allow SingleTupleBag to be serialized - Key: PIG-1285 URL: https://issues.apache.org/jira/browse/PIG-1285 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: PIG-1285.patch Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The below Initial.exec() code fails at run-time with the message that a SingleTupleBag cannot be serialized:
{code}
@Override
public Tuple exec(Tuple in) throws IOException {
    // single record. just copy.
    if (in == null) return null;
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            resTuple.set(i, in.get(i));
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}
The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags.
{code}
@Override
public Tuple exec(Tuple in) throws IOException {
    // single record. just copy.
    if (in == null) return null;
    /*
     * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag
     * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions.
     */
    Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            Object obj = in.get(i);
            if (!isBagAtIndex.containsKey(i)) {
                isBagAtIndex.put(i, obj instanceof SingleTupleBag);
            }
            if (isBagAtIndex.get(i)) {
                DataBag newBag = bagFactory_.newDefaultBag();
                newBag.addAll((DataBag) obj);
                obj = newBag;
            }
            resTuple.set(i, obj);
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1308) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1308: Attachment: PIG-1308.patch The root cause of the issue is that the OpLimitOptimizer has a relaxed check() implementation which only checks whether the node matched by RuleMatcher is an LOLimit - true any time there is an LOLimit in the plan. This results in the optimizer running 500 (the current max) iterations of all rules since the OpLimitOptimizer always matches. The attached patch fixes the issue by tightening the implementation of OpLimitOptimizer.check() to return false in cases where LOLimit cannot be pushed up. Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1308.patch Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker. This occurs with trunk and not with Pig 0.6 branch.
{code} data = load 'binstoragesample' using BinStorage() as (s, m, l); A = foreach ULT generate s#'key' as value; X = limit A 20; dump X; {code} When this script is submitted to the JobTracker, we found the following error: 2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 Stack trace revealed: at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144) at
org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115) at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404) at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167) at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263) at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildProjectionMaps(LogicalTransformer.java:76) at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:216) at org.apache.pig.PigServer.compileLp(PigServer.java:883) at org.apache.pig.PigServer.store(PigServer.java:564) The binstorage data was generated from 2 datasets using limit and union: {code} Large1 = load 'input1' using PigStorage(); Large2 = load 'input2' using PigStorage(); V = limit Large1 1; C = limit Large2 1; U = union V, C; store U into 'binstoragesample' using BinStorage(); {code} -- This message
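The failure mode described in the fix above - a check() that matches whenever an LOLimit exists, even when nothing can be rewritten - can be illustrated with a standalone sketch. All class and method names below are hypothetical stand-ins, not Pig's actual optimizer API:

```java
import java.util.List;

// Minimal model of a rule-based optimizer loop with an iteration cap,
// showing why a check() that always matches forces the optimizer to run
// until the cap. Names are illustrative, not Pig's real classes.
public class OptimizerLoopSketch {
    interface Rule {
        boolean check(List<String> plan);    // should be false when no rewrite applies
        void transform(List<String> plan);
    }

    // Relaxed rule: matches whenever "LOLimit" is present, even when it
    // cannot actually be pushed up -- transform() is then a no-op.
    static final Rule RELAXED = new Rule() {
        public boolean check(List<String> plan) { return plan.contains("LOLimit"); }
        public void transform(List<String> plan) { /* cannot push up: no-op */ }
    };

    static int runUntilFixedPoint(Rule rule, List<String> plan, int maxIterations) {
        int i = 0;
        while (i < maxIterations && rule.check(plan)) {
            rule.transform(plan);
            i++;
        }
        return i; // iterations actually executed
    }

    public static void main(String[] args) {
        List<String> plan = new java.util.ArrayList<>(
                List.of("LOLoad", "LOForEach", "LOLimit"));
        // With the relaxed check, the loop only stops at the cap
        // (500 in Pig's optimizer at the time of this issue).
        int iters = runUntilFixedPoint(RELAXED, plan, 500);
        System.out.println(iters); // 500: every iteration "matched" but changed nothing
    }
}
```

Tightening check() to return false when the limit cannot be pushed makes the loop terminate on the first iteration where no rewrite applies, which is exactly what the patch does for OpLimitOptimizer.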
[jira] Updated: (PIG-1308) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2]
[ https://issues.apache.org/jira/browse/PIG-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1308: Status: Patch Available (was: Open) Infinite loop in JobClient when reading from BinStorage Message: [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2] Key: PIG-1308 URL: https://issues.apache.org/jira/browse/PIG-1308 Project: Pig Issue Type: Bug Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1308.patch Simple script fails to read files from BinStorage() and fails to submit jobs to JobTracker. This occurs with trunk and not with Pig 0.6 branch. {code} data = load 'binstoragesample' using BinStorage() as (s, m, l); A = foreach ULT generate s#'key' as value; X = limit A 20; dump X; {code} When this script is submitted to the JobTracker, we found the following error: 2010-03-18 22:31:22,296 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:01,574 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:32:43,276 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:33:21,743 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:02,004 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:34:43,442 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:35:25,907 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:07,402 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:36:48,596 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
paths to process : 2 2010-03-18 22:37:28,014 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:04,823 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:38:38,981 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 2010-03-18 22:39:12,220 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2 Stack trace revealed: at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:144) at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:115) at org.apache.pig.builtin.BinStorage.getSchema(BinStorage.java:404) at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:167) at org.apache.pig.impl.logicalLayer.LOLoad.getProjectionMap(LOLoad.java:263) at org.apache.pig.impl.logicalLayer.ProjectionMapCalculator.visit(ProjectionMapCalculator.java:112) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:210) at org.apache.pig.impl.logicalLayer.LOLoad.visit(LOLoad.java:52) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildProjectionMaps(LogicalTransformer.java:76) at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:216) at org.apache.pig.PigServer.compileLp(PigServer.java:883) at org.apache.pig.PigServer.store(PigServer.java:564) The binstorage data was generated from 2 datasets using limit and union: {code} Large1 = load 'input1' using PigStorage(); Large2 = load 'input2' using PigStorage(); V = limit Large1 1; C = limit Large2 1; U = union V, C; store U into 'binstoragesample' using BinStorage(); {code} -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846637#action_12846637 ] Pradeep Kamath commented on PIG-1287: - The unit test failures are because the Hadoop QA process is not using the hadoop.jar attached in this patch - I ran tests locally on my machine with the new jar and they all passed. Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846038#action_12846038 ] Pradeep Kamath commented on PIG-1257: - In the following case in inputData, the record will end with \r, won't it? (notice the \r in the middle, after the 2) {code} 1\t2\r3\t4, // '\r' case - this will be split into two tuples {code} PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
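The point of the comment above is that line-oriented readers (such as TextInputFormat's line reader) accept \r, \n, or \r\n as record delimiters, so a lone \r in the middle of the input splits it into two records. A quick standalone illustration in plain Java, independent of Pig's actual bzip record reader:

```java
// Illustrates why "1\t2\r3\t4" is read as two records: a lone \r is a
// valid record delimiter for line-oriented readers. Plain Java, not
// Pig's actual reader code.
public class CrRecordSketch {
    static String[] splitRecords(String data) {
        // \r\n first so a CRLF pair is consumed as one delimiter,
        // then a lone \r or \n
        return data.split("\r\n|\r|\n");
    }

    public static void main(String[] args) {
        String[] records = splitRecords("1\t2\r3\t4");
        System.out.println(records.length);  // 2: the \r after the 2 ends a record
        for (String r : records) System.out.println(r);
    }
}
```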
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846080#action_12846080 ] Pradeep Kamath commented on PIG-1257: - I ran all unit tests on my local machines and also the test-patch ant target: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 12 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1302) Include zebra's
Include zebra's Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1302: Description: There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 Summary: Include zebra's pigtest ant target as a part of pig's ant test target (was: Include zebra's ) Include zebra's pigtest ant target as a part of pig's ant test target --- Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.7.0 There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1287: Attachment: PIG-1287-2.patch The new patch also fixes warning aggregation in PigHadoopLogger to use the counter support now available in hadoop 0.20.2 Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1287: Status: Patch Available (was: Open) Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287-2.patch, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846224#action_12846224 ] Pradeep Kamath commented on PIG-1205: - Jeff, if the only issue blocking the commit is the javac warning - unless the warning is due to use of a deprecated hadoop API, we should fix it - if it is due to a deprecated hadoop API then it's ok to ignore. Very soon trunk will be branched for Pig 0.7.0 - so if this feature is to make it into Pig 0.7.0, we should get it committed soon. Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0 Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845458#action_12845458 ] Pradeep Kamath commented on PIG-1285: - A couple of comments: * I think instead of the code below, the implementation of write should be inlined into SingleTupleBag.write() (I guess DefaultDataBag.write() and SingleTupleBag.write() could call a common method to implement write()). {noformat} +DataBag bag = bagFactory.newDefaultBag(); +bag.addAll(this); +bag.write(out) {noformat} The reason is that bagFactory.newDefaultBag() registers the bag with the SpillableMemoryManager, which in turn puts a weak reference to the bag on a linked list - in the past we have seen this list grow in size and cause memory issues, which was one of the main motivations for creating SingleTupleBag. * There is an implementation for write() but not read() - reading through the code I guess this is because during deserialization SingleTupleBag.read() will not be called but DefaultDataBag.read() would be called. I am wondering if leaving SingleTupleBag.read() as-is is confusing since it throws an exception with the message - SingleTupleBag should never be serialized or deserialized. Allow SingleTupleBag to be serialized - Key: PIG-1285 URL: https://issues.apache.org/jira/browse/PIG-1285 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: PIG-1285.patch Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The below Initial.exec() code fails at run-time with the message that a SingleTupleBag cannot be serialized: {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy.
if (in == null) return null; try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { resTuple.set(i, in.get(i)); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags. {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy. if (in == null) return null; /* * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions. */ Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap(); try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { Object obj = in.get(i); if (!isBagAtIndex.containsKey(i)) { isBagAtIndex.put(i, obj instanceof SingleTupleBag); } if (isBagAtIndex.get(i)) { DataBag newBag = bagFactory_.newDefaultBag(); newBag.addAll((DataBag)obj); obj = newBag; } resTuple.set(i, obj); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
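The review suggestion above - inlining write() via a common method shared by DefaultDataBag and SingleTupleBag, rather than copying into a fresh DefaultBag (which registers with the SpillableMemoryManager) - could look roughly like this standalone sketch. The types and the wire format here are simplified stand-ins, not Pig's actual bag serialization:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Sketch of a shared serialization helper that both a default bag and a
// single-tuple bag could call, so SingleTupleBag.write() needs no
// intermediate DefaultBag (and thus no SpillableMemoryManager
// registration). All names and the format are illustrative.
public class BagWriteSketch {
    // Stand-in for Pig's Tuple: here a tuple is just a list of strings.
    static void writeTuples(DataOutput out, long size, Iterator<List<String>> tuples)
            throws IOException {
        out.writeLong(size);                  // tuple count first, as a bag header
        while (tuples.hasNext()) {
            List<String> t = tuples.next();
            out.writeInt(t.size());           // field count per tuple
            for (String field : t) out.writeUTF(field);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // A "single tuple bag" serializes directly through the shared helper,
        // without allocating a spillable bag just to call its write().
        List<String> only = List.of("key", "value");
        writeTuples(new DataOutputStream(bytes), 1, List.of(only).iterator());
        System.out.println(bytes.size()); // 24: 8 (long) + 4 (int) + 5 + 7 (UTF fields)
    }
}
```

The design point is that the helper takes only an iterator and a count, so the single-tuple case pays no extra allocation and no weak-reference registration.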
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845606#action_12845606 ] Pradeep Kamath commented on PIG-1285: - SingleTupleBag did not go the route of extending DefaultAbstractBag for a couple of reasons: 1) The object would have a few more members (like the mMemSize* fields, mSize etc. which are present in DefaultAbstractBag) - this would make the object bigger in memory, and SingleTupleBag was designed to be used in the map/combine phase with minimal memory overhead 2) The first point in my previous comment - we don't want this bag to register with the SpillableMemoryManager, which in turn puts a weak reference to the bag on a linked list - in the past we have seen this list grow in size and itself cause memory issues Allow SingleTupleBag to be serialized - Key: PIG-1285 URL: https://issues.apache.org/jira/browse/PIG-1285 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: PIG-1285.patch Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The below Initial.exec() code fails at run-time with the message that a SingleTupleBag cannot be serialized: {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy. if (in == null) return null; try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { resTuple.set(i, in.get(i)); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags. {code} @Override public Tuple exec(Tuple in) throws IOException { // single record. just copy.
if (in == null) return null; /* * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions. */ Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap(); try { Tuple resTuple = tupleFactory_.newTuple(in.size()); for (int i = 0; i < in.size(); i++) { Object obj = in.get(i); if (!isBagAtIndex.containsKey(i)) { isBagAtIndex.put(i, obj instanceof SingleTupleBag); } if (isBagAtIndex.get(i)) { DataBag newBag = bagFactory_.newDefaultBag(); newBag.addAll((DataBag)obj); obj = newBag; } resTuple.set(i, obj); } return resTuple; } catch (IOException e) { log.warn(e); return null; } } {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Open (was: Patch Available) PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1257-2.patch, PIG-1257.patch PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: blockHeaderEndsAt136500.txt.bz2 blockEndingInCR.txt.bz2 PIG-1257-3.patch Since the last patch, I uncovered some issues with the code while testing some boundary conditions. I have fixed those in the new patch PIG-1257-3.patch and included those boundary conditions as test cases in TestBZip. PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Patch Available (was: Open) PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: recordLossblockHeaderEndsAt136500.txt.bz2 The .bz2 files attached to this issue should be put in test/org/apache/pig/test/data for this patch to pass unit tests. PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: blockEndingInCR.txt.bz2, blockHeaderEndsAt136500.txt.bz2, PIG-1257-2.patch, PIG-1257-3.patch, PIG-1257.patch, recordLossblockHeaderEndsAt136500.txt.bz2 PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1292) Interface Refinements
[ https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845717#action_12845717 ] Pradeep Kamath commented on PIG-1292: - As Xuefu mentioned, we can get rid of the splitIdx argument in public WritableComparable<?> getSplitComparable(InputSplit split, int splitIdx). Otherwise the changes look good, +1 for commit with the above change. Interface Refinements - Key: PIG-1292 URL: https://issues.apache.org/jira/browse/PIG-1292 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1292.patch, pig-interfaces.patch A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both are abstract classes instead of being interfaces. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Core tests ran successfully on my machine and looking at the test report the failures seem transient. I haven't included new tests in this patch since an existing test covers the change in this patch. Patch committed. WeightedRangePartitioner should not check if input is empty if quantile file is empty - Key: PIG-1290 URL: https://issues.apache.org/jira/browse/PIG-1290 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1290.patch Currently WeightedRangePartitioner checks if the input is also empty if the quantile file is empty. For this it tries to read the input (which under the covers will result in creating splits for the input etc). If the input is a directory with many files, this could result in many calls to the namenode from each task - this can be avoided. If the input is non empty and quantile file is empty, then we would error out anyway (this should be confirmed). Also while fixing this jira we should ensure that pig can still do order by on empty input. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
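The optimization resolved above - skipping the input-emptiness probe when the quantile file is empty - amounts to an early return before any input listing happens. A rough standalone sketch; the names and structure are illustrative, not the actual WeightedRangePartitioner code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the fix: if the quantile file is empty, fall back to a single
// partition immediately, without listing the input (which, for a directory
// with many files, would otherwise mean many namenode calls from each
// task). Names are illustrative, not Pig's actual partitioner code.
public class QuantilePartitionerSketch {
    static final AtomicInteger namenodeCalls = new AtomicInteger();

    // Stand-in for the expensive "is the input empty?" probe, which under
    // the covers creates splits and hits the namenode per input file.
    static boolean inputIsEmpty(List<String> inputFiles) {
        namenodeCalls.addAndGet(inputFiles.size()); // one listing per file
        return inputFiles.isEmpty();
    }

    static int numPartitions(List<String> quantiles, List<String> inputFiles) {
        if (quantiles.isEmpty()) {
            return 1; // early return: the input is never probed
        }
        if (inputIsEmpty(inputFiles)) return 1;
        return quantiles.size() + 1;
    }

    public static void main(String[] args) {
        int n = numPartitions(List.of(), List.of("part-0", "part-1", "part-2"));
        System.out.println(n);                   // 1
        System.out.println(namenodeCalls.get()); // 0: input was never listed
    }
}
```

This also preserves the behavior noted in the issue: an order-by over empty input still gets a valid (single-partition) plan rather than an error.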
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Open (was: Patch Available) WeightedRangePartitioner should not check if input is empty if quantile file is empty - Key: PIG-1290 URL: https://issues.apache.org/jira/browse/PIG-1290 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1290.patch Currently WeightedRangePartitioner checks if the input is also empty if the quantile file is empty. For this it tries to read the input (which under the covers will result in creating splits for the input etc). If the input is a directory with many files, this could result in many calls to the namenode from each task - this can be avoided. If the input is non empty and quantile file is empty, then we would error out anyway (this should be confirmed). Also while fixing this jira we should ensure that pig can still do order by on empty input. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Patch Available (was: Open) Looks like the unit test failure was due to another check-in which has now been fixed - resubmitting.
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Patch Available (was: Open) Again there seem to be transient unrelated test failures - am resubmitting one more time - will also kick off a unit test run on my machine.
[jira] Assigned: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath reassigned PIG-1290: --- Assignee: Pradeep Kamath
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1290) WeightedRangePartitioner should not check if input is empty if quantile file is empty
[ https://issues.apache.org/jira/browse/PIG-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1290: Attachment: PIG-1290.patch Attached patch removes the check in WeightedRangePartitioner that the input is empty when the quantile file is empty. There is already a test - testEmptyStore in TestEvalPipeline2 - which verifies that pig handles order by on empty files, so this patch does not include any new tests.
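The behavior the patch describes can be sketched in isolation: if the quantile file yielded no cut points, the partitioner simply sends everything to partition 0 instead of probing the input (which would create splits and hit the namenode from every task). This is a simplified, hypothetical stand-in, not Pig's actual WeightedRangePartitioner class; the class name, int keys, and binary-search lookup are illustrative assumptions.

```java
// Simplified sketch of the partitioning logic after the PIG-1290 change
// (hypothetical class, not Pig's actual implementation).
import java.util.Arrays;

class RangePartitionSketch {
    private final int[] quantiles; // sorted cut points loaded from the quantile file

    RangePartitionSketch(int[] quantiles) {
        this.quantiles = quantiles;
    }

    int getPartition(int key) {
        // Empty quantile file: no range information, so do not touch the
        // input (no split creation, no namenode calls) - use one partition.
        if (quantiles.length == 0) {
            return 0;
        }
        // Otherwise route the key to the range it falls into.
        int idx = Arrays.binarySearch(quantiles, key);
        return idx < 0 ? -(idx + 1) : idx;
    }
}
```

If the input turns out to be non-empty while the quantile file is empty, the job errors out elsewhere, which is why the extra check was safe to drop.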
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843949#action_12843949 ] Pradeep Kamath commented on PIG-1205: - Jeff, unless the warning is due to use of a deprecated hadoop API, we should fix it - if it is due to a deprecated hadoop API then it's ok to ignore. Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc -- Key: PIG-1205 URL: https://issues.apache.org/jira/browse/PIG-1205 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0 Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch
[jira] Commented: (PIG-1287) Use hadoop-0.20.2 with pig 0.7.0 release
[ https://issues.apache.org/jira/browse/PIG-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843381#action_12843381 ] Pradeep Kamath commented on PIG-1287: - 0.20.2 is supposed to be backward compatible with 0.20.1 - I am also running some tests on a 0.20.1 cluster to ensure that there are no failures due to incompatibilities. Use hadoop-0.20.2 with pig 0.7.0 release Key: PIG-1287 URL: https://issues.apache.org/jira/browse/PIG-1287 Project: Pig Issue Type: Task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: hadoop20.jar, PIG-1287.patch Use hadoop-0.20.2 with pig 0.7.0 release
[jira] Updated: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1205: Status: Patch Available (was: Open)
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841937#action_12841937 ] Pradeep Kamath commented on PIG-1205: - Review comments: 1) The top level comment in HBaseStorage reads "A Hbase loader" - I am wondering if it is worth keeping it a loader (maybe change the name to HBaseLoader) and creating a separate Storer which extends StoreFunc rather than having HBaseStorage implement StoreFuncInterface - by extending StoreFunc, if new methods with default implementations are added then the Storer will not need to change. The disadvantage is that if we call the loader HBaseLoader, existing users of HBaseStorage would have to change their scripts to use HBaseLoader instead. This is just a suggestion - I am fine if HBaseStorage does both load and store and implements StoreFuncInterface - Jeff, I will let you decide which is better. If you choose to do both load and store in HBaseStorage, change the top level comment accordingly. 2) The following method implementation should change from:
{code}
@Override
public String relToAbsPathForStoreLocation(String location, Path curDir) throws IOException {
    // TODO Auto-generated method stub
    return null;
}
{code}
to
{code}
@Override
public String relToAbsPathForStoreLocation(String location, Path curDir) throws IOException {
    return location;
}
{code}
Also, do address the javadoc/javac issues reported above. If the above are addressed, +1 for the patch (I don't have enough HBase knowledge to review the HBase specific code - I have only reviewed the use of the load/store API).
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Status: Patch Available (was: Open) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface - Key: PIG-1265 URL: https://issues.apache.org/jira/browse/PIG-1265 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1265-2.patch, PIG-1265.patch Speaking to the hadoop team folks, the direction in hadoop is to use Job instead of Configuration - for example, InputFormat/OutputFormat implementations use Job to store the input/output location. So pig should also do the same in LoadMetadata and StoreMetadata to be closer to hadoop. Currently when a job fails, pig assumes the output locations (corresponding to the stores in the job) are hdfs locations and attempts to delete them. Since output locations could be non-hdfs locations, this cleanup should be delegated to the StoreFuncInterface implementation - hence a new method - cleanupOnFailure() - should be introduced in StoreFuncInterface, and a default implementation should be provided in the StoreFunc abstract class which checks if the location exists on hdfs and deletes it if so.
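The default cleanupOnFailure contract described above - check whether the store location exists and delete it if so - can be sketched as follows. This is a hedged illustration, not Pig's actual StoreFunc: the real default would go through Hadoop's FileSystem API to talk to hdfs, while java.io.File stands in here purely so the sketch is self-contained.

```java
// Hypothetical stand-in for the StoreFunc default cleanup described in the
// jira; java.io.File is used in place of Hadoop's FileSystem for illustration.
import java.io.File;

abstract class StoreFuncSketch {
    // Default failure cleanup: delete the output location if it exists.
    // Returns true if the location is gone afterwards.
    public boolean cleanupOnFailure(String location) {
        File out = new File(location);
        if (!out.exists()) {
            return true; // nothing to clean up
        }
        return deleteRecursively(out);
    }

    private static boolean deleteRecursively(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) {
                    if (!deleteRecursively(c)) return false;
                }
            }
        }
        return f.delete();
    }
}
```

A StoreFuncInterface implementation writing to a non-hdfs target (an HBase table, say) would override this with its own notion of "delete the partial output", which is the point of delegating the cleanup.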
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Attachment: PIG-1265-2.patch There were some failures in zebra nightly tests which are addressed in the new patch.
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Status: Open (was: Patch Available)
[jira] Commented: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839530#action_12839530 ] Pradeep Kamath commented on PIG-1265: - All unit tests succeeded on a local run on my machine.
[jira] Created: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface - Key: PIG-1265 URL: https://issues.apache.org/jira/browse/PIG-1265 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0
[jira] Updated: (PIG-1265) Change LoadMetadata and StoreMetadata to use Job instead of Configuration and add a cleanupOnFailure method to StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1265: Assignee: Pradeep Kamath (was: Pradeep Kamath) Status: Patch Available (was: Open)
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Attachment: PIG-1259-2.patch Patch to address unit test failures - some tests had a missing try-catch block. ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields) - Key: PIG-1259 URL: https://issues.apache.org/jira/browse/PIG-1259 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1259-2.patch, PIG-1259.patch Currently Schema.getPigSchema(ResourceSchema) does not allow a bag field in the ResourceSchema with a subschema containing anything other than a tuple. The tuple itself can have a schema with 1 or more subfields. This check should also be enforced in ResourceFieldSchema.setSchema()
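The invariant being enforced - a bag field's subschema must consist of exactly one tuple field, while the tuple itself may carry any number of subfields - can be sketched independently of Pig's actual ResourceFieldSchema class. The class and enum below are illustrative stand-ins, not Pig's real types.

```java
// Hypothetical sketch of the bag-schema check from PIG-1259
// (not Pig's actual ResourceFieldSchema).
class BagSchemaCheck {
    enum Type { BAG, TUPLE, INT, CHARARRAY }

    // A BAG field's subschema must be exactly one TUPLE field.
    static void validateBagSchema(Type[] subFields) {
        if (subFields.length != 1 || subFields[0] != Type.TUPLE) {
            throw new IllegalArgumentException(
                "bag schema must have a single tuple field as its subschema");
        }
    }

    static boolean isValidBagSchema(Type[] subFields) {
        try {
            validateBagSchema(subFields);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Putting the check in setSchema() itself means every path that constructs a ResourceFieldSchema rejects a malformed bag schema at construction time, not only the Schema.getPigSchema(ResourceSchema) conversion path.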
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Status: Open (was: Patch Available)
[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
[ https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838524#action_12838524 ] Pradeep Kamath commented on PIG-1205: - Jeff, the patch no longer applies cleanly on trunk - looks like we missed reviewing this earlier - sorry about that - can you regenerate this patch against trunk?
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Open (was: Patch Available) PigStorage per the new load-store redesign should support splitting of bzip files - Key: PIG-1257 URL: https://issues.apache.org/jira/browse/PIG-1257 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1257-2.patch, PIG-1257.patch PigStorage implemented per new load-store-redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data but without support for splitting bzip files. In pig 0.6, splitting was enabled for bzip files - we should attempt to enable that feature.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: PIG-1257-2.patch Attached new patch to address unit test failures.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Status: Patch Available (was: Open)
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1257: Attachment: PIG-1257.patch Attached patch builds an InputFormat (Bzip2TextInputFormat) on top of the existing CBZip2InputStream.
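What makes bzip files splittable at all is that each bzip2 compressed block begins with the 48-bit magic number 0x314159265359, so a record reader handed an arbitrary byte range can scan forward to the next block boundary and start decompressing there. The sketch below shows that scan in its simplest form; it is a byte-aligned simplification of the idea (real bzip2 blocks are not byte-aligned, so an actual reader such as the one built on CBZip2InputStream must scan at the bit level), and the class is hypothetical, not the patch's Bzip2TextInputFormat.

```java
// Byte-aligned sketch of finding the next bzip2 block boundary inside a
// split (illustrative only; real blocks are bit-aligned).
class BzipBlockScan {
    // 48-bit bzip2 compressed-block magic: 0x314159265359.
    static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    // Return the offset of the first block magic at or after 'from', or -1
    // if no block starts in the remaining bytes.
    static int nextBlockStart(byte[] data, int from) {
        outer:
        for (int i = from; i + BLOCK_MAGIC.length <= data.length; i++) {
            for (int j = 0; j < BLOCK_MAGIC.length; j++) {
                if (data[i + j] != BLOCK_MAGIC[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```

A split whose start falls mid-block skips ahead to the next such boundary, and a split reads past its nominal end until the boundary after it, so every block is processed exactly once across tasks.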
[jira] Created: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields) - Key: PIG-1259 URL: https://issues.apache.org/jira/browse/PIG-1259 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.7.0
[jira] Updated: (PIG-1259) ResourceFieldSchema.setSchema should not allow a bag field without a Tuple as its only sub field (the tuple itself can have a schema with 1 or more subfields)
[ https://issues.apache.org/jira/browse/PIG-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1259: Assignee: Pradeep Kamath Status: Patch Available (was: Open)
[jira] Commented: (PIG-1079) Modify merge join to use distributed cache to maintain the index
[ https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837408#action_12837408 ] Pradeep Kamath commented on PIG-1079: - +1 Modify merge join to use distributed cache to maintain the index Key: PIG-1079 URL: https://issues.apache.org/jira/browse/PIG-1079 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1079.patch, PIG-1079.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1250: Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface Key: PIG-1250 URL: https://issues.apache.org/jira/browse/PIG-1250 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface Key: PIG-1250 URL: https://issues.apache.org/jira/browse/PIG-1250 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface
[ https://issues.apache.org/jira/browse/PIG-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1250: Status: Patch Available (was: Open) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface Key: PIG-1250 URL: https://issues.apache.org/jira/browse/PIG-1250 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1250.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
COMPLETED merge of load-store-redesign branch to trunk
The merge from load-store-redesign branch to trunk is now completed. New commits can now proceed on trunk. The load-store-redesign branch is deprecated with this merge and no more commits should be done on that branch. Pradeep From: Pradeep Kamath Sent: Thursday, February 18, 2010 11:20 AM To: Pradeep Kamath; 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: BEGINNING merge of load-store-redesign branch to trunk - hold off commits! Hi, I will begin this activity now - a request to all committers to not commit to trunk or load-store-redesign till I send an all clear message - I am anticipating this will hopefully be completed by end of day (Pacific time) tomorrow. Thanks, Pradeep From: Pradeep Kamath Sent: Tuesday, February 16, 2010 11:34 AM To: 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: Plan to merge load-store-redesign branch to trunk Hi, We would like to merge the load-store-redesign branch to trunk tentatively on Thursday. To do this, I would like to request all committers to not commit anything to load-store-redesign branch or trunk during the period of the merge. I will send out a mail to indicate begin and end of this activity - tentatively I am expecting this to be a day's period between 9 AM PST Thursday to 9AM PST Friday so I can resolve any conflicts and run all tests. Pradeep
[jira] Created: (PIG-1245) Remove the connection to namenode in HExecutionEngine.init()
Remove the connection to namenode in HExecutionEngine.init() Key: PIG-1245 URL: https://issues.apache.org/jira/browse/PIG-1245 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Fix For: 0.7.0 PigContext.connect() calls HExecutionEngine.init(). The former is called from the backend map/reduce tasks in DefaultIndexableLoader used in merge join. It is not clear that a connection to the namenode is required in HExecutionEngine.init(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836035#action_12836035 ] Pradeep Kamath commented on PIG-966: LoadFunc is now an abstract class with default implementations for some of the methods - we hope this will aid implementers. I would like to make the same change for StoreFunc. Since PigStorage currently does both load and store, we would need to also introduce an interface - StoreFuncInterface - so that PigStorage can extend LoadFunc and implement StoreFuncInterface. To be symmetrical, we would need to also introduce a LoadFuncInterface. This interface can be used by implementers if they want their LoadFunc implementation to extend some other class. We can document and strongly recommend that users only use our abstract classes since that would make them less vulnerable to incompatible additions in the future (hopefully when we add new methods into these abstract classes we will give a default implementation). I will upload a patch for this unless anyone has strong objections. Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1218) Use distributed cache to store samples
[ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1218: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed patch PIG-1218_2.patch since the merge join changes need to be re-worked and will be handled in a different patch. Thanks Richard! Use distributed cache to store samples -- Key: PIG-1218 URL: https://issues.apache.org/jira/browse/PIG-1218 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1218.patch, PIG-1218_2.patch, PIG-1218_3.patch Currently, in the case of skew join and order by we use a sample that is just written to the dfs (not the distributed cache) and, as a result, gets opened and copied around more than necessary. This impacts query performance and also places unnecessary load on the name node -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836080#action_12836080 ] Pradeep Kamath commented on PIG-966: In retrospect, I think we can skip creating a LoadFuncInterface since currently there is no real use case for an interface - we are adding it to keep symmetry with StoreFuncInterface and to allow implementations which extend other classes to implement this interface. The first motivation is not very strong, and the second can also be achieved through composition rather than inheritance - it is unclear how inheriting a different class would benefit a Loader implementation over using composition to delegate functionality. By introducing a LoadFuncInterface we would be exposing users who implement it to backward incompatible additions in the future. So I think we should not add a LoadFuncInterface now and add it ONLY if a real need arises. The rest of my proposal (making StoreFunc an abstract class and adding a new StoreFuncInterface) still holds. Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
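The tradeoff being discussed - an abstract class can later gain methods with default bodies without breaking subclasses, while a mirror interface serves implementations that must extend another class - can be sketched as follows. The names echo the proposal, but this is an illustrative model, not the real Pig API:

```java
// Illustrative sketch, NOT the actual Pig classes: an abstract base class
// with a default implementation, plus a mirror interface for classes that
// already extend something else (Java allows single inheritance only).
public class AbstractVsInterface {

    static abstract class LoadFunc {
        abstract String getNext();
        // Default implementation: a method added later with a body like this
        // keeps every existing subclass compiling.
        String relativeToAbsolutePath(String location) {
            return location;
        }
    }

    // Mirror interface: any addition here forces changes on all implementers,
    // which is the backward-compatibility risk mentioned above.
    interface StoreFuncInterface {
        void putNext(String tuple);
    }

    // A PigStorage-like class doing both load and store: it extends the
    // abstract loader and implements the store interface.
    static class MyStorage extends LoadFunc implements StoreFuncInterface {
        private String last;
        @Override
        String getNext() { return "record"; }
        @Override
        public void putNext(String tuple) { last = tuple; }
        String lastStored() { return last; }
    }
}
```

This is why the comment recommends the abstract classes to users: a new abstract-class method shipped with a default body is source-compatible, whereas a new interface method is not (in the Java version of that era, interfaces had no default methods).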
BEGINNING merge of load-store-redesign branch to trunk - hold off commits!
Hi, I will begin this activity now - a request to all committers to not commit to trunk or load-store-redesign till I send an all clear message - I am anticipating this will hopefully be completed by end of day (Pacific time) tomorrow. Thanks, Pradeep From: Pradeep Kamath Sent: Tuesday, February 16, 2010 11:34 AM To: 'pig-dev@hadoop.apache.org'; 'pig-u...@hadoop.apache.org' Subject: Plan to merge load-store-redesign branch to trunk Hi, We would like to merge the load-store-redesign branch to trunk tentatively on Thursday. To do this, I would like to request all committers to not commit anything to load-store-redesign branch or trunk during the period of the merge. I will send out a mail to indicate begin and end of this activity - tentatively I am expecting this to be a day's period between 9 AM PST Thursday to 9AM PST Friday so I can resolve any conflicts and run all tests. Pradeep
[jira] Updated: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front
[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1216: Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to load-store-redesign branch - Thanks Ashutosh! Note that only outputs will be validated up front (in line with Pig 0.6.0) - inputs will not be validated up front since for the following case validating inputs is not easy: {code} ... store into 'foo'... load 'foo'... ... {code} New load store design does not allow Pig to validate inputs and outputs up front Key: PIG-1216 URL: https://issues.apache.org/jira/browse/PIG-1216 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Alan Gates Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1216.patch, pig-1216_1.patch In Pig 0.6 and before, Pig attempts to verify existence of inputs and non-existence of outputs during parsing to avoid run time failures when inputs don't exist or outputs can't be overwritten. The downside to this was that Pig assumed all inputs and outputs were HDFS files, which made implementation harder for non-HDFS based load and store functions. In the load store redesign (PIG-966) this was delegated to InputFormats and OutputFormats to avoid this problem and to make use of the checks already being done in those implementations. Unfortunately, for Pig Latin scripts that run more than one MR job, this does not work well. MR does not do input/output verification on all the jobs at once. It does them one at a time. So if a Pig Latin script results in 10 MR jobs and the file to store to at the end already exists, the first 9 jobs will be run before the 10th job discovers that the whole thing was doomed from the beginning. To avoid this a validate call needs to be added to the new LoadFunc and StoreFunc interfaces.
Pig needs to pass this method enough information that the load function implementer can delegate to InputFormat.getSplits() and the store function implementer to OutputFormat.checkOutputSpecs() if s/he decides to. Since 90% of all load and store functions use HDFS and PigStorage will also need to, the Pig team should implement a default file existence check on HDFS and make it available as a static method to other Load/Store function implementers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
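The fail-fast behavior described above - checking every job's output spec before the first job is submitted, instead of letting MR validate one job at a time - can be sketched generically. This is a hypothetical model with a simple path set standing in for HDFS; the real implementation delegates to OutputFormat.checkOutputSpecs():

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UpfrontValidation {
    // Check the outputs of ALL jobs in a plan before submitting any of them,
    // so a pre-existing final output fails the script up front instead of
    // after 9 of 10 jobs have already run.
    static String findInvalidOutput(List<String> plannedOutputs, Set<String> existingPaths) {
        for (String out : plannedOutputs) {
            if (existingPaths.contains(out)) {
                return out; // this output would doom the plan later; report now
            }
        }
        return null; // every planned output is safe to write
    }

    public static void main(String[] args) {
        List<String> outputs = Arrays.asList("/tmp/job1.out", "/user/final.out");
        Set<String> existing = new HashSet<>(Arrays.asList("/user/final.out"));
        System.out.println(findInvalidOutput(outputs, existing)); // /user/final.out
    }
}
```

Note this also shows why inputs cannot be validated the same way, per the comment above: an input such as 'foo' may be produced by an earlier store in the same script, so it legitimately does not exist at validation time.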
[jira] Updated: (PIG-1079) Modify merge join to use distributed cache to maintain the index
[ https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1079: Fix Version/s: 0.7.0 Assignee: Richard Ding Modify merge join to use distributed cache to maintain the index Key: PIG-1079 URL: https://issues.apache.org/jira/browse/PIG-1079 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Assignee: Richard Ding Fix For: 0.7.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1218) Use distributed cache to store samples
[ https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834957#action_12834957 ] Pradeep Kamath commented on PIG-1218: - +1 Patch mostly looks good - a couple of comments: * In a couple of places, instead of using Configuration and JobConf based on PigMapReduce.sJobConf, you should create a new Configuration(false) and new JobConf(false) so we create fresh data structures without any properties coming from the Map reduce based data structures. * Since partitionFile is no longer used in POPartitionRearrange.java we should remove it. You can make these changes and go ahead and commit it if it passes tests Use distributed cache to store samples -- Key: PIG-1218 URL: https://issues.apache.org/jira/browse/PIG-1218 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1218.patch, PIG-1218_2.patch Currently, in the case of skew join and order by we use a sample that is just written to the dfs (not the distributed cache) and, as a result, gets opened and copied around more than necessary. This impacts query performance and also places unnecessary load on the name node -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issues in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1213) Schema serialization is broken
[ https://issues.apache.org/jira/browse/PIG-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1213: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch was committed to trunk and branch-0.6 on 01 Feb 2010 Schema serialization is broken -- Key: PIG-1213 URL: https://issues.apache.org/jira/browse/PIG-1213 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0 Attachments: PIG-1213.patch Consider a udf which needs to know the schema of its input in the backend while executing. To achieve this, the udf needs to store the schema into the UDFContext. Internally the UDFContext will serialize the schema into the jobconf. However this currently is broken and gives a Serialization exception -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Plan to merge load-store-redesign branch to trunk
Hi, We would like to merge the load-store-redesign branch to trunk tentatively on Thursday. To do this, I would like to request all committers to not commit anything to load-store-redesign branch or trunk during the period of the merge. I will send out a mail to indicate begin and end of this activity - tentatively I am expecting this to be a day's period between 9 AM PST Thursday to 9AM PST Friday so I can resolve any conflicts and run all tests. Pradeep
[jira] Updated: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
[ https://issues.apache.org/jira/browse/PIG-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1239: Attachment: PIG-1239-load-store-redesign-branch.patch PIG-1239-branch-0.6.patch Attached patches for branch-0.6 and load-store-redesign branch. Changes are: * PigContext.connect() does not create a JobClient - instead it creates and holds a JobConf object - callers have been changed to use the JobConf and create a JobClient * On the load-store-redesign branch, POMergeJoin no longer does a pc.connect since it is no longer needed PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 Attachments: PIG-1239-branch-0.6.patch, PIG-1239-load-store-redesign-branch.patch PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issue in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
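The change described in this patch is essentially lazy initialization: connect() holds only the cheap JobConf, and the JobClient - whose construction contacts the jobtracker - is built on first use. A generic sketch, with stub classes standing in for the real Hadoop types (this is not Pig's actual code):

```java
// Lazy-initialization sketch of the PIG-1239 change. JobConfStub and
// JobClientStub are hypothetical stand-ins for Hadoop's JobConf/JobClient.
public class LazyClient {
    static class JobConfStub { }

    static class JobClientStub {
        final JobConfStub conf;
        // In the real JobClient, construction is where the jobtracker
        // connection happens - the expensive step we want to defer.
        JobClientStub(JobConfStub conf) { this.conf = conf; }
    }

    private final JobConfStub conf = new JobConfStub(); // held from connect()
    private JobClientStub client;                       // not created yet

    // Build the client only when a caller actually needs it, then reuse it.
    JobClientStub getJobClient() {
        if (client == null) {
            client = new JobClientStub(conf);
        }
        return client;
    }
}
```

With this shape, map tasks that call connect() (e.g. via POMergeJoin/POFRJoin) never trigger a jobtracker connection unless they explicitly ask for the client.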
[jira] Commented: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front
[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834411#action_12834411 ] Pradeep Kamath commented on PIG-1216: - Review comments: * Is it ok to call outputSpecs multiple times (since we will now be calling it in the visitor and Hadoop will be calling it later when the job is launched) - hope that does not break the contract per Hadoop's OutputFormat interface * The test case for validation failure should ensure that PlanValidationException is indeed thrown (through some boolean flag?) - currently the code has: {code} } catch (PlanValidationException pve){ + // We expect this to happen. +} {code} * import org.omg.PortableInterceptor.SUCCESSFUL; in TestStore.java seems accidental - if you will be submitting a new patch for the above comment, you can remove this import also. Otherwise looks good. New load store design does not allow Pig to validate inputs and outputs up front Key: PIG-1216 URL: https://issues.apache.org/jira/browse/PIG-1216 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Alan Gates Assignee: Ashutosh Chauhan Attachments: pig-1216.patch In Pig 0.6 and before, Pig attempts to verify existence of inputs and non-existence of outputs during parsing to avoid run time failures when inputs don't exist or outputs can't be overwritten. The downside to this was that Pig assumed all inputs and outputs were HDFS files, which made implementation harder for non-HDFS based load and store functions. In the load store redesign (PIG-966) this was delegated to InputFormats and OutputFormats to avoid this problem and to make use of the checks already being done in those implementations. Unfortunately, for Pig Latin scripts that run more than one MR job, this does not work well. MR does not do input/output verification on all the jobs at once. It does them one at a time.
So if a Pig Latin script results in 10 MR jobs and the file to store to at the end already exists, the first 9 jobs will be run before the 10th job discovers that the whole thing was doomed from the beginning. To avoid this a validate call needs to be added to the new LoadFunc and StoreFunc interfaces. Pig needs to pass this method enough information that the load function implementer can delegate to InputFormat.getSplits() and the store function implementer to OutputFormat.checkOutputSpecs() if s/he decides to. Since 90% of all load and store functions use HDFS and PigStorage will also need to, the Pig team should implement a default file existence check on HDFS and make it available as a static method to other Load/Store function implementers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
[ https://issues.apache.org/jira/browse/PIG-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834443#action_12834443 ] Pradeep Kamath commented on PIG-1239: - * No unit tests are included in either patch since this is difficult to capture in a unit test - manual tests were done to ensure that connections to the JobTracker no longer happen from a script using replicated join. * Release audit warnings are due to diffs in html docs * The extra javac warnings are due to use of JobConf which is deprecated - I have added suppressWarning tags which don't seem to help. We need to use JobConf here and there is no way around the warning. Results from running test-patch ant target for branch-0.6 [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 391 release audit warnings (more than the trunk's current 389 warnings). [exec] [exec] [exec] [exec] [exec] == [exec] == [exec] Finished build. [exec] == [exec] == Results from running test-patch ant target for load-store-redesign branch: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] -1 javac.
The applied patch generated 105 javac compiler warnings (more than the trunk's current 103 warnings). [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 Attachments: PIG-1239-branch-0.6.patch, PIG-1239-load-store-redesign-branch.patch PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issue in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1239) PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed
[ https://issues.apache.org/jira/browse/PIG-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1239. - Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to branch-0.6 and load-store-redesign branch. PigContext.connect() should not create a jobClient and jobClient should be created on demand when needed Key: PIG-1239 URL: https://issues.apache.org/jira/browse/PIG-1239 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0, 0.7.0 Attachments: PIG-1239-branch-0.6.patch, PIG-1239-load-store-redesign-branch.patch PigContext.connect() currently connects to the jobtracker and creates a JobClient - this causes issue in POMergeJoin/POFRJoin wherein these connections to the jobtracker are made from each map task. The creation of the JobClient is not necessary in PigContext.connect() and a JobClient should be created on demand where it is needed instead. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1234: Fix Version/s: 0.7.0 Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Fix For: 0.7.0 Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1234: Attachment: PIG-1234.patch Patch against load-store-redesign branch which fixes this - the code was trying to validate the scheme supplied in the load location vs. the scheme of the current directory path (which is always hdfs). The patch changes this so the check is not done if the location is a valid url with an authority. Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
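The scheme-and-authority test described in the patch can be modeled with java.net.URI. This is an illustrative sketch, not Pig's actual code; the method name is hypothetical:

```java
import java.net.URI;

public class SchemeCheck {
    // Return true when the load location is a full URL carrying both a
    // scheme and an authority (e.g. har://hdfs-namenode/...); per the fix
    // described above, such locations should skip the comparison against
    // the current directory's scheme (which is always hdfs).
    static boolean hasSchemeAndAuthority(String location) {
        URI uri = URI.create(location);
        return uri.getScheme() != null && uri.getAuthority() != null;
    }

    public static void main(String[] args) {
        System.out.println(hasSchemeAndAuthority("har://hdfs-namenode/user/tsz/t20.har/t20")); // true
        System.out.println(hasSchemeAndAuthority("/user/tsz/t20")); // false: plain path
    }
}
```

For the failing example in this issue, the `har` scheme and `hdfs-namenode` authority are both present, so the check would be bypassed and the location handed through to the InputFormat.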
[jira] Updated: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1234: Assignee: Pradeep Kamath Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833221#action_12833221 ] Pradeep Kamath commented on PIG-1234: - Results from running test-patch ant target: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. I am currently running unit tests against this patch on load-store-redesign branch. Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833231#action_12833231 ] Pradeep Kamath commented on PIG-1234: - A couple of observations about the load statement in the script posted above: 1) '\n' is not a valid argument for PigStorage - the argument is meant to be the field delimiter and '\n' cannot be used for a field delimiter (since it is considered to be the record delimiter by PigStorage) 2) The patch here only fixes the incorrect checking of the scheme in the url - whether har://.. resources can be read by PigStorage or not will depend on whether TextInputFormat can read har://.. resources. PigStorage simply passes the location on to TextInputFormat which does the real reading. Unable to create input slice for har:// files - Key: PIG-1234 URL: https://issues.apache.org/jira/browse/PIG-1234 Project: Pig Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1234.patch Tried to load har:// files {noformat} grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line); grunt> dump {noformat} but pig says {noformat} 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20 {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832202#action_12832202 ]

Pradeep Kamath commented on PIG-1234:
-------------------------------------

Can you try using a pig.jar compiled from the load-store-redesign branch - http://svn.apache.org/repos/asf/hadoop/pig/branches/load-store-redesign ?

> Unable to create input slice for har:// files
> Key: PIG-1234  URL: https://issues.apache.org/jira/browse/PIG-1234
[jira] Resolved: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath resolved PIG-1090.
---------------------------------

    Resolution: Fixed

+1 for PIG-1090-22.patch, patch committed. Closing this jira as resolved since all changes to accommodate the new load-store interfaces have now been checked in.

> Update sources to reflect recent changes in load-store interfaces
> -----------------------------------------------------------------
>
>                 Key: PIG-1090
>                 URL: https://issues.apache.org/jira/browse/PIG-1090
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.7.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.7.0
>         Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, PIG-1090-20.patch, PIG-1090-21.patch, PIG-1090-22.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
>
> There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub-section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Resolved: (PIG-1228) \
[ https://issues.apache.org/jira/browse/PIG-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath resolved PIG-1228.
---------------------------------

    Resolution: Invalid

Seems like a jira created by accident.

> \
> Key: PIG-1228  URL: https://issues.apache.org/jira/browse/PIG-1228
> Project: Pig  Issue Type: Sub-task  Reporter: Pradeep Kamath
[jira] Created: (PIG-1228) \
> \
> Key: PIG-1228  URL: https://issues.apache.org/jira/browse/PIG-1228
> Project: Pig  Issue Type: Sub-task  Reporter: Pradeep Kamath
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1090:
--------------------------------

    Attachment: PIG-1090-21.patch

Attached patch to handle calling storeSchema of the StoreMetadata interface in local mode (currently there is a hadoop bug, https://issues.apache.org/jira/browse/MAPREDUCE-1447, which prevents the current code from making this call in local mode). The patch is a workaround till hadoop fixes the bug - in MapReduceLauncher, we explicitly call this method for successful stores.

> Update sources to reflect recent changes in load-store interfaces
> Key: PIG-1090  URL: https://issues.apache.org/jira/browse/PIG-1090
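The shape of the workaround described above - the launcher walking the successful stores after a local-mode job and invoking storeSchema on any store function that implements StoreMetadata - can be sketched as follows. The interface and class bodies here are simplified stand-ins (the real StoreMetadata.storeSchema takes Pig's schema and job objects), not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Pig's store interfaces; not the actual Pig source.
interface StoreFunc { }

interface StoreMetadata extends StoreFunc {
    // In Pig this receives the result schema and job context; simplified here.
    void storeSchema(String schema, String location);
}

class LocalModeSchemaWorkaround {
    // After a job succeeds in local mode, explicitly call storeSchema for
    // each successful store that implements StoreMetadata, since the
    // local-mode output committer never makes the call (MAPREDUCE-1447).
    static int commitSchemas(List<StoreFunc> successfulStores,
                             String schema, String location) {
        int committed = 0;
        for (StoreFunc store : successfulStores) {
            if (store instanceof StoreMetadata) {
                ((StoreMetadata) store).storeSchema(schema, location);
                committed++;
            }
        }
        return committed;
    }
}
```

Stores that do not implement StoreMetadata are simply skipped, so the workaround is a no-op for plain store functions.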
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829242#action_12829242 ]

Pradeep Kamath commented on PIG-1090:
-------------------------------------

test-patch results for PIG-1090-21.patch:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included. The patch appears to include 6 new or modified tests.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.

> Update sources to reflect recent changes in load-store interfaces
> Key: PIG-1090  URL: https://issues.apache.org/jira/browse/PIG-1090