[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834378#action_12834378 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
---------------------------------------------

> 1) '\n' is not a valid argument for PigStorage ...

I wrote a Pig wordcount program, as shown below.

{code}
-- ExtractWord below is a UDF
REGISTER ./tutorial.jar;
a = LOAD 'inputDir' USING PigStorage('\n') AS (line);
b = FOREACH a GENERATE flatten(org.apache.pig.tutorial.ExtractWord(line)) as word;
c = GROUP b BY word;
d = FOREACH c GENERATE group, COUNT(b);
STORE d INTO 'outputDir' USING PigStorage('\t');
{code}

If inputDir is an hdfs:// dir, the program works fine, but if it is replaced by a har:// dir, it fails as shown previously. The program doesn't actually need any field delimiter, so I put '\n'.

BTW, do you think it would be good to add a Pig wordcount example/tutorial? The existing tutorials are quite lengthy. They may be too hard for beginners.

> 2) The patch here only fixes the incorrect checking of the scheme ...

I will test it. Thanks, Pradeep.

Unable to create input slice for har:// files
---------------------------------------------

Key: PIG-1234
URL: https://issues.apache.org/jira/browse/PIG-1234
Project: Pig
Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Pradeep Kamath
Fix For: 0.7.0
Attachments: PIG-1234.patch

Tried to load har:// files
{noformat}
grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line);
grunt> dump
{noformat}
but pig says
{noformat}
2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20
{noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834483#action_12834483 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
---------------------------------------------

The patch worked fine: the wordcount program succeeded and took 53 mins.

{noformat}
grunt> STORE d INTO 't10_har_pig_wc' USING PigStorage('\t');
2010-02-16 19:35:28,501 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
...
2010-02-16 19:57:14,018 [Thread-10] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
...
2010-02-16 20:20:23,936 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Submitting job: job_201002042035_43197 to execution engine.
2010-02-16 20:20:23,936 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://jobtracker:50030/jobdetails.jsp?jobid=job_201002042035_43197
2010-02-16 20:20:23,936 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - To kill this job, use: kill job_201002042035_43197
2010-02-16 20:20:24,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-02-16 20:21:44,201 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 12% complete
2010-02-16 20:21:49,587 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 13% complete
2010-02-16 20:22:03,904 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 17% complete
2010-02-16 20:22:27,098 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 24% complete
2010-02-16 20:23:11,759 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 32% complete
2010-02-16 20:23:33,678 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 39% complete
2010-02-16 20:23:42,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 42% complete
2010-02-16 20:23:51,739 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 45% complete
2010-02-16 20:24:00,521 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 47% complete
2010-02-16 20:24:10,577 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2010-02-16 20:24:18,561 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 53% complete
2010-02-16 20:24:27,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 56% complete
2010-02-16 20:24:34,101 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 58% complete
2010-02-16 20:24:43,539 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 62% complete
2010-02-16 20:24:58,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-02-16 20:26:31,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-02-16 20:27:22,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-02-16 20:28:17,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-02-16 20:28:17,773 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: hdfs://namenode/user/tsz/t10_har_pig_wc
2010-02-16 20:28:32,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 132
2010-02-16 20:28:32,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 1862
2010-02-16 20:28:32,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
{noformat}
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834491#action_12834491 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
---------------------------------------------

Pig is amazing! The Pig wordcount program with har:// ran faster than the MapReduce wordcount example, which took 57 mins [in my previous test|https://issues.apache.org/jira/browse/HADOOP-6467?focusedCommentId=12830401&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12830401].
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833221#action_12833221 ]

Pradeep Kamath commented on PIG-1234:
-------------------------------------

Results from running test-patch ant target:

[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

I am currently running unit tests against this patch on the load-store-redesign branch.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833231#action_12833231 ]

Pradeep Kamath commented on PIG-1234:
-------------------------------------

A couple of observations about the load statement in the script posted above:

1) '\n' is not a valid argument for PigStorage - the argument is meant to be the field delimiter, and '\n' cannot be used as a field delimiter (since PigStorage treats it as the record delimiter).

2) The patch here only fixes the incorrect checking of the scheme in the URL - whether har://.. resources can be read by PigStorage will depend on whether TextInputFormat can read har://.. resources. PigStorage simply passes the location on to TextInputFormat, which does the real reading.
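To make point 1 concrete, here is a minimal, self-contained Java sketch of the record/field distinction. It is not PigStorage's actual code: the class `DelimiterSketch` and the method `parse` are hypothetical names used only for illustration. The sketch always splits records on '\n' and then splits each record on the given field delimiter, so a delimiter that never occurs in the data leaves each line as a single field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT how PigStorage is implemented.
class DelimiterSketch {

    // Split text into records on '\n', then split each record on fieldDelim.
    static List<String[]> parse(String text, char fieldDelim) {
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        String fieldRegex = Pattern.quote(String.valueOf(fieldDelim));
        List<String[]> records = new ArrayList<>();
        for (String line : text.split("\n")) {
            // Limit -1 keeps trailing empty fields instead of dropping them.
            records.add(line.split(fieldRegex, -1));
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "the quick\tfox\njumps";

        // Tab as the field delimiter: the first line has two fields.
        for (String[] fields : parse(text, '\t')) {
            System.out.println(fields.length + " field(s): " + String.join(" | ", fields));
        }

        // A delimiter that does not occur in the data: every line stays one field.
        for (String[] fields : parse(text, '\u0001')) {
            System.out.println(fields.length + " field(s)");
        }
    }
}
```

In the wordcount script, the whole line is wanted as one field, so any field delimiter that cannot appear in the input would have the same effect; TextLoader, mentioned elsewhere in this thread, loads each line as a single field.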
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833233#action_12833233 ]

Richard Ding commented on PIG-1234:
-----------------------------------

+1 for commit.
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832162#action_12832162 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
---------------------------------------------

More error messages:
{noformat}
Backend error message during job submission
-------------------------------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IllegalArgumentException: Wrong FS: har://hdfs-namenode/user/tsz/t20.har/t20, expected: hdfs://namenode
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
    at org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:553)
    at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:123)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:240)
    ... 7 more

Pig Stack Trace
---------------
ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias a
    at org.apache.pig.PigServer.openIterator(PigServer.java:482)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:352)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20
    ... 8 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: har://hdfs-namenode/user/tsz/t20.har/t20, expected: hdfs://namenode
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
    at
{noformat}
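The "Wrong FS" message in the trace above is raised when a path with one scheme (har://) is checked against a file system that serves another (hdfs://). The following stand-alone Java sketch reproduces that kind of check using only java.net.URI. It is an illustration under simplifying assumptions, not Hadoop's code: `WrongFsSketch` and its `checkPath` method are hypothetical, and only the scheme is compared here, whereas Hadoop's own check also takes the authority into account.

```java
import java.net.URI;

// Hypothetical illustration only; this is NOT Hadoop's FileSystem.checkPath.
class WrongFsSketch {

    // Reject a path whose scheme differs from the scheme of the file system it is given to.
    static void checkPath(URI fileSystemUri, URI path) {
        String pathScheme = path.getScheme();
        if (pathScheme == null) {
            return; // No scheme: treated as relative to the file system.
        }
        if (!pathScheme.equalsIgnoreCase(fileSystemUri.getScheme())) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + fileSystemUri);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode");

        // Same scheme as the file system: accepted.
        checkPath(defaultFs, URI.create("hdfs://namenode/user/tsz/inputDir"));

        // har:// path handed to the hdfs:// file system: rejected.
        try {
            checkPath(defaultFs, URI.create("har://hdfs-namenode/user/tsz/t20.har/t20"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In Hadoop client code, a common way to avoid this mismatch is to resolve the file system from the path itself (for example with `Path.getFileSystem(conf)`) instead of always using the default file system. Whether the PIG-1234 patch takes that approach is not shown in this thread.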
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832202#action_12832202 ]

Pradeep Kamath commented on PIG-1234:
-------------------------------------

Can you try using a pig.jar compiled from the load-store-redesign branch (http://svn.apache.org/repos/asf/hadoop/pig/branches/load-store-redesign)?
[jira] Commented: (PIG-1234) Unable to create input slice for har:// files
[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832247#action_12832247 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
---------------------------------------------

> can you try using a pig.jar compiled from the load store redesign branch ...

Sure, I will try it later.

I get the same problem if PigStorage is replaced with TextLoader. My Pig version is:

Apache Pig version 0.6.0.0.20.1.1001221613 (r902141)
compiled Jan 22 2010, 16:13:46