[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834378#action_12834378 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
-

> 1) '\n' is not a valid argument for PigStorage ...

I wrote a Pig wordcount program, shown below.
{code}
-- ExtractWord below is a UDF
REGISTER ./tutorial.jar;

a = LOAD 'inputDir' USING PigStorage('\n') AS (line);
b = FOREACH a GENERATE flatten(org.apache.pig.tutorial.ExtractWord(line)) as word;
c = GROUP b BY word;
d = FOREACH c GENERATE group, COUNT(b);
STORE d INTO 'outputDir' USING PigStorage('\t');
{code}
If inputDir is an hdfs:// directory, the program works fine, but if it is replaced 
by a har:// directory, it fails as shown previously.  The program does not actually 
need a field delimiter, so I put '\n'.
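
A minimal sketch of why passing '\n' can appear to work, assuming input is first cut into records at newlines and each record is then split on the field delimiter. This is a simplified model for illustration only, not PigStorage's code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical model: records are lines, fields are split on a delimiter. */
public class LineRecordSketch {

    /** Splits input into records on '\n', then each record into fields. */
    public static List<String[]> parse(String input, char fieldDelimiter) {
        List<String[]> records = new ArrayList<>();
        String delimiterRegex = Pattern.quote(String.valueOf(fieldDelimiter));
        for (String line : input.split("\n")) {
            // A record never contains '\n' here, so a '\n' field delimiter
            // cannot match and the whole line stays a single field.
            records.add(line.split(delimiterRegex, -1));
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] fields : parse("hello pig\nhello har\n", '\n')) {
            System.out.println(fields.length + " field(s): " + fields[0]);
        }
    }
}
```

Under this model the '\n' argument never splits anything, so every line arrives as one field.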

BTW, do you think it is good to add a pig wordcount example/tutorial?  The 
existing tutorials are quite lengthy.  They may be too hard for beginners.

> 2) The patch here only fixes the incorrect checking of the scheme ...
I will test it.  Thanks, Pradeep.

 Unable to create input slice for har:// files
 -

 Key: PIG-1234
 URL: https://issues.apache.org/jira/browse/PIG-1234
 Project: Pig
 Issue Type: Bug
 Reporter: Tsz Wo (Nicholas), SZE
 Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1234.patch


 Tried to load har:// files
 {noformat}
 grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING PigStorage('\n') AS (line);
 grunt> dump a;
 {noformat}
 but pig says
 {noformat}
 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834483#action_12834483 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
-

The patch worked fine: The wordcount program succeeded and took 53 mins.
{noformat}
grunt> STORE d INTO 't10_har_pig_wc' USING PigStorage('\t');
2010-02-16 19:35:28,501 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
...
2010-02-16 19:57:14,018 [Thread-10] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
...
2010-02-16 20:20:23,936 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Submitting job: job_201002042035_43197 to execution engine.
2010-02-16 20:20:23,936 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://jobtracker:50030/jobdetails.jsp?jobid=job_201002042035_43197
2010-02-16 20:20:23,936 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - To kill this job, use: kill job_201002042035_43197
2010-02-16 20:20:24,447 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-02-16 20:21:44,201 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 12% complete
2010-02-16 20:21:49,587 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 13% complete
2010-02-16 20:22:03,904 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 17% complete
2010-02-16 20:22:27,098 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 24% complete
2010-02-16 20:23:11,759 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 32% complete
2010-02-16 20:23:33,678 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 39% complete
2010-02-16 20:23:42,900 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 42% complete
2010-02-16 20:23:51,739 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 45% complete
2010-02-16 20:24:00,521 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 47% complete
2010-02-16 20:24:10,577 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2010-02-16 20:24:18,561 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 53% complete
2010-02-16 20:24:27,555 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 56% complete
2010-02-16 20:24:34,101 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 58% complete
2010-02-16 20:24:43,539 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 62% complete
2010-02-16 20:24:58,398 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-02-16 20:26:31,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-02-16 20:27:22,225 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-02-16 20:28:17,773 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-02-16 20:28:17,773 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: hdfs://namenode/user/tsz/t10_har_pig_wc
2010-02-16 20:28:32,253 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 132
2010-02-16 20:28:32,253 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 1862
2010-02-16 20:28:32,253 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
{noformat}


[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-16 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834491#action_12834491 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
-

Pig is amazing!  The Pig wordcount program with har:// ran faster than the 
MapReduce wordcount example, which took 57 mins [in my previous 
test|https://issues.apache.org/jira/browse/HADOOP-6467?focusedCommentId=12830401&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12830401].




[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-12 Thread Pradeep Kamath (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833221#action_12833221 ]

Pradeep Kamath commented on PIG-1234:
-

Results from running test-patch ant target:
     [exec] +1 overall.
     [exec]
     [exec] +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

I am currently running unit tests against this patch on the load-store-redesign 
branch.




[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-12 Thread Pradeep Kamath (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833231#action_12833231 ]

Pradeep Kamath commented on PIG-1234:
-

A couple of observations about the load statement in the script posted above:
1) '\n' is not a valid argument for PigStorage. The argument is meant to be 
the field delimiter, and '\n' cannot be used as a field delimiter because 
PigStorage treats it as the record delimiter.
2) The patch here only fixes the incorrect checking of the scheme in the URL. 
Whether har://.. resources can be read by PigStorage depends on 
whether TextInputFormat can read har://.. resources. PigStorage simply passes 
the location on to TextInputFormat, which does the real reading.
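
To illustrate the kind of scheme check that produces the "Wrong FS" error reported on this issue, here is a minimal sketch using only `java.net.URI`. It is neither Hadoop's `FileSystem.checkPath` implementation nor the PIG-1234 patch; the class and method names are hypothetical, and the comparison rule (scheme and authority must both match) is an assumption made for the example.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical, simplified "does this path belong to this file system?" check. */
public class SchemeCheckSketch {

    /** Returns true when the path's scheme and authority match the file system's. */
    public static boolean belongsTo(URI fileSystem, URI path) {
        // Assumption: a path without a scheme refers to the default file system.
        if (path.getScheme() == null) {
            return true;
        }
        return path.getScheme().equalsIgnoreCase(fileSystem.getScheme())
                && Objects.equals(path.getAuthority(), fileSystem.getAuthority());
    }

    /** Rejects a foreign path with a message in the style of the reported error. */
    public static void checkPath(URI fileSystem, URI path) {
        if (!belongsTo(fileSystem, path)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fileSystem);
        }
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://namenode");
        URI har = URI.create("har://hdfs-namenode/user/tsz/t20.har/t20");
        try {
            checkPath(hdfs, har);
        } catch (IllegalArgumentException e) {
            // prints: Wrong FS: har://hdfs-namenode/user/tsz/t20.har/t20, expected: hdfs://namenode
            System.out.println(e.getMessage());
        }
    }
}
```

A har:// location handed to an hdfs:// file system object fails such a check before any data is read, which is why fixing the check and being able to read the archive are separate questions.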




[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-12 Thread Richard Ding (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833233#action_12833233 ]

Richard Ding commented on PIG-1234:
---

+1 for commit.




[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-10 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832162#action_12832162 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
-

More error messages:
{noformat}
Backend error message during job submission
---
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IllegalArgumentException: Wrong FS: har://hdfs-namenode/user/tsz/t20.har/t20, expected: hdfs://namenode
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
    at org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:553)
    at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:123)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
    at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:240)
    ... 7 more

Pig Stack Trace
---
ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias a
    at org.apache.pig.PigServer.openIterator(PigServer.java:482)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:352)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20
    ... 8 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: har://hdfs-namenode/user/tsz/t20.har/t20, expected: hdfs://namenode
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)

[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-10 Thread Pradeep Kamath (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832202#action_12832202 ]

Pradeep Kamath commented on PIG-1234:
-

Can you try using a pig.jar compiled from the load-store-redesign branch 
(http://svn.apache.org/repos/asf/hadoop/pig/branches/load-store-redesign)?






[jira] Commented: (PIG-1234) Unable to create input slice for har:// files

2010-02-10 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832247#action_12832247 ]

Tsz Wo (Nicholas), SZE commented on PIG-1234:
-

> can you try using a pig.jar compiled from the load store redesign branch ...
Sure, I will try it later.


I got the same problem when PigStorage is replaced with TextLoader.  My Pig 
version is

Apache Pig version 0.6.0.0.20.1.1001221613 (r902141) 
compiled Jan 22 2010, 16:13:46


