[jira] Updated: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message

2009-05-28 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-619:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.

 Dumping empty results produces Unable to get results for 
 /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage message
 ---

 Key: PIG-619
 URL: https://issues.apache.org/jira/browse/PIG-619
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 18, Multi-node hadoop installation
Reporter: Viraj Bhat
Assignee: Alan Gates
 Fix For: 0.3.0

 Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig


 The following Pig script stores empty filter results into the 
 'emptyfilteredlogs' HDFS dir. It later reloads this data from the empty HDFS 
 dir for additional grouping and counting. The script succeeds on a 
 single-node Hadoop installation with the following message, since the alias 
 COUNT_EMPTYFILTERED_LOGS contains empty data.
 ==
 2009-01-13 21:47:08,988 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
 ==
 But on a multi-node Hadoop installation, the script fails with the following 
 error:
 ==
 2009-01-13 13:48:34,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
 java.io.IOException: Unable to open iterator for alias: COUNT_EMPTYFILTERED_LOGS [Unable to get results for /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage]
     at org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:74)
     at org.apache.pig.PigServer.openIterator(PigServer.java:408)
     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
     at org.apache.pig.Main.main(Main.java:306)
 Caused by: org.apache.pig.backend.executionengine.ExecException: Unable to get results for /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage
     ... 7 more
 Caused by: java.io.IOException: /tmp/temp-1964806069/tmp256878619 does not exist
     at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
     at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:291)
     at org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:69)
     ... 6 more
 ==
 {code}
 RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
 RAW_LOGS = limit RAW_LOGS 2;
 -- This filter is meant to match no rows, so the stored output is empty.
 FILTERED_LOGS = filter RAW_LOGS by numvisits < 0;
 store FILTERED_LOGS into 'emptyfilteredlogs' using PigStorage();
 -- Reload the (empty) stored output, then group and count it.
 EMPTY_FILTERED_LOGS = load 'emptyfilteredlogs' as (url:chararray, numvisits:int);
 GROUP_EMPTYFILTERED_LOGS = group EMPTY_FILTERED_LOGS by numvisits;
 COUNT_EMPTYFILTERED_LOGS = foreach GROUP_EMPTYFILTERED_LOGS generate
     group, COUNT(EMPTY_FILTERED_LOGS);
 explain COUNT_EMPTYFILTERED_LOGS;
 dump COUNT_EMPTYFILTERED_LOGS;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message

2009-05-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-619:
---

Fix Version/s: 0.3.0
   Status: Patch Available  (was: Open)

In order to see this behavior, you need three map reduce jobs, something like:

A = load
B = filter everything out
C = group
D = foreach
E = distinct
F = group
G = foreach
store G

In this case the first job (A-D) runs and produces zero-length part files.  The 
second job (E) runs, but no maps are started because its input files are zero 
length.  As a result Hadoop now seems to create no output files for this second 
job.  The third job (F-G) then fails, complaining that its input files don't 
exist.  The patch changes Pig's slicer to return at least one input split per 
part file even when the file is zero length.
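
To make the idea behind the fix concrete, here is a minimal, self-contained 
Java sketch of split computation with and without the "at least one split per 
part file" rule.  It is not the code from PIG-619.patch and does not use Pig's 
or Hadoop's real slicer classes; the names ZeroLengthSplits, Split, 
computeSplits and keepEmptyFiles are hypothetical and exist only for this 
illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical names, not Pig's actual slicer API.
public class ZeroLengthSplits {

    /** A byte range of one part file; a stand-in for an input split. */
    public static final class Split {
        public final String file;
        public final long start;
        public final long length;

        public Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Cuts each part file (name -> size in bytes) into ranges of at most
     * blockSize bytes. With keepEmptyFiles set, a zero-length file still
     * yields one empty split, so one map task would be started for it.
     */
    public static List<Split> computeSplits(Map<String, Long> partFiles,
                                            long blockSize,
                                            boolean keepEmptyFiles) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (Map.Entry<String, Long> file : partFiles.entrySet()) {
            long size = file.getValue();
            if (size == 0) {
                // Without this rule, an all-empty input produces no splits at all.
                if (keepEmptyFiles) {
                    splits.add(new Split(file.getKey(), 0, 0));
                }
                continue;
            }
            for (long start = 0; start < size; start += blockSize) {
                long length = Math.min(blockSize, size - start);
                splits.add(new Split(file.getKey(), start, length));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Output of a job whose filter removed every row: only empty part files.
        Map<String, Long> emptyOutput = new LinkedHashMap<>();
        emptyOutput.put("part-00000", 0L);
        emptyOutput.put("part-00001", 0L);

        System.out.println(computeSplits(emptyOutput, 64, false).size()); // prints 0
        System.out.println(computeSplits(emptyOutput, 64, true).size());  // prints 2
    }
}
```

With zero splits no map task runs for the second job, which matches the 
situation described above where no output files appear for the third job to 
read.  With one empty split per part file, a map task is still started for 
each file.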

 Dumping empty results produces Unable to get results for 
 /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage message
 ---

 Key: PIG-619
 URL: https://issues.apache.org/jira/browse/PIG-619
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 18, Multi-node hadoop installation
Reporter: Viraj Bhat
Assignee: Alan Gates
 Fix For: 0.3.0

 Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig






[jira] Updated: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message

2009-01-13 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-619:
---

Attachment: mydata.txt

Test data

 Dumping empty results produces Unable to get results for 
 /tmp/temp-1964806069/tmp256878619  org.apache.pig.builtin.BinStorage message
 ---

 Key: PIG-619
 URL: https://issues.apache.org/jira/browse/PIG-619
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: types_branch
 Environment: Hadoop 18, Multi-node hadoop installation
Reporter: Viraj Bhat
 Fix For: types_branch

 Attachments: mydata.txt, tmpfileload.pig


