[jira] Updated: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message
[ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-619:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

Patch checked in.

Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message
---
Key: PIG-619
URL: https://issues.apache.org/jira/browse/PIG-619
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Environment: Hadoop 18, Multi-node hadoop installation
Reporter: Viraj Bhat
Assignee: Alan Gates
Fix For: 0.3.0
Attachments: mydata.txt, PIG-619.patch, tmpfileload.pig

The following Pig script stores empty filter results into the 'emptyfilteredlogs' HDFS dir. It later reloads this data from the empty HDFS dir for additional grouping and counting.

It has been observed that this script succeeds on a single-node Hadoop installation with the following message, as the alias COUNT_EMPTYFILTERED_LOGS contains empty data.
==
2009-01-13 21:47:08,988 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
==
But on a multi-node Hadoop installation, the script fails with the following error:
==
2009-01-13 13:48:34,602 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
java.io.IOException: Unable to open iterator for alias: COUNT_EMPTYFILTERED_LOGS [Unable to get results for /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage]
        at org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:74)
        at org.apache.pig.PigServer.openIterator(PigServer.java:408)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
        at org.apache.pig.Main.main(Main.java:306)
Caused by: org.apache.pig.backend.executionengine.ExecException: Unable to get results for /tmp/temp-1964806069/tmp256878619:org.apache.pig.builtin.BinStorage
        ... 7 more
Caused by: java.io.IOException: /tmp/temp-1964806069/tmp256878619 does not exist
        at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:188)
        at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:291)
        at org.apache.pig.backend.hadoop.executionengine.HJob.getResults(HJob.java:69)
        ... 6 more
==
{code}
RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
RAW_LOGS = limit RAW_LOGS 2;
FILTERED_LOGS = filter RAW_LOGS by numvisits 0;
store FILTERED_LOGS into 'emptyfilteredlogs' using PigStorage();
EMPTY_FILTERED_LOGS = load 'emptyfilteredlogs' as (url:chararray, numvisits:int);
GROUP_EMPTYFILTERED_LOGS = group EMPTY_FILTERED_LOGS by numvisits;
COUNT_EMPTYFILTERED_LOGS = foreach GROUP_EMPTYFILTERED_LOGS generate group, COUNT(EMPTY_FILTERED_LOGS);
explain COUNT_EMPTYFILTERED_LOGS;
dump COUNT_EMPTYFILTERED_LOGS;
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message
[ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-619:
---
Fix Version/s: 0.3.0
Status: Patch Available (was: Open)

In order to see this behavior, you need three map reduce jobs, something like:

A = load
B = filter everything out
C = group
D = foreach
E = distinct
F = group
G = foreach
store G

In this case the first job (A-D) will run and produce 0 length part files. The second job (E) will run, but no maps will be started because the files are zero length. As a result Hadoop now seems to create no output files for this second job. The third job (F-G) then fails complaining that the input files don't exist.

The patch changes pig's slicer to return at least one input split per part file even when the file is zero length.
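
The fix described in Alan Gates's comment (return at least one input split per part file, even when the file is zero length) can be illustrated with a small stand-alone sketch. This is not the code from PIG-619.patch and does not use Pig's or Hadoop's real slicer classes; the class `ZeroLengthSplitSketch`, the `Split` type, the `computeSplits` method, and the `emitForEmpty` flag are all hypothetical names chosen for illustration. It only shows why a zero-length part file yields no splits (and therefore no map tasks) under a plain "one split per block" rule, and how emitting a single empty split changes that.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Pig's actual slicer API.
public class ZeroLengthSplitSketch {

    /** Hypothetical stand-in for an input split: a byte range within one part file. */
    public static final class Split {
        public final String path;
        public final long start;
        public final long length;

        public Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Cuts one part file into block-sized splits.
     * With emitForEmpty == false, a zero-length file produces no splits at all.
     * With emitForEmpty == true, a zero-length file produces one empty split,
     * which is the behavior the comment attributes to the patch.
     */
    public static List<Split> computeSplits(String path, long fileLength,
                                            long blockSize, boolean emitForEmpty) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            splits.add(new Split(path, start, Math.min(blockSize, fileLength - start)));
        }
        if (splits.isEmpty() && emitForEmpty) {
            // Zero-length file: still hand back one (empty) split so a map task runs.
            splits.add(new Split(path, 0, 0));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Zero-length part file, old behavior: no splits.
        System.out.println(computeSplits("part-00000", 0, 64, false).size()); // prints 0
        // Zero-length part file, patched behavior: one empty split.
        System.out.println(computeSplits("part-00000", 0, 64, true).size());  // prints 1
        // Non-empty file is unaffected: 150 bytes in 64-byte blocks.
        System.out.println(computeSplits("part-00001", 150, 64, true).size()); // prints 3
    }
}
```

In terms of the three-job pipeline above, the second job (E) is the one that would receive zero splits under the old rule; with one empty split per part file, at least one map task runs, so the job has a chance to write the output that the third job (F-G) expects to find.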
[jira] Updated: (PIG-619) Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message
[ https://issues.apache.org/jira/browse/PIG-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-619:
---
Attachment: mydata.txt

Test data

Dumping empty results produces Unable to get results for /tmp/temp-1964806069/tmp256878619 org.apache.pig.builtin.BinStorage message
---
Key: PIG-619
URL: https://issues.apache.org/jira/browse/PIG-619
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Environment: Hadoop 18, Multi-node hadoop installation
Reporter: Viraj Bhat
Fix For: types_branch
Attachments: mydata.txt, tmpfileload.pig