[ 
https://issues.apache.org/jira/browse/PIG-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-175.
----------------------------

    Resolution: Won't Fix

Pig local mode has been dropped in 0.6 in favor of Hadoop's LocalJobRunner.  
I'm not worried about being unable to mix compressed and uncompressed files in 
MiniMR mode.

> Reading compressed files in local mode + MiniMRCluster
> ------------------------------------------------------
>
>                 Key: PIG-175
>                 URL: https://issues.apache.org/jira/browse/PIG-175
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Craig Macdonald
>         Attachments: testCompressed.sh
>
>
> I have written a small test script that checks whether three simple files, 
> one uncompressed and two compressed, can be loaded successfully. Essentially, 
> it writes a file, compresses it using gzip and bzip2, and sees whether Pig can 
> load each version. I ran it in both local execution mode and the MiniMRCluster.
> Here are my results:
> MiniMRCluster
>  * uncompressed: OK
>  * gzip: OK
>  * bzip2: OK
>  * All three at once: not OK
> Local Execution Mode
>  * uncompressed: OK
>  * gzip: not OK (garbled output)
>  * bzip2: not OK (garbled output)
>  * All three at once: not OK (expected)
> I'm not sure what the problem is with the MiniMRCluster: there is an NPE in 
> PigSplit.getLocations(). I suspect that getFileCacheHints() is returning 
> null, which usually indicates a non-existent file. 
> However, for the local execution mode, I'm fairly confident that this mode 
> has no support for compressed files.
> Craig
> {noformat}
> ==========================================
> Bashs good friend: cat
> ==========================================
> Normal
> A
> B
> C
> bz2
> A
> B
> C
> gzip
> A
> B
> C
> ==========================================
> MiniMRCluster
> ==========================================
> test.all.pig
> 2008-03-29 12:07:22,103 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> 2008-03-29 12:07:22,241 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
> - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2008-03-29 12:07:22,555 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce 
> Job -----
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: 
> [/users/grad/craigm/src/pig/FROMApache/trunk4/trunk/test.normal:org.apache.pig.builtin.PigStorage()]
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]]
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: 
> /tmp/temp-1403805719/tmp1733057091:org.apache.pig.builtin.BinStorage
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
> 2008-03-29 12:07:22,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: 
> -1
> 2008-03-29 12:07:22,557 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce 
> parallelism: -1
> 2008-03-29 12:07:23,427 [Thread-0] INFO  org.apache.hadoop.mapred.MapTask - 
> numReduceTasks: 1
> 2008-03-29 12:07:23,544 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner -
> 2008-03-29 12:07:23,545 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'map_0000' done.
> 2008-03-29 12:07:23,581 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'map_0000' to file:/tmp/temp-1403805719/tmp1733057091
> 2008-03-29 12:07:23,625 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
> 2008-03-29 12:07:23,626 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'reduce_cibps7' done.
> 2008-03-29 12:07:23,630 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'reduce_cibps7' to 
> file:/tmp/temp-1403805719/tmp1733057091
> 2008-03-29 12:07:24,383 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher 
> - Pig progress = 100%
> (A)
> (B)
> (C)
> 2008-03-29 12:07:24,415 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce 
> Job -----
> 2008-03-29 12:07:24,415 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: 
> [/user/craigm/test.gz:org.apache.pig.builtin.PigStorage()]
> 2008-03-29 12:07:24,416 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]]
> 2008-03-29 12:07:24,416 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
> 2008-03-29 12:07:24,416 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
> 2008-03-29 12:07:24,416 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
> 2008-03-29 12:07:24,416 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: 
> /tmp/temp-1403805719/tmp-1191951534:org.apache.pig.builtin.BinStorage
> 2008-03-29 12:07:24,416 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
> 2008-03-29 12:07:24,416 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: 
> -1
> 2008-03-29 12:07:24,417 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce 
> parallelism: -1
> java.lang.NullPointerException
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigSplit.getLocations(PigSplit.java:107)
>         at 
> org.apache.hadoop.mapred.JobClient.writeSplitsFile(JobClient.java:638)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:540)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher.launchPig(MapReduceLauncher.java:260)
>         at 
> org.apache.pig.backend.hadoop.executionengine.POMapreduce.open(POMapreduce.java:176)
>         at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:314)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:255)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:160)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:63)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:60)
>         at org.apache.pig.Main.main(Main.java:265)
> 2008-03-29 12:07:24,868 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
> java.io.IOException: Unable to open iterator for alias: gz
>         at 
> org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:325)
>         at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:255)
>         at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:160)
>         at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:63)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:60)
>         at org.apache.pig.Main.main(Main.java:265)
> Caused by: org.apache.pig.backend.executionengine.ExecException: 
> java.io.IOException
>         at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:288)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:314)
>         ... 5 more
> Caused by: java.io.IOException
>         at 
> org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
>         at 
> org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:12)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher.launchPig(MapReduceLauncher.java:380)
>         at 
> org.apache.pig.backend.hadoop.executionengine.POMapreduce.open(POMapreduce.java:176)
>         at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
>         ... 6 more
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigSplit.getLocations(PigSplit.java:107)
>         at 
> org.apache.hadoop.mapred.JobClient.writeSplitsFile(JobClient.java:638)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:540)
>         at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher.launchPig(MapReduceLauncher.java:260)
>         ... 8 more
> 2008-03-29 12:07:24,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
> Unable to open iterator for alias: gz
> test.bz2.pig
> 2008-03-29 12:07:25,349 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> 2008-03-29 12:07:25,486 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
> - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2008-03-29 12:07:25,761 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce 
> Job -----
> 2008-03-29 12:07:25,761 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: 
> [/users/grad/craigm/src/pig/FROMApache/trunk4/trunk/test.bz2:org.apache.pig.builtin.PigStorage()]
> 2008-03-29 12:07:25,761 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]]
> 2008-03-29 12:07:25,762 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
> 2008-03-29 12:07:25,762 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
> 2008-03-29 12:07:25,762 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
> 2008-03-29 12:07:25,762 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: 
> /tmp/temp-142293823/tmp-1682881533:org.apache.pig.builtin.BinStorage
> 2008-03-29 12:07:25,762 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
> 2008-03-29 12:07:25,762 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: 
> -1
> 2008-03-29 12:07:25,762 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce 
> parallelism: -1
> 2008-03-29 12:07:26,585 [Thread-0] INFO  org.apache.hadoop.mapred.MapTask - 
> numReduceTasks: 1
> 2008-03-29 12:07:26,802 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner -
> 2008-03-29 12:07:26,802 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'map_0000' done.
> 2008-03-29 12:07:26,809 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'map_0000' to file:/tmp/temp-142293823/tmp-1682881533
> 2008-03-29 12:07:26,852 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
> 2008-03-29 12:07:26,852 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'reduce_r75h48' done.
> 2008-03-29 12:07:26,859 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'reduce_r75h48' to 
> file:/tmp/temp-142293823/tmp-1682881533
> 2008-03-29 12:07:27,547 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher 
> - Pig progress = 100%
> (A)
> (B)
> (C)
> test.gz.pig
> 2008-03-29 12:07:28,110 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> 2008-03-29 12:07:28,266 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
> - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2008-03-29 12:07:28,582 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce 
> Job -----
> 2008-03-29 12:07:28,583 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: 
> [/users/grad/craigm/src/pig/FROMApache/trunk4/trunk/test.gz:org.apache.pig.builtin.PigStorage()]
> 2008-03-29 12:07:28,583 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]]
> 2008-03-29 12:07:28,583 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
> 2008-03-29 12:07:28,583 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
> 2008-03-29 12:07:28,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
> 2008-03-29 12:07:28,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: 
> /tmp/temp-1552662535/tmp1393315176:org.apache.pig.builtin.BinStorage
> 2008-03-29 12:07:28,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
> 2008-03-29 12:07:28,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: 
> -1
> 2008-03-29 12:07:28,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce 
> parallelism: -1
> 2008-03-29 12:07:29,621 [Thread-0] INFO  org.apache.hadoop.mapred.MapTask - 
> numReduceTasks: 1
> 2008-03-29 12:07:29,677 [Thread-0] WARN  
> org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2008-03-29 12:07:29,830 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner -
> 2008-03-29 12:07:29,831 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'map_0000' done.
> 2008-03-29 12:07:29,875 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'map_0000' to file:/tmp/temp-1552662535/tmp1393315176
> 2008-03-29 12:07:30,096 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
> 2008-03-29 12:07:30,097 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'reduce_kan4fo' done.
> 2008-03-29 12:07:30,103 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'reduce_kan4fo' to 
> file:/tmp/temp-1552662535/tmp1393315176
> 2008-03-29 12:07:30,583 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher 
> - Pig progress = 100%
> (A)
> (B)
> (C)
> test.normal.pig
> 2008-03-29 12:07:31,114 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> 2008-03-29 12:07:31,270 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
> - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2008-03-29 12:07:31,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce 
> Job -----
> 2008-03-29 12:07:31,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: 
> [/users/grad/craigm/src/pig/FROMApache/trunk4/trunk/test.normal:org.apache.pig.builtin.PigStorage()]
> 2008-03-29 12:07:31,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]]
> 2008-03-29 12:07:31,556 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
> 2008-03-29 12:07:31,557 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
> 2008-03-29 12:07:31,557 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
> 2008-03-29 12:07:31,557 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: 
> /tmp/temp-323341057/tmp-1104693095:org.apache.pig.builtin.BinStorage
> 2008-03-29 12:07:31,557 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
> 2008-03-29 12:07:31,557 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: 
> -1
> 2008-03-29 12:07:31,557 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce 
> parallelism: -1
> 2008-03-29 12:07:32,402 [Thread-0] INFO  org.apache.hadoop.mapred.MapTask - 
> numReduceTasks: 1
> 2008-03-29 12:07:32,514 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner -
> 2008-03-29 12:07:32,514 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'map_0000' done.
> 2008-03-29 12:07:32,521 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'map_0000' to file:/tmp/temp-323341057/tmp-1104693095
> 2008-03-29 12:07:32,568 [Thread-0] INFO  
> org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
> 2008-03-29 12:07:32,568 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Task 'reduce_4q573x' done.
> 2008-03-29 12:07:32,572 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
> - Saved output of task 'reduce_4q573x' to 
> file:/tmp/temp-323341057/tmp-1104693095
> 2008-03-29 12:07:33,369 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher 
> - Pig progress = 100%
> (A)
> (B)
> (C)
> ==========================================
> Local execution mode
> ==========================================
> test.all.pig
> (A)
> (B)
> (C)
> (?0?Gs?r?r?s?}8)
> (BZh91AY&SY????8 !?h3M???"?(HP??)
> test.bz2.pig
> (BZh91AY&SY????8 !?h3M???"?(HP??)
> test.gz.pig
> (?0?Gs?r?r?s?}8)
> test.normal.pig
> (A)
> (B)
> (C)
> {noformat}
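
The garbled local-mode tuples quoted above are consistent with the reporter's 
suspicion that the compressed bytes were passed to the loader without being 
decompressed: the bzip2 line begins with BZh91AY&SY, which is the standard 
bzip2 stream header followed by the first block marker. The sketch below is 
not Pig code and not the attached testCompressed.sh; it is a minimal Python 
illustration, using only the standard library, of how a reader could detect 
the codec from the leading bytes and decompress before splitting into lines. 
The names detect_codec and read_lines are hypothetical.

```python
import bz2
import gzip

# Same three-line payload that the report's `cat` output shows.
PAYLOAD = b"A\nB\nC\n"

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip member
BZIP2_MAGIC = b"BZh"      # first three bytes of a bzip2 stream


def detect_codec(raw):
    """Guess the compression codec from the leading magic bytes."""
    if raw.startswith(GZIP_MAGIC):
        return "gzip"
    if raw.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "none"


def read_lines(raw):
    """Decompress if needed, then split into text lines."""
    codec = detect_codec(raw)
    if codec == "gzip":
        raw = gzip.decompress(raw)
    elif codec == "bzip2":
        raw = bz2.decompress(raw)
    return raw.decode("ascii").splitlines()


# In-memory stand-ins for the three test files named in the log output.
samples = {
    "test.normal": PAYLOAD,
    "test.gz": gzip.compress(PAYLOAD),
    "test.bz2": bz2.compress(PAYLOAD),
}

for name, raw in samples.items():
    # raw[:10] shows what a loader sees if it skips decompression.
    print(name, detect_codec(raw), raw[:10], read_lines(raw))
```

Printing raw[:10] for the bzip2 sample shows the same header that appears in 
the garbled tuple, while read_lines recovers the original three lines for all 
three inputs.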

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
