[jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
[ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803795#action_12803795 ]

David Ciemiewicz commented on PIG-752:
--------------------------------------

Jeff,

What do you mean when you say local mode has been removed?

Does this mean that the option -exectype local has been removed?

Or does this mean that the local mode execution code has been replaced, or will be replaced, by an M/R execution engine that operates on the user's local computer without the need for an HDFS grid?

If the former (no local execution), this is nuts.

If the latter (M/R execution for local execution), and this will supply the means of reading and writing bzip2-compressed data, then this isn't a WON'T FIX, this is a FIXED by a change in execution engine?

So which is it?

local mode doesn't read bzip2 and gzip compressed data files

Key: PIG-752
URL: https://issues.apache.org/jira/browse/PIG-752
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: David Ciemiewicz
Assignee: Jeff Zhang
Attachments: Pig_752.Patch

Problem 1) use of .bz2 file extension does not store results bzip2 compressed in Local mode (-exectype local)

If I use the .bz2 filename extension in a STORE statement on HDFS, the results are stored with bzip2 compression.
If I use the .bz2 filename extension in a STORE statement on local file system, the results are NOT stored with bzip2 compression.
compact.bz2.pig:
{code}
A = load 'events.test' using PigStorage();
store A into 'events.test.bz2' using PigStorage();
C = load 'events.test.bz2' using PigStorage();
C = limit C 10;
dump C;
{code}

{code}
-bash-3.00$ pig -exectype local compact.bz2.pig
-bash-3.00$ file events.test
events.test: ASCII English text, with very long lines
-bash-3.00$ file events.test.bz2
events.test.bz2: ASCII English text, with very long lines
-bash-3.00$ cat events.test | bzip2 > events.test.bz2
-bash-3.00$ file events.test.bz2
events.test.bz2: bzip2 compressed data, block size = 900k
{code}

The output format in local mode is definitely not bzip2, but it should be.

Problem 2) pig in local mode does not decompress bzip2 compressed files, but should, to be consistent with HDFS

read.bz2.pig:
{code}
A = load 'events.test.bz2' using PigStorage();
A = limit A 10;
dump A;
{code}

The output should be human readable but is instead garbage, indicating no decompression took place during the load:

{code}
-bash-3.00$ pig -exectype local read.bz2.pig
USING: /grid/0/gs/pig/current
2009-04-03 18:26:30,455 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-03 18:26:30,456 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(BZh91AYsyoz?u?...@{x_?d?|u-??mK???;??4?C??) ((R? 6?*mg, ?6?Zj?k,???0?QT?d???hY?#mJ?[j???z?m?t?u?K)??K5+??)?m?E7j?X?8a?? ??U?p@@MT?$?B?P??N??=???(z}gk...@c$\??i]?g:?J) a(R?,?u?v???...@?i@??J??!D?)???A?PP?IY??m? (mP(i?4,#F[?I)@?...@??|7^?}U??wwg,?u?$?T???((Q!D?=`*?}hP??_|??=?(??2???m=?xG?(?rC?B?(33??:4?N???t|??T?*??k??NT?x???=?fyv?wf??4z???4t?) (?oou?t???Kwl?3?nCM?WS?;l???P?s?x a???e)B??9? ?44 ((?...@4?) (f) (?...@+?d?0@?U) (Q?SR)
-bash-3.00$
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
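The behavior requested in Problems 1 and 2 (pick the codec from the file name extension on both STORE and LOAD) can be illustrated outside Pig. The following is a minimal Python sketch, not Pig's implementation: the helper names open_by_extension, sniff_compression, and roundtrip are hypothetical, and only the standard bz2 and gzip modules are assumed. The magic-byte check does roughly what file(1) does in the shell session above; note that the first dumped tuple in Problem 2 begins with "BZh9", which is what raw bzip2 header bytes look like when no decompression is applied.

```python
import bz2
import gzip
import os
import tempfile

# Leading bytes that identify each format on disk (what file(1) keys on).
MAGIC = {b"BZh": "bzip2", b"\x1f\x8b": "gzip"}


def open_by_extension(path, mode="rt"):
    """Open path with a codec chosen from its extension (hypothetical helper)."""
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def sniff_compression(path):
    """Report the on-disk format from the leading bytes, ignoring the file name."""
    with open(path, "rb") as f:
        head = f.read(3)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "plain"


def roundtrip(directory, filename, text):
    """Store text under filename, then return (on-disk format, text loaded back)."""
    path = os.path.join(directory, filename)
    with open_by_extension(path, "wt") as out:
        out.write(text)
    with open_by_extension(path, "rt") as src:
        return sniff_compression(path), src.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("events.test", "events.test.bz2", "events.test.gz"):
            # Each line shows the detected format and the text read back.
            print(name, roundtrip(tmp, name, "a\tb\tc\n"))
```

With extension-based selection applied consistently to both writing and reading, a .bz2 name yields a file that sniffs as bzip2 and still loads back as readable text, which is the HDFS behavior the report asks local mode to match.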
[jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
[ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803808#action_12803808 ]

Olga Natkovich commented on PIG-752:

Just to clarify, local mode is not going away, but in Pig 0.7.0 it will be based on Hadoop's local mode, so we will get this fix for free.
[jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
[ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757944#action_12757944 ]

Alan Gates commented on PIG-752:

It means that the patch program was unable to apply your patch to HFile.java. I would try regenerating the patch against the latest trunk and see if you get better results.

local mode doesn't read bzip2 and gzip compressed data files

Key: PIG-752
URL: https://issues.apache.org/jira/browse/PIG-752
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: David Ciemiewicz
Assignee: Jeff Zhang
Fix For: 0.4.0
Attachments: Pig_752.Patch
[jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
[ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757595#action_12757595 ]

Jeff Zhang commented on PIG-752:

Alan,

What does this message mean?
[jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
[ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756021#action_12756021 ]

Jeff Zhang commented on PIG-752:

BTW, does anybody know what LocalDataStorage is used for? It seems Pig will use HDataStorage even when I run it in local mode, so why do we need LocalDataStorage?
[jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
[ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756026#action_12756026 ]

Hadoop QA commented on PIG-752:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12419764/Pig_752.Patch
against trunk revision 815571.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/32/console

This message is automatically generated.