This seems like a bug in PigStorage. Would you mind opening a JIRA with the steps to reproduce that you've included here?
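For the JIRA, the grunt session quoted below could probably be condensed into a single script along these lines. This is only a sketch: the script name repro.pig is made up for illustration, and everything else (the file name, the load statement, the command-line switches, and the reported output) is taken from the quoted session.

```pig
-- repro.pig (hypothetical file name)
-- Run as reported:            pig -x local -M repro.pig
-- Run with pruning disabled:  pig -x local -M -t ColumnMapKeyPrune repro.pig

set pig.splitCombination false;

-- ERROR_9999_.csv holds five rows of the form 11,21,31 ... 15,25,35
raw = LOAD 'ERROR_9999_.csv' USING PigStorage(',', '-tagsource')
      AS (file: chararray, col1: chararray, col2: chararray, col3: chararray);

-- Project away the tagged file name; col1 should still be 11..15
s1 = FOREACH raw GENERATE col1, col2, col3;

-- Reported without -t ColumnMapKeyPrune: (ERROR_9999_.csv,21,31) ...
-- Reported with    -t ColumnMapKeyPrune: (11,21,31) ...
DUMP s1;
```

Including the output of both runs alongside the script would make the difference easy to see at a glance.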
thanks,
Bill

On Mon, Aug 13, 2012 at 3:44 PM, jeremiah rounds <[email protected]> wrote:

> Greetings pig users,
>
> This is regarding my previous post (in quotes below)
>
> I was able to remove this column error by using the start up:
> pig -x local -M -t ColumnMapKeyPrune
>
> I have no more insight than that I only tried it because someone else
> reported their column oriented error went away with that command line
> switch. I restarted pig two times with and without the -t to verify
> the error went away and came back.
>
> With pig -x local -M -t ColumnMapKeyPrune I get:
> grunt> dump s1;
> (11,21,31)
> (12,22,32)
> (13,23,33)
> (14,24,34)
> (15,25,35)
>
> With pig -x local -M I get:
> grunt> dump s1;
> (ERROR_9999_.csv,21,31)
> (ERROR_9999_.csv,22,32)
> (ERROR_9999_.csv,23,33)
> (ERROR_9999_.csv,24,34)
> (ERROR_9999_.csv,25,35)
>
> ---------- Forwarded message ----------
> From: jeremiah rounds <[email protected]>
> Date: Mon, Aug 13, 2012 at 5:49 PM
> Subject: Can anyone give me a hint about this column behavior?
> To: [email protected]
>
> Greetings,
>
> I am new to pig. I am trying to get to know it on a laptop with
> hadoop 20.2 installed in local mode. I have prior experience with
> hadoop, but I figure my error is so weird I blew the pig install or
> something.
>
> Here is what I have my problem distilled down to:
>
> $ pig -x local -M
>
> grunt> set pig.splitCombination false;
> grunt> cat ERROR_9999_.csv
> 11,21,31
> 12,22,32
> 13,23,33
> 14,24,34
> 15,25,35
>
> grunt> raw = load 'ERROR_9999_.csv' USING PigStorage(',',
> '-tagsource') AS (file: chararray, col1: chararray,col2: chararray,
> col3: chararray);
> grunt> dump raw;
> (ERROR_9999_.csv,11,21,31)
> (ERROR_9999_.csv,12,22,32)
> (ERROR_9999_.csv,13,23,33)
> (ERROR_9999_.csv,14,24,34)
> (ERROR_9999_.csv,15,25,35)
>
> grunt> s1 = FOREACH raw GENERATE col1, col2, col3;
> grunt> dump s1;
> (ERROR_9999_.csv,21,31)
> (ERROR_9999_.csv,22,32)
> (ERROR_9999_.csv,23,33)
> (ERROR_9999_.csv,24,34)
> (ERROR_9999_.csv,25,35)
>
> Now obviously you wouldn't put on the filename only to take it off,
> but this is a distilled down repeatable case that captures my issue in
> a larger project. col1 has become the filename even though it used to
> be a double digit number in a chararray for raw.
>
> The describes go like this:
> grunt> describe raw;
> raw: {file: chararray,col1: chararray,col2: chararray,col3: chararray}
> grunt> describe s1;
> s1: {col1: chararray,col2: chararray,col3: chararray}
>
> There is an explain at the end of the email if that is useful to
> anyone. I have figured out that the issue seems related to -tagsource
> and pruning columns. Is that indicative of anything I might have done
> wrong in an install?
>
> Thanks,
> Jeremiah
>
> grunt> explain s1
> 2012-08-13 17:47:28,315 [main] INFO
> org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns
> pruned for raw: $0
> initialized
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> s1: (Name: LOStore Schema:
>
> col1#41:chararray,col2#42:chararray,col3#43:chararray)ColumnPrune:InputUids=[42,
> 43, 41]ColumnPrune:OutputUids=[42, 43, 41]
> |
> |---s1: (Name: LOForEach Schema:
> col1#41:chararray,col2#42:chararray,col3#43:chararray)
> |   |
> |   (Name: LOGenerate[false,false,false] Schema:
> col1#41:chararray,col2#42:chararray,col3#43:chararray)
> |   |   |
> |   |   (Name: Cast Type: chararray Uid: 41)
> |   |   |
> |   |   |---col1:(Name: Project Type: bytearray Uid: 41 Input: 0
> Column: (*))
> |   |   |
> |   |   (Name: Cast Type: chararray Uid: 42)
> |   |   |
> |   |   |---col2:(Name: Project Type: bytearray Uid: 42 Input: 1
> Column: (*))
> |   |   |
> |   |   (Name: Cast Type: chararray Uid: 43)
> |   |   |
> |   |   |---col3:(Name: Project Type: bytearray Uid: 43 Input: 2
> Column: (*))
> |   |
> |   |---(Name: LOInnerLoad[0] Schema: col1#41:bytearray)
> |   |
> |   |---(Name: LOInnerLoad[1] Schema: col2#42:bytearray)
> |   |
> |   |---(Name: LOInnerLoad[2] Schema: col3#43:bytearray)
> |
> |---raw: (Name: LOLoad Schema:
>
> col1#41:bytearray,col2#42:bytearray,col3#43:bytearray)ColumnPrune:RequiredColumns=[1,
> 2, 3]ColumnPrune:InputUids=[42, 43, 41]ColumnPrune:OutputUids=[42, 43,
> 41]RequiredFields:[1, 2, 3]
>
> #-----------------------------------------------
> # Physical Plan:
> #-----------------------------------------------
> s1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-40
> |
> |---s1: New For Each(false,false,false)[bag] - scope-39
> |   |
> |   Cast[chararray] - scope-31
> |   |
> |   |---Project[bytearray][0] - scope-30
> |   |
> |   Cast[chararray] - scope-34
> |   |
> |   |---Project[bytearray][1] - scope-33
> |   |
> |   Cast[chararray] - scope-37
> |   |
> |   |---Project[bytearray][2] - scope-36
> |
> |---raw:
> Load(file:///home/jrounds/Documents/12summer/paper/ERROR_9999_.csv:PigStorage(',','-tagsource'))
> - scope-29
>
> 2012-08-13 17:47:28,321 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
> - File concatenation threshold: 100 optimistic? false
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-41
> Map Plan
> s1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-40
> |
> |---s1: New For Each(false,false,false)[bag] - scope-39
> |   |
> |   Cast[chararray] - scope-31
> |   |
> |   |---Project[bytearray][0] - scope-30
> |   |
> |   Cast[chararray] - scope-34
> |   |
> |   |---Project[bytearray][1] - scope-33
> |   |
> |   Cast[chararray] - scope-37
> |   |
> |   |---Project[bytearray][2] - scope-36
> |
> |---raw:
> Load(file:///home/jrounds/Documents/12summer/paper/ERROR_9999_.csv:PigStorage(',','-tagsource'))
> - scope-29--------
> Global sort: false
> ----------------
>

--
*Note that I'm no longer using my Yahoo! email address. Please email me at [email protected] going forward.*
