Greetings Pig users,

This is regarding my previous post (quoted below).
I was able to remove this column error by using the start-up:

pig -x local -M -t ColumnMapKeyPrune

I have no more insight than that; I only tried it because someone else reported that their column-oriented error went away with that command-line switch. I restarted Pig twice, with and without the -t, to verify that the error went away and came back.

With pig -x local -M -t ColumnMapKeyPrune I get:

grunt> dump s1;
(11,21,31)
(12,22,32)
(13,23,33)
(14,24,34)
(15,25,35)

With pig -x local -M I get:

grunt> dump s1;
(ERROR_9999_.csv,21,31)
(ERROR_9999_.csv,22,32)
(ERROR_9999_.csv,23,33)
(ERROR_9999_.csv,24,34)
(ERROR_9999_.csv,25,35)

---------- Forwarded message ----------
From: jeremiah rounds <[email protected]>
Date: Mon, Aug 13, 2012 at 5:49 PM
Subject: Can anyone give me a hint about this column behavior?
To: [email protected]

Greetings,

I am new to Pig. I am trying to get to know it on a laptop with Hadoop 20.2 installed in local mode. I have prior experience with Hadoop, but I figure my error is so weird that I blew the Pig install or something. Here is my problem distilled down to a small case:

$ pig -x local -M
grunt> set pig.splitCombination false;
grunt> cat ERROR_9999_.csv
11,21,31
12,22,32
13,23,33
14,24,34
15,25,35
grunt> raw = load 'ERROR_9999_.csv' USING PigStorage(',', '-tagsource') AS (file: chararray, col1: chararray, col2: chararray, col3: chararray);
grunt> dump raw;
(ERROR_9999_.csv,11,21,31)
(ERROR_9999_.csv,12,22,32)
(ERROR_9999_.csv,13,23,33)
(ERROR_9999_.csv,14,24,34)
(ERROR_9999_.csv,15,25,35)
grunt> s1 = FOREACH raw GENERATE col1, col2, col3;
grunt> dump s1;
(ERROR_9999_.csv,21,31)
(ERROR_9999_.csv,22,32)
(ERROR_9999_.csv,23,33)
(ERROR_9999_.csv,24,34)
(ERROR_9999_.csv,25,35)

Now, obviously you wouldn't tag on the filename only to take it off again, but this is a distilled, repeatable case that captures my issue in a larger project. col1 has become the filename, even though in raw it was a two-digit number held in a chararray. The describe output looks like this:

grunt> describe raw;
raw: {file: chararray,col1: chararray,col2: chararray,col3: chararray}
grunt> describe s1;
s1: {col1: chararray,col2: chararray,col3: chararray}

There is an explain at the end of the email if that is useful to anyone. I have figured out that the issue seems related to -tagsource and column pruning. Is that indicative of anything I might have done wrong in the install?
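In case it is easier for anyone to try, here is the same reproduction collapsed into a standalone script. This is only a sketch: repro.pig is just a name I made up, and the statements mirror the grunt session above.

-- repro.pig: distilled reproduction of the -tagsource column issue
-- (same statements as the interactive session shown above)
set pig.splitCombination false;
raw = LOAD 'ERROR_9999_.csv' USING PigStorage(',', '-tagsource')
      AS (file: chararray, col1: chararray, col2: chararray, col3: chararray);
s1 = FOREACH raw GENERATE col1, col2, col3;
DUMP s1;

I run it with the same local-mode start-up, e.g. pig -x local -M repro.pig.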
Thanks,
Jeremiah

grunt> explain s1
2012-08-13 17:47:28,315 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for raw: $0 initialized

#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
s1: (Name: LOStore Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)ColumnPrune:InputUids=[42, 43, 41]ColumnPrune:OutputUids=[42, 43, 41]
|
|---s1: (Name: LOForEach Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)
    |   |
    |   (Name: LOGenerate[false,false,false] Schema: col1#41:chararray,col2#42:chararray,col3#43:chararray)
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 41)
    |   |   |
    |   |   |---col1:(Name: Project Type: bytearray Uid: 41 Input: 0 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 42)
    |   |   |
    |   |   |---col2:(Name: Project Type: bytearray Uid: 42 Input: 1 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 43)
    |   |   |
    |   |   |---col3:(Name: Project Type: bytearray Uid: 43 Input: 2 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: col1#41:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[1] Schema: col2#42:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[2] Schema: col3#43:bytearray)
    |
    |---raw: (Name: LOLoad Schema: col1#41:bytearray,col2#42:bytearray,col3#43:bytearray)ColumnPrune:RequiredColumns=[1, 2, 3]ColumnPrune:InputUids=[42, 43, 41]ColumnPrune:OutputUids=[42, 43, 41]RequiredFields:[1, 2, 3]

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
s1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-40
|
|---s1: New For Each(false,false,false)[bag] - scope-39
    |   |
    |   Cast[chararray] - scope-31
    |   |
    |   |---Project[bytearray][0] - scope-30
    |   |
    |   Cast[chararray] - scope-34
    |   |
    |   |---Project[bytearray][1] - scope-33
    |   |
    |   Cast[chararray] - scope-37
    |   |
    |   |---Project[bytearray][2] - scope-36
    |
    |---raw: Load(file:///home/jrounds/Documents/12summer/paper/ERROR_9999_.csv:PigStorage(',','-tagsource')) - scope-29

2012-08-13 17:47:28,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false

#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-41
Map Plan
s1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-40
|
|---s1: New For Each(false,false,false)[bag] - scope-39
    |   |
    |   Cast[chararray] - scope-31
    |   |
    |   |---Project[bytearray][0] - scope-30
    |   |
    |   Cast[chararray] - scope-34
    |   |
    |   |---Project[bytearray][1] - scope-33
    |   |
    |   Cast[chararray] - scope-37
    |   |
    |   |---Project[bytearray][2] - scope-36
    |
    |---raw: Load(file:///home/jrounds/Documents/12summer/paper/ERROR_9999_.csv:PigStorage(',','-tagsource')) - scope-29
--------
Global sort: false
----------------
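P.S. If anyone wants to repeat the with/without comparison from a script rather than from grunt, these are the two launches I used, with repro.pig being the distilled script above. As far as I know, -t is the short form of -optimizer_off, and pig -help should list the exact rule names a given build accepts (I believe there is also an "All" value that turns every rule off, but I have not double-checked that on this version):

pig -x local -M -t ColumnMapKeyPrune repro.pig    <-- col1 comes back as 11..15
pig -x local -M repro.pig                         <-- col1 comes back as ERROR_9999_.csv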
