Greetings,

I am new to Pig.  I am trying to get to know it on a laptop with
Hadoop 0.20.2 installed in local mode.  I have prior experience with
Hadoop, but my error is weird enough that I suspect I botched the Pig
install or something.

Here is what I have distilled my problem down to:

$ pig -x local -M


grunt> set pig.splitCombination false;
grunt> cat ERROR_9999_.csv
11,21,31
12,22,32
13,23,33
14,24,34
15,25,35



grunt> raw = load 'ERROR_9999_.csv' USING PigStorage(',',
'-tagsource') AS (file: chararray, col1: chararray,col2: chararray,
col3: chararray);
grunt> dump raw;
(ERROR_9999_.csv,11,21,31)
(ERROR_9999_.csv,12,22,32)
(ERROR_9999_.csv,13,23,33)
(ERROR_9999_.csv,14,24,34)
(ERROR_9999_.csv,15,25,35)

grunt> s1 = FOREACH raw GENERATE  col1, col2, col3;
grunt> dump s1;
(ERROR_9999_.csv,21,31)
(ERROR_9999_.csv,22,32)
(ERROR_9999_.csv,23,33)
(ERROR_9999_.csv,24,34)
(ERROR_9999_.csv,25,35)


Now obviously you wouldn't add the filename only to take it off again,
but this is a distilled, repeatable case that captures my issue in
a larger project.  In s1, col1 has become the filename, even though in
raw it was a two-digit number stored as a chararray.

The describes go like this:
grunt> describe raw;
raw: {file: chararray,col1: chararray,col2: chararray,col3: chararray}
grunt> describe s1;
s1: {col1: chararray,col2: chararray,col3: chararray}

There is an explain at the end of the email if that is useful to
anyone.  I have figured out that the issue seems related to -tagsource
and column pruning.  Is that indicative of anything I might have done
wrong in the install?
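
For what it is worth, the output is at least consistent with an index
mix-up between pruning and tagging.  The explain reports "Columns
pruned for raw: $0" and RequiredFields [1, 2, 3], after which the
foreach projects positions 0, 1 and 2.  The Python sketch below is not
Pig code and is not taken from the Pig source; load_line and project
are hypothetical names, and the mechanism (required indexes applied to
the data fields only, with the source tag prepended afterwards) is an
assumption that happens to reproduce the bad rows.

```python
def load_line(line, filename, required=None, delimiter=","):
    """Hypothetical stand-in for a tagging loader (not the real PigStorage)."""
    fields = line.split(delimiter)
    if required is not None:
        # Assumed bug: `required` is expressed against the schema that has
        # `file` at position 0, but it is applied to the data fields only.
        fields = [fields[i] for i in required if i < len(fields)]
    # The source tag is prepended after pruning.
    return (filename, *fields)


def project(tup, positions):
    """Pick the given positions out of a tuple, like the foreach does."""
    return tuple(tup[i] for i in positions)


line = "11,21,31"

# No pruning: matches the first row of `dump raw`.
raw_row = load_line(line, "ERROR_9999_.csv")

# Pruned load with RequiredFields [1, 2, 3], then projection of 0, 1, 2:
# matches the first row of `dump s1`.
s1_row = project(load_line(line, "ERROR_9999_.csv", required=[1, 2, 3]), [0, 1, 2])

print(raw_row)
print(s1_row)
```

If this guess is right, the install is fine and the rows go wrong only
when column pruning is applied to a load that uses -tagsource.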


Thanks,
Jeremiah

grunt> explain s1
2012-08-13 17:47:28,315 [main] INFO
org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns
pruned for raw: $0
initialized
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
s1: (Name: LOStore Schema:
col1#41:chararray,col2#42:chararray,col3#43:chararray)ColumnPrune:InputUids=[42,
43, 41]ColumnPrune:OutputUids=[42, 43, 41]
|
|---s1: (Name: LOForEach Schema:
col1#41:chararray,col2#42:chararray,col3#43:chararray)
    |   |
    |   (Name: LOGenerate[false,false,false] Schema:
col1#41:chararray,col2#42:chararray,col3#43:chararray)
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 41)
    |   |   |
    |   |   |---col1:(Name: Project Type: bytearray Uid: 41 Input: 0
Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 42)
    |   |   |
    |   |   |---col2:(Name: Project Type: bytearray Uid: 42 Input: 1
Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 43)
    |   |   |
    |   |   |---col3:(Name: Project Type: bytearray Uid: 43 Input: 2
Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: col1#41:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[1] Schema: col2#42:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[2] Schema: col3#43:bytearray)
    |
    |---raw: (Name: LOLoad Schema:
col1#41:bytearray,col2#42:bytearray,col3#43:bytearray)ColumnPrune:RequiredColumns=[1,
2, 3]ColumnPrune:InputUids=[42, 43, 41]ColumnPrune:OutputUids=[42, 43,
41]RequiredFields:[1, 2, 3]

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
s1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-40
|
|---s1: New For Each(false,false,false)[bag] - scope-39
    |   |
    |   Cast[chararray] - scope-31
    |   |
    |   |---Project[bytearray][0] - scope-30
    |   |
    |   Cast[chararray] - scope-34
    |   |
    |   |---Project[bytearray][1] - scope-33
    |   |
    |   Cast[chararray] - scope-37
    |   |
    |   |---Project[bytearray][2] - scope-36
    |
    |---raw: 
Load(file:///home/jrounds/Documents/12summer/paper/ERROR_9999_.csv:PigStorage(',','-tagsource'))
- scope-29

2012-08-13 17:47:28,321 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-41
Map Plan
s1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-40
|
|---s1: New For Each(false,false,false)[bag] - scope-39
    |   |
    |   Cast[chararray] - scope-31
    |   |
    |   |---Project[bytearray][0] - scope-30
    |   |
    |   Cast[chararray] - scope-34
    |   |
    |   |---Project[bytearray][1] - scope-33
    |   |
    |   Cast[chararray] - scope-37
    |   |
    |   |---Project[bytearray][2] - scope-36
    |
    |---raw: 
Load(file:///home/jrounds/Documents/12summer/paper/ERROR_9999_.csv:PigStorage(',','-tagsource'))
- scope-29--------
Global sort: false
----------------
