RuntimeException in Pig when using "dump" command on column name
----------------------------------------------------------------

                 Key: CASSANDRA-2810
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8.1
         Environment: Ubuntu 10.10, 32 bits
java version "1.6.0_24"
Brisk beta-2 installed from Debian packages
            Reporter: Silvère Lestang


This bug was previously report on [Brisk bug 
tracker|https://datastax.jira.com/browse/BRISK-232].

In cassandra-cli:
{code}
[default@unknown] create keyspace Test
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = [{replication_factor:1}];

[default@unknown] use Test;
Authenticated to keyspace: Test

[default@Test] create column family test;

[default@Test] set test[ascii('row1')][long(1)]=integer(35);
set test[ascii('row1')][long(2)]=integer(36);
set test[ascii('row1')][long(3)]=integer(38);
set test[ascii('row2')][long(1)]=integer(45);
set test[ascii('row2')][long(2)]=integer(42);
set test[ascii('row2')][long(3)]=integer(33);

[default@Test] list test;
Using default limit of 100
-------------------
RowKey: 726f7731
=> (column=0000000000000001, value=35, timestamp=1308744931122000)
=> (column=0000000000000002, value=36, timestamp=1308744931124000)
=> (column=0000000000000003, value=38, timestamp=1308744931125000)
-------------------
RowKey: 726f7732
=> (column=0000000000000001, value=45, timestamp=1308744931127000)
=> (column=0000000000000002, value=42, timestamp=1308744931128000)
=> (column=0000000000000003, value=33, timestamp=1308744932722000)

2 Rows Returned.

[default@Test] describe keyspace;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: test
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
{code}
In Pig command line:
{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS 
(rowkey:chararray, columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.name, columns.value;

grunt> dump value_test;
{code}
In /var/log/cassandra/system.log, I have severals time this exception:
{code}
INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java 
(line 551) Error from attempt_201106210955_0051_m_000000_3: 
java.lang.RuntimeException: Unexpected data type -1 found in stream.
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
        at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
        at 
org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
        at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
        at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)
{code}
and the request failed.

{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS 
(rowkey:chararray, columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.value;

grunt> dump value_test;
{code}

This time, without the column name, it's work (but the value are displayed as 
char instead of integer). Result:
{code}
(row1,{(#),($),(&)})
(row2,{(-),(*),(!)})
{code}

Now we do the same test but we set a comparator to the CF.
{code}
[default@Test] create column family test with comparator = 'LongType';

[default@Test] set test[ascii('row1')][long(1)]=integer(35);
set test[ascii('row1')][long(2)]=integer(36);
set test[ascii('row1')][long(3)]=integer(38);
set test[ascii('row2')][long(1)]=integer(45);
set test[ascii('row2')][long(2)]=integer(42);
set test[ascii('row2')][long(3)]=integer(33);

[default@Test] list test;
Using default limit of 100
-------------------
RowKey: 726f7731
=> (column=1, value=35, timestamp=1308748643506000)
=> (column=2, value=36, timestamp=1308748643508000)
=> (column=3, value=38, timestamp=1308748643509000)
-------------------
RowKey: 726f7732
=> (column=1, value=45, timestamp=1308748643510000)
=> (column=2, value=42, timestamp=1308748643512000)
=> (column=3, value=33, timestamp=1308748645138000)

2 Rows Returned.

[default@Test] describe keyspace;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: test
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.LongType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
{code}
{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS 
(rowkey:chararray, columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.name, columns.value;

grunt> dump value_test;
{code}
This time it's work as expected (appart from the value displayed as char). 
Result:
{code}
(row1,{(1),(2),(3)},{(#),($),(&)})
(row2,{(1),(2),(3)},{(-),(*),(!)})
{code}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to