RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?

2013-10-10 Thread John Lumby

 From: johnlu...@hotmail.com
 To: user@cassandra.apache.org
 Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it 
 be text type?
 Date: Wed, 9 Oct 2013 18:33:13 -0400

 reduce method :

 public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException
 {
     Long sum = 0L;
     Long recordid = writableRecid.get();
     List<ByteBuffer> vbles = null;
     byte[] longByterray = new byte[8];
     for (int i = 0; i < 8; i++) {
         longByterray[i] = (byte)(recordid >> (i * 8));
     }
     ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
     recordIdByteBuf.wrap(longByterray);
     keys.put(recordid, recordIdByteBuf);
     ...
     context.write(keys, vbles);
 }


I finally got it working. After finding the LongSerializer class source in
Cassandra, I see that the correct way to build a ByteBuffer key from a Long is:

    public ByteBuffer serialize(Long value)
    {
        return value == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : ByteBufferUtil.bytes(value);
    }
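
In other words, the manual byte-array shifting in the reducer can be dropped.
A minimal sketch of a hypothetical helper (BigintKey is not a Cassandra class,
just an illustration), assuming the keys map is keyed by the partition-key
column name as in the WordCount example:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.utils.ByteBufferUtil;

    public final class BigintKey
    {
        // Same 8-byte big-endian encoding that LongSerializer.serialize() produces.
        public static ByteBuffer of(Long recordid)
        {
            return recordid == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER
                                    : ByteBufferUtil.bytes(recordid.longValue());
        }
    }

so the reducer key line becomes keys.put("recordid", BigintKey.of(recordid)),
with "recordid" standing in for whatever the partition-key column is called.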

John  

RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?

2013-10-09 Thread John Lumby
I don't know what happened to my original post but it got truncated.

Let me try again :

    software versions : apache-cassandra-2.0.1    hadoop-2.1.0-beta

I have been experimenting with using hadoop for a map/reduce operation on 
cassandra,
outputting to the CqlOutputFormat.class.
I based my first program fairly closely on the famous WordCount example in
examples/hadoop_cql3_word_count
except --- I set my output colfamily to have a bigint primary key :

CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid) )

and simply tried setting this key as one of the keys in the output map :

 keys.put(recordid, ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error :

java.io.IOException: InvalidRequestException(why:Key may not be empty)

After trying to make it more similar to WordCount,
I eventually realized the one difference was the datatype of the primary key
of the output colfamily: WordCount uses text, whereas I had bigint.

I changed mine to text :

CREATE TABLE archive_recordids ( recordid text, count_num bigint, PRIMARY KEY (recordid) )

and set the primary key *twice* in the reducer :
   keys.put(recordid, ByteBufferUtil.bytes(String.valueOf(recordid)));
   context.getConfiguration().set(PRIMARY_KEY,String.valueOf(recordid));

and it then worked perfectly.
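
For completeness, here is roughly how those lines sit in the reducer. This is a
sketch under the assumption (taken from the WordCount example) that keys is a
Map<String, ByteBuffer> keyed by the partition-key column name:

    import java.nio.ByteBuffer;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // keys maps partition-key column name -> serialized value; CqlOutputFormat
    // uses these entries for the WHERE clause it appends to the configured UPDATE.
    Map<String, ByteBuffer> keys = new LinkedHashMap<String, ByteBuffer>();
    keys.put("recordid", ByteBufferUtil.bytes(String.valueOf(recordid)));
    context.write(keys, vbles);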

Is there a restriction in cassandra-hadoop-cql support that
the output colfamily's primary key(s) must be text?
And does that also apply to DELETE?
Or am I doing it wrong?
Or maybe there is some other OutputFormatter that I could use that would work?

Cheers,   John

RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?

2013-10-09 Thread John Lumby

 From: johnlu...@hotmail.com
 To: user@cassandra.apache.org
 Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it 
 be text type?
 Date: Wed, 9 Oct 2013 09:40:06 -0400

 software versions : apache-cassandra-2.0.1    hadoop-2.1.0-beta

 I have been experimenting with using hadoop for a map/reduce operation on 
 cassandra,
 outputting to the CqlOutputFormat.class.
 I based my first program fairly closely on the famous WordCount example in
 examples/hadoop_cql3_word_count
 except --- I set my output colfamily to have a bigint primary key :

 CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid) )

 and simply tried setting this key as one of the keys in the output map :

  keys.put(recordid, ByteBufferUtil.bytes(recordid.longValue()));

 but it always failed with a strange error :

 java.io.IOException: InvalidRequestException(why:Key may not be empty)

I managed to get a little further: my M/R program now runs to completion,
writing to the colfamily with the bigint primary key, and it actually does
manage to UPDATE a row.

query:

 String query = "UPDATE " + keyspace + "." + OUTPUT_COLUMN_FAMILY + " SET count_num = ? ";
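
(That query has no WHERE clause on purpose; CqlOutputFormat is supposed to
append one from the key map passed to context.write. For reference, a sketch of
how the query is presumably wired into the job, following the setters used by
the WordCount example; "job" is assumed to be the Hadoop Job, and the address
and partitioner values are placeholders:)

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;

    // Route reducer output through CqlOutputFormat, using the UPDATE above.
    job.setOutputFormatClass(CqlOutputFormat.class);
    ConfigHelper.setOutputColumnFamily(job.getConfiguration(), keyspace, OUTPUT_COLUMN_FAMILY);
    CqlConfigHelper.setOutputCql(job.getConfiguration(), query);
    ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "localhost");
    ConfigHelper.setOutputPartitioner(job.getConfiguration(), "Murmur3Partitioner");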

reduce method :

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException
    {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++) {
            longByterray[i] = (byte)(recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put(recordid, recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

and my logger output does show it writing maps containing
what appear to be valid keys, e.g.

writing key : 0x47407826 , hasarray ? : Y

There are about 74 mappings in the final reducer output,
each with a different numeric record key.

but after the program completes, there is just a single row in the
columnfamily, with a rowkey of 0 (zero).

SELECT * FROM archive_recordids LIMIT 9;

 recordid | count_num
----------+-----------
        0 |         2

(1 rows)


I guess it is something relating to the way my code is wrapping a long value
into the ByteBuffer, or maybe the way the ByteBuffer is being allocated. As far
as I can tell, the ByteBuffer needs to be populated in exactly the same way a
thrift application would populate a ByteBuffer for a bigint key -- does anyone
know how to do that, or can anyone point me to an example that works?
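
(One pitfall worth checking: java.nio.ByteBuffer.wrap() is a static factory, so
recordIdByteBuf.wrap(longByterray) creates and discards a new buffer while the
allocated 8-byte buffer stays all zeros -- which would explain the single row
with key 0. The shift loop also fills the array least-significant byte first,
whereas Cassandra's bigint encoding is big-endian. A minimal sketch of two ways
that should produce the bytes a thrift client would send; the value is just a
made-up example:)

    import java.nio.ByteBuffer;
    import org.apache.cassandra.utils.ByteBufferUtil;

    long recordid = 12345L;   // hypothetical example value

    // Option 1: let Cassandra's helper do it (8 bytes, big-endian).
    ByteBuffer key1 = ByteBufferUtil.bytes(recordid);

    // Option 2: plain java.nio -- putLong writes big-endian by default,
    // then flip() so the buffer is readable from position 0.
    ByteBuffer key2 = ByteBuffer.allocate(8);
    key2.putLong(recordid);
    key2.flip();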

Thanks   John


