RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
From: johnlu...@hotmail.com
To: user@cassandra.apache.org
Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
Date: Wed, 9 Oct 2013 18:33:13 -0400

reduce method :

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException
    {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++)
        {
            longByterray[i] = (byte)(recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put("recordid", recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

I finally got it working. After finding the LongSerializer class source in Cassandra, I see that the correct way to build a ByteBuffer key from a Long is:

    public ByteBuffer serialize(Long value)
    {
        return value == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : ByteBufferUtil.bytes(value);
    }

John
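A minimal sketch of using that helper directly when building the output key (the wrapper class below is illustrative, not from the post; ByteBufferUtil.bytes(long) produces the 8-byte big-endian value a CQL bigint key expects):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.utils.ByteBufferUtil;

    final class BigintKey {
        // Returns the 8-byte big-endian encoding that Cassandra's LongType
        // (CQL bigint) expects for a partition key value.
        static ByteBuffer of(long recordid) {
            return ByteBufferUtil.bytes(recordid);
        }
    }

    // In the reduce method, assuming keys is the Map<String, ByteBuffer>
    // handed to CqlOutputFormat:
    //     keys.put("recordid", BigintKey.of(writableRecid.get()));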
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
I don't know what happened to my original post but it got truncated. Let me try again:

software versions :
    apache-cassandra-2.0.1
    hadoop-2.1.0-beta

I have been experimenting with using Hadoop for a map/reduce operation on Cassandra, outputting to CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count, except that I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid) )

and simply tried setting this key as one of the keys in the output map:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)

After trying to make it more similar to WordCount, I eventually realized the one difference was the datatype of the primary key of the output colfamily: WordCount has text, I had bigint. I changed mine to text:

    CREATE TABLE archive_recordids ( recordid text, count_num bigint, PRIMARY KEY (recordid) )

and set the primary key *twice* in the reducer:

    keys.put("recordid", ByteBufferUtil.bytes(String.valueOf(recordid)));
    context.getConfiguration().set(PRIMARY_KEY, String.valueOf(recordid));

and it then worked perfectly.

Is there a restriction in cassandra-hadoop-cql support that the output colfamily's primary key(s) must be text? And does that also apply to DELETE? Or am I doing it wrong? Or maybe there is some other OutputFormat that I could use that would work?

Cheers,
John
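For reference, the job wiring around that reducer in the WordCount example looks roughly like the sketch below (reconstructed from memory of the 2.0-era API, so treat the helper signatures as assumptions; KEYSPACE, OUTPUT_COLUMN_FAMILY and the node address are placeholders):

    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CqlOutputSetup {
        static final String KEYSPACE = "archive_ks";                    // placeholder
        static final String OUTPUT_COLUMN_FAMILY = "archive_recordids"; // placeholder

        static Job configureOutput(Configuration conf) throws Exception {
            Job job = Job.getInstance(conf, "archive-recordid-count");
            job.setOutputFormatClass(CqlOutputFormat.class);
            // The reducer emits Map<String, ByteBuffer> keys and a List<ByteBuffer>
            // of bound variables for the prepared statement configured below.
            job.setOutputKeyClass(Map.class);
            job.setOutputValueClass(List.class);

            ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, OUTPUT_COLUMN_FAMILY);
            ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "localhost");
            ConfigHelper.setOutputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
            // Only the SET clause is supplied; the record writer builds the WHERE
            // clause from the columns present in the keys map.
            CqlConfigHelper.setOutputCql(job.getConfiguration(),
                    "UPDATE " + KEYSPACE + "." + OUTPUT_COLUMN_FAMILY + " SET count_num = ?");
            return job;
        }
    }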
RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
From: johnlu...@hotmail.com
To: user@cassandra.apache.org
Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
Date: Wed, 9 Oct 2013 09:40:06 -0400

software versions :
    apache-cassandra-2.0.1
    hadoop-2.1.0-beta

I have been experimenting with using Hadoop for a map/reduce operation on Cassandra, outputting to CqlOutputFormat.class. I based my first program fairly closely on the famous WordCount example in examples/hadoop_cql3_word_count, except that I set my output colfamily to have a bigint primary key:

    CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid) )

and simply tried setting this key as one of the keys in the output map:

    keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));

but it always failed with a strange error:

    java.io.IOException: InvalidRequestException(why:Key may not be empty)

I managed to get a little bit further, and my M/R program now runs to completion with output to the colfamily with the bigint primary key, and actually does manage to UPDATE a row.

query:

    String query = "UPDATE " + keyspace + "." + OUTPUT_COLUMN_FAMILY + " SET count_num = ? ";

reduce method :

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException
    {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++)
        {
            longByterray[i] = (byte)(recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put("recordid", recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

and my logger output does show it outputting maps containing what appear to be valid keys, e.g.

    writing key : 0x47407826 , hasarray ? : Y

There are about 74 mappings in the final reducer output, each with a different numeric record key, but after the program completes there is just one single row in the columnfamily, with a rowkey of 0 (zero):

    SELECT * FROM archive_recordids LIMIT 9;

     recordid | count_num
    ----------+-----------
            0 |         2

    (1 rows)

I guess it is something relating to the way my code is wrapping a long value into the ByteBuffer, or maybe the way the ByteBuffer is being allocated. As far as I can tell, the ByteBuffer needs to be populated in exactly the same way as a Thrift application would populate a ByteBuffer for a bigint key -- does anyone know how to do that, or point me to an example that works?

Thanks,
John
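Two details in the quoted reducer would explain the single row with key 0: ByteBuffer.wrap(...) is a static factory, so calling it on recordIdByteBuf discards the wrapped array and the untouched zero-filled buffer from ByteBuffer.allocate(8) is what gets written as the key; and the loop packs the long starting from the low-order byte, whereas Cassandra's LongType (CQL bigint) stores the 8 bytes big-endian. A minimal sketch of packing the key the way a Thrift client typically would (the helper name is illustrative):

    import java.nio.ByteBuffer;

    final class ThriftStyleLongKey {
        // Packs a long into the 8-byte big-endian form that Cassandra's
        // LongType (CQL bigint) expects for a row key.
        static ByteBuffer pack(long recordid) {
            ByteBuffer buf = ByteBuffer.allocate(8); // ByteBuffer is big-endian by default
            buf.putLong(recordid);
            buf.flip();                              // make the 8 bytes readable from position 0
            return buf;
        }
    }

    // In the reduce method:
    //     keys.put("recordid", ThriftStyleLongKey.pack(writableRecid.get()));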