Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Patrik Modesto
On Wed, Jan 26, 2011 at 08:58, Mck m...@apache.org wrote: You are correct that microseconds would be better but for the test it doesn't matter that much. Have you tried. I'm very new to cassandra as well, and always uncertain as to what to expect... IMHO it's a matter of use case. In my

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Mck
On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote: BTW how to get current time in microseconds in Java? I'm using HFactory.clock() (from hector). As for moving the clone(..) into ColumnFamilyRecordWriter.write(..), won't this hurt performance? The size of the queue is computed
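
For readers without Hector on the classpath, a minimal sketch of one way to get a microsecond-scaled timestamp using only the JDK. The class and method names (MicrosClock, currentTimeMicros) are made up for illustration and are not part of Hector or Cassandra. The value is just the millisecond clock multiplied up, so it has millisecond resolution, and two writes within the same millisecond get the same timestamp.

```java
import java.util.concurrent.TimeUnit;

public class MicrosClock {
    // Hypothetical helper: wall-clock time since the epoch, expressed in microseconds.
    // Resolution is still one millisecond because it is derived from currentTimeMillis().
    static long currentTimeMicros() {
        return TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        System.out.println(currentTimeMicros());
    }
}
```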

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Jonathan Ellis
On Tue, Jan 25, 2011 at 12:09 PM, Mick Semb Wever m...@apache.org wrote: Well your key is a mutable Text object, so i can see some possibility depending on how hadoop uses these objects. Yes, that's it exactly. We recently fixed a bug in the demo word_count program for this. Now we do

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Mick Semb Wever
On Tue, 2011-01-25 at 09:37 +0100, Patrik Modesto wrote: While developing a really simple MR task, I've found that a combination of a Hadoop optimization and the Cassandra ColumnFamilyRecordWriter queue creates wrong keys to send to batch_mutate(). I've seen similar behaviour (junk rows being

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Patrik Modesto
Hi Mick, attached is the very simple MR job that deletes expired URLs from my test Cassandra DB. The keyspace looks like this: Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 2 Column Families: ColumnFamily: Url2 Columns

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Mick Semb Wever
On Tue, 2011-01-25 at 14:16 +0100, Patrik Modesto wrote: The attached file contains the working version with a cloned key in the reduce() method. My other approach was: context.write(ByteBuffer.wrap(key.getBytes(), 0, key.getLength()), Collections.singletonList(getMutation(key))); Which
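
The difference between wrapping and cloning the key can be sketched without Hadoop. ByteBuffer.wrap() does not copy: the buffer shares the array it was given, and Hadoop's Text.getBytes() returns the object's backing array, which is refilled when the framework reuses the key object for the next record. The demo below simulates that reuse with a plain byte[]; the class and method names (KeyReuseDemo, wrapShared, wrapCopy, decode) are made up for illustration and are not taken from the attached job or from ColumnFamilyRecordWriter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyReuseDemo {
    // Wraps the caller's array without copying: later writes to buf show through.
    static ByteBuffer wrapShared(byte[] buf, int len) {
        return ByteBuffer.wrap(buf, 0, len);
    }

    // Copies the valid bytes first, so the queued key is independent of buf.
    static ByteBuffer wrapCopy(byte[] buf, int len) {
        return ByteBuffer.wrap(Arrays.copyOf(buf, len));
    }

    // Decodes on a duplicate so the original buffer's position is left untouched.
    static String decode(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }

    public static void main(String[] args) {
        byte[] reused = "key-1".getBytes(StandardCharsets.UTF_8);
        ByteBuffer shared = wrapShared(reused, reused.length);
        ByteBuffer copied = wrapCopy(reused, reused.length);

        // Simulate the framework overwriting the same array with the next key
        // while the first key is still waiting in a queue.
        byte[] next = "key-2".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(next, 0, reused, 0, next.length);

        System.out.println("shared: " + decode(shared));
        System.out.println("copied: " + decode(copied));
    }
}
```

The copy costs one allocation per key, which is the performance trade-off raised elsewhere in the thread.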

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Patrik Modesto
On Tue, Jan 25, 2011 at 19:09, Mick Semb Wever m...@apache.org wrote: In fact i have another problem (trying to write an empty byte[], or something, as a key, which put one whole row out of whack, ((one row in 25 million...))). But i'm debugging along the same code. I don't quite

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Mck
is d.timestamp = System.currentTimeMillis(); ok? You are correct that microseconds would be better but for the test it doesn't matter that much. Have you tried. I'm very new to cassandra as well, and always uncertain as to what to expect... ByteBuffer bbKey =