OK, I can answer my own question: you have to add a clone of the KeyValue to the Put. So p.add(kv); becomes p.add(kv.clone());
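
The likely reason the clone is needed, as far as Hadoop's behaviour is generally understood: the reducer's Iterable hands back the same KeyValue instance on every iteration and just refills it, so a Put that stores the reference ends up holding N references to one object. Below is a minimal plain-Java sketch of that effect, with no Hadoop or HBase involved. ReuseDemo, Cell, reusingIterable and collect are made-up names for illustration, not HBase API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {

    /** Hypothetical stand-in for a mutable, reusable value such as KeyValue. */
    static final class Cell {
        int value;

        /** Returns an independent copy, like kv.clone() in the fix above. */
        @Override
        public Cell clone() {
            Cell copy = new Cell();
            copy.value = value;
            return copy;
        }
    }

    /** Iterable that returns the SAME Cell instance every time, refilled with the next datum. */
    static Iterable<Cell> reusingIterable(final int... data) {
        final Cell shared = new Cell();
        return () -> new Iterator<Cell>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < data.length;
            }

            @Override
            public Cell next() {
                shared.value = data[i++]; // overwrite the shared instance in place
                return shared;
            }
        };
    }

    /** Keeps every element seen during iteration, either by reference or as a clone. */
    public static List<Integer> collect(int[] data, boolean cloneEach) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : reusingIterable(data)) {
            kept.add(cloneEach ? c.clone() : c);
        }
        List<Integer> values = new ArrayList<>();
        for (Cell c : kept) {
            values.add(c.value);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(collect(data, false)); // [3, 3, 3]
        System.out.println(collect(data, true));  // [1, 2, 3]
    }
}
```

Without the clone, every kept reference shows the last value written; with it, each element keeps its own data. That is consistent with only the last cell surviving in the Put.
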
Without the clone, I suppose only the last one is added in HBase (but the result is quite weird and should be fixed IMO).

Cheers,

-- 
Damien

2012/11/9 Damien Hardy <[email protected]>

> Hello,
>
> I am a bit confused here...
>
> I am trying to run an M/R job to import data into the HBase table
> 'Consultation'.
>
> Running on CDH4.1.2.
>
> The map function emits context.write(ImmutableBytesWritable, KeyValue).
>
> Conf summary:
>
>   job.setOutputFormatClass(TableOutputFormat.class);
>   job.setInputFormatClass(DataDrivenDBInputFormat.class);
>   job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
>   job.setOutputKeyClass(ImmutableBytesWritable.class);
>   job.setOutputValueClass(KeyValue.class);
>
> The reduce class is:
>
>   static class ImportReducer
>       extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
>     @Override
>     public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs,
>         Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, Writable>.Context context)
>         throws java.io.IOException, InterruptedException {
>       Put p = new Put(row.copyBytes());
>       int i = 0;
>       byte[] rk = null;
>       for (KeyValue kv : kvs) {
>         p.add(kv);
>         if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
>             kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
>           i++;
>         }
>       }
>       p.add(CF_COUNTER, QA_COUNTER, Bytes.toBytes(i));
>       context.write(new ImmutableBytesWritable(row), p);
>     }
>   }
>
> hbase(main):038:0> scan 'Consultation', {COLUMNS => 'visiting_tl', LIMIT => 10}
> ROW                                                 COLUMN+CELL
>  00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15  column=visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7, timestamp=1266998781000, value=\x00\x00\x00\x00
>  001316263fc8b454bbd86dff1587a347-\x00>t\x05        column=visited_tl:\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0, timestamp=1275341540000, value=\x00\x00\x00\x00
>  001497e68d7c71a3cd281860484fa6be-\x00/\x0E^        column=visited_tl:\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S, timestamp=1271199453000, value=\x00\x00\x00\x00
>  001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5  column=visited_tl:\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po, timestamp=1277069546000, value=\x00\x00\x00\x01
>  0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97     column=visited_tl:\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?., timestamp=1267119748000, value=\x00\x00\x00\x00
>  001de6b92754b0ef44ee10bf2bdfe3c3-\x00%\x1AV        column=visited_tl:\x7F\xFF\xFE\xD6\xE4H\x99\xC7\x00\x0F\x7F9, timestamp=1276070291000, value=\x00\x00\x00\x01
>  00217f082f96eb12108c139b99a3ccb7-\x00\x02w\x08     column=visited_tl:\x7F\xFF\xFE\xD8\xEB\x1B\x95\xEF\x00\x0A7\x19, timestamp=1267365866000, value=\x00\x00\x00\x00
>  0021cbfd559f56dd298e4b4fee7626a9-\x00r\xBF\xFA     column=visited_tl:\x7F\xFF\xFE\xD6\xA1\x0B-\x0F\x00\x03\xBC\x8B, timestamp=1277198390000, value=\x00\x00\x00\x02
>  00266c02a60f9a6efb5d24317e6032a0-\x00\x0E]+        column=visited_tl:\x7F\xFF\xFE\xD6\xBC\x0D\xD1\x7F\x00/ q, timestamp=1276745232000, value=\x00\x00\x00\x01
>  0026dbbd6562da5b79f1b09e94e3b973-\x00C[\x93        column=visited_tl:\x7F\xFF\xFE\xD7\xB0\xFA\xB7/\x00\x02~\x09, timestamp=1272636066000, value=\x00\x00\x00\x01
> 10 row(s) in 2.1130 seconds
>
> hbase(main):036:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15"
> COLUMN                                                     CELL
>  visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
>  visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
>  visits_count:_counter                                     timestamp=1352475456545, value=\x00\x00\x02\xA1
> 3 row(s) in 0.3260 seconds
>
> hbase(main):037:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15", 'visiting_tl:'
> COLUMN                                                     CELL
>  visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
> 1 row(s) in 0.1650 seconds
>
> So I have 3 problems:
>
> * The table has only 1 VERSION enabled: how can I get the cell
>   visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 twice for a
>   single row?
> * When I explicitly query for CF 'visiting_tl:', I get a 'visited_tl:'
>   cell... WTF?
> * The counter is (int)673... where are my 673 visited_tl cells? (673 is
>   the correct value according to my source.)
>
> Cheers,
>
> --
> Damien HARDY
> IT Infrastructure Architect
>
> Viadeo - 30 rue de la Victoire - 75009 Paris - France
> T : +33 1 80 48 39 73 – F : +33 1 42 93 22 56
