Hi JM,

My usage is the following: I want to write a C++ program which will answer
RPC requests. Each request has a list of keys and responses will contain
values. I want to use HFile because it has an efficient key-based index and
because there is a whole set of tools in Hadoop to produce this kind of file.
So, my usage is totally unrelated to HBase. I only have keys and values.
Family and qualifier make no sense in my design -- specifying empty values
for those is a waste of space in my case.

TFile is a replacement for Hadoop's MapFile
<https://issues.apache.org/jira/browse/HADOOP-3315>. HFile was designed
after TFile. Sounds like TFile better fits my use case, then.

-Gatis


On Fri, Dec 6, 2013 at 7:54 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Igor,
>
> Have you looked at this constructor?
>
>   /**
>    * Constructs KeyValue structure filled with null value.
>    * @param row - row key (arbitrary byte array)
>    * @param family family name
>    * @param qualifier column qualifier
>    */
>   public KeyValue(final byte [] row, final byte [] family,
>       final byte [] qualifier, final byte [] value)
>
> You need to specify the column family and the column qualifier. That's in
> your table definition. And then you give your value.
>
> Is that not what you are looking for? Also, what is a TFile?
>
> JM
>
>
> 2013/12/6 Igor Gatis <[email protected]>
>
> > Sounds like HBase's HFileOutputFormat depends on KeyValue's "family"
> > field. I don't want that.
> >
> > All I want is to keep keys and values in an indexed file. TFile would
> > work as well. But it seems there is no TFileOutputFormat available.
> >
> >
> > On Fri, Dec 6, 2013 at 4:47 PM, Igor Gatis <[email protected]> wrote:
> >
> > > That's the kind of solution I'm looking for.
> > >
> > > Here is what I have:
> > >
> > >     String jobName = "Seq2HFile";
> > >     Job job = new Job(getConf(), jobName);
> > >     job.setJarByClass(Seq2HFile.class);
> > >
> > >     job.setMapperClass(*MyIdentityMapper.class*);
> > >     job.setMapOutputKeyClass(BytesWritable.class);
> > >     job.setMapOutputValueClass(BytesWritable.class);
> > >
> > >     job.setPartitionerClass(TotalOrderPartitioner.class);
> > >
> > >     job.setReducerClass(KeyValueSortReducer.class);
> > >     job.setOutputKeyClass(ImmutableBytesWritable.class);
> > >     job.setOutputValueClass(KeyValue.class);
> > >     job.setNumReduceTasks(1);
> > >
> > >     job.setInputFormatClass(SequenceFileInputFormat.class);
> > >     SequenceFileInputFormat.addInputPaths(job, inputPath);
> > >
> > >     job.setOutputFormatClass(HFileOutputFormat.class);
> > >     HFileOutputFormat.setOutputPath(job, new Path(outputPath));
> > >
> > >     job.submit();
> > >     job.waitForCompletion(true);
> > >
> > > The bit I'm stuck on is MyIdentityMapper. My input is a
> > > SequenceFile<BytesWritable, BytesWritable>. According to
> > > HFileOutputFormat's signature, the output key is ImmutableBytesWritable
> > > and the value is KeyValue.
> > >
> > > I guess BytesWritable -> ImmutableBytesWritable is straightforward. But
> > > I've got no clue how to fill KeyValue.
> > >
> > >     public static class MyIdentityMapper
> > >         extends Mapper<BytesWritable, BytesWritable,
> > >             ImmutableBytesWritable, KeyValue> {
> > >       public void map(BytesWritable key, BytesWritable value,
> > >           Context context) throws IOException, InterruptedException {
> > >         *// What do I write here?*
> > >       }
> > >     }
> > >
> > >
> > > On Fri, Dec 6, 2013 at 12:31 PM, Jean-Marc Spaggiari <
> > > [email protected]> wrote:
> > >
> > >> Hi Igor,
> > >>
> > >> I will say, MapReduce.
> > >>
> > >> SequenceFileInputFormat
> > >> HFileOutputFormat
> > >>
> > >> JM
> > >>
> > >>
> > >> 2013/12/5 Igor Gatis <[email protected]>
> > >>
> > >> > I have SequenceFiles I'd like to convert to HFile. How do I do that?
> > >> >
> > >>
> > >
> >
>
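For reference, a minimal sketch of what the map() body asked about above
could look like if you stay with HFileOutputFormat, using the KeyValue
constructor JM quoted. The family "f" and the empty qualifier are just
placeholders (not anything from the thread); an HFile written through the
HBase classes always carries those fields, even if the reader ignores them.

    // Imports that would go at the top of Seq2HFile.java:
    //   import java.io.IOException;
    //   import java.util.Arrays;
    //   import org.apache.hadoop.hbase.KeyValue;
    //   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    //   import org.apache.hadoop.hbase.util.Bytes;
    //   import org.apache.hadoop.io.BytesWritable;
    //   import org.apache.hadoop.mapreduce.Mapper;

    public static class MyIdentityMapper
        extends Mapper<BytesWritable, BytesWritable,
            ImmutableBytesWritable, KeyValue> {

      // Placeholder family/qualifier: the HFile format written by the
      // HBase classes always includes them, even if unused on read.
      private static final byte[] FAMILY = Bytes.toBytes("f");
      private static final byte[] QUALIFIER = new byte[0];

      @Override
      public void map(BytesWritable key, BytesWritable value, Context context)
          throws IOException, InterruptedException {
        // BytesWritable's backing array can be longer than getLength(),
        // so copy out exactly the valid bytes.
        byte[] row = Arrays.copyOf(key.getBytes(), key.getLength());
        byte[] val = Arrays.copyOf(value.getBytes(), value.getLength());

        KeyValue kv = new KeyValue(row, FAMILY, QUALIFIER, val);
        context.write(new ImmutableBytesWritable(row), kv);
      }
    }

Note that with this mapper the driver shown above would also need
job.setMapOutputKeyClass(ImmutableBytesWritable.class) and
job.setMapOutputValueClass(KeyValue.class) instead of BytesWritable,
otherwise the framework will reject the map output types at runtime.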

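And if TFile really is the better fit, one option is to skip MapReduce for
the conversion and write the TFile directly with Hadoop's
org.apache.hadoop.io.file.tfile.TFile.Writer. A rough sketch, assuming the
SequenceFile's keys are already sorted in raw byte (memcmp) order and a
single-pass, single-writer conversion is acceptable; the Seq2TFile class
name, block size, and compression choice are made up for illustration.

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.file.tfile.TFile;

    public class Seq2TFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf);
        FSDataOutputStream out = fs.create(new Path(args[1]));

        // "memcmp" comparator: keys are compared as raw bytes and must be
        // appended in that order, which is what a key-based lookup needs.
        TFile.Writer writer = new TFile.Writer(out, 64 * 1024,
            TFile.COMPRESSION_GZ, TFile.COMPARATOR_MEMCMP, conf);

        BytesWritable key = new BytesWritable();
        BytesWritable value = new BytesWritable();
        while (reader.next(key, value)) {
          writer.append(Arrays.copyOf(key.getBytes(), key.getLength()),
              Arrays.copyOf(value.getBytes(), value.getLength()));
        }

        writer.close();
        out.close();
        reader.close();
      }
    }

If I remember the API right, TFile.Reader.createScanner() plus the
scanner's seekTo(byte[]) then gives the key lookup on the Java side; the
C++ server would of course need its own reader for the format.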