Also, you might want to look at HBASE-3880, which is committed but not yet released. It lets you specify a custom Mapper class when running ImportTsv. It seems like a similar patch to make the input format pluggable would be needed in your case, though.
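In case it helps to see the shape of it, the pluggable mapper is just a Mapper that emits Puts keyed by row. Here is a minimal sketch -- the class name, column layout, and the comma-split parsing are all placeholders, not anything from the patch itself:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hypothetical mapper that turns one input record into an HBase Put.
 * The input key/value types should match whatever InputFormat you end up
 * writing for your Cassandra files; LongWritable/Text are only stand-ins.
 */
public class CassandraRecordMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private static final byte[] FAMILY = Bytes.toBytes("f1");
  private static final byte[] QUALIFIER = Bytes.toBytes("b");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Placeholder parsing: first field is the row key, second is the value.
    String[] fields = value.toString().split(",", 2);
    byte[] rowKey = Bytes.toBytes(fields[0]);

    Put put = new Put(rowKey);
    put.add(FAMILY, QUALIFIER, Bytes.toBytes(fields[1]));

    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}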
On Tue, Jun 14, 2011 at 9:53 AM, Todd Lipcon <[email protected]> wrote:
> Hi,
>
> Unfortunately I don't think importtsv will work in "local job runner"
> mode. Try running it on an MR cluster (could be pseudo-distributed).
>
> -Todd
>
> On Tue, Jun 14, 2011 at 2:01 AM, King JKing <[email protected]> wrote:
>
> > Thank you for your reply.
> >
> > I just tested importtsv and got this warning:
> >
> > java.lang.IllegalArgumentException: Can't read partitions file
> >   at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
> >   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> >   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
> >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
> >   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
> >   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
> >   at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:676)
> >   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
> >   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
> >   at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
> >   at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
> >   ... 6 more
> >
> > Here is my command line:
> > ./hadoop jar hbase-0.90.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,f1:b,f1:c -Dimporttsv.bulk.output=output t1 input
> >
> > Here, 't1' and 'f1' are the table and column family in HBase.
> >
> > No data is written to the 'output' folder.
> >
> > Could you give me some advice?
> >
> > Thank you in advance.
> >
> > On Tue, Jun 14, 2011 at 10:44 AM, Todd Lipcon <[email protected]> wrote:
> >
> > > On Mon, Jun 13, 2011 at 8:17 PM, King JKing <[email protected]> wrote:
> > >
> > > > Dear all,
> > > >
> > > > I want to import data from Cassandra to HBase.
> > > >
> > >
> > > That's what we like to hear! ;-)
> > >
> > > > I think the way might be:
> > > > Customize ImportTsv.java to read the Cassandra data files (*.dbf), convert
> > > > them to HBase data files, and then use the completebulkload tool.
> > > >
> > >
> > > Sounds about right. I don't know what the .dbf format is, but if you can
> > > make an InputFormat that supports them, you can write a mapper to translate
> > > from those records into HBase Puts, and then use HFileOutputFormat and bulk
> > > loads just like ImportTsv.
> > >
> > > -Todd
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
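For what it's worth, the job setup Todd describes above (custom InputFormat, a mapper emitting Puts, then HFileOutputFormat plus a bulk load) might look roughly like this. It is only a sketch against the 0.90-era APIs: TextInputFormat stands in for whatever InputFormat you write for the Cassandra files, the driver class name is made up, and 't1' matches the table from your command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CassandraBulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "cassandra-to-hbase-bulkload");
    job.setJarByClass(CassandraBulkLoadDriver.class);

    // Stand-in: replace with an InputFormat that understands the Cassandra files.
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Mapper from the earlier sketch: emits <row key, Put> pairs.
    job.setMapperClass(CassandraRecordMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    // Sets up the reducer, total-order partitioner and HFile output so the
    // job writes HFiles partitioned to match the regions of table 't1'.
    HTable table = new HTable(conf, "t1");
    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

After the job finishes, you would point completebulkload at the output directory and the table, the same way you would with ImportTsv's -Dimporttsv.bulk.output.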
