I have been using saveAsNewAPIHadoopDataset, though with TableOutputFormat rather than HFileOutputFormat. Hopefully this still helps:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.io.Writable
    import org.apache.hadoop.mapreduce.OutputFormat
    import org.apache.spark.SparkContext._  // pair-RDD functions such as saveAsNewAPIHadoopDataset
    import org.apache.spark.rdd.RDD

    // Quorum address in the form host:port:znode-parent
    val hbaseZookeeperQuorum = s"$zookeeperHost:$zookeeperPort:$zookeeperHbasePath"

    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", hbaseZookeeperQuorum)
    conf.set(TableOutputFormat.QUORUM_ADDRESS, hbaseZookeeperQuorum)
    conf.set(TableOutputFormat.QUORUM_PORT, zookeeperPort.toString)
    conf.setClass("mapreduce.outputformat.class",
      classOf[TableOutputFormat[Object]], classOf[OutputFormat[Object, Writable]])
    conf.set(TableOutputFormat.OUTPUT_TABLE, tableName)

    // Some RDD that contains row key, column qualifier and data
    val rddToSave: RDD[(Array[Byte], Array[Byte], Array[Byte])] = ...

    // Turn each tuple into a (row key, Put) pair, which is what TableOutputFormat consumes
    val putRDD = rddToSave.map(tuple => {
      val (rowKey, column, data) = tuple
      val put: Put = new Put(rowKey)
      put.add(COLUMN_FAMILY_RAW_DATA_BYTES, column, data)
      (new ImmutableBytesWritable(rowKey), put)
    })

    putRDD.saveAsNewAPIHadoopDataset(conf)

On 19 September 2014 16:52, innowireless TaeYun Kim <taeyun....@innowireless.co.kr> wrote:

> Hi,
>
> Sorry, I just found saveAsNewAPIHadoopDataset.
> Then, Can I use HFileOutputFormat with saveAsNewAPIHadoopDataset? Is there any example code for that?
>
> Thanks.
>
> *From:* innowireless TaeYun Kim [mailto:taeyun....@innowireless.co.kr]
> *Sent:* Friday, September 19, 2014 8:18 PM
> *To:* user@spark.apache.org
> *Subject:* RE: Bulk-load to HBase
>
> Hi,
>
> After reading several documents, it seems that saveAsHadoopDataset cannot use HFileOutputFormat.
> It’s because saveAsHadoopDataset method uses JobConf, so it belongs to the old Hadoop API, while HFileOutputFormat is a member of mapreduce package which is for the new Hadoop API.
>
> Am I right?
> If so, is there another method to bulk-load to HBase from RDD?
>
> Thanks.
>
> *From:* innowireless TaeYun Kim [mailto:taeyun....@innowireless.co.kr]
> *Sent:* Friday, September 19, 2014 7:17 PM
> *To:* user@spark.apache.org
> *Subject:* Bulk-load to HBase
>
> Hi,
>
> Is there a way to bulk-load to HBase from RDD?
> HBase offers HFileOutputFormat class for bulk loading by MapReduce job, but I cannot figure out how to use it with saveAsHadoopDataset.
>
> Thanks.
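
P.S. A note on the quorum string in my snippet: TableOutputFormat.QUORUM_ADDRESS expects the form hosts:client-port:znode-parent, for example zk1,zk2:2181:/hbase. Below is a minimal sketch of composing and sanity-checking that string in plain Scala, with no HBase dependency. QuorumAddress, build and parse are hypothetical helper names of my own, not part of HBase or Spark, and the checks are only the ones I would assume are useful.

```scala
// Hypothetical helper, NOT an HBase API: composes and checks the
// "hosts:port:znodeParent" string passed to TableOutputFormat.QUORUM_ADDRESS.
object QuorumAddress {

  // hosts: ZooKeeper host names; port: client port; znodeParent: e.g. "/hbase"
  def build(hosts: Seq[String], port: Int, znodeParent: String): String = {
    require(hosts.nonEmpty, "at least one ZooKeeper host is required")
    require(port > 0 && port < 65536, s"invalid port: $port")
    require(znodeParent.startsWith("/"), "znode parent must start with '/'")
    s"${hosts.mkString(",")}:$port:$znodeParent"
  }

  // Splits an address back into (hosts, port, znodeParent); None if malformed.
  def parse(address: String): Option[(List[String], Int, String)] =
    address.split(":") match {
      case Array(hosts, port, znode)
          if hosts.nonEmpty && port.nonEmpty && port.length <= 5 && port.forall(_.isDigit) =>
        Some((hosts.split(",").toList, port.toInt, znode))
      case _ => None
    }
}
```

With something like this, the first line of the snippet could become val hbaseZookeeperQuorum = QuorumAddress.build(Seq(zookeeperHost), zookeeperPort, zookeeperHbasePath), assuming zookeeperPort is an Int and zookeeperHbasePath starts with a slash.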