Re: Bulk-load to HBase

Aniket Bhatnagar Fri, 19 Sep 2014 05:01:49 -0700

I have been using saveAsNewAPIHadoopDataset, but with TableOutputFormat
rather than HFileOutputFormat. Hopefully this still helps:


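import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.io.Writable
import org.apache.hadoop.mapreduce.OutputFormat
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark 1.x)
import org.apache.spark.rdd.RDD

// zookeeperHost, zookeeperPort, zookeeperHbasePath, tableName and
// COLUMN_FAMILY_RAW_DATA_BYTES are assumed to be defined elsewhere.
val hbaseZookeeperQuorum = s"$zookeeperHost:$zookeeperPort:$zookeeperHbasePath"
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", hbaseZookeeperQuorum)
conf.set(TableOutputFormat.QUORUM_ADDRESS, hbaseZookeeperQuorum)
conf.set(TableOutputFormat.QUORUM_PORT, zookeeperPort.toString)
conf.setClass("mapreduce.outputformat.class",
  classOf[TableOutputFormat[Object]], classOf[OutputFormat[Object, Writable]])
conf.set(TableOutputFormat.OUTPUT_TABLE, tableName)

// Some RDD that contains row key, column qualifier and data
val rddToSave: RDD[(Array[Byte], Array[Byte], Array[Byte])] = ...

// Turn each (rowKey, column, data) tuple into a Put keyed by its row
val putRDD = rddToSave.map { case (rowKey, column, data) =>
  val put = new Put(rowKey)
  put.add(COLUMN_FAMILY_RAW_DATA_BYTES, column, data)
  (new ImmutableBytesWritable(rowKey), put)
}

putRDD.saveAsNewAPIHadoopDataset(conf)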
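
For the HFileOutputFormat route asked about below, here is a minimal, untested sketch. It assumes HBase 0.98-era classes (HFileOutputFormat, HTable, LoadIncrementalHFiles), Spark 1.x, a single column family, unique (row key, qualifier) pairs, and a staging directory on HDFS. HFileBulkLoadSketch, bulkLoad, cellOrdering and stagingDir are hypothetical names. It uses saveAsNewAPIHadoopFile rather than saveAsNewAPIHadoopDataset because HFileOutputFormat is a FileOutputFormat and needs an output directory. It sorts by (row key, qualifier) first because HFiles must be written in key order.

```scala
// Sketch only: assumes HBase 0.98-era APIs and Spark 1.x; names are hypothetical.
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat, LoadIncrementalHFiles}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._ // pair/ordered RDD implicits on Spark 1.x
import org.apache.spark.rdd.RDD

object HFileBulkLoadSketch {
  // HFile cells must be appended in key order: row key first, then qualifier
  // (one column family assumed), both compared as unsigned bytes.
  implicit val cellOrdering: Ordering[(Array[Byte], Array[Byte])] =
    new Ordering[(Array[Byte], Array[Byte])] {
      def compare(a: (Array[Byte], Array[Byte]), b: (Array[Byte], Array[Byte])): Int = {
        val byRow = Bytes.compareTo(a._1, b._1)
        if (byRow != 0) byRow else Bytes.compareTo(a._2, b._2)
      }
    }

  def bulkLoad(rdd: RDD[(Array[Byte], Array[Byte], Array[Byte])],
               tableName: String,
               family: Array[Byte],
               stagingDir: String): Unit = {
    val conf = HBaseConfiguration.create()
    val table = new HTable(conf, tableName)
    try {
      // Copies the table's per-family settings (compression, bloom filter,
      // block size) into the job configuration used by HFileOutputFormat.
      val job = Job.getInstance(conf)
      HFileOutputFormat.configureIncrementalLoad(job, table)

      rdd
        .map { case (row, qualifier, value) => ((row, qualifier), value) }
        .sortByKey() // global sort using cellOrdering
        // KeyValue and ImmutableBytesWritable are not java.io.Serializable,
        // so build them only after the shuffle.
        .map { case ((row, qualifier), value) =>
          (new ImmutableBytesWritable(row), new KeyValue(row, family, qualifier, value))
        }
        .saveAsNewAPIHadoopFile(stagingDir, classOf[ImmutableBytesWritable],
          classOf[KeyValue], classOf[HFileOutputFormat], job.getConfiguration)

      // Hands the generated HFiles to the table's regions, splitting any
      // file that spans a region boundary.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(stagingDir), table)
    } finally {
      table.close()
    }
  }
}
```

A call might look like bulkLoad(rddToSave, tableName, COLUMN_FAMILY_RAW_DATA_BYTES, "/tmp/hfile-staging"), where the path is only a placeholder. Unlike the TableOutputFormat version, no Puts go through the region servers' write path. Spark writes the HFiles, and LoadIncrementalHFiles then hands them to HBase.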


On 19 September 2014 16:52, innowireless TaeYun Kim <
taeyun....@innowireless.co.kr> wrote:

> Hi,
>
> Sorry, I just found saveAsNewAPIHadoopDataset.
>
> Then, can I use HFileOutputFormat with saveAsNewAPIHadoopDataset? Is there
> any example code for that?
>
> Thanks.
>
> *From:* innowireless TaeYun Kim [mailto:taeyun....@innowireless.co.kr]
> *Sent:* Friday, September 19, 2014 8:18 PM
> *To:* user@spark.apache.org
> *Subject:* RE: Bulk-load to HBase
>
> Hi,
>
> After reading several documents, it seems that saveAsHadoopDataset cannot
> use HFileOutputFormat.
>
> That is because the saveAsHadoopDataset method takes a JobConf and so belongs
> to the old Hadoop API, while HFileOutputFormat is a member of the mapreduce
> package, which is for the new Hadoop API.
>
> Am I right?
>
> If so, is there another way to bulk-load to HBase from an RDD?
>
> Thanks.
>
> *From:* innowireless TaeYun Kim [mailto:taeyun....@innowireless.co.kr]
> *Sent:* Friday, September 19, 2014 7:17 PM
> *To:* user@spark.apache.org
> *Subject:* Bulk-load to HBase
>
> Hi,
>
> Is there a way to bulk-load to HBase from an RDD?
>
> HBase offers the HFileOutputFormat class for bulk loading from a MapReduce
> job, but I cannot figure out how to use it with saveAsHadoopDataset.
>
> Thanks.
>
