Re: Spark HBase Bulk load using HFileFormat

2016-07-14 Thread Ted Yu
Please take a look at http://hbase.apache.org/book.html#dm.sort. In your second example, the column qualifier of the current cell was not in the proper order.
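The ordering rule referenced above is: cells sort first by rowkey, then column family, then qualifier, each component compared lexicographically as unsigned bytes. A minimal plain-Scala sketch of that comparator (`CellKey` and the function names are illustrative, not HBase API):

```scala
// Lexicographic, unsigned-byte comparison, as HBase compares key components.
def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
  val n = math.min(a.length, b.length)
  var i = 0
  while (i < n) {
    val cmp = (a(i) & 0xff) - (b(i) & 0xff)
    if (cmp != 0) return cmp
    i += 1
  }
  a.length - b.length
}

// A cell key reduced to the components that matter for ordering here.
case class CellKey(row: Array[Byte], family: Array[Byte], qualifier: Array[Byte])

// Order by row, then family, then qualifier.
def compareCells(x: CellKey, y: CellKey): Int = {
  val r = compareBytes(x.row, y.row)
  if (r != 0) return r
  val f = compareBytes(x.family, y.family)
  if (f != 0) return f
  compareBytes(x.qualifier, y.qualifier)
}
```

So with the same rowkey and family, a cell with qualifier "b" must be emitted before one with qualifier "c"; writing "c" then "b" is exactly the out-of-order case described above.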

Re: Spark HBase Bulk load using HFileFormat

2016-07-14 Thread yeshwanth kumar
Hi, I have a few questions regarding BulkLoad. Do the rows need to be in sorted order, or do the KeyValues within a row need to be in sorted order? Sometimes I see the exception between two different rowkeys, and sometimes between KeyValue pairs of the same rowkey. For example, the current cell
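The answer turns out to be both: the HFile writer sees one flat stream of cells, and every cell's (rowkey, family, qualifier) key must compare larger than the previous one, so a violation can surface either between rowkeys or inside a single rowkey. A plain-Scala sketch of that check (string keys stand in for byte arrays; `Cell` and `firstViolation` are illustrative names):

```scala
// Each cell reduced to a comparable (row, family, qualifier) key.
// Strings keep the sketch simple; real HBase compares raw bytes.
case class Cell(row: String, family: String, qualifier: String) {
  def key: (String, String, String) = (row, family, qualifier)
}

// Scan a stream of cells and report the first ordering violation,
// tagged with whether it is across rowkeys or within one rowkey.
def firstViolation(cells: Seq[Cell]): Option[String] =
  cells.sliding(2).collectFirst {
    case Seq(prev, cur) if Ordering[(String, String, String)].gt(prev.key, cur.key) =>
      if (prev.row != cur.row) s"rows out of order: ${prev.row} before ${cur.row}"
      else s"cells out of order within row ${cur.row}"
  }
```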

Re: Spark HBase Bulk load using HFileFormat

2016-07-14 Thread yeshwanth kumar
Following is the code snippet for saveAsHFile:

def saveAsHFile(putRDD: RDD[(ImmutableBytesWritable, KeyValue)], outputPath: String) = {
  val conf = ConfigFactory.getConf
  val job = Job.getInstance(conf, "HBaseBulkPut")
  job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])

Re: Spark HBase Bulk load using HFileFormat

2016-07-13 Thread Ted Yu
Can you show the code inside saveAsHFile? Maybe the partitions of the RDD need to be sorted (for the 1st issue). Cheers
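In Spark terms the usual fix is to sort the RDD by the full (row, family, qualifier) byte key before saveAsNewAPIHadoopFile, e.g. with sortBy, or with repartitionAndSortWithinPartitions using a partitioner aligned to region boundaries. The sorting step is sketched below with plain collections standing in for the RDD so it runs standalone (names are illustrative):

```scala
// Stand-in for an RDD[(ImmutableBytesWritable, KeyValue)]: each record is
// (rowkey, family, qualifier, value), all as raw bytes.
type Record = (Array[Byte], Array[Byte], Array[Byte], Array[Byte])

// Unsigned lexicographic ordering on byte arrays, matching HBase.
implicit val bytesOrdering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
  def compare(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }
}

// The equivalent of rdd.sortBy(full cell key) before writing HFiles.
def sortForBulkLoad(records: Seq[Record]): Seq[Record] =
  records.sortBy { case (row, fam, qual, _) => (row, fam, qual) }
```

The point of the design is that sorting by the rowkey alone is not enough: two cells of the same row can still arrive with their qualifiers out of order, so the sort key must include family and qualifier as well.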

Spark HBase Bulk load using HFileFormat

2016-07-13 Thread yeshwanth kumar
Hi, I am doing a bulk load into HBase in HFileFormat by using saveAsNewAPIHadoopFile. I am on HBase 1.2.0-cdh5.7.0 and Spark 1.6. When I try to write, I am getting an exception:

java.io.IOException: Added a key not lexically larger than previous.

Following is the code snippet: case class
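For reference, this exception comes from the HFile writer's append check: each appended key must be strictly lexically larger than the previous one. A toy stand-in reproducing that behavior (string keys and the `StrictWriter` name are illustrative, not the HBase implementation):

```scala
import scala.collection.mutable.ArrayBuffer

// A toy HFile-writer stand-in enforcing the same invariant the real writer
// does: every appended key must be lexically larger than the previous one.
class StrictWriter {
  private val keys = ArrayBuffer.empty[String]

  def append(key: String): Unit = {
    keys.lastOption.foreach { prev =>
      if (key <= prev)
        throw new java.io.IOException(
          s"Added a key not lexically larger than previous. prev=$prev cur=$key")
    }
    keys += key
  }

  def written: Seq[String] = keys.toList
}
```

Feeding it cells in sorted key order succeeds; repeating or regressing a key throws the same IOException message seen above.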