Hi Andrew,

HFileOutputFormat2 needs the HBase keys to be sorted lexicographically.
With your suggestion of timestamp + hashed key, I might end up having to sort the RDD, which I want to avoid. If I could generate a sequential key, I wouldn't need to sort; I could just write the data to HFiles after processing it. Can you explain how I can generate a sequential key?

Thanks,
Yesh

On Sat, Jul 23, 2016 at 11:24 PM, Andrew Ehrlich <and...@aehrlich.com> wrote:

> It’s hard to do in a distributed system. Maybe try generating a meaningful
> key using a timestamp + hashed unique key fields in the record?
>
> > On Jul 23, 2016, at 7:53 PM, yeshwanth kumar <yeshwant...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am doing a bulk load to HBase using Spark, in which I need to
> > generate a sequential key for each record. The key should be
> > sequential across all the executors.
> >
> > I tried zipWithIndex, but it didn't work because zipWithIndex gives an
> > index per executor, not across all executors.
> >
> > Looking for some suggestions.
> >
> > Thanks,
> > -Yeshwanth
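
For the sequential-key question in this thread, here is a minimal sketch of one possible approach, not something proposed by either poster. It counts the records in each partition, turns the counts into starting offsets, and gives every record the key offset + local index, zero-padded to a fixed width so that lexicographic order matches numeric order. The partitions are simulated as plain Python lists, and the helper names (`partition_offsets`, `assign_sequential_keys`) and the key width of 12 are hypothetical choices made for illustration only.

```python
from itertools import accumulate


def partition_offsets(partitions):
    """Return the starting global index of each partition."""
    counts = [len(p) for p in partitions]
    # Exclusive prefix sum: partition i starts after every record
    # held by partitions 0..i-1.
    return [0] + list(accumulate(counts))[:-1]


def assign_sequential_keys(partitions, width=12):
    """Pair every record with a fixed-width, zero-padded global index."""
    offsets = partition_offsets(partitions)
    keyed = []
    for offset, part in zip(offsets, partitions):
        # Zero-padding keeps string (lexicographic) order equal to
        # numeric order, as long as the index fits in `width` digits.
        keyed.append(
            [(format(offset + i, "0{}d".format(width)), rec)
             for i, rec in enumerate(part)]
        )
    return keyed


# Simulated partitions (assumption: in a real job these would live on
# different executors, and the counts would be collected in a first pass).
partitions = [["a", "b", "c"], ["d"], [], ["e", "f"]]
keyed = assign_sequential_keys(partitions)
all_keys = [key for part in keyed for key, _ in part]
```

Without the padding, string keys would not sort numerically ("10" sorts before "9"), which is why the width is fixed. In a Spark job the same idea would presumably need two passes over the data, one to count records per partition and one to assign keys, for example with `mapPartitionsWithIndex`; whether the resulting files can then be written without a sort also depends on the partitions being processed in key order, which this sketch does not address.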