Adding to Ted's reply: check the bulk put example from SparkOnHBase:
https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
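
On the pre-partitioning question: one common approach is to create the table with split points up front, so the initial load is spread across regions instead of hitting a single one. Below is a minimal sketch of computing such split points. It assumes your row keys start with a fixed-width, uniformly distributed 8-digit hex prefix (for example a hash of the record id), which may not match your key design. `SplitKeys` and `hexSplits` are hypothetical names for illustration, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an HBase class: computes evenly spaced split
// points for row keys that begin with an 8-digit lowercase hex prefix.
class SplitKeys {

    // Returns numRegions - 1 boundaries over the range 00000000..ffffffff.
    static List<String> hexSplits(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        long keyspace = 1L << 32; // number of distinct 8-digit hex prefixes
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = keyspace * i / numRegions;
            splits.add(String.format("%08x", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the three boundaries that divide the keyspace into four regions.
        for (String split : hexSplits(4)) {
            System.out.println(split);
        }
    }
}
```

The resulting strings would be converted to bytes and passed as the split keys when creating the table (for instance through the `Admin.createTable` overload that takes a `byte[][]` of split keys). If the keys are not uniformly distributed, the split points should come from a sample of the real keys instead.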

On Sat, Jan 28, 2017 at 9:11 AM, Ted Yu <[email protected]> wrote:

> Have you looked at the hbase-spark module (currently in master branch)?
>
> See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/datasources/AvroSource.scala
> and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala
> for examples.
>
> There may be other options.
>
> FYI
>
> On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi <[email protected]> wrote:
>
> > Hi
> > I'm seeking some pointers/guidance on what we could do to insert billions
> > of records that we already have in avro files in hadoop into HBase.
> >
> > I read some articles online and one of them recommended using HFile
> > format. I took a cursory look at the documentation for that. Given the
> > complexity of that I think that may be the last resort we want to pursue.
> > Unless some library is out there that easily helps us write our files into
> > that format. I didn't see any.
> > Assuming that the HBase native client may be our best bet, is there any
> > advice around pre-partitioning our records or such techniques that we could
> > use?
> > thanks
> >
> > Jeff
