Thank you very much. I will try and post the updates.

On Wed, May 18, 2016 at 10:29 PM, Josh Mahonin <jmaho...@gmail.com> wrote:
> Hi,
>
> The Spark integration uses the Phoenix MapReduce framework, which under
> the hood translates those to UPSERTs spread across a number of workers.
>
> You should try out both methods and see which works best for your use
> case. For what it's worth, we routinely do load / save operations using
> the Spark integration on those data sizes.
>
> Josh
>
> On Tue, May 17, 2016 at 7:03 AM, Radha krishna <grkmc...@gmail.com> wrote:
>
>> Hi,
>>
>> I have the same scenario. Can you share your metrics, like the column
>> count for each row, the number of SALT_BUCKETS, the compression
>> technique you used, and how much time it takes to load the complete
>> data?
>>
>> My scenario is that I have to load 1.9 billion records (approx. 20
>> files, each containing 100 million rows with 102 columns per row).
>> Currently it takes 35 to 45 minutes to load one file's data.
>>
>> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
>> mohanaug...@gmail.com> wrote:
>>
>>> I have 100 million records to be inserted into an HBase table
>>> (Phoenix) as the result of a Spark job. I would like to know: if I
>>> convert it to a DataFrame and save it, will it do a bulk load, or is
>>> that not an efficient way to write data to a Phoenix HBase table?
>>>
>>> --
>>> Thanks and Regards
>>> Mohan
>>
>> --
>> Thanks & Regards
>> Radha krishna

--
Thanks and Regards
Mohan
VISA Pte Limited, Singapore.
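
For reference, a minimal sketch of the DataFrame save path Josh describes, using the phoenix-spark connector (Phoenix 4.x / Spark 1.x era API). The table name, ZooKeeper quorum, and columns below are hypothetical placeholders, not values from this thread:

```scala
// Minimal sketch: saving a Spark DataFrame to a Phoenix table via the
// phoenix-spark connector. The save is translated into parallel UPSERTs
// by the Phoenix MapReduce integration, not an HFile bulk load.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object PhoenixSaveSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-save-sketch"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Example DataFrame; in practice this would be the result of the Spark job.
    // Column names must match the target Phoenix table's columns.
    val df = sc.parallelize(Seq((1L, "a"), (2L, "b"))).toDF("ID", "COL1")

    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)            // phoenix-spark expects Overwrite; rows are UPSERTed, existing data is not truncated
      .option("table", "OUTPUT_TABLE")     // hypothetical Phoenix table name
      .option("zkUrl", "zkhost:2181")      // hypothetical ZooKeeper quorum
      .save()

    sc.stop()
  }
}
```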