Thank you very much. I will try and post the updates.

On Wed, May 18, 2016 at 10:29 PM, Josh Mahonin <jmaho...@gmail.com> wrote:
> Hi,
>
> The Spark integration uses the Phoenix MapReduce framework, which under
> the hood translates those to UPSERTs spread across a number of workers.
>
> You should try out both methods and see which works best for your use
> case. For what it's worth, we routinely do load / save operations using
> the Spark integration on those data sizes.
>
> Josh
>
> On Tue, May 17, 2016 at 7:03 AM, Radha krishna <grkmc...@gmail.com> wrote:
>
>> Hi,
>>
>> I have the same scenario. Can you share your metrics, like the column
>> count for each row, the number of SALT_BUCKETS, the compression
>> technique you used, and how much time it takes to load the complete
>> data?
>>
>> My scenario is that I have to load 1.9 billion records (approx. 20
>> files, each containing 100 million rows with 102 columns per row).
>> Currently it takes 35 to 45 minutes to load one file's data.
>>
>> On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
>> mohanaug...@gmail.com> wrote:
>>
>>> I have 100 million records to be inserted into an HBase table
>>> (Phoenix) as the result of a Spark job. I would like to know: if I
>>> convert it to a DataFrame and save it, will it do a bulk load, or is
>>> that not an efficient way to write data to a Phoenix HBase table?
>>>
>>> --
>>> Thanks and Regards
>>> Mohan
>>
>> --
>> Thanks & Regards
>> Radha krishna

--
Thanks and Regards
Mohan
VISA Pte Limited, Singapore.
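
For reference, a minimal sketch of the DataFrame save path Josh describes, using the phoenix-spark connector (Phoenix 4.x / Spark 1.x era API). The table name, ZooKeeper quorum, and columns below are hypothetical placeholders, not values from this thread:

```scala
// Minimal sketch: saving a Spark DataFrame to a Phoenix table via the
// phoenix-spark connector. The save is translated into parallel UPSERTs
// by the Phoenix MapReduce integration, not an HFile bulk load.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object PhoenixSaveSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("phoenix-save-sketch"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Example DataFrame; in practice this would be the result of the Spark job.
    // Column names must match the target Phoenix table's columns.
    val df = sc.parallelize(Seq((1L, "a"), (2L, "b"))).toDF("ID", "COL1")

    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)            // phoenix-spark expects Overwrite; rows are UPSERTed, existing data is not truncated
      .option("table", "OUTPUT_TABLE")     // hypothetical Phoenix table name
      .option("zkUrl", "zkhost:2181")      // hypothetical ZooKeeper quorum
      .save()

    sc.stop()
  }
}
```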