Thanks Pari.

The frequency of the job is weekly.
No. of rows is around 10 billion.
Cluster is 13 node.
From what you have mentioned, I see that CsvBulkLoadTool is the best option
for my scenario.
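For reference, a CsvBulkLoadTool invocation would look roughly like the sketch below; the table name, input path, and ZooKeeper host are placeholders for illustration, and the exact client jar name and classpath setup depend on your Phoenix/HBase install:

```shell
# Sketch of a CsvBulkLoadTool run; MY_TABLE, the HDFS path, and zk-host
# are hypothetical and must match your own cluster.
HADOOP_CLASSPATH=$(hbase mapredcp) hadoop jar phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table MY_TABLE \
  --input /hdfs/path/to/transformed.csv \
  --zookeeper zk-host
```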

You mentioned increasing the batch size to accommodate more rows.
Are you referring to the 'phoenix.mutate.batchSize' configuration
parameter?
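For context, phoenix.mutate.batchSize is a client-side setting, so one way to raise it (a sketch; the value 50000 is purely illustrative, not a recommendation) is in the client's hbase-site.xml:

```xml
<!-- Client-side hbase-site.xml; 50000 is an illustrative value only -->
<property>
  <name>phoenix.mutate.batchSize</name>
  <value>50000</value>
</property>
```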

Vamsi Attluri

On Wed, Mar 16, 2016 at 9:01 AM Pariksheet Barapatre <pbarapa...@gmail.com>
wrote:

> Hi Vamsi,
>
> How many rows are you expecting out of your transformation, and what is
> the frequency of the job?
>
> If the row count is small (< ~100K, though this depends on cluster size as
> well), you can go ahead with the phoenix-spark plug-in and increase the
> batch size to accommodate more rows; otherwise use CsvBulkLoadTool.
>
> Thanks
> Pari
>
> On 16 March 2016 at 20:03, Vamsi Krishna <vamsi.attl...@gmail.com> wrote:
>
>> Thanks Gabriel & Ravi.
>>
>> I have a data processing job written in Spark-Scala.
>> I do a join on data from 2 data files (CSV files) and do data
>> transformation on the resulting data. Finally, I load the transformed data
>> into a Phoenix table using the Phoenix-Spark plugin.
>> Seeing that the Phoenix-Spark plugin goes through the regular HBase write
>> path (writes to the WAL), I'm thinking of option 2 to reduce the job
>> execution time.
>>
>> *Option 2:* Do the data transformation in Spark, write the transformed
>> data to a CSV file, and use the Phoenix CsvBulkLoadTool to load the data
>> into the Phoenix table.
>>
>> Has anyone tried this kind of exercise? Any thoughts?
>>
>> Thanks,
>> Vamsi Attluri
>>
>> On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran <maghamraviki...@gmail.com>
>> wrote:
>>
>>> Hi Vamsi,
>>>    The upserts through Phoenix-spark plugin definitely go through WAL .
>>>
>>>
>>> On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid <gabriel.r...@gmail.com>
>>> wrote:
>>>
>>>> Hi Vamsi,
>>>>
>>>> I can't answer your question about the Phoenix-Spark plugin (although
>>>> I'm sure that someone else here can).
>>>>
>>>> However, I can tell you that the CsvBulkLoadTool does not write to the
>>>> WAL or to the Memstore. It simply writes HFiles and then hands those
>>>> HFiles over to HBase, so the memstore and WAL are never
>>>> touched/affected by this.
>>>>
>>>> - Gabriel
>>>>
>>>>
>>>> On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna <vamsi.attl...@gmail.com>
>>>> wrote:
>>>> > Team,
>>>> >
>>>> > Does phoenix CsvBulkLoadTool write to HBase WAL/Memstore?
>>>> >
>>>> > Phoenix-Spark plugin:
>>>> > Does saveToPhoenix method on RDD[Tuple] write to HBase WAL/Memstore?
>>>> >
>>>> > Thanks,
>>>> > Vamsi Attluri
>>>> > --
>>>> > Vamsi Attluri
>>>>
>>>
>>> --
>> Vamsi Attluri
>>
>
>
>
> --
> Cheers,
> Pari
>
-- 
Vamsi Attluri
