Thanks Gabriel & Ravi. I have a data processing job written in Spark-Scala. I join data from two CSV files, apply transformations to the resulting data, and finally load the transformed data into a Phoenix table using the Phoenix-Spark plugin. Since the Phoenix-Spark plugin goes through the regular HBase write path (writes to the WAL), I'm thinking of option 2 to reduce the job execution time.
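For context, the job looks roughly like the following (a minimal sketch, not the actual code: the input paths, table name, column names, and ZooKeeper quorum are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.phoenix.spark._  // brings saveToPhoenix into scope

    object JoinTransformLoad {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("join-transform-load"))

        // Placeholder inputs: join key in column 0, one value column each.
        val left  = sc.textFile("hdfs:///data/left.csv").map(_.split(',')).map(r => (r(0), r(1)))
        val right = sc.textFile("hdfs:///data/right.csv").map(_.split(',')).map(r => (r(0), r(1)))

        // Join on the key and apply a placeholder transformation.
        val transformed = left.join(right).map { case (id, (a, b)) =>
          (id, a.trim.toUpperCase, b.trim.toUpperCase)
        }

        // Each row is upserted through the Phoenix client, i.e. the regular
        // HBase write path (WAL + memstore) -- this is the cost in question.
        transformed.saveToPhoenix(
          "MY_TABLE",                   // placeholder table
          Seq("ID", "COL_A", "COL_B"),  // placeholder columns
          zkUrl = Some("zk-host:2181")) // placeholder quorum
      }
    }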
*Option 2:* Do the data transformation in Spark, write the transformed data to a CSV file, and use the Phoenix CsvBulkLoadTool to load the data into the Phoenix table. Has anyone tried this kind of exercise? Any thoughts? (A sketch of this two-step approach is at the end of this mail, below the quoted thread.)

Thanks,
Vamsi Attluri

On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran <maghamraviki...@gmail.com> wrote:

> Hi Vamsi,
> The upserts through the Phoenix-Spark plugin definitely go through the WAL.
>
> On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid <gabriel.r...@gmail.com>
> wrote:
>
>> Hi Vamsi,
>>
>> I can't answer your question about the Phoenix-Spark plugin (although
>> I'm sure that someone else here can).
>>
>> However, I can tell you that the CsvBulkLoadTool does not write to the
>> WAL or to the memstore. It simply writes HFiles and then hands those
>> HFiles over to HBase, so the memstore and WAL are never
>> touched/affected by this.
>>
>> - Gabriel
>>
>> On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna <vamsi.attl...@gmail.com>
>> wrote:
>> > Team,
>> >
>> > Does the Phoenix CsvBulkLoadTool write to the HBase WAL/memstore?
>> >
>> > Phoenix-Spark plugin:
>> > Does the saveToPhoenix method on RDD[Tuple] write to the HBase
>> > WAL/memstore?
>> >
>> > Thanks,
>> > Vamsi Attluri
>> > --
>> > Vamsi Attluri

-- 
Vamsi Attluri
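P.S. Here is a rough sketch of what I mean by option 2, reusing the `transformed` RDD from the sketch earlier in this mail. The staging path, table name, and ZooKeeper quorum are again placeholders, and I'm assuming CsvBulkLoadTool can be driven through ToolRunner like any other Hadoop Tool:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.util.ToolRunner
    import org.apache.phoenix.mapreduce.CsvBulkLoadTool

    // 1) Have Spark write the transformed rows out as CSV text
    //    (part files under one staging directory).
    transformed
      .map { case (id, a, b) => s"$id,$a,$b" }
      .saveAsTextFile("hdfs:///staging/transformed")  // placeholder path

    // 2) Run the bulk load. Per Gabriel, this builds HFiles and hands them
    //    to HBase directly, so the WAL and memstore are never touched.
    //    Equivalent to: hadoop jar phoenix-<version>-client.jar \
    //      org.apache.phoenix.mapreduce.CsvBulkLoadTool -t MY_TABLE -i <dir>
    val exitCode = ToolRunner.run(new Configuration(), new CsvBulkLoadTool(),
      Array("--table", "MY_TABLE",                    // placeholder table
            "--input", "hdfs:///staging/transformed",
            "--zookeeper", "zk-host:2181"))           // placeholder quorum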