Writing to HDFS with a columnar format like Parquet will always be faster
than writing to HBase. But how about random access to a row? If you're not
doing point lookups and small range scans, you probably don't want to use
HBase (and Phoenix). HBase writes more information than Parquet does: it
essentially maintains an index by row key and keeps multiple versions of
each cell. You can reduce the amount of data written by using our encoded
storage formats [1], though it still won't come close to Parquet in HDFS.
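
As a rough illustration of [1], here's what enabling the encoded
column-name format might look like when creating a table through the
Phoenix JDBC driver. The table, columns, and ZooKeeper quorum are made up
for the example; double-check the exact property names and values (e.g.
COLUMN_ENCODED_BYTES) against the column encoding page:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateEncodedTable {
        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper quorum; adjust to your cluster.
            try (Connection conn =
                     DriverManager.getConnection("jdbc:phoenix:localhost:2181");
                 Statement stmt = conn.createStatement()) {
                // COLUMN_ENCODED_BYTES picks the encoded column-name scheme
                // described in [1]; the table and columns are just examples.
                stmt.execute("CREATE TABLE IF NOT EXISTS EVENTS ("
                        + " ID BIGINT NOT NULL PRIMARY KEY,"
                        + " NAME VARCHAR,"
                        + " VALUE DOUBLE)"
                        + " COLUMN_ENCODED_BYTES=1");
            }
        }
    }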

Using executeBatch won't make a performance difference unless you're using
the Phoenix Query Server (in which case the difference will be
substantial). For the thick client, I'd recommend turning auto commit off
and batching together ~100 rows before calling commit. These and other tips
can be found in our Tuning Guide [2].
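
To make that concrete, here's a minimal sketch of the thick-client
pattern: auto commit off, UPSERTs buffered client-side, and a commit()
every ~100 rows. The JDBC URL, row count, and the hypothetical EVENTS
table from the sketch above are made up for the example; only the
setAutoCommit(false) / periodic commit() pattern is the actual
recommendation:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchedUpsert {
        // Roughly the recommended number of rows per commit.
        private static final int BATCH_SIZE = 100;

        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper quorum; adjust to your cluster.
            try (Connection conn =
                     DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
                conn.setAutoCommit(false); // buffer mutations on the client
                String sql = "UPSERT INTO EVENTS (ID, NAME, VALUE) VALUES (?, ?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (long i = 0; i < 1_000_000L; i++) {
                        ps.setLong(1, i);
                        ps.setString(2, "name-" + i);
                        ps.setDouble(3, Math.random());
                        ps.executeUpdate(); // held client-side until commit
                        if ((i + 1) % BATCH_SIZE == 0) {
                            conn.commit(); // flush ~100 rows to HBase
                        }
                    }
                    conn.commit(); // flush any trailing rows
                }
            }
        }
    }

A larger commit interval means fewer round trips to the server but more
client-side memory, so tune the batch size for your row width.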

[1] https://phoenix.apache.org/columnencoding.html
[2] https://phoenix.apache.org/tuning_guide.html

On Tue, Jan 23, 2018 at 3:45 AM, Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> Any answer on this...?
>
> On Fri, Jan 12, 2018 at 10:38 AM, Flavio Pompermaier <pomperma...@okkam.it
> > wrote:
>
>> Hi to all,
>> looking at the documentation (https://phoenix.apache.org/tu
>> ning_guide.html), in the writing section, there's the following
>> sentence: "Phoenix uses commit() instead of executeBatch() to control batch
>> updates". I'm using a Phoenix connection with autocommit enabled +
>> PreparedStatement.executeBatch(). Doesn't Phoenix handle this
>> correctly...?
>> I'm asking this because writing directly to HDFS (as Parquet) takes 1
>> minute, while UPSERTing into HBase takes 15 minutes... what can I do to
>> detect what is slowing down the write?
>>
>> Best,
>> Flavio
>>
>
>
>
> --
> Flavio Pompermaier
> Development Department
>
> OKKAM S.r.l.
> Tel. +(39) 0461 041809
>