Re: Phoenix executeBatch

2018-01-23 Thread James Taylor
Writing to HDFS with a columnar format like Parquet will always be faster
than writing to HBase. But how about random access of a row? If you're not
doing point lookups and small range scans, you probably don't want to use
HBase (and Phoenix). HBase writes more information than Parquet does: it
essentially maintains an index by row key plus multiple versions of each
cell. You can reduce the amount of data by using our storage formats [1]
(though it still won't come close to Parquet on HDFS).
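
For example, a table can opt into the encoded column format at creation
time. A minimal sketch (the JDBC URL, table, and column names here are
hypothetical; see [1] for the actual options):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EncodedTableExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            // COLUMN_ENCODED_BYTES selects the encoded column-name scheme
            stmt.executeUpdate(
                "CREATE TABLE EXAMPLE_TABLE (ID INTEGER PRIMARY KEY, VAL VARCHAR) "
              + "COLUMN_ENCODED_BYTES = 1");
        }
    }
}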

Using executeBatch won't make a performance difference unless you're using
the Phoenix Query Server (in which case the difference will be
substantial). For the thick client, I'd recommend turning autocommit off
and batching together ~100 rows before calling commit(), as sketched below.
These and other tips can be found in our Tuning Guide [2].

[1] https://phoenix.apache.org/columnencoding.html
[2] https://phoenix.apache.org/tuning_guide.html
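
A minimal sketch of that thick-client pattern (the JDBC URL, table, and
column names are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixBatchWrite {
    private static final int BATCH_SIZE = 100;

    public static void main(String[] args) throws Exception {
        try (Connection conn =
                DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.setAutoCommit(false); // buffer mutations client-side
            try (PreparedStatement stmt = conn.prepareStatement(
                    "UPSERT INTO EXAMPLE_TABLE (ID, VAL) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    stmt.setInt(1, i);
                    stmt.setString(2, "value-" + i);
                    stmt.executeUpdate();
                    // send the accumulated mutations every ~100 rows
                    if ((i + 1) % BATCH_SIZE == 0) {
                        conn.commit();
                    }
                }
            }
            conn.commit(); // flush any remaining rows
        }
    }
}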

On Tue, Jan 23, 2018 at 3:45 AM, Flavio Pompermaier wrote:

> Any answer on this..?
>
> On Fri, Jan 12, 2018 at 10:38 AM, Flavio Pompermaier wrote:
>
>> Hi to all,
>> looking at the documentation (https://phoenix.apache.org/tuning_guide.html),
>> in the writing section, there's the following sentence: "Phoenix uses
>> commit() instead of executeBatch() to control batch updates". I'm using a
>> Phoenix connection with autocommit enabled + PreparedStatement.executeBatch().
>> Doesn't Phoenix handle this correctly...?
>> I'm asking this because writing directly to HDFS (as Parquet) takes 1
>> minute, while UPSERTing into HBase takes 15 minutes... what can I do to
>> detect what is slowing down the write?
>>
>> Best,
>> Flavio
>>
>
>
>
> --
> Flavio Pompermaier
> Development Department
>
> OKKAM S.r.l.
> Tel. +(39) 0461 041809
>


Mutation state batch upserts

2018-01-23 Thread Flavio Pompermaier
Hi to all,
I've tested a program that writes (UPSERTs) to Phoenix using executeBatch().
In the logs I see "Sent batch of 2 for SOMETABLE".
Is this correct? I fear that the batch is not executed as a single batch but
statement by statement. The code within
PhoenixStatement.executeBatch() is:

for (int i = 0; i < returnCodes.length; i++) {
    // each queued statement is executed individually
    PhoenixPreparedStatement statement = batch.get(i);
    returnCodes[i] = statement.execute(true)
        ? Statement.SUCCESS_NO_INFO
        : statement.getUpdateCount();
}
flushIfNecessary();


Moreover, flushIfNecessary() doesn't actually flush anything because the
connection is not in autoflush mode.
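
To make the question concrete, here is a minimal snippet of the pattern in
question (the connection URL and columns are invented). My reading of the
code above is that executeBatch() runs the statements one by one, and the
actual batched write to HBase only happens at commit():

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ExecuteBatchCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.setAutoCommit(false);
            try (PreparedStatement stmt = conn.prepareStatement(
                    "UPSERT INTO SOMETABLE (ID, VAL) VALUES (?, ?)")) {
                for (int i = 0; i < 2; i++) {
                    stmt.setInt(1, i);
                    stmt.setString(2, "v" + i);
                    stmt.addBatch();     // queue the statement locally
                }
                stmt.executeBatch();     // executed statement by statement
            }
            conn.commit(); // mutations are sent to HBase here
        }
    }
}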

Am I wrong? Is the batch committed correctly or is it committed upsert by
upsert?

Best,
Flavio