Hi Josh,
Thank you for your answer.
Yes, I am using HDP 2.3.4. You're right, the newer versions may improve the performance significantly. However, we are going to have a release shortly, so an upgrade is not possible right now. But yes, it should happen in the upcoming application release.
The table has 21 columns:
- the first 3 (id, type, and timestamp) make up the PK
- the following 18 columns are unsigned int.
No, there are no secondary indexes defined on the table.
An example query:
SELECT timestamp,VALUE04,VALUE15
FROM T.TABELLE
WHERE id='ID1' AND type='A' AND timestamp>=TO_TIMESTAMP('...') AND timestamp<=TO_TIMESTAMP('...')
ORDER BY id ASC, type ASC, timestamp ASC;
Explain plan:
| CLIENT 7-CHUNK PARALLEL 7-WAY RANGE SCAN OVER T.TABELLE [0,'ID1','A','2015-12-02 00:00:00.000'] - [0,'ID1','A','2017-01-01 00:00:00.000']
|     SERVER FILTER BY (A.VALUE04 IS NOT NULL OR A.VALUE15 IS NOT NULL)
| CLIENT MERGE SORT
It looks like you suspect that Phoenix first reads the data and then post-filters/sorts it.
But why does the first next() call sometimes take so much time?
When I send the requests sequentially, the first next() always takes less than about 200 ms. But when a large number of requests arrive in parallel, the processing time increases significantly, sometimes to more than 20-30 seconds.
Could it be something related to HBase, since the table is minor-compacted from time to time, which impacts read performance?
I am not sure how the next() call is implemented in Phoenix 4.4.0. Which component could be the bottleneck in such a concurrent processing scenario?
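For reference, this is roughly how I measure the time-to-first-row on the client side. A minimal sketch over plain JDBC; the connection URL "jdbc:phoenix:zk-host" is a placeholder for the real ZooKeeper quorum, and the bind values are the ones from the query above:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class FirstNextTimer {

    // Milliseconds elapsed since a System.nanoTime() start mark.
    static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws Exception {
        // "jdbc:phoenix:zk-host" is a placeholder, not a real quorum address.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT timestamp, VALUE04, VALUE15 FROM T.TABELLE "
                 + "WHERE id = ? AND type = ? AND timestamp >= ? AND timestamp <= ? "
                 + "ORDER BY id ASC, type ASC, timestamp ASC")) {
            ps.setString(1, "ID1");
            ps.setString(2, "A");
            ps.setTimestamp(3, Timestamp.valueOf("2015-12-02 00:00:00"));
            ps.setTimestamp(4, Timestamp.valueOf("2017-01-01 00:00:00"));

            long start = System.nanoTime();
            try (ResultSet rs = ps.executeQuery()) {
                // The first next() is where the scan work surfaces to the client.
                boolean hasRow = rs.next();
                System.out.println("first next(): " + elapsedMillis(start)
                        + " ms, hasRow=" + hasRow);
                while (rs.next()) { /* drain remaining rows */ }
                System.out.println("full drain: " + elapsedMillis(start) + " ms");
            }
        }
    }
}
```

Running this from many threads at once reproduces the parallel scenario; the gap between "first next()" and "full drain" shows whether the slowdown is in producing the first row or in streaming the rest.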
Thanks in advance
Lee
