Here it is:

CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER PUBLISHER_V4 [-3790006267705642038,'2018-07-01 00:00:00.000'] - [-37900062677056420
SERVER AGGREGATE INTO DISTINCT ROWS BY [SEGMENT_ID, CROSS_SEGMENT_ID]
CLIENT MERGE SORT
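(A plan like this is what Phoenix prints when the query is prefixed with EXPLAIN, per the link Geoffrey gives below; the statement here assumes the PUBLISHER_V4 query quoted further down, modulo the placeholder column names:)

EXPLAIN SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count
FROM PUBLISHER_V4
WHERE sid IN (34535)
  AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd', 'GMT')
              AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT')
GROUP BY sid, cid;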
Note: we have a dedicated Phoenix Query Server.

On Thu, Nov 15, 2018 at 5:23 PM Geoffrey Jacoby <gjac...@salesforce.com> wrote:

> Monil,
>
> Could you please post the results of an EXPLAIN plan of your query? For
> directions on how to do this, please see
> http://phoenix.apache.org/explainplan.html
>
> Geoffrey Jacoby
>
> On Thu, Nov 15, 2018 at 5:02 PM Monil Gandhi <mgand...@gmail.com> wrote:
>
>> Hello,
>> Currently we have hourly data in our Phoenix table. However, the schema
>> was designed to perform well for daily data. Increasing the number of
>> rows by 24x has led to degradation of our service over time.
>>
>> Our current schema is as follows:
>>
>> CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL,
>> cid BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid,
>> day, cid)) COMPRESSION='SNAPPY'
>>
>> The query we run is something along the lines of:
>>
>> SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4
>> WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd',
>> 'GMT') AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT') GROUP BY sid, cid
>>
>> Based on our investigation we have concluded that the main cause is
>> purely the number of rows being read. I am open to other suggestions.
>>
>> If the number of rows is indeed the cause, I am wondering if there is a
>> way to either:
>> 1. roll the hourly data up to daily using views, a secondary index, or
>> MapReduce (I know MapReduce is possible) -- see the first sketch after
>> this mail, or
>> 2. migrate to a newer schema where cid is not part of the PK and is
>> instead a column family (see the second sketch). I was unable to find
>> any documentation on this.
>>
>> Thanks
>> Monil
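For option 1, a daily roll-up can be expressed directly in Phoenix SQL with UPSERT SELECT rather than a MapReduce job. A minimal sketch, assuming a hypothetical target table T1_DAILY mirroring T1 and using Phoenix's built-in TRUNC to bucket timestamps by day (table and column names here are illustrative, not an existing schema):

-- Hypothetical roll-up target: same key as T1, one row per (sid, day, cid).
CREATE TABLE IF NOT EXISTS T1_DAILY (
    sid BIGINT NOT NULL,
    day DATE NOT NULL,
    cid BIGINT NOT NULL,
    s.count BIGINT,
    CONSTRAINT PK PRIMARY KEY (sid, day, cid)
) COMPRESSION='SNAPPY';

-- Fold one day of hourly rows into one daily row per (sid, cid).
-- The range is half-open (>= start, < next day) so midnight of the next
-- day is not double-counted; rerunning for the same day overwrites rows.
UPSERT INTO T1_DAILY (sid, day, cid, s.count)
SELECT sid, TRUNC(day, 'DAY'), cid, CAST(SUM(count) AS BIGINT)
FROM T1
WHERE day >= TO_DATE('2018-07-01', 'yyyy-MM-dd', 'GMT')
  AND day <  TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT')
GROUP BY sid, TRUNC(day, 'DAY'), cid;

Whole-day queries can then run against T1_DAILY and read roughly 1/24th of the rows, and the roll-up can be scheduled once per closed day.

For option 2, one caveat worth stating: if cid leaves the primary key, then (sid, day) must be unique, so an ordinary column in a family can hold only one cid per row; keeping many cids per (sid, day) would need something like Phoenix dynamic columns or a different row key. Purely as an illustration of the single-cid variant:

CREATE TABLE IF NOT EXISTS T1_V2 (
    sid BIGINT NOT NULL,
    day DATE NOT NULL,
    s.cid BIGINT,   -- cid demoted to a plain column in family "s"
    s.count BIGINT,
    CONSTRAINT PK PRIMARY KEY (sid, day)
) COMPRESSION='SNAPPY';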