Re: Rolling hourly data

Geoffrey Jacoby Thu, 15 Nov 2018 17:23:58 -0800

Monil,

Could you please post the results of an EXPLAIN plan of your query? For
directions how to do this please see
http://phoenix.apache.org/explainplan.html


Geoffrey Jacoby

On Thu, Nov 15, 2018 at 5:02 PM Monil Gandhi <mgand...@gmail.com> wrote:

> Hello,
> Currently we have hourly data in our phoenix table. However, the schema
> was designed to perform well for daily data. Increasing the number of rows
> by 24X has lead to degradation of our service over time.
> Our current schema is as follows
>
> CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL, cid
> BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid, day, cid)
> ) COMPRESSION='SNAPPY'
>
> The query we run is something along the lines of
> SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4
> WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd',
> 'GMT') AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT') GROUP BY sid, cid
>
> Based on our investigation we have concluded that the main reason is
> purely the number of rows that are being read. I am open to other
> suggestions
>
> If number of rows is the case
> I am wondering if there is a way to either
> 1. to roll hourly data to daily using views, secondary index or map
> reduce. I know map reduce is possible.
> 2. migrate to a newer schema where cid is not part of pk and is actually a
> column family. I was unable to find any kind of documentation on this.
>
> Thanks
> Monil
>

Re: Rolling hourly data

Reply via email to