Monil, Could you please post the results of an EXPLAIN plan of your query? For directions how to do this please see http://phoenix.apache.org/explainplan.html
Geoffrey Jacoby On Thu, Nov 15, 2018 at 5:02 PM Monil Gandhi <mgand...@gmail.com> wrote: > Hello, > Currently we have hourly data in our phoenix table. However, the schema > was designed to perform well for daily data. Increasing the number of rows > by 24X has lead to degradation of our service over time. > Our current schema is as follows > > CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL, cid > BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid, day, cid) > ) COMPRESSION='SNAPPY' > > The query we run is something along the lines of > SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4 > WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd', > 'GMT') AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT') GROUP BY sid, cid > > Based on our investigation we have concluded that the main reason is > purely the number of rows that are being read. I am open to other > suggestions > > If number of rows is the case > I am wondering if there is a way to either > 1. to roll hourly data to daily using views, secondary index or map > reduce. I know map reduce is possible. > 2. migrate to a newer schema where cid is not part of pk and is actually a > column family. I was unable to find any kind of documentation on this. > > Thanks > Monil >