Here it is:

CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER PUBLISHER_V4 [-3790006267705642038,'2018-07-01 00:00:00.000'] - [-37900062677056420
SERVER AGGREGATE INTO DISTINCT ROWS BY [SEGMENT_ID, CROSS_SEGMENT_ID]
CLIENT MERGE SORT
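(A plan like this is what Phoenix prints when the query is prefixed with EXPLAIN, per the link Geoffrey gives below; the statement here assumes the PUBLISHER_V4 query quoted further down, modulo the placeholder column names:)

EXPLAIN SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count
FROM PUBLISHER_V4
WHERE sid IN (34535)
  AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd', 'GMT')
              AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT')
GROUP BY sid, cid;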
Note: we have a dedicated Phoenix Query Server.

On Thu, Nov 15, 2018 at 5:23 PM Geoffrey Jacoby <gjac...@salesforce.com> wrote:

> Monil,
>
> Could you please post the results of an EXPLAIN plan of your query? For
> directions on how to do this, please see
> http://phoenix.apache.org/explainplan.html
>
> Geoffrey Jacoby
>
> On Thu, Nov 15, 2018 at 5:02 PM Monil Gandhi <mgand...@gmail.com> wrote:
>
>> Hello,
>> Currently we have hourly data in our Phoenix table. However, the schema
>> was designed to perform well for daily data. Increasing the number of
>> rows by 24x has led to degradation of our service over time.
>>
>> Our current schema is as follows:
>>
>> CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL,
>> cid BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid,
>> day, cid)) COMPRESSION='SNAPPY'
>>
>> The query we run is something along the lines of:
>>
>> SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4
>> WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd',
>> 'GMT') AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT') GROUP BY sid, cid
>>
>> Based on our investigation we have concluded that the main cause is
>> purely the number of rows being read. I am open to other suggestions.
>>
>> If the number of rows is indeed the cause, I am wondering if there is a
>> way to either:
>> 1. roll the hourly data up to daily using views, a secondary index, or
>> MapReduce (I know MapReduce is possible) -- see the first sketch after
>> this mail, or
>> 2. migrate to a newer schema where cid is not part of the PK and is
>> instead a column family (see the second sketch). I was unable to find
>> any documentation on this.
>>
>> Thanks
>> Monil
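For option 1, a daily roll-up can be expressed directly in Phoenix SQL with UPSERT SELECT rather than a MapReduce job. A minimal sketch, assuming a hypothetical target table T1_DAILY mirroring T1 and using Phoenix's built-in TRUNC to bucket timestamps by day (table and column names here are illustrative, not an existing schema):

-- Hypothetical roll-up target: same key as T1, one row per (sid, day, cid).
CREATE TABLE IF NOT EXISTS T1_DAILY (
    sid BIGINT NOT NULL,
    day DATE NOT NULL,
    cid BIGINT NOT NULL,
    s.count BIGINT,
    CONSTRAINT PK PRIMARY KEY (sid, day, cid)
) COMPRESSION='SNAPPY';

-- Fold one day of hourly rows into one daily row per (sid, cid).
-- The range is half-open (>= start, < next day) so midnight of the next
-- day is not double-counted; rerunning for the same day overwrites rows.
UPSERT INTO T1_DAILY (sid, day, cid, s.count)
SELECT sid, TRUNC(day, 'DAY'), cid, CAST(SUM(count) AS BIGINT)
FROM T1
WHERE day >= TO_DATE('2018-07-01', 'yyyy-MM-dd', 'GMT')
  AND day <  TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT')
GROUP BY sid, TRUNC(day, 'DAY'), cid;

Whole-day queries can then run against T1_DAILY and read roughly 1/24th of the rows, and the roll-up can be scheduled once per closed day.

For option 2, one caveat worth stating: if cid leaves the primary key, then (sid, day) must be unique, so an ordinary column in a family can hold only one cid per row; keeping many cids per (sid, day) would need something like Phoenix dynamic columns or a different row key. Purely as an illustration of the single-cid variant:

CREATE TABLE IF NOT EXISTS T1_V2 (
    sid BIGINT NOT NULL,
    day DATE NOT NULL,
    s.cid BIGINT,   -- cid demoted to a plain column in family "s"
    s.count BIGINT,
    CONSTRAINT PK PRIMARY KEY (sid, day)
) COMPRESSION='SNAPPY';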