"Will the partition on PRIMARY KEY ((YEAR, MONTH, DAY, HOUR)) cause any
hotspot issues on a node given the hourly data size is ~13MB?"

13MB per partition is quite small, so you should be fine. One thing to be
careful about is the memtable flush frequency and appropriate compaction
tuning, so that a single partition does not end up spanning thousands of
SSTables.
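
For the range scan in the quoted query below, the client has to name every
hour partition that the time window touches (the HOUR IN (10,11) part). A
minimal sketch of how that list of buckets could be derived on the client
side, in Python; the function name hour_buckets is hypothetical and not part
of any driver, and it assumes timestamps are handled in UTC as in the query:

```python
from datetime import datetime, timedelta, timezone


def hour_buckets(start, end):
    """Return the (year, month, day, hour) partition keys touched by [start, end].

    Hypothetical helper for illustration only; both bounds are inclusive,
    mirroring the >= / <= predicates on LOG_TS in the quoted query.
    """
    if end < start:
        raise ValueError("end must not precede start")
    # Truncate the lower bound to the start of its hour bucket.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    buckets = []
    while cursor <= end:
        buckets.append((cursor.year, cursor.month, cursor.day, cursor.hour))
        cursor += timedelta(hours=1)
    return buckets


start = datetime(2015, 11, 15, 10, 0, tzinfo=timezone.utc)
end = datetime(2015, 11, 15, 11, 0, tzinfo=timezone.utc)
print(hour_buckets(start, end))
# [(2015, 11, 15, 10), (2015, 11, 15, 11)]
```

Each tuple maps to one partition of EVENT_LOG_BY_DATE. A window that crosses
midnight yields buckets with different DAY values, so it cannot be expressed
with a single HOUR IN (...) list and would need one query per day (or per
bucket).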

On Tue, Nov 17, 2015 at 11:29 AM, Chandra Sekar KR <
chandraseka...@hotmail.com> wrote:

> Hi,
>
>
> I have a time-series table with the structure and partition
> size/volumetrics shown below. The purpose of this table is to enable
> range-based scans on log_ts and retrieve the matching log_id values, which
> are then used to look up the actual data in the main table (EVENT_LOG).
> EVENT_LOG_BY_DATE acts as a lookup (index) for the main table.
>
>
> CREATE TABLE EVENT_LOG_BY_DATE (
>   YEAR INT,
>   MONTH INT,
>   DAY INT,
>   HOUR INT,
>   LOG_TS TIMESTAMP,
>   LOG_ID VARINT,
>   PRIMARY KEY ((YEAR, MONTH, DAY, HOUR), LOG_TS))
> WITH CLUSTERING ORDER BY (LOG_TS DESC);
>
>
> SELECT LOG_TS, LOG_ID FROM EVENT_LOG_BY_DATE
>   WHERE YEAR = 2015 AND
>   MONTH = 11 AND
>   DAY = 15 AND
>   HOUR IN (10,11) AND
>   LOG_TS >= '2015-11-15 10:00:00+0000' AND
>   LOG_TS <= '2015-11-15 11:00:00+0000';
>
>
> The average daily volume for this table is ~10 million records and the
> average row size is ~40B. The partition size for an hour comes close to
> 13MB, with each partition spanning ~416K rows. Will partitioning on the
> partition key (YEAR, MONTH, DAY, HOUR) cause any hotspot issues on a node,
> given that the hourly data size is ~13MB?
>
>
> Is there an alternative way to model the above time-series table that
> enables range scans?
>
>
> Regards, Chandra KR
>
