Re: Effective partition key for time series data, which allows range queries?

2017-03-27 Thread Noorul Islam Kamal Malmiyoda
Have you looked at KairosDB schema ?

https://kairosdb.github.io/
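
For the schema question itself, a minimal CQL sketch of the composite-key
approach from your mail (column types are assumptions; I renamed the
timestamp column to ts, and the dates are illustrative):

CREATE TABLE data (
  user_id text,
  ts timestamp,
  foo int,
  bar int,
  PRIMARY KEY (user_id, ts)
);

-- Range queries on the clustering column work as you hope:
SELECT * FROM data
WHERE user_id = 'foo'
  AND ts >= '2017-02-28' AND ts <= '2017-03-28';

Note that user_id alone is still the partition key, so all of a user's
rows live in one partition; KairosDB-style schemas avoid unbounded
partitions by adding a time bucket (e.g. a week number) to the partition
key.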

Regards,
Noorul

On Tue, Mar 28, 2017 at 6:17 AM, Ali Akhtar  wrote:
> I have a use case where the data for individual users is being tracked, and
> every 15 minutes or so, the data for the past 15 minutes is inserted into
> the table.
>
> The table schema looks like:
> user id, timestamp, foo, bar, etc.
>
> Where foo, bar, etc are the items being tracked, and their values over the
> past 15 minutes.
>
> I initially planned to use the user id as the primary key of the table.
> But I realized that this may cause really wide rows: tracking for 24
> hours means 96 records inserted (1 for each 15-minute window), so over 1
> year that is about 35k records per user, over 2 years about 70k, etc.
>
> I know the limit on wide rows is in the billions of cells, but I've
> heard that the practical limit is much lower.
>
> So I considered using a composite primary key: (user, timestamp)
>
> If I'm correct, the above should make user the partition key and
> timestamp a clustering column, creating a separate row for each
> (user, timestamp) pair within the user's partition.
>
> However, will I still be able to do range queries on the timestamp, to
> e.g. return the data for the last week?
>
> E.g. select * from data where user_id = 'foo' and timestamp >= '<1 month
> ago>' and timestamp <= '' ?
>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Noorul Islam Kamal Malmiyoda
Hello Marcus,

I altered the table to set timestamp_resolution to 'MICROSECONDS'. I
waited for some time, but the sstable count did not come down. Do you
think I should run a specific command to reduce the number of sstables
after setting this?
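
For reference, the alter was along these lines (keyspace and table name
are placeholders; the other compaction options were kept as they were):

ALTER TABLE mykeyspace.mytable
  WITH compaction = {'class': 'DateTieredCompactionStrategy',
                     'timestamp_resolution': 'MICROSECONDS',
                     'max_sstable_age_days': '365',
                     'base_time_seconds': '60'};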

Thanks and Regards
Noorul


On Mon, Feb 29, 2016 at 7:22 PM, Marcus Eriksson  wrote:
> Why do you have 'timestamp_resolution': 'MILLISECONDS'? It should be
> left at the default (MICROSECONDS) unless you do "USING TIMESTAMP
> "-inserts; see
> https://issues.apache.org/jira/browse/CASSANDRA-11041
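>
> For example, an explicit-timestamp insert in microseconds looks roughly
> like this (table, columns and values are hypothetical):
>
> INSERT INTO mykeyspace.mytable (id, val)
> VALUES (1, 'x')
> USING TIMESTAMP 1456758000000000; -- microseconds since epoch
>
> timestamp_resolution should simply match the unit of the timestamps you
> actually write.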
>
> On Mon, Feb 29, 2016 at 2:36 PM, Noorul Islam K M  wrote:
>>
>>
>> Hi all,
>>
>> We are using below compaction settings for a table
>>
>> compaction = {'timestamp_resolution': 'MILLISECONDS',
>> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
>> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>>
>> But it is creating too many sstables. Currently the number of sstables
>> is 4. We have been injecting data for the last three days.
>>
>> We have set the compactionthroughput to 128 MB/s
>>
>> $ nodetool getcompactionthroughput
>>
>> Current compaction throughput: 128 MB/s
>>
>> But this is not helping.
>>
>> How can we control the number of sstables in this case?
>>
>> Thanks and Regards
>> Noorul
>
>


Re: Too many sstables with DateTieredCompactionStrategy

2016-02-29 Thread Noorul Islam Kamal Malmiyoda
Yes, we have enabled it in OpsCenter. Is that the reason?
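
(We are watching the count with something along these lines; keyspace and
table name are placeholders:)

nodetool cfstats mykeyspace.mytable | grep "SSTable count"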
On Feb 29, 2016 8:07 PM, "Dominik Keil"  wrote:

> Are you using incremental repairs? The anticompaction that follows an
> incremental repair splits sstables into repaired and unrepaired sets,
> which are compacted separately and can drive the sstable count up under
> DTCS.
>
> Am 29.02.2016 um 14:36 schrieb Noorul Islam K M:
>
>
> Hi all,
>
> We are using below compaction settings for a table
>
> compaction = {'timestamp_resolution': 'MILLISECONDS',
> 'max_sstable_age_days': '365', 'base_time_seconds': '60', 'class':
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>
> But it is creating too many sstables. Currently the number of sstables
> is 4. We have been injecting data for the last three days.
>
> We have set the compactionthroughput to 128 MB/s
>
> $ nodetool getcompactionthroughput
>
> Current compaction throughput: 128 MB/s
>
> But this is not helping.
>
> How can we control the number of sstables in this case?
>
> Thanks and Regards
> Noorul
>
>
> --
> *Dominik Keil*
> Phone: + 49 (0) 621 150 207 31
> Mobile: + 49 (0) 151 626 602 14
>
> Movilizer GmbH
> Julius-Hatry-Strasse 1
> 68163 Mannheim
> Germany
>
> movilizer.com


Re: sstableloader throughput

2016-01-11 Thread Noorul Islam Kamal Malmiyoda
On Mon, Jan 11, 2016 at 10:25 PM, Jeff Jirsa  wrote:
>
> Make sure streaming throughput isn’t throttled on the destination cluster.
>


How do I do that? Is stream_throughput_outbound_megabits_per_sec the
relevant attribute in cassandra.yaml?

I think we can set that on the fly using nodetool setstreamthroughput.

I ran

nodetool setstreamthroughput 0

on the target machine, but that doesn't improve the average throughput.
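
If we end up splitting the load across several machines, as suggested
below, the invocations would look roughly like this (host addresses and
staging paths are hypothetical):

# On loader machine 1
sstableloader -d 10.0.0.1,10.0.0.2 /staging/part1/mykeyspace/mytable

# On loader machine 2
sstableloader -d 10.0.0.1,10.0.0.2 /staging/part2/mykeyspace/mytable

Each invocation streams only the sstables in its directory, so several
running in parallel add up to higher aggregate throughput.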

Thanks and Regards
Noorul

> Stream from more machines (divide sstables between a bunch of machines, run 
> in parallel).
>
> On 1/11/16, 5:21 AM, "Noorul Islam K M"  wrote:
>
>>
>>I have a need to stream data to a new cluster using sstableloader. I
>>spawned a machine with 32 cores assuming that sstableloader scaled with
>>the number of cores, but it doesn't look like it does.
>>
>>I am getting an average throughput of 18 MB/s, which seems to be pretty
>>low (I might be wrong).
>>
>>Is there any way to increase the throughput? OpsCenter data on the target
>>cluster shows very few write requests per second.
>>
>>Thanks and Regards
>>Noorul


Re: Cassandra Java Driver

2015-12-25 Thread Noorul Islam Kamal Malmiyoda
Is DSE shipping with 3.x?

Thanks and Regards
Noorul

On Fri, Dec 25, 2015 at 9:07 PM, Alexandre Dutra wrote:
> Hi Jean,
>
> You should use 3.0.0-beta1.
>
> TL;DR
>
> DataStax Java driver series 2.2.x has been discontinued in favor of series
> 3.x; we explained why in a mail to the Java driver mailing list. We do
> not advise users to use the 2.2.x series.
>
> So the most recent driver version compatible with all versions of Cassandra,
> including 2.2 and 3.x, is now 3.0.0-beta1, although 3.0.0-rc1 will be
> released very soon.
>
> In spite of its "beta" label, version 3.0.0-beta1 has been thoroughly tested
> against all versions of Cassandra and is definitely production-ready... as
> long as the Cassandra version in use is also production-ready. Note however
> that Cassandra 2.2 and 3.0 are quite recent and most companies AFAICT do not
> consider them yet as production-ready.
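>
> For Maven users, that corresponds to roughly the following coordinates
> (from memory; double-check against Maven Central):
>
> <dependency>
>   <groupId>com.datastax.cassandra</groupId>
>   <artifactId>cassandra-driver-core</artifactId>
>   <version>3.0.0-beta1</version>
> </dependency>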
>
> Hope that helps,
>
> Alexandre
>
>
>> On Tue, Dec 22, 2015 at 4:40 PM Jean Tremblay wrote:
>>
>> Hi,
>> Which Java Driver is suited for Cassandra 2.2.x. ?
>> I see datastax 3.0.0 beta1 and datastax 2.2.0 rc3...
>> Are they suited for production?
>> Is there anything better?
>> Thanks for your comments and replies!
>> Jean
>
> --
> Alexandre Dutra
> Driver & Tools Engineer @ DataStax


Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-23 Thread Noorul Islam Kamal Malmiyoda
Is there a way to keep the writetime and ttl of each record as-is in the
new cluster?

Thanks and Regards
Noorul

On Mon, Dec 21, 2015 at 5:46 PM, DuyHai Doan  wrote:
> For cross-cluster operation with the Spark/Cassandra connector, you can look
> at this trick:
> http://www.slideshare.net/doanduyhai/fast-track-to-getting-started-with-dse-max-ing/64
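>
> In rough Scala, the trick is to scope an implicit CassandraConnector for
> the target cluster around the save. Combined with the connector's
> per-row timestamp/TTL write options (TimestampOption.perRow /
> TTLOption.perRow), a sketch with hypothetical keyspace, table and column
> names:
>
> import com.datastax.spark.connector._
> import com.datastax.spark.connector.cql.CassandraConnector
> import com.datastax.spark.connector.writer.{WriteConf, TTLOption, TimestampOption}
>
> // Read each row plus the writetime and remaining TTL of a value column.
> val rows = sc.cassandraTable("ks", "events")
>   .select("id", "ts", "value",
>           "value".writeTime as "value_writetime",
>           "value".ttl as "value_ttl")
>
> // Save to the target cluster, re-applying writetime/TTL per row.
> {
>   implicit val c = CassandraConnector(sc.getConf
>     .set("spark.cassandra.connection.host", "target-host"))
>   rows.saveToCassandra("ks", "events",
>     SomeColumns("id", "ts", "value"),
>     writeConf = WriteConf(
>       timestamp = TimestampOption.perRow("value_writetime"),
>       ttl = TTLOption.perRow("value_ttl")))
> }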
>
> On Mon, Dec 21, 2015 at 1:14 PM, George Sigletos 
> wrote:
>>
>> Roughly half a TB of data.
>>
>> There is a timestamp column in the tables we migrated and we did use that
>> to achieve incremental updates.
>>
>> I don't know anything about kairosdb, but I can see from the docs that
>> there exists a row timestamp column. Could you maybe use that one?
>>
>> Kind regards,
>> George
>>
>> On Mon, Dec 21, 2015 at 12:53 PM, Noorul Islam K M wrote:
>>>
>>> George Sigletos  writes:
>>>
>>> > Hello,
>>> >
>>> > We had a similar problem where we needed to migrate data from one
>>> > cluster
>>> > to another.
>>> >
>>> > We ended up using Spark to accomplish this. It is fast and reliable but
>>> > some downtime was required after all.
>>> >
>>> > We minimized the downtime by doing a first run and then running
>>> > incremental updates.
>>> >
>>>
>>> How much data are you talking about?
>>>
>>> How did you achieve the incremental run? We are using kairosdb, and
>>> some of the other schemas do not have a way to filter based on date.
>>>
>>> Thanks and Regards
>>> Noorul
>>>
>>> > Kind regards,
>>> > George
>>> >
>>> >
>>> >
>>> > On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M wrote:
>>> >
>>> >>
>>> >> Hello all,
>>> >>
>>> >> We have two clusters X and Y with the same keyspaces but distinct
>>> >> data sets. We are planning to merge these into a single cluster.
>>> >> What would be the ideal steps to achieve this without downtime for
>>> >> applications? We have a time-series data stream continuously
>>> >> writing to Cassandra.
>>> >>
>>> >> We have ruled out export/import, as that would make us lose data
>>> >> written during the copy.
>>> >>
>>> >> We also ruled out sstableloader, as it is not reliable: it fails
>>> >> often and there is no way to resume from where it failed.
>>> >>
>>> >> Any suggestions will help.
>>> >>
>>> >> Thanks and Regards
>>> >> Noorul
>>> >>
>>
>>
>