Re: Max number of windows when using TWCS

2019-02-11 Thread Akash Gangil
I have in the past tried to delete SSTables manually, but I have noticed that
bits and pieces of that data still remain, even though the sstables for that
window are deleted. So I've always wondered whether playing directly with the
underlying filesystem is a safe bet.
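
For reference, this is roughly the kind of cleanup script I've been running,
which is why I'm asking (a rough, untested Python sketch: the data path, the
retention threshold, the use of file mtime as a proxy for the TWCS window age,
and the assumption that the node is stopped first are all my own choices, not
from any docs):

#!/usr/bin/env python3
# Rough sketch: remove whole expired SSTables while Cassandra is STOPPED.
# Assumes the standard <data_dir>/<keyspace>/<table-uuid>/ layout and that
# overlapping windows have been ruled out (see the CASSANDRA-13418 discussion).
import os
import time

DATA_DIR = "/var/lib/cassandra/data/my_keyspace"  # hypothetical path
MAX_AGE_DAYS = 90                                 # hypothetical retention
cutoff = time.time() - MAX_AGE_DAYS * 86400

for table_dir in os.listdir(DATA_DIR):
    table_path = os.path.join(DATA_DIR, table_dir)
    if not os.path.isdir(table_path):
        continue
    # Group SSTable component files (Data, Index, Summary, ...) by generation
    # prefix so each SSTable is removed completely or not at all.
    by_prefix = {}
    for fname in os.listdir(table_path):
        if "-" not in fname:
            continue  # skips directories like backups/ and snapshots/
        prefix = fname.rsplit("-", 1)[0]  # e.g. "md-42-big"
        by_prefix.setdefault(prefix, []).append(os.path.join(table_path, fname))
    for prefix, files in by_prefix.items():
        newest = max(os.path.getmtime(f) for f in files)
        if newest < cutoff:
            print(f"would delete {prefix} ({len(files)} files)")
            # for f in files: os.remove(f)  # deliberately left commented out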


On Mon, Feb 11, 2019 at 1:01 PM Jonathan Haddad  wrote:

> Deleting SSTables manually can be useful if you don't know your TTL up
> front.  For example, you have an ETL process that moves your raw Cassandra
> data into S3 as parquet files, and you want to be sure that process is
> completed before you delete the data.  You could also start out without
> setting a TTL and later realize you need one.  This is a remarkably common
> problem.
>
> On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth 
> wrote:
>
>> Jeff,
>>
>> It means we have to delete sstables manually?
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa  wrote:
>>
>> There's a bit of a headache around whether overlapping sstables are strictly
>> safe to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was
>> added to allow the "I know it's not technically safe, but just delete it
>> anyway" use case. For a lot of people who started using TWCS before 13418,
>> "stop cassandra, remove stuff we know is expired, start cassandra" is a
>> not-uncommon pattern in very high-write, high-disk-space use cases.
>>
>>
>>
>> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth 
>> wrote:
>>
>>> Hi,
>>> In regards to the comment “Purging data is also straightforward: just drop
>>> SSTables (by a script) whose create date is older than a threshold; we don't
>>> even need to rely on TTL”
>>>
>>> Don’t the old sstables drop by themselves? Once the TTL and gc_grace_seconds
>>> have passed, the whole sstable will contain only tombstones.
>>>
>>>
>>> Regards,
>>>
>>> Nitan
>>>
>>> Cell: 510 449 9629
>>>
>>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>>>
>>> Purging data is also straightforward: just drop SSTables (by a script) whose
>>> create date is older than a threshold; we don't even need to rely on TTL
>>>
>>>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


-- 
Akash


Modeling Time Series data

2019-01-11 Thread Akash Gangil
Hi,

I have a data model where the partition key for a lot of tables is based on
time: (year, month, day, hour).

Would this create a hotspot in my cluster, given that all the writes/reads
would go to the same node for a given hour? Or does the Cassandra storage
engine also take into account table info, like the table name, when
distributing the data?

If the above model would be a problem, what's the suggested way to solve it?
Adding the table name to the partition key?
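
Or is the usual answer to add an artificial bucket to the key instead? A rough
sketch of what I have in mind (pure speculation on my side: the source_id
column, the bucket count, and the key layout are all made up):

import hashlib

NUM_BUCKETS = 16  # hypothetical; would be sized to write volume and cluster size

def partition_key(source_id: str, year: int, month: int, day: int, hour: int):
    # Derive a deterministic bucket from the entity id so one hour's writes
    # fan out across NUM_BUCKETS partitions instead of a single hot one.
    bucket = int(hashlib.md5(source_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return (year, month, day, hour, bucket)

# The trade-off: reading a whole hour means querying every bucket and merging.
keys_for_hour = [(2019, 1, 11, 10, b) for b in range(NUM_BUCKETS)]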


-- 
Akash


Re: Reaper 1.2 released

2018-07-31 Thread Akash Gangil
Hi,

I see that when I try to access the /snapshot/{clusterName} API endpoint, I get
a 404, while all the other endpoints (/cluster, /repair_run, and
/repair_scheduler) work for me. I am using version 1.2.1.
It appears as if the /snapshot endpoint is not there.

The API doc I am referring to: http://cassandra-reaper.io/docs/api/

I double-checked the docs, and this just seems to be a weird error.
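
For what it's worth, this is roughly how I'm probing the endpoints (a quick
Python sketch; the host, port, and cluster name are placeholders on my side):

import requests

REAPER = "http://localhost:8080"  # placeholder host/port for my Reaper instance

for path in ("/cluster", "/repair_run", "/repair_scheduler", "/snapshot/my-cluster"):
    resp = requests.get(REAPER + path)
    print(path, resp.status_code)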



On Wed, Jul 25, 2018 at 3:09 PM, Mick Semb Wever  wrote:

>
> Feel free to file issues at https://github.com/thelastpickle/cassandra-reaper/issues
> or chat with us at https://gitter.im/thelastpickle/cassandra-reaper
>
> regards,
> Mick
>
>
>
> On Thu, 26 Jul 2018, at 06:18, Abdul Patel wrote:
> > Was able to start it, but I'm unable to start any repair manually; it says
> > POST /repair_run conflicts with an existing one for the cluster name.
> >
> > On Wednesday, July 25, 2018, Abdul Patel  wrote:
> >
> > > Ignore that; alter and create permissions were missing. Will message if I
> > > actually see a showstopper.
> > >
> > > On Wednesday, July 25, 2018, Abdul Patel  wrote:
> > >
> > >> I am trying to upgrade to version 1.2.2 of Reaper. The instance isn't
> > >> starting and gives an error that it is unable to create the table snapshot.
> > >> Do we need to create it under reaper-db?
> > >>
> > >> On Wednesday, July 25, 2018, Steinmaurer, Thomas <
> > >> thomas.steinmau...@dynatrace.com> wrote:
> > >>
> > >>> Jon,
> > >>>
> > >>>
> > >>>
> > >>> Eager to try it out. Just FYI, I followed the installation
> > >>> instructions on http://cassandra-reaper.io/docs/download/install/
> > >>> (Debian-based).
> > >>>
> > >>>
> > >>>
> > >>> 1) Importing the key results in:
> > >>>
> > >>>
> > >>>
> > >>> XXX:~$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 2895100917357435
> > >>>
> > >>> Executing: /tmp/tmp.tP0KAKG6iT/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-keys 2895100917357435
> > >>>
> > >>> gpg: requesting key 17357435 from hkp server keyserver.ubuntu.com
> > >>>
> > >>> ?: [fd 4]: read error: Connection reset by peer
> > >>>
> > >>> gpgkeys: HTTP fetch error 7: couldn't connect: eof
> > >>>
> > >>> gpg: no valid OpenPGP data found.
> > >>>
> > >>> gpg: Total number processed: 0
> > >>>
> > >>> gpg: keyserver communications error: keyserver unreachable
> > >>>
> > >>> gpg: keyserver communications error: public key not found
> > >>>
> > >>> gpg: keyserver receive failed: public key not found
> > >>>
> > >>>
> > >>>
> > >>> I had to change the keyserver URL; then the import worked:
> > >>>
> > >>>
> > >>>
> > >>> XXX:~$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 2895100917357435
> > >>>
> > >>> Executing: /tmp/tmp.JwPNeUkm6x/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 2895100917357435
> > >>>
> > >>> gpg: requesting key 17357435 from hkp server keyserver.ubuntu.com
> > >>>
> > >>> gpg: key 17357435: public key "TLP Reaper packages <rea...@thelastpickle.com>" imported
> > >>>
> > >>> gpg: Total number processed: 1
> > >>>
> > >>> gpg:   imported: 1  (RSA: 1)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> 2) Running apt-get update fails with:
> > >>>
> > >>>
> > >>>
> > >>> XXX:~$ sudo apt-get update
> > >>>
> > >>> Ign:1 https://dl.bintray.com/thelastpickle/reaper-deb wheezy InRelease
> > >>>
> > >>> Ign:2 https://dl.bintray.com/thelastpickle/reaper-deb wheezy Release
> > >>>
> > >>> Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
> > >>>
> > >>> Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
> > >>>
> > >>> Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
> > >>>
> > >>> Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en_US
> > >>>
> > >>> Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
> > >>>
> > >>> Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
> > >>>
> > >>> Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
> > >>>
> > >>> Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
> > >>>
> > >>> Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en_US
> > >>>
> > >>> Ign:7 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en
> > >>>
> > >>> Ign:3 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main amd64 Packages
> > >>>
> > >>> Ign:4 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main i386 Packages
> > >>>
> > >>> Ign:5 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main all Packages
> > >>>
> > >>> Ign:6 https://dl.bintray.com/thelastpickle/reaper-deb wheezy/main Translation-en_US
> 

Re: Materialized Views and TTLs

2018-02-23 Thread Akash Gangil
Hi Valentina,

In that case, are there any well-defined ways to do downsampling of data in
C*?
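
For example, is a periodic rollup job along these lines the usual approach, or
is there something more built in? (A rough sketch with the DataStax Python
driver; the keyspace, the metrics_by_minute/metrics_by_month tables, and the
sum/count aggregation are placeholders I made up.)

from collections import defaultdict
from cassandra.cluster import Cluster

ROLLUP_TTL = 30 * 24 * 3600  # keep monthly rollups around longer than the raw data

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

insert = session.prepare(
    "INSERT INTO metrics_by_month (metric, month, value_sum, sample_count) "
    "VALUES (?, ?, ?, ?) USING TTL " + str(ROLLUP_TTL)
)

# Aggregate one metric's minute-level rows into per-month sums and counts.
sums, counts = defaultdict(float), defaultdict(int)
for row in session.execute(
    "SELECT ts, value FROM metrics_by_minute WHERE metric = %s", ("cpu",)
):
    month = f"{row.ts.year:04d}-{row.ts.month:02d}"  # ts assumed to be a timestamp
    sums[month] += row.value
    counts[month] += 1

for month, total in sums.items():
    session.execute(insert, ("cpu", month, total, counts[month]))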

thanks!


On Fri, Feb 23, 2018 at 11:36 AM, Valentina Crisan <
valentina.cri...@gmail.com> wrote:

> Hello,
>
> as far as I know, it is not intended for MVs to have a different TTL than
> the base tables. There was a patch released at some point to disallow setting
> a TTL on an MV (https://issues.apache.org/jira/browse/CASSANDRA-12868).
> MVs should inherit the TTL of the base table.
>
> Valentina
>
> On Fri, Feb 23, 2018 at 6:42 PM, Akash Gangil <akashg1...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I had a couple of questions:
>>
>> 1. Can I create a materialized view on a table with a TTL longer than the
>> base table? For ex: my materialized view TTL is 1 month while my base table
>> TTL is 1 week.
>>
>> 2. In the above scenario, since the data in my base table would be gone
>> after a week, would it impact data in the materialized view?
>>
>> My use case is that I have some time series data, which is stored in the base
>> table by_minute, and I want to downsample it to by_month. So my base table
>> stores by_minute data but my materialized view stores by_week data.
>>
>> thanks!
>>
>> --
>> Akash
>>
>
>


-- 
Akash


Materialized Views and TTLs

2018-02-23 Thread Akash Gangil
Hi,

I had a couple of questions:

1. Can I create a materialized view on a table with a TTL longer than the
base table? For ex: my materialized view TTL is 1 month while my base table
TTL is 1 week.

2. In the above scenario, since the data in my base table would be gone
after a week, would it impact data in the materialized view?

My use case is that I have some time series data, which is stored in the base
table by_minute, and I want to downsample it to by_month. So my base table
stores by_minute data but my materialized view stores by_week data.

thanks!

-- 
Akash


Re: Secondary Indexes C* 3.0

2018-02-22 Thread Akash Gangil
To provide more context, I was going through this:
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#useWhenIndex__highCardCol
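
Concretely, the kind of setup I'm asking about looks like this (a throwaway
sketch using the DataStax Python driver; the users table, its columns, and the
index are just for illustration):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# Hypothetical table: partition key is user_id, secondary index on country.
session.execute("""
    CREATE TABLE IF NOT EXISTS users (
        user_id uuid PRIMARY KEY,
        country text,
        email   text
    )
""")
session.execute("CREATE INDEX IF NOT EXISTS users_country_idx ON users (country)")

# Secondary indexes are local to each node, so a query like this (no partition
# key in the WHERE clause) has to fan out across the cluster and merge results.
rows = session.execute("SELECT user_id FROM users WHERE country = %s", ("NO",))

# A high-cardinality column such as email is the case I'm unsure about: each
# index value matches very few rows, so the fan-out does a lot of work per hit.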

On Thu, Feb 22, 2018 at 9:35 AM, Akash Gangil <akashg1...@gmail.com> wrote:

> Hi,
>
> I was wondering if there are recommendations around the cardinality of
> secondary indexes.
>
> As I understand it, an index on a column with many distinct values will be
> inefficient. Is it because the index would only direct me to the specific
> sstable, but then it sequentially searches for the target records? So a
> wide range of index values could lead to a lot of sstables to traverse?
>
> Though what's unclear is what the recommended (or benchmarked?) limit is: must
> the index have 100 distinct values, or can it have up to 1000 or
> 5 distinct values?
>
> thanks!
>
>
>
>
> --
> Akash
>



-- 
Akash


Secondary Indexes C* 3.0

2018-02-22 Thread Akash Gangil
Hi,

I was wondering if there are recommendations around the cardinality of
secondary indexes.

As I understand it, an index on a column with many distinct values will be
inefficient. Is it because the index would only direct me to the specific
sstable, but then it sequentially searches for the target records? So a
wide range of index values could lead to a lot of sstables to traverse?

Though what's unclear is what the recommended (or benchmarked?) limit is: must
the index have 100 distinct values, or can it have up to 1000 or
5 distinct values?

thanks!




-- 
Akash


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Akash Gangil
I would second Jon in the arguments he made. Contributing outside of work is
draining and really requires a lot of commitment. If someone requires
features around usability, etc., just pay for it, period.

On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman <
kenbrot...@yahoo.com.invalid> wrote:

> Jon,
>
> Very sorry that you don't see the value of the time I'm taking for this.
> I don't have demands; I do have a stern warning, and I'm right, Jon.  Please
> be very careful not to mischaracterize my words, Jon.
>
> You suggest I put things in JIRAs, then seem to suggest that I'd be lucky
> if anyone looked at them and did anything. That's what I figured too.
>
> I don't appreciate the hostility.  You will understand more fully in the
> next post where I'm coming from.  Try to keep the conversation civilized.
> I'm trying, or at least so you understand, I think what I'm doing is saving
> your gig and mine.  I really like a lot of people in this group.
>
> I've come to a preliminary assessment on things.  Soon the cloud will
> clear or I'll be gone.  Don't worry.  I'm a very peaceful person, and like
> you I am driven by real important projects that I feel compelled to work on
> for the good of others.  I don't have time for people to hand-hold a
> database, and I can't get stuck with my projects on the wrong stuff.
>
> Kenneth Brotman
>
>
> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon
> Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: user@cassandra.apache.org
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Ken,
>
> Maybe it’s not clear how open source projects work, so let me try to
> explain.  There’s a bunch of us who either get paid by someone or volunteer
> in our free time.  The folks that get paid (yay!) usually take direction
> on what the priorities are, and work on projects that directly affect our
> jobs.  That means that someone needs to care enough about the features you
> want to work on them, if you’re not going to do it yourself.
>
> Now, as others have said already, please put your list of demands in JIRA;
> if someone is interested, they will work on it.  You may need to contribute
> a little more than you’ve done already, and be prepared to get involved if you
> actually want to see something get done.  Perhaps learning a little more
> about Cassandra’s internals and the people involved will reveal some of the
> design decisions and priorities of the project.
>
> Third, you seem to be a little obsessed with market share.  While market
> share is fun to talk about, *most* of us that are working on and
> contributing to Cassandra do so because it does actually solve a problem we
> have, and solves it reasonably well.  If some magic open source DB appears
> out of nowhere and does everything you want Cassandra to, and is bug free,
> keeps your data consistent, automatically does backups, comes with really
> nice cert management, ad hoc querying, amazing materialized views that are
> perfect, no caveats to secondary indexes, and somehow still gives you
> linear scalability without any mental overhead whatsoever then sure, people
> might start using it.  And that’s actually OK, because if that happens
> we’ll all be incredibly pumped out of our minds because we won’t have to
> work as hard.  If on the slim chance that doesn’t manifest, those of us
> that use Cassandra and are part of the community will keep working on the
> things we care about, iterating, and improving things.  Maybe someone will
> even take a look at your JIRA issues.
>
> Further filling the mailing list with your grievances will likely not help
> you progress towards your goal of a Cassandra that’s easier to use, so I
> encourage you to try to be a little more productive and try to help rather
> than just complain, which is not constructive.  I did a quick search for
> your name on the mailing list, and I’ve seen very little from you, so to
> everyone who’s been around for a while and is trying to help you, it looks
> like you’re just some random dude asking for people to work for free on the
> things you’re asking for, without offering anything back in return.
>
> Jon
>
>
> > On Feb 21, 2018, at 11:56 AM, Kenneth Brotman  wrote:
> >
> > Josh,
> >
> > To say nothing is indifference.  If you care about your community,
> > sometimes don't you have to bring up a subject even though you know it's
> > also temporarily adding some discomfort?
> >
> > As to opening a JIRA, I've got a very specific topic to try in mind
> > now.  An easy one I'll work on and then announce.  Someone else will have
> > to do the coding.  A year from now I would probably just knock it out to
> > make sure it's as easy as I expect it to be, but to be honest, as I've been
> > saying, I'm not set up to do that right now.  I've barely looked at any
> > Cassandra code, for one; everyone on this list probably codes more than I
> > do, secondly; and