Re: backup/restore cassandra data

2018-03-07 Thread Ben Slater
You should be able to follow the same approach(s) as restoring from a
backup as outlined here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html#ops_backup_snapshot_restore_t
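
In practice, the copy-back variant of that procedure looks roughly like this
(a minimal sketch, assuming default package paths and a hypothetical table
myks.mytable; adjust the names, paths and table ID to your cluster):

    # on the fresh node, with Cassandra stopped
    sudo service cassandra stop
    # copy the rescued SSTables into the matching table directory
    cp /rescued/myks/mytable-<table-id>/*.db \
       /var/lib/cassandra/data/myks/mytable-<table-id>/
    sudo chown -R cassandra:cassandra /var/lib/cassandra/data/myks
    # start the node; the copied SSTables are opened at startup
    # (on a running node, 'nodetool refresh myks mytable' does the same)
    sudo service cassandra start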

Cheers
Ben

On Thu, 8 Mar 2018 at 17:07 onmstester onmstester 
wrote:

> Would it be possible to copy/paste the Cassandra data directory from one of
> the nodes (whose OS partition is corrupted) and use it in a fresh Cassandra
> node? I've used RF=1, so that's my only chance!
>
> Sent using Zoho Mail 
>
>
> --


*Ben Slater*

*Chief Product Officer *

Read our latest technical blog posts here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


backup/restore cassandra data

2018-03-07 Thread onmstester onmstester
Would it be possible to copy/paste the Cassandra data directory from one of the
nodes (whose OS partition is corrupted) and use it in a fresh Cassandra node?
I've used RF=1, so that's my only chance!



Sent using Zoho Mail







Re: system.size_estimates - safe to remove sstables?

2018-03-07 Thread Kunal Gangakhedkar
Thanks a lot, Chris.

Will try it today/tomorrow and update here.

Thanks,
Kunal
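
(For anyone finding this later: Chris's suggestion below boils down to
something like the following, a hedged sketch assuming default data paths.
system.size_estimates is repopulated automatically, so its SSTables can be
deleted while the node is down.)

    # the node is already down in this case
    rm -f /var/lib/cassandra/data/system/size_estimates-*/*
    sudo service cassandra start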

On 7 March 2018 at 00:25, Chris Lohfink  wrote:

> While it's off you can delete the files in the directory, yeah
>
> Chris
>
>
> On Mar 6, 2018, at 2:35 AM, Kunal Gangakhedkar 
> wrote:
>
> Hi Chris,
>
> I checked for snapshots and backups - none found.
> Also, we're not using opscenter, hadoop or spark or any such tool.
>
> So, do you think we can just remove the cf and restart the service?
>
> Thanks,
> Kunal
>
> On 5 March 2018 at 21:52, Chris Lohfink  wrote:
>
>> Any chance space used by snapshots? What files exist there that are
>> taking up space?
>>
>> > On Mar 5, 2018, at 1:02 AM, Kunal Gangakhedkar 
>> wrote:
>> >
>> > Hi all,
>> >
>> > I have a 2-node cluster running cassandra 2.1.18.
>> > One of the nodes has run out of disk space and died - almost all of it
>> shows up as occupied by size_estimates CF.
>> > Out of 296GiB, 288GiB shows up as consumed by size_estimates in 'du
>> -sh' output.
>> >
>> > This is while the other node is chugging along - shows only 25MiB
>> consumed by size_estimates (du -sh output).
>> >
>> > Any idea why there is this discrepancy?
>> > Is it safe to remove the size_estimates sstables from the affected node
>> and restart the service?
>> >
>> > Thanks,
>> > Kunal
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
>


Re: One time major deletion/purge vs periodic deletion

2018-03-07 Thread kurt greaves
The important point to consider is whether you are deleting old data or
recently written data. How old/recent depends on your write rate to the
cluster and there's no real formula. Basically you want to avoid deleting a
lot of old data all at once because the tombstones will end up in new
SSTables and the data to be deleted will live in higher levels (LCS) or
large SSTables (STCS), which won't get compacted together for a long time.
In this case it makes no difference if you do a big purge or if you break
it up, because at the end of the day if your big purge is just old data,
all the tombstones will have to stick around for a while until they make it
to the higher levels/bigger SSTables.

If you have to purge large amounts of old data, the easiest way is to 1.
Make sure you have at least 50% disk free (for large/major compactions)
and/or 2. Use garbagecollect compactions (3.10+)
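
For option 2 the command is a one-liner per table (a sketch; the keyspace and
table names are placeholders):

    # rewrites each SSTable individually, dropping data that is shadowed
    # by tombstones (available from Cassandra 3.10)
    nodetool garbagecollect myks mytable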


Re: Batch too large exception

2018-03-07 Thread Goutham reddy
Marek,
Sorry for the late reply, and thanks for the insight: I was unknowingly using
batch inserts through Spring Data Cassandra's repository.save, which I was
calling with a list of objects in one go. Cassandra treats that as a batch
insert and aborts it because of the size limit, with a write timeout exception.
I have now changed the logic to insert each partition separately, so the
partition keys are distributed across all the coordinators in the cluster,
whereas in a batch the whole set of inserts is routed through a single
coordinator node. I hope this helps somebody else avoid the mistake of
inserting a list of objects.

http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html?m=1

Above site explained clearly how to perform huge writes into Cassandra.
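
To make the distinction concrete, here is a hedged cqlsh sketch (the keyspace,
table and values are invented). A batch is only cheap when every statement
targets the same partition; multi-partition batches are funnelled through a
single coordinator and count against the batch size thresholds:

    cqlsh <<'EOF'
    -- fine: both rows share the partition key sensor_id = 42
    BEGIN UNLOGGED BATCH
      INSERT INTO myks.readings (sensor_id, ts, value) VALUES (42, '2018-03-07 00:00:00', 1.0);
      INSERT INTO myks.readings (sensor_id, ts, value) VALUES (42, '2018-03-07 00:01:00', 2.0);
    APPLY BATCH;
    -- prefer individual (ideally asynchronous) inserts for rows like
    -- these, which hash to different partitions
    INSERT INTO myks.readings (sensor_id, ts, value) VALUES (43, '2018-03-07 00:00:00', 7.5);
    INSERT INTO myks.readings (sensor_id, ts, value) VALUES (44, '2018-03-07 00:00:00', 0.2);
    EOF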

Thanks and Regards,
Goutham Reddy Aenugu.

On Wed, Feb 28, 2018 at 5:05 AM Marek Kadek -T (mkadek - CONSOL PARTNERS
LTD at Cisco)  wrote:

> Hi,
>
>
>
> Are you writing the batch to the same partition? If not, there is a much
> stricter limit (I think 50Kb).
>
> Check https://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html,
> and followups.
>
>
>
> *From: *Goutham reddy 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Tuesday, February 27, 2018 at 9:55 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Batch too large exception
>
>
>
> Hi,
>
> I have been getting a batch-too-large exception when performing WRITEs from
> the client application. My insert size is 5MB, so I have to split it into 10
> insert objects written in one go. It saves some of the inserts and then
> closes after some indeterminate time. And it is a wide-column table; we have
> 113 columns. Can anyone kindly suggest what is going wrong in my execution?
> Appreciate your help.
>
>
> Regards
>
> Goutham Reddy
>
>
>
-- 
Regards
Goutham Reddy


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Carlos Rolo
Great explanation, thanks Jeff!

On 7 Mar 2018 17:49, "Javier Pareja"  wrote:

> Thank you for your time Jeff, very helpful. I couldn't find anything out
> there about the subject and I suspected that this could be the case.
>
> Regarding the clustering key in this case:
> Back in the RDBMS world, you will always assign a sequential (or as
> sequential as possible) clustering key to a table to minimize fragmentation
> and increase the speed of the insertions. In the Cassandra world, does the
> same apply to the clustering key? For example, is it a good idea to assign
> a UUID to a clustering key, or would a timestamp be a better choice? I am
> thinking that partitions need to keep some sort of binary index for the
> clustering keys and for relatively large partitions it can be relatively
> expensive to maintain.
>
> F Javier Pareja
>
> On Wed, Mar 7, 2018 at 5:20 PM, Jeff Jirsa  wrote:
>
>>
>>
>> On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo  wrote:
>>
>>> Hi Jeff,
>>>
>>> Could you expand: "Tables without clustering keys are often deceptively
>>> expensive to compact, as a lot of work (relative to the other cell
>>> boundaries) happens on partition boundaries." This is something I didn't
>>> know and highly interesting to know more about!
>>>
>>>
>>>
>> We do a lot "by partition". We build column indexes by partition. We
>> update the partition index on each partition. We invalidate key cache by
>> partition. They're not super expensive, but they take time, and tables with
>> tiny partitions can actually be slower to compact.
>>
>> There's no magic cutoff where it does/doesn't make sense; my comment is
>> mostly a warning that the edges of the "normal" use cases tend to be less
>> optimized than the common case. A table with a hundred billion
>> records, where the key is numeric and the value is a single byte (let's say
>> you're keeping track of whether or not a specific sensor has ever detected
>> some magic event, and you have 100B sensors), will be close to
>> the worst-case example of this behavior.
>>
>
>


Seeking Cassandra speakers for the upcoming Distributed Data Day conference

2018-03-07 Thread Lynn Bender
Friends of Cassandra,

We are currently seeking speakers for the upcoming Distributed Data Day
conference in September.

http://distributeddataday.com

The conference will be held in San Francisco at Mission Bay Conference
Center -- which was the site of the first and second Cassandra Summits.

Both *Jonathan Ellis, Cofounder of DataStax*, and *Nate McCall, Apache
Cassandra PMC Chair*, have already confirmed as speakers.

Details on submitting a talk can be found on the proposals page:

http://distributeddataday.com/2018-sf/proposals

Kind regards,

Lynn Bender
Global Data Geeks
https://www.linkedin.com/in/lynnbender


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you for your time Jeff, very helpful. I couldn't find anything out
there about the subject and I suspected that this could be the case.

Regarding the clustering key in this case:
Back in the RDBMS world, you will always assign a sequential (or as
sequential as possible) clustering key to a table to minimize fragmentation
and increase the speed of the insertions. In the Cassandra world, does the
same apply to the clustering key? For example, is it a good idea to assign
a UUID to a clustering key, or would a timestamp be a better choice? I am
thinking that partitions need to keep some sort of binary index for the
clustering keys and for relatively large partitions it can be relatively
expensive to maintain.

F Javier Pareja

On Wed, Mar 7, 2018 at 5:20 PM, Jeff Jirsa  wrote:

>
>
> On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo  wrote:
>
>> Hi Jeff,
>>
>> Could you expand: "Tables without clustering keys are often deceptively
>> expensive to compact, as a lot of work (relative to the other cell
>> boundaries) happens on partition boundaries." This is something I didn't
>> know and highly interesting to know more about!
>>
>>
>>
> We do a lot "by partition". We build column indexes by partition. We
> update the partition index on each partition. We invalidate key cache by
> partition. They're not super expensive, but they take time, and tables with
> tiny partitions can actually be slower to compact.
>
> There's no magic cutoff where it does/doesn't make sense; my comment is
> mostly a warning that the edges of the "normal" use cases tend to be less
> optimized than the common case. A table with a hundred billion
> records, where the key is numeric and the value is a single byte (let's say
> you're keeping track of whether or not a specific sensor has ever detected
> some magic event, and you have 100B sensors), will be close to
> the worst-case example of this behavior.
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Jeff Jirsa
On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo  wrote:

> Hi Jeff,
>
> Could you expand: "Tables without clustering keys are often deceptively
> expensive to compact, as a lot of work (relative to the other cell
> boundaries) happens on partition boundaries." This is something I didn't
> know and highly interesting to know more about!
>
>
>
We do a lot "by partition". We build column indexes by partition. We update
the partition index on each partition. We invalidate key cache by
partition. They're not super expensive, but they take time, and tables with
tiny partitions can actually be slower to compact.

There's no magic cutoff where it does/doesn't make sense; my comment is
mostly a warning that the edges of the "normal" use cases tend to be less
optimized than the common case. A table with a hundred billion
records, where the key is numeric and the value is a single byte (let's say
you're keeping track of whether or not a specific sensor has ever detected
some magic event, and you have 100B sensors), will be close to
the worst-case example of this behavior.


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you Jeff,

So, if I understood your email correctly, there is no restriction but I
should be using clustering for performance reasons.
I am expecting to store 10B rows per year in this table and each row will
have a user defined type with an approx size of 1500 bytes.
The access to the data in this table will be random as it stores the "raw"
data. There will be other tables with processed data organized by time in a
clustering column to access in sequence for the system.
Each row represents an event and it has a UUID which I am planning to use
as the partition key. Should I find another field for the partition and use
the UUID for the clustering instead?
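
(One common shape for that, as a hedged sketch with invented table and column
names: bucket events by a coarse time window and keep the UUID as the
clustering column. The trade-off is that point lookups then need the bucket
as well as the UUID, so if access really is by UUID alone, one row per
partition remains a reasonable model.)

    cqlsh -e "
      CREATE TABLE myks.events (
        bucket   text,   -- e.g. the arrival day, '2018-03-07'
        event_id uuid,
        payload  blob,   -- stand-in for the ~1500-byte UDT
        PRIMARY KEY ((bucket), event_id)
      );"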


F Javier Pareja

On Wed, Mar 7, 2018 at 2:36 PM, Jeff Jirsa  wrote:

> There is no limit
>
> The token range of murmur3 is 2^64, but Cassandra properly handles token
> overlaps (we use a key that’s effectively a tuple of the token/hash and the
> underlying key itself), so having more than 2^64 partitions won’t hurt
> anything in theory
>
> That said, having that many partitions would be an incredibly huge data
> set, and unless modeled properly, would be very likely to be unwieldy in
> practice.
>
> Tables without clustering keys are often deceptively expensive to compact,
> as a lot of work (relative to the other cell boundaries) happens on
> partition boundaries.
>
> --
> Jeff Jirsa
>
>
> > On Mar 7, 2018, at 3:06 AM, Javier Pareja 
> wrote:
> >
> > Hello all,
> >
> > I have been trying to find an answer to the following but I have had no
> luck so far:
> > Is there any limit to the number of partitions that a table can have?
> > Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
> >
> > Regards,
> >
> > F Javier Pareja
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Carlos Rolo
Hi Jeff,

Could you expand: "Tables without clustering keys are often deceptively
expensive to compact, as a lot of work (relative to the other cell
boundaries) happens on partition boundaries." This is something I didn't
know and highly interesting to know more about!

--
Carlos Rolo

On Wed, Mar 7, 2018 at 2:36 PM, Jeff Jirsa  wrote:

> There is no limit
>
> The token range of murmur3 is 2^64, but Cassandra properly handles token
> overlaps (we use a key that’s effectively a tuple of the token/hash and the
> underlying key itself), so having more than 2^64 partitions won’t hurt
> anything in theory
>
> That said, having that many partitions would be an incredibly huge data
> set, and unless modeled properly, would be very likely to be unwieldy in
> practice.
>
> Tables without clustering keys are often deceptively expensive to compact,
> as a lot of work (relative to the other cell boundaries) happens on
> partition boundaries.
>
> --
> Jeff Jirsa
>
>
> > On Mar 7, 2018, at 3:06 AM, Javier Pareja 
> wrote:
> >
> > Hello all,
> >
> > I have been trying to find an answer to the following but I have had no
> luck so far:
> > Is there any limit to the number of partitions that a table can have?
> > Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
> >
> > Regards,
> >
> > F Javier Pareja
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>



Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Jeff Jirsa
There is no limit

The token range of murmur3 is 2^64, but Cassandra properly handles token 
overlaps (we use a key that’s effectively a tuple of the token/hash and the 
underlying key itself), so having more than 2^64 partitions won’t hurt anything 
in theory
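
(As an aside, the token a given partition key hashes to can be inspected
directly from CQL, e.g. with invented names:)

    cqlsh -e "SELECT id, token(id) FROM myks.mytable LIMIT 3;"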

That said, having that many partitions would be an incredibly huge data set, 
and unless modeled properly, would be very likely to be unwieldy in practice.

Tables without clustering keys are often deceptively expensive to compact, as a 
lot of work (relative to the other cell boundaries) happens on partition 
boundaries.

-- 
Jeff Jirsa


> On Mar 7, 2018, at 3:06 AM, Javier Pareja  wrote:
> 
> Hello all,
> 
> I have been trying to find an answer to the following but I have had no luck 
> so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a 
> recommended limit on the number of values that this partition key can have? 
> Is it recommended to have a clustering key to reduce this number by storing 
> several rows in each partition instead of one row per partition?
> 
> Regards,
> 
> F Javier Pareja

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Seed nodes of DC2 creating own versions of system keyspaces

2018-03-07 Thread Oleksandr Shulgin
On Tue, Mar 6, 2018 at 8:28 PM, Jeff Jirsa  wrote:

>
> Sorry, I wasn't as precise as I should have been:
>
> In 3.0 and newer, a bootstrapping node will wait until it has schema
> before it bootstraps. HOWEVER, we create the system_auth, system_distributed,
> etc. keyspaces as a node starts up, before it requests the schema from the
> rest of the cluster.
>
> You will see some schema exchanges go through the cluster as new 3.0 nodes
> come online, but it's a no-op schema change.
>

Well, this I also see from the code, but it doesn't answer the question of
"why". :)

Is this again because of the very first seed node corner case?  Will it
hang indefinitely waiting for schema from other nodes if it tried?

--
Alex


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you Rahul, but is it a good practice to use a large range here? Or
would it be better to create partitions with more than 1 row (by using a
clustering key)?
From a data query point of view I will be accessing the rows by a UUID, one
at a time.

F Javier Pareja

On Wed, Mar 7, 2018 at 11:12 AM, Rahul Singh 
wrote:

> The range is 2*2^63, i.e. 2^64 tokens.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 7, 2018, 6:06 AM -0500, Javier Pareja ,
> wrote:
>
> Hello all,
>
> I have been trying to find an answer to the following but I have had no
> luck so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
>
> Regards,
>
> F Javier Pareja
>
>


Re: [External] Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Tom van der Woerdt
Hi Javier,

When our users ask this question, I tend to answer "keep it above a
billion". More partitions is better.

I'm not aware of any actual limits on partition count. Practically it's
almost always limited by the disk space in a server.

Tom van der Woerdt
Site Reliability Engineer

Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
The world's #1 accommodation site
43 languages, 198+ offices worldwide, 120,000+ global destinations,
1,550,000+ room nights booked every day
No booking fees, best price always guaranteed
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)

On Wed, Mar 7, 2018 at 12:06 PM, Javier Pareja 
wrote:

> Hello all,
>
> I have been trying to find an answer to the following but I have had no
> luck so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a
> recommended limit on the number of values that this partition key can have?
> Is it recommended to have a clustering key to reduce this number by storing
> several rows in each partition instead of one row per partition?
>
> Regards,
>
> F Javier Pareja
>


Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Rahul Singh
The range is 2*2^63, i.e. 2^64 tokens.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 7, 2018, 6:06 AM -0500, Javier Pareja , wrote:
> Hello all,
>
> I have been trying to find an answer to the following but I have had no luck 
> so far:
> Is there any limit to the number of partitions that a table can have?
> Let's say a table has a partition key and no clustering key, is there a 
> recommended limit on the number of values that this partition key can have? 
> Is it recommended to have a clustering key to reduce this number by storing 
> several rows in each partition instead of one row per partition?
>
> Regards,
>
> F Javier Pareja


Re: Cassandra Daemon not coming up

2018-03-07 Thread Rahul Singh
It’s possible that the schema supporting roles and users is corrupted. Do you 
have a backup of it? Another quick fix would be to potentially reset 
permissions on your data dirs and restart. You can also inspect using the 
offline Cassandra sstable reader to see if it’s unaffected.
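
The offline inspection mentioned above can be done with the bundled sstable
tools (a hedged sketch; the path, table ID and file name are placeholders.
sstableutil ships with 3.0, while sstabledump may require a newer 3.x
release):

    # list the live SSTable files for the suspect table
    sstableutil system_auth resource_role_permissons_index
    # dump one of them as JSON to check whether it is still readable
    sstabledump /var/lib/cassandra/data/system_auth/resource_role_permissons_index-<id>/<generation>-big-Data.db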


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 5, 2018, 6:26 PM -0500, mahesh rajamani , 
wrote:
> I did not add any user and disk space was fine.
>
>
>
> > On Tue, Feb 27, 2018, 11:33 Rahul Singh  
> > wrote:
> > > Were there any changes to the system such as permissions, etc. Did you 
> > > add users / change auth scheme?
> > >
> > > On Feb 27, 2018, 10:27 AM -0600, ZAIDI, ASAD A , wrote:
> > > > Can you check if you’ve enough disk space available ?
> > > > ~Asad
> > > >
> > > > From: mahesh rajamani [mailto:rajamani.mah...@gmail.com]
> > > > Sent: Tuesday, February 27, 2018 10:11 AM
> > > > To: user@cassandra.apache.org
> > > > Subject: Cassandra Daemon not coming up
> > > >
> > > > I am using Cassandra version 3.0.9 on a 12-node cluster. I have 
> > > > multiple nodes down after a restart. The Cassandra VM is not coming up, 
> > > > with an assertion error as below. When running in debug mode it fails 
> > > > while doing an operation on "resource_role_permissons_index" in the 
> > > > system_auth keyspace. Please let me know how to bring Cassandra 
> > > > back up from this state.
> > > >
> > > > Logs from system.log
> > > >
> > > > INFO  [main] 2018-02-27 15:43:24,005 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.columns
> > > > INFO  [main] 2018-02-27 15:43:24,012 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.triggers
> > > > INFO  [main] 2018-02-27 15:43:24,019 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.dropped_columns
> > > > INFO  [main] 2018-02-27 15:43:24,029 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.views
> > > > INFO  [main] 2018-02-27 15:43:24,038 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.types
> > > > INFO  [main] 2018-02-27 15:43:24,049 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.functions
> > > > INFO  [main] 2018-02-27 15:43:24,061 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.aggregates
> > > > INFO  [main] 2018-02-27 15:43:24,072 ColumnFamilyStore.java:389 - 
> > > > Initializing system_schema.indexes
> > > > ERROR [main] 2018-02-27 15:43:24,127 CassandraDaemon.java:709 - 
> > > > Exception encountered during startup
> > > > java.lang.AssertionError: null
> > > >         at 
> > > > org.apache.cassandra.db.marshal.CompositeType.getInstance(CompositeType.java:103)
> > > >  ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.config.CFMetaData.rebuild(CFMetaData.java:311) 
> > > > ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.config.CFMetaData.<init>(CFMetaData.java:288) 
> > > > ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.config.CFMetaData.create(CFMetaData.java:366) 
> > > > ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:954)
> > > >  ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:928)
> > > >  ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:891)
> > > >  ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:868)
> > > >  ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:856)
> > > >  ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:136) 
> > > > ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:126) 
> > > > ~[apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:239)
> > > >  [apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:568)
> > > >  [apache-cassandra-3.0.9.jar:3.0.9]
> > > >         at 
> > > > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:696)
> > > >  [apache-cassandra-3.0.9.jar:3.0.9]
> > > >
> > > > --
> > > > Regards,
> > > > Mahesh Rajamani


Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Hello all,

I have been trying to find an answer to the following but I have had no
luck so far:
Is there any limit to the number of partitions that a table can have?
Let's say a table has a partition key and no clustering key, is there a
recommended limit on the number of values that this partition key can have?
Is it recommended to have a clustering key to reduce this number by storing
several rows in each partition instead of one row per partition?

Regards,

F Javier Pareja


Re: Adding disk to operating C*

2018-03-07 Thread Rahul Singh
Are you putting both the commitlogs and the SSTables on the same disks? Consider 
moving your snapshots off often if that's also taking up space. You may be able 
to save some space before you add drives.

You should be able to add these new drives and mount them without an issue. Try 
to avoid a different number of data dirs across nodes; it makes automation of 
operational processes a little harder.

As an aside, Depending on your usecase you may not want to have a data density 
over 1.5 TB per node.
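
Mechanically the change being described is small; a hedged sketch, assuming
the new disk shows up as /dev/sdb and a package-style layout:

    # on each node, one node at a time
    sudo mkfs.xfs /dev/sdb
    sudo mkdir /var/lib/cassandra/data2
    sudo mount /dev/sdb /var/lib/cassandra/data2
    sudo chown cassandra:cassandra /var/lib/cassandra/data2
    # then in cassandra.yaml:
    #   data_file_directories:
    #       - /var/lib/cassandra/data
    #       - /var/lib/cassandra/data2
    # and restart the node so it picks up the new directory
    sudo service cassandra restart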

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 7, 2018, 1:26 AM -0500, Eunsu Kim , wrote:
> Hello,
>
> I use 5 nodes to create a cluster of Cassandra. (SSD 1TB)
>
> I'm trying to mount an additional disk (SSD 1TB) on each node because each 
> disk's usage growth rate is higher than I expected. Then I will add the 
> directory to data_file_directories in cassandra.yaml.
>
> Can I get advice from who have experienced this situation?
> If we go through the above steps one by one, will we be able to complete the 
> upgrade without losing data?
> The replication strategy is SimpleStrategy, RF 2.
>
> Thank you in advance
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>


Re: One time major deletion/purge vs periodic deletion

2018-03-07 Thread Rahul Singh
Charu,

I am aware of what type of things you are trying to do and why. Not sure if DCS 
will solve your problem. Consider a process that identifies the data that needs 
to be deleted and sets a TTL on that row or cell some time in the future, such as 
10 days out.

The process could be run daily, hourly, etc., depending on the volume, but it 
would spread out the actual deletes.
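
In CQL terms such a process would, for each row it selects for purging, do
something like the following (a hedged sketch with invented names; a TTL only
applies to the cells a statement writes, so the row has to be rewritten in
full with the TTL attached):

    # read the row first, then write it back with a 10-day TTL
    cqlsh -e "INSERT INTO myks.records (id, payload)
              VALUES (1234, 'value-copied-from-the-read')
              USING TTL 864000;"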

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 7, 2018, 3:26 AM -0500, Ben Slater , wrote:
> I would say you are better off spreading out the deletes so compactions have 
> the best chance of actually removing them from disk before they become a 
> problem. You will likely need to pay close attention to compaction strategy 
> tuning.
>
> I don’t have any personal experience with it but you may also want to check 
> out deleting compaction strategy to see if it works for your use case: 
> https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy
>
> Cheers
> Ben
>
> > On Wed, 7 Mar 2018 at 17:19 Charulata Sharma (charshar) 
> >  wrote:
> > > Well it’s not like that. We don’t just purge. There are business rules 
> > > which will decide the records to be purged or archived and then purged, 
> > > so cannot rely on TTL.
> > >
> > > Thanks,
> > > Charu
> > >
> > > From: Jens Rantil 
> > > Reply-To: "user@cassandra.apache.org" 
> > > Date: Tuesday, March 6, 2018 at 12:34 AM
> > > To: "user@cassandra.apache.org" 
> > > Subject: Re: One time major deletion/purge vs periodic deletion
> > >
> > > Sounds like you are using Cassandra as a queue. It's an anti-pattern. 
> > > What I would do would be to rely on TTL for removal of data and 
> > > use the TWCS compaction strategy to handle removal and you just focus on 
> > > insertion.
> > > On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar) 
> > >  wrote:
> > > > quote_type
> > > > Hi,
> > > >
> > > >   Wanted the community’s feedback on deciding the schedule of 
> > > > Archive and Purge job.
> > > > Is it better to purge a large volume of data at long intervals (like 
> > > > running the job once in 3 months) or purge smaller amounts more 
> > > > frequently (running the job weekly)?
> > > >
> > > > Some estimates on the number of deletes performed would be up to 80-90K 
> > > > rows purged in 3 months vs. 10K deletes every week.
> > > >
> > > > Thanks,
> > > > Charu
> > > >
> > > --
> > > Jens Rantil
> > > Backend Developer @ Tink
> > > Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> > > For urgent matters you can reach me at +46-708-84 18 32.
> --
> Ben Slater
> Chief Product Officer
>
>
> Read our latest technical blog posts here.


Re: One time major deletion/purge vs periodic deletion

2018-03-07 Thread Ben Slater
I would say you are better off spreading out the deletes so compactions
have the best chance of actually removing them from disk before they become
a problem. You will likely need to pay close attention to compaction
strategy tuning.

I don’t have any personal experience with it but you may also want to check
out deleting compaction strategy to see if it works for your use case:
https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy

Cheers
Ben

On Wed, 7 Mar 2018 at 17:19 Charulata Sharma (charshar) 
wrote:

> Well it’s not like that. We don’t just purge. There are business rules
> which will decide the records to be purged or archived and then purged, so
> cannot rely on TTL.
>
>
>
> Thanks,
>
> Charu
>
>
>
> *From: *Jens Rantil 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Tuesday, March 6, 2018 at 12:34 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: One time major deletion/purge vs periodic deletion
>
>
>
> Sounds like you are using Cassandra as a queue. It's an anti-pattern.
> What I would do would be to rely on TTL for removal of data and
> use the TWCS compaction strategy to handle removal and you just focus on
> insertion.
>
> On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar) 
> wrote:
>
> Hi,
>
>
>
>   Wanted the community’s feedback on deciding the schedule of Archive
> and Purge job.
>
> Is it better to purge a large volume of data at long intervals (like
> running the job once in 3 months) or purge smaller amounts more frequently
> (running the job weekly)?
>
>
>
> Some estimates on the number of deletes performed would be up to 80-90K
> rows purged in 3 months vs. 10K deletes every week.
>
>
>
> Thanks,
>
> Charu
>
>
>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> 
> For urgent matters you can reach me at +46-708-84 18 32.
>
-- 


*Ben Slater*

*Chief Product Officer *

Read our latest technical blog posts here.
