Hi,
Looks like that is my primary problem - the sstable count for the
daily_challenges column family is 5k. Azure had a scheduled maintenance
window on Saturday. All the VMs got rebooted one by one - including the
current Cassandra one - and it's taking forever to bring Cassandra back online.
Is there any way I can reorganize my existing data so that I can bring
down that count? I don't want to lose that data.
If possible, can I do that while Cassandra is down? As I mentioned, it's
taking forever to get the service up - it's stuck reading those 5k
sstable files (plus another 5k for the corresponding secondary index). :(
Oh, did I mention I'm new to Cassandra?
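For context on how the count relates to compaction: daily_challenges uses SizeTieredCompactionStrategy with min_threshold 4 and max_threshold 32 (per the schema quoted further down). Below is a simplified Python model of how STCS, as I understand it, groups sstables into similar-size buckets and which buckets become compaction candidates. It is not Cassandra's code: bucket_low = 0.5, bucket_high = 1.5 and min_sstable_size = 50 MB are the STCS defaults as I recall them, and the real strategy also prefers the most-read ("hottest") sstables when trimming a bucket, which this sketch ignores.

```python
from typing import List

MIB = 1024 * 1024

def bucket_sstables(sizes: List[int], bucket_low: float = 0.5,
                    bucket_high: float = 1.5,
                    min_sstable_size: int = 50 * MIB) -> List[List[int]]:
    """Group sstable sizes (in bytes) into buckets of similar size (simplified STCS)."""
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # "Similar" means within [bucket_low, bucket_high] x the bucket average.
            similar = bucket_low * avg < size < bucket_high * avg
            # Small sstables are all lumped together regardless of ratio.
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:
            # No existing bucket fits: start a new one.
            buckets.append([size])
    return buckets

def compaction_candidates(sizes: List[int], min_threshold: int = 4,
                          max_threshold: int = 32) -> List[List[int]]:
    """Buckets with at least min_threshold sstables, each trimmed to max_threshold."""
    return [bucket[:max_threshold]
            for bucket in bucket_sstables(sizes)
            if len(bucket) >= min_threshold]
```

In this model, 5,000 small sstables all land in one bucket, and a single compaction merges at most 32 of them, so working down a backlog of that size takes many compaction rounds.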
Thanks,
Kunal
On 11 July 2015 at 03:29, Sebastian Estevez sebastian.este...@datastax.com
wrote:
#1
There is one table - daily_challenges - which shows compacted partition
max bytes as ~460M and another one - daily_guest_logins - which shows
compacted partition max bytes as ~36M.
460 MB is high; I like to keep my partitions under 100 MB when possible. I've
seen worse, though. The fix is to add something else (maybe month or week or
something similar) to your partition key:
PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)
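To make "something else" concrete: if the extra column were an ISO week string, the key would become PRIMARY KEY ((segment_type, week), date, user_id, sess_id), and the application would compute week from the same timestamp it writes into date. A minimal Python sketch of that computation, assuming a hypothetical text column named week that is not in the existing schema:

```python
from datetime import datetime
from typing import Tuple

def week_bucket(ts: datetime) -> str:
    """ISO year-week string such as '2015-W28' for the hypothetical week column."""
    year, week, _ = ts.isocalendar()
    return f"{year}-W{week:02d}"

def partition_key(segment_type: str, ts: datetime) -> Tuple[str, str]:
    """Composite partition key (segment_type, week) for the proposed schema."""
    return (segment_type, week_bucket(ts))
```

ISO week numbering keeps a week that straddles New Year in a single partition (1 Jan 2016 falls in 2015-W53). The trade-off is that a read spanning several weeks has to query one partition per week.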
#2
Looks like your jamm version is 3 per your env.sh, so you're probably
okay to copy the env.sh over from the C* 3.0 link I shared once you
uncomment and tweak MAX_HEAP_SIZE. If something is wrong, your node
won't come up, so tail your logs.
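When MAX_HEAP_SIZE and HEAP_NEWSIZE are left commented out, the 2.x-era cassandra-env.sh computes defaults in its calculate_heap_sizes function. The sketch below reproduces that formula as I remember it: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)) for the heap, and min(100 MB per core, 1/4 heap) for the young generation. It lets you see what the script would pick before you override it. Compare it against the function in your own env.sh (the gist) before relying on it.

```python
from typing import Tuple

def default_heap_sizes(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Return (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB, mirroring the env.sh defaults."""
    # Integer division, like the `expr` calls in the shell script.
    half = system_memory_mb // 2
    quarter = half // 2
    half = min(half, 1024)        # 1/2 RAM, capped at 1 GB
    quarter = min(quarter, 8192)  # 1/4 RAM, capped at 8 GB
    max_heap = max(half, quarter)
    # Young generation: 100 MB per core, but no more than 1/4 of the heap.
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size
```

For a hypothetical 7 GB, 4-core VM, default_heap_sizes(7168, 4) returns (1792, 400), i.e. MAX_HEAP_SIZE="1792M" and HEAP_NEWSIZE="400M". As far as I recall, the script expects the two variables to be set or unset as a pair, so if you override one, set the other too.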
All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar
kgangakhed...@gmail.com wrote:
And here is my cassandra-env.sh
https://gist.github.com/kunalg/2c092cb2450c62be9a20
Kunal
On 11 July 2015 at 00:04, Kunal Gangakhedkar kgangakhed...@gmail.com
wrote:
From the jhat output, the top 10 entries under "Instance Count for All Classes
(excluding platform)" are:
2088223 instances of class org.apache.cassandra.db.BufferCell
1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
63 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
503687 instances of class org.apache.cassandra.db.BufferDeletedCell
378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
101800 instances of class org.apache.cassandra.utils.concurrent.Ref
101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
71123 instances of class org.apache.cassandra.db.BufferDecoratedKey
At the bottom of the page, it shows:
Total of 8739510 instances occupying 193607512 bytes.
JFYI.
Kunal
On 10 July 2015 at 23:49, Kunal Gangakhedkar kgangakhed...@gmail.com
wrote:
Thanks for the quick reply.
1. I don't know which thresholds I should look for, so to save some
back-and-forth, I'm attaching the cfstats output for the keyspace.
There is one table - daily_challenges - which shows compacted partition
max bytes as ~460M and another one - daily_guest_logins - which shows
compacted partition max bytes as ~36M.
Can that be a problem?
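To check every table against Sebastian's 100 MB rule of thumb without eyeballing the attachment, you can run a small parser over the cfstats text. This sketch assumes the nodetool cfstats layout of the 2.x line, where each table block starts with a "Table:" line (or "Table (index):", or "Column Family:" in older versions) and contains a "Compacted partition maximum bytes:" line. The function name and threshold constant are illustrative, not part of any Cassandra tooling.

```python
import re
from typing import Dict, Iterable

# "Under 100 MB" rule of thumb from the thread, expressed in bytes.
THRESHOLD_BYTES = 100 * 1024 * 1024

# Matches "Table: x", "Table (index): x" and older "Column Family: x" headers.
TABLE_RE = re.compile(r"^\s*(?:Table|Column Family)[^:]*:\s*(\S+)")
MAX_RE = re.compile(r"^\s*Compacted partition maximum bytes:\s*(\d+)")

def oversized_partitions(cfstats_lines: Iterable[str],
                         threshold: int = THRESHOLD_BYTES) -> Dict[str, int]:
    """Map table name -> max partition bytes for tables above the threshold."""
    result: Dict[str, int] = {}
    table = None
    for line in cfstats_lines:
        header = TABLE_RE.match(line)
        if header:
            table = header.group(1)
            continue
        max_line = MAX_RE.match(line)
        if max_line and table is not None:
            size = int(max_line.group(1))
            if size > threshold:
                result[table] = size
    return result
```

Usage, with a hypothetical file holding the cfstats output: `with open("cfstats.txt") as f: print(oversized_partitions(f))`.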
Here is the CQL schema for the daily_challenges column family:
CREATE TABLE app_10001.daily_challenges (
    segment_type text,
    date timestamp,
    user_id int,
    sess_id text,
    data text,
    deleted boolean,
    PRIMARY KEY (segment_type, date, user_id, sess_id)
) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{keys:ALL, rows_per_partition:NONE}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128