Re: storage engine series

2024-04-30 Thread Ranjib Dey
Great set of learning material, Jon, thank you so much for the hard work.

Sincerely
Ranjib

On Mon, Apr 29, 2024 at 4:24 PM Jon Haddad  wrote:

> Hey everyone,
>
> I'm doing a 4-week YouTube series on the C* storage engine.  My first
> video was last week, where I gave an overview of some of the storage
> engine internals [1].
>
> The next 3 weeks are looking at the new Trie indexes coming in 5.0 [2],
> running Cassandra on EBS [3], and finally looking at some potential
> optimizations [4] that could be done to improve things even further in the
> future.
>
> I hope these videos are useful to the community, and I welcome feedback!
>
> Jon
>
> [1] https://www.youtube.com/live/yj0NQw9DgcE?si=ra1zqusMdSs6vl4T
> [2] https://www.youtube.com/live/ZdzwtH0cJDE?si=CumcPny2UG8zwtsw
> [3] https://www.youtube.com/live/kcq1TC407U4?si=pZ8AkXkMzIylQgB6
> [4] https://www.youtube.com/live/yj0NQw9DgcE?si=ra1zqusMdSs6vl4T
>


Re: Open source equivalents of OpsCenter

2016-07-13 Thread Ranjib Dey
We use datadog (metrics emitted as raw statsd) for the dashboard. All
repair & compaction is done via blender & serf [1].
[1] https://github.com/pagerduty/blender
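As a rough illustration of the "raw statsd" part (the metric name, value,
and agent address below are placeholders, not our actual setup), a statsd
gauge is just a UDP datagram of the form name:value|g:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;

    public class StatsdGauge {
        public static void main(String[] args) throws Exception {
            // Plain statsd wire format: <metric.name>:<value>|g  (g = gauge)
            String payload = "cassandra.pending_compactions:3|g";
            byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
            try (DatagramSocket socket = new DatagramSocket()) {
                // statsd/dogstatsd agents typically listen on UDP port 8125
                socket.send(new DatagramPacket(bytes, bytes.length,
                        InetAddress.getByName("127.0.0.1"), 8125));
            }
        }
    }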


On Wed, Jul 13, 2016 at 2:42 PM, Kevin O'Connor  wrote:

> Now that OpsCenter doesn't work with open source installs, are there any
> runs at an open source equivalent? I'd be more interested in looking at
> metrics of a running cluster and doing other tasks like managing
> repairs/rolling restarts more so than historical data.
>


Re: Operating on large cluster

2014-10-23 Thread Ranjib Dey
We use chef for configuration management and blender for on demand jobs

https://github.com/opscode/chef
https://github.com/PagerDuty/blender
 On Oct 23, 2014 2:18 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi,

 I was wondering how you guys handle a large cluster (50+ machines).

 I mean, there are times you need to change the configuration
 (cassandra.yaml) or send a command to one, some or all nodes (cleanup,
 upgradesstables, setstreamthroughput or whatever).

 So far we have been using things like custom scripts for repairs or any
 routine maintenance, and cssh for specific, one-shot actions on the
 cluster. But I guess this doesn't really scale; I guess we could use pssh
 instead. For configuration changes we use Capistrano, which might scale
 properly.

 So I would like to know: what methods do operators out there use on large
 clusters? Have some of you built open-sourced cluster management
 interfaces or scripts that could make things easier while operating on
 large Cassandra clusters?

 Alain



Re: How do you run integration tests for your cassandra code?

2014-10-13 Thread Ranjib Dey
You can use tools like chef alongside vagrant to bring up a Cassandra
cluster. I personally prefer LXC containers, as they mimic full-blown VMs,
together with chef-lxc, which provides chef's DSL for container
customization (similar to a Dockerfile, and you won't install chef inside
the container). For our scenarios I use 5-node clusters, and the whole
spawn + token set + rebalance is done as part of the test setup.

This assumes you want to run a full-blown integration test within a
single host. If you can afford multiple hosts, you can set up either
ephemeral hosts (chef-metal) or dedicated hosts (normal chef nodes, but
with a CI agent like TeamCity or GoCD etc.).

So, depending on how big you want the integration environment to be (from
a single-host developer environment up to a production clone, say for
capacity planning & load testing), you can automate Cassandra cluster
provisioning before you actually kick off your tests, which might range
from jmeter/gatling-based load testing to unit/functional/UAT-style
testing. I use chef, but I'm sure alternatives exist in other frameworks.
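As a rough illustration of gating the tests on cluster readiness (the
contact point, node count, and timeout below are assumptions, not a
prescribed setup), a JUnit fixture can simply block until the driver
sees every node up:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Host;
    import org.junit.BeforeClass;

    public class ClusterReadinessIT {
        static Cluster cluster;

        @BeforeClass
        public static void waitForCluster() throws Exception {
            // Contact point for one of the test nodes (assumed address).
            cluster = Cluster.builder().addContactPoint("10.0.3.10").build();
            cluster.init();
            long deadline = System.currentTimeMillis() + 120000;
            while (countUpHosts() < 5) { // 5-node test cluster (assumed)
                if (System.currentTimeMillis() > deadline) {
                    throw new IllegalStateException("test cluster did not come up in time");
                }
                Thread.sleep(2000);
            }
        }

        private static int countUpHosts() {
            int up = 0;
            for (Host h : cluster.getMetadata().getAllHosts()) {
                if (h.isUp()) up++;
            }
            return up;
        }
    }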
Hope this helps,
ranjib

On Mon, Oct 13, 2014 at 10:36 PM, Paco Trujillo f.truji...@genetwister.nl
wrote:

 Hi Kevin



 We are using a similar solution to horschi's. In the past we used
 CassandraUnit (https://github.com/jsevellec/cassandra-unit), but
 truncating the tables before and after each test works better for us. We
 also set gc_grace_seconds to zero.





 *From:* horschi [mailto:hors...@gmail.com]
 *Sent:* maandag 13 oktober 2014 22:17
 *To:* user@cassandra.apache.org
 *Subject:* Re: How do you run integration tests for your cassandra code?



 Hi Kevin,

 I run my tests against my locally running Cassandra instance. I am not
 using any framework, but simply truncate all my tables after/before each
 test, and I am quite happy with that.



 You have to enable the unsafeSystem property, disable durable writes on
 the CFs and disable auto-snapshot in the yaml for it to be fast.
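 A minimal sketch of that truncate-between-tests approach (keyspace and
 table names here are hypothetical, and this is not Christian's actual
 code), using the DataStax Java driver with JUnit:

     import com.datastax.driver.core.Cluster;
     import com.datastax.driver.core.Session;
     import org.junit.After;
     import org.junit.AfterClass;
     import org.junit.BeforeClass;

     public class TruncatingTestBase {
         static Cluster cluster;
         static Session session;

         @BeforeClass
         public static void connect() {
             cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             session = cluster.connect("test_ks"); // hypothetical keyspace
         }

         @After
         public void truncateTables() {
             // Cheap when durable writes and auto-snapshot are disabled, as above.
             for (String table : new String[] {"users", "events"}) { // hypothetical tables
                 session.execute("TRUNCATE " + table);
             }
         }

         @AfterClass
         public static void disconnect() {
             cluster.close();
         }
     }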

 kind regards,

 Christian



 On Mon, Oct 13, 2014 at 9:50 PM, Kevin Burton bur...@spinn3r.com wrote:

 Curious to see if any of you have an elegant solution here.



 Right now I'm using cassandra-unit:



 https://github.com/jsevellec/cassandra-unit



 for my integration tests.



 The biggest problem is that it doesn’t support shutdown, so I can’t stop
 or clean up after Cassandra between tests.



 I have other Java daemons that have the same problem.  For example,
 ActiveMQ doesn’t clean up after itself.



 I was *thinking* of using docker or vagrant to start up a daemon in a
 container, then shut it down between tests.



 But this seems difficult to set up and configure … as well as not being
 amazingly portable.



 Another solution is to use a test suite, and a setUp/tearDown that drops
 all tables created by a test.   This way you’re still on the same cassandra
 instance, but the tables are removed for each pass.



 Anyone have an elegant solution to this?



 --

 Founder/CEO Spinn3r.com

 Location: *San Francisco, CA*

 blog: http://burtonator.wordpress.com

 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts

  http://spinn3r.com



Re: Exploring Simply Queueing

2014-10-06 Thread Ranjib Dey
I want to answer the first question: why might one use Cassandra as a
queuing solution?
 - it's the only open-source distributed persistence layer (i.e. no SPOF)
that you can run over a WAN with LAN/WAN-specific quorum controls.
I know it's suboptimal, as the deletions impose additional
compaction/repair penalties, but there is no other solution I am aware of.
Most AMQP solutions are broker-based and clustering is a pain, while things
like riak only support WAN-based clusters in their commercial offering. I
would love to know about other alternatives.

And thanks for sharing the ruby-based priority queue prototype; it helps
people like me (sysadmins :-) ) explore these concepts better.

cheers
ranjib

On Mon, Oct 6, 2014 at 1:35 PM, Jan Algermissen jan.algermis...@nordsc.com
wrote:

 Shane,

 On 06 Oct 2014, at 16:34, Shane Hansen shanemhan...@gmail.com wrote:

  Sorry if I'm hijacking the conversation, but why in the world would you
 want
  to implement a queue on top of Cassandra? It seems like using a proper
 queuing service
  would make your life a lot easier.

 Agreed - however, the use case simply does not justify the additional
 operations.

 
  That being said, there might be a better way to play to the strengths of
 C*. Ideally everything you do
  is append only with few deletes or updates. So an interesting way to
 implement a queue might be
  to do one insert to put the job in the queue and another insert to mark
 the job as done or in process
  or whatever. This would also give you the benefit of being able to
 replay the state of the queue.
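 A hedged sketch of what those two inserts could look like (keyspace,
 table and column names are illustrative only, not a tested design):

     import com.datastax.driver.core.Session;
     import java.util.UUID;

     public class AppendOnlyQueue {
         // Assumed schema (illustrative): one row per queue event, never
         // updated or deleted:
         //   CREATE TABLE queue_events (queue text, job_id timeuuid, state text,
         //                              payload text,
         //                              PRIMARY KEY (queue, job_id, state));
         private final Session session;

         public AppendOnlyQueue(Session session) { this.session = session; }

         public void enqueue(UUID jobId, String payload) {
             // jobId should be a time-based (type 1) UUID for a timeuuid column.
             session.execute("INSERT INTO queue_events (queue, job_id, state, payload) "
                     + "VALUES ('emails', ?, 'queued', ?)", jobId, payload);
         }

         public void markDone(UUID jobId) {
             // A second insert records completion; the full history stays replayable.
             session.execute("INSERT INTO queue_events (queue, job_id, state) "
                     + "VALUES ('emails', ?, 'done')", jobId);
         }
     }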

 Thanks, I’ll try that, too.

 Jan


 
 
  On Mon, Oct 6, 2014 at 12:57 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Chris,
 
  thanks for taking a look.
 
  On 06 Oct 2014, at 04:44, Chris Lohfink clohf...@blackbirdit.com
 wrote:
 
   It appears you are aware of the tombstone effect that leads people to
 label this an anti-pattern.  Without a due time or any time-based value
 being part of the partition key, you will still get a lot of buildup.  You
 only have 1 partition per shard, which just linearly decreases the
 tombstones.  That isn't likely to be enough to really help in a situation
 of high queue throughput, especially with the default of 4 shards.
 
 Yes, dealing with the tombstone effect is the whole point. The workloads
 I have to deal with are not really high throughput; it is unlikely we’ll
 ever reach multiple messages per second. The emphasis is also more on
 coordinating producer and consumer than on high-volume capacity problems.
 
 Your comment seems to suggest including larger time frames (e.g. the
 due-hour) in the partition keys and using the current time to select the
 active partitions (e.g. the shards of the hour). Once an hour has passed,
 the corresponding shards will never be touched again.
 
  Am I understanding this correctly?
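 For illustration, a hedged sketch of what such hour-bucketed shards might
 look like (table, column names and shard count are made up):

     import com.datastax.driver.core.ResultSet;
     import com.datastax.driver.core.Row;
     import com.datastax.driver.core.Session;
     import java.time.Instant;
     import java.time.temporal.ChronoUnit;
     import java.util.Date;

     public class HourBucketedConsumer {
         // Assumed schema (illustrative):
         //   CREATE TABLE queue_messages (queue text, due_hour timestamp, shard int,
         //                                msg_id timeuuid, payload text,
         //                                PRIMARY KEY ((queue, due_hour, shard), msg_id));
         public void pollActiveShards(Session session) {
             // Derive the active partitions from the clock; past hours are never read again.
             Date dueHour = Date.from(Instant.now().truncatedTo(ChronoUnit.HOURS));
             for (int shard = 0; shard < 4; shard++) {
                 ResultSet rs = session.execute(
                         "SELECT msg_id, payload FROM queue_messages "
                                 + "WHERE queue = ? AND due_hour = ? AND shard = ?",
                         "emails", dueHour, shard);
                 for (Row row : rs) {
                     // process the message ...
                 }
             }
         }
     }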
 
  
   You may want to consider switching to LCS from the default STCS, since
 you are re-writing to the same partitions a lot. It will still use STCS in
 L0, so in high write/delete scenarios with low enough gc_grace, when data
 never gets higher than L1 the write throughput will be about the same. In
 scenarios where you accumulate more data, I suspect LCS will shine by
 reducing the number of obsolete tombstones.  It would be hard to identify
 the difference in small tests, I think.
 
  Thanks, I’ll try to explore the various effects
 
  
   What's the plan to prevent two consumers from reading the same message
 off of a queue?  You mention in the docs you will address it at a later
 point in time, but it's kind of a biggy.  Big lock & batch reads like the
 astyanax recipe?
 
  I have included a static column per shard to act as a lock (the ’lock’
 column in the examples) in combination with conditional updates.
 
 I must admit, I have not quite understood what Netflix is doing in terms
 of coordination - but since performance isn’t our concern, CAS should do
 fine, I guess(?)
 
  Thanks again,
 
  Jan
 
 
  
   ---
   Chris Lohfink
  
  
   On Oct 5, 2014, at 6:03 PM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  
   Hi,
  
   I have put together some thoughts on realizing simple queues with
 Cassandra.
  
   https://github.com/algermissen/cassandra-ruby-queue
  
   The design is inspired by (the much more sophisticated) Netflix
 approach [1] but is very reduced.
  
   Given that I am still a C* newbie, I’d be very glad to hear some
 thoughts on the design path I took.
  
   Jan
  
   [1] https://github.com/Netflix/astyanax/wiki/Message-Queue
  
 
 




Re: Monitoring with Cacti

2010-09-13 Thread Ranjib Dey
I use nagios + nrpe + some custom scripts to monitor our cassandra/hadoop
nodes. Given our long-time comfort with nagios, I didn't find any major
gotchas.
regards
ranjib

On Mon, Sep 13, 2010 at 2:10 AM, Aaron Morton aa...@thelastpickle.com wrote:

 This is my first encounter with cacti, and it feels a lot like having a
 cactus violently inserted in me :) Hopefully this week I can get back to
 it with a clearer head; part of my annoyance was probably trying to rush
 it through on a Friday, plus its somewhat taxing configuration.

 Over the weekend I was thinking about going with some python (our in
 house favorite) in front of the jmxterm jar.

 I'll also try to learn a bit more about cacti; it cannot be as hard as it
 seemed on Friday.

 I'll email you off-list this week if I make some progress.

 Aaron


 On 11 Sep, 2010, at 03:31 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

 On Fri, Sep 10, 2010 at 7:29 PM, aaron morton aa...@thelastpickle.com
 wrote:
  Am going through the rather painful process of trying to monitor
 cassandra using Cacti (it's what we use at work). At the moment it feels
 like a losing battle :)
 
  Does anyone know of some cacti resources for monitoring the JVM or
 Cassandra metrics other than...
 
  mysql-cacti-templates
  http://code.google.com/p/mysql-cacti-templates/
  - provides templates and data sources that require ssh and can monitor
 JVM heap and a few things.
 
  Cassandra-cacti-m6
  http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp
  Coded for version 0.6*; I have made some changes to stop it looking for
 stats that no longer exist. Missing some metrics, I think, but it's
 probably the best bet so far. If I get it working I'll contribute it back
 to them. Most of the problems were probably down to how much effort it
 takes to set up cacti.
 
  jmxterm
  http://www.cyclopsgroup.org/projects/jmxterm/
  Allows for command line access to JMX. I started down the path of writing
 a cacti data source to use this just to see how it worked. Looks like a lot
 of work.
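 As a bare-bones sketch of what such a data source could do in Java (the
 JMX port, bean and attribute names below are assumptions and vary by
 Cassandra version):

     import javax.management.MBeanServerConnection;
     import javax.management.ObjectName;
     import javax.management.remote.JMXConnector;
     import javax.management.remote.JMXConnectorFactory;
     import javax.management.remote.JMXServiceURL;

     public class JmxPoll {
         public static void main(String[] args) throws Exception {
             // The JMX port is an assumption; it differs between Cassandra versions.
             JMXServiceURL url = new JMXServiceURL(
                     "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
             JMXConnector jmxc = JMXConnectorFactory.connect(url);
             try {
                 MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                 // Example bean/attribute; real names differ across Cassandra versions.
                 Object value = mbs.getAttribute(
                         new ObjectName("org.apache.cassandra.db:type=CompactionManager"),
                         "PendingTasks");
                 // A cacti script data source just prints name:value pairs on stdout.
                 System.out.println("pending_compactions:" + value);
             } finally {
                 jmxc.close();
             }
         }
     }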
 
  Thanks for any advice.
  Aaron
 
 

 Setting up cacti is easy, the second time, and third time :)
 As for cassandra-cacti-m6 (I am the author): unfortunately, I have
 been fighting the JMX switcheroo battle for about 3 years now across
 hadoop/hbase/cassandra/hornetq/vserver.

 In a nutshell, there is ALWAYS work involved. First, because, as you
 noticed, attributes get changed/removed/added/renamed. Second, it takes a
 human to logically group things together. For example, if you have two
 items, cache hits and cache misses, you really do not want two separate
 graphs that scale independently. You want one slick stacked graph,
 with nice colors, and you want a CDEF to calculate the cache hit
 percentage by dividing one into the other and show that at the bottom.

 If you want to have a 0.7 branch of cassandra-cacti-m6, I would love
 the help. We are not on 0.7 yet, so I have not had the time to go out
 and make graphs for a version we are not using yet :) but if you come
 up with patches they are happily accepted.

 Edward