Hi,

We are investigating the use of Sidecar with Cassandra to deliver bulk read/write workflows via Spark/cassandra-analytics. I'm relatively new to Cassandra devops, but have started reading some of the code (Sidecar in particular).

I'm quite confused about the intended topology of a large-scale Cassandra fleet (say 200+ nodes) and the Sidecar process itself. We've so far assumed that Sidecar MUST run locally and have access to the "cassandra_instances.storage_dir" folder in order to carry out its bulk read/write functionality. But after reading posts here (and on dev), it seems like a single Sidecar process could manage N (say 200) instances of Cassandra, even remotely from a separate machine or network. If someone can clarify the intended sidecar:cassandra-node topology, or point me in the right direction for review, it would be appreciated.

I see that 'sidecar' can be discussed as meaning many different things (modules?), and one thing I've been asked to do is restrict client access to only parts of Sidecar (as we are paid to manage things for them). Is there any intention to make "my sidecar" configurable, i.e. to choose which parts of Sidecar run on a given node without rebuilding from source? I could see a new block of config that enables/disables parts of the API.

At the moment, we are moving forward with one Sidecar per Cassandra node, and intend to put Sidecar behind an nginx server to limit access.

Any pointers/reading material appreciated.

Cheers,
Carl
