Hi,

We are investigating the use of sidecar with cassandra to deliver bulk 
read/write workflows via spark/cassandra-analytics. I'm relatively new to 
cassandra devops, but have started reading some of the code (sidecar in 
particular).

I'm quite confused as to the intended topology of a large-scale cassandra fleet
(say 200+ nodes) and the sidecar process itself. We've currently assumed that
sidecar MUST run locally on each node and have access to the folder configured
as "cassandra_instances.storage_dir" in order to carry out its bulk read/write
functionality. But after reading threads here (and on dev), it seems that a
single sidecar process could manage N (say 200) instances of cassandra, even
remotely from a separate machine or network. Could someone clarify the intended
sidecar:cassandra-node topology, or point me in the right direction for review?
It would be appreciated.

I see that 'sidecar' can mean many different things in discussion (modules?),
and one thing I've been asked to do is restrict clients' access to only parts of
sidecar (as we are paid to manage stuff for them). Is there any intention to
make "my sidecar" configurable, i.e. to choose which parts of sidecar run on a
given node without rebuilding from source? I could see a new block of config
that enables/disables parts of the API. At the moment, we are moving forward
with one sidecar per cassandra node, and intend to put sidecar behind an nginx
server to limit access.
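
To make the enable/disable idea concrete, here is a rough Python sketch of the
kind of per-node allowlist I have in mind. Everything in it is hypothetical: the
class name, the config shape and the path prefixes are made up for illustration
and are not actual sidecar routes or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiAllowlist:
    """Hypothetical per-node switch listing which API path prefixes are exposed."""

    enabled_prefixes: tuple = ()

    def is_allowed(self, path: str) -> bool:
        # A request passes only if its path equals an enabled prefix or sits
        # underneath it; matching on whole path segments avoids letting
        # "/api/v1/healthcheck" through when only "/api/v1/health" is enabled.
        return any(
            path == prefix or path.startswith(prefix.rstrip("/") + "/")
            for prefix in self.enabled_prefixes
        )


# Made-up prefixes: expose health and bulk reads, keep everything else closed.
client_facing = ApiAllowlist(("/api/v1/health", "/api/v1/bulk-read"))
```

The same rule could equally be expressed as location blocks in the nginx layer;
the point is only that the decision is a simple prefix match that could live in
config rather than in a rebuilt binary.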

Any pointers/reading material appreciated.

Cheers,
Carl
