Re: About the relationship between the sstable compaction and the read path

2019-01-08 Thread Jinhua Luo
Thanks. Let me clarify my questions more.

1) For the memtable: if the selected columns (assuming they are simple
types) can be found in the memtable alone, why bother searching sstables
at all? In leveldb and rocksdb, the lookup stops consulting sstables once
the memtable already fulfills the query.

2) For STCS and LCS, the sstables are grouped into generations (older
mutations get promoted into the next level or bucket), so why not search
the columns level by level (or bucket by bucket) until all selected
columns are collected? That is how leveldb and rocksdb do it.

3) Could you explain the collection, cdt and counter types in more
detail? Do they need to iterate over all sstables? They cannot simply be
filtered by timestamp or value range.

For a collection, when I select a column of collection type, e.g. a
map, it is necessary to search all sstables to ensure the whole set of
map entries is collected.

For a cdt, the read needs to ensure all fields of the cdt are collected.

For a counter, the read needs to merge all mutations distributed across
all sstables to produce the final counter value.

Am I correct? If so, these three complex types seem less efficient than
simple types, right?
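
To make the collection case concrete, here is how I picture the merge for a
non-frozen map column (a rough Java sketch of my mental model, not Cassandra
source; the types are made up). Each map entry is its own cell with its own
write timestamp, different entries may sit in different sstables or the
memtable, and the newest cell wins per map key, so no single sstable can be
authoritative for the whole column:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Mental model only, not Cassandra source.
    final class MapCellMerge
    {
        static final class Cell
        {
            final String value;     // the value of one map entry
            final long timestamp;   // its write timestamp
            Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
        }

        // One map of (map key -> Cell) per source (memtable + each candidate sstable);
        // for every map key the newest cell wins.
        static Map<String, Cell> mergeMapColumn(List<Map<String, Cell>> perSourceCells)
        {
            Map<String, Cell> merged = new HashMap<>();
            for (Map<String, Cell> cells : perSourceCells)
                for (Map.Entry<String, Cell> e : cells.entrySet())
                    merged.merge(e.getKey(), e.getValue(),
                                 (a, b) -> a.timestamp >= b.timestamp ? a : b);
            return merged;
        }
    }

If that picture is right, it also explains why the read cannot short-circuit
for these types the way it could for a single simple cell.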

Jeff Jirsa wrote on Tue, Jan 8, 2019 at 11:58 PM:
>
> First:
>
> Compaction controls how sstables are combined but not how they’re read. The 
> read path (with one tiny exception) doesn’t know or care which compaction 
> strategy you’re using.
>
> A few more notes inline.
>
> > On Jan 8, 2019, at 3:04 AM, Jinhua Luo  wrote:
> >
> > Hi All,
> >
> > The compaction would organize the sstables, e.g. with LCS, the
> > sstables would be categorized into levels, and the read path should
> > read sstables level by level until the read is fulfilled, correct?
>
> LCS levels are to minimize the number of sstables scanned - at most one per 
> level - but there’s no attempt to fulfill the read with low levels beyond the 
> filtering done by timestamp.
>
> >
> > For STCS, it would search sstables in buckets from smallest to largest?
>
> Nope. No attempt to do this.
>
> >
> > What about other compaction cases? They would iterate all sstables?
>
> In all cases, we’ll use a combination of bloom filters and sstable metadata 
> and indices to include / exclude sstables. If the bloom filter hits, we’ll 
> consider things like timestamps and whether or not the min/max clustering of 
> the sstable matches the slice we care about. We don’t consult the compaction 
> strategy, though the compaction strategy may have (in the case of LCS or 
> TWCS) placed the sstables into a state that makes this read less expensive.
>
> >
> > But in the code, I'm confused:
> > In 
> > org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
> > it seems that no matter whether the selected columns (excluding the
> > collection/cdt and counter cases; let's assume here the selected
> > columns are simple cells) have already been collected and satisfied, it
> > searches both the memtable and all sstables, regardless of the compaction strategy.
>
> There’s another code path that uses timestamps and will do some smart-ish 
> exclusion of sstables that aren’t needed for the read command.
>
> >
> > Why?
> >
> > Moreover, for collection/cdt (non-frozen) and counter types, it would
> > need to iterate all sstables to ensure the whole set of fields is
> > collected, correct? If so, such multi-cell or counter types are
> > heavyweight in performance, correct?
> >



Re: Authenticate cassandra-stress with cqlshrc

2019-01-08 Thread Ben Slater
Yep, cassandra-stress doesn’t attempt to use the cqlshrc file. Seems to me
it could be convenient, so it might make a nice contribution to the project.
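
In the meantime, passing the credentials explicitly (as you did) is the way
to go; roughly like this, with a placeholder node address and credentials:

    cassandra-stress write n=100000 -mode native cql3 user=myuser password=mypass -node 10.0.0.1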

Cheers
Ben

---


*Ben Slater*
*Chief Product Officer*


   




On Wed, 9 Jan 2019 at 11:01, Arvinder Dhillon  wrote:

> Yes, my cluster is set up to authenticate using
> PasswordAuthentication (host, user and password are stored in cqlshrc).
> When I try to run Cassandra-stress without providing user & password on
> the command line, it throws an authentication error. I expect Cassandra-stress
> to read the cqlshrc file and authenticate.
> However, if I provide the user and password on the command line, it works perfectly.
> Thanks
>
> -Arvinder
>
> On Tue, Jan 8, 2019, 1:35 PM Ben Slater wrote:
>> Is your cluster set up to require authentication? I’m a bit unclear about
>> whether you’re trying to connect without passing a user name and password
>> at all (which should just work as the default) or if you’re looking for
>> some mechanism other than the command line to pass the user name / password
>> (in which case I don’t think there is one but stress has a hell of a lot of
>> options so I could be wrong).
>>
>> Cheers
>> Ben
>>
>> ---
>>
>>
>> *Ben Slater*
>> *Chief Product Officer*
>>
>>
>> 
>> 
>> 
>>
>>
>>
>> On Wed, 9 Jan 2019 at 06:01, Arvinder Dhillon 
>> wrote:
>>
>>> I'm trying to connect cassandra-stress 3.11.0 without providing the user and
>>> password options on the command line. It doesn't seem to be using cqlshrc.
>>> Any suggestions please?
>>>
>>> -Arvinder
>>>
>>


Re: Authenticate cassandra-stress with cqlshrc

2019-01-08 Thread Arvinder Dhillon
Yes, my cluster is set up to authenticate using
PasswordAuthentication (host, user and password are stored in cqlshrc).
When I try to run Cassandra-stress without providing user & password on
the command line, it throws an authentication error. I expect Cassandra-stress
to read the cqlshrc file and authenticate.
However, if I provide the user and password on the command line, it works perfectly.
Thanks

-Arvinder

On Tue, Jan 8, 2019, 1:35 PM Ben Slater wrote:

> Is your cluster set up to require authentication? I’m a bit unclear about
> whether you’re trying to connect without passing a user name and password
> at all (which should just work as the default) or if you’re looking for
> some mechanism other than the command line to pass the user name / password
> (in which case I don’t think there is one but stress has a hell of a lot of
> options so I could be wrong).
>
> Cheers
> Ben
>
> ---
>
>
> *Ben Slater*
> *Chief Product Officer*
>
>
>    
>
>
>
>
> On Wed, 9 Jan 2019 at 06:01, Arvinder Dhillon 
> wrote:
>
>> I'm trying to connect cassandra-stress 3.11.0 without providing the user and
>> password options on the command line. It doesn't seem to be using cqlshrc.
>> Any suggestions please?
>>
>> -Arvinder
>>
>


Re: Authenticate cassandra-stress with cqlshrc

2019-01-08 Thread Ben Slater
Is your cluster set up to require authentication? I’m a bit unclear about
whether you’re trying to connect without passing a user name and password
at all (which should just work as the default) or if you’re looking for
some mechanism other than the command line to pass the user name / password
(in which case I don’t think there is one but stress has a hell of a lot of
options so I could be wrong).

Cheers
Ben

---


*Ben Slater*
*Chief Product Officer*


   




On Wed, 9 Jan 2019 at 06:01, Arvinder Dhillon  wrote:

> I'm trying to connect cassandra-stress 3.11.0 without providing the user and
> password options on the command line. It doesn't seem to be using cqlshrc.
> Any suggestions please?
>
> -Arvinder
>


Authenticate cassandra-stress with cqlshrc

2019-01-08 Thread Arvinder Dhillon
I'm trying to connect cassandra-stress 3.11.0 without providing the user and
password options on the command line. It doesn't seem to be using cqlshrc.
Any suggestions please?

-Arvinder


Re: How seed nodes are working and how to upgrade/replace them?

2019-01-08 Thread Jonathan Haddad
I've done some gossip simulations in the past and found virtually no
difference in the time it takes for messages to propagate in almost any
sized cluster.  IIRC it always converges by 17 iterations.  Thus, I
completely agree with Jeff's comment here.  If you aren't pushing 800-1000
nodes, it's not even worth bothering with.  Just be sure you have seeds in
each DC.

Something to be aware of - there's only a chance to gossip with a seed.
That chance goes down as cluster size increases, meaning seeds have less
and less of an impact as the cluster grows.  Once you get to 100+ nodes, a
given node is very rarely talking to a seed.
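
Back-of-the-envelope, assuming 3 seeds (the exact logic lives in Gossiper;
this only shows the rough shape, roughly seeds divided by cluster size):

    public final class SeedGossipChance
    {
        public static void main(String[] args)
        {
            int seeds = 3;
            for (int clusterSize : new int[] { 10, 100, 1000 })
                System.out.printf("cluster=%d  ~chance per gossip round=%.1f%%%n",
                                  clusterSize, 100.0 * seeds / clusterSize);
        }
    }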

Just make sure when you start a node it's not in its own seed list and
you're good.


On Tue, Jan 8, 2019 at 9:39 AM Jeff Jirsa  wrote:

>
>
> On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet  wrote:
>
>> Hi Jeff,
>>
>> thanks for answering most of my points!
>> From the reloadseeds ticket, I followed the link to
>> https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very
>> instructive, although a bit old.
>>
>>
>> On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa  wrote:
>>
>>> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet 
>>> wrote:
>>> >
>>> [...]
>>>
>>> >   In essence, in my example that would be:
>>> >
>>> >   - decide that #2 and #3 will be the new seed nodes
>>> >   - update all the configuration files of all the nodes to write the
>>> IP addresses of #2 and #3
>>> >   - DON'T restart any node - the new seed configuration will be picked
>>> up only if the Cassandra process restarts
>>> >
>>> > * If I can manage to sort my Cassandra nodes by their age, could it be
>>> a strategy to have the seeds set to the 2 oldest nodes in the cluster?
>>> (This implies these nodes would change as the cluster's nodes get
>>> upgraded/replaced).
>>>
>>> You could do this, seems like a lot of headache for little benefit.
>>> Could be done with simple seed provider and config management
>>> (puppet/chef/ansible) laying  down new yaml or with your own seed provider
>>>
>>
>> So, just to make it clear: sorting by age isn't a goal in itself, it was
>> just an example of how I could get a stable list.
>>
>> Right now, we have a dedicated group of seed nodes + a dedicated group
>> for non-seeds: doing a rolling upgrade of the nodes from the second list is
>> relatively painless (although slow), whereas for the first group we are
>> facing the issues discussed in CASSANDRA-3829, namely that seed nodes do
>> not bootstrap automatically and we need to operate them in a more
>> careful way.
>>
>>
> Rolling upgrade shouldn't need to re-bootstrap. Only replacing a host
> should need a new bootstrap. That should be a new host in your list, so it
> seems like this should be fairly rare?
>
>
>> What I'm really looking for is a way to simplify adding and removing
>> nodes into our (small) cluster: I can easily provide a small list of nodes
>> from our cluster with our config management tool so that new nodes are
>> discovering the rest of the cluster, but the documentation seems to imply
>> that seed nodes also have other functions and I'm not sure what problems we
>> could face trying to simplify this approach.
>>
>> Ideally, what I would like to have would be:
>>
>> * Considering a stable cluster (no new nodes, no nodes leaving), the N
>> seeds should be always the same N nodes
>> * Adding new nodes should not change that list
>> * Stopping/removing one of these N nodes should "promote" another
>> (non-seed) node to a seed
>>   - that would not restart the already running Cassandra nodes but would
>> update their configuration files.
>>   - if a node restarts for whatever reason, it would pick up this new
>> configuration
>>
>> So: no node would start its life as a seed; only a few already existing
>> nodes would have this status. We would not have to deal with the "a seed
>> node doesn't bootstrap" problem and it would make our operation process
>> simpler.
>>
>>
>>> > I also have some more general questions about seed nodes and how they
>>> work:
>>> >
>>> > * I understand that seed nodes are used when a node starts and needs
>>> to discover the rest of the cluster's nodes. Once the node has joined and
>>> the cluster is stable, are seed nodes still playing a role in day to day
>>> operations?
>>>
>>> They’re used probabilistically in gossip to encourage convergence.
>>> Mostly useful in large clusters.
>>>
>>
>> How "large" are we speaking here? How many nodes would it start to be
>> considered "large"?
>>
>
> ~800-1000
>
>
>> Also, about the convergence: is this related to how fast/often the
>> cluster topology is changing? (new nodes, leaving nodes, underlying IP
>> addresses changing, etc.)
>>
>>
> New nodes, nodes going up/down, and schema propagation.
>
>
>> Thanks for your answers!
>>
>>  Jonathan
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: How seed nodes are working and how to upgrade/replace them?

2019-01-08 Thread Jeff Jirsa
On Tue, Jan 8, 2019 at 8:19 AM Jonathan Ballet  wrote:

> Hi Jeff,
>
> thanks for answering most of my points!
> From the reloadseeds ticket, I followed the link to
> https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very
> instructive, although a bit old.
>
>
> On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa  wrote:
>
>> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet  wrote:
>> >
>> [...]
>>
>> >   In essence, in my example that would be:
>> >
>> >   - decide that #2 and #3 will be the new seed nodes
>> >   - update all the configuration files of all the nodes to write the IP
>> addresses of #2 and #3
>> >   - DON'T restart any node - the new seed configuration will be picked
>> up only if the Cassandra process restarts
>> >
>> > * If I can manage to sort my Cassandra nodes by their age, could it be
>> a strategy to have the seeds set to the 2 oldest nodes in the cluster?
>> (This implies these nodes would change as the cluster's nodes get
>> upgraded/replaced).
>>
>> You could do this, seems like a lot of headache for little benefit. Could
>> be done with simple seed provider and config management
>> (puppet/chef/ansible) laying  down new yaml or with your own seed provider
>>
>
> So, just to make it clear: sorting by age isn't a goal in itself, it was
> just an example of how I could get a stable list.
>
> Right now, we have a dedicated group of seed nodes + a dedicated group for
> non-seeds: doing a rolling upgrade of the nodes from the second list is
> relatively painless (although slow), whereas for the first group we are
> facing the issues discussed in CASSANDRA-3829, namely that seed nodes do
> not bootstrap automatically and we need to operate them in a more
> careful way.
>
>
Rolling upgrade shouldn't need to re-bootstrap. Only replacing a host
should need a new bootstrap. That should be a new host in your list, so it
seems like this should be fairly rare?


> What I'm really looking for is a way to simplify adding and removing nodes
> into our (small) cluster: I can easily provide a small list of nodes from
> our cluster with our config management tool so that new nodes are
> discovering the rest of the cluster, but the documentation seems to imply
> that seed nodes also have other functions and I'm not sure what problems we
> could face trying to simplify this approach.
>
> Ideally, what I would like to have would be:
>
> * Considering a stable cluster (no new nodes, no nodes leaving), the N
> seeds should be always the same N nodes
> * Adding new nodes should not change that list
> * Stopping/removing one of these N nodes should "promote" another
> (non-seed) node to a seed
>   - that would not restart the already running Cassandra nodes but would
> update their configuration files.
>   - if a node restarts for whatever reason, it would pick up this new
> configuration
>
> So: no node would start its life as a seed; only a few already existing
> nodes would have this status. We would not have to deal with the "a seed
> node doesn't bootstrap" problem and it would make our operation process
> simpler.
>
>
>> > I also have some more general questions about seed nodes and how they
>> work:
>> >
>> > * I understand that seed nodes are used when a node starts and needs to
>> discover the rest of the cluster's nodes. Once the node has joined and the
>> cluster is stable, are seed nodes still playing a role in day to day
>> operations?
>>
>> They’re used probabilistically in gossip to encourage convergence. Mostly
>> useful in large clusters.
>>
>
> How "large" are we speaking here? How many nodes would it start to be
> considered "large"?
>

~800-1000


> Also, about the convergence: is this related to how fast/often the cluster
> topology is changing? (new nodes, leaving nodes, underlying IP addresses
> changing, etc.)
>
>
New nodes, nodes going up/down, and schema propagation.


> Thanks for your answers!
>
>  Jonathan
>


Re: How seed nodes are working and how to upgrade/replace them?

2019-01-08 Thread Jeff Jirsa
Given Consul's popularity, seems like someone could make an argument that
we should be shipping a consul-aware seed provider.
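
Roughly the shape it could take on 3.x. This is only a sketch, not a shipped
integration: the Consul endpoint, service name, yaml parameter names and the
crude JSON handling below are all assumptions for illustration.

    package org.example.cassandra.locator;   // hypothetical package

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.InetAddress;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.cassandra.locator.SeedProvider;

    public class ConsulSeedProvider implements SeedProvider
    {
        // Matches "Address":"10.0.0.1" entries in Consul's catalog JSON.
        private static final Pattern ADDRESS = Pattern.compile("\"Address\"\\s*:\\s*\"([^\"]+)\"");

        private final String catalogUrl;

        // Cassandra constructs the provider reflectively with the parameters
        // block from cassandra.yaml, e.g.:
        //   seed_provider:
        //     - class_name: org.example.cassandra.locator.ConsulSeedProvider
        //       parameters:
        //         - consul_url: http://127.0.0.1:8500
        //           service: cassandra-seeds
        public ConsulSeedProvider(Map<String, String> params)
        {
            String base = params.getOrDefault("consul_url", "http://127.0.0.1:8500");
            String service = params.getOrDefault("service", "cassandra-seeds");
            catalogUrl = base + "/v1/catalog/service/" + service;
        }

        public List<InetAddress> getSeeds()
        {
            List<InetAddress> seeds = new ArrayList<>();
            try
            {
                HttpURLConnection conn = (HttpURLConnection) new URL(catalogUrl).openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                StringBuilder body = new StringBuilder();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)))
                {
                    String line;
                    while ((line = in.readLine()) != null)
                        body.append(line);
                }
                Matcher m = ADDRESS.matcher(body);
                while (m.find())
                    seeds.add(InetAddress.getByName(m.group(1)));
            }
            catch (Exception e)
            {
                // On any failure just return what we have; the node then falls
                // back to whatever it already knows about the cluster.
            }
            return seeds;
        }
    }

Drop the class on the node's classpath and point seed_provider at it in
cassandra.yaml, as in the comment above.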


On Tue, Jan 8, 2019 at 7:39 AM Jonathan Ballet  wrote:

> On Mon, 7 Jan 2019 at 16:51, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Mon, Jan 7, 2019 at 3:37 PM Jonathan Ballet 
>> wrote:
>>
>>>
>>> I'm working on how we could improve the upgrades of our servers and how
>>> to replace them completely (new instance with a new IP address).
>>> What I would like to do is to replace the machines holding our current
>>> seeds (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular
>>> basis:
>>>
>>> * Is it possible to "promote" any non-seed node as a seed node?
>>>
>>> * Is it possible to "promote" a new seed node without having to restart
>>> all the nodes?
>>>   In essence, in my example that would be:
>>>
>>>   - decide that #2 and #3 will be the new seed nodes
>>>   - update all the configuration files of all the nodes to write the IP
>>> addresses of #2 and #3
>>>   - DON'T restart any node - the new seed configuration will be picked
>>> up only if the Cassandra process restarts
>>>
>>
>> You can provide a custom implementation of the seed provider protocol:
>> org.apache.cassandra.locator.SeedProvider
>>
>> We were exploring that approach a few years ago with etcd, which I think
>> provides capabilities similar to that of Consul:
>> https://github.com/a1exsh/cassandra-etcd-seed-provider/blob/master/src/main/java/org/zalando/cassandra/locator/EtcdSeedProvider.java
>>
>
> Hi Alex,
>
> we were also using a dedicated Consul seed provider, but we weren't
> confident enough about maintaining our version, so we removed it in favor of
> something simpler.
> Ultimately, we hope(d) that delegating the maintenance of that list to an
> external process (like Consul Template) that directly updates the
> configuration file is (should be?) mostly equivalent, without having to
> maintain our own copy built against the right version of Cassandra, etc.
>
> Thanks for the info though!
>
>  Jonathan
>
>


Re: How seed nodes are working and how to upgrade/replace them?

2019-01-08 Thread Jonathan Ballet
Hi Jeff,

thanks for answering most of my points!
From the reloadseeds ticket, I followed the link to
https://issues.apache.org/jira/browse/CASSANDRA-3829 which was very
instructive, although a bit old.


On Mon, 7 Jan 2019 at 17:23, Jeff Jirsa  wrote:

> > On Jan 7, 2019, at 6:37 AM, Jonathan Ballet  wrote:
> >
> [...]
>
> >   In essence, in my example that would be:
> >
> >   - decide that #2 and #3 will be the new seed nodes
> >   - update all the configuration files of all the nodes to write the IP
> addresses of #2 and #3
> >   - DON'T restart any node - the new seed configuration will be picked
> up only if the Cassandra process restarts
> >
> > * If I can manage to sort my Cassandra nodes by their age, could it be a
> strategy to have the seeds set to the 2 oldest nodes in the cluster? (This
> implies these nodes would change as the cluster's nodes get
> upgraded/replaced).
>
> You could do this, seems like a lot of headache for little benefit. Could
> be done with simple seed provider and config management
> (puppet/chef/ansible) laying  down new yaml or with your own seed provider
>

So, just to make it clear: sorting by age isn't a goal in itself, it was
just an example of how I could get a stable list.

Right now, we have a dedicated group of seed nodes + a dedicated group for
non-seeds: doing a rolling upgrade of the nodes from the second list is
relatively painless (although slow), whereas for the first group we are
facing the issues discussed in CASSANDRA-3829, namely that seed nodes do
not bootstrap automatically and we need to operate them in a more
careful way.

What I'm really looking for is a way to simplify adding and removing nodes
into our (small) cluster: I can easily provide a small list of nodes from
our cluster with our config management tool so that new nodes are
discovering the rest of the cluster, but the documentation seems to imply
that seed nodes also have other functions and I'm not sure what problems we
could face trying to simplify this approach.

Ideally, what I would like to have would be:

* Considering a stable cluster (no new nodes, no nodes leaving), the N
seeds should be always the same N nodes
* Adding new nodes should not change that list
* Stopping/removing one of these N nodes should "promote" another
(non-seed) node to a seed
  - that would not restart the already running Cassandra nodes but would
update their configuration files.
  - if a node restarts for whatever reason, it would pick up this new
configuration

So: no node would start its life as a seed; only a few already existing
nodes would have this status. We would not have to deal with the "a seed
node doesn't bootstrap" problem and it would make our operation process
simpler.


> > I also have some more general questions about seed nodes and how they
> work:
> >
> > * I understand that seed nodes are used when a node starts and needs to
> discover the rest of the cluster's nodes. Once the node has joined and the
> cluster is stable, are seed nodes still playing a role in day to day
> operations?
>
> They’re used probabilistically in gossip to encourage convergence. Mostly
> useful in large clusters.
>

How "large" are we speaking here? How many nodes would it start to be
considered "large"?
Also, about the convergence: is this related to how fast/often the cluster
topology is changing? (new nodes, leaving nodes, underlying IP addresses
changing, etc.)

Thanks for your answers!

 Jonathan


Re: About the relationship between the sstable compaction and the read path

2019-01-08 Thread Jeff Jirsa
First: 

Compaction controls how sstables are combined but not how they’re read. The 
read path (with one tiny exception) doesn’t know or care which compaction 
strategy you’re using. 

A few more notes inline. 

> On Jan 8, 2019, at 3:04 AM, Jinhua Luo  wrote:
> 
> Hi All,
> 
> The compaction would organize the sstables, e.g. with LCS, the
> sstables would be categorized into levels, and the read path should
> read sstables level by level until the read is fulfilled, correct?

LCS levels are to minimize the number of sstables scanned - at most one per 
level - but there’s no attempt to fulfill the read with low levels beyond the 
filtering done by timestamp.

> 
> For STCS, it would search sstables in buckets from smallest to largest?

Nope. No attempt to do this. 

> 
> What about other compaction cases? They would iterate all sstables?

In all cases, we’ll use a combination of bloom filters and sstable metadata and 
indices to include / exclude sstables. If the bloom filter hits, we’ll consider 
things like timestamps and whether or not the min/max clustering of the sstable 
matches the slice we care about. We don’t consult the compaction strategy, 
though the compaction strategy may have (in the case of LCS or TWCS) placed the 
sstables into a state that makes this read less expensive.
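
If it helps to picture it, the per-sstable elimination is roughly the
following shape. This is a deliberately simplified sketch, not the actual
code, and the names are made up:

    import java.util.ArrayList;
    import java.util.List;

    // Made-up stand-in for the per-sstable metadata Cassandra keeps in memory.
    final class SSTableView
    {
        final boolean bloomFilterMightContainKey; // bloom filter result for the partition key
        final long minClustering;                 // clustering bounds, flattened to longs for the sketch
        final long maxClustering;

        SSTableView(boolean mightContain, long minClustering, long maxClustering)
        {
            this.bloomFilterMightContainKey = mightContain;
            this.minClustering = minClustering;
            this.maxClustering = maxClustering;
        }
    }

    final class ReadPlanner
    {
        // Pick the sstables that could hold data for one partition and one
        // clustering slice [sliceStart, sliceEnd]. The compaction strategy is
        // never consulted; only per-sstable metadata is.
        static List<SSTableView> candidates(List<SSTableView> sstables, long sliceStart, long sliceEnd)
        {
            List<SSTableView> selected = new ArrayList<>();
            for (SSTableView s : sstables)
            {
                if (!s.bloomFilterMightContainKey)
                    continue;    // bloom filter says the partition is definitely not here
                if (s.maxClustering < sliceStart || s.minClustering > sliceEnd)
                    continue;    // min/max clustering does not overlap the slice we care about
                selected.add(s); // still a candidate: it must be read and merged
            }
            return selected;
        }
    }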
 
> 
> But in the code, I'm confused:
> In 
> org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
> it seems that no matter whether the selected columns (excluding the
> collection/cdt and counter cases; let's assume here the selected
> columns are simple cells) have already been collected and satisfied, it
> searches both the memtable and all sstables, regardless of the compaction strategy.

There’s another code path that uses timestamps and will do some smart-ish exclusion 
of sstables that aren’t needed for the read command.
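
Very roughly, that path (queryMemtableAndSSTablesInTimestampOrder in the same
class, if memory serves) behaves like the sketch below, with simplified,
made-up types rather than the real API. Note the shortcut only applies to
simple cells; non-frozen collections, cdts and counters still have to be
merged across all candidate sstables, which speaks to your last question.

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Illustrative only: visit sources newest-first and stop once every
    // requested column already has a value newer than anything the remaining
    // (older) sources could contain.
    final class TimestampOrderedRead
    {
        interface Source
        {
            long maxTimestamp();                       // newest write timestamp in this source
            Map<String, Long> read(Set<String> cols);  // column name -> write timestamp found here
        }

        static Map<String, Long> query(List<Source> sources, Set<String> columns)
        {
            sources.sort(Comparator.comparingLong(Source::maxTimestamp).reversed());
            Map<String, Long> newest = new HashMap<>();
            for (Source s : sources)
            {
                boolean complete = newest.keySet().containsAll(columns)
                                && newest.values().stream().allMatch(ts -> ts > s.maxTimestamp());
                if (complete)
                    break;       // nothing older can win the merge; stop reading sources
                for (Map.Entry<String, Long> e : s.read(columns).entrySet())
                    newest.merge(e.getKey(), e.getValue(), Long::max);
            }
            return newest;
        }
    }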

> 
> Why?
> 
> Moreover, for collection/cdt (non-frozen) and counter types, it would
> need to iterate all sstables to ensure the whole set of fields is
> collected, correct? If so, such multi-cell or counter types are
> heavyweight in performance, correct?
> 



Re: How seed nodes are working and how to upgrade/replace them?

2019-01-08 Thread Jonathan Ballet
On Mon, 7 Jan 2019 at 16:51, Oleksandr Shulgin 
wrote:

> On Mon, Jan 7, 2019 at 3:37 PM Jonathan Ballet  wrote:
>
>>
>> I'm working on how we could improve the upgrades of our servers and how
>> to replace them completely (new instance with a new IP address).
>> What I would like to do is to replace the machines holding our current
>> seeds (#1 and #2 at the moment) in a rolling upgrade fashion, on a regular
>> basis:
>>
>> * Is it possible to "promote" any non-seed node as a seed node?
>>
>> * Is it possible to "promote" a new seed node without having to restart
>> all the nodes?
>>   In essence, in my example that would be:
>>
>>   - decide that #2 and #3 will be the new seed nodes
>>   - update all the configuration files of all the nodes to write the IP
>> addresses of #2 and #3
>>   - DON'T restart any node - the new seed configuration will be picked up
>> only if the Cassandra process restarts
>>
>
> You can provide a custom implementation of the seed provider protocol:
> org.apache.cassandra.locator.SeedProvider
>
> We were exploring that approach a few years ago with etcd, which I think
> provides capabilities similar to that of Consul:
> https://github.com/a1exsh/cassandra-etcd-seed-provider/blob/master/src/main/java/org/zalando/cassandra/locator/EtcdSeedProvider.java
>

Hi Alex,

we were also using a dedicated Consul seed provider, but we weren't
confident enough about maintaining our version, so we removed it in favor of
something simpler.
Ultimately, we hope(d) that delegating the maintenance of that list to an
external process (like Consul Template) that directly updates the
configuration file is (should be?) mostly equivalent, without having to
maintain our own copy built against the right version of Cassandra, etc.

Thanks for the info though!

 Jonathan


About the relationship between the sstable compaction and the read path

2019-01-08 Thread Jinhua Luo
Hi All,

The compaction would organize the sstables, e.g. with LCS, the
sstables would be categorized into levels, and the read path should
read sstables level by level until the read is fulfilled, correct?

For STCS, it would search sstables in buckets from smallest to largest?

What about other compaction cases? They would iterate all sstables?

But in the code, I'm confused:
In 
org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
it seems that no matter whether the selected columns (excluding the
collection/cdt and counter cases; let's assume here the selected
columns are simple cells) have already been collected and satisfied, it
searches both the memtable and all sstables, regardless of the compaction
strategy.

Why?

Moreover, for collection/cdt (non-frozen) and counter types, it would
need to iterate all sstables to ensure the whole set of fields is
collected, correct? If so, such multi-cell or counter types are
heavyweight in performance, correct?
