Re: Frequency of rebuild_index

2018-04-26 Thread Anup Shirolkar
Hi,

The secondary indices in Cassandra are maintained continuously as data is
written. An index build is also kicked off automatically when you create
a new index. So there is no good reason to schedule nodetool rebuild_index
regularly.

However, if you find any discrepancy between an index and its data, you should
run it. Ideally this should not happen, but if it is required as a result of
some major activity/failure, you can use it.
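
If you do need to run it manually, a minimal sketch (keyspace, table and index
names are placeholders):

# Rebuild the named secondary index on the local node; run per node as needed.
# Depending on the Cassandra version, the index may need to be given as
# <table>.<index> rather than a bare name.
nodetool rebuild_index my_keyspace my_table my_index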

As for the load it puts on the system, that depends on the size of the index
itself. Although a rebuild will consume resources, it should not cause a major
performance hit to the system.

Regards,
Anup

On 27 April 2018 at 13:46, Akshit Jain  wrote:

> Hi,
> How frequently should one run nodetool rebuild_index, and what's its impact
> on performance in terms of IOPS, CPU utilisation, etc.?
>
> Regards
>
>


Frequency of rebuild_index

2018-04-26 Thread Akshit Jain
Hi,
How frequently should one run nodetool rebuild_index, and what's its impact
on performance in terms of IOPS, CPU utilisation, etc.?

Regards


Re: Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread horschi
Hi Thomas,

I don't think I have ever seen compaction being faster.

For me, tables with small values usually run at around 5 MB/s with a single
compaction. With larger blobs (a few KB per blob) I have seen 16 MB/s. Both
with "nodetool setcompactionthroughput 0".

I don't think it's disk-related either. I think parsing the data simply
saturates the CPU, or perhaps the issue is GC-related? But I have never dug
into it; I just observed low IO-wait percentages in top.
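
For reference, a sketch of the checks behind those observations:

nodetool setcompactionthroughput 0   # remove the compaction throttle entirely
nodetool compactionstats             # watch bytes compacted vs. remaining
top                                  # low %wa suggests CPU-bound, not disk-bound
iostat -x 5                          # cross-check device utilisation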

regards,
Christian




On Thu, Apr 26, 2018 at 7:39 PM, Jonathan Haddad  wrote:

> I can't say for sure, because I haven't measured it, but I've seen a
> combination of readahead + large chunk size with compression cause serious
> issues with read amplification, although I'm not sure if or how it would
> apply here.  Likely depends on the size of your partitions and the
> fragmentation of the sstables, although at only 5GB I'm really surprised to
> hear 32GB read in, that seems a bit absurd.
>
> Definitely something to dig deeper into.
>
> On Thu, Apr 26, 2018 at 5:02 AM Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
>> Hello,
>>
>>
>>
>> yet another question/issue with repair.
>>
>>
>>
>> Cassandra 2.1.18, 3 nodes, RF=3, vnode=256, data volume ~ 5G per node
>> only. A repair (nodetool repair -par) issued on a single node at this data
>> volume takes around 36min with an AVG of ~ 15MByte/s disk throughput
>> (read+write) for the entire time-frame, thus processing ~ 32GByte from a
>> disk perspective so ~ 6 times of the real data volume reported by nodetool
>> status. Does this make any sense? This is with 4 compaction threads and
>> compaction throughput = 64. Similar results doing this test a few times,
>> where most/all inconsistent data should be already sorted out by previous
>> runs.
>>
>>
>>
>> I know there is e.g. reaper, but the above is a simple use case simply
>> after a single failed node recovers beyond the 3h hinted handoff window.
>> How should this finish in a timely manner for > 500G on a recovering node?
>>
>>
>>
>> I have to admit this is with NFS as storage. I know, NFS might not be the
>> best idea, but with the above test at ~ 5GB data volume, we see an IOPS
>> rate at ~ 700 at a disk latency of ~ 15ms, thus I wouldn’t treat it as that
>> bad. This all is using/running Cassandra on-premise (at the customer, so
>> not hosted by us), so while we can make recommendations storage-wise (of
>> course preferring local disks), it may and will happen that NFS is in
>> use.
>>
>>
>>
>> Why we are using -par in combination with NFS is a different story and
>> related to this issue: https://issues.apache.org/
>> jira/browse/CASSANDRA-8743. Without switching from sequential to
>> parallel repair, we basically kill Cassandra.
>>
>>
>>
>> Throughput-wise, I also don’t think it is related to NFS, because we see
>> similar repair throughput values with AWS EBS (gp2, SSD based) running
>> regular repairs on small-sized CFs.
>>
>>
>>
>> Thanks for any input.
>>
>> Thomas
>> The contents of this e-mail are intended for the named addressee only. It
>> contains information that may be confidential. Unless you are the named
>> addressee or an authorized designee, you may not copy or use it, or
>> disclose it to anyone else. If you received it in error please notify us
>> immediately and then destroy it. Dynatrace Austria GmbH (registration
>> number FN 91482h) is a company registered in Linz whose registered office
>> is at 4040 Linz, Austria, Freistädterstraße 313
>>
>


RE: [EXTERNAL] Re: Cassandra reaper

2018-04-26 Thread Durity, Sean R
Wait, isn’t this the Apache Cassandra mailing list? Shouldn’t this be on the 
pickle users list or something?

(Just kidding, everyone. I think there should be room for reaper and DataStax 
inquiries here.)


Sean Durity

From: Joaquin Casares [mailto:joaq...@thelastpickle.com]
Sent: Tuesday, April 24, 2018 9:01 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra reaper

Sure thing Abdul,

That's great to hear! Unfortunately, the JMX authentication currently needs to
be in the config file. And even if the JMX credentials were stored within
Cassandra, we would still need to store connection details within the yaml, and
storing JMX credentials within Cassandra may not be ideal from a security
standpoint.

The UI keeps logs of all previous repairs, to the best of my knowledge. If you
want to completely uninstall Reaper, you can run DROP KEYSPACE reaper_db; from
within cqlsh, but note that would remove all schedules as well.
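
As a one-line sketch of that teardown (it permanently deletes Reaper's repair
history and schedules):

cqlsh -e "DROP KEYSPACE reaper_db;"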

Cheers,

Joaquin

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com

On Tue, Apr 24, 2018 at 7:49 PM, Abdul Patel wrote:
Thanks Joaquin,

Yes, I used the same and it worked fine. The only thing is I had to add a
userid and password, which is somewhat annoying to keep in the config file. Can
I get rid of it and still store it in the reaper_db keyspace?
Also, how do I clean up reaper_db by deleting completed repair information from
the GUI? Or is any other cleanup required?


On Tuesday, April 24, 2018, Joaquin Casares wrote:
Hello Abdul,

Depending on what you want your backend to be stored on, you'll want to use a 
different file.

So if you want your Reaper state to be stored within a Cassandra cluster, which 
I would recommend, use this file as your base file:

https://github.com/thelastpickle/cassandra-reaper/blob/master/src/packaging/resource/cassandra-reaper-cassandra.yaml

Make a copy of the yaml and include your system-specific settings. Then symlink 
it to the following location:

/etc/cassandra-reaper/cassandra-reaper.yaml
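
A sketch of those steps, assuming the packaged layout (file names other than
the symlink target are examples):

# Copy the Cassandra-backed template, edit in your settings, then symlink it
# into place so upgrades don't overwrite your configuration.
sudo cp cassandra-reaper-cassandra.yaml /etc/cassandra-reaper/configs/my-reaper.yaml
sudo vi /etc/cassandra-reaper/configs/my-reaper.yaml
sudo ln -sf /etc/cassandra-reaper/configs/my-reaper.yaml \
  /etc/cassandra-reaper/cassandra-reaper.yaml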

For completeness, this file is an example of how to use a Postgres server to 
store the Reaper state:

https://github.com/thelastpickle/cassandra-reaper/blob/master/src/packaging/resource/cassandra-reaper.yaml

Hope that helped!

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com

On Tue, Apr 24, 2018 at 7:07 PM, Abdul Patel wrote:
Thanks

But the difference here is that cassandra-reaper-cassandra.yaml has more
parameters than cassandra-reaper.yaml.
Can I just use the one file with all the details, or does it look for one
specific file?


On Tuesday, April 24, 2018, Joaquin Casares wrote:
Hello Abdul,

You'll only want one:

The yaml file used by the service is located at 
/etc/cassandra-reaper/cassandra-reaper.yaml and alternate config templates can 
be found under /etc/cassandra-reaper/configs. It is recommended to create a new 
file with your specific configuration and symlink it as 
/etc/cassandra-reaper/cassandra-reaper.yaml to prevent your configuration from 
being overwritten during upgrades.

Adapt the config file to suit your setup and then run `sudo service 
cassandra-reaper start`.

Source: 
http://cassandra-reaper.io/docs/download/install/#service-configuration

Hope that helps!

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting

Re: does c* 3.0 use one ring for all datacenters?

2018-04-26 Thread Jeff Jirsa
On Thu, Apr 26, 2018 at 1:34 AM, Jinhua Luo  wrote:

> How do you guarantee that the tokens are independent between DCs?


Cassandra won't let you have duplicate tokens - it won't start if you do it
by mistake, and it won't assign duplicates automatically.


> They form one
> ring, and they must be (re-)assigned when needed.
>

Tokens don't move automatically. There's no auto-reassignment. You can move
a token, but nothing does it automatically.


> Use an offset per DC? But it seems that the DC list must be fixed in advance?
> To make sure the tokens are evenly distributed into the ring among the
> DC(s), are there chances to change the tokens owned by each DC?
> Could you please give a detailed token re-balancing procedure in case
> of node add/remove?
>

Calculate final state. Run repair and cleanup. Move tokens as needed. If
you're not able to reason through this, you may want to consider using
vnodes so it becomes less of an issue.
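
As a rough sketch of one possible ordering on single-token nodes (nodetool
move does not apply to vnodes; the token value is made up):

nodetool repair     # get replicas consistent before shuffling ranges
nodetool move 4611686018427387904   # assign this node its newly calculated token
nodetool cleanup    # afterwards, drop data the node no longer owns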


>
> 2018-04-26 16:23 GMT+08:00 Xiaolong Jiang :
> > DCs are independent of each other. Adding nodes to DC1 won't have any
> > effect on tokens owned by other DCs.
> >
> > On Thu, Apr 26, 2018 at 1:04 AM, Jinhua Luo  wrote:
> >>
> >> You're assuming each DC has the same total num_tokens, right?
> >> If I add a new node into DC1, will it change the tokens owned by DC2 and
> >> DC3?
> >>
> >> 2018-04-12 0:59 GMT+08:00 Jeff Jirsa :
> >> > When you add DC3, they'll get tokens (that aren't currently in use in
> >> > any
> >> > existing DC). Either you assign tokens (let's pretend we manually
> >> > assigned
> >> > the other ones, since DC2 = DC1 + 1), but cassandra can also
> >> > auto-calculate
> >> > them, the exact behavior of which varies by version.
> >> >
> >> >
> >> > Let's pretend it's old style random assignment, and we end up with DC3
> >> > having 4, 17, 22, 36, 48, 53, 64, 73, 83
> >> >
> >> > In this case:
> >> >
> >> > If you use SimpleStrategy and RF=3, a key with token 5 would be placed
> >> > on
> >> > the hosts with token 10, 11, 17
> >> > If you use NetworkTopologyStrategy with RF=3 per DC, a key with token
> 5
> >> > would be placed on the hosts with tokens 10,20,30 ; 11, 21,31 ; 17,
> 22,
> >> > 36
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Apr 11, 2018 at 9:36 AM, Jinhua Luo 
> wrote:
> >> >>
> >> >> What if I add a new DC3?
> >> >> The token ranges would reshuffled into DC1, DC2, DC3?
> >> >>
> >> >> 2018-04-11 22:06 GMT+08:00 Jeff Jirsa :
> >> >> > Confirming again that it's definitely one ring.
> >> >> >
> >> >> > DC1 may have tokens 0, 10, 20, 30, 40, 50, 60, 70, 80
> >> >> > DC2 may have tokens 1, 11, 21, 31, 41, 51, 61, 71, 81
> >> >> >
> >> >> > If you use SimpleStrategy and RF=3, a key with token 5 would be
> >> >> > placed
> >> >> > on
> >> >> > the hosts with token 10, 11, 20
> >> >> > If you use NetworkTopologyStrategy with RF=3 per DC, a key with
> token
> >> >> > 5
> >> >> > would be placed on the hosts with tokens 10,20,30 and 11, 21,31
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Apr 11, 2018 at 6:27 AM, Jinhua Luo 
> >> >> > wrote:
> >> >> >>
> >> >> >> Is it a different answer? One ring?
> >> >> >>
> >> >> >> Could you explain your answer according to my example?
> >> >> >>
> >> >> >> 2018-04-11 21:24 GMT+08:00 Jonathan Haddad :
> >> >> >> > There has always been a single ring.
> >> >> >> >
> >> >> >> > You can specify how many nodes in each DC you want and it’ll
> >> >> >> > figure
> >> >> >> > out
> >> >> >> > how
> >> >> >> > to do it as long as you have the right snitch and are using
> >> >> >> > NetworkTopologyStrategy.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Wed, Apr 11, 2018 at 6:11 AM Jinhua Luo  >
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> Let me clarify my question:
> >> >> >> >>
> >> >> >> >> Given we have a cluster of two DCs, each DC has 2 nodes, each
> >> >> >> >> node
> >> >> >> >> sets num_token as 50.
> >> >> >> >> Then how are token ranges distributed in the cluster?
> >> >> >> >>
> >> >> >> >> If there is one global ring, then it may be (To simplify the
> case,
> >> >> >> >> let's
> >> >> >> >> assume vnodes=1):
> >> >> >> >> {dc1, node1} 1-50
> >> >> >> >> {dc2, node1} 51-100
> >> >> >> >> {dc1, node1} 101-150
> >> >> >> >> {dc1, node2} 151-200
> >> >> >> >>
> >> >> >> >> But here comes more questions:
> >> >> >> >> a) what if I add a new datacenter? Then the token ranges need
> to
> >> >> >> >> be
> >> >> >> >> re-balanced?
> >> >> >> >> If so, what about the data associated with the ranges to be
> >> >> >> >> balanced?
> >> >> >> >> move them among DCs?
> >> >> >> >> But that doesn't make sense, because each keyspace would
> specify
> >> >> >> >> its
> >> >> >> >> snitch and fix the DCs to store them.
> >> >> >> >>
> >> >> >> >> b) It seems there are no benefits from the same ring, because of the snitch.
> >> >> >> >>
> >> >> >> >> If each DC has own ring, then 

Re: Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread Jonathan Haddad
I can't say for sure, because I haven't measured it, but I've seen a
combination of readahead + large chunk size with compression cause serious
issues with read amplification, although I'm not sure if or how it would
apply here.  Likely depends on the size of your partitions and the
fragmentation of the sstables, although at only 5GB I'm really surprised to
hear 32GB read in, that seems a bit absurd.

Definitely something to dig deeper into.

On Thu, Apr 26, 2018 at 5:02 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
>
>
> yet another question/issue with repair.
>
>
>
> Cassandra 2.1.18, 3 nodes, RF=3, vnode=256, data volume ~ 5G per node
> only. A repair (nodetool repair -par) issued on a single node at this data
> volume takes around 36min with an AVG of ~ 15MByte/s disk throughput
> (read+write) for the entire time-frame, thus processing ~ 32GByte from a
> disk perspective so ~ 6 times of the real data volume reported by nodetool
> status. Does this make any sense? This is with 4 compaction threads and
> compaction throughput = 64. Similar results doing this test a few times,
> where most/all inconsistent data should be already sorted out by previous
> runs.
>
>
>
> I know there is e.g. reaper, but the above is a simple use case simply
> after a single failed node recovers beyond the 3h hinted handoff window.
> How should this finish in a timely manner for > 500G on a recovering node?
>
>
>
> I have to admit this is with NFS as storage. I know, NFS might not be the
> best idea, but with the above test at ~ 5GB data volume, we see an IOPS
> rate at ~ 700 at a disk latency of ~ 15ms, thus I wouldn’t treat it as that
> bad. This all is using/running Cassandra on-premise (at the customer, so
> not hosted by us), so while we can make recommendations storage-wise (of
> course preferring local disks), it may and will happen that NFS is in
> use.
>
>
>
> Why we are using -par in combination with NFS is a different story and
> related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-8743. Without switching
> from sequential to parallel repair, we basically kill Cassandra.
>
>
>
> Throughput-wise, I also don’t think it is related to NFS, because we see
> similar repair throughput values with AWS EBS (gp2, SSD based) running
> regular repairs on small-sized CFs.
>
>
>
> Thanks for any input.
>
> Thomas
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
>


Re: com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Lou DeGenaro
Good call! The Java client was using Cassandra 2.11 lib jars in the classpath.
Switching to the Cassandra 3.11 jars in the Java client classpath works!

Thx!

Lou.

On Thu, Apr 26, 2018 at 10:30 AM, Michael Shuler wrote:

> On 04/26/2018 09:17 AM, Lou DeGenaro wrote:
> >
> > I started fresh and edited the 3.11 cassandra.yaml file.  Here are the
> > exact changes:
> >
> > diff cassandra.yaml cassandra.yaml.orig
> > 425c425
> > <   - seeds: "bluej421"
> > ---
> >>   - seeds: "127.0.0.1"
> > 599c599
> > < listen_address: bluej421
> > ---
> >> listen_address: localhost
> > 676c676
> > < rpc_address: bluej421
> > ---
> >> rpc_address: localhost
> >
> > I made no other changes to Cassandra.  After launching server, cqlsh
> > client works.
>
> cqlsh uses the embedded Python driver. Good check, the server is running.
>
> > My java client fails just the same.
>
> Check that your Java driver version is compatible with your version of
> Cassandra. See Andy Tolbert's comment on
> https://datastax-oss.atlassian.net/browse/JAVA-1092
>
> The system tables changed in 3.0+.
> (I hope this guess is closer than my last couple :) )
>
> --
> Michael
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Michael Shuler
On 04/26/2018 09:17 AM, Lou DeGenaro wrote:
> 
> I started fresh and edited the 3.11 cassandra.yaml file.  Here are the
> exact changes:
> 
> diff cassandra.yaml cassandra.yaml.orig
> 425c425
> <   - seeds: "bluej421"
> ---
>>   - seeds: "127.0.0.1"
> 599c599
> < listen_address: bluej421
> ---
>> listen_address: localhost
> 676c676
> < rpc_address: bluej421
> ---
>> rpc_address: localhost
> 
> I made no other changes to Cassandra.  After launching server, cqlsh
> client works.

cqlsh uses the embedded Python driver. Good check, the server is running.

> My java client fails just the same.

Check that your Java driver version is compatible with your version of
Cassandra. See Andy Tolbert's comment on
https://datastax-oss.atlassian.net/browse/JAVA-1092

The system tables changed in 3.0+.
(I hope this guess is closer than my last couple :) )
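
A quick way to sanity-check the client side (the lib path below is
illustrative):

# A 2.x-era driver queries the pre-3.0 schema tables (e.g. schema_keyspaces),
# which no longer exist on a 3.x cluster - hence the "unconfigured table" error.
ls /path/to/client/lib | grep -i cassandra-driver-core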

-- 
Michael

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Lou DeGenaro
I did not realize that the 3.0.9 cassandra.yaml file is not compatible with
3.11??

I started fresh and edited the 3.11 cassandra.yaml file.  Here are the
exact changes:

diff cassandra.yaml cassandra.yaml.orig
425c425
<   - seeds: "bluej421"
---
>   - seeds: "127.0.0.1"
599c599
< listen_address: bluej421
---
> listen_address: localhost
676c676
< rpc_address: bluej421
---
> rpc_address: localhost

I made no other changes to Cassandra.  After launching the server, the cqlsh
client works.  My Java client fails just the same.

Lou.

On Thu, Apr 26, 2018 at 10:03 AM, Michael Shuler wrote:

> OK, thanks for the extra info.
>
> Hmm.. `unconfigured table schema_keyspaces`
>
> Seems like an incomplete upgrade to 3.0.9 (and now 3.11.2) from some
> earlier version, which used schema_columnfamilies, I think?
>
> --
> Michael
>
> On 04/26/2018 08:55 AM, Lou DeGenaro wrote:
> > Sorry, my mistake.  Everything is bluej421.  I tried (but in hindsight
> > should not have) to edit the append to make the host more generic.
> > The actual experiment uses bluej421 everywhere.
> >
> > cqlsh from the same host works fine with the same exact host specified
> > as CQLSH_HOST.
> >
> > I just now installed apache-cassandra-3.11.2-bin.tar.gz and the problem
> > persists.
> >
> >
> >
> > On Thu, Apr 26, 2018 at 9:45 AM, Michael Shuler wrote:
> >
> > host421 != bluej421
> > My guess is 192.168.3.232 != {host421,bluej421} somewhere.
> >
> > If DNS hostnames are being used, the DNS infrastructure needs to be
> spot
> > on, forward and reverse. If the DNS infrastructure is /etc/hosts,
> those
> > hosts entries need to be spot on for the entire cluster, forward and
> > reverse.
> >
> > `ping` your hosts from nodes themselves and from remote nodes. Check
> the
> > listening ports on all nodes with `netstat`. `telnet $host $port`
> > locally and remotely. Were the results expected?
> >
> > Basically, if using DNS, it has to be right everywhere and a lot of
> > people get DNS wrong.
> >
> > --
> > Kind regards,
> > Michael
> >
> > On 04/26/2018 08:17 AM, Lou DeGenaro wrote:
> > > version: cassandra-3.0.9
> > >
> > > conf/cassandra.yaml changes:
> > >
> > >   - seeds: "host421"
> > > listen_address: host421
> > > rpc_address: host421
> > >
> > >
> > > Java client:
> > >
> > > package database.tools;
> > >
> > > import java.net.InetSocketAddress;
> > > import java.util.Map;
> > > import java.util.Map.Entry;
> > >
> > > import com.datastax.driver.core.AuthProvider;
> > > import com.datastax.driver.core.Cluster;
> > > import com.datastax.driver.core.PlainTextAuthProvider;
> > > import com.datastax.driver.core.Session;
> > > import
> > com.datastax.driver.core.exceptions.NoHostAvailableException;
> > >
> > > public class Creator {
> > >
> > > private static Cluster cluster;
> > > private static Session session = null;
> > >
> > > private static String dburl = "host421";
> > >
> > > public static void main(String[] args) {
> > > try {
> > > AuthProvider auth = new
> > > PlainTextAuthProvider("cassandra", "cassandra");
> > > cluster = Cluster.builder()
> > > .withAuthProvider(auth)
> > > .addContactPoint(dburl)
> > > .build();
> > >
> > > session = cluster.connect();
> > > }
> > > catch(NoHostAvailableException e) {
> > > e.printStackTrace();
> > > Map<InetSocketAddress, Throwable> map = e.getErrors();
> > > for(Entry<InetSocketAddress, Throwable> entry :
> > > map.entrySet()) {
> > > Throwable t = entry.getValue();
> > > t.printStackTrace();
> > > }
> > > }
> > > catch(Exception e) {
> > > e.printStackTrace();
> > > }
> > > }
> > >
> > > }
> > >
> > >
> > > Result:
> > >
> > >  INFO | Found Netty's native epoll transport in the classpath,
> > using it
> > > com.datastax.driver.core.exceptions.NoHostAvailableException:
> All
> > > host(s) tried for query failed (tried:
> > bluej421/192.168.3.232:9042 
> > > 
> > > (com.datastax.driver.core.exceptions.InvalidQueryException:
> > > unconfigured table schema_keyspaces))
> > > at
> > >
> >  com.datastax.driver.core.ControlConnection.reconnectInternal(
> 

Re: com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Michael Shuler
On 04/26/2018 09:03 AM, Michael Shuler wrote:
> Seems like an incomplete upgrade to 3.0.9 (and now 3.11.2) from some
> earlier version, which used schema_columnfamilies, I think?

Similar error on:
https://datastax-oss.atlassian.net/browse/JAVA-1092

-- 
Michael

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Michael Shuler
OK, thanks for the extra info.

Hmm.. `unconfigured table schema_keyspaces`

Seems like an incomplete upgrade to 3.0.9 (and now 3.11.2) from some
earlier version, which used schema_columnfamilies, I think?

-- 
Michael

On 04/26/2018 08:55 AM, Lou DeGenaro wrote:
> Sorry, my mistake.  Everything is bluej421.  I tried (but in hindsight
> should not have) to edit the append to make the host more generic.
> The actual experiment uses bluej421 everywhere.
> 
> cqlsh from the same host works fine with the same exact host specified
> as CQLSH_HOST.
> 
> I just now installed apache-cassandra-3.11.2-bin.tar.gz and the problem
> persists.
> 
> 
> 
> On Thu, Apr 26, 2018 at 9:45 AM, Michael Shuler wrote:
> 
> host421 != bluej421
> My guess is 192.168.3.232 != {host421,bluej421} somewhere.
> 
> If DNS hostnames are being used, the DNS infrastructure needs to be spot
> on, forward and reverse. If the DNS infrastructure is /etc/hosts, those
> hosts entries need to be spot on for the entire cluster, forward and
> reverse.
> 
> `ping` your hosts from nodes themselves and from remote nodes. Check the
> listening ports on all nodes with `netstat`. `telnet $host $port`
> locally and remotely. Were the results expected?
> 
> Basically, if using DNS, it has to be right everywhere and a lot of
> people get DNS wrong.
> 
> -- 
> Kind regards,
> Michael
> 
> On 04/26/2018 08:17 AM, Lou DeGenaro wrote:
> > version: cassandra-3.0.9
> >
> >     conf/cassandra.yaml changes:
> >
> >       - seeds: "host421"
> >     listen_address: host421
> >     rpc_address: host421
> >
> >
> > Java client:
> >
> >     package database.tools;
> >
> >     import java.net.InetSocketAddress;
> >     import java.util.Map;
> >     import java.util.Map.Entry;
> >
> >     import com.datastax.driver.core.AuthProvider;
> >     import com.datastax.driver.core.Cluster;
> >     import com.datastax.driver.core.PlainTextAuthProvider;
> >     import com.datastax.driver.core.Session;
> >     import
> com.datastax.driver.core.exceptions.NoHostAvailableException;
> >
> >     public class Creator {
> >    
> >     private static Cluster cluster;
> >     private static Session session = null;
> >    
> >     private static String dburl = "host421";
> >    
> >     public static void main(String[] args) {
> >         try {
> >             AuthProvider auth = new
> >     PlainTextAuthProvider("cassandra", "cassandra");
> >     cluster = Cluster.builder()
> >     .withAuthProvider(auth)
> >     .addContactPoint(dburl)
> >     .build();
> >    
> >     session = cluster.connect();
> >         }
> >         catch(NoHostAvailableException e) {
> >             e.printStackTrace();
> >             Map<InetSocketAddress, Throwable> map = e.getErrors();
> >             for(Entry<InetSocketAddress, Throwable> entry :
> >     map.entrySet()) {
> >                 Throwable t = entry.getValue();
> >                 t.printStackTrace();
> >             }
> >         }
> >         catch(Exception e) {
> >             e.printStackTrace();
> >         }
> >     }
> >
> >     }
> >
> >
> > Result:
> >
> >      INFO | Found Netty's native epoll transport in the classpath,
> using it
> >     com.datastax.driver.core.exceptions.NoHostAvailableException: All
> >     host(s) tried for query failed (tried:
> bluej421/192.168.3.232:9042 
> >     
> >     (com.datastax.driver.core.exceptions.InvalidQueryException:
> >     unconfigured table schema_keyspaces))
> >     at
> >   
>  
> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
> >     at
> >   
>  
> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:86)
> >     at
> com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1409)
> >     at com.datastax.driver.core.Cluster.init(Cluster.java:160)
> >     at
> com.datastax.driver.core.Cluster.connectAsync(Cluster.java:338)
> >     at
> com.datastax.driver.core.Cluster.connectAsync(Cluster.java:311)
> >     at com.datastax.driver.core.Cluster.connect(Cluster.java:250)
> >     at
> org.apache.uima.ducc.database.tools.Creator.main(Creator.java:28)
> >     com.datastax.driver.core.exceptions.InvalidQueryException:
> >     unconfigured table schema_keyspaces
> >     at
> >   
>  

Re: com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Lou DeGenaro
Sorry, my mistake.  Everything is bluej421.  I tried (but in hindsight should
not have) to edit the append to make the host more generic.  The actual
experiment uses bluej421 everywhere.

cqlsh from the same host works fine with the same exact host specified as
CQLSH_HOST.

I just now installed apache-cassandra-3.11.2-bin.tar.gz and the problem
persists.



On Thu, Apr 26, 2018 at 9:45 AM, Michael Shuler wrote:

> host421 != bluej421
> My guess is 192.168.3.232 != {host421,bluej421} somewhere.
>
> If DNS hostnames are being used, the DNS infrastructure needs to be spot
> on, forward and reverse. If the DNS infrastructure is /etc/hosts, those
> hosts entries need to be spot on for the entire cluster, forward and
> reverse.
>
> `ping` your hosts from nodes themselves and from remote nodes. Check the
> listening ports on all nodes with `netstat`. `telnet $host $port`
> locally and remotely. Were the results expected?
>
> Basically, if using DNS, it has to be right everywhere and a lot of
> people get DNS wrong.
>
> --
> Kind regards,
> Michael
>
> On 04/26/2018 08:17 AM, Lou DeGenaro wrote:
> > version: cassandra-3.0.9
> >
> > conf/cassandra.yaml changes:
> >
> >   - seeds: "host421"
> > listen_address: host421
> > rpc_address: host421
> >
> >
> > Java client:
> >
> > package database.tools;
> >
> > import java.net.InetSocketAddress;
> > import java.util.Map;
> > import java.util.Map.Entry;
> >
> > import com.datastax.driver.core.AuthProvider;
> > import com.datastax.driver.core.Cluster;
> > import com.datastax.driver.core.PlainTextAuthProvider;
> > import com.datastax.driver.core.Session;
> > import com.datastax.driver.core.exceptions.NoHostAvailableException;
> >
> > public class Creator {
> >
> > private static Cluster cluster;
> > private static Session session = null;
> >
> > private static String dburl = "host421";
> >
> > public static void main(String[] args) {
> > try {
> > AuthProvider auth = new
> > PlainTextAuthProvider("cassandra", "cassandra");
> > cluster = Cluster.builder()
> > .withAuthProvider(auth)
> > .addContactPoint(dburl)
> > .build();
> >
> > session = cluster.connect();
> > }
> > catch(NoHostAvailableException e) {
> > e.printStackTrace();
> > Map<InetSocketAddress, Throwable> map = e.getErrors();
> > for(Entry<InetSocketAddress, Throwable> entry :
> > map.entrySet()) {
> > Throwable t = entry.getValue();
> > t.printStackTrace();
> > }
> > }
> > catch(Exception e) {
> > e.printStackTrace();
> > }
> > }
> >
> > }
> >
> >
> > Result:
> >
> >  INFO | Found Netty's native epoll transport in the classpath, using
> it
> > com.datastax.driver.core.exceptions.NoHostAvailableException: All
> > host(s) tried for query failed (tried: bluej421/192.168.3.232:9042
> > 
> > (com.datastax.driver.core.exceptions.InvalidQueryException:
> > unconfigured table schema_keyspaces))
> > at
> > com.datastax.driver.core.ControlConnection.reconnectInternal(
> ControlConnection.java:227)
> > at
> > com.datastax.driver.core.ControlConnection.connect(
> ControlConnection.java:86)
> > at com.datastax.driver.core.Cluster$Manager.init(Cluster.
> java:1409)
> > at com.datastax.driver.core.Cluster.init(Cluster.java:160)
> > at com.datastax.driver.core.Cluster.connectAsync(Cluster.
> java:338)
> > at com.datastax.driver.core.Cluster.connectAsync(Cluster.
> java:311)
> > at com.datastax.driver.core.Cluster.connect(Cluster.java:250)
> > at org.apache.uima.ducc.database.tools.Creator.main(Creator.
> java:28)
> > com.datastax.driver.core.exceptions.InvalidQueryException:
> > unconfigured table schema_keyspaces
> > at
> > com.datastax.driver.core.Responses$Error.asException(
> Responses.java:102)
> > at
> > com.datastax.driver.core.DefaultResultSetFuture.onSet(
> DefaultResultSetFuture.java:149)
> > at
> > com.datastax.driver.core.DefaultResultSetFuture.onSet(
> DefaultResultSetFuture.java:167)
> > at
> > com.datastax.driver.core.Connection$Dispatcher.
> channelRead0(Connection.java:1013)
> > at
> > com.datastax.driver.core.Connection$Dispatcher.
> channelRead0(Connection.java:936)
> > at
> > io.netty.channel.SimpleChannelInboundHandler.channelRead(
> SimpleChannelInboundHandler.java:105)
> > at
> > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
> AbstractChannelHandlerContext.java:339)
> > at
> > 

Re: com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Michael Shuler
host421 != bluej421
My guess is 192.168.3.232 != {host421,bluej421} somewhere.

If DNS hostnames are being used, the DNS infrastructure needs to be spot
on, forward and reverse. If the DNS infrastructure is /etc/hosts, those
hosts entries need to be spot on for the entire cluster, forward and
reverse.

`ping` your hosts from nodes themselves and from remote nodes. Check the
listening ports on all nodes with `netstat`. `telnet $host $port`
locally and remotely. Were the results expected?
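
Concretely, something along these lines (host and port are this thread's
examples):

ping -c 3 bluej421          # forward resolution + reachability
getent hosts bluej421       # what the resolver actually returns
netstat -tlnp | grep 9042   # on the node: is the native port listening?
telnet bluej421 9042        # from a remote host: can you actually connect?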

Basically, if using DNS, it has to be right everywhere and a lot of
people get DNS wrong.

-- 
Kind regards,
Michael

On 04/26/2018 08:17 AM, Lou DeGenaro wrote:
> version: cassandra-3.0.9
> 
> conf/cassandra.yaml changes:
> 
>   - seeds: "host421"
> listen_address: host421
> rpc_address: host421
> 
> 
> Java client:
> 
> package database.tools;
> 
> import java.net.InetSocketAddress;
> import java.util.Map;
> import java.util.Map.Entry;
> 
> import com.datastax.driver.core.AuthProvider;
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.PlainTextAuthProvider;
> import com.datastax.driver.core.Session;
> import com.datastax.driver.core.exceptions.NoHostAvailableException;
> 
> public class Creator {
>    
>     private static Cluster cluster;
>     private static Session session = null;
>    
>     private static String dburl = "host421";
>    
>     public static void main(String[] args) {
>         try {
>             AuthProvider auth = new
> PlainTextAuthProvider("cassandra", "cassandra");
>     cluster = Cluster.builder()
>     .withAuthProvider(auth)
>     .addContactPoint(dburl)
>     .build();
>    
>     session = cluster.connect();
>         }
>         catch(NoHostAvailableException e) {
>             e.printStackTrace();
>             Map<InetSocketAddress, Throwable> map = e.getErrors();
>             for(Entry<InetSocketAddress, Throwable> entry :
> map.entrySet()) {
>                 Throwable t = entry.getValue();
>                 t.printStackTrace();
>             }
>         }
>         catch(Exception e) {
>             e.printStackTrace();
>         }
>     }
> 
> }
> 
> 
> Result:
> 
>  INFO | Found Netty's native epoll transport in the classpath, using it
> com.datastax.driver.core.exceptions.NoHostAvailableException: All
> host(s) tried for query failed (tried: bluej421/192.168.3.232:9042
> 
> (com.datastax.driver.core.exceptions.InvalidQueryException:
> unconfigured table schema_keyspaces))
>     at
> 
> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
>     at
> 
> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:86)
>     at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1409)
>     at com.datastax.driver.core.Cluster.init(Cluster.java:160)
>     at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:338)
>     at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:311)
>     at com.datastax.driver.core.Cluster.connect(Cluster.java:250)
>     at org.apache.uima.ducc.database.tools.Creator.main(Creator.java:28)
> com.datastax.driver.core.exceptions.InvalidQueryException:
> unconfigured table schema_keyspaces
>     at
> com.datastax.driver.core.Responses$Error.asException(Responses.java:102)
>     at
> 
> com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:149)
>     at
> 
> com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:167)
>     at
> 
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1013)
>     at
> 
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:936)
>     at
> 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at
> 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at
> 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at
> 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>     at
> 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>     at
> 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>     at
> 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>     at
> 
> 

com.datastax.driver.core.exceptions.NoHostAvailableException

2018-04-26 Thread Lou DeGenaro
version: cassandra-3.0.9

conf/cassandra.yaml changes:
>
>   - seeds: "host421"
> listen_address: host421
> rpc_address: host421
>

Java client:

package database.tools;
>
> import java.net.InetSocketAddress;
> import java.util.Map;
> import java.util.Map.Entry;
>
> import com.datastax.driver.core.AuthProvider;
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.PlainTextAuthProvider;
> import com.datastax.driver.core.Session;
> import com.datastax.driver.core.exceptions.NoHostAvailableException;
>
> public class Creator {
>
> private static Cluster cluster;
> private static Session session = null;
>
> private static String dburl = "host421";
>
> public static void main(String[] args) {
> try {
> AuthProvider auth = new PlainTextAuthProvider("cassandra",
> "cassandra");
> cluster = Cluster.builder()
> .withAuthProvider(auth)
> .addContactPoint(dburl)
> .build();
>
> session = cluster.connect();
> }
> catch(NoHostAvailableException e) {
> e.printStackTrace();
> Map<InetSocketAddress, Throwable> map = e.getErrors();
> for(Entry<InetSocketAddress, Throwable> entry :
> map.entrySet()) {
> Throwable t = entry.getValue();
> t.printStackTrace();
> }
> }
> catch(Exception e) {
> e.printStackTrace();
> }
> }
>
> }
>

Result:

 INFO | Found Netty's native epoll transport in the classpath, using it
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
> tried for query failed (tried: bluej421/192.168.3.232:9042
> (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured
> table schema_keyspaces))
> at
> com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
> at
> com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:86)
> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1409)
> at com.datastax.driver.core.Cluster.init(Cluster.java:160)
> at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:338)
> at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:311)
> at com.datastax.driver.core.Cluster.connect(Cluster.java:250)
> at org.apache.uima.ducc.database.tools.Creator.main(Creator.java:28)
> com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured
> table schema_keyspaces
> at
> com.datastax.driver.core.Responses$Error.asException(Responses.java:102)
> at
> com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:149)
> at
> com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:167)
> at
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1013)
> at
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:936)
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> at
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722)
> at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:811)
>

Surely user error, but what is being done wrongly please?

Thanks.

Lou.


Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread Steinmaurer, Thomas
Hello,

yet another question/issue with repair.

Cassandra 2.1.18, 3 nodes, RF=3, vnodes=256, data volume ~ 5G per node only. A 
repair (nodetool repair -par) issued on a single node at this data volume takes 
around 36min with an AVG of ~ 15 MByte/s disk throughput (read+write) for the 
entire time-frame, thus processing ~ 32 GByte from a disk perspective, so ~ 6 
times the real data volume reported by nodetool status. Does this make any 
sense? This is with 4 compaction threads and compaction throughput = 64. We see 
similar results doing this test a few times, where most/all inconsistent data 
should already have been sorted out by previous runs.
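
For reference, the test boils down to roughly this sketch:

nodetool setcompactionthroughput 64   # throttle used during the test
nodetool repair -par                  # parallel repair, issued on one node
iostat -xm 5                          # shows the ~ 15 MByte/s read+write while it runs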

I know there is e.g. Reaper, but the above is a simple use case: a single 
failed node recovering beyond the 3h hinted handoff window. How should this 
finish in a timely manner for > 500G on a recovering node?

I have to admit this is with NFS as storage. I know, NFS might not be the best 
idea, but with the above test at ~ 5GB data volume, we see an IOPS rate of ~ 
700 at a disk latency of ~ 15ms, thus I wouldn't treat it as that bad. This is 
all using/running Cassandra on-premise (at the customer, so not hosted by us), 
so while we can make recommendations storage-wise (of course preferring local 
disks), it may and will happen that NFS is in use.

Why we are using -par in combination with NFS is a different story and related 
to this issue: https://issues.apache.org/jira/browse/CASSANDRA-8743. Without 
switching from sequential to parallel repair, we basically kill Cassandra.

Throughput-wise, I also don't think it is related to NFS, because we see 
similar repair throughput values with AWS EBS (gp2, SSD-based) running regular 
repairs on small-sized CFs.

Thanks for any input.
Thomas
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313


Re: does c* 3.0 use one ring for all datacenters?

2018-04-26 Thread Jinhua Luo
How do you guarantee that the tokens are independent between DCs? They form one
ring, and they must be (re-)assigned when needed.
Use an offset per DC? But it seems that the DC list must be fixed in advance?
To make sure the tokens are evenly distributed into the ring among the
DC(s), are there chances to change the tokens owned by each DC?
Could you please give a detailed token re-balancing procedure in case
of node add/remove?

2018-04-26 16:23 GMT+08:00 Xiaolong Jiang :
> DCs are independent of each other. Adding nodes to DC1 won't have any effect
> on tokens owned by other DCs.
>
> On Thu, Apr 26, 2018 at 1:04 AM, Jinhua Luo  wrote:
>>
>> You're assuming each DC has the same total num_tokens, right?
>> If I add a new node into DC1, will it change the tokens owned by DC2 and
>> DC3?
>>
>> 2018-04-12 0:59 GMT+08:00 Jeff Jirsa :
>> > When you add DC3, they'll get tokens (that aren't currently in use in
>> > any
>> > existing DC). Either you assign tokens (let's pretend we manually
>> > assigned
>> > the other ones, since DC2 = DC1 + 1), but cassandra can also
>> > auto-calculate
>> > them, the exact behavior of which varies by version.
>> >
>> >
>> > Let's pretend it's old style random assignment, and we end up with DC3
>> > having 4, 17, 22, 36, 48, 53, 64, 73, 83
>> >
>> > In this case:
>> >
>> > If you use SimpleStrategy and RF=3, a key with token 5 would be placed
>> > on
>> > the hosts with token 10, 11, 17
>> > If you use NetworkTopologyStrategy with RF=3 per DC, a key with token 5
>> > would be placed on the hosts with tokens 10,20,30 ; 11, 21,31 ; 17, 22,
>> > 36
>> >
>> >
>> >
>> >
>> > On Wed, Apr 11, 2018 at 9:36 AM, Jinhua Luo  wrote:
>> >>
>> >> What if I add a new DC3?
>> >> The token ranges would reshuffled into DC1, DC2, DC3?
>> >>
>> >> 2018-04-11 22:06 GMT+08:00 Jeff Jirsa :
>> >> > Confirming again that it's definitely one ring.
>> >> >
>> >> > DC1 may have tokens 0, 10, 20, 30, 40, 50, 60, 70, 80
>> >> > DC2 may have tokens 1, 11, 21, 31, 41, 51, 61, 71, 81
>> >> >
>> >> > If you use SimpleStrategy and RF=3, a key with token 5 would be
>> >> > placed
>> >> > on
>> >> > the hosts with token 10, 11, 20
>> >> > If you use NetworkTopologyStrategy with RF=3 per DC, a key with token
>> >> > 5
>> >> > would be placed on the hosts with tokens 10,20,30 and 11, 21,31
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Apr 11, 2018 at 6:27 AM, Jinhua Luo 
>> >> > wrote:
>> >> >>
>> >> >> Is it a different answer? One ring?
>> >> >>
>> >> >> Could you explain your answer according to my example?
>> >> >>
>> >> >> 2018-04-11 21:24 GMT+08:00 Jonathan Haddad :
>> >> >> > There has always been a single ring.
>> >> >> >
>> >> >> > You can specify how many nodes in each DC you want and it’ll
>> >> >> > figure
>> >> >> > out
>> >> >> > how
>> >> >> > to do it as long as you have the right snitch and are using
>> >> >> > NetworkTopologyStrategy.
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Apr 11, 2018 at 6:11 AM Jinhua Luo 
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Let me clarify my question:
>> >> >> >>
>> >> >> >> Given we have a cluster of two DCs, each DC has 2 nodes, each
>> >> >> >> node
>> >> >> >> sets num_token as 50.
>> >> >> >> Then how are token ranges distributed in the cluster?
>> >> >> >>
>> >> >> >> If there is one global ring, then it may be (To simplify the case,
>> >> >> >> let's
>> >> >> >> assume vnodes=1):
>> >> >> >> {dc1, node1} 1-50
>> >> >> >> {dc2, node1} 51-100
>> >> >> >> {dc1, node1} 101-150
>> >> >> >> {dc1, node2} 151-200
>> >> >> >>
>> >> >> >> But here comes more questions:
>> >> >> >> a) what if I add a new datacenter? Then the token ranges need to
>> >> >> >> be
>> >> >> >> re-balanced?
>> >> >> >> If so, what about the data associated with the ranges to be
>> >> >> >> balanced?
>> >> >> >> move them among DCs?
>> >> >> >> But that doesn't make sense, because each keyspace would specify
>> >> >> >> its
>> >> >> >> snitch and fix the DCs to store them.
>> >> >> >>
>> >> >> >> b) It seems there are no benefits from the same ring, because of the snitch.
>> >> >> >>
>> >> >> >> If each DC has own ring, then it may be:
>> >> >> >> {dc1, node1} 1-50
>> >> >> >> {dc1, node1} 51-100
>> >> >> >> {dc2, node1} 1-50
>> >> >> >> {dc2, node1} 51-100
>> >> >> >>
>> >> >> >> I think this is not a trivial question, because each key would be
>> >> >> >> hashed to determine the token it belongs to, and
>> >> >> >> the token range distribution in turn determines which node the
>> >> >> >> key
>> >> >> >> belongs
>> >> >> >> to.
>> >> >> >>
>> >> >> >> Any official answer?
>> >> >> >>
>> >> >> >>
>> >> >> >> 2018-04-11 20:54 GMT+08:00 Jacques-Henri Berthemet
>> >> >> >> :
>> >> >> >> > Maybe I misunderstood something but from what I understand,
>> >> >> >> > each
>> >> >> >> > DC
>> >> >> >> > has
>> >> >> >> > the same ring (0-100 in 

Re: does c* 3.0 use one ring for all datacenters?

2018-04-26 Thread Xiaolong Jiang
DCs are independent of each other. Adding nodes to DC1 won't have any effect
on tokens owned by other DCs.

On Thu, Apr 26, 2018 at 1:04 AM, Jinhua Luo  wrote:

> You're assuming each DC has the same total num_tokens, right?
> If I add a new node into DC1, will it change the tokens owned by DC2 and
> DC3?
>
> 2018-04-12 0:59 GMT+08:00 Jeff Jirsa :
> > When you add DC3, they'll get tokens (that aren't currently in use in any
> > existing DC). Either you assign tokens (let's pretend we manually
> assigned
> > the other ones, since DC2 = DC1 + 1), but cassandra can also
> auto-calculate
> > them, the exact behavior of which varies by version.
> >
> >
> > Let's pretend it's old style random assignment, and we end up with DC3
> > having 4, 17, 22, 36, 48, 53, 64, 73, 83
> >
> > In this case:
> >
> > If you use SimpleStrategy and RF=3, a key with token 5 would be placed on
> > the hosts with token 10, 11, 17
> > If you use NetworkTopologyStrategy with RF=3 per DC, a key with token 5
> > would be placed on the hosts with tokens 10,20,30 ; 11, 21,31 ; 17, 22,
> 36
> >
> >
> >
> >
> > On Wed, Apr 11, 2018 at 9:36 AM, Jinhua Luo  wrote:
> >>
> >> What if I add a new DC3?
> >> The token ranges would reshuffled into DC1, DC2, DC3?
> >>
> >> 2018-04-11 22:06 GMT+08:00 Jeff Jirsa :
> >> > Confirming again that it's definitely one ring.
> >> >
> >> > DC1 may have tokens 0, 10, 20, 30, 40, 50, 60, 70, 80
> >> > DC2 may have tokens 1, 11, 21, 31, 41, 51, 61, 71, 81
> >> >
> >> > If you use SimpleStrategy and RF=3, a key with token 5 would be placed
> >> > on
> >> > the hosts with token 10, 11, 20
> >> > If you use NetworkTopologyStrategy with RF=3 per DC, a key with token
> 5
> >> > would be placed on the hosts with tokens 10,20,30 and 11, 21,31
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Apr 11, 2018 at 6:27 AM, Jinhua Luo 
> wrote:
> >> >>
> >> >> Is it a different answer? One ring?
> >> >>
> >> >> Could you explain your answer according to my example?
> >> >>
> >> >> 2018-04-11 21:24 GMT+08:00 Jonathan Haddad :
> >> >> > There has always been a single ring.
> >> >> >
> >> >> > You can specify how many nodes in each DC you want and it’ll figure
> >> >> > out
> >> >> > how
> >> >> > to do it as long as you have the right snitch and are using
> >> >> > NetworkTopologyStrategy.
> >> >> >
> >> >> >
> >> >> > On Wed, Apr 11, 2018 at 6:11 AM Jinhua Luo 
> >> >> > wrote:
> >> >> >>
> >> >> >> Let me clarify my question:
> >> >> >>
> >> >> >> Given we have a cluster of two DCs, each DC has 2 nodes, each node
> >> >> >> sets num_token as 50.
> >> >> >> Then how are token ranges distributed in the cluster?
> >> >> >>
> >> >> >> If there is one global ring, then it may be (To simplify the case,
> >> >> >> let's
> >> >> >> assume vnodes=1):
> >> >> >> {dc1, node1} 1-50
> >> >> >> {dc2, node1} 51-100
> >> >> >> {dc1, node1} 101-150
> >> >> >> {dc1, node2} 151-200
> >> >> >>
> >> >> >> But here comes more questions:
> >> >> >> a) what if I add a new datacenter? Then the token ranges need to
> be
> >> >> >> re-balanced?
> >> >> >> If so, what about the data associated with the ranges to be
> >> >> >> balanced?
> >> >> >> move them among DCs?
> >> >> >> But that doesn't make sense, because each keyspace would specify
> its
> >> >> >> snitch and fix the DCs to store them.
> >> >> >>
> >> >> >> b) It seems there are no benefits from the same ring, because of the snitch.
> >> >> >>
> >> >> >> If each DC has own ring, then it may be:
> >> >> >> {dc1, node1} 1-50
> >> >> >> {dc1, node1} 51-100
> >> >> >> {dc2, node1} 1-50
> >> >> >> {dc2, node1} 51-100
> >> >> >>
> >> >> >> I think this is not a trivial question, because each key would be
> >> >> >> hashed to determine the token it belongs to, and
> >> >> >> the token range distribution in turn determines which node the key
> >> >> >> belongs
> >> >> >> to.
> >> >> >>
> >> >> >> Any official answer?
> >> >> >>
> >> >> >>
> >> >> >> 2018-04-11 20:54 GMT+08:00 Jacques-Henri Berthemet
> >> >> >> :
> >> >> >> > Maybe I misunderstood something but from what I understand, each
> >> >> >> > DC
> >> >> >> > has
> >> >> >> > the same ring (0-100 in your example) but it's split differently
> >> >> >> > between
> >> >> >> > nodes in each DC. I think it's the same principle if using vnode
> >> >> >> > or
> >> >> >> > not.
> >> >> >> >
> >> >> >> > I think the confusion comes from the fact that the ring range is
> >> >> >> > the
> >> >> >> > same (0-100) but each DC manages it differently because nodes
> are
> >> >> >> > different.
> >> >> >> >
> >> >> >> > --
> >> >> >> > Jacques-Henri Berthemet
> >> >> >> >
> >> >> >> > -Original Message-
> >> >> >> > From: Jinhua Luo [mailto:luajit...@gmail.com]
> >> >> >> > Sent: Wednesday, April 11, 2018 2:26 PM
> >> >> >> > To: user@cassandra.apache.org
> >> >> >> > Subject: Re: does c* 3.0 use one ring 

Re: does c* 3.0 use one ring for all datacenters?

2018-04-26 Thread Jinhua Luo
You're assuming each DC has the same total num_tokens, right?
If I add a new node into DC1, will it change the tokens owned by DC2 and DC3?

2018-04-12 0:59 GMT+08:00 Jeff Jirsa :
> When you add DC3, they'll get tokens (that aren't currently in use in any
> existing DC). Either you assign tokens (let's pretend we manually assigned
> the other ones, since DC2 = DC1 + 1), but cassandra can also auto-calculate
> them, the exact behavior of which varies by version.
>
>
> Let's pretend it's old style random assignment, and we end up with DC3
> having 4, 17, 22, 36, 48, 53, 64, 73, 83
>
> In this case:
>
> If you use SimpleStrategy and RF=3, a key with token 5 would be placed on
> the hosts with token 10, 11, 17
> If you use NetworkTopologyStrategy with RF=3 per DC, a key with token 5
> would be placed on the hosts with tokens 10,20,30 ; 11, 21,31 ; 17, 22, 36
>
>
>
>
> On Wed, Apr 11, 2018 at 9:36 AM, Jinhua Luo  wrote:
>>
>> What if I add a new DC3?
>> The token ranges would reshuffled into DC1, DC2, DC3?
>>
>> 2018-04-11 22:06 GMT+08:00 Jeff Jirsa :
>> > Confirming again that it's definitely one ring.
>> >
>> > DC1 may have tokens 0, 10, 20, 30, 40, 50, 60, 70, 80
>> > DC2 may have tokens 1, 11, 21, 31, 41, 51, 61, 71, 81
>> >
>> > If you use SimpleStrategy and RF=3, a key with token 5 would be placed
>> > on
>> > the hosts with token 10, 11, 20
>> > If you use NetworkTopologyStrategy with RF=3 per DC, a key with token 5
>> > would be placed on the hosts with tokens 10,20,30 and 11, 21,31
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Apr 11, 2018 at 6:27 AM, Jinhua Luo  wrote:
>> >>
>> >> Is it a different answer? One ring?
>> >>
>> >> Could you explain your answer according to my example?
>> >>
>> >> 2018-04-11 21:24 GMT+08:00 Jonathan Haddad :
>> >> > There has always been a single ring.
>> >> >
>> >> > You can specify how many nodes in each DC you want and it’ll figure
>> >> > out
>> >> > how
>> >> > to do it as long as you have the right snitch and are using
>> >> > NetworkTopologyStrategy.
>> >> >
>> >> >
>> >> > On Wed, Apr 11, 2018 at 6:11 AM Jinhua Luo 
>> >> > wrote:
>> >> >>
>> >> >> Let me clarify my question:
>> >> >>
>> >> >> Given we have a cluster of two DCs, each DC has 2 nodes, each node
>> >> >> sets num_token as 50.
>> >> >> Then how are token ranges distributed in the cluster?
>> >> >>
>> >> >> If there is one global ring, then it may be (To simplify the case,
>> >> >> let's
>> >> >> assume vnodes=1):
>> >> >> {dc1, node1} 1-50
>> >> >> {dc2, node1} 51-100
>> >> >> {dc1, node1} 101-150
>> >> >> {dc1, node2} 151-200
>> >> >>
>> >> >> But here comes more questions:
>> >> >> a) what if I add a new datacenter? Then the token ranges need to be
>> >> >> re-balanced?
>> >> >> If so, what about the data associated with the ranges to be
>> >> >> balanced?
>> >> >> move them among DCs?
>> >> >> But that doesn't make sense, because each keyspace would specify its
>> >> >> snitch and fix the DCs to store them.
>> >> >>
>> >> >> b) It seems there are no benefits from the same ring, because of the snitch.
>> >> >>
>> >> >> If each DC has own ring, then it may be:
>> >> >> {dc1, node1} 1-50
>> >> >> {dc1, node1} 51-100
>> >> >> {dc2, node1} 1-50
>> >> >> {dc2, node1} 51-100
>> >> >>
>> >> >> I think this is not a trivial question, because each key would be
>> >> >> hashed to determine the token it belongs to, and
>> >> >> the token range distribution in turn determines which node the key
>> >> >> belongs
>> >> >> to.
>> >> >>
>> >> >> Any official answer?
>> >> >>
>> >> >>
>> >> >> 2018-04-11 20:54 GMT+08:00 Jacques-Henri Berthemet
>> >> >> :
>> >> >> > Maybe I misunderstood something but from what I understand, each
>> >> >> > DC
>> >> >> > has
>> >> >> > the same ring (0-100 in your example) but it's split differently
>> >> >> > between
>> >> >> > nodes in each DC. I think it's the same principle if using vnode
>> >> >> > or
>> >> >> > not.
>> >> >> >
>> >> >> > I think the confusion comes from the fact that the ring range is
>> >> >> > the
>> >> >> > same (0-100) but each DC manages it differently because nodes are
>> >> >> > different.
>> >> >> >
>> >> >> > --
>> >> >> > Jacques-Henri Berthemet
>> >> >> >
>> >> >> > -Original Message-
>> >> >> > From: Jinhua Luo [mailto:luajit...@gmail.com]
>> >> >> > Sent: Wednesday, April 11, 2018 2:26 PM
>> >> >> > To: user@cassandra.apache.org
>> >> >> > Subject: Re: does c* 3.0 use one ring for all datacenters?
>> >> >> >
>> >> >> > Thanks for your reply. I also think separate rings are more
>> >> >> > reasonable.
>> >> >> >
>> >> >> > So one ring for one dc is only for c* 1.x or 2.x without vnode?
>> >> >> >
>> >> >> > Check these references:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > https://docs.datastax.com/en/archived/cassandra/1.1/docs/initialize/token_generation.html
>> >> >> >