Re: Performance drop of current Java drivers

2020-05-04 Thread Matthias Pfau
Hi Chris and Adam,
thanks for looking into this!

You can find my tests for old/new client here:
https://gist.github.com/mpfau/7905cea3b73d235033e4f3319e219d15
https://gist.github.com/mpfau/a62cce01b83b56afde0dbb588470bc18


May 1, 2020, 16:22 by adam.holmb...@datastax.com:

> Also, if you can share your schema and benchmark code, that would be a good 
> start.
>
> On Fri, May 1, 2020 at 7:09 AM Chris Splinter <chris.splinter...@gmail.com>
> wrote:
>
>> Hi Matthias,
>>
>> I have forwarded this to the developers that work on the Java driver and 
>> they will be looking into this first thing next week.
>>
>> Will circle back here with findings,
>>
>> Chris
>>
>> On Fri, May 1, 2020 at 12:28 AM Erick Ramirez <erick.rami...@datastax.com>
>> wrote:
>>
>>> Matthias, I don't have an answer to your question but I just wanted to note 
>>> that I don't believe the driver contributors actively watch this mailing 
>>> list (I'm happy to be corrected) so I'd recommend you cross-post in the 
>>> Java driver channels as well. Cheers!
>>>
>
>
> -- 
> Adam Holmberg
> e. adam.holmb...@datastax.com
> w. www.datastax.com
>
>


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-04 Thread Gediminas Blazys
Hello again,

Looking into system.peers, we found that some nodes contain entries about 
themselves with null values. We are not sure whether this could be the issue; 
has anyone seen something similar? This state already exists before the funky 
DC is included in replication.
Columns: peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version, tokens
Values:  null, null, 192.168.104.111, null, null, null, null, null

Have a wonderful day 

Gediminas

From: Gediminas Blazys 
Sent: Monday, May 4, 2020 10:09
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hello,

Thanks for the reply.

Following your advice we took a look at system.local for seed nodes and 
compared that data with nodetool ring. Both sources contain the same tokens for 
these specific hosts. Will continue looking into system.peers.

We have enabled more verbosity on the C# driver and this is the message that we 
get now:


ControlConnection: 05/03/2020 14:28:42.346 +03:00 : Updating keyspaces metadata
ControlConnection: 05/03/2020 14:28:42.377 +03:00 : Rebuilding token map
ControlConnection: 05/03/2020 14:29:03.837 +03:00 : Finished building TokenMap for 7 keyspaces and 210 hosts. It took 19403 milliseconds.
ControlConnection: 05/03/2020 14:29:03.901 +03:00 ALARMA: ENDPOINT: <>:9042 EXCEPTION: System.ArgumentException: The source argument contains duplicate keys.
   at System.Collections.Concurrent.ConcurrentDictionary`2.InitializeFromCollection(IEnumerable`1 collection)
   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 collection, IEqualityComparer`1 comparer)
   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 collection)
   at Cassandra.TokenMap..ctor(TokenFactory factory, IReadOnlyDictionary`2 tokenToHostsByKeyspace, List`1 ring, IReadOnlyDictionary`2 primaryReplicas, IReadOnlyDictionary`2 keyspaceTokensCache, IReadOnlyDictionary`2 datacenters, Int32 numberOfHostsWithTokens)
   at Cassandra.TokenMap.Build(String partitioner, ICollection`1 hosts, ICollection`1 keyspaces)
   at Cassandra.Metadata.d__59.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
   at Cassandra.Connections.ControlConnection.d__44.MoveNext()

The error occurs in Cassandra.TokenMap. We are analyzing the objects that the 
driver initializes during token map creation, but we have yet to find the 
dictionary with duplicate keys.
Just to note, once this new DC is added to replication, the Python driver is 
unable to establish a connection either. cqlsh, though, seems to be OK. It is 
hard to say for sure, but for now at least, this issue seems to point to 
Cassandra.
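
One way to hunt for the duplicated keys outside the driver is to export each host's token list (for example from system.local and system.peers, or from nodetool ring) and look for tokens that show up more than once. The following is a minimal sketch in plain Java, not driver code. The class name DuplicateTokenCheck, the method findDuplicateTokens, and the sample addresses and tokens are all hypothetical, and it assumes the duplicate keys might be tokens, which the stack trace does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateTokenCheck {

    /**
     * Returns every token that appears more than once across the input,
     * mapped to the hosts that list it. Tokens are kept as strings so the
     * check works regardless of the partitioner.
     */
    static Map<String, List<String>> findDuplicateTokens(Map<String, List<String>> tokensByHost) {
        Map<String, List<String>> hostsByToken = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : tokensByHost.entrySet()) {
            for (String token : entry.getValue()) {
                hostsByToken.computeIfAbsent(token, t -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Keep only the tokens that were seen at least twice.
        hostsByToken.values().removeIf(hosts -> hosts.size() < 2);
        return hostsByToken;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; real input would come from the system tables.
        Map<String, List<String>> tokensByHost = new LinkedHashMap<>();
        tokensByHost.put("10.0.0.1", Arrays.asList("-9000", "100"));
        tokensByHost.put("10.0.0.2", Arrays.asList("100", "4200"));
        System.out.println(findDuplicateTokens(tokensByHost));
    }
}
```

An empty result would suggest the duplication is in some other key (host, datacenter, keyspace) rather than in the tokens themselves.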

Gediminas

From: Jorge Bay Gondra <jorgebaygon...@gmail.com>
Sent: Thursday, April 30, 2020 11:45
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hi,
You can enable logging in the driver to see what's happening under the hood: 
https://docs.datastax.com/en/developer/csharp-driver/3.14/faq/#how-can-i-enable-logging-in-the-driver
With logging information, it should be easy to track the issue down.

Can you query system.local and system.peers on a seed node / contact point to 
see whether the node list and token info are as expected? You can compare them 
to the nodetool ring output.

Not directly related: 256 vnodes is probably more than you want.

Thanks,
Jorge

On Thu, Apr 30, 2020 at 9:48 AM Gediminas Blazys 
<gediminas.bla...@microsoft.com.invalid> wrote:
Hello,

We have run into a very interesting issue and maybe some of you have 
encountered it or just have an idea where to look.

We are working towards adding new DCs into our cluster. Here's the current 
topology:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes

Recently we introduced a new DC6 (60 nodes) into our cluster. The joining and 
rebuilding of DC6 went smoothly, and clients are using it without issue. This 
is how it looked after joining DC6:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes
DC6 - 60 nodes

Next we wanted to add another DC7 (also 60 nodes) making it a total of 210 
nodes in the cluster, and while joining new nodes went 

Re: UDTs, UDFs and UDAs

2020-05-04 Thread Alexandre Dutra
Hi,

I already posted an answer for your question on
https://community.datastax.com/ but since you asked here as well:

> 1) When I call the aggregate, I would like to pass sample_size with a
> sub-query [...] Is that possible with Cassandra?


No. CQL does not support sub-queries.

> 2) When I try to register the bloomfilter_uda, I get the following error
> [...] Can I just pass Cassandra data types as a state (map, list, set)?


Yes, but you need to input a valid literal for your UDT by initializing at
least one field:

CREATE OR REPLACE AGGREGATE bloomfilter_uda ( text, int )
SFUNC bloomfilter_udf
STYPE bloomfilter_udt
INITCOND { n_as_sample_size : 0 };

This is a subtlety of the CQL parser; if you input just {} the parser would
be fooled into thinking that this is a set literal (an empty set) – hence
the (rather cryptic) error message.

> 3) If I assume, all of the above is my bad, how can I access the props of
> the state?


Inside functions and aggregates, if you need to access or modify a
user-defined type, you actually need to use the DataStax Java driver 3.x
API for user-defined types:

CREATE OR REPLACE FUNCTION bloomfilter_udf (
state bloomfilter_udt,
value text,
sample_size int
)
CALLED ON NULL INPUT
RETURNS bloomfilter_udt
LANGUAGE java AS
$$
// Fields of the UDT are read and written by name through the UDTValue API.
state.setInt("n_as_sample_size", 42);
state.setInt("m_as_number_of_buckets", 42);
state.setLong("p_as_next_prime_above_m", 4242L);
List<Long> hashForStringCoefficients = ...;
state.setList("hash_for_string_coefficient_a",
hashForStringCoefficients, Long.class);
// etc.
return state;
$$;

The variable state inside the Java block is of type UDTValue.
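
Separately from the driver API, the bucket logic that such a state function might carry can be sketched in plain Java. This is only an illustration under assumptions the thread does not state: it guesses a hash family of the form ((a * x + b) mod p) mod m from the UDT field names, uses a BitSet in place of the UDT's map field, and the names BloomSketch, add and mightContain are hypothetical.

```java
import java.util.BitSet;

public class BloomSketch {
    private final int m;      // number of buckets
    private final long p;     // a prime above m
    private final long[] a;   // per-hash multipliers (assumed)
    private final long[] b;   // per-hash offsets (assumed)
    private final BitSet bits;

    BloomSketch(int m, long p, long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("a and b must have the same length");
        }
        this.m = m;
        this.p = p;
        this.a = a.clone();
        this.b = b.clone();
        this.bits = new BitSet(m);
    }

    // Bucket index of the i-th hash: ((a_i * x + b_i) mod p) mod m.
    // floorMod keeps the result non-negative even if the product overflows.
    private int bucket(int i, String value) {
        long x = value.hashCode() & 0xffffffffL; // treat the hash code as unsigned
        long h = Math.floorMod(a[i] * x + b[i], p);
        return (int) (h % m);
    }

    void add(String value) {
        for (int i = 0; i < a.length; i++) {
            bits.set(bucket(i, value));
        }
    }

    // False means "definitely not added"; true means "possibly added".
    boolean mightContain(String value) {
        for (int i = 0; i < a.length; i++) {
            if (!bits.get(bucket(i, value))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(64, 67L, new long[] {3L, 5L}, new long[] {7L, 11L});
        filter.add("alice");
        System.out.println(filter.mightContain("alice")); // prints true
    }
}
```

Inside a real UDF the same arithmetic would read its parameters from the state with the UDTValue getters and write the buckets back before returning it.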

Hope that helps.

Alex Dutra

On Thu, Apr 30, 2020 at 6:36 PM Andreas R. wrote:

>  Hello
>
> I am trying to extract sketches (e.g. bloom filter) from some given data.
> I came this far, questions below:
>
> CREATE TYPE bloomfilter_udt(
> n_as_sample_size int,
> m_as_number_of_buckets int,
> p_as_next_prime_above_m bigint,
> hash_for_string_coefficient_a list ,
> hash_for_number_coefficients_a list ,
> hash_for_number_coefficients_b list ,
> bloom_filter_as_map map
> );
>
> CREATE OR REPLACE FUNCTION bloomfilter_udf(state bloomfilter_udt, value
> text, sample_size int)
> CALLED ON NULL INPUT
> RETURNS bloomfilter_udt
> LANGUAGE java
> AS
> $$
> //put n_as_sample_size in result
> //if(state.getUDTValue("n_as_sample_size") ==
> null){state.setInt("n_as_sample_size", sample_size);};
> //do some more stuff to the bloomfilter_udt
> return state;
> $$;
>
> CREATE OR REPLACE AGGREGATE bloomfilter_uda(text, int)
> SFUNC bloomfilter_udf
> STYPE bloomfilter_udt
> INITCOND {};
>
> 1) I would pass the sample_size with a sub-query, e.g.
> ==> "SELECT bloomfilter_uda(name, (SELECT count(*) FROM test_table)) FROM
> test_table;" <==
> Is that possible with Cassandra?
>
> 2) When I try to register the bloomfilter_uda, I get the following error:
> ==> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid set literal for (dummy) of type bloomfilter_udt" <==
> Can I just pass Cassandra data types as a state (map, list, set)?
>
> 3) If I assume, all of the above is my bad, how can I access the props of
> the state? Like
> ==> state.n_as_sample_size <==
> Is this somehow possible?
>
> I am kind of struggling with Cassandra and I'd appreciate some help.
>
> Thanks
> Andreas
>
> -- andreasrimmelspac...@gmail.com
>
>


RE: [EXTERNAL] Re: Adding new DC results in clients failing to connect

2020-05-04 Thread Gediminas Blazys
Hello,

Thanks for the reply.

Following your advice we took a look at system.local for seed nodes and 
compared that data with nodetool ring. Both sources contain the same tokens for 
these specific hosts. Will continue looking into system.peers.

We have enabled more verbosity on the C# driver and this is the message that we 
get now:


ControlConnection: 05/03/2020 14:28:42.346 +03:00 : Updating keyspaces metadata
ControlConnection: 05/03/2020 14:28:42.377 +03:00 : Rebuilding token map
ControlConnection: 05/03/2020 14:29:03.837 +03:00 : Finished building TokenMap for 7 keyspaces and 210 hosts. It took 19403 milliseconds.
ControlConnection: 05/03/2020 14:29:03.901 +03:00 ALARMA: ENDPOINT: <>:9042 EXCEPTION: System.ArgumentException: The source argument contains duplicate keys.
   at System.Collections.Concurrent.ConcurrentDictionary`2.InitializeFromCollection(IEnumerable`1 collection)
   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 collection, IEqualityComparer`1 comparer)
   at System.Collections.Concurrent.ConcurrentDictionary`2..ctor(IEnumerable`1 collection)
   at Cassandra.TokenMap..ctor(TokenFactory factory, IReadOnlyDictionary`2 tokenToHostsByKeyspace, List`1 ring, IReadOnlyDictionary`2 primaryReplicas, IReadOnlyDictionary`2 keyspaceTokensCache, IReadOnlyDictionary`2 datacenters, Int32 numberOfHostsWithTokens)
   at Cassandra.TokenMap.Build(String partitioner, ICollection`1 hosts, ICollection`1 keyspaces)
   at Cassandra.Metadata.d__59.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable.ConfiguredTaskAwaiter.GetResult()
   at Cassandra.Connections.ControlConnection.d__44.MoveNext()

The error occurs in Cassandra.TokenMap. We are analyzing the objects that the 
driver initializes during token map creation, but we have yet to find the 
dictionary with duplicate keys.
Just to note, once this new DC is added to replication, the Python driver is 
unable to establish a connection either. cqlsh, though, seems to be OK. It is 
hard to say for sure, but for now at least, this issue seems to point to 
Cassandra.

Gediminas

From: Jorge Bay Gondra 
Sent: Thursday, April 30, 2020 11:45
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Adding new DC results in clients failing to connect

Hi,
You can enable logging in the driver to see what's happening under the hood: 
https://docs.datastax.com/en/developer/csharp-driver/3.14/faq/#how-can-i-enable-logging-in-the-driver
With logging information, it should be easy to track the issue down.

Can you query system.local and system.peers on a seed node / contact point to 
see whether the node list and token info are as expected? You can compare them 
to the nodetool ring output.

Not directly related: 256 vnodes is probably more than you want.

Thanks,
Jorge

On Thu, Apr 30, 2020 at 9:48 AM Gediminas Blazys 
<gediminas.bla...@microsoft.com.invalid> wrote:
Hello,

We have run into a very interesting issue and maybe some of you have 
encountered it or just have an idea where to look.

We are working towards adding new DCs into our cluster. Here's the current 
topology:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes

Recently we introduced a new DC6 (60 nodes) into our cluster. The joining and 
rebuilding of DC6 went smoothly, and clients are using it without issue. This 
is how it looked after joining DC6:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes
DC6 - 60 nodes

Next we wanted to add another DC7 (also 60 nodes) making it a total of 210 
nodes in the cluster, and while joining new nodes went smoothly, once we 
changed the replication of user defined keyspaces to include DC7, no clients 
were able to connect to Cassandra (regardless of which DC is being addressed). 
They would throw an exception that I have provided at the end of the email.

Cassandra version 3.11.4.
C# driver version 3.12.0. Also tested with 3.14.0. We use dc round robin policy 
and update ring metadata for connecting clients.
Number of vnodes per node: 256

The stack trace starts with an exception 'The source argument contains 
duplicate keys.'. Maybe you know what kind of data is in this dictionary? What 
data can be duplicated here?

Clients are unable to connect until the moment we remove DC7 from replication. 
Once replication is adjusted to exclude DC7,