Re: inter dc bandwidth calculation

2020-01-27 Thread Georg Brandemann
Hello,

just as a small addition: the numbers also depend on the consistency level
you use for reads. The estimates below only hold as long as your reads stay
local (LOCAL_ONE / LOCAL_QUORUM). If you read at ALL, QUORUM or EACH_QUORUM
etc., you also need to include the read volume in the calculation.

Regards,
Georg

On Wed, 15 Jan 2020 at 19:35, Osman Yozgatlıoğlu <osman.yozgatlio...@gmail.com> wrote:

> Thank you. I have an insight now.
>
> Regards,
> Osman
>
> On Wed, 15 Jan 2020 at 19:18, Reid Pinchback 
> wrote:
> >
> > Oh, duh.  Revise that.  I was forgetting that multi-dc writes are sent
> to a single node in the other dc and tagged to be forwarded to other nodes
> within the dc.
> >
> > So your quick-and-dirty estimate would be more like (write volume) x 2
> to leave headroom for random other mechanics.
> >
> > R
> >
> >
> > On 1/15/20, 11:07 AM, "Reid Pinchback" 
> wrote:
> >
> >
> > I would think that it would be largely driven by the replication
> factor.  It isn't that the sstables are forklifted from one dc to another,
> it's just that the writes being made to the memtables are also shipped
> around by the coordinator nodes as the writes happen.  Operations at the
> sstable level, like compactions, are local to the node.
> >
> > One potential wrinkle that I'm unclear on is related to repairs.  I don't
> > know if merkle trees are biased to mostly bounce around only intra-dc,
> > versus how often they are communicated inter-dc.  Note that even queries
> > can trigger some degree of repair traffic (read repair) if your usage
> > pattern is to read recently written data, because at the bleeding edge of
> > the recent changes you'll have more cases of rows not yet having settled
> > to a consistent state.
> >
> > If you want a quick-and-dirty heuristic, I'd probably take (write volume)
> > x (replication factor) x 2 as a guesstimate so you have some headroom for
> > C* and TCP mechanics, and then monitor to see what your real use is.
> >
> > R
> >
> >
> > On 1/15/20, 4:14 AM, "Osman Yozgatlıoğlu" <
> osman.yozgatlio...@gmail.com> wrote:
> >
> >
> > Hello,
> >
> > Is there any way to calculate the inter dc bandwidth requirements for
> > proper operation?
> > I can't find any info on this subject.
> > Can we say how much of the sstable data collected in one dc has to be
> > transferred to the other?
> > Then I could calculate the bandwidth from the sstables being generated.
> > I'm using TWCS with a one-hour window.
> >
> > Regards,
> > Osman
> >
> >


Re: AWS instance stop and start with EBS

2019-11-29 Thread Georg Brandemann
Hi Rahul

Also have a look at https://issues.apache.org/jira/browse/CASSANDRA-14358 .
We saw this on a 2.1.x cluster, where it also took ~10 minutes until the
restarted node was really fully available in the cluster. The echo ACKs
from some nodes simply never seemed to reach the target.
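
If you want to measure how long this takes in your own cluster, a small
timestamped polling loop helps (the same idea as the netstat loop Daemeon
suggests further down, just watching gossip state instead). A minimal sketch
in Python, assuming nodetool is on the PATH and using a placeholder IP for
the restarted node:

    import datetime
    import subprocess
    import time

    target_ip = "10.0.0.1"  # placeholder: IP of the node that was restarted

    while True:
        # Ask this node's view of the ring and pull out the line for the target.
        out = subprocess.run(["nodetool", "status"],
                             capture_output=True, text=True).stdout
        line = next((l for l in out.splitlines() if target_ip in l), "")
        print(datetime.datetime.now().isoformat(),
              line.strip() or "node not listed yet")
        if line.strip().startswith("UN"):  # Up/Normal again from this node's view
            break
        time.sleep(10)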

Georg

On Wed, 6 Nov 2019 at 21:41, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

> Thanks Daemeon,
>
> I will do that and post the results.
> I found a Jira in open state with a similar issue:
> https://issues.apache.org/jira/browse/CASSANDRA-13984
>
> On Wed, Nov 6, 2019 at 1:49 PM daemeon reiydelle 
> wrote:
>
>> No connection timeouts? No TCP-level retries? I am truly sorry, but you
>> have exceeded my capability. I have never seen a java.io timeout without
>> either a half-open session failure (no response) or multiple retries.
>>
>> I am out of my depth, so please feel free to ignore, but did you see the
>> packets that are making the initial connection (which must have timed out)?
>> Out of curiosity, a netstat -arn should be showing bad packets, timeouts,
>> etc. To see progress, create a simple shell script that dumps the date,
>> dumps netstat, sleeps 100 seconds, and repeats. During that window, stop
>> the remote node, wait 10 seconds, and restart it.
>>
>> <==>
>> Made weak by time and fate, but strong in will,
>> To strive, to seek, to find, and not to yield.
>> Ulysses - A. Lord Tennyson
>>
>> *Daemeon C.M. Reiydelle*
>>
>> *email: daeme...@gmail.com *
>> *San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*
>>
>>
>>
>> On Wed, Nov 6, 2019 at 9:11 AM Rahul Reddy 
>> wrote:
>>
>>> Thank you.
>>>
>>> I have stopped the instance in east. I see that all other instances can
>>> gossip to that instance, and only one instance in west is having issues
>>> gossiping to that node. When I enable debug mode I see the following on
>>> the west node.
>>>
>>> I see the below messages from 16:32 to 16:47:
>>> DEBUG [RMI TCP Connection(272)-127.0.0.1] 2019-11-06 16:44:50,417
>>> StorageProxy.java:2361 - Hosts not in agreement. Didn't get a
>>> response from everybody:
>>> DEBUG [RMI TCP Connection(272)-127.0.0.1] 2019-11-06 16:44:50,424
>>> StorageProxy.java:2361 - Hosts not in agreement. Didn't get a
>>> response from everybody:
>>>
>>> Later I see a timeout:
>>>
>>> DEBUG [MessagingService-Outgoing-/eastip-Gossip] 2019-11-06 16:47:04,831
>>> OutboundTcpConnection.java:350 - Error writing to /eastip
>>> java.io.IOException: Connection timed out
>>>
>>> then INFO  [GossipStage:1] 2019-11-06 16:47:05,792
>>> StorageService.java:2289 - Node /eastip state jump to NORMAL
>>>
>>> DEBUG [GossipStage:1] 2019-11-06 16:47:06,244 MigrationManager.java:99 -
>>> Not pulling schema from /eastip, because schema versions match:
>>> local/real=cdbb639b-1675-31b3-8a0d-84aca18e86bf,
>>> local/compatible=49bf1daa-d585-38e0-a72b-b36ce82da9cb,
>>> remote=cdbb639b-1675-31b3-8a0d-84aca18e86bf
>>>
>>> I tried running tcpdump during that window and I don't see any packet
>>> loss. I'm still unsure why the east instance, which was stopped and
>>> started, was unreachable from the west node for almost 15 minutes.
>>>
>>>
>>> On Tue, Nov 5, 2019 at 10:14 PM daemeon reiydelle 
>>> wrote:
>>>
 10 minutes is 600 seconds, and there are several timeouts that are set
 to that, including the data center timeout as I recall.

 You may be forced to tcpdump the interface(s) to see where the chatter
 is. Out of curiosity, when you restart the node, have you snapped the jvm's
 memory to see if e.g. heap is even in use?


 On Tue, Nov 5, 2019 at 7:03 PM Rahul Reddy 
 wrote:

> Thanks Ben,
> Before stopping the EC2 instance I did run nodetool drain, so I ruled that
> out, and system.log also doesn't show commit logs being applied.
>
>
>
>
>
> On Tue, Nov 5, 2019, 7:51 PM Ben Slater 
> wrote:
>
>> The logs between first start and handshaking should give you a
>> clue but my first guess would be replaying commit logs.
>>
>> Cheers
>> Ben
>>
>> ---
>>
>>
>> *Ben Slater* | *Chief Product Officer*
>>
>>
>>
>> On Wed, 6 Nov 2019 at 04:36, Rahul Reddy 
>> wrote:
>>
>>> I can reproduce the issue.
>>>
>>> I did drain the Cassandra node, then stopped and started Cassandra 

Re: Upgrade to 3.11.1 gives SSLv2Hello is disabled error

2018-01-17 Thread Georg Brandemann
If I remember correctly, the protocol names differ between some JRE vendors.

With IBM Java, for instance, the protocol name would be TLSv12 (without the dot).

Are you using the same JRE on all nodes, and are the protocol name and cipher
names exactly the same on all nodes?
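
For reference, the workaround Tommy describes below (leaving the protocol
option unset so the JVM default protocol list is used) would look roughly
like this in cassandra.yaml. This is only a sketch, using the placeholder
paths, passwords and cipher list from his original mail:

    server_encryption_options:
        internode_encryption: all
        keystore: /usr/share/cassandra/.ssl/server/keystore.jks
        keystore_password: 'x'
        truststore: /usr/share/cassandra/.ssl/server/truststore.jks
        truststore_password: 'x'
        # protocol deliberately not set: per CASSANDRA-10508 the JVM defaults
        # are used, and (as Stefan explains below) the accept side then still
        # allows SSLv2Hello handshakes from the remaining 3.0 nodes during
        # the rolling upgrade
        cipher_suites: [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
                        TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
                        TLS_RSA_WITH_AES_128_CBC_SHA]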

2018-01-17 14:51 GMT+01:00 Tommy Stendahl :

> Thanks for your response.
>
> I got it working by removing my protocol setting from the configuration on
> the 3.11.1 node so that it uses the default protocol setting. I'm not sure
> exactly how that changes things, so I need to investigate that. We don't
> have any custom ssl settings that should affect this, and we use
> jdk1.8.0_152.
>
> But I think this should have worked; as you say, SSLv2Hello should be
> enabled on the server side, so I don't understand why I can't specify
> TLSv1.2.
>
> /Tommy
>
>
> On 2018-01-17 11:03, Stefan Podkowinski wrote:
>
>> I think what this error indicates is that a client is trying to connect
>> using an SSLv2Hello handshake, while this protocol has been disabled on
>> the server side. Starting with the mentioned ticket, we use the JVM
>> default list of enabled protocols. What makes this issue a bit confusing
>> is that starting with Java 1.7, SSLv2Hello should be disabled by default
>> on the client side, but not on the server side. Cassandra should be able
>> to accept SSLv2Hello connections from 3.0 nodes just fine. What JRE do
>> you use? Any custom SSL-specific settings that might be effective here?
>>
>> On 16.01.2018 15:13, Tommy Stendahl wrote:
>>
>>> Hi,
>>>
>>> I have problems upgrading a cluster from 3.0.14 to 3.11.1 but when I
>>> upgrade the first node it fails to gossip.
>>>
>>> I have server encryption enabled on all nodes with this setting:
>>>
>>> server_encryption_options:
>>>  internode_encryption: all
>>>  keystore: /usr/share/cassandra/.ssl/server/keystore.jks
>>>  keystore_password: 'x'
>>>  truststore: /usr/share/cassandra/.ssl/server/truststore.jks
>>>  truststore_password: 'x'
>>>  protocol: TLSv1.2
>>>  cipher_suites: [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
>>>    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA]
>>>
>>>
>>> I get this error in the log:
>>>
>>> 2018-01-16T14:41:19.671+0100 ERROR [ACCEPT-/10.61.204.16]
>>> MessagingService.java:1329 SSL handshake error for inbound connection
>>> from 30f93bf4[SSL_NULL_WITH_NULL_NULL:
>>> Socket[addr=/x.x.x.x,port=40583,localport=7001]]
>>> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled
>>>  at
>>> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637)
>>> ~[na:1.8.0_152]
>>>  at sun.security.ssl.InputRecord.read(InputRecord.java:527)
>>> ~[na:1.8.0_152]
>>>  at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
>>> ~[na:1.8.0_152]
>>>  at
>>> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
>>> ~[na:1.8.0_152]
>>>  at
>>> sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:938)
>>> ~[na:1.8.0_152]
>>>  at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
>>> ~[na:1.8.0_152]
>>>  at sun.security.ssl.AppInputStream.read(AppInputStream.java:71)
>>> ~[na:1.8.0_152]
>>>  at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>> ~[na:1.8.0_152]
>>>  at
>>> org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:1303)
>>> ~[apache-cassandra-3.11.1.jar:3.11.1]
>>>
>>> I suspect that this has something to do with the change in
>>> CASSANDRA-10508. Any suggestions on how to get around this would be very
>>> much appreciated.
>>>
>>> Thanks, /Tommy
>>>
>>>
>>>