Re: Guava version check in 3.0

2016-02-11 Thread Andrew Jorgensen
To answer my own question: I was able to shade the dependencies in my jar,
which fixed the issue and allowed the job to run on Hadoop:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
      <relocations>
        <relocation>
          <pattern>com.google.common</pattern>
          <shadedPattern>com.foo.com.google.common</shadedPattern>
        </relocation>
      </relocations>
      <finalName>${project.artifactId}-${project.version}-jar-with-dependencies</finalName>
    </configuration>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <filters>
            <filter>
              <artifact>*:*</artifact>
              <excludes>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
              </excludes>
            </filter>
          </filters>
          <!-- assumption: this bare "true" is shadedArtifactAttached -->
          <shadedArtifactAttached>true</shadedArtifactAttached>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
              <mainClass>${main.class}</mainClass>
            </transformer>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>
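
With that in place the shaded jar comes out of the normal package step,
roughly:

  mvn clean package

which produces
${project.artifactId}-${project.version}-jar-with-dependencies.jar with the
driver's com.google.common references relocated to com.foo.com.google.common,
so the driver uses the bundled Guava instead of the Guava 11.0.2 that Hadoop
provides on the classpath.
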
-- 
Andrew Jorgensen
@ajorgensen

On Thu, Feb 11, 2016, at 03:40 PM, Andrew Jorgensen wrote:
> Hello,
> 
> I am trying to get a cassandra v3.0 cluster up and running with
> v3.0.0 of the datastax client. I am hitting a number of cases where I am
> running into the following exception:
> 
> java.lang.IllegalStateException: Detected Guava issue #1635 which
> indicates that a version of Guava less than 16.01 is in use.  This
> introduces codec resolution issues and potentially other incompatibility
> issues in the driver.  Please upgrade to Guava 16.01 or later.
> 
> I was wondering if there are any potential workarounds. In some cases I
> can work around this issue because I control the dependency, but in a
> number of cases (Hadoop) Guava is a provided dependency currently pinned
> at v11.0.2. In those cases I cannot change the version of Guava and
> therefore cannot use the datastax driver. Are there any workarounds for
> getting the driver working with older versions of Guava, or would it be
> possible to turn off the sanity check?
> 
> Thanks,
> -- 
> Andrew Jorgensen
> @ajorgensen


Non-zero nodes are marked as down after restarting cassandra process

2017-03-01 Thread Andrew Jorgensen
Hello,

I have a cassandra cluster running on cassandra 3.0.3 and am seeing some
strange behavior that I cannot explain when restarting cassandra nodes. The
cluster is currently set up in a single datacenter and consists of 55 nodes.
I am currently in the process of restarting nodes in the cluster, but have
noticed that after restarting the cassandra process with `service cassandra
stop; service cassandra start`, when the node comes back and I run `nodetool
status` there is usually a non-zero number of nodes in the rest of the
cluster that are marked as DN. If I go to another node in the cluster, from
its perspective all nodes, including the restarted one, are marked as UN. It
seems to take ~15 to 20 minutes before the restarted node is updated to show
all nodes as UN. During those 15 minutes, writes and reads to the cluster
appear to be degraded and do not recover unless I stop the cassandra process
again or wait for all nodes to be marked as UN. The cluster also has 3 seed
nodes, which are up and available the whole time during this process.
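
For concreteness, the restart and the checks I am running look roughly like
this:

  # on the node being restarted
  service cassandra stop
  service cassandra start

  # back on the restarted node once it is up
  nodetool status       # several peers show DN for ~15-20 minutes
  nodetool gossipinfo   # yet every node has a status of NORMAL

  # on any other node
  nodetool status       # all nodes, including the restarted one, show UN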

I have also tried running `nodetool gossipinfo` on the restarted node and according to
the output all nodes have a status of NORMAL. Has anyone seen this before
and is there anything I can do to fix/reduce the impact of running a
restart on a cassandra node?

Thanks,
Andrew Jorgensen
@ajorgensen


Re: Non-zero nodes are marked as down after restarting cassandra process

2017-05-16 Thread Andrew Jorgensen
Thanks for the info!

When you say "overall stability problems due to some bugs", can you
elaborate on whether those were bugs in cassandra that were fixed by an
upgrade, or bugs in your own code and in how you used cassandra? If the
latter, would it be possible to highlight what the most impactful fix was on
the usage side?

As far as I can tell there are no dropped messages; there are some pending
compactions and a few Native-Transport-Requests in the "All time blocked"
column.
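
For reference, I am pulling those numbers with roughly:

  nodetool tpstats | grep Native-Transport      # pending / all time blocked
  nodetool tpstats | grep -A 20 'Message type'  # dropped message counts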

Thanks!

Andrew Jorgensen
@ajorgensen

On Wed, Mar 1, 2017 at 12:58 PM, benjamin roth <brs...@gmail.com> wrote:

> You should always drain nodes before stopping the daemon whenever
> possible. This avoids commitlog replay on startup, which can take a while.
> But according to your description, commitlog replay does not seem to be
> the cause.
>
> I once saw a similar effect. Some nodes appeared down to some other nodes
> and up to others. At that time the cluster had overall stability problems
> due to some bugs. After those bugs were fixed, I haven't seen this effect
> any more.
>
> If that happens to you again, you could check your logs or "nodetool
> tpstats" for dropped messages, and watch out for suspicious network-related
> log entries and the load of your nodes in general.
>
> 2017-03-01 17:36 GMT+01:00 Ben Dalling <b.dall...@locp.co.uk>:
>
>> Hi Andrew,
>>
>> We were having problems with gossip TCP connections being held open and
>> changed our SOP for stopping cassandra to:
>>
>> nodetool disablegossip
>> nodetool drain
>> service cassandra stop
>>
>> This seemed to shut down gossip cleanly (the nodetool drain is advised
>> as well) and meant that the node rejoined the cluster fine after issuing
>> "service cassandra start".
>>
>> *Ben*
>>
>> On 1 March 2017 at 16:29, Andrew Jorgensen <and...@andrewjorgensen.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have a cassandra cluster running on cassandra 3.0.3 and am seeing some
>>> strange behavior that I cannot explain when restarting cassandra nodes. The
>>> cluster is currently set up in a single datacenter and consists of 55 nodes.
>>> I am currently in the process of restarting nodes in the cluster, but have
>>> noticed that after restarting the cassandra process with `service cassandra
>>> stop; service cassandra start`, when the node comes back and I run `nodetool
>>> status` there is usually a non-zero number of nodes in the rest of the
>>> cluster that are marked as DN. If I go to another node in the cluster, from
>>> its perspective all nodes, including the restarted one, are marked as UN.
>>> It seems to take ~15 to 20 minutes before the restarted node is updated to
>>> show all nodes as UN. During those 15 minutes, writes and reads to the
>>> cluster appear to be degraded and do not recover unless I stop the
>>> cassandra process again or wait for all nodes to be marked as UN. The
>>> cluster also has 3 seed nodes, which are up and available the whole time
>>> during this process.
>>>
>>> I have also tried running `nodetool gossipinfo` on the restarted node and according
>>> to the output all nodes have a status of NORMAL. Has anyone seen this
>>> before and is there anything I can do to fix/reduce the impact of running a
>>> restart on a cassandra node?
>>>
>>> Thanks,
>>> Andrew Jorgensen
>>> @ajorgensen
>>>
>>
>>
>


InternalResponseStage low on some nodes

2017-05-23 Thread Andrew Jorgensen
Hello,

I will preface this by saying that all of the nodes have been running for
about the same amount of time and were not restarted before running
nodetool tpstats.

This is more for my understanding than anything else, but I have a 20 node
cassandra cluster running cassandra 3.0.3. I have 0 reads and 0 writes going
to the cluster, and yet I see a strange banding of load average across the
nodes.

18 out of 20 of the nodes sit around a 15 min LA of 0.3 while 2 nodes are at
0.08. Once I applied writes to the cluster, all of the load averages
increased by the same proportion, so 18 out of 20 nodes increased to ~0.4
and 2 nodes increased to 0.1.

When I look at what is different between these nodes, all of which have
been running for the same amount of time, the only numerical difference in
nodetool tpstats is the InternalResponseStage. 18 out of 20 of the nodes
are in the range of 20,000 completed while 2 are only at 300. Interestingly,
it also looks like the 2 nodes on the low side are exhibiting symptoms of
https://issues.apache.org/jira/browse/CASSANDRA-11090.
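
For what it is worth, I am pulling that number with roughly the following on
each host, and the Completed column is the one that differs:

  nodetool tpstats | grep InternalResponseStage
  # Completed is in the range of 20,000 on 18 nodes and ~300 on the other 2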

I restarted one of the low nodes and it immediately jumped up to match the
other 18 nodes in the cluster, settling at a 15 minute LA of around 0.4.

I am curious about two things. First, why is the InternalResponseStage count
so low on two of the nodes in the cluster, and what does that mean? Second,
is it atypical for the other 18 nodes in the cluster to have such a high
number of completed InternalResponseStage tasks, or do these numbers seem
reasonable for an idle cluster?

Thanks!
Andrew Jorgensen
@ajorgensen