Re: Read-repair working, repair not working?

2013-02-11 Thread Brian Fleming
> When a repair session starts it logs this
>
> logger.info(String.format("[repair #%s] new session: will sync %s
> on range %s for %s.%s", getName(), repairedNodes(), range, tablename,
> Arrays.toString(cfnames)));
> 
> When it completes it logs this
> 
> logger.info(String.format("[repair #%s] session completed successfully", 
> getName()));
> 
> Or this on failure 
> 
> logger.error(String.format("[repair #%s] session completed with the following 
> error", getName()), exception);
> 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 10/02/2013, at 9:56 PM, Brian Fleming  wrote:
> 
>> Hi,
>> 
>> I have a 20 node cluster running v1.0.7 split between 5 data centres, each
>> with an RF of 2, containing a ~1TB unique dataset/~10TB of total data.
>> 
>> I’ve had some intermittent issues with a new data centre (3 nodes, RF=2) I
>> brought online late last year with data consistency & availability: I’d
>> request data, nothing would be returned, I would then re-request the data
>> and it would correctly be returned, i.e. read-repair appeared to be
>> occurring.  However, running repairs on the nodes didn’t resolve this (I
>> tried general ‘repair’ commands as well as targeted keyspace commands) –
>> this didn’t alter the behaviour.
>> 
>> After a lot of fruitless investigation, I decided to wipe &
>> re-install/re-populate the nodes.  The re-install & repair operations are
>> now complete: I see the expected amount of data on the nodes, however I am
>> still seeing the same behaviour, i.e. I only get data after one failed
>> attempt.
>> 
>> When I run repair commands, I don’t see any errors in the logs.
>> I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during
>> repair sessions.
>> I see a number of dropped ‘MUTATION’ operations: just under 5% of the total
>> ‘MutationStage’ count.
>> 
>> Questions:
>> -  Could anybody suggest anything specific to look at to see why the
>> repair operations aren’t having the desired effect?
>> -  Would increasing the logging level to ‘DEBUG’ show read-repair
>> activity (to confirm that this is happening, when & for what proportion of
>> total requests)?
>> -  Is there something obvious that I could be missing here?
>> 
>> Many thanks,
>> Brian
> 
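A note on putting Aaron's log lines to work: the sketch below tallies repair
session starts, successes, and failures from a node's log. It's a minimal
sketch, not a definitive tool - the log path is illustrative and the matched
strings are taken verbatim from the snippets quoted above.

import java.io.BufferedReader;
import java.io.FileReader;

// Tally repair session outcomes in a Cassandra log, keying off the three
// messages quoted above. The log path is an assumption; adjust as needed.
public class RepairLogTally {
    public static void main(String[] args) throws Exception {
        int started = 0, succeeded = 0, failed = 0;
        BufferedReader in = new BufferedReader(new FileReader("/var/log/cassandra/system.log"));
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.contains("[repair #")) continue;
            if (line.contains("new session")) started++;
            else if (line.contains("session completed successfully")) succeeded++;
            else if (line.contains("session completed with the following error")) failed++;
        }
        in.close();
        System.out.printf("repair sessions: %d started, %d succeeded, %d failed%n",
                started, succeeded, failed);
    }
}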


Read-repair working, repair not working?

2013-02-10 Thread Brian Fleming
Hi,

I have a 20 node cluster running v1.0.7 split between 5 data centres, each
with an RF of 2, containing a ~1TB unique dataset/~10TB of total data.

I’ve had some intermittent issues with a new data centre (3 nodes, RF=2) I
brought online late last year with data consistency & availability: I’d
request data, nothing would be returned, I would then re-request the data
and it would correctly be returned, i.e. read-repair appeared to be
occurring.  However, running repairs on the nodes didn’t resolve this (I
tried general ‘repair’ commands as well as targeted keyspace commands) –
this didn’t alter the behaviour.

After a lot of fruitless investigation, I decided to wipe &
re-install/re-populate the nodes.  The re-install & repair operations are
now complete: I see the expected amount of data on the nodes, however I am
still seeing the same behaviour, i.e. I only get data after one failed
attempt.

When I run repair commands, I don’t see any errors in the logs.

I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during
repair sessions.

I see a number of dropped ‘MUTATION’ operations: just under 5% of the
total ‘MutationStage’ count.

Questions:

-  Could anybody suggest anything specific to look at to see why the
repair operations aren’t having the desired effect?

-  Would increasing the logging level to ‘DEBUG’ show read-repair
activity (to confirm that this is happening, when & for what proportion of
total requests)?

-  Is there something obvious that I could be missing here?

Many thanks,

Brian
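On the DEBUG question above: Cassandra 1.0 uses log4j 1.2, so read-path
logging can be raised either by editing conf/log4j-server.properties and
restarting, or programmatically as sketched below. The logger name here is
an assumption - verify the class names against the 1.0.7 source before
relying on them.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Hedged sketch: raise the read path to DEBUG so read-repair activity
// shows up in the log. The logger name is an assumption for 1.0.x; the
// equivalent log4j-server.properties line would be e.g.
//   log4j.logger.org.apache.cassandra.service=DEBUG
// Note this call only affects the JVM it runs in, so it belongs inside
// the server process (or use the properties file instead).
public class EnableReadRepairDebug {
    public static void main(String[] args) {
        Logger.getLogger("org.apache.cassandra.service").setLevel(Level.DEBUG);
    }
}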



Re: Cassandra upgrade issues...

2012-11-01 Thread Brian Fleming
Hi Sylvain,

Simple as that!!!  Using the 1.1.5 nodetool version works as expected.  My
mistake.

Many thanks,

Brian



On Thu, Nov 1, 2012 at 8:24 AM, Sylvain Lebresne wrote:

> The first thing I would check is whether nodetool is using the right jar. It
> sounds a lot like the server has been correctly updated but nodetool
> hasn't and is still using the old classes.
> Check the nodetool executable: it's a shell script, so try echoing
> the CLASSPATH in there and check that it correctly points to what it should.
>
> --
> Sylvain
>
> On Thu, Nov 1, 2012 at 9:10 AM, Brian Fleming wrote:
> > Hi,
> >
> >
> >
> > I was testing upgrading from Cassandra v.1.0.7 to v.1.1.5 yesterday on a
> > single node dev cluster with ~6.5GB of data & it went smoothly in that no
> > errors were thrown, the data was migrated to the new directory structure, I
> > can still read/write data as expected, etc.  However nodetool commands are
> > behaving strangely – full details below.
> >
> >
> >
> > I couldn’t find anything relevant online relating to these exceptions – any
> > help/pointers would be greatly appreciated.
> >
> >
> >
> > Thanks & Regards,
> >
> >
> >
> > Brian
> >
> > ‘nodetool cleanup’ runs successfully
> >
> > ‘nodetool info’ produces:
> >
> > Token: 82358484304664259547357526550084691083
> > Gossip active: true
> > Load : 7.69 GB
> > Generation No: 1351697611
> > Uptime (seconds) : 58387
> > Heap Memory (MB) : 936.91 / 1928.00
> > Exception in thread "main" java.lang.ClassCastException: java.lang.String
> > cannot be cast to org.apache.cassandra.dht.Token
> >         at org.apache.cassandra.tools.NodeProbe.getEndpoint(NodeProbe.java:546)
> >         at org.apache.cassandra.tools.NodeProbe.getDataCenter(NodeProbe.java:559)
> >         at org.apache.cassandra.tools.NodeCmd.printInfo(NodeCmd.java:313)
> >         at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:651)
> >
> > ‘nodetool repair’ produces:
> >
> > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> >         at $Proxy0.forceTableRepair(Unknown Source)
> >         at org.apache.cassandra.tools.NodeProbe.forceTableRepair(NodeProbe.java:203)
> >         at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:880)
> >         at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:719)
> > Caused by: javax.management.ReflectionException: Signature mismatch for
> > operation forceTableRepair: (java.lang.String, [Ljava.lang.String;) should
> > be (java.lang.String, boolean, [Ljava.lang.String;)
> >         at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:152)
> >         at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:117)
> >         at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
> >         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
> >         at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
> >         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
> >         at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
> >         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
> >         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
> >         at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >
> 
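The "Caused by" above tells the same story Sylvain does: the 1.1.5 server
exposes forceTableRepair(String, boolean, String...) while a stale 1.0.7
nodetool still calls the two-argument form. A minimal sketch for checking
which cassandra jar a JVM actually resolves - run it with the CLASSPATH the
nodetool script assembles:

import java.io.File;

// Print every classpath entry mentioning cassandra, to confirm which
// version of the tool classes would be loaded.
public class ClasspathCheck {
    public static void main(String[] args) {
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            if (entry.toLowerCase().contains("cassandra")) {
                System.out.println(entry);
            }
        }
    }
}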

Node repair : excessive data

2011-12-12 Thread Brian Fleming
Hi,

We simulated a node 'failure' on one of our nodes by deleting the entire
Cassandra installation directory & reconfiguring a fresh instance with the
same token.  When we issued a 'repair' it started streaming data back onto
the node as expected.

However after the repair completed, we had over 2.5 times the original
load.  Issuing a 'cleanup' reduced this to about 1.5 times the original
load.  We observed an increase in the number of keys via 'cfstats' which is
obviously accounting for the increased load.

Would anybody know why the repair pulled more keys in than it had initially
with the same token?  How can we avoid this recurring?

If we didn't have sufficient headroom on the disk to handle say 3 times the
load, we could be in a difficult situation should we experience a genuine
failure.

(we're using Cassandra 1.0.5, 12 nodes split across 2 data centres, total
cluster load during testing was about 150GB)

Many thanks,

Brian
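The headroom worry can be made concrete from the numbers in this thread:
repair inflated the node to roughly 2.5 times its original load before
cleanup brought it back to about 1.5 times. A back-of-envelope sketch,
assuming the ~150GB cluster load is spread evenly across the 12 nodes (an
assumption - actual per-node loads will vary):

// Rough disk headroom needed to survive the transient repair peak
// observed in this thread. All figures are illustrative.
public class RepairHeadroom {
    public static void main(String[] args) {
        double perNodeGB = 150.0 / 12;   // ~12.5 GB per node, assuming even spread
        double peakMultiplier = 2.5;     // observed post-repair, pre-cleanup peak
        System.out.printf("peak during repair: ~%.1f GB; free space to reserve: ~%.1f GB%n",
                perNodeGB * peakMultiplier, perNodeGB * (peakMultiplier - 1));
    }
}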


Re: Efficiency of Cross Data Center Replication...?

2011-11-16 Thread Brian Fleming
Great - thanks Jake

B.

On Wed, Nov 16, 2011 at 8:40 PM, Jake Luciani  wrote:

> the former
>
>
> On Wed, Nov 16, 2011 at 3:33 PM, Brian Fleming wrote:
>
>>
>> Hi All,
>>
>> I have a question about inter-data centre replication: if you have 2
>> data centres, each with a local RF of 2 (i.e. total RF of 4), and write to a
>> node in DC1, how efficient is the replication to DC2 - i.e. is that data:
>>  - replicated over to a single node in DC2 once and internally replicated
>>  or
>>  - replicated explicitly to two separate nodes?
>>
>> Obviously from a LAN resource utilisation perspective, the former would
>> be preferable.
>>
>> Many thanks,
>>
>> Brian
>>
>>
>
>
> --
> http://twitter.com/tjake
>
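Jake's "the former" means a write crosses the WAN once per remote data centre
and is fanned out to the other local replicas on arrival. A back-of-envelope
sketch of what that saves with 2 replicas in the remote DC (the 1MB payload
is illustrative):

// WAN bytes per write under the two strategies in the question.
// "Forwarded once" matches Jake's answer; the naive alternative would
// send one copy per remote replica.
public class WanCost {
    public static void main(String[] args) {
        long payloadBytes = 1000000L;  // illustrative 1 MB write
        int remoteReplicas = 2;        // RF=2 in the remote DC
        System.out.println("forwarded once, fanned out locally: " + payloadBytes + " B over WAN");
        System.out.println("one copy per remote replica:        " + (remoteReplicas * payloadBytes) + " B over WAN");
    }
}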


Efficiency of Cross Data Center Replication...?

2011-11-16 Thread Brian Fleming
Hi All,

I have a question about inter-data centre replication: if you have 2 data
centres, each with a local RF of 2 (i.e. total RF of 4), and write to a node
in DC1, how efficient is the replication to DC2 - i.e. is that data:
 - replicated over to a single node in DC2 once and internally replicated
 or
 - replicated explicitly to two separate nodes?

Obviously from a LAN resource utilisation perspective, the former would be
preferable.

Many thanks,

Brian


Monitoring....

2011-10-12 Thread Brian Fleming
Hi,

Has anybody used any solutions for harvesting and storing Cassandra JMX
metrics for monitoring, trend analysis, etc.?

JConsole is useful for single-node monitoring but it isn't scalable & the
data obviously doesn't persist between sessions...

Many thanks,

Brian
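For harvesting, the core loop is a JMX poll like the sketch below; storing
the samples is then just a matter of appending them somewhere (RRD files, a
column family in Cassandra itself, etc.). java.lang:type=Memory is a standard
platform MBean; the host/port and whichever Cassandra beans you actually care
about are assumptions - browse them in JConsole first.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hedged sketch: connect over JMX/RMI (7199 is Cassandra's default JMX
// port) and read one attribute - the harvest half of harvest-and-store.
public class JmxPoll {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            Object heap = mbs.getAttribute(new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            System.out.println("HeapMemoryUsage = " + heap);
        } finally {
            connector.close();
        }
    }
}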