Removed node jumps back into the cluster

2012-09-11 Thread Fredrik
I've tested a scenario where I wanted to reuse a removed node in a new 
cluster with the same IP. It's probably not very common, but I found some 
strange behaviour in the Gossiper.


Here is what I think/see happening:
- Cassandra 1.1. Three-node cluster: A, B and C.
- Shut down node C and removed node C's token (command sketch after this list).
- Everything looks OK in the logs, which report that node C is removed etc.
- Nodes A and B still send Gossip digests about the removed node, but I 
guess that's OK since they know about it (Gossiper.endpointStateMap).

- Node C has status removed when checking in the JMX console.
- Checked in LocationInfo that the ring only contains the tokens/IPs for nodes A and B.
- Removed the system/data tables for C.
- Changed the seed on C to point to itself.
- Started node C; node C only gossips about itself, and nodes A and B don't 
recognize that node C is running, which is correct.
- Restarted e.g. node A. Now node A loses all gossip information 
(Gossiper.endpointStateMap) about node C. Node A requests information from 
LocationInfo and asks node B about endpoint states. Node A receives 
information from node B about node C; this triggers 
Gossiper.handleMajorStateChange, and node C is first marked as unreachable 
because it's in a dead state (removed). Node A then tries to gossip to the 
unreachable endpoints, including node C, which replies that it's up, and 
node C becomes incorporated into the old cluster again.
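
For reference, the token-removal step in the list above is the standard 1.1-era
nodetool command; a sketch, run from a surviving node, with the token a
placeholder for node C's token:

  nodetool -h localhost removetoken <token-of-node-C>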


Is this a bug, or is it a requirement that if you take a node out of 
the cluster, you must change the IP of the removed node if you want to use it 
in another cluster?

Please enlighten me.

Regards
/Fredrik





Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Omid Aladini
Which version of Cassandra was your data initially created with?

A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
and inter-level overlaps in CFs with Leveled Compaction. Your sstables
generated with 1.1.3 and later should not have this issue [1] [2].

In case you have old Leveled-compacted sstables (generated with 1.1.2
or earlier, including 1.0.x), you need to run an offline scrub using
Cassandra 1.1.4 or later via the bin/sstablescrub command, which will fix
out-of-order sstables and inter-level overlaps caused by previous
versions of LCS. You need to take nodes down in order to run the offline
scrub.
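
To spell out the offline part, the procedure per node is roughly as follows;
a sketch, with the keyspace and column family names as placeholders:

  # the node must be down -- sstablescrub cannot run against a live node
  nodetool -h localhost drain
  # ... stop the Cassandra process ...
  bin/sstablescrub <keyspace> <column_family>
  # ... start Cassandra again ...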

 After 3 hours the job is done and there are 11390 compaction tasks pending.
 My question: Can these assertions be ignored? Or do I need to worry about
 it?

They can't be ignored, since pending compactions raise the upper
bound on the number of disk seeks needed to read a row, and you
don't get the nice guarantees of leveled compaction.

Cheers,
Omid

[1] https://issues.apache.org/jira/browse/CASSANDRA-4411
[2] https://issues.apache.org/jira/browse/CASSANDRA-4321

On Mon, Sep 10, 2012 at 6:37 PM, Rudolf van der Leeden
rudolf.vanderlee...@scoreloop.com wrote:
 Hi,

 I'm getting 5 identical assertions while running 'nodetool cleanup' on a
 Cassandra 1.1.4 node with Load=104G and 80m keys.
 From  system.log :

 ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:576,1,main]
 java.lang.AssertionError
 at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
 at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
 at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
 at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
 at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
 at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
 at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 After 3 hours the job is done and there are 11390 compaction tasks pending.
 My question: Can these assertions be ignored? Or do I need to worry about
 it?

 Thanks for your help and best regards,
 -Rudolf.



Re: [RELEASE] Apache Cassandra 1.1.5 released

2012-09-11 Thread André Cruz
I'm also having AssertionErrors.

ERROR [ReadStage:51687] 2012-09-10 14:33:54,211 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51687,5,main]
java.io.IOError: java.io.EOFException
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:64)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:63)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:816)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1250)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:399)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:54)
... 14 more
ERROR [ReadStage:51801] 2012-09-10 14:44:38,852 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51801,5,main]
java.lang.AssertionError: DecoratedKey(12064825934064381804725403203980154559, 
0bc7e1c580001170726573656e746174696f6e5f707074200017696d6167652f782d706f727461626c652d7069786d617000420013000102487fff8001000469636f6e04c90f4f6527560007554e4b4e4f574e000a746578742f782d74657800420013000102487fff8001000469636f6e04c90f8a0b80ac0007554e4b4e4f574e00466170706c69636174696f6e2f766e642e6f70656e786d6c666f726d6174732d6f696365646f63756d656e742e70726573656e746174696f6e6d6c2e736c69646573686f77004c0013000102487fff8001000469636f6e04c90bc7e19a88001170726573656e746174696f6e5f7070732a746578742f782d632b2b00420013000102487fff8001000469636f6e04c90f4e902aaa0007554e4b4e4f574e000c696d6167652f782d78706d6900420013000102487fff8001000469636f6e04c90f4b8360f20007554e4b4e4f574e0013696d6167652f782d77696e646f77732d626d7000440013000102487fff8001000469636f6e04c90bc7de8969696d6167655f626d7000156170706c69636174696f6e2f782d646f736578656300490013000102487fff8001000469636f6e04c90bc7dd973e61706c69636174696f6e5f6578650009766964656f2f64766400430013000102487fff8001000469636f6e04c90bc7e07598746578745f766f620008746578742f63737300430013000102487fff8001000469636f6e04c90bc7e07d68746578745f637373001d6170706c69636174696f6e2f782d73686f636b776176652d666c61736800440013000102487fff8001000469636f6e04c90bc7deb079766964656f5f737766000a746578742f782d61776b00420013000102487fff8001000469636f6e04c9117d73ced50007554e4b4e4f574e00186170706c69636174696f6e2f766e642e6d732d657863656c00430013000102487fff8001000469636f6e04c90bc7df19e80008746578745f786c73000f766964656f2f717569636b74696d6500)
 != DecoratedKey(121031529647353036275964125031804748412, 
6170706c69636174696f6e2f7a6970) in 

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Rudolf van der Leeden

 Which version of Cassandra was your data initially created with?
 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].
 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier, including 1.0.x), you need to run an offline scrub using
 Cassandra 1.1.4 or later via the bin/sstablescrub command, which will fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run the offline
 scrub.


The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT
leveled compaction).
After the upgrade to 1.1.4 we changed from STCS to LCS without problems.
Then we ran more tests and created more very big keys with millions of
columns.
The assertion only shows up with one particular CF containing these big
keys.
So, from your explanation, I don't think an offline scrub will help.

Thanks,
-Rudolf.


Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Omid Aladini
Could you, as Aaron suggested, open a ticket?

-- Omid

On Tue, Sep 11, 2012 at 2:35 PM, Rudolf van der Leeden
rudolf.vanderlee...@scoreloop.com wrote:
 Which version of Cassandra was your data initially created with?
 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].
 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier, including 1.0.x), you need to run an offline scrub using
 Cassandra 1.1.4 or later via the bin/sstablescrub command, which will fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run the offline
 scrub.


 The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT
 leveled compaction).
 After the upgrade to 1.1.4 we changed from STCS to LCS without problems.
 Then we ran more tests and created more very big keys with millions of
 columns.
 The assertion only shows up with one particular CF containing these big
 keys.
 So, from your explanation, I don't think an offline scrub will help.

 Thanks,
 -Rudolf.



Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Rudolf van der Leeden
 Could you, as Aaron suggested, open a ticket?


Done:  https://issues.apache.org/jira/browse/CASSANDRA-4644


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-11 Thread Jonathan Ellis
Relatedly, I'd love to learn how to reliably reproduce full GC pauses
on C* 1.1+.

On Mon, Sep 10, 2012 at 12:37 PM, Oleg Dulin oleg.du...@gmail.com wrote:
 I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7.

 It is my feeble attempt to reduce Full GC pauses.

 Has anyone had any experience with this ? Anyone tried it ?

 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-11 Thread Shahryar Sedghi
I was able to run IBM Java 7 with Cassandra (I could not do it with 1.6
because of snappy). It has a new garbage collection policy (called
balanced) that is good for very large heap sizes (over 8 GB), documented here:
http://www.ibm.com/developerworks/websphere/techjournal/1108_sciampacone/1108_sciampacone.html
It looks very promising for Cassandra. I have not tried it, but I'd like to
see how it does in action.

Regards

Shahryar

On Mon, Sep 10, 2012 at 1:37 PM, Oleg Dulin oleg.du...@gmail.com wrote:

 I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7.

 It is my feeble attempt to reduce Full GC pauses.

 Has anyone had any experience with this ? Anyone tried it ?

 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/





Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Janne Jalkanen

 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].

Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using 
them for semi-wide frequently-updated CounterColumns and they're performing 
much better on LCS than on STCS.

 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier, including 1.0.x), you need to run an offline scrub using
 Cassandra 1.1.4 or later via the bin/sstablescrub command, which will fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run the offline
 scrub.

The  1.1.5 README does not mention this. Should it?

/Janne



Compound Keys: Connecting the dots between CQL3 and Java APIs

2012-09-11 Thread Brian O'Neill
Our data architects (ex-Oracle DBA types) are jumping on the CQL3
bandwagon and creating schemas for us.  That triggered me to write a
quick article mapping the CQL3 schemas to how they are accessed via
Java APIs (for our dev team).

I hope others find this useful as well:
http://brianoneill.blogspot.com/2012/09/composite-keys-connecting-dots-between.html
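
The one-paragraph version, for anyone not following the link: with a CQL3
compound primary key, the first key component becomes the storage-engine row
key and the remaining components are folded into composite column names, which
is what Thrift-based Java APIs like Hector and Astyanax see. A made-up example:

  CREATE TABLE timeline (
      user_id   varchar,
      posted_at timestamp,
      body      varchar,
      PRIMARY KEY (user_id, posted_at)
  );

  -- Under the hood there is one storage row per user_id, and each CQL row
  -- becomes a composite column named (posted_at, 'body') holding the body.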

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: replace_token code?

2012-09-11 Thread aaron morton
This looks correct…

  INFO [GossipStage:1] 2012-09-10 08:01:23,036 Gossiper.java (line 850) Node /10.72.201.80 is now part of the cluster

  INFO [GossipStage:1] 2012-09-10 08:01:23,037 Gossiper.java (line 816) InetAddress /10.72.201.80 is now UP

80 joined the ring because it was in the stored ring state. 

 INFO [GossipStage:1] 2012-09-10 08:01:23,038 StorageService.java (line 1126) Nodes /10.72.201.80 and /10.190.221.204 have the same token 166594924822352415786406422619018814804.  Ignoring /10.72.201.80
New node took ownership

  INFO [GossipTasks:1] 2012-09-10 08:01:32,967 Gossiper.java (line 830) InetAddress /10.72.201.80 is now dead.
  INFO [GossipTasks:1] 2012-09-10 08:01:53,976 Gossiper.java (line 644) FatClient /10.72.201.80 has been silent for 3ms, removing from gossip
Old node marked as dead and the process to remove is started. 

Has the 80 node reappeared in the logs?

If it does can you include the output from nodetool gossipinfo ?

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/09/2012, at 5:59 AM, Yang tedd...@gmail.com wrote:

 Thanks Jim, looks like I'll have to read into the code to understand what is 
 happening under the hood
 
 yang
 
 On Mon, Sep 10, 2012 at 9:45 AM, Jim Cistaro jcist...@netflix.com wrote:
 We have seen various issues from these replaced nodes hanging around.  For 
 clusters where a lot of nodes have been replaced, we see these replaced nodes 
 having an impact on heap/GC and a lot of tcp timeouts/retransmits (because 
 the old nodes no longer exist).  As a result, we have begun cleaning these up 
 using unsafeAssassinateEndpoint via jmx.  We have only started using it 
 recently.  So far no bad side effects.  This also helps because those 
 replaced nodes can appear as unreachable nodes wrt schema and sometimes 
 prevent things like CF truncation.
 
 Using unsafeAssassinateEndpoint will clean these from unreachable nodes and 
 will mark them as LEFT in gossip info.  There is a ttl for them in gossipinfo 
 and they should go away after 3 days.  Once they are marked LEFT, you should 
 stop seeing those up/same/dead messages.
 
 unsafeAssassinateEndpoint is unsafe in that, if you specify the IP of a real 
 node in the cluster, that node will be assassinated.  Otherwise, if you specify 
 nodes that have been replaced, it is supposed to work correctly.
 
 Hope this helps,
 jc
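
For the mechanics: unsafeAssassinateEndpoint is an operation on the Gossiper
MBean, so any generic JMX client can invoke it. A sketch using jmxterm (the
jar name and the IP are examples):

  java -jar jmxterm-uber.jar
  $> open localhost:7199
  $> bean org.apache.cassandra.net:type=Gossiper
  $> run unsafeAssassinateEndpoint 10.72.201.80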
 
 
 From: Yang tedd...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Mon, 10 Sep 2012 01:10:56 -0700
 To: user@cassandra.apache.org
 Subject: replace_token code?
 
 it looks like by specifying replace_token, the old owner is not removed from 
 gossip (which I had thought it would do).
 Then it's understandable that the old owner resurfaces later and we get 
 some warning saying that the same token is owned by both.
 
 
 I ran an example with a 2-node cluster, with RF=2.  Host 10.72.201.80 was 
 running for a while and had some data; then I shut it down, and 
 booted up 10.190.221.204 with replace_token of the old token owned by the 
 previous host.
 The following log sequence shows that the new host does acquire the token, 
 but it does not at the same time remove 80 forcefully from gossip.
 Instead, a few seconds later, it believed that .80 became live again.
 I don't have much understanding of the Gossip protocol, but I roughly know that 
 it's probability-based; it looks like we need an assertive/NOW 
 membership control message for replace_token.
 
 
 
 
 thanks
 yang
 
 
  WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 166594924822352415786406422619018814804 changing ownership from /10.72.201.80 to /10.190.221.204
  INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753) JOINING: Starting to bootstrap...
  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java (line 109) Compacting [SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-1-Data.db'), SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-3-Data.db'), SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-4-Data.db'), SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-2-Data.db')]
  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,979 CompactionTask.java (line 221) Compacted to [/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-5-Data.db,].  499 to 394 (~78% of original) bytes for 3 keys at 0.003997MB/s.  Time: 94ms.
  INFO [Thread-4] 2012-09-10 08:00:22,070 StreamInSession.java (line 214) Finished streaming session 1 from /10.72.102.61
  INFO [main] 2012-09-10 08:00:22,073 ColumnFamilyStore.java (line 643) Enqueuing flush of Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 2 ops)
  INFO [FlushWriter:2] 2012-09-10 08:00:22,074 Memtable.java (line 266) Writing Memtable-LocationInfo@30624226(77/96 serialized/live bytes, 2 ops)
 

Re: Cassandra 1.1.1 on Java 7

2012-09-11 Thread Oleg Dulin

So, my experiment didn't quite work out.

I was hoping to use the G1 collector to minimize pauses. The pauses didn't 
really go away, but what's worse, I think the memtable memory 
calculations are driven by CMS, so my memtables would fill up and cause 
Cassandra to run out of heap :(



On 2012-09-09 19:04:41 +, Jeremy Hanna said:

Starting with 1.6.0_34, you'll need -Xss set to 180k.  It's updated in 
the forthcoming 1.1.5 as well as the next minor rev of 1.0.x (1.0.12).

https://issues.apache.org/jira/browse/CASSANDRA-4631
See also the comments on 
https://issues.apache.org/jira/browse/CASSANDRA-4602 for the reference 
to what required a higher stack.
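
In cassandra-env.sh this is a one-line change; a sketch (the stock file
already sets an -Xss value through JVM_OPTS):

  JVM_OPTS="$JVM_OPTS -Xss180k"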


On Sep 9, 2012, at 12:47 PM, Christopher Keller cnkel...@gmail.com wrote:

This is necessary under the later versions of 1.6 (u35) as well. Nodetool 
will show the cluster as being down even though individual nodes will 
be up.


--Chris


On Sep 9, 2012, at 7:13 AM, dong.yajun dongt...@gmail.com wrote:

After running for a while, you should set -Xss to more than 160k when 
using JDK 1.7.

On Sun, Sep 9, 2012 at 3:39 AM, Peter Schuller 
peter.schul...@infidyne.com wrote:

Has anyone tried running 1.1.1 on Java 7?


Have been running jdk 1.7 on several clusters on 1.1 for a while now.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)



--
Ric Dong
Newegg Ecommerce, MIS department

--
The downside of being better than everyone else is that people tend to 
assume you're pretentious.



--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/




Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Omid Aladini
On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
janne.jalka...@ecyrd.com wrote:

 A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
 and inter-level overlaps in CFs with Leveled Compaction. Your sstables
 generated with 1.1.3 and later should not have this issue [1] [2].

 Does this mean that LCS on 1.0.x should be considered unsafe to
 use? I'm using them for semi-wide frequently-updated CounterColumns
 and they're performing much better on LCS than on STCS.

That's true. Unsafe in the sense that your data might not be in the
right shape with respect to order of keys in sstables and LCS's
properties and you might need to offline-scrub when you upgrade to the
latest 1.1.x.

 In case you have old Leveled-compacted sstables (generated with 1.1.2
 or earlier, including 1.0.x), you need to run an offline scrub using
 Cassandra 1.1.4 or later via the bin/sstablescrub command, which will fix
 out-of-order sstables and inter-level overlaps caused by previous
 versions of LCS. You need to take nodes down in order to run the offline
 scrub.

 The  1.1.5 README does not mention this. Should it?

The fix was released in 1.1.3 (LCS fix) and 1.1.4 (offline scrub), and
I agree it would be helpful to have it in NEWS.txt.

Cheers,
Omid

 /Janne



Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

2012-09-11 Thread Mikhail Panchenko
Based on the steps outlined here
https://issues.apache.org/jira/browse/CASSANDRA-4644?focusedCommentId=13453156&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13453156
it seems that LCS was not used until after 1.1.4, and they were able to do a
full repair / cleanup / compact cycle on 1.1.4 before running into problems.

I don't see any major bugfixes for LCS in 1.1.5 either, so this appears to
be a legitimate bug if the timeline is correct.

On Tue, Sep 11, 2012 at 2:50 PM, Omid Aladini omidalad...@gmail.com wrote:

 On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
 janne.jalka...@ecyrd.com wrote:
 
  A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
  and inter-level overlaps in CFs with Leveled Compaction. Your sstables
  generated with 1.1.3 and later should not have this issue [1] [2].
 
  Does this mean that LCS on 1.0.x should be considered unsafe to
  use? I'm using them for semi-wide frequently-updated CounterColumns
  and they're performing much better on LCS than on STCS.

 That's true. Unsafe in the sense that your data might not be in the
 right shape with respect to order of keys in sstables and LCS's
 properties and you might need to offline-scrub when you upgrade to the
 latest 1.1.x.

  In case you have old Leveled-compacted sstables (generated with 1.1.2
  or earlier, including 1.0.x), you need to run an offline scrub using
  Cassandra 1.1.4 or later via the bin/sstablescrub command, which will fix
  out-of-order sstables and inter-level overlaps caused by previous
  versions of LCS. You need to take nodes down in order to run the offline
  scrub.
 
  The  1.1.5 README does not mention this. Should it?

 The fix was released in 1.1.3 (LCS fix) and 1.1.4 (offline scrub), and
 I agree it would be helpful to have it in NEWS.txt.

 Cheers,
 Omid

  /Janne
 



Re: replace_token code?

2012-09-11 Thread Yang
Replied inline, thanks
Yang


I thought the very first log line already acquired ownership, instead of
later in the sequence?

 WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 166594924822352415786406422619018814804 changing ownership from /10.72.201.80 to /10.190.221.204



On Tue, Sep 11, 2012 at 1:55 PM, aaron morton aa...@thelastpickle.comwrote:

 This looks correct…

  INFO [GossipStage:1] 2012-09-10 08:01:23,036 Gossiper.java (line 850) Node /10.72.201.80 is now part of the cluster

  INFO [GossipStage:1] 2012-09-10 08:01:23,037 Gossiper.java (line 816) InetAddress /10.72.201.80 is now UP



 80 joined the ring because it was in the stored ring state.



This is where I was having a doubt: instead of being allowed to come back
from the stored ring state, 80 should be immediately purged from ring
membership right after the first log line, which purports to have acquired
ownership.  It's true that token ownership and ring membership are
orthogonal things, but here an explicit "taking over the token"
operation immediately implies that the old node must be dead and should be
kicked out of the ring. Granted, the detection of duplicate ownership
later will kick the old node out; I just think it leaves room for
uncertainty before the duplication is detected.



 INFO [GossipStage:1] 2012-09-10 08:01:23,038 StorageService.java (line 1126) Nodes /10.72.201.80 and /10.190.221.204 have the same token 166594924822352415786406422619018814804.  Ignoring /10.72.201.80

 New node took ownership

  INFO [GossipTasks:1] 2012-09-10 08:01:32,967 Gossiper.java (line 830) InetAddress /10.72.201.80 is now dead.
  INFO [GossipTasks:1] 2012-09-10 08:01:53,976 Gossiper.java (line 644) FatClient /10.72.201.80 has been silent for 3ms, removing from gossip

 Old node marked as dead and the process to remove is started.

 Has the 80 node reappeared in the logs?

no,


 If it does can you include the output from nodetool gossipinfo ?








 Cheers


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 11/09/2012, at 5:59 AM, Yang tedd...@gmail.com wrote:

 Thanks Jim, looks like I'll have to read into the code to understand what is
 happening under the hood

 yang

 On Mon, Sep 10, 2012 at 9:45 AM, Jim Cistaro jcist...@netflix.com wrote:

  We have seen various issues from these replaced nodes hanging around.
  For clusters where a lot of nodes have been replaced, we see these
 replaced nodes having an impact on heap/GC and a lot of tcp
 timeouts/retransmits (because the old nodes no longer exist).  As a result,
 we have begun cleaning these up using unsafeAssassinateEndpoint via jmx.
  We have only started using it recently.  So far no bad side effects.  This
 also helps because those replaced nodes can appear as unreachable nodes
 wrt schema and sometimes prevent things like CF truncation.

  Using unsafeAssassinateEndpoint will clean these from unreachable nodes
 and will mark them as LEFT in gossip info.  There is a ttl for them in
 gossipinfo and they should go away after 3 days.  Once they are marked
 LEFT, you should stop seeing those up/same/dead messages.

  unsafeAssassinateEndpoint is unsafe in that, if you specify the IP of a
 real node in the cluster, that node will be assassinated.  Otherwise, if you
 specify nodes that have been replaced, it is supposed to work correctly.

  Hope this helps,
 jc


   From: Yang tedd...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Mon, 10 Sep 2012 01:10:56 -0700
 To: user@cassandra.apache.org
 Subject: replace_token code?

  it looks like by specifying replace_token, the old owner is not removed
 from gossip (which I had thought it would do).
 Then it's understandable that the old owner resurfaces later and we
 get some warning saying that the same token is owned by both.


  I ran an example with a 2-node cluster, with RF=2.  Host 10.72.201.80
 was running for a while and had some data; then I shut it down, and
 booted up 10.190.221.204 with replace_token of the old token owned by the
 previous host.
 The following log sequence shows that the new host does acquire the
 token, but it does not at the same time remove 80 forcefully from gossip.
 Instead, a few seconds later, it believed that .80 became live again.
 I don't have much understanding of the Gossip protocol, but I roughly know
 that it's probability-based; it looks like we need an assertive/NOW
 membership control message for replace_token.




  thanks
 yang


   WARN [main] 2012-09-10 08:00:21,855 TokenMetadata.java (line 160) Token 166594924822352415786406422619018814804 changing ownership from /10.72.201.80 to /10.190.221.204
  INFO [main] 2012-09-10 08:00:21,855 StorageService.java (line 753) JOINING: Starting to bootstrap...
  INFO [CompactionExecutor:2] 2012-09-10 08:00:21,875 CompactionTask.java (line 109) Compacting [SSTableReader(path='/mnt/cassandra/data/system/LocationInfo/system-LocationInfo-hd-1-Data.db'),
 

How to replace a dead *seed* node while keeping quorum

2012-09-11 Thread Edward Sargisson

Hi all,
We just ran into an interesting and unexpected situation with restarting 
a downed node.


If the downed node is a seed node, then neither of the "replace a dead 
node" procedures works (-Dcassandra.replace_token, or taking 
initial_token-1). The ring remains split.
The host is listed as a seed in the config for the other members of the 
ring. If we rename the host, then it will rejoin the ring.
In other words, if the host name is on the seeds list, then it appears 
that the rest of the ring refuses to bootstrap it.
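
For reference, the standard replace-a-dead-node procedure being referred to
looks roughly like this; the addresses and token are placeholders:

  # cassandra.yaml on the replacement node -- its own address must NOT
  # appear in the seed list
  seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "10.0.0.1,10.0.0.2"

  # then start it with:
  bin/cassandra -Dcassandra.replace_token=<token-of-the-dead-node>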


This leads to a problem: If the node needs to be taken out of the seeds 
list on every working node then that requires a restart of each node - 
which means that, for short periods, the ring is missing 2 nodes and a 
quorum read or write (RF=3) will fail.


Are there any useful tricks for restarting the node with the same 
hostname or are we expected to rename the node?


Cheers,
Edward
--
Edward Sargisson
Senior Java Developer
Global Relay
edward.sargis...@globalrelay.net



Re: Number of columns per row for Composite Primary Key CQL 3.0

2012-09-11 Thread Data Craftsman 木匠
Hi Aaron,

Thanks for the suggestion, as always.  :)   I'll read your slides soon.

What does MM stand for? Million?

Thanks,
Charlie

On Mon, Sep 10, 2012 at 6:37 PM, aaron morton aa...@thelastpickle.com wrote:
 In general wider rows take a bit longer to read, however different access
 patterns have different performance. I did some tests here
 http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance
 and http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

 I would suggest 1MM cols is fine, if you get to 10MM cols per row you
 probably have gone too far. Remember the byte size of the row is also
 important; larger rows churn memory more and take longer to compact /
 repair.

 Hope that helps.

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 8/09/2012, at 11:05 AM, Data Craftsman 木匠 database.crafts...@gmail.com
 wrote:

 Hello experts.

 Should I limit the number of rows per Composite Primary Key's leading
 column?

 I think it falls into the same wide row good practice for number of
 columns per row for CQL 2.0, e.g. 10M or less.

 Any comments will be appreciated.

 --
 Thanks,

 Charlie (@mujiang) 木匠
 ===
 Data Architect Developer 汉唐 田园牧歌DBA
 http://mujiang.blogspot.com


how to enter float value from cassandra-cli ?

2012-09-11 Thread Yuhan Zhang
Hi all,

I'm trying to manually add some double values to a column family. From
the Hector client there's a DoubleSerializer,
but it looks like the cli tool does not provide a way to enter floating-point
values. Here's the message I got:

[default@video] set cateogry['1']['sport'] = float('0.5');
Function 'float' not found. Available functions: bytes, integer, long, int,
lexicaluuid, timeuuid, utf8, ascii, countercolumn.

Is there a way to insert a floating-point value from the cli tool?


Thank you.

Yuhan
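
One workaround, assuming the column value really is a big-endian IEEE-754
double (which is what Hector's DoubleSerializer writes): hex-encode the value
yourself and pass it through the cli's bytes() function, which takes a hex
string. 0.5 encodes to 3fe0000000000000, so:

  [default@video] set category['1']['sport'] = bytes('3fe0000000000000');

(Incidentally, the transcript above has a typo in the CF name: cateogry vs.
category.)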


Re: [RELEASE] Apache Cassandra 1.1.5 released

2012-09-11 Thread Jason Axelson
Hi André,

That looks like something that I've run into as well on previous
versions of Cassandra. Our workaround was to not drop a keyspace and
then re-use it (which we were doing as part of a test suite).

This is a related stackoverflow post:
http://stackoverflow.com/questions/11623356/cassandra-server-throws-java-lang-assertionerror-decoratedkey-decorated

Jason

On Mon, Sep 10, 2012 at 11:29 PM, André Cruz andre.c...@co.sapo.pt wrote:
 I'm also having AssertionErrors.

 ERROR [ReadStage:51687] 2012-09-10 14:33:54,211 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51687,5,main]
 java.io.IOError: java.io.EOFException
 at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:64)
 at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
 at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
 at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
 at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:63)
 at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
 at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
 at org.apache.cassandra.db.Table.getRow(Table.java:378)
 at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
 at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:816)
 at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1250)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.EOFException
 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:399)
 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
 at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
 at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
 at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
 at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:54)
 ... 14 more
 ERROR [ReadStage:51801] 2012-09-10 14:44:38,852 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:51801,5,main]
 java.lang.AssertionError: DecoratedKey(12064825934064381804725403203980154559, 
 

Re: nodetool connection refused

2012-09-11 Thread Manu Zhang
Problem solved. I hadn't added the JMX host and port to the VM arguments in
Eclipse. How come this is not covered in the wiki
http://wiki.apache.org/cassandra/RunningCassandraInEclipse ? Or is it
outdated?
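
For reference, these are the JMX system properties that bin/cassandra normally
injects via cassandra-env.sh, and that an Eclipse launch configuration has to
supply itself; a sketch, using the default port:

  -Dcom.sun.management.jmxremote.port=7199
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.authenticate=false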

On Mon, Sep 10, 2012 at 10:11 AM, Manu Zhang owenzhang1...@gmail.comwrote:

 It's more like an Eclipse issue now since I find a 0.0.0.0:7199
 listener when executing bin/cassandra in terminal but none when running
 Cassandra in Eclipse.


 On Sun, Sep 9, 2012 at 12:56 PM, Manu Zhang owenzhang1...@gmail.comwrote:

 No, I don't find a listener on port 7199. Where do I set that up? I've been
 experimenting on my laptop, so both of them are local.


 On Sun, Sep 9, 2012 at 1:28 AM, Senthilvel Rangaswamy 
 senthil...@gmail.com wrote:

 What is the address of the thrift listener? Did you put 0.0.0.0:7199?

 On Fri, Sep 7, 2012 at 11:53 PM, Manu Zhang owenzhang1...@gmail.comwrote:

 When I run Cassandra-trunk in Eclipse, nodetool fails to connect with
 the following error:
 Failed to connect to '127.0.0.1:7199': Connection refused
 But if I run it in a terminal, all is fine.




 --
 ..Senthil

 If there's anything more important than my ego around, I want it
  caught and shot now.
 - Douglas Adams.






Re: Astyanax InstantiationException when accessing ColumnList

2012-09-11 Thread Ran User
Oops, forgot to mention Cassandra version - 1.1.4

On Tue, Sep 11, 2012 at 5:54 AM, Ran User ranuse...@gmail.com wrote:

 Stuck for hours on this one; thanks in advance!

 - Scala 2.9.2
 - Astyanax 1.0.6 (also tried 1.0.5)
 - Using CompositeRowKey, CompositeColumnName
 - No problem inserting into Cassandra
 - Can read a row, and ColumnList.size() returns the correct count; however, any
 attempt to access the ColumnList (i.e. iterate it, or call getColumnByIndex(),
 getColumnByName(), etc.) will throw the following
 exception:

 Exception:

 java.lang.RuntimeException: java.lang.InstantiationException

 relevant stack trace:

 java.lang.RuntimeException: java.lang.InstantiationException:
 shops.integration.db.scalaquery.ReportingDao$MetricsLogFileCompositeColumn
 at
 com.netflix.astyanax.serializers.AnnotatedCompositeSerializer.fromByteBuffer(AnnotatedCompositeSerializer.java:136)
 at
 com.netflix.astyanax.serializers.AbstractSerializer.fromBytes(AbstractSerializer.java:40)
 at
 com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.constructMap(ThriftColumnOrSuperColumnListImpl.java:201)
 at
 com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumn(ThriftColumnOrSuperColumnListImpl.java:189)
 at
 com.netflix.astyanax.thrift.model.ThriftColumnOrSuperColumnListImpl.getColumnByName(ThriftColumnOrSuperColumnListImpl.java:103)

 Relevant sample code:

 class TestCompositeColumn(@(Component @field) var logFileId: Long,
 @(Component @field) var dt: String, @(Component @field) var dk: String)
 extends Ordered[TestCompositeColumn] {
 def this() = this(0l, "", "")
 //equals, hashCode, compare all implemented
 }

 I've also tried this variation on the class:

 class TestCompositeColumn(idIn: Long, key1In: String, key2In: String)
 extends Ordered[TestCompositeColumn] {
 @Component(ordinal = 0) var id: Long = idIn
 @Component(ordinal = 1) var key1: String = key1In
 @Component(ordinal = 2) var key2: String = key2In

 def this() = this(0, null, null)
 //equals, hashCode, compare all implemented
 }
 val TEST_COLUMN_FAMILY = new ColumnFamily[TestRowKey, TestCompositeColumn](
 "test_column_family",
 new AnnotatedCompositeSerializer[TestRowKey](classOf[TestRowKey]),
 new
 AnnotatedCompositeSerializer[TestCompositeColumn](classOf[TestCompositeColumn]),
 BytesArraySerializer.get());

 var columnList = keyspace.prepareQuery(TEST_COLUMN_FAMILY)
 .getKey(TestRowKey(1l, 2012090100))
 .execute().getResult()

 // OK - will return 6 for example, also verified via cassandra-cli
 println(columnList.size())

 // ERROR - will throw exception above.  Iterating, or any type of access
 will also throw same exception
 println(columnList.getColumnByIndex(0).getStringValue())

 Thank you!!!
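
One thing worth checking, as an educated guess rather than a confirmed
diagnosis: AnnotatedCompositeSerializer instantiates the column class
reflectively, and Class.newInstance() throws InstantiationException for a
class nested inside another class or method, because such a class has no true
no-arg constructor (it needs the enclosing instance). The $ in
ReportingDao$MetricsLogFileCompositeColumn suggests exactly that kind of
nesting. A sketch of the shape that reflective instantiation can cope with --
a top-level (or companion-object) class with a public no-arg constructor:

  import com.netflix.astyanax.annotations.Component
  import scala.annotation.meta.field

  // Defined at the top level, NOT inside ReportingDao, so that
  // classOf[TestCompositeColumn].newInstance() can succeed.
  class TestCompositeColumn(@(Component @field) var logFileId: Long,
                            @(Component @field) var dt: String,
                            @(Component @field) var dk: String) {
    def this() = this(0L, "", "") // no-arg constructor used by the serializer
  }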