Hinted handoff bug?
Hi,

We're running Cassandra 1.0.3. I've done some testing with 2 nodes (node A, node B) and replication factor 2. I take node A down, write some data to node B, and then bring node A up. Sometimes hints aren't delivered when node A comes up.

I've done some debugging in org.apache.cassandra.db.HintedHandOffManager, and sometimes node B ends up in a strange state in deliverHints(final InetAddress to): queuedDeliveries already has node A in its set, so no hints will ever be delivered to node A. The only reason for this that I can see is that in deliverHintsToEndpoint(InetAddress endpoint) the hintStore.isEmpty() check returns true and the endpoint (node A) isn't removed from queuedDeliveries. Then no hints will ever be delivered again until node B is restarted.

Under what conditions will hintStore.isEmpty() return true? Shouldn't the hintStore.isEmpty() check be inside the try/finally, with the endpoint removed from queuedDeliveries in the finally block?

public void deliverHints(final InetAddress to)
{
    logger_.debug("deliverHints to {}", to);
    if (!queuedDeliveries.add(to))
        return;
    ...
}

private void deliverHintsToEndpoint(InetAddress endpoint)
        throws IOException, DigestMismatchException, InvalidRequestException, TimeoutException
{
    ColumnFamilyStore hintStore = Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
    if (hintStore.isEmpty())
        return; // nothing to do, don't confuse users by logging a no-op handoff
    try
    {
        ...
    }
    finally
    {
        queuedDeliveries.remove(endpoint);
    }
}

Regards,
/Fredrik
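The guard pattern Fredrik describes can be sketched outside of Cassandra. This is a minimal, self-contained illustration (not the real HintedHandOffManager; the storeEmpty flag stands in for hintStore.isEmpty()): when the "nothing to do" early return fires before the try/finally is entered, the endpoint is never removed from queuedDeliveries, and every later delivery attempt is silently skipped.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the leak: an early return between the
// queuedDeliveries.add() guard and the try/finally leaves the entry behind.
public class HandoffGuard {
    private final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    private boolean storeEmpty = true; // stands in for hintStore.isEmpty()

    // Buggy shape: the emptiness check sits before the try/finally,
    // so the endpoint stays queued forever.
    public boolean deliverBuggy(String endpoint) {
        if (!queuedDeliveries.add(endpoint))
            return false;              // another delivery is in flight
        if (storeEmpty)
            return true;               // BUG: endpoint never removed
        try {
            // ... deliver hints ...
        } finally {
            queuedDeliveries.remove(endpoint);
        }
        return true;
    }

    // Fixed shape (what the message proposes): check inside try/finally,
    // so the finally block always clears the guard entry.
    public boolean deliverFixed(String endpoint) {
        if (!queuedDeliveries.add(endpoint))
            return false;
        try {
            if (storeEmpty)
                return true;           // finally still runs on this return
            // ... deliver hints ...
            return true;
        } finally {
            queuedDeliveries.remove(endpoint);
        }
    }

    public boolean isQueued(String endpoint) {
        return queuedDeliveries.contains(endpoint);
    }

    public static void main(String[] args) {
        HandoffGuard buggy = new HandoffGuard();
        buggy.deliverBuggy("nodeA");
        System.out.println("buggy leaves nodeA queued: " + buggy.isQueued("nodeA"));

        HandoffGuard fixed = new HandoffGuard();
        fixed.deliverFixed("nodeA");
        System.out.println("fixed leaves nodeA queued: " + fixed.isQueued("nodeA"));
    }
}
```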
Re: Hinted handoff bug?
You're right, good catch. Do you mind opening a ticket on jira (https://issues.apache.org/jira/browse/CASSANDRA)?

-- Sylvain

On Thu, Dec 1, 2011 at 10:03 AM, Fredrik L Stigbäck fredrik.l.stigb...@sitevision.se wrote:
> [original message quoted in full; snipped]
Insufficient disk space to flush
Hello everyone,

4-node Cassandra 0.8.5 cluster with RF=2. One node started throwing exceptions in its log:

ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more

Checked the disk and, obviously, it's 100% full. How do I recover from this without losing the data? I've got plenty of space on the other nodes, so I thought of doing a decommission, which I understand reassigns ranges to the other nodes and replicates data to them. After that's done, I plan on manually deleting the data on the node and then rejoining at the same cluster position with auto-bootstrap turned off, so that I won't get back the old data and can continue receiving new data on the node. Note, I would like to keep all 4 nodes in, because the other three barely handle the input load alone. These are just long-running tests until I get some better machines.

One strange thing I found is that the data folder on the node that filled up its disk is 150 GB (as measured with du), while the data folder on the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50 GB for all 4 nodes. I thought that the node was running a major compaction when it filled up the disk, but even that doesn't make sense, because shouldn't a major compaction at most double the size, not triple it? Does anyone know how to explain this behavior?

Thanks,
Alex
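The "Insufficient disk space to flush" error comes from Cassandra checking free space before writing a new SSTable (getFlushPath in the trace above). A hedged sketch of that kind of check, not Cassandra's actual implementation: pick a data directory whose usable space covers the estimated flush size, and fail with the same style of message otherwise.

```java
import java.io.File;

// Illustrative only: the directory-selection logic and error message mirror
// the log above, but this is not Cassandra's real getFlushPath code.
public class FlushPathCheck {
    public static File pickFlushDirectory(File[] dataDirs, long estimatedBytes) {
        for (File dir : dataDirs) {
            // getUsableSpace() reports bytes available to this JVM on the volume
            if (dir.getUsableSpace() >= estimatedBytes)
                return dir;
        }
        throw new RuntimeException(
            "Insufficient disk space to flush " + estimatedBytes + " bytes");
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        // A tiny flush should fit almost anywhere...
        System.out.println("chosen: " + pickFlushDirectory(new File[]{tmp}, 1L));
        // ...while an impossibly large one reproduces the failure mode.
        try {
            pickFlushDirectory(new File[]{tmp}, Long.MAX_VALUE);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```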
Re: Hinted handoff bug?
Yes, I'll do that.

/Fredrik

Sylvain Lebresne skrev 2011-12-01 11:10:
> You're right, good catch. Do you mind opening a ticket on jira
> (https://issues.apache.org/jira/browse/CASSANDRA)?
> -- Sylvain
> [earlier quoted message snipped]
Re: Strategies to maintain counters sorted row ?
In my application, I need to store the users' total scores/reputation as counters and want to show lists of users sorted by score. I also want to implement a name-search facility on top of that. Could you suggest a schema to achieve that using Cassandra?

On Wed, Nov 30, 2011 at 12:53 PM, Aditya ady...@gmail.com wrote:
> I know it is not possible to sort columns in a row by counter values, so what are the other strategies to maintain a sorted list (of counters) in Cassandra? Could you propose some schema that might be helpful to achieve this? Or do I need to retrieve thousands of columns each time and do the sorting at the application level?
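Since counter columns cannot be ordered by value server-side, the fallback mentioned in this thread is to fetch the counters and sort at the application level. A minimal sketch of that client-side step, assuming the scores have already been read into a Map of user name to score (the reading itself is omitted):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical leaderboard helper: sort fetched counter values in the
// application, since Cassandra orders columns by name, not counter value.
public class ScoreLeaderboard {
    public static List<Map.Entry<String, Long>> topN(Map<String, Long> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> scores = new HashMap<>();
        scores.put("alice", 42L);
        scores.put("bob", 117L);
        scores.put("carol", 7L);
        // prints [bob=117, alice=42]
        System.out.println(topN(scores, 2));
    }
}
```

This is only workable for row sizes that fit in memory; for thousands of columns per read, a periodically rebuilt materialized "top scores" row is the usual alternative.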
RE: [RELEASE] Apache Cassandra 1.0.5 released
Unfortunately no. The second time I did the test:

Restore 4 nodes to a new Cassandra cluster (0.7.8)
Upgrade to 1.0.0
Run nodetool scrub on each node after upgrade, before upgrading the next node
Upgrade to 1.0.3
Upgrade to 1.0.5
Run nodetool repair on all nodes

The whole process was successful. What does this error mean, and how is it affecting my cluster? Do you think it is safe to upgrade to 1.0.5 and disregard this error?

Thanks
Michael

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, December 01, 2011 2:10 AM
To: user@cassandra.apache.org
Subject: Re: [RELEASE] Apache Cassandra 1.0.5 released

I don't think so. That code hasn't changed in a long time. Is it reproducible?

On Wed, Nov 30, 2011 at 2:46 PM, Michael Vaknine micha...@citypath.com wrote:
> Hi,
> Upgrading 1.0.3 to 1.0.5 I have these errors:
>
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 java.lang.AssertionError
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:671)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:745)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.forceBlockingFlush(ColumnFamilyStore.java:750)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.keys.KeysIndex.forceBlockingFlush(KeysIndex.java:119)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:258)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:123)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:151)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)
>
> Is this another regression?
>
> Thanks
> Michael

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Wednesday, November 30, 2011 9:43 PM
To: user@cassandra.apache.org
Subject: Re: [RELEASE] Apache Cassandra 1.0.5 released

On Wed, Nov 30, 2011 at 1:29 PM, Michael Vaknine micha...@citypath.com wrote:
> The files are not on the site
> The requested URL /apache//cassandra/1.0.5/apache-cassandra-1.0.5-bin.tar.gz was not found on this server.

It takes the mirrors some time to sync.

-Brandon

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: [RELEASE] Apache Cassandra 1.0.5 released
On Thu, Dec 1, 2011 at 11:58 AM, Michael Vaknine micha...@citypath.com wrote:
> Unfortunately no. The second time I did the test:
> Restore 4 nodes to a new Cassandra cluster (0.7.8)
> Upgrade to 1.0.0
> Run nodetool scrub on each node after upgrade, before upgrading the next node
> Upgrade to 1.0.3
> Upgrade to 1.0.5
> Run nodetool repair on all nodes
> The whole process was successful. What does this error mean, and how is it affecting my cluster? Do you think it is safe to upgrade to 1.0.5 and disregard this error?

It's a race condition between the flush of a memtable and the flush of its secondary indexes. It's not specific to 1.0.5 at all (it's been in 0.8, probably since 0.8.0), and it's very unlikely to happen (you're the first to report it). The error has no real effect (it cannot result in data corruption or data loss). I've created https://issues.apache.org/jira/browse/CASSANDRA-3547 to fix it.

-- Sylvain

> Thanks
> Michael
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Thursday, December 01, 2011 2:10 AM
> To: user@cassandra.apache.org
> Subject: Re: [RELEASE] Apache Cassandra 1.0.5 released
>
> I don't think so. That code hasn't changed in a long time. Is it reproducible?
> [quoted stack trace and earlier messages snipped]
NetworkTopologyStrategy bug?
Assume for now we have 1 DC and 1 rack with 3 nodes. The ring will look like this (we use our own snitch, which returns DC=0, Rack=0 for this case):

Address    DC  Rack  Token
                     113427455640312821154458202477256070484
10.0.0.1   0   0     0
10.0.0.2   0   0     56713727820156410577229101238628035242
10.0.0.3   0   0     113427455640312821154458202477256070484

Schema: ReplicaPlacementStrategy=NetworkTopologyStrategy, options: [0:2] (2 replicas in DC 0).

When trying to run cleanup (same problem with repair), Cassandra reports:

From 10.0.0.1:
DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.3,10.0.0.2 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.2:
DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.3,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.3:
DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.2,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

To me this means that each node thinks the whole data range is on the other two nodes. As a result:

A WRITE request with any key/token sent to the 10.0.0.1 coordinator will be forwarded to and saved on 10.0.0.2 and 10.0.0.3.
A READ request at CL.ONE with any key/token sent to the 10.0.0.2 coordinator will be forwarded to 10.0.0.1 or 10.0.0.3, and since 10.0.0.1 can't have the data from the write above, some requests fail and some don't (when 10.0.0.3 answers).
On top of that, every READ request to any node will be forwarded to another node.

That's what we see right now from 0.8.6 up to 1.0.5, both with 3 nodes in 1 DC and with 8x2 nodes.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063. Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
http://www.adform.com/
Follow: http://twitter.com/#!/adforminsider
Visit our blog: http://www.adform.com/site/blog

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
RE: NetworkTopologyStrategy bug?
Sorry, the bug was in our snitch. We're using getHostName() instead of getCanonicalHostName() to determine the DC and rack, and since for the local node it returns an alias instead of the reverse-DNS name, the DC/rack numbers are not as expected.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063. Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Thursday, December 01, 2011 14:05
To: user@cassandra.apache.org
Subject: NetworkTopologyStrategy bug?

> [original message quoted in full; snipped]
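The failure mode Viktor describes can be illustrated without any network lookups. InetAddress.getHostName() may return a local alias (e.g. "localhost"), while getCanonicalHostName() performs a reverse-DNS lookup; a snitch that derives DC/rack by parsing the name therefore sees different inputs on the local node. The parsing convention below ("dcX-rackY-...") is invented purely for illustration, not Adform's actual snitch:

```java
// Hypothetical hostname-parsing snitch, demonstrating why a local alias
// (what getHostName() can return) breaks DC/rack derivation while a
// canonical reverse-DNS name (getCanonicalHostName()) parses cleanly.
public class HostnameSnitch {
    public static String datacenter(String hostname) {
        // assumed naming scheme: "dc0-rack0-node1.example.com"
        String[] parts = hostname.split("-");
        if (parts.length < 2 || !parts[0].startsWith("dc"))
            return "UNKNOWN";          // e.g. an alias like "localhost"
        return parts[0].substring(2);
    }

    public static void main(String[] args) {
        // canonical reverse-DNS name parses cleanly...
        System.out.println(datacenter("dc0-rack0-node1.example.com")); // "0"
        // ...but a local alias does not, yielding a bogus DC.
        System.out.println(datacenter("localhost"));                   // "UNKNOWN"
    }
}
```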
RE: [RELEASE] Apache Cassandra 1.0.5 released
Hi,

After upgrading the cluster to 1.0.5, I am having problems connecting to the cluster using Hector. Any help will be appreciated.

Thanks
Michael

me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$19.execute(KeyspaceServiceImpl.java:744)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$19.execute(KeyspaceServiceImpl.java:726)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getIndexedSlices(KeyspaceServiceImpl.java:748)
at me.prettyprint.cassandra.model.IndexedSlicesQuery$1.doInKeyspace(IndexedSlicesQuery.java:140)
at me.prettyprint.cassandra.model.IndexedSlicesQuery$1.doInKeyspace(IndexedSlicesQuery.java:131)
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at me.prettyprint.cassandra.model.IndexedSlicesQuery.execute(IndexedSlicesQuery.java:130)
at com.lookin2.cassandra.access.CassandraImpl.executeQuery(CassandraImpl.java:220)
at com.lookin2.cassandra.access.CassandraImpl.privateGet(CassandraImpl.java:327)
at com.lookin2.cassandra.access.Cassandra.get(Cassandra.java:67)
at com.lookin2.cassandra.entities.BaseCass.getRows(BaseCass.java:150)
at com.lookin2.cassandra.entities.CassSet.populateOne(CassSet.java:148)
at com.lookin2.cassandra.entities.CassSet.populate(CassSet.java:117)
at com.lookin2.cassandra.ontologies.CategoryAccess.getCategories(CategoryAccess.java:20)
at com.lookin2.cassandra.ontologies.CategoryOntologyCAO.loadAllNames(CategoryOntologyCAO.java:23)
at com.lookin2.common.dawg.AbstractNamedItemTrie.init(AbstractNamedItemTrie.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:291)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:288)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:190)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:574)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
at org.springframework.web.context.ContextLoader.createWebApplicationContext(ContextLoader.java:276)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:197)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:47)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3972)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4467)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:785)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at
is there a no disk storage mode ?
Hi,

I want to use Cassandra for (fast) unit testing with a small amount of data. I imagined the embedded Cassandra server I plan to use would start faster and be more portable (no file paths depending on the OS) if it ran without disk storage (diskless, if you want). Is there a no-disk storage mode for Cassandra?

Thanks. Regards,
Dominique
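Cassandra itself always persists to disk, so the usual workaround for portable, fast unit tests is to point the embedded server's storage directories at freshly created temp directories (optionally on a tmpfs/ramdisk mount for speed). A sketch of the portable path setup only; the directory names follow Cassandra's conventions, but wiring them into cassandra.yaml or the embedded service is left out:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Create OS-independent, throwaway storage directories for an embedded
// test server. Files.createTempDirectory picks a platform-appropriate
// location, so no hard-coded paths leak into the test.
public class TempStorageDirs {
    public static Path[] createStorageDirs() throws IOException {
        Path base = Files.createTempDirectory("cassandra-test-");
        Path data = Files.createDirectories(base.resolve("data"));
        Path commitlog = Files.createDirectories(base.resolve("commitlog"));
        Path savedCaches = Files.createDirectories(base.resolve("saved_caches"));
        return new Path[] { data, commitlog, savedCaches };
    }

    public static void main(String[] args) throws IOException {
        for (Path p : createStorageDirs())
            System.out.println(p + " exists=" + Files.isDirectory(p));
    }
}
```

Deleting the base directory in test teardown gives each run a clean slate.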
Re: Insufficient disk space to flush
If you are writing data with QUORUM or ALL, you should be safe to restart Cassandra on that node. If the extra space is all from *tmp* files from compaction, they will get deleted at startup. You will then need to run repair on that node to get back any data that was missed while it was full. If your commit log was on a different device, you may not even have lost much.

-Jeremiah

On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
> [original message quoted in full; snipped]
decommissioned not show being gossipped
Hi,

We have 2 nodes that were decommissioned from a cluster running 1.0.3. However, the live nodes are still making references to the decommissioned nodes 3 days after they were decommissioned. Nodetool does not show the decommissioned nodes. Here are sample log entries:

INFO [GossipStage:1] 2011-12-01 18:20:37,882 Gossiper.java (line 759) InetAddress /x.x.x.x is now dead.
INFO [GossipStage:1] 2011-12-01 18:20:37,882 StorageService.java (line 1039) Removing token 170141183460469231731687303715884105727 for /x.x.x.x

What might be causing this issue? Any chance this is related to https://issues.apache.org/jira/browse/CASSANDRA-3243?

Thanks!
Huy

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/decommissioned-not-show-being-gossipped-tp7051526p7051526.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: decommissioned not show being gossipped
On Thu, Dec 1, 2011 at 12:26 PM, huyle hu...@springpartners.com wrote: [...] How in sync are the clocks in the cluster? -Brandon
MapReduce on Cassandra using Ruby and REST!
I know I've been spamming the list a bit with new features for Virgil, but this one is actually really cool... Enamored with what Riak provides as far as map/reduce via HTTP, http://wiki.basho.com/MapReduce.html#MapReduce-via-the-HTTP-API We implemented the same thing for Virgil/Cassandra. Simply write a script in ruby, then POST that script to Virgil. Virgil will kick off the Hadoop job against Cassandra using column families for input and output. Right now it only supports Ruby, but there is nothing preventing us from adding support for other languages. We'll probably throw Groovy in there as well. Check it out when you get a chance... http://brianoneill.blogspot.com/2011/12/hadoopmapreduce-on-cassandra-using-ruby.html After we add the ability to push the job to an existing Hadoop cluster, we'll move on to PIG support and CRUD operations in the GUI. stay tuned, -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
garbage collecting tombstones
Hello, Is 'garbage collecting tombstones' a different operation than the JVM GC? Garbage collection of tombstones is controlled by gc_grace_seconds, which by default is set to 10 days, but the traditional GC seems to happen much more frequently (when observed through jconsole). How can I force the garbage collection of tombstones to happen ad hoc, when I want to? Thanks.
Re: decommissioned not show being gossipped
The clocks are very sync'ed between the nodes, as they have ntp running hitting our time servers. Huy
Re: decommissioned not show being gossipped
On Thu, Dec 1, 2011 at 1:10 PM, huyle hu...@springpartners.com wrote: The clocks are very sync'ed between the nodes as they have ntp running hitting our time servers. Maybe they weren't in sync 3 days after the token left, which https://issues.apache.org/jira/browse/CASSANDRA-2961 requires. If a node still sees the token you can removetoken it; otherwise you'll need https://issues.apache.org/jira/browse/CASSANDRA-3337 -Brandon
Re: is there a no disk storage mode ?
I am not aware of a no-disk option, but for fast unit testing you can try a RAM disk for storage. Huy
Re: Insufficient disk space to flush
Hi Jeremiah, My commitlog was indeed on another disk. I did what you said and yes, the node restart brings the disk size back to around the 50 GB I was expecting. Still, I do not understand how the node managed to get itself into the situation of having these tmp files. Could you clarify what they are, and how and why they are produced? I've tried to find a clear definition, but all I could come up with is hints that they are produced during compaction. I also found a thread that describes a similar problem: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Errors-During-Compaction-td5953493.html As described there, it seems like compaction fails and tmp files don't get cleaned up until they fill the disk. Is this what happened in my case? Compactions did not finish properly because disk utilization was more than half, and then more and more tmp files accumulated with each new attempt. The Cassandra log would indicate this, because I get many of these: ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200 CompactionManager.java (line 513) insufficient space to compact even the two smallest files, aborting before I started getting many of these: ERROR [FlushWriter:283] 2011-12-01 04:12:22,917 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:283,5,main] java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes I just want to clearly understand what happened. Thanks, Alex On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: [...]
Re: is there a no disk storage mode ?
Hi Dominique, I don't think there is a way to run Cassandra without disk storage, but running it embedded can be very useful for unit testing. I'm using cassandra-unit (https://github.com/jsevellec/cassandra-unit) to integrate it into my tests. You don't need to configure any file paths; it works fine out of the box. I've set it up to drop and recreate my keyspace before each test case, and even then it performs quite well. Good luck, Tom On 12/1/11 5:36 PM, DE VITO Dominique wrote: Hi, I want to use Cassandra for (fast) unit testing with a small amount of data. So I imagined the embedded Cassandra server I plan to use would start faster and be more portable (no file paths depending on the OS) without a disk storage mode (diskless, if you want). Is there a no-disk storage mode for Cassandra? Thanks. Regards, Dominique
Re: garbage collecting tombstones
A tombstone is a marker indicating a record to be deleted. gc_grace_seconds is the time after which the record will be physically deleted from the node. There is no ad-hoc way of gc'ing tombstones; only after gc_grace_seconds will the tombstones be gc'ed. Thanks, Jahangir Mohammed. On Thu, Dec 1, 2011 at 2:01 PM, A J s5a...@gmail.com wrote: [...]
Re: Insufficient disk space to flush
Yes, it mostly sounds like that. In our case, failed repairs were causing the accumulation of tmp files. Thanks, Jahangir Mohammed. On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe sicoe.alexan...@googlemail.com wrote: [...]
Re: garbage collecting tombstones
On Thu, Dec 1, 2011 at 2:45 PM, Jahangir Mohammed md.jahangi...@gmail.com wrote: There is no ad-hoc way of gc'ing tombstones. Only after gc_grace_seconds the tombstones will be gc'ed. Actually, they'll be removed on the first compaction that occurs after gc_grace_seconds. -- Eric Evans Acunu | http://www.acunu.com | @acunu
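Eric's correction pins down the timing rule: a tombstone survives every compaction until gc_grace_seconds have elapsed since the deletion, and is dropped by the first compaction after that. A minimal, self-contained sketch of that rule (a toy model, not Cassandra's actual code; the timestamps are illustrative):

```java
// Toy model of the rule discussed above: during compaction a tombstone may
// only be purged once gc_grace_seconds have passed since the deletion.
public class TombstoneGc {
    static final int GC_GRACE_SECONDS = 10 * 24 * 3600; // default: 10 days

    // localDeletionTime and now are in seconds since the epoch
    static boolean purgeable(long localDeletionTime, long now, int gcGraceSeconds) {
        return localDeletionTime + gcGraceSeconds <= now;
    }

    public static void main(String[] args) {
        long deletedAt = 1_000_000L;
        // 5 days later: still within gc_grace, compaction must keep the tombstone
        System.out.println(purgeable(deletedAt, deletedAt + 5L * 24 * 3600, GC_GRACE_SECONDS));  // false
        // 11 days later: the next compaction may drop it
        System.out.println(purgeable(deletedAt, deletedAt + 11L * 24 * 3600, GC_GRACE_SECONDS)); // true
    }
}
```

In practice, the closest thing to an ad-hoc purge (an assumption on my part, worth testing on a throwaway column family first) is to lower gc_grace_seconds on the column family and then trigger a compaction with `nodetool compact`.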
Re: cassandra read performance on large dataset
Our largest dataset has 1200 billion rows. Radim, out of curiosity, how many nodes is that running across? Bill

On 28/11/11 13:44, Radim Kolar wrote: I understand that my computer may not be as powerful as those used in the other benchmarks, but it shouldn't be that far off (1:30), right? Cassandra has very fast writes; you can have read:write ratios like 1:1000. Pure read workload on 1 billion rows without key/row cache on a 2-node cluster: Running workload in 10 threads 1000 ops each. Workload took 88.59 seconds, thruput 112.88 ops/sec Each node can do about 240 IOPS, which means an average of 4 IOPS per read in Cassandra on a cold system. After the OS cache warms enough to cache indirect seek blocks, it gets close to ideal: Workload took 79.76 seconds, thruput 200.59 ops/sec Ideal Cassandra read performance (without caches) is 2 IOPS per read: one IO to read the index, a second for the data. Pure write workload: Running workload in 40 threads 10 ops each. Workload took 302.51 seconds, thruput 13222.62 ops/sec Write is slow here because the nodes are running out of memory, most likely due to memory leaks in the 1.0 branch. Also, writes in this test are not batched. Cassandra is really awesome for its price tag. Getting similar numbers from Oracle will cost you way too much: for one 2-core Oracle license suitable for processing large data you can get about 8 Cassandra nodes (and don't forget that Oracle needs some hardware too). Transactions are not always needed for data warehousing: if you are importing chunks of data, you do not need to do rollbacks, just schedule failed chunks for later processing. If you are able to code your app to work without transactions, Cassandra is the way to go. Hadoop and Cassandra are very good products for working with large data, basically for just the price of learning a new technology. Usually Cassandra is deployed first; it's easy to get it running and day-to-day operations are simple.
Hadoop follows later, after discovering that Cassandra is not really suitable for large batch jobs because it needs random access for data reading. We finished migrating from a commercial SQL database to Hadoop/Cassandra in 3 months; not only does it cost 10x less, we are able to process about 100 times larger datasets. Our largest dataset has 1200 billion rows. Problems with this setup are:
- bloom filters use too much memory; they should be configurable for applications where read performance is unimportant
- node startup is really slow
- data loaded into Cassandra is about 2 times bigger than the CSV export (not really a problem, disk space is cheap, but the per-row overhead is rather high)
- writing applications is harder than coding for an SQL backend, and Hadoop is way harder to use than Cassandra
- lack of good import/export tools for Cassandra, especially lack of monitoring
- must have knowledge of workarounds for Hadoop bugs; Hadoop is not easy to use efficiently
- index overhead is too big (about 100% slower) compared to index overhead in SQL databases (about 20% slower)
- no delete over index
- repair is slow
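Radim's throughput figures can be sanity-checked with back-of-envelope arithmetic: aggregate disk IOPS divided by the per-read IOPS cost bounds cold read throughput. A small sketch using the numbers from the post (the formula is my own simplification, not anything from Cassandra itself):

```java
// Back-of-envelope check of the numbers in the post: per-read IOPS cost
// bounds read throughput on a system whose reads all hit disk.
public class ReadThroughput {
    static double maxReadsPerSec(int nodes, int iopsPerNode, double iopsPerRead) {
        return nodes * iopsPerNode / iopsPerRead;
    }

    public static void main(String[] args) {
        // 2 nodes, ~240 IOPS each, ~4 IOPS/read cold: bound ~120 ops/sec
        // (measured: 112.88 ops/sec)
        System.out.println(maxReadsPerSec(2, 240, 4.0));
        // after the OS cache warms, ~2 IOPS/read: bound ~240 ops/sec
        // (measured: 200.59 ops/sec)
        System.out.println(maxReadsPerSec(2, 240, 2.0));
    }
}
```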
Re: [RELEASE] Apache Cassandra 1.0.5 released
+1. After upgrading to 1.0.5 we also get a Timeout exception on secondary index search (the get_indexed_slices API).
Re: [RELEASE] Apache Cassandra 1.0.5 released
After upgrading to 1.0.5, RangeSlice queries get timeouts. Ticket: https://issues.apache.org/jira/browse/CASSANDRA-3551 On Dec 1, 2011, at 5:43 PM, Evgeniy Ryabitskiy wrote: +1 After upgrade to 1.0.5 also have Timeout exception on Secondary Index search (get_indexed_slices API).
Re: Hinted handoff bug?
Sorry for not checking the source to see if things have changed, but I just remembered an issue I have forgotten to make a jira for. In the old days, nodes would periodically try to deliver hint queues. However, at some stage this was changed so that a node only delivers when it sees another node being marked up. You can definitely have a scenario where A fails to deliver to B, so it sends the hint to C instead. B is not really down, it just could not accept that packet at that time, and C (correctly, in this case) always thinks B is up, so it never tries to deliver the hints to B. Will this change fix this, or do we need to bring back the thread that periodically tried to deliver hints regardless of node status changes? Regards, Terje

On 1 Dec 2011, at 19:10, Sylvain Lebresne sylv...@datastax.com wrote: You're right, good catch. Do you mind opening a ticket on jira (https://issues.apache.org/jira/browse/CASSANDRA)? -- Sylvain On Thu, Dec 1, 2011 at 10:03 AM, Fredrik L Stigbäck fredrik.l.stigb...@sitevision.se wrote: [...] Shouldn't the hintStore.isEmpty() check be inside the try {} finally {} clause, removing the endpoint from queuedDeliveries in the finally block?

public void deliverHints(final InetAddress to)
{
    logger_.debug("deliverHints to {}", to);
    if (!queuedDeliveries.add(to))
        return;
    ...
}

private void deliverHintsToEndpoint(InetAddress endpoint)
    throws IOException, DigestMismatchException, InvalidRequestException, TimeoutException
{
    ColumnFamilyStore hintStore = Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
    if (hintStore.isEmpty())
        return; // nothing to do, don't confuse users by logging a no-op handoff
    try
    {
        ...
    }
    finally
    {
        queuedDeliveries.remove(endpoint);
    }
}

Regards /Fredrik
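The failure mode Fredrik describes, and the fix of moving the isEmpty() check inside the try block so the finally always clears queuedDeliveries, can be reduced to a toy model. This is a simplified simulation with Strings standing in for endpoints, not the real HintedHandOffManager code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the bug: if the early "nothing to do" return fires before
// the try/finally, the endpoint is never removed from queuedDeliveries
// and every later delivery attempt to it is silently skipped.
public class HandoffModel {
    final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    boolean hintStoreEmpty;  // stands in for hintStore.isEmpty()
    int deliveries;          // count of actual delivery attempts

    HandoffModel(boolean hintStoreEmpty) { this.hintStoreEmpty = hintStoreEmpty; }

    // Buggy shape: the early return bypasses the finally block.
    void deliverHintsBuggy(String to) {
        if (!queuedDeliveries.add(to)) return;   // already queued by someone
        if (hintStoreEmpty) return;              // BUG: 'to' is never removed
        try { deliveries++; }
        finally { queuedDeliveries.remove(to); }
    }

    // Fixed shape: the isEmpty check lives inside try, removal in finally.
    void deliverHintsFixed(String to) {
        if (!queuedDeliveries.add(to)) return;
        try {
            if (hintStoreEmpty) return;
            deliveries++;
        } finally { queuedDeliveries.remove(to); }
    }

    public static void main(String[] args) {
        HandoffModel buggy = new HandoffModel(true);
        buggy.deliverHintsBuggy("nodeA");  // no-op, but leaks the set entry
        buggy.hintStoreEmpty = false;      // hints arrive later...
        buggy.deliverHintsBuggy("nodeA");  // ...and are never delivered
        System.out.println(buggy.deliveries);  // 0

        HandoffModel fixed = new HandoffModel(true);
        fixed.deliverHintsFixed("nodeA");
        fixed.hintStoreEmpty = false;
        fixed.deliverHintsFixed("nodeA");
        System.out.println(fixed.deliveries); // 1
    }
}
```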
Re: is there a no disk storage mode ?
if you're in java land there's a maven plugin - http://mojo.codehaus.org/cassandra-maven-plugin/ On 12/1/2011 12:13 PM, Tom van den Berge wrote: Hi Dominique, I don't think there is a way to run cassandra without disk storage. But running it embedded can be very useful for unit testing. I'm using cassandra-unit (https://github.com/jsevellec/cassandra-unit) to integrate it in my tests. You don't need to configure any file paths; it works fine out of the box. I've set it up to drop and recreate my keyspace before each test case, and even then it performs quite good. Good luck, Tom On 12/1/11 5:36 PM, DE VITO Dominique wrote: Hi, I want to use Cassandra for (fast) unit testing with a small number of data. So, I imagined the Cassandra embedded server I plan to use would start faster and would be more portable (because no file path depending on OS), without disk storage mode (so, diskless if you want). Is there some no disk storage mode for Cassandra ? Thanks. Regards, Dominique
Re: Hinted handoff bug?
Nope, that's a separate issue: https://issues.apache.org/jira/browse/CASSANDRA-3554 On Thu, Dec 1, 2011 at 5:59 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: [...] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Leveled Compaction in cassandra1.0 may not be perfect
Hello, everyone! In this doc (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra) I found the conclusion that "only enough space for 10x the sstable size needs to be reserved for temporary use by compaction." I don't know how this conclusion was reached, but I guess the author may have derived it from a compaction rule in levelDB. In one of the levelDB docs (http://leveldb.googlecode.com/svn/trunk/doc/impl.html) I found this rule: "We also switch to a new output file when the key range of the current output file has grown enough to overlap more than ten level-(L+2) files." Under that rule, 10x the sstable size is indeed enough for every compaction. Unfortunately, Cassandra 1.0 may not implement this rule, so I think the conclusion may be arbitrary. Everyone, what do you think about it? Of course, implementing this compaction rule may not be hard, but the implementation may cause another problem: many small sstables, each overlapping just 10 sstables in the next level, may be generated during compaction, especially when we use RandomPartitioner. This may cause many compactions when these small sstables have to be promoted to the next level. In my own testing, I wrote 120 GB of data to one Cassandra node, and the node spent 24 hours compacting this data with leveled compaction. So I don't think leveled compaction is perfect. What do you think about it, my friends?
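For reference, the arithmetic behind the blog post's "10x" claim can be sketched under the usual leveled-compaction assumptions (5 MB sstables, 10x level fanout). This is my own back-of-envelope model, not the code in Cassandra 1.0:

```java
// Back-of-envelope for the "10x the sstable size" claim: compacting one
// level-L sstable merges it with the ~10 overlapping sstables in level L+1,
// so with a 5 MB sstable size roughly 11 inputs (~55 MB) are read and at
// most about that much is written before the inputs can be dropped.
public class LeveledMath {
    static final int SSTABLE_MB = 5;  // sstable_size_in_mb default in 1.0
    static final int FANOUT = 10;     // each level holds 10x the previous

    static long levelCapacityMb(int level) {
        long mb = (long) SSTABLE_MB * FANOUT;      // level 1: 50 MB
        for (int i = 1; i < level; i++) mb *= FANOUT;
        return mb;
    }

    // worst-case sstables touched by one compaction: 1 from L + ~10 from L+1
    static int sstablesPerCompaction() { return 1 + FANOUT; }

    public static void main(String[] args) {
        System.out.println(levelCapacityMb(1));                   // 50
        System.out.println(levelCapacityMb(3));                   // 5000
        System.out.println(sstablesPerCompaction() * SSTABLE_MB); // 55
    }
}
```

Whether Cassandra 1.0 actually bounds every compaction to this worst case is exactly the question raised above; the model only shows where the "roughly 10x the sstable size" figure comes from.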