Re: logged out: #

2011-05-09 Thread Stu Hood
As a side note, be aware that running with DEBUG logging enabled can make
your cluster run a full order of magnitude slower.

On Mon, May 9, 2011 at 6:54 PM, Suan Aik Yeo  wrote:

> Ah, must be the status check that I set up. Thanks!
>
>
> On Mon, May 9, 2011 at 7:42 PM, Tyler Hobbs  wrote:
>
>> It just means a client connection was closed.
>>
>>
>> On Mon, May 9, 2011 at 5:41 PM, Suan Aik Yeo wrote:
>>
>>> I have a Cassandra 0.7.0, 3 node cluster with logging set to DEBUG. A few
>>> days ago, and I'm not sure what triggered this, the logs started showing
>>> messages like
>>> DEBUG 17:37:30,399 logged out: #
>>> every second or so, regardless whether there was Cassandra activity.
>>> Today I just upgraded to 0.7.5, and they're still there. Apart from blowing
>>> up the log file sizes, what do these messages mean?
>>>
>>> Thanks,
>>> Suan
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax 
>> Maintainer of the pycassa  Cassandra
>> Python client library
>>
>>
>


Re: cassandra not reading keyspaces defined in cassandra.yaml

2011-05-09 Thread Paul Loy
http://wiki.apache.org/cassandra/FAQ#no_keyspaces

On Tue, May 10, 2011 at 6:21 AM, Narendra Sharma
wrote:

> Look for "Where are my keyspaces?" on following page:
> http://wiki.apache.org/cassandra/StorageConfiguration
>
> On Mon, May 9, 2011 at 5:51 PM, Anurag Gujral wrote:
>
>> Hi All,
>>I have following in my cassandra.yaml
>> keyspaces:
>> - column_families:
>>   - column_metadata: []
>> column_type: Standard
>> compare_with: BytesType
>> gc_grace_seconds: 86400
>> key_cache_save_period_in_seconds: 14400
>> keys_cached: 0.0
>> max_compaction_threshold: 32
>> memtable_flush_after_mins: 1440
>> memtable_operations_in_millions: 100.0
>> memtable_throughput_in_mb: 256
>> min_compaction_threshold: 4
>> name: data
>> read_repair_chance: 1.0
>> row_cache_save_period_in_seconds: 0
>> rows_cached: 1000
>>   name: offline
>>   replica_placement_strategy:
>> org.apache.cassandra.locator.RackUnawareStrategy
>>   replication_factor: 1
>>
>> Cassandra starts properly without giving any warnngs/error but does not
>> create the keyspace offline
>> which is defined above.
>>
>> Please suggest.
>>
>> Thanks
>> Anurag
>>
>
>
>
> --
> Narendra Sharma
> Solution Architect
> *http://www.persistentsys.com*
> *http://narendrasharma.blogspot.com/*
>
>
>


-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


Re: cassandra not reading keyspaces defined in cassandra.yaml

2011-05-09 Thread Narendra Sharma
Look for "Where are my keyspaces?" on the following page:
http://wiki.apache.org/cassandra/StorageConfiguration
On Mon, May 9, 2011 at 5:51 PM, Anurag Gujral wrote:

> Hi All,
>I have following in my cassandra.yaml
> keyspaces:
> - column_families:
>   - column_metadata: []
> column_type: Standard
> compare_with: BytesType
> gc_grace_seconds: 86400
> key_cache_save_period_in_seconds: 14400
> keys_cached: 0.0
> max_compaction_threshold: 32
> memtable_flush_after_mins: 1440
> memtable_operations_in_millions: 100.0
> memtable_throughput_in_mb: 256
> min_compaction_threshold: 4
> name: data
> read_repair_chance: 1.0
> row_cache_save_period_in_seconds: 0
> rows_cached: 1000
>   name: offline
>   replica_placement_strategy:
> org.apache.cassandra.locator.RackUnawareStrategy
>   replication_factor: 1
>
> Cassandra starts properly without giving any warnngs/error but does not
> create the keyspace offline
> which is defined above.
>
> Please suggest.
>
> Thanks
> Anurag
>



-- 
Narendra Sharma
Solution Architect
*http://www.persistentsys.com*
*http://narendrasharma.blogspot.com/*


Re: Renaming cluster

2011-05-09 Thread aaron morton
Can you provide the full error stack, it will show where it failed when 
starting up. 

AFAIK this is the correct process. I just did a quick test on a single 0.7 node
and it could start up after removing the location SSTables.

If you go ahead with removing all the system SSTables you can re-create the
schema and it will pick up the existing files. Also be aware that the initial
token will be lost unless it is in the conf file.

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 May 2011, at 15:57, Shaun Newman wrote:

> Hi,
> 
> I am trying to rename my cluster which has several keyspaces running on 
> cassandra 0.7.5.  When I try to remove the system files as suggested by 
> http://wiki.apache.org/cassandra/FAQ#clustername_mismatch , I get "Could not 
> read system table. Did you change partitioners?" error. If I remove all the 
> system files, the cluster does boot up well but I  loose the schema's of all 
> CFs.  
> 
> Am I missing something or is this a bug. Can anyone suggest what would be the 
> right approach?
> 
> Many thanks in advance.
> 
> Shaun
> 
> 



Renaming cluster

2011-05-09 Thread Shaun Newman
Hi,

I am trying to rename my cluster, which has several keyspaces running on
cassandra 0.7.5.  When I try to remove the system files as suggested by
http://wiki.apache.org/cassandra/FAQ#clustername_mismatch , I get a "Could not
read system table. Did you change partitioners?" error. If I remove all the
system files, the cluster does boot up fine, but I lose the schemas of all
CFs.

Am I missing something, or is this a bug? Can anyone suggest what would be
the right approach?

Many thanks in advance.

Shaun


Re: Ec2 Stress Results

2011-05-09 Thread Jonathan Ellis
On Mon, May 9, 2011 at 5:58 PM, Alex Araujo wrote:
>> How many replicas are you writing?
>
> Replication factor is 3.

So you're actually spot on the predicted numbers: you're pushing
20k*3=60k "raw" rows/s across your 4 machines.
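
The same arithmetic, as a quick back-of-the-envelope check (round numbers,
illustrative only):

client_writes_per_sec = 20000     # roughly what stress reported (21241, 21536, ...)
replication_factor = 3
nodes = 4

raw_rows_per_sec = client_writes_per_sec * replication_factor   # 60000 "raw" rows/s
per_node = raw_rows_per_sec / nodes                              # ~15000 rows/s per node
print(raw_rows_per_sec, per_node)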

You might get another 10% or so from increasing memtable thresholds,
but bottom line is you're right around what we'd expect to see.
Furthermore, CPU is the primary bottleneck which is what you want to
see on a pure write workload.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: A query in deletion of a row

2011-05-09 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#range_ghosts

On Mon, May 9, 2011 at 9:24 PM, anuya joshi  wrote:
> Hello,
>
> I am unclear on Why deleting a row in Cassandra does not delete a row key?
> Is an empty row never deleted from Column Family?
>
> It would be of great help if someone can elaborate on this.
>
> Thanks,
> Anuya
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


A query in deletion of a row

2011-05-09 Thread anuya joshi
  

Hello,
>
> I am unclear on Why deleting a row in Cassandra does not delete a row key?
> Is an empty row never deleted from Column Family?
>
> It would be of great help if someone can elaborate on this.
>
> Thanks,
> Anuya
>


A query in deletion of a row

2011-05-09 Thread anuya joshi
Hello,

I am unclear on why deleting a row in Cassandra does not delete the row key.
Is an empty row never deleted from the column family?

It would be of great help if someone can elaborate on this.

Thanks,
Anuya
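
For anyone seeing this through range queries: the FAQ entry linked in the reply
above boils down to the deleted row's key living on as a tombstone until
gc_grace_seconds pass and compaction removes it, so scans can return keys with
no columns and clients simply skip them. A minimal sketch in the pycassa style
used elsewhere in this digest (exact signatures vary between pycassa releases):

import pycassa

client = pycassa.connect()
cf = pycassa.ColumnFamily(client, 'Keyspace1', 'Standard1')

live_rows = {}
for key, columns in cf.get_range():
    if not columns:        # a "range ghost": the key outlives the delete for a while
        continue
    live_rows[key] = columns
print(len(live_rows))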


Re: logged out: #

2011-05-09 Thread Suan Aik Yeo
Ah, must be the status check that I set up. Thanks!

On Mon, May 9, 2011 at 7:42 PM, Tyler Hobbs  wrote:

> It just means a client connection was closed.
>
>
> On Mon, May 9, 2011 at 5:41 PM, Suan Aik Yeo  wrote:
>
>> I have a Cassandra 0.7.0, 3 node cluster with logging set to DEBUG. A few
>> days ago, and I'm not sure what triggered this, the logs started showing
>> messages like
>> DEBUG 17:37:30,399 logged out: #
>> every second or so, regardless whether there was Cassandra activity. Today
>> I just upgraded to 0.7.5, and they're still there. Apart from blowing up the
>> log file sizes, what do these messages mean?
>>
>> Thanks,
>> Suan
>>
>
>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax 
> Maintainer of the pycassa  Cassandra
> Python client library
>
>


Re: cassandra not reading keyspaces defined in cassandra.yaml

2011-05-09 Thread Tyler Hobbs
http://wiki.apache.org/cassandra/FAQ#no_keyspaces

On Mon, May 9, 2011 at 7:51 PM, Anurag Gujral wrote:

> Hi All,
>I have following in my cassandra.yaml
> keyspaces:
> - column_families:
>   - column_metadata: []
> column_type: Standard
> compare_with: BytesType
> gc_grace_seconds: 86400
> key_cache_save_period_in_seconds: 14400
> keys_cached: 0.0
> max_compaction_threshold: 32
> memtable_flush_after_mins: 1440
> memtable_operations_in_millions: 100.0
> memtable_throughput_in_mb: 256
> min_compaction_threshold: 4
> name: data
> read_repair_chance: 1.0
> row_cache_save_period_in_seconds: 0
> rows_cached: 1000
>   name: offline
>   replica_placement_strategy:
> org.apache.cassandra.locator.RackUnawareStrategy
>   replication_factor: 1
>
> Cassandra starts properly without giving any warnngs/error but does not
> create the keyspace offline
> which is defined above.
>
> Please suggest.
>
> Thanks
> Anurag
>



-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


cassandra not reading keyspaces defined in cassandra.yaml

2011-05-09 Thread Anurag Gujral
Hi All,
   I have following in my cassandra.yaml
keyspaces:
- column_families:
  - column_metadata: []
    column_type: Standard
    compare_with: BytesType
    gc_grace_seconds: 86400
    key_cache_save_period_in_seconds: 14400
    keys_cached: 0.0
    max_compaction_threshold: 32
    memtable_flush_after_mins: 1440
    memtable_operations_in_millions: 100.0
    memtable_throughput_in_mb: 256
    min_compaction_threshold: 4
    name: data
    read_repair_chance: 1.0
    row_cache_save_period_in_seconds: 0
    rows_cached: 1000
  name: offline
  replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
  replication_factor: 1

Cassandra starts properly without giving any warnings or errors, but it does
not create the keyspace offline, which is defined above.

Please suggest.

Thanks
Anurag
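
As the replies above point out (FAQ#no_keyspaces), 0.7 ignores the keyspaces
block in cassandra.yaml at startup; the schema has to be created once through
the API (or, if memory serves, imported from the yaml via the JMX
loadSchemaFromYAML operation). A minimal sketch over the Thrift interface,
assuming the Python bindings generated from interface/cassandra.thrift; field
names are from memory and may differ slightly across 0.7.x releases:

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import KsDef, CfDef

socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TFramedTransport(socket)   # 0.7 uses framed transport by default
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

# the 0.6-era RackUnawareStrategy is called SimpleStrategy in 0.7
client.system_add_keyspace(KsDef(
    name='offline',
    strategy_class='org.apache.cassandra.locator.SimpleStrategy',
    replication_factor=1,
    cf_defs=[CfDef(keyspace='offline', name='data', comparator_type='BytesType')]))
transport.close()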


Re: logged out: #

2011-05-09 Thread Tyler Hobbs
It just means a client connection was closed.

On Mon, May 9, 2011 at 5:41 PM, Suan Aik Yeo  wrote:

> I have a Cassandra 0.7.0, 3 node cluster with logging set to DEBUG. A few
> days ago, and I'm not sure what triggered this, the logs started showing
> messages like
> DEBUG 17:37:30,399 logged out: #
> every second or so, regardless whether there was Cassandra activity. Today
> I just upgraded to 0.7.5, and they're still there. Apart from blowing up the
> log file sizes, what do these messages mean?
>
> Thanks,
> Suan
>



-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Re: Ec2 Stress Results

2011-05-09 Thread Alex Araujo

On 5/6/11 9:47 PM, Jonathan Ellis wrote:

> On Fri, May 6, 2011 at 5:13 PM, Alex Araujo wrote:
>> I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of
>> available memory).
>
> This is going to make GC pauses larger for no good reason.
Good point - only doing writes at the moment.  I will revert the change 
and raise this conservatively once I add reads to the mix.



>> raised
>> concurrent_writes to 300 based on a (perhaps arbitrary?) recommendation in
>> 'Cassandra: The Definitive Guide'
>
> That's never been a good recommendation.
It seemed to contradict the '8 * number of cores' rule of thumb.  I set 
that back to the default of 32.
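
For what it's worth, that rule of thumb is easy to sanity-check; the core count
below is an assumption, not something reported in this thread:

import multiprocessing

cores = multiprocessing.cpu_count()   # e.g. 8 on the larger EC2 instance types
concurrent_writes = 8 * cores         # the '8 * number of cores' rule of thumb
print(concurrent_writes)              # 64 for 8 cores, versus the default of 32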



>> Based on the above, would I be correct in assuming that frequent memtable
>> flushes and/or commitlog I/O are the likely bottlenecks?
>
> Did I miss where you said what CPU usage was?
I observed a consistent 200-350% initially; 300-380% once 'hot' for all 
runs.  Here is an average case sample:


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15108 cassandr  20   0 5406m 4.5g  15m S  331 30.4  89:32.50 jsvc


> How many replicas are you writing?


Replication factor is 3.


> Recent testing suggests that putting the commitlog on the raid0 volume
> is better than on the root volume on ec2, since the root isn't really
> a separate device.

I migrated the commitlog to the raid0 volume and retested with the above 
changes.  I/O appeared more consistent in iostat.  Here's an average 
case (%util in the teens):


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.84    4.05   13.97    3.04   18.42   23.68

Device:  rrqm/s  wrqm/s    r/s     w/s   rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdap1     0.00    0.00   0.00    0.00     0.00      0.00      0.00      0.00    0.00   0.00   0.00
xvdb       0.00    0.00   0.00  222.00     0.00  18944.00     85.33     13.80   62.16   0.59  13.00
xvdc       0.00    0.00   0.00  231.00     0.00  19480.00     84.33      5.80   25.11   0.78  18.00
xvdd       0.00    0.00   0.00  228.00     0.00  19456.00     85.33     17.43   76.45   0.57  13.00
xvde       0.00    0.00   0.00  229.00     0.00  19464.00     85.00     10.41   45.46   0.44  10.00
md0        0.00    0.00   0.00  910.00     0.00  77344.00     84.99      0.00    0.00   0.00   0.00


and worst case (%util above 60):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          44.33    0.00   24.54    0.82   15.46   14.85

Device:  rrqm/s  wrqm/s    r/s      w/s   rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdap1     0.00    1.00   0.00     4.00     0.00      40.00     10.00      0.15   37.50  22.50   9.00
xvdb       0.00    0.00   0.00   427.00     0.00   36440.00     85.34     54.12  147.85   1.69  72.00
xvdc       0.00    0.00   1.00   295.00     8.00   25072.00     84.73     34.56   84.32   2.13  63.00
xvdd       0.00    0.00   0.00   355.00     0.00   30296.00     85.34     94.49  257.61   2.17  77.00
xvde       0.00    0.00   0.00   373.00     0.00   31768.00     85.17     68.50  189.33   1.88  70.00
md0        0.00    0.00   1.00  1418.00     8.00  120824.00     85.15      0.00    0.00   0.00   0.00


Overall, results were roughly the same.  The most noticeable difference
was no timeouts until the number of client threads reached 350 (previously 200):


+--------+--------+----------+---------+---------+---------+------------+
| Server | Client | --keep-  | Columns | Client  | Total   | Combined   |
| Nodes  | Nodes  | going    |         | Threads | Threads | Rate       |
|        |        |          |         |         |         | (writes/s) |
+========+========+==========+=========+=========+=========+============+
|   4    |   3    |    N     |  1000   |   150   |   450   |   21241    |
+--------+--------+----------+---------+---------+---------+------------+
|   4    |   3    |    N     |  1000   |   200   |   600   |   21536    |
+--------+--------+----------+---------+---------+---------+------------+
|   4    |   3    |    N     |  1000   |   250   |   750   |   19451    |
+--------+--------+----------+---------+---------+---------+------------+
|   4    |   3    |    N     |  1000   |   300   |   900   |   19741    |
+--------+--------+----------+---------+---------+---------+------------+

Those results are after I compiled/deployed the latest cassandra-0.7 
with the patch for 
https://issues.apache.org/jira/browse/CASSANDRA-2578.  Thoughts?





logged out: #

2011-05-09 Thread Suan Aik Yeo
I have a Cassandra 0.7.0, 3 node cluster with logging set to DEBUG. A few
days ago, and I'm not sure what triggered this, the logs started showing
messages like
DEBUG 17:37:30,399 logged out: #
every second or so, regardless of whether there was Cassandra activity. Today I
just upgraded to 0.7.5, and they're still there. Apart from blowing up the
log file sizes, what do these messages mean?

Thanks,
Suan


Re: datacenter ShardStrategy

2011-05-09 Thread aaron morton
If you are using 0.7 the recommended approach is to use the 
NetworkTopologyStrategy. 

Here is a recent discussion on setting the tokens in a multi DC deployment
http://www.mail-archive.com/user@cassandra.apache.org/msg12898.html
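
One common scheme, and the one a later thread in this digest describes for a
two-DC NetworkTopologyStrategy setup, is to give each data center its own
evenly spaced ring and offset the second DC by one so no two nodes share a
token. A small sketch for RandomPartitioner (node counts are illustrative):

RANGE = 2 ** 127                           # RandomPartitioner token space

def evenly_spaced_tokens(num_nodes, offset=0):
    return [i * RANGE // num_nodes + offset for i in range(num_nodes)]

dc1 = evenly_spaced_tokens(2)              # [0, 2**126]
dc2 = evenly_spaced_tokens(2, offset=1)    # [1, 2**126 + 1]
print(dc1, dc2)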

Can you move to 0.7?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 May 2011, at 05:50, Anurag Gujral wrote:

> Jonathan thanks for your email. If I use datacenter shard strategy in 
> cassandra 
> how will it effect the ring structure of the cassandra cluster can you please 
> explain.
> 
> Thanks
> Anurag
> 
> On Sun, May 8, 2011 at 11:13 PM, Jonathan Ellis  wrote:
> Step 0: Upgrade to 0.7 and read about NetworkTopologyStrategy instead.
> 
> Intro: 
> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
> 
> On Sun, May 8, 2011 at 8:09 PM, Anurag Gujral  wrote:
> > Hi All,
> >I want to use datacenter ShardStrategy  in my cassandra setup
> > .Can someone please let me know what steps / configuration changes I need to
> > make.
> > Thanks
> > Anurag
> 
> 
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 



Re: Memory Usage During Read

2011-05-09 Thread Sanjeev Kulkarni
Hi Adam,
We have been facing some similar issues of late. Wondering if Jonathan's
suggestions worked for you.
Thanks!

On Sat, May 7, 2011 at 6:37 PM, Jonathan Ellis  wrote:

> The live:serialized size ratio depends on what your data looks like
> (small columns will be less efficient than large blobs) but using the
> rule of thumb of 10x, around 1G * (1 + memtable_flush_writers +
> memtable_flush_queue_size).
>
> So first thing I would do is drop writers and queue to 1 and 1.
>
> Then I would drop the max heap to 1G, memtable size to 8MB so the heap
> dump is easier to analyze. Then let it OOM and look at the dump with
> http://www.eclipse.org/mat/
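
Plugging illustrative numbers into the rule of thumb quoted above (the flush
writer and queue values are the usual defaults, assumed rather than taken from
Adam's config):

memtable_throughput_mb = 128          # serialized flush threshold for the hot CF
live_to_serialized = 10               # rough in-heap expansion factor
memtable_flush_writers = 1
memtable_flush_queue_size = 4

worst_case_mb = (memtable_throughput_mb * live_to_serialized *
                 (1 + memtable_flush_writers + memtable_flush_queue_size))
print(worst_case_mb)                  # ~7680 MB that a single hot CF can pin

Dropping the writers and queue to 1 and 1, as suggested above, cuts the same
estimate to about 3840 MB.
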
>
> On Sat, May 7, 2011 at 3:54 PM, Serediuk, Adam
>  wrote:
> > How much memory should a single hot cf with a 128mb memtable take with
> row and key caching disabled during read?
> >
> > Because I'm seeing heap go from 3.5gb skyrocketing straight to max
> (regardless of the size, 8gb and 24gb both do the same) at which time the
> jvm will do nothing but full gc and is unable to reclaim any meaningful
> amount of memory. Cassandra then becomes unusable.
> >
> > I see the same behavior with smaller memtables, eg 64mb.
> >
> > This happens well into the read operation an only on a small number of
> nodes in the cluster(1-4 out of a total of 60 nodes.)
> >
> > Sent from my iPhone
> >
> > On May 6, 2011, at 22:45, "Jonathan Ellis"  wrote:
> >
> >> You don't GC storm without legitimately having a too-full heap.  It's
> >> normal to see occasional full GCs from fragmentation, but that will
> >> actually compact the heap and everything goes back to normal IF you
> >> had space actually freed up.
> >>
> >> You say you've played w/ memtable size but that would still be my bet.
> >> Most people severely underestimate how much space this takes (10x in
> >> memory over serialized size), which will bite you when you have lots
> >> of CFs defined.
> >>
> >> Otherwise, force a heap dump after a full GC and take a look to see
> >> what's referencing all the memory.
> >>
> >> On Fri, May 6, 2011 at 12:25 PM, Serediuk, Adam
> >>  wrote:
> >>> We're troubleshooting a memory usage problem during batch reads. We've
> spent the last few days profiling and trying different GC settings. The
> symptoms are that after a certain amount of time during reads one or more
> nodes in the cluster will exhibit extreme memory pressure followed by a gc
> storm. We've tried every possible JVM setting and different GC methods and
> the issue persists. This is pointing towards something instantiating a lot
> of objects and keeping references so that they can't be cleaned up.
> >>>
> >>> Typically nothing is ever logged other than the GC failures however
> just now one of the nodes emitted logs we've never seen before:
> >>>
> >>>  INFO [ScheduledTasks:1] 2011-05-06 15:04:55,085 StorageService.java
> (line 2218) Unable to reduce heap usage since there are no dirty column
> families
> >>>
> >>> We have tried increasing the heap on these nodes to large values, eg
> 24GB and still run into the same issue. We're running 8GB of heap normally
> and only one or two nodes will ever exhibit this issue, randomly. We don't
> use key/row caching and our memtable sizing is 64mb/0.3. Larger or smaller
> memtables make no difference in avoiding the issue. We're on 0.7.5, mmap,
> jna and jdk 1.6.0_24
> >>>
> >>> We've somewhat hit the wall in troubleshooting and any advice is
> greatly appreciated.
> >>>
> >>> --
> >>> Adam
> >>>
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >>
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: New node not joining

2011-05-09 Thread Sanjeev Kulkarni
Thanks!

On Sun, May 8, 2011 at 3:40 PM, aaron morton wrote:

> Ah, I see the case you are talking about.
>
> The node will auto bootstrap on startup if, when it joins the ring: it is
> not already bootstrapped, auto bootstrap is enabled, and the node is not in
> its own seed list.
>
> The auto bootstrap process then finds the token it wants, but aborts the
> process if there are no non-system tables defined. That may happen because
> the bootstrap code finds the node with the highest load and splits its
> range; if all the nodes have zero load (no user data) then that process is
> unreliable. But it's also unreliable if there is a schema and no data.
>
> Created https://issues.apache.org/jira/browse/CASSANDRA-2625 to see if it
> can be changed.
>
> Thanks
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7 May 2011, at 05:25, Len Bucchino wrote:
>
> While I agree that what you suggested is a very good idea, the bootstrapping
> process *should* work properly.
>
> Here is some additional detail on the original problem.  If the current
> node that you are trying to bootstrap has itself listed in seeds in its yaml
> then it will be able to bootstrap on an empty schema.  If it does not have
> itself listed in seeds in its yaml and you have an empty schema then the
> bootstrap process will not complete and no errors will be reported in the
> logs even with debug enabled.
>
> *From:* aaron morton [mailto:aa...@thelastpickle.com]
> *Sent:* Thursday, May 05, 2011 6:51 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: New node not joining
>
> When adding nodes it is a *very* good idea to manually set the tokens, see
> http://wiki.apache.org/cassandra/Operations#Load_balancing
>
> bootstrap is a process that happens only once on a node, where as well as
> telling the other nodes it's around it asks them to stream over the data it
> will no be responsible for.
>
> nodetool loadbalance is an old utility that should have better warnings not
> to use it. The best way to load balance the cluster is manually creating the
> tokens and assigning them either using the initial_token config param or
> using nodetool move.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6 May 2011, at 08:37, Sanjeev Kulkarni wrote:
>
>
> Here is what I did.
> I booted up the first one. After that I started the second one with
> bootstrap turned off.
> Then I did a nodetool loadbalance on the second node.
> After which I added the third node again with bootstrap turned off. Then
> did the loadbalance again on the third node.
> This seems to have successfully completed and I am now able to read/write
> into my system.
>
> Thanks!
> On Thu, May 5, 2011 at 1:22 PM, Len Bucchino 
> wrote:
> I just rebuilt the cluster in the same manner as I did originally except
> after I setup the first node I added a keyspace and column family before
> adding any new nodes.  This time the 3rd node auto bootstrapped
> successfully.
>
> *From:* Len Bucchino [mailto:len.bucch...@veritix.com]
> *Sent:* Thursday, May 05, 2011 1:31 PM
>
> *To:* user@cassandra.apache.org
> *Subject:* RE: New node not joining
>
>
> Also, setting auto_bootstrap to false and setting token to the one that it
> said it would use in the logs allows the new node to join the ring.
>
> *From:* Len Bucchino [mailto:len.bucch...@veritix.com]
> *Sent:* Thursday, May 05, 2011 1:25 PM
> *To:* user@cassandra.apache.org
> *Subject:* RE: New node not joining
>
> Adding the fourth node to the cluster with an empty schema using
> auto_bootstrap was not successful.  A nodetool netstats on the new node
> shows “Mode: Joining: getting bootstrap token” similar to what the third
> node did before it was manually added.  Also, there are no exceptions in the
> logs but it never joins the ring.
>
> *From:* Sanjeev Kulkarni [mailto:sanj...@locomatix.com]
> *Sent:* Thursday, May 05, 2011 11:47 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: New node not joining
>
> Hi Len,
> This looks like a decent workaround. I would be very interested to see how
> the addition of the 4th node went. Please post it whenever you get a chance.
> Thanks!
>
> On Thu, May 5, 2011 at 6:47 AM, Len Bucchino 
> wrote:
> I have the same problem on 0.7.5 auto bootstrapping a 3rd node onto an
> empty 2 node test cluster (the two nodes were manually added) and it
> currently has an empty schema.  My log entries look similar to yours.  I
> took the new token it says its going to use from the log file added it to
> the yaml and turned off auto bootstrap and the node added fine.  I'm
> bringing up a 4th node now and will see if it has the same problem auto
> bootstrapping.
>
> --
>
> *From:* Sanjeev Kulkarni [sanj...@locomatix.com]
> *Sent:* Thursday, May 05, 2011 2:18 AM
> *To:* user@cassandra.apache.org
> *Subject:* New node not joi

Re: datacenter ShardStrategy

2011-05-09 Thread Anurag Gujral
Jonathan, thanks for your email. If I use the datacenter shard strategy in
Cassandra, how will it affect the ring structure of the Cassandra cluster? Can
you please explain?

Thanks
Anurag

On Sun, May 8, 2011 at 11:13 PM, Jonathan Ellis  wrote:

> Step 0: Upgrade to 0.7 and read about NetworkTopologyStrategy instead.
>
> Intro:
> http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>
> On Sun, May 8, 2011 at 8:09 PM, Anurag Gujral 
> wrote:
> > Hi All,
> >I want to use datacenter ShardStrategy  in my cassandra setup
> > .Can someone please let me know what steps / configuration changes I need
> to
> > make.
> > Thanks
> > Anurag
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Native heap leaks?

2011-05-09 Thread Hannes Schmidt
On Thu, May 5, 2011 at 4:16 PM, aaron morton  wrote:
> Hannes,
>        To get a baseline of behaviour set disk_access to standard. You will 
> probably want to keep it like that if you want better control over the memory 
> on the box.

I'll do a test with standard and report back.

>
>        Also connect to the box with JConsole and look at the PermGen space 
> used it is not included in the max heap space setting. You can also check the 
> heap usage there, running inside of 1G is very tricky.

PermGen is at 25M which doesn't explain the 700-1000M RSS overhead.
Nevertheless, I wasn't aware that PermGen isn't capped by -Xmx, so
thank you for pointing it out.

>
>        If you want to keep it inside of 2Gb trying setting the heap max to 
> 1.5G, use standard IO, disable caches, and use a low memtable threshold (it 
> depends on how many CF's you have, try 32mb)

I'm not sure I follow. Besides the slowly increasing RSS size,
Cassandra works great for us with 1G. Don't the caches and memtables
live in the heap? I am not seeing any GC pressure at all so 1G should
be ok. Or do the caches and memtables have native components attached
to them like JNA-allocated memory or direct byte buffers?
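
One way to see which mappings the extra RSS is actually charged to is to roll
up /proc/<pid>/smaps by mapping name; a small sketch (this only relies on the
standard smaps layout, nothing Cassandra-specific):

import re, sys
from collections import defaultdict

def rss_by_mapping(pid):
    totals = defaultdict(int)              # resident kB per mapping name
    name = "?"
    with open("/proc/%s/smaps" % pid) as f:
        for line in f:
            if re.match(r"^[0-9a-f]+-[0-9a-f]+ ", line):
                parts = line.split()
                name = parts[5] if len(parts) > 5 else "[anon]"
            elif line.startswith("Rss:"):
                totals[name] += int(line.split()[1])
    return totals

if __name__ == "__main__":
    for name, kb in sorted(rss_by_mapping(sys.argv[1]).items(),
                           key=lambda kv: -kv[1])[:15]:
        print("%10d kB  %s" % (kb, name))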

>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5 May 2011, at 22:30, Hannes Schmidt wrote:
>
>> This was my first thought, too. We switched to mmap_index_only and
>> didn't see any change in behavior. Looking at the smaps file attached
>> to my original post, one can see that the mmapped index files take up
>> only a minuscule part of RSS.
>>
>> On Wed, May 4, 2011 at 11:37 PM, Oleg Anastasyev  wrote:
>>> Probably this is because of mmapped io access mode, which is enabled by 
>>> default
>>> in 64-bit VMs - RAM is occupied by data files.
>>> If you have such a tight memory reqs, you can turn on standard access mode 
>>> in
>>> storage-conf.xml, but don't expect it to work fast then:
>>>
>>>   <DiskAccessMode>standard</DiskAccessMode>
>>>
>
>


Re: Native heap leaks?

2011-05-09 Thread Hannes Schmidt
On Thu, May 5, 2011 at 5:56 PM, Benjamin Coverston
 wrote:
> How many column families do you have?

We have 10 key spaces, each with 2 column families.

>
> On 5/4/11 12:50 PM, Hannes Schmidt wrote:
>>
>> Hi,
>>
>> We are using Cassandra 0.6.12 in a cluster of 9 nodes. Each node is
>> 64-bit, has 4 cores and 4G of RAM and runs on Ubuntu Lucid with the
>> stock 2.6.32-31-generic kernel. We use the Sun/Oracle JDK.
>>
>> Here's the problem: The Cassandra process starts up with 1.1G resident
>> memory (according to top) but slowly grows to 2.1G at a rate that
>> seems proportional to the write load. No writes, no growth. The node
>> is running other memory-sensitive applications (a second JVM for our
>> in-house webapp and a short-lived C++ program) so we need to ensure
>> that each process stays within certain bounds as far as memory
>> requirements go. The nodes OOM and crash when the Cassandra process is
>> at 2.1G so I can't say if the growth is bounded or not.
>>
>> Looking at the /proc/$pid/smaps for the Cassandra process it seems to
>> me that it is the native heap of the Cassandra JVM that is leaking. I
>> attached a readable version of the smaps file generated by [1].
>>
>> Some more data: Cassandra runs with default command line arguments,
>> which means it gets 1G heap. The JNA jar is present and Cassandra logs
>> that the memory locking was successful. In storage-conf.xml,
>> DiskAccessMode is mmap_index_only. Other than that and some increased
>> timeouts we left the defaults. Swap is completely disabled. I don't
>> think this is related but I am mentioning it anyways: overcommit [2]
>> is always-on (vm.overcommit_memory=1). Without that we get OOMs when
>> our application JVM is fork()'ing and exec()'ing our C++program even
>> though there is enough free RAM to satisfy the demands of the C++
>> program. We think this is caused by a flawed kernel heuristic that
>> assumes that the forked process (our C++ app) is as big as the forking
>> one (the 2nd JVM). Anyways, the Cassandra process leaks with both,
>> vm.overcommit_memory=0 (the default) and 1.
>>
>> Whether it is the native heap that leaks or something else, I think
>> that 1.1G of additional RAM for 1G of Java heap can't be normal. I'd
>> be grateful for any insights or pointers at what to try next.
>>
>> [1] http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html
>> [2] http://www.win.tue.nl/~aeb/linux/lk/lk-9.html#ss9.6
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
> http://www.datastax.com/
>
>


Re: Native heap leaks?

2011-05-09 Thread Hannes Schmidt
> I have not looked into smaps before. But it actually seems odd that the
> mmapped Index files are taking up so *little memory*.  Are they only a
> few kb on disk?

The sum of the sizes of all *-Index.db files in /var/lib/cassandra is 2924kb.

> Is this a snapshot taken shortly after the process
> started or before the OOM killer is presumably about to come along.

It was taken after a day of considerable write load.

>  How long does it take to go from 1.1 G to 2.1 G resident?

Two days of active use with considerable write load.

>  Either way, it
> would be worthwhile to set one node to standard io to make sure it's
> really not mmap causing the problem.

Will do that. I am currently a/b testing OpenJDK vs. Sun/Oracle JDK.
Next test will be for standard vs. auto.

> Anyway, assuming it's not mmap, here are the other similar threads on
> the topic.  Unfortunately none of them claim an obvious solution:
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
> http://www.mail-archive.com/user@cassandra.apache.org/msg08063.html
> http://www.mail-archive.com/user@cassandra.apache.org/msg12036.html
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2011-April/004091.html
>

Thank you. I wasn't aware of these threads. Must have googled wrong.


Re: Index interval tuning

2011-05-09 Thread Héctor Izquierdo Seliva
On Mon, 2011-05-09 at 17:58 +0200, Peter Schuller wrote:
> > I have a few sstables with around 500 million keys, and memory usage has
> > grown a lot, I suppose because of the indexes. This sstables are
> > comprised of skinny rows, but a lot of them. Would tuning index interval
> > make the memory usage go down? And what would the performance hit be?
> 
> Assuming no row caching, and assuming you're talking about heap usage
> and not the virtual size of the process in top, the primary two things
> that will grow with row count are (1) bloom filters for sstables and
> (2) the sampled index keys. Bloom filters are of a certain size to
> achieve a sufficiently small false positive rate. That target rate
> could be increased to allow smaller bloom filters, but that is not
> exposed as a configuration option and would require code changes.
> 

No row cache and no key cache. I've tried with both, but the keys being
read are constantly changing, and I didn't see hit ratios beyond 0.8 %.

That reminds me, my false positive ratio is stuck at 1.0, so I guess
bloom filters aren't doing a lot for me.

> For key sampling, the primary performance penalty should be CPU and
> maybe some disk. On average, when looking up a key an sstable index
> file, you'll read sample interval/2 entries and deserialize them
> before finding the one you're after. Increasing sampling interval will
> thus increase the amount of deserialization taking place, as well as
> make the average range of data span additional pages on disk. The
> impact on disk is difficult to judge and likely depends a lot on i/o
> scheduling and other details.
> 

So the only thing I can do is test it and see how it goes. To make the
change affective, should I do anything beyond changing the value in
cassandra.yaml and restart the node? I'll try first with 256 and see
what happens.



Re: Index interval tuning

2011-05-09 Thread Peter Schuller
> I have a few sstables with around 500 million keys, and memory usage has
> grown a lot, I suppose because of the indexes. This sstables are
> comprised of skinny rows, but a lot of them. Would tuning index interval
> make the memory usage go down? And what would the performance hit be?

Assuming no row caching, and assuming you're talking about heap usage
and not the virtual size of the process in top, the primary two things
that will grow with row count are (1) bloom filters for sstables and
(2) the sampled index keys. Bloom filters are of a certain size to
achieve a sufficiently small false positive rate. That target rate
could be increased to allow smaller bloom filters, but that is not
exposed as a configuration option and would require code changes.
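
To put rough numbers on the bloom filter part: the standard sizing formula is
m = -n * ln(p) / (ln 2)^2 bits for n keys at false positive rate p. The target
rates below are illustrative, not Cassandra's built-in constant:

import math

def bloom_filter_mb(n_keys, fp_rate):
    bits = -n_keys * math.log(fp_rate) / (math.log(2) ** 2)
    return bits / 8 / 1024 / 1024

n = 500 * 10**6                        # keys in one large sstable, as in the question
for p in (0.01, 0.001, 0.0001):
    print("p=%g -> ~%.0f MB of filter" % (p, bloom_filter_mb(n, p)))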

For key sampling, the primary performance penalty should be CPU and
maybe some disk. On average, when looking up a key in an sstable index
file, you'll read sample interval/2 entries and deserialize them
before finding the one you're after. Increasing sampling interval will
thus increase the amount of deserialization taking place, as well as
make the average range of data span additional pages on disk. The
impact on disk is difficult to judge and likely depends a lot on i/o
scheduling and other details.
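
A quick illustration of that trade-off, using the 500 million keys from the
question (the index_interval values are just examples):

keys_per_sstable = 500 * 10**6

for index_interval in (128, 256, 512):
    sampled_entries = keys_per_sstable // index_interval    # index entries kept on the heap
    avg_scanned_per_lookup = index_interval // 2             # entries deserialized per index lookup
    print(index_interval, sampled_entries, avg_scanned_per_lookup)

Doubling the interval roughly halves the sampled entries held in memory while
doubling the average deserialization work per index lookup.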

-- 
/ Peter Schuller


Index interval tuning

2011-05-09 Thread Héctor Izquierdo Seliva
Hi everyone.

I have a few sstables with around 500 million keys, and memory usage has
grown a lot, I suppose because of the indexes. These sstables are
comprised of skinny rows, but there are a lot of them. Would tuning the index
interval make the memory usage go down? And what would the performance hit be?

I had to up heap from 5GB to 8GB and tune memtable thresholds way lower
than what I was using with less data.

I'm running 0.7.5 in a 6 machine cluster with RF=3. HW is quad core
intel machines with 16GB ram, and md raid0 on three sata disks.

Thanks all for your time!




Re: Does anyone have Cassandra running on OpenSolaris?

2011-05-09 Thread Jeffrey Kesselman
Ah. That solved it. ty.


On Mon, May 9, 2011 at 11:29 AM, Roland Gude  wrote:
>
> Use bash as a shell
>
> #bash bin/cassandra -f
>
>
> -----Original Message-----
> From: Jeffrey Kesselman [mailto:jef...@gmail.com]
> Sent: Monday, 9 May 2011 17:12
> To: user@cassandra.apache.org
> Subject: Does anyone have Cassandra running on OpenSolaris?
>
> I get this error:
>
> bin/cassandra: syntax error at line 29: `system_memory_in_mb=$' unexpected
>
> Thanks
>
> JK
>
>
> --
> It's always darkest just before you are eaten by a grue.
>
>
>



-- 
It's always darkest just before you are eaten by a grue.


Re: Does anyone have Cassandra running on OpenSolaris?

2011-05-09 Thread Roland Gude

Use bash as a shell

#bash bin/cassandra -f


-----Original Message-----
From: Jeffrey Kesselman [mailto:jef...@gmail.com]
Sent: Monday, 9 May 2011 17:12
To: user@cassandra.apache.org
Subject: Does anyone have Cassandra running on OpenSolaris?

I get this error:

bin/cassandra: syntax error at line 29: `system_memory_in_mb=$' unexpected

Thanks

JK


-- 
It's always darkest just before you are eaten by a grue.




Does anyone have Cassandra running on OpenSolaris?

2011-05-09 Thread Jeffrey Kesselman
I get this error:

bin/cassandra: syntax error at line 29: `system_memory_in_mb=$' unexpected

Thanks

JK


-- 
It's always darkest just before you are eaten by a grue.


Re: RequestResponseStage Assertion Error

2011-05-09 Thread Jonathan Ellis
Fixed since 0.7.4. You should upgrade.
https://issues.apache.org/jira/browse/CASSANDRA-2282

On Sun, May 8, 2011 at 2:37 PM, Eric tamme  wrote:
> I have a 4 node ring that was  setup with tokens a,b,c,d using NTS and
> 2 nodes in each of 2 datacenters with a replication of DC1:1, DC2:1.
> I was getting uneven replica placement so I did a drop keyspace,
> followed by a nodetool move to DC1 having tokens (a,b) and DC2 having
> tokens (a+1,b+1) , then I removed the old data directory and recreated
> the keyspace.
>
> This has resolved my uneven replication, but now on one of my nodes I
> consistently get these errors.
>
> ERROR [RequestResponseStage:1] 2011-05-08 20:52:28,824
> DebuggableThreadPoolExecutor.java (line 103) Error in
> ThreadPoolExecutor
> java.lang.AssertionError
>        at 
> org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:127)
>        at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:49)
>        at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:636)
>
> I don't know if I ... shut down the node in the middle of one of the
> earlier operations or what.  It seems to insert data fine, and my
> distribution is very even.
>
> What is this error, what is causing it, and how do i fix it?
>
> Thanks
> -Eric
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: compaction strategy

2011-05-09 Thread Terje Marthinussen
Sorry, I was referring to the claim that "one big file" was a problem, not
the non-overlapping part.

If you never compact to a single file, you never get rid of all
generations/duplicates.
With non-overlapping files covering small enough token ranges, compacting
down to one file is not a big issue.

Terje

On Mon, May 9, 2011 at 8:52 PM, David Boxenhorn  wrote:

> If they each have their own copy of the data, then they are *not*
> non-overlapping!
>
> If you have non-overlapping SSTables (and you know the min/max keys), it's
> like having one big SSTable because you know exactly where each row is, and
> it becomes easy to merge a new SSTable in small batches, rather than in one
> huge batch.
>
> The only step that you have to add to the current merge process is, when
> you going to write a new SSTable, if it's too big, to write N
> (non-overlapping!) pieces instead.
>
>
> On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen <
> tmarthinus...@gmail.com> wrote:
>
>> Yes, agreed.
>>
>> I actually think cassandra has to.
>>
>> And if you do not go down to that single file, how do you avoid getting
>> into a situation where you can very realistically end up with 4-5 big
>> sstables each having its own copy of the same data massively increasing disk
>> requirements?
>>
>> Terje
>>
>> On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn wrote:
>>
>>> "I'm also not too much in favor of triggering major compactions, because
>>> it mostly have a nasty effect (create one huge sstable)."
>>>
>>> If that is the case, why can't major compactions create many,
>>> non-overlapping SSTables?
>>>
>>> In general, it seems to me that non-overlapping SSTables have all the
>>> advantages of big SSTables (i.e. you know exactly where the data is) without
>>> the disadvantages that come with being big. Why doesn't Cassandra take
>>> advantage of that in a major way?
>>>
>>
>>
>


Re: compaction strategy

2011-05-09 Thread David Boxenhorn
If they each have their own copy of the data, then they are *not*
non-overlapping!

If you have non-overlapping SSTables (and you know the min/max keys), it's
like having one big SSTable because you know exactly where each row is, and
it becomes easy to merge a new SSTable in small batches, rather than in one
huge batch.

The only step that you have to add to the current merge process is, when you
going to write a new SSTable, if it's too big, to write N (non-overlapping!)
pieces instead.


On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen  wrote:

> Yes, agreed.
>
> I actually think cassandra has to.
>
> And if you do not go down to that single file, how do you avoid getting
> into a situation where you can very realistically end up with 4-5 big
> sstables each having its own copy of the same data massively increasing disk
> requirements?
>
> Terje
>
> On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn  wrote:
>
>> "I'm also not too much in favor of triggering major compactions, because
>> it mostly have a nasty effect (create one huge sstable)."
>>
>> If that is the case, why can't major compactions create many,
>> non-overlapping SSTables?
>>
>> In general, it seems to me that non-overlapping SSTables have all the
>> advantages of big SSTables (i.e. you know exactly where the data is) without
>> the disadvantages that come with being big. Why doesn't Cassandra take
>> advantage of that in a major way?
>>
>
>


Re: RequestResponseStage Assertion Error

2011-05-09 Thread Eric tamme
On Mon, May 9, 2011 at 7:18 AM, aaron morton  wrote:
> You can check the schema using cassandra-cli, run "describe cluster" it will
> tell you how many schemas are defined.
> I think the best approach when you discover bad schemas is to drain then
> stop the affected node, remove the Location, Migrations and Schema files in
> the System data directory, restart and let gossip tell the node whats new.
> Note this will also remove the nodes initial token, which will be read again
> from the config file.
> I've cannot remember hearing about a better solution.
> The errors are probably http://wiki.apache.org/cassandra/FAQ#jna
> Hope that helps.


Hmm.. nope, just checked - all nodes are on the same schema.  As far as
what the error means, I meant: what is a "RequestResponseStage Assertion
Error"?  It does not seem to be critical, as the cluster is operating
"normally" except for the errors printing out on this node.

Any other thoughts?

Thanks again,
Eric


Re: RequestResponseStage Assertion Error

2011-05-09 Thread aaron morton
You can check the schema using cassandra-cli: run "describe cluster" and it will
tell you how many schemas are defined.

I think the best approach when you discover bad schemas is to drain and then stop
the affected node, remove the Location, Migrations and Schema files in the
System data directory, restart, and let gossip tell the node what's new. Note
this will also remove the node's initial token, which will be read again from
the config file.

I cannot remember hearing about a better solution.

The errors are probably http://wiki.apache.org/cassandra/FAQ#jna

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 May 2011, at 22:53, Eric tamme wrote:

> On Sun, May 8, 2011 at 7:17 PM, aaron morton  wrote:
>> What version are you on ?
>> 
>> Check the nodetool ring from each node in your cluster to check they have 
>> the same view.
> 
> 
> I am running 0.7.3.  I checked nodetool ring on all hosts and it all
> comes back the same.  I had some funky business when i dropped the
> keyspace (basically an out of memory error) so I added jna.jar to the
> host I was dropping from, which worked ... then the others didnt.  So
> I went back  and added jna.jar all around and all but this host seemed
> to pickup the change.  I had to drop the keyspace again on this host
> that is giving me errors - so it had a mismatched schema for some
> period ... and maybe it still does???
> 
> Any more thoughts?  Can I check the schema?  Can I "force" a schema
> update on the node that is giving me errors?... and What do those
> errors mean exactly?
> 
> Thanks again,
> Eric



Re: RequestResponseStage Assertion Error

2011-05-09 Thread Eric tamme
On Sun, May 8, 2011 at 7:17 PM, aaron morton  wrote:
> What version are you on ?
>
> Check the nodetool ring from each node in your cluster to check they have the 
> same view.


I am running 0.7.3.  I checked nodetool ring on all hosts and it all
comes back the same.  I had some funky business when I dropped the
keyspace (basically an out of memory error) so I added jna.jar to the
host I was dropping from, which worked ... then the others didn't.  So
I went back and added jna.jar all around, and all but this host seemed
to pick up the change.  I had to drop the keyspace again on this host
that is giving me errors - so it had a mismatched schema for some
period ... and maybe it still does???

Any more thoughts?  Can I check the schema?  Can I "force" a schema
update on the node that is giving me errors? ... and what do those
errors mean exactly?

Thanks again,
Eric


Re: compaction strategy

2011-05-09 Thread Terje Marthinussen
Yes, agreed.

I actually think cassandra has to.

And if you do not go down to that single file, how do you avoid getting into
a situation where you can very realistically end up with 4-5 big sstables
each having its own copy of the same data massively increasing disk
requirements?

Terje

On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn  wrote:

> "I'm also not too much in favor of triggering major compactions, because it
> mostly have a nasty effect (create one huge sstable)."
>
> If that is the case, why can't major compactions create many,
> non-overlapping SSTables?
>
> In general, it seems to me that non-overlapping SSTables have all the
> advantages of big SSTables (i.e. you know exactly where the data is) without
> the disadvantages that come with being big. Why doesn't Cassandra take
> advantage of that in a major way?
>


Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?

2011-05-09 Thread aaron morton
That was my initial thought, just wanted to see if there was anything else 
going on. Sounds like Henrik has a workaround so all is well. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 May 2011, at 18:10, Jonathan Ellis wrote:

> Strongly suspect that he has invalid unicode characters in his keys.
> 0.6 wasn't as good at validating those as 0.7.
> 
> On Sun, May 8, 2011 at 8:51 PM, aaron morton  wrote:
>> Out of interest i've done some more digging. Not sure how much more I've
>> contributed but here goes...
>> Ran this against a clean v 0.6.12 and it works (I expected it to fail on
>> the first read):
>> import pycassa
>> client = pycassa.connect()
>> standard1 = pycassa.ColumnFamily(client, 'Keyspace1', 'Standard1')
>> uni_str = u"数時間"
>> uni_str = uni_str.encode("utf-8")
>> 
>> print "Insert row", uni_str
>> print uni_str, standard1.insert(uni_str, {"bar" : "baz"})
>> print "Read rows"
>> print "???", standard1.get("???")
>> print uni_str, standard1.get(uni_str)
>> Ran that against the current 0.6 head from the command line and it works.
>> Run against the code running in intelli J and the code fails as expected.
>> Code also fails as expected on 0.7.5
>> At one stage I grabbed the buffer created by fastbinary.encode_binary in the
>> python generated batch_mutate_args.write() and it looked like the key was
>> correctly utf-8 encoded (matching bytes to the previous utf-8 encoding of
>> that string).
>> I've updated the git
>> project https://github.com/amorton/cassandra-unicode-bug
>> Am going to leave it there unless there is interest to keep looking
>> into it.
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 8 May 2011, at 13:31, Jonathan Ellis wrote:
>> 
>> Right, that's sort of a half-repair: it will repair differences in
>> replies it got, but it won't doublecheck md5s on the rest in the
>> background. So if you're doing CL.ONE reads this is a no-op.
>> 
>> On Sat, May 7, 2011 at 4:25 PM, aaron morton 
>> wrote:
>> 
>> I remembered something like that so had a look at
>> RangeSliceResponseResolver.resolve()  in 0.6.12 and it looks like it
>> schedules the repairs...
>> 
>>protected Row getReduced()
>> 
>>{
>> 
>>ColumnFamily resolved =
>> ReadResponseResolver.resolveSuperset(versions);
>> 
>>ReadResponseResolver.maybeScheduleRepairs(resolved, table,
>> key, versions, versionSources);
>> 
>>versions.clear();
>> 
>>versionSources.clear();
>> 
>>return new Row(key, resolved);
>> 
>>}
>> 
>> 
>> Is that right?
>> 
>> 
>> -
>> 
>> Aaron Morton
>> 
>> Freelance Cassandra Developer
>> 
>> @aaronmorton
>> 
>> http://www.thelastpickle.com
>> 
>> On 8 May 2011, at 00:48, Jonathan Ellis wrote:
>> 
>> range_slices respects consistencylevel, but only single-row reads and
>> 
>> multiget do the *repair* part of RR.
>> 
>> On Sat, May 7, 2011 at 1:44 AM, aaron morton 
>> wrote:
>> 
>> get_range_slices() does read repair if enabled (checked
>> DoConsistencyChecksBoolean in the config, it's on by default) so you should
>> be getting good reads. If you want belt-and-braces run nodetool repair
>> first.
>> 
>> Hope that helps.
>> 
>> 
>> On 7 May 2011, at 11:46, Jeremy Hanna wrote:
>> 
>> Great!  I just wanted to make sure you were getting the information you
>> needed.
>> 
>> On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:
>> 
>> Well, I already completed the migration program. Using get_range_slices I
>> could migrate a few thousand rows per second, which means that migrating all
>> of our data would take a few minutes, and we'll end up with pristine
>> datafiles for the new cluster. Problem solved!
>> 
>> I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so
>> that you all can repeat this and hopefully fix it.
>> 
>> 
>> /Henrik Schröder
>> 
>> On Sat, May 7, 2011 at 00:35, Jeremy Hanna 
>> wrote:
>> 
>> If you're able, go into the #cassandra channel on freenode (IRC) and talk to
>> driftx or jbellis or aaron_morton about your problem.  It could be that you
>> don't have to do all of this based on a conversation there.
>> 
>> On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
>> 
>> I'll see if I can make some example broken files this weekend.
>> 
>> 
>> /Henrik Schröder
>> 
>> On Fri, May 6, 2011 at 02:10, aaron morton  wrote:
>> 
>> The difficulty is the different thrift clients between 0.6 and 0.7.
>> 
>> If you want to roll your own solution I would consider:
>> 
>> - write an app to talk to 0.6 and pull out the data using keys from the
>> other system (so you know can check referential integrity while you are at
>> it). Dump the data to flat file.
>> 
>> - write an app to talk to 0.7 to load the data back in.
>> 
>> I've not given up digging on your migration problem, having to manually dump
>

Re: Adding a new node

2011-05-09 Thread aaron morton
Gossip should help them converge on the truth. 

Can you give an example of the different views from nodetool ring ? 

Also check the logs to see if anything has been logged about endpoints.

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 May 2011, at 18:11, Venkat Rama wrote:

> Thanks for the pointer.  I restarted the entire cluster and started the nodes at the
> same time. However, I still see the issue.  The view is not consistent. I am
> running 0.7.5.
> In general, if a node with a bad ring view starts first, then I guess the
> restart also doesn't help, as it might be propagating its view.  Is this
> assumption correct?
> 
> 
> 
> On Sun, May 8, 2011 at 9:02 PM, aaron morton  wrote:
> It is possible to change IP address of a node, background 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/change-node-IP-address-td6197607.html
>  
> 
> If you have already bought a new node back with a different IP and the nodes 
> in the cluster have different views of the ring (nodetool ring) you should 
> see 
> http://www.datastax.com/docs/0.7/troubleshooting/index#view-of-ring-differs-between-some-nodes
>  
> 
> What version are you on and what does nodetool ring say?
> 
> Hope that helps.
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 9 May 2011, at 12:24, Venkat Rama wrote:
> 
>> Hi,
>> 
>> I am trying to bring up a new node (with a different IP) to replace a dead
>> node on cassandra 0.7.5.   Rather than bootstrap, I am copying the SSTable
>> files to the new node (backed up files) as my data runs into several GB.
>> Although the node successfully joins the ring, some of the ring nodes still
>> seem to point to the old dead node as seen from the ring command.  Is there a
>> way to notify all nodes about the new node?  I am looking for options that can
>> bring the cluster back to its original state in a faster and more reliable manner
>> since I do have all the SSTable files.
>> One option I looked at was to remove all system tables and restart the entire
>> cluster.  But I lose the schemas with this approach.
>> 
>> Thanks in advance for your reply.  
>> 
>> VR
>> 
>> 
> 
> 



Re: compaction strategy

2011-05-09 Thread David Boxenhorn
"I'm also not too much in favor of triggering major compactions, because it
mostly have a nasty effect (create one huge sstable)."

If that is the case, why can't major compactions create many,
non-overlapping SSTables?

In general, it seems to me that non-overlapping SSTables have all the
advantages of big SSTables (i.e. you know exactly where the data is) without
the disadvantages that come with being big. Why doesn't Cassandra take
advantage of that in a major way?


Re: compaction strategy

2011-05-09 Thread Sylvain Lebresne
On Sat, May 7, 2011 at 7:20 PM, Terje Marthinussen
 wrote:
> This is an all ssd system. I have no problems with read/write performance
> due to I/O.
> I do have a potential problem with the crazy explosion you can get in terms of disk
> use if compaction cannot keep up.
>
> As things falls behind and you get many generations of data, yes, read
> performance gets a problem due to the number of sstables.
>
> As things start falling behind, you have a bunch of minor compactions trying
> to merge 20MB (sstables cassandra generally dumps with current config when
> under pressure) into 40 MB into 80MB into

Everyone may be well aware of that, but I'll still remark that a minor
compaction will try to merge "as many 20MB sstables as it can", up to the max
compaction threshold (which is configurable). So if you do accumulate some
newly created sstables at some point in time, the next minor compaction will
take all of them and thus not create a 40 MB sstable, then 80MB, etc. Sure,
there will be more steps than with a major compaction, but let's keep in mind
we don't merge sstables 2 by 2.
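
A simplified sketch of that grouping (the 0.5x-1.5x bounds here approximate the
"similar size" bucketing in the real compaction code rather than copying it):

def compaction_buckets(sstable_sizes_mb, min_threshold=4, max_threshold=32):
    buckets = []                                # each bucket: [sizes, running average]
    for size in sorted(sstable_sizes_mb):
        for bucket in buckets:
            sizes, avg = bucket
            if 0.5 * avg <= size <= 1.5 * avg:  # "similar size" -> same bucket
                sizes.append(size)
                bucket[1] = sum(sizes) / float(len(sizes))
                break
        else:
            buckets.append([[size], float(size)])
    # only buckets with at least min_threshold members get compacted,
    # and at most max_threshold sstables are merged in one pass
    return [sizes[:max_threshold] for sizes, _ in buckets if len(sizes) >= min_threshold]

print(compaction_buckets([20] * 10 + [40, 80, 400]))   # the ten 20MB tables merge together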

I'm also not too much in favor of triggering major compactions, because it
mostly has a nasty effect (it creates one huge sstable). Now maybe we could
expose the difference factor for which we'll consider sstables to be in the
same bucket (i.e., of similar size). As a side note, I think that
https://issues.apache.org/jira/browse/CASSANDRA-1610,
if done correctly, could help in such situations in that one could try a
strategy adapted to its workload.

>
> Anyone wants to do the math on how many times you are rewriting the data
> going this route?
>
> There is just no way this can keep up. It will just fall more and more
> behind.
> Only way to recover as I can see would be to trigger a full compaction?
>
> It does not really make sense to me to go through all these minor merges
> when a full compaction will do a much faster and better job.
>
> Terje
>
> On Sat, May 7, 2011 at 9:54 PM, Jonathan Ellis  wrote:
>>
>> On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
>>  wrote:
>> > 1. Would it make sense to make full compactions occur a bit more
>> > aggressive.
>>
>> I'd rather reduce the performance impact of being behind, than do more
>> full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498
>>
>> > 2. I
>> > would think the code should be smart enough to either trigger a full
>> > compaction and scrap the current queue, or at least merge some of those
>> > pending tasks into larger ones
>>
>> Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
>> with saying "add capacity until compaction can mostly keep up." (Most
>> people's problem is making compaction LESS aggressive, hence
>> https://issues.apache.org/jira/browse/CASSANDRA-2156.)
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>