Re: Upgrading from 1.2 to 2.1 questions
Sure, but the question is really about going from 1.2 to 2.0 ...

On 2015-02-02 13:59:27 +0000, Kai Wang said: I would not use 2.1.2 for production yet. It doesn't seem stable enough based on the feedback I see here. The newest 2.0.12 may be a better option.

On Feb 2, 2015 8:43 AM, Sibbald, Charles charles.sibb...@bskyb.com wrote: Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards, Charles

On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol that make it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg

Information in this email including any attachments may be privileged, confidential and is intended exclusively for the addressee. The views expressed may not be official policy, but the personal views of the originator. If you have received it in error, please notify the sender by return e-mail and delete it from your system. You should not reproduce, distribute, store, retransmit, use or disclose its contents to anyone. Please note we reserve the right to monitor all e-mail communication through our internal and external networks. SKY and the SKY marks are trademarks of British Sky Broadcasting Group plc and Sky International AG and are used under licence. British Sky Broadcasting Limited (Registration No. 2906991), Sky-In-Home Service Limited (Registration No. 2067075) and Sky Subscribers Services Limited (Registration No. 2340150) are direct or indirect subsidiaries of British Sky Broadcasting Group plc (Registration No. 2247735). All of the companies mentioned in this paragraph are incorporated in England and Wales and share the same registered office at Grant Way, Isleworth, Middlesex TW7 5QD.
Re: Upgrading from 1.2 to 2.1 questions
Our minor version is 1.2.15 ... I am not looking forward to the experience, and would like to gather as much information as possible. This presents an opportunity to also review the data structures we use and possibly move them out of Cassandra. Oleg

On 2015-02-02 13:42:52 +0000, Sibbald, Charles said: Hi Oleg, What is the minor version of 1.2? I am looking to do the same for 1.2.14 in a very large cluster. Regards, Charles

On 02/02/2015 13:33, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. ...
Upgrading from 1.2 to 2.1 questions
Dear Distinguished Colleagues: We'd like to upgrade our cluster from 1.2 to 2.0 and then to 2.1. We are using the Pelops Thrift client, which has long been abandoned by its authors. I've read that 2.x has changes to the Thrift protocol that make it incompatible with 1.2 (and of course now the link to that site eludes me). If that is true, we need to first upgrade our Thrift client and then upgrade Cassandra. Let's start by confirming whether that is indeed the case -- if it is, I have my work cut out for me. Does anyone know for sure? Regards, Oleg
Re: Upgrading from 1.2 to 2.1 questions
What about Java clients that were built for 1.2, and how do they work with 2.0?

On 2015-02-02 14:32:53 +0000, Carlos Rolo said: Using Pycassa (https://github.com/pycassa/pycassa) I had no trouble with the clients writing/reading from 1.2.x to 2.0.x (can't recall the minor versions off the top of my head right now). Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Tel: 1649 www.pythian.com

On Mon, Feb 2, 2015 at 3:21 PM, Oleg Dulin oleg.du...@gmail.com wrote: Sure, but the question is really about going from 1.2 to 2.0 ...

-- Regards, Oleg Dulin http://www.olegdulin.com
EC2 Snitch load imbalance
I have a setup with 6 Cassandra nodes (1.2.18), using RandomPartitioner, not using vnodes -- this is a legacy cluster. We went from 3 nodes to 6 in the last few days to add capacity. However, there appears to be an imbalance:

Datacenter: us-east
==========
Replicas: 2

Address    Rack  Status  State   Load       Owns    Token
                                                    113427455640312821154458202477256070484
x.x.x.73   1d    Up      Normal  154.64 GB  33.33%  85070591730234615865843651857942052863
x.x.x.251  1a    Up      Normal  62.26 GB   16.67%  28356863910078205288614550619314017621
x.x.x.238  1b    Up      Normal  243.7 GB   50.00%  56713727820156410577229101238628035242
x.x.x.25   1a    Up      Normal  169.3 GB   33.33%  210
x.x.x.162  1b    Up      Normal  118.24 GB  50.00%  141784319550391026443072753096570088105
x.x.x.208  1d    Up      Normal  226.85 GB  16.67%  113427455640312821154458202477256070484

What is the cause of this imbalance? How can I rectify it? Regards, Oleg
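For reference, with RandomPartitioner a balanced N-node single-DC ring spaces initial tokens evenly over 0 .. 2**127. A minimal sketch (my own, not from the thread) to compute the ideal tokens:

```python
# Ideal initial tokens for an N-node RandomPartitioner ring:
# evenly spaced over the token range 0 .. 2**127 - 1.
def balanced_tokens(n):
    return [i * (2**127 // n) for i in range(n)]

for t in balanced_tokens(6):
    print(t)
```

Comparing the output with the ring above, five of the six tokens match the even spacing exactly and the sixth (210) sits a hair above the ideal 0, so the tokens are effectively balanced; the uneven Owns% and load come from somewhere other than token assignment.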
Re: EC2 Snitch load imbalance
Thanks Mark. The output in my original post is with the keyspace specified.

On 2014-10-28 12:00:15 +0000, Mark Reddy said: Oleg, if you are running nodetool status, be sure to specify the keyspace also. If you don't specify the keyspace the results will be nonsense. https://issues.apache.org/jira/browse/CASSANDRA-7173 Regards, Mark

On 28 October 2014 10:35, Oleg Dulin oleg.du...@gmail.com wrote: I have a setup with 6 Cassandra nodes (1.2.18), using RandomPartitioner, not using vnodes -- this is a legacy cluster. We went from 3 nodes to 6 in the last few days to add capacity. However, there appears to be an imbalance: ...
Moving Cassandra from EC2 Classic into VPC
Dear Colleagues: I need to move Cassandra from EC2 Classic into a VPC. What I was thinking is that I can create a new data center within the VPC and rebuild it from my existing one (switching to vnodes while I am at it). However, I don't understand how the EC2Snitch will deal with this. Another idea I had was taking the EC2Snitch configuration and converting it into a PropertyFileSnitch. But I still don't understand how to perform this move, since it would require my newly created VPC instances to have public IPs -- something I would like to avoid. Any thoughts are appreciated. Regards, Oleg
Re: Moving Cassandra from EC2 Classic into VPC
I get that, but if you read my opening post, I have an existing cluster in EC2 Classic that I have no idea how to move to a VPC cleanly.

On 2014-09-08 19:52:28 +0000, Bram Avontuur said: I have set up Cassandra in a VPC with the EC2Snitch and it works without issues. I didn't need to do anything special to the configuration. I have created instances in 2 availability zones, and it automatically picks them up as 2 different data racks. Just make sure your nodes can see each other in the VPC, e.g. set up a security group that allows connections from other nodes in the same group. There should be no need to use public IPs if whatever talks to Cassandra is also within your VPC. Hope this helps. Bram

On Mon, Sep 8, 2014 at 3:34 PM, Oleg Dulin oleg.du...@gmail.com wrote: Dear Colleagues: I need to move Cassandra from EC2 Classic into a VPC. ...
Options for expanding Cassandra cluster on AWS
Distinguished Colleagues: Our current Cassandra cluster on AWS looks like this: 3 nodes in N. Virginia, one per availability zone, RF=3. Each node is a c3.4xlarge with 2x160 GB SSDs in RAID-0 (~300 GB of SSD on each node). Works great; I find it the most optimal configuration for a Cassandra node. But the time is coming soon when I need to expand storage capacity. I have the following options in front of me:

1) Add 3 more c3.4xlarge nodes. This keeps the amount of data on each node reasonable, and all repairs and other tasks can complete in a reasonable amount of time. The downside is that c3.4xlarge instances are pricey.

2) Add provisioned EBS volumes. These days I can get SSD-backed EBS with up to 4000 provisioned IOPS. I can add those volumes to the data_file_directories list in cassandra.yaml, and I expect Cassandra can deal with that JBOD-style. The upside is that it is much cheaper than option #1 above; the downside is that it is a much slower configuration and repairs can take longer.

I'd appreciate any input on this topic. Thanks in advance, Oleg
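As a rough sanity check on option 1, here is some back-of-the-envelope capacity arithmetic. The 50% compaction-headroom figure is the usual rule of thumb for size-tiered compaction, not a number from this thread:

```python
# Rough usable-capacity estimate for the current 3-node cluster.
# Assumptions: ~300 GB of local SSD per node, keep ~50% of disk free
# as compaction headroom, and RF=3 stores every row on all three nodes.
nodes = 3
disk_per_node_gb = 300
headroom = 0.5
rf = 3

usable_per_node_gb = disk_per_node_gb * headroom      # 150 GB per node
cluster_usable_gb = usable_per_node_gb * nodes        # 450 GB across the cluster
unique_data_gb = cluster_usable_gb / rf               # 150 GB of unique data

print(unique_data_gb)
```

Under those assumptions, doubling to six nodes doubles the unique-data capacity to roughly 300 GB, while option 2 raises per-node disk instead, at the cost of slower I/O.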
ANNOUNCEMENT: cassandra-aws project
Colleagues: I'd like to announce a pet project I started: https://github.com/olegdulin/cassandra-aws

What I would like to accomplish as an end goal is an Amazon Marketplace AMI that makes it easy to configure a new Cassandra cluster or add new nodes to an existing Cassandra cluster, without having to jump through hoops. Ideally I'd like to do for Cassandra what RDS does for PostgreSQL in AWS, for instance, but I am not sure if ultimately that is possible.

To get started, I shared some notes in the wiki as well as a couple of scripts I used to simplify things for myself. I put those scripts together from input I received on the #cassandra IRC channel and this mailing list, and I am very grateful to the community for helping me through this -- so this is my contribution back.

Consider this email a solicitation for help. I am open to discussions, contributions, suggestions, anything you can help with. Regards, Oleg
Re: ANNOUNCEMENT: cassandra-aws project
I guess I didn't know about the ComboAMI! Thanks! I'll look into this. I have been rolling my own AMIs for a simple reason -- we have environments both on-premises and in AWS. I wanted them to be structurally the same, so I used our on-prem configurations as a starting point. Regards, Oleg

On 2014-06-06 15:25:44 +0000, Michael Shuler said:

On 06/06/2014 09:57 AM, Oleg Dulin wrote: I'd like to announce a pet project I started: https://github.com/olegdulin/cassandra-aws

Cool :) https://github.com/riptano/ComboAMI is the DataStax AMI repo.

What I would like to accomplish as an end goal is an Amazon Marketplace AMI that makes it easy to configure a new Cassandra cluster or add new nodes to an existing Cassandra cluster, without having to jump through hoops. ...

Is there something that ComboAMI doesn't cover for your needs, or is there some area that could be improved upon?

To get started, I shared some notes in the wiki as well as a couple of scripts I used to simplify things for myself. ... Consider this email a solicitation for help. ...

Would it be less overall work to implement changes you'd like to see by contributing them to ComboAMI? I fully support lots of variations of tools - whatever makes things easiest for people to do exactly what they need, or in languages they're comfortable with, etc.
How to balance this cluster out ?
I have a cluster that looks like this:

Datacenter: us-east
==========
Replicas: 2

Address  Rack  Status  State   Load       Owns    Token
                                                  113427455640312821154458202477256070484
*.*.*.1  1b    Up      Normal  141.88 GB  66.67%  56713727820156410577229101238628035242
*.*.*.2  1a    Up      Normal  113.2 GB   66.67%  210
*.*.*.3  1d    Up      Normal  102.37 GB  66.67%  113427455640312821154458202477256070484

Obviously, the first node in 1b has 40% more data than the others. If I wanted to rebalance this cluster, how would I go about that? Would shifting the tokens accomplish what I need, and which tokens? Regards, Oleg
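One quick check before moving anything: compute the primary-range size implied by each token in the listing. This is a sketch of my own using the tokens shown above:

```python
# Fraction of the RandomPartitioner ring (0 .. 2**127) covered by each
# token's primary range, from the nodetool ring output above.
RING = 2**127
tokens = sorted([
    56713727820156410577229101238628035242,   # *.*.*.1
    210,                                       # *.*.*.2
    113427455640312821154458202477256070484,  # *.*.*.3
])
fractions = []
for i, t in enumerate(tokens):
    size = (t - tokens[i - 1]) % RING   # range (previous token, t], wrapping
    fractions.append(size / RING)
print([round(f, 4) for f in fractions])
```

All three ranges come out at ~33.3%, so the tokens are already effectively balanced; shifting them would not explain or fix a 40% load difference, which more likely comes from data skew or rack-aware replica placement.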
Re: How to rebalance a cluster?
I keep asking the same question, it seems -- a sign of insanity. Cassandra version 1.2, not using vnodes (legacy).

On 2014-03-07 19:37:48 +0000, Robert Coli said:

On Fri, Mar 7, 2014 at 6:00 AM, Oleg Dulin oleg.du...@gmail.com wrote: I have the following situation:

10.194.2.5  RAC1  Up  Normal  378.6 GB   50.00%  0
10.194.2.4  RAC1  Up  Normal  427.5 GB   50.00%  127605887595351923798765477786913079295
10.194.2.7  RAC1  Up  Normal  350.63 GB  50.00%  85070591730234615865843651857942052864
10.194.2.6  RAC1  Up  Normal  314.42 GB  50.00%  42535295865117307932921825928971026432

As you can see, the 2.4 node has over 100 GB more data than 2.6. You can definitely see the imbalance. It also happens to be the heaviest-loaded node by CPU usage.

The first step is to understand why. Are you using vnodes? What version of Cassandra?

What would be a clean way to rebalance? If I use a move operation followed by cleanup, would it require a repair afterwards?

Move is not, as I understand it, subject to CASSANDRA-2434, so should not require a post-move repair. =Rob
How safe is nodetool move in 1.2 ?
I need to rebalance my cluster. I am sure this question has been asked before -- will 1.2 continue to serve reads and writes correctly while a move is in progress? Need this for my sanity. -- Regards, Oleg Dulin http://www.olegdulin.com
More node imbalance questions
At a different customer, I have this situation:

10.194.2.5  RAC1  Up  Normal  192.2 GB   50.00%  0
10.194.2.4  RAC1  Up  Normal  348.07 GB  50.00%  127605887595351923798765477786913079295
10.194.2.7  RAC1  Up  Normal  387.31 GB  50.00%  85070591730234615865843651857942052864
10.194.2.6  RAC1  Up  Normal  454.97 GB  50.00%  42535295865117307932921825928971026432

Is my understanding correct that I should just move tokens around by proportional amounts to bring the disk utilization in line? -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Commitlog questions
Parag: To answer your questions:

1) The default is just that, a default. I wouldn't advise raising it, though: the bigger it is, the longer it takes to restart the node.

2) I think they just use fsync. There is no queue. All files in Cassandra use java.nio buffers, but they need to be fsynced periodically. Look at the commitlog_sync parameters in cassandra.yaml; the comments there explain how it works. I believe the difference between periodic and batch is just that -- periodic fsyncs every 10 seconds, while batch fsyncs whenever there were changes within a (much smaller) time window.

On 2014-04-09 10:06:52 +0000, Parag Patel said:

1) Why is the default 4GB? Has anyone changed this? What are some aspects to consider when determining the commitlog size?
2) If the commitlog is in periodic mode, there is a property to set a time interval to flush the incoming mutations to disk. This implies that there is a queue inside Cassandra to hold this data in memory until it is flushed.
a. Is there a name for this queue?
b. Is there a limit for this queue?
c. Are there any tuning parameters for this queue?

Thanks, Parag

-- Regards, Oleg Dulin http://www.olegdulin.com
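To illustrate the periodic-vs-batch distinction, here is a toy model of the durability semantics as I read them from the cassandra.yaml comments. It is not Cassandra's actual code, just a sketch of the acknowledgement difference:

```python
# Toy model of the durability semantics of the two commitlog_sync modes.
class PeriodicLog:
    """periodic: writes are acked immediately; a background fsync runs
    every commitlog_sync_period_in_ms, so up to one period's worth of
    acked writes can be lost in a crash."""
    def __init__(self):
        self.acked, self.durable = [], []
    def write(self, mutation):
        self.acked.append(mutation)        # client sees success right away
    def periodic_fsync(self):              # fired by a background timer
        self.durable = list(self.acked)
    def lost_on_crash(self):
        return [m for m in self.acked if m not in self.durable]

class BatchLog:
    """batch: writes are grouped for up to commitlog_sync_batch_window_in_ms
    and acked only after the fsync, so an acked write is never lost."""
    def __init__(self):
        self.pending, self.acked, self.durable = [], [], []
    def write(self, mutation):
        self.pending.append(mutation)      # not yet acked to the client
    def batch_fsync(self):                 # fires within the batch window
        self.durable += self.pending
        self.acked += self.pending
        self.pending = []
```

A crash between write() and the next fsync loses already-acknowledged data only in the periodic case, which is the trade-off the yaml comments describe.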
Why is my cluster imbalanced ?
I added two more nodes on Friday, and moved tokens around. For four nodes, the tokens should be:

Node #1: 0
Node #2: 42535295865117307932921825928971026432
Node #3: 85070591730234615865843651857942052864
Node #4: 127605887595351923798765477786913079296

And yet my ring status shows this (for a specific keyspace). RF=2.

Datacenter: us-east
==========
Replicas: 2

Address  Rack  Status  State   Load      Owns     Token
                                                  42535295865117307932921825928971026432
x.x.x.1  1b    Up      Normal  13.51 GB  25.00%   127605887595351923798765477786913079296
x.x.x.2  1b    Up      Normal  4.46 GB   25.00%   85070591730234615865843651857942052164
x.x.x.3  1a    Up      Normal  62.58 GB  100.00%  0
x.x.x.4  1b    Up      Normal  66.71 GB  50.00%   42535295865117307932921825928971026432

Datacenter: us-west
==========
Replicas: 1

Address  Rack  Status  State   Load      Owns     Token
x.x.x.5  1b    Up      Normal  62.72 GB  100.00%  100

-- Regards, Oleg Dulin http://www.olegdulin.com
Re: Why is my cluster imbalanced ?
Excellent, thanks.

On 2014-04-07 12:23:51 +0000, Tupshin Harper said: Your us-east datacenter has RF=2 and 2 racks, which is the right way to do it (I would rarely recommend using a different number of racks than your RF). But by having three nodes on one rack (1b) and only one on the other (1a), you are telling Cassandra to distribute the data so that no two copies of the same partition exist on the same rack. So with rack ownership of 100% and 100% respectively, there is no even way to distribute your data among those four nodes. tl;dr: Switch node 2 to rack 1a. -Tupshin

On Mon, Apr 7, 2014 at 8:08 AM, Oleg Dulin oleg.du...@gmail.com wrote: I added two more nodes on Friday, and moved tokens around. ...

-- Regards, Oleg Dulin http://www.olegdulin.com
Re: Why is my cluster imbalanced ?
Tupshin: For EC2, with 3 availability zones in us-east, would you recommend RF=3? That would make sense, wouldn't it... That's what I'll do for production. Oleg

On 2014-04-07 12:23:51 +0000, Tupshin Harper said: Your us-east datacenter has RF=2 and 2 racks, which is the right way to do it (I would rarely recommend using a different number of racks than your RF). But by having three nodes on one rack (1b) and only one on the other (1a), you are telling Cassandra to distribute the data so that no two copies of the same partition exist on the same rack. So with rack ownership of 100% and 100% respectively, there is no even way to distribute your data among those four nodes. tl;dr: Switch node 2 to rack 1a. -Tupshin

On Mon, Apr 7, 2014 at 8:08 AM, Oleg Dulin oleg.du...@gmail.com wrote: I added two more nodes on Friday, and moved tokens around. ...

-- Regards, Oleg Dulin http://www.olegdulin.com
Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram
Sigh, so I am back to where I started from... I did lower gc_grace... jmap -histo:live shows the heap is stuffed with DeletedColumn and ExpiringColumn instances. This is extremely frustrating.

On 2014-03-11 19:24:50 +0000, Oleg Dulin said: Good news is that since I lowered the gc_grace period it collected over 100 gigs of tombstones and seems much happier now. Oleg

On 2014-03-10 13:33:43 +0000, Jonathan Lacefield said: Hello, You have several options: 1) going forward, lower gc_grace_seconds ... 2) you could also lower the tombstone compaction threshold and interval ... 3) to clean out old tombstones you could always run a manual compaction, though these aren't typically recommended ... Hope this helps. Jonathan Lacefield, Solutions Architect, DataStax (404) 822 3487

On Mon, Mar 10, 2014 at 6:41 AM, Oleg Dulin oleg.du...@gmail.com wrote: I get that :) What I'd like to know is how to fix that :) ...

-- Regards, Oleg Dulin http://www.olegdulin.com
Need help understanding hinted_handoff_throttle_in_kb
I came across something in the Cassandra configuration that made me concerned. The default value for hinted_handoff_throttle_in_kb is 1024, i.e. one megabyte per second. I have four nodes and RF=2. I have the hint window set to 24 hours, to avoid having to do repairs if I take longer than that to reboot a node. What got me thinking, though, is this: if I'm generating gigabytes' worth of hints during the day, and across four nodes the throttle becomes 250 KB per second, that is too slow to replay all of my hints properly. Is that right? I need to understand this setting better. I would like to make sure that all of my hints get replayed. What is a recommended setting? Any input is greatly appreciated. Regards, Oleg
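The arithmetic behind that worry, as a sketch: the 5 GB backlog is an assumed number for illustration, and the "divided across nodes" reading of the throttle is taken from the question itself.

```python
# How long a hint backlog would take to replay at the default throttle.
throttle_kb_per_s = 1024          # hinted_handoff_throttle_in_kb default
delivering_nodes = 4              # per the question: throttle split across nodes
backlog_gb = 5                    # assumed backlog size for illustration

effective_kb_per_s = throttle_kb_per_s / delivering_nodes   # 256 KB/s
seconds = backlog_gb * 1024 * 1024 / effective_kb_per_s
hours = seconds / 3600
print(round(hours, 1))            # roughly 5.7 hours for a 5 GB backlog
```

So under these assumptions the concern is plausible: a multi-gigabyte hint backlog takes hours to drain at the default throttle, and raising hinted_handoff_throttle_in_kb trades faster replay for more load on the recovering node.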
1.2: Why can't I see what is in hints CF ?
Check this out:

[default@system] list hints limit 10;
Using default cell limit of 100
null
TimedOutException()
	at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
	at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
	at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1495)
	at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:279)
	at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:213)
	at org.apache.cassandra.cli.CliMain.main(CliMain.java:339)

My nodes are accumulating hints and I am wondering what in the world is going on... -- Regards, Oleg Dulin http://www.olegdulin.com
Re: How to guarantee consistency between counter and materialized view?
Robert Coli rc...@eventbrite.com wrote:

On Tue, Mar 11, 2014 at 4:30 PM, ziju feng pkdog...@gmail.com wrote: Is there any way to guarantee a counter's value ...

no. =Rob

I wouldn't use Cassandra for counters... Use something like Redis if that is what you want.
Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram
Good news is that since I lowered the gc_grace period it collected over 100 gigs of tombstones and seems much happier now. Oleg

On 2014-03-10 13:33:43 +0000, Jonathan Lacefield said: Hello, You have several options:

1) Going forward, lower gc_grace_seconds (http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configStorage_r.html?pagename=docs&version=1.2&file=configuration/storage_configuration#gc-grace-seconds) - this is very use-case specific. The default is 10 days. Some users will put this at 0 for specific use cases.

2) You could also lower the tombstone compaction threshold and interval to get tombstone compaction to fire more often on your tables/CFs: https://datastax.jira.com/wiki/pages/viewpage.action?pageId=54493436

3) To clean out old tombstones you could always run a manual compaction, though these aren't typically recommended: http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsNodetool_r.html

For 1 and 2, be sure your disks can keep up with compaction to ensure tombstone, or other, compaction fires regularly enough to clean out old tombstones. Also, you probably want to ensure you are using Leveled Compaction: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction. Again, this assumes your disk system can handle the increased I/O from Leveled Compaction. Also, you may be running into this with the older version of Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-6541

Hope this helps. Jonathan Lacefield, Solutions Architect, DataStax (404) 822 3487

On Mon, Mar 10, 2014 at 6:41 AM, Oleg Dulin oleg.du...@gmail.com wrote: I get that :) What I'd like to know is how to fix that :) ...

-- Regards, Oleg Dulin http://www.olegdulin.com
Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram
I get that :) What I'd like to know is how to fix that :)

On 2014-03-09 20:24:54 +0000, Takenori Sato said: You have millions of org.apache.cassandra.db.DeletedColumn instances on the snapshot. This means you have lots of column tombstones, which, I guess, are read into memory by slice queries.

On Sun, Mar 9, 2014 at 10:55 PM, Oleg Dulin oleg.du...@gmail.com wrote: I am trying to understand why one of my nodes keeps running full GCs. I have Xmx set to 8 gigs, and the memtable total size is 2 gigs. Consider the top entries from jmap -histo:live @ http://pastebin.com/UaatHfpJ

-- Regards, Oleg Dulin http://www.olegdulin.com
need help with Cassandra 1.2 Full GCing -- output of jmap histogram
I am trying to understand why one of my nodes keeps running full GCs. I have Xmx set to 8 gigs, and the memtable total size is 2 gigs. Consider the top entries from jmap -histo:live @ http://pastebin.com/UaatHfpJ -- Regards, Oleg Dulin http://www.olegdulin.com
How to rebalance a cluster?
I have the following situation:

10.194.2.5  RAC1  Up  Normal  378.6 GB   50.00%  0
10.194.2.4  RAC1  Up  Normal  427.5 GB   50.00%  127605887595351923798765477786913079295
10.194.2.7  RAC1  Up  Normal  350.63 GB  50.00%  85070591730234615865843651857942052864
10.194.2.6  RAC1  Up  Normal  314.42 GB  50.00%  42535295865117307932921825928971026432

As you can see, the 2.4 node has over 100 GB more data than 2.6. You can definitely see the imbalance. It also happens to be the heaviest-loaded node by CPU usage. What would be a clean way to rebalance? If I use a move operation followed by cleanup, would it require a repair afterwards? -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Cass 1.2.11 : java.lang.AssertionError: originally calculated column size
Bumping this up -- anything ? anyone ? On 2014-02-13 16:01:50 +, Oleg Dulin said: I am getting these exceptions on one of the nodes, quite often, during compactions: java.lang.AssertionError: originally calculated column size of 84562492 but now it is 84562600 Usually this is on the same column family. I believe this is preventing compactions from completing, and subsequently causing other performance issues for me. Is there a way to fix that ? Would nodetool scrub take care of this ? -- Regards, Oleg Dulin http://www.olegdulin.com
Cass 1.2.11 : java.lang.AssertionError: originally calculated column size
I am getting these exceptions on one of the nodes, quite often, during compactions: java.lang.AssertionError: originally calculated column size of 84562492 but now it is 84562600 Usually this is on the same column family. I believe this is preventing compactions from completing, and subsequently causing other performance issues for me. Is there a way to fix that ? Would nodetool scrub take care of this ? -- Regards, Oleg Dulin http://www.olegdulin.com
Cass 1.2.11: Replacing a node procedure
Dear Distinguished Colleagues: I have a situation where in the production environment one of the machines is overheating and needs to be serviced. Now, the landscape looks like this: 4 machines in the primary DC, 4 machines in the DR DC. Replication factor is 2. I also have a QA environment with 4 machines in a single DC, RF=2 as well. We need to work with the manufacturer to figure out what is wrong with the machine. The proposed course of action is the following: 1) Take the faulty prod machine (let's call it X) out of production. 2) Take a healthy QA machine (let's call it Y) out of QA. 3) Plug the QA machine into the prod cluster and rebuild it. 4) Plug the prod machine into the QA cluster, leave it alone, and let the manufacturer service it to their liking until they say it is fixed, at which point we will just leave it in QA. So basically we are talking about replacing a dead node. I found this: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_replace_node_t.html I am not using vnodes, just plain vanilla tokens and RandomPartitioner, so that procedure doesn't apply. I need some help putting together a step-by-step checklist of what I would need to do. -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Cass 1.2.11: Replacing a node procedure
Here is what I am thinking. 1) Add the new node with (token - 1) of the old one and let it bootstrap. 2) Once it has bootstrapped, remove the old node from the ring. Now, it is #2 that I need clarification on. Do I use decommission or remove? How long should I expect those processes to run? Regards, Oleg On 2014-02-13 22:01:10 +, Oleg Dulin said: Dear Distinguished Colleagues: I have a situation where in the production environment one of the machines is overheating and needs to be serviced. Now, the landscape looks like this: 4 machines in the primary DC, 4 machines in the DR DC. Replication factor is 2. I also have a QA environment with 4 machines in a single DC, RF=2 as well. We need to work with the manufacturer to figure out what is wrong with the machine. The proposed course of action is the following: 1) Take the faulty prod machine (let's call it X) out of production. 2) Take a healthy QA machine (let's call it Y) out of QA. 3) Plug the QA machine into the prod cluster and rebuild it. 4) Plug the prod machine into the QA cluster, leave it alone, and let the manufacturer service it to their liking until they say it is fixed, at which point we will just leave it in QA. So basically we are talking about replacing a dead node. I found this: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_replace_node_t.html I am not using vnodes, just plain vanilla tokens and RandomPartitioner, so that procedure doesn't apply. I need some help putting together a step-by-step checklist of what I would need to do. -- Regards, Oleg Dulin http://www.olegdulin.com
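A hedged sketch of that sequence (exact command names vary by release; on 1.2 the removal command is nodetool removenode with a host ID, on earlier releases nodetool removetoken with the token):

```shell
# 1) On replacement node Y, set initial_token in cassandra.yaml to
#    (X's token - 1), then start Y and let it bootstrap.

# 2) Once Y shows Up/Normal in `nodetool ring`, retire X.
#    If X can still run, prefer decommission, executed on X itself:
nodetool decommission
#    If X stays down, remove it from any live node instead:
nodetool removetoken <token-of-X>    # `removenode <host-id>` on 1.2+

# 3) On the remaining nodes whose ranges changed:
nodetool cleanup
```

Decommission streams X's own data to the new owners and can take hours for tens of gigabytes; removetoken instead re-replicates the ranges from surviving replicas.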
Re: Thrift CAS usage
On 2014-02-12 23:11:01 +, mahesh rajamani said: Hi, I am using the CAS feature through the Thrift cas api. I am able to set the expected column with some value and use cas through the Thrift api. But I am not sure what I should set the expected column list to in order to achieve an IF NOT EXISTS condition for a column. Can someone help me on this? -- Regards, Mahesh Rajamani Read the column first... -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Cassandra 1.2 : OutOfMemoryError: unable to create new native thread
I figured it out. Another process on that machine was leaking threads. All is well! Thanks guys! Oleg On 2013-12-16 13:48:39 +, Maciej Miklas said: cassandra-env.sh has the option JVM_OPTS="$JVM_OPTS -Xss180k". It will give this error if you start Cassandra with Java 7, so increase the value or remove the option. Regards, Maciej On Mon, Dec 16, 2013 at 2:37 PM, srmore comom...@gmail.com wrote: What is your thread stack size (xss)? Try increasing that; that could help. Sometimes the limitation is imposed by the host provider (e.g. Amazon EC2 etc.). Thanks, Sandeep On Mon, Dec 16, 2013 at 6:53 AM, Oleg Dulin oleg.du...@gmail.com wrote: Hi guys! I believe my limits settings are correct. Here is the output of ulimit -a:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1547135
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 10
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 32768
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

However, I just had a couple of Cassandra nodes go down over the weekend for no apparent reason with the following error:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Any input is greatly appreciated.
-- Regards, Oleg Dulin http://www.olegdulin.com
Cassandra 1.2 : OutOfMemoryError: unable to create new native thread
Hi guys! I believe my limits settings are correct. Here is the output of ulimit -a:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1547135
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 10
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 32768
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

However, I just had a couple of Cassandra nodes go down over the weekend for no apparent reason with the following error:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Any input is greatly appreciated. -- Regards, Oleg Dulin http://www.olegdulin.com
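The resolution in this thread (a leaking neighbour process) fits the arithmetic: the JVM usually fails to spawn a thread because the per-user process limit or address space runs out, not the heap. A rough sketch of the numbers involved, assuming the -Xss180k mentioned above:

```python
# Back-of-the-envelope numbers for "unable to create new native thread".
# Values come from the ulimit output and cassandra-env.sh in this thread.
xss_kb = 180            # JVM per-thread stack size (-Xss180k)
max_user_procs = 32768  # ulimit -u (threads count against this on Linux)

stack_kb_at_limit = xss_kb * max_user_procs
print("stack space if every allowed thread existed:",
      stack_kb_at_limit // 1024, "MB")
```

At 180 KB per stack, even the full 32768-thread budget only needs about 5.7 GB of address space, so on a 64-bit box it is typically the process-count limit, shared with every other process owned by the same user, that is hit first.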
Re: Cassandra and bug track type number sequencing.
If you want sequential numbers, you can't trust distributed counters from Cassandra. However, you could use Redis for this. Additionally, you could use a random UUID and only show the customer the first 6 characters -- it is unique enough... Oleg On 2013-12-16 09:33:39 +, Jacob Rhoden said: Hi Guys, As per the subject, is there any way at all to easily associate small numbers in systems where users traditionally associate “bug/request” tickets with short numbers? In this use case I imagine the requirements would be as follows: • The numbers don't necessarily need to be sequential, they just need to be short enough for a user to read out loud. • The numbers must be unique. • It doesn't need to scale, i.e. a typical “request” system is not getting hundreds of requests per second. In an ideal world, we could do away with associating “requests” with numbers, but it's so ubiquitous I'm not sure you can sell doing away with short number codes. I am toying with the idea of a Cassandra table that makes available short “blocks” of numbers that an app server can hold “reservations” on, i.e.:

create table request_id_block(
    start int,
    end int,
    uuid uuid,
    reserved_by int,
    reserved_until bigint,
    primary key(start, end));

Will having an app server mark a block as reserved (QUORUM) and then reading it back (QUORUM) be enough for an app server to know it owns that block of numbers? Best regards, Jacob -- Regards, Oleg Dulin http://www.olegdulin.com
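On Jacob's QUORUM question: a plain write-then-read at QUORUM is not a lock. Two app servers can both write reserved_by, both read back a quorum that includes their own write, and both conclude they own the block; you need compare-and-set (Cassandra 2.0's CAS) or an external allocator for the reservation step itself. An in-memory sketch of the block-handout idea (class and method names are hypothetical, not an API):

```python
# In-memory stand-in for the proposed request_id_block table. The
# reserve() step is where a real system needs compare-and-set, not a
# plain QUORUM write, to be race-free across app servers.
class BlockAllocator:
    def __init__(self, block_size=100):
        self.block_size = block_size
        self.next_start = 1
        self.owners = {}  # block start -> reserving app server id

    def reserve(self, server_id):
        start = self.next_start
        self.next_start += self.block_size
        self.owners[start] = server_id          # "write reserved_by"
        assert self.owners[start] == server_id  # "read it back"
        return range(start, start + self.block_size)

alloc = BlockAllocator()
print(list(alloc.reserve(server_id=1))[:3])  # short, human-readable ids
```

Each app server then hands out numbers from its reserved block locally, which keeps the short-number requirement without a round trip per ticket.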
Re: 1.1.11: system keyspace is filling up
What happens if they are not being successfully delivered ? Will they eventually TTL-out ? Also, do I need to truncate hints on every node or is it replicated ? Oleg On 2013-11-04 21:34:55 +, Robert Coli said: On Mon, Nov 4, 2013 at 11:34 AM, Oleg Dulin oleg.du...@gmail.com wrote: I have a dual DC setup, 4 nodes, RF=4 in each. The one that is used as primary has its system keyspace fill up with 200 gigs of data, majority of which is hints. Why does this happen ? How can I clean it up ? If you have this many hints, you probably have flapping / frequent network partition, or very overloaded nodes. If you compare the number of hints to the number of dropped messages, that would be informative. If you're hinting because you're dropping, increase capacity. If you're hinting because of partition, figure out why there's so much partition. WRT cleaning up hints, they will automatically be cleaned up eventually, as long as they are successfully being delivered. If you need to manually clean them up you can truncate system.hints keyspace. =Rob -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Cass 1.1.11 out of memory during compaction ?
If I do that, wouldn't I need to scrub my sstables? Takenori Sato ts...@cloudian.com wrote: Try increasing column_index_size_in_kb. A slice query to get some ranges (SliceFromReadCommand) requires reading all the column indexes for the row, and thus could hit OOM if you have a very wide row. On Sun, Nov 3, 2013 at 11:54 PM, Oleg Dulin oleg.du...@gmail.com wrote: Cass 1.1.11 ran out of memory on me with this exception (see below). My parameters are 8 gig heap, new gen is 1200M.

ERROR [ReadStage:55887] 2013-11-02 23:35:18,419 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[ReadStage:55887,5,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:323)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:88)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:83)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:179)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:121)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:48)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:116)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:147)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:126)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:117)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:140)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:292)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1362)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1224)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1159)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Any thoughts? This is a dual data center setup, with 4 nodes in each DC and RF=2 in each. -- Regards, Oleg Dulin http://www.olegdulin.com
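For reference, the setting suggested above lives in cassandra.yaml; a minimal fragment (64 is the default; the right value depends on your row widths):

```yaml
# cassandra.yaml
# Granularity of the per-row column index. Larger values mean fewer
# index entries to load when slicing a very wide row, at the cost of
# coarser seeks within the row.
column_index_size_in_kb: 256
```

On the scrub question: as far as I know the column index is written per sstable, so no scrub is required for correctness; existing sstables simply keep the old granularity until they are rewritten by compaction (or by scrub/upgradesstables if you want it sooner).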
1.1.11: system keyspace is filling up
I have a dual DC setup, 4 nodes, RF=4 in each. The one that is used as primary has its system keyspace fill up with 200 gigs of data, majority of which is hints. Why does this happen ? How can I clean it up ? -- Regards, Oleg Dulin http://www.olegdulin.com
Cass 1.1.11 out of memory during compaction ?
Cass 1.1.11 ran out of memory on me with this exception (see below). My parameters are 8 gig heap, new gen is 1200M.

ERROR [ReadStage:55887] 2013-11-02 23:35:18,419 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[ReadStage:55887,5,main]
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:323)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:88)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:83)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73)
at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:179)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:121)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:48)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:116)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:147)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:126)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:117)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:140)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:292)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1362)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1224)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1159)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Any thoughts? This is a dual data center setup, with 4 nodes in each DC and RF=2 in each. -- Regards, Oleg Dulin http://www.olegdulin.com
Frustration with repair process in 1.1.11
First I need to vent. <rant> One of my Cassandra clusters is a dual data center setup, with DC1 acting as primary and DC2 acting as a hot backup. Well, guess what? I am pretty sure that it falls behind on replication. So I am told I need to run repair. I run repair (with -pr) on DC2. The first time I run it it gets *stuck* (i.e. frozen) within the first 30 seconds, with no error or any sort of message. I then run it again -- and it completes in seconds on each node, with about 50 gigs of data on each. That seems suspicious, so I do some research. I am told on IRC that running repair -pr will only do the repair on 100 tokens (the offset from DC1 to DC2)… Seriously??? The repair process is, indeed, a joke: https://issues.apache.org/jira/browse/CASSANDRA-5396 . Repair is the worst thing you can do to your cluster; it consumes enormous resources and can leave your cluster in an inconsistent state. Oh, and by the way, you must run it every week…. Whoever invented that process must not live in the real world, with real applications. </rant> No… let's have a constructive conversation. How do I know, with certainty, that my DC2 cluster is up to date on replication? I have a few options: 1) I set read repair chance to 100% on critical column families and I write a tool to scan every CF, every column of every row. This strikes me as very silly. Q1: Do I need to scan every column, or is looking at one column enough to trigger a read repair? 2) Can someone explain to me how repair works, such that I don't totally trash my cluster or spill into the work week? Is there any improvement and clarity in 1.2? How about 2.0? -- Regards, Oleg Dulin http://www.olegdulin.com
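The "100 tokens" remark from IRC checks out arithmetically. With the interleaved token layout described elsewhere in this thread (each DC2 token = the corresponding DC1 token + 100), a DC2 node's *primary* range, which is all that repair -pr touches, is just the 100 tokens between its DC1 neighbour and itself, which is why the runs finish in seconds. A rough check of that claim:

```python
# Primary-range widths for DC2 nodes under the interleaved token layout
# (DC2 token = DC1 token + 100). Token values follow that layout.
dc1 = [0,
       42535295865117307932921825928971026432,
       85070591730234615865843651857942052864,
       127605887595351923798765477786913079296]
dc2 = [t + 100 for t in dc1]

ring = sorted(dc1 + dc2)
ring_size = 2 ** 127  # RandomPartitioner token space

for tok in dc2:
    prev = ring[ring.index(tok) - 1]   # predecessor on the ring
    width = (tok - prev) % ring_size   # primary range = (prev, tok]
    print("DC2 token", tok, "primary range width:", width)
```

So repairing DC2 with -pr verifies almost none of the data; running repair without -pr (or running -pr across *all* nodes in both DCs) is what covers the whole ring.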
Too many open files with Cassandra 1.2.11
Got this error:

WARN [Thread-8] 2013-10-29 02:58:24,565 CustomTThreadPoolServer.java (line 122) Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109)
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36)
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:110)
at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:111)

I haven't seen this since the 1.0 days; I thought 1.1.11 had it all fixed. ulimit outputs unlimited. What could cause this? Any help is greatly appreciated. -- Regards, Oleg Dulin http://www.olegdulin.com
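A couple of hedged checks that usually narrow this down (a user-level ulimit of unlimited does not guarantee the running JVM inherited that limit):

```shell
# What limit did the Cassandra process actually inherit?
cat /proc/$(pgrep -f CassandraDaemon)/limits | grep 'open files'

# How many descriptors is it really holding (sockets + sstables)?
lsof -n -p $(pgrep -f CassandraDaemon) | wc -l

# Raise the limit persistently for the cassandra user:
# /etc/security/limits.conf
#   cassandra  -  nofile  100000
```

A very large sstable count (e.g. compaction falling behind) can legitimately exhaust even a generous nofile limit, so checking sstable counts in nodetool cfstats is worthwhile too.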
Adding a data center with data already in place
I am using Cassandra 1.1.11 and plan on upgrading soon, but in the meantime here is what happened. I couldn't run repairs because of a slow WAN pipe, so I removed the second data center from the cluster. Today I need to bring that data center back in. It is now 2-3 days out of date. I have two options: 1) Treat this as a new data center and let the nodes sync from scratch, or 2) Bring the nodes back up with all the data in place and do a repair. We are talking about 30-40 GB per node. There are 4 nodes in both data centers, with RF=2. -- Regards, Oleg Dulin http://www.olegdulin.com
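A hedged sketch of how the two options map onto nodetool (DC names are from this setup; verify the commands exist in your exact 1.1.11 build before relying on them):

```shell
# Option 1: nodes rejoin the re-added DC empty (auto_bootstrap: false,
# correct tokens), then each one pulls a full copy from the healthy DC:
nodetool rebuild DC1

# Option 2: bring nodes back with their 2-3 day old data and reconcile:
nodetool repair -pr    # run per node, per keyspace as needed
```

For 30-40 GB per node over a slow WAN, option 2 only has to ship the few days of delta, which is usually far less traffic than a full rebuild.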
Unbalanced ring mystery multi-DC issue with 1.1.11
Consider this output from nodetool ring:

Address  DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                   127605887595351923798765477786913079396
dc1.5    DC1  RAC1  Up      Normal  32.07 GB  50.00%               0
dc2.100  DC2  RAC1  Up      Normal  8.21 GB   50.00%               100
dc1.6    DC1  RAC1  Up      Normal  32.82 GB  50.00%               42535295865117307932921825928971026432
dc2.101  DC2  RAC1  Up      Normal  12.41 GB  50.00%               42535295865117307932921825928971026532
dc1.7    DC1  RAC1  Up      Normal  28.37 GB  50.00%               85070591730234615865843651857942052864
dc2.102  DC2  RAC1  Up      Normal  12.27 GB  50.00%               85070591730234615865843651857942052964
dc1.8    DC1  RAC1  Up      Normal  27.34 GB  50.00%               127605887595351923798765477786913079296
dc2.103  DC2  RAC1  Up      Normal  13.46 GB  50.00%               127605887595351923798765477786913079396

I concealed IPs and DC names for confidentiality. All of the data loading was happening against DC1 at a pretty brisk rate of, say, 200K writes per minute. Note how my tokens are offset by 100. Shouldn't that mean that the load on each node should be roughly identical? In DC1 it is roughly around 30 GB on each node. In DC2 it is almost 1/3rd of the nearest DC1 node by token range. To verify that the nodes are in sync, I ran nodetool -h localhost repair MyKeySpace --partitioner-range on each node in DC2. Watching the logs, I see that the repair went really quickly and all column families are in sync! I need help making sense of this. Is this because DC1 is not fully compacted? Is it because DC2 is not fully synced and I am not checking correctly? How can I tell whether replication is still in progress (note, I started my load yesterday at 9:50am)? -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Wanted to add one more thing: I can also tell that the numbers are not consistent across DCs this way -- I have a column family with really wide rows (a couple million columns). DC1 reports higher column counts than DC2. DC2 only becomes consistent after I run the command a couple of times and trigger a read repair. But why would the nodetool repair logs show that everything is in sync? Regards, Oleg On 2013-09-27 10:23:45 +, Oleg Dulin said: Consider this output from nodetool ring:

Address  DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                   127605887595351923798765477786913079396
dc1.5    DC1  RAC1  Up      Normal  32.07 GB  50.00%               0
dc2.100  DC2  RAC1  Up      Normal  8.21 GB   50.00%               100
dc1.6    DC1  RAC1  Up      Normal  32.82 GB  50.00%               42535295865117307932921825928971026432
dc2.101  DC2  RAC1  Up      Normal  12.41 GB  50.00%               42535295865117307932921825928971026532
dc1.7    DC1  RAC1  Up      Normal  28.37 GB  50.00%               85070591730234615865843651857942052864
dc2.102  DC2  RAC1  Up      Normal  12.27 GB  50.00%               85070591730234615865843651857942052964
dc1.8    DC1  RAC1  Up      Normal  27.34 GB  50.00%               127605887595351923798765477786913079296
dc2.103  DC2  RAC1  Up      Normal  13.46 GB  50.00%               127605887595351923798765477786913079396

I concealed IPs and DC names for confidentiality. All of the data loading was happening against DC1 at a pretty brisk rate of, say, 200K writes per minute. Note how my tokens are offset by 100. Shouldn't that mean that the load on each node should be roughly identical? In DC1 it is roughly around 30 GB on each node. In DC2 it is almost 1/3rd of the nearest DC1 node by token range. To verify that the nodes are in sync, I ran nodetool -h localhost repair MyKeySpace --partitioner-range on each node in DC2. Watching the logs, I see that the repair went really quickly and all column families are in sync! I need help making sense of this. Is this because DC1 is not fully compacted? Is it because DC2 is not fully synced and I am not checking correctly? How can I tell whether replication is still in progress (note, I started my load yesterday at 9:50am)? -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Here is some more information. I am running a full repair on one of the nodes and I am observing strange behavior. Both DCs were up during the data load, but repair is reporting a lot of out-of-sync data. Why would that be? Is there a way for me to tell whether the WAN may be dropping hinted handoff traffic? Regards, Oleg On 2013-09-27 10:35:34 +, Oleg Dulin said: Wanted to add one more thing: I can also tell that the numbers are not consistent across DCs this way -- I have a column family with really wide rows (a couple million columns). DC1 reports higher column counts than DC2. DC2 only becomes consistent after I run the command a couple of times and trigger a read repair. But why would the nodetool repair logs show that everything is in sync? Regards, Oleg On 2013-09-27 10:23:45 +, Oleg Dulin said: Consider this output from nodetool ring:

Address  DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                   127605887595351923798765477786913079396
dc1.5    DC1  RAC1  Up      Normal  32.07 GB  50.00%               0
dc2.100  DC2  RAC1  Up      Normal  8.21 GB   50.00%               100
dc1.6    DC1  RAC1  Up      Normal  32.82 GB  50.00%               42535295865117307932921825928971026432
dc2.101  DC2  RAC1  Up      Normal  12.41 GB  50.00%               42535295865117307932921825928971026532
dc1.7    DC1  RAC1  Up      Normal  28.37 GB  50.00%               85070591730234615865843651857942052864
dc2.102  DC2  RAC1  Up      Normal  12.27 GB  50.00%               85070591730234615865843651857942052964
dc1.8    DC1  RAC1  Up      Normal  27.34 GB  50.00%               127605887595351923798765477786913079296
dc2.103  DC2  RAC1  Up      Normal  13.46 GB  50.00%               127605887595351923798765477786913079396

I concealed IPs and DC names for confidentiality. All of the data loading was happening against DC1 at a pretty brisk rate of, say, 200K writes per minute. Note how my tokens are offset by 100. Shouldn't that mean that the load on each node should be roughly identical? In DC1 it is roughly around 30 GB on each node. In DC2 it is almost 1/3rd of the nearest DC1 node by token range. To verify that the nodes are in sync, I ran nodetool -h localhost repair MyKeySpace --partitioner-range on each node in DC2. Watching the logs, I see that the repair went really quickly and all column families are in sync! I need help making sense of this. Is this because DC1 is not fully compacted? Is it because DC2 is not fully synced and I am not checking correctly? How can I tell whether replication is still in progress (note, I started my load yesterday at 9:50am)? -- Regards, Oleg Dulin http://www.olegdulin.com
Need help configuring WAN replication over slow WAN
Here is the problem: my customer has a 45 Megabit connection to their off-site DR data center. They have about 500 GB worth of data, and that connection is shared. Needless to say, this is not an optimal configuration. To replicate all of that in real time would take a week. My primary cluster is 4 nodes, RF=2. The DR cluster is also 4 nodes, RF=2. I need a way to set up the primary cluster, populate all the data, and then transfer it to the DR cluster. One suggestion is: 1) Set up the primary cluster, plus configure a Mac Mini as a backup data center but on the same network. 2) Populate the data. 3) Physically take the Mac Mini to the DR data center, transfer its data to one of the nodes, and then run nodetool cleanup to move the data around among the nodes. Now… this doesn't strike me as optimal. I feel like I'll need to run repair on the new cluster, which defeats the purpose -- it'll just hog the 45 Megabit pipe… Somehow I need a way to load all the data into the primary cluster, then ship it over to the backup in a more timely fashion… Any suggestions are greatly appreciated. Also, I need a way to know whether replication is up to date. -- Regards, Oleg Dulin http://www.olegdulin.com
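One way to avoid streaming 500 GB over the shared 45 Mbit link is to move the initial copy on disk. A hedged sketch (host names, keyspace, and column family are illustrative; verify refresh exists in your release before relying on it):

```shell
# On each primary node: snapshot the keyspace
nodetool snapshot MyKeySpace

# Copy the snapshot sstables to portable storage, ship it to the DR
# site, and place the files under the matching data directories on the
# DR nodes (data/<keyspace>/<columnfamily>/).

# On each DR node: pick up the new sstables
nodetool refresh MyKeySpace MyColumnFamily   # or restart the node

# Finally, reconcile whatever was written after the snapshot:
nodetool repair -pr MyKeySpace
```

The closing repair then only has to move the post-snapshot delta over the WAN. For monitoring whether replication keeps up afterwards, watching hinted handoff counts and nodetool netstats on each node is about the best signal available on 1.1.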
Pycassa xget not parsing composite column name properly
I have a column family defined as:

create column family LSItemIdsByFieldValueIndex_Integer
  with column_type = 'Standard'
  and comparator = 'CompositeType(org.apache.cassandra.db.marshal.IntegerType,org.apache.cassandra.db.marshal.UTF8Type)'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type';

This snippet of code:

result = searchIndex.get_range(column_count=1)
for key, columns in result:
    print '\t', key
    indexData = searchIndex[indexCF].xget(key)
    for name, value in indexData:
        print name

does not correctly print the column name parsed into a tuple of two parts. Am I doing something wrong here? -- Regards, Oleg Dulin http://www.olegdulin.com
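If the installed pycassa version simply isn't decoding CompositeType names from xget, the raw name can be unpacked by hand. A minimal sketch, assuming the standard composite wire layout of a 2-byte big-endian length, the component bytes, and one end-of-component byte per component (this is a fallback illustration, not the pycassa API):

```python
import struct

def unpack_composite(raw):
    """Split a packed CompositeType column name into component byte strings."""
    parts, i = [], 0
    while i < len(raw):
        (length,) = struct.unpack('>H', raw[i:i + 2])
        parts.append(raw[i + 2:i + 2 + length])
        i += 2 + length + 1  # skip the end-of-component byte
    return parts

def pack_composite(parts):
    """Inverse of unpack_composite, for round-trip testing."""
    return b''.join(struct.pack('>H', len(p)) + p + b'\x00' for p in parts)

# IntegerType is a variable-length big-endian integer; UTF8Type is UTF-8.
raw = pack_composite([(42).to_bytes(1, 'big'), b'fieldvalue'])
num_bytes, text = unpack_composite(raw)
print(int.from_bytes(num_bytes, 'big'), text.decode('utf-8'))
```

If I remember right, later pycassa releases fixed xget to return composite names as tuples, so upgrading the client is probably the cleaner fix.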
Re: Running Cassandra with no open TCP ports
Mark: That raises a question -- why are you using Cassandra for this? There are simpler NoSQL stores than Cassandra that are better suited for embedding. Oleg On 2013-05-28 02:24:48 +, Mark Mccraw said: Hi All, I'm using Cassandra as an embedded datastore for a small service that doesn't need (or want) to act as a database service in any way. Moreover, we may want to start up multiple instances of the application, and right now whenever that happens, we get port conflicts on 7000 because Cassandra is listening for connections. I couldn't find an obvious way to disable listening on any port. Is there an easy way? Thanks! Mark -- Regards, Oleg Dulin http://www.olegdulin.com
Re: Iterating through large numbers of rows with JDBC
On 2013-05-11 14:42:32 +, Robert Wille said: I'm using the JDBC driver to access Cassandra. I'm wondering if its possible to iterate through a large number of records (e.g. to perform maintenance on a large column family). I tried calling Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that cursors aren't supported. Is there another way to do this, or do I need to use a different API? Thanks in advance Robert If you feel that you need to iterate through a large number of rows then you are probably not using a correct data model. Can you describe your use case ? -- Regards, Oleg Dulin NYC Java Big Data Engineer http://www.olegdulin.com/
Cassandra 1.1.11 compression: how to tell if it works?
I have a column family with really wide rows set to use Snappy like this: compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'} My understanding is that if a file is compressed, I should not be able to use the strings command to view its contents. But it seems like I can view the contents like this: strings *-Data.db At what point does compression start? How can I confirm it is working? -- Regards, Oleg Dulin NYC Java Big Data Engineer http://www.olegdulin.com/
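One likely explanation, hedged: compression only applies to sstables written after it was enabled, and older -Data.db files stay plain until rewritten by compaction or nodetool scrub/upgradesstables; those older files would be what strings is finding. A compressed sstable also has a companion *-CompressionInfo.db file you can look for on disk. As a stand-in demonstration of chunked compression (using zlib, since Snappy isn't in the Python stdlib):

```python
import zlib

# Cassandra compresses sstable data in fixed-size chunks (64 KB by
# default); zlib stands in for Snappy to show the effect on one chunk.
chunk = (b'some repetitive wide-row column data ' * 2000)[:65536]
compressed = zlib.compress(chunk)
print(len(chunk), '->', len(compressed), 'bytes')
```

Comparing the on-disk -Data.db size before and after a forced rewrite of the column family is the most direct confirmation that compression has kicked in.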
How much heap does Cassandra 1.1.11 really need ?
Here is my question. A 16 GB heap can't possibly be a good setup, but it is the best I can do. The default never worked well for me, and 8 GB doesn't work well either: it can't keep up with flushing memtables. It is possible that someone at some point broke something in the config files. If I were to look for hints there, what should I look at ?

Look at my gc log from Cassandra. It starts off like this:

2013-04-29T08:53:44.548-0400: 5.386: [GC 1677824K->11345K(16567552K), 0.0509880 secs]
2013-04-29T08:53:47.701-0400: 8.539: [GC 1689169K->42027K(16567552K), 0.1269180 secs]
2013-04-29T08:54:05.361-0400: 26.199: [GC 1719851K->231763K(16567552K), 0.1436070 secs]
2013-04-29T08:55:44.797-0400: 125.635: [GC 1909587K->1480096K(16567552K), 1.2626270 secs]
2013-04-29T08:58:44.367-0400: 305.205: [GC 3157920K->2358588K(16567552K), 1.1198150 secs]
2013-04-29T09:01:12.167-0400: 453.005: [GC 4036412K->3634298K(16567552K), 1.0098650 secs]
2013-04-29T09:03:35.204-0400: 596.042: [GC 5312122K->4339703K(16567552K), 0.4597180 secs]
2013-04-29T09:04:51.562-0400: 672.400: [GC 6017527K->4956381K(16567552K), 0.5361800 secs]
2013-04-29T09:04:59.205-0400: 680.043: [GC 6634205K->5131825K(16567552K), 0.1741690 secs]
2013-04-29T09:05:06.638-0400: 687.476: [GC 6809649K->5027933K(16567552K), 0.0607470 secs]
2013-04-29T09:05:13.908-0400: 694.747: [GC 6705757K->5012439K(16567552K), 0.0624410 secs]
2013-04-29T09:05:20.909-0400: 701.747: [GC 6690263K->5039538K(16567552K), 0.0618750 secs]
2013-04-29T09:06:35.914-0400: 776.752: [GC 6717362K->5819204K(16567552K), 0.5738550 secs]
2013-04-29T09:08:05.589-0400: 866.428: [GC 7497028K->6678597K(16567552K), 0.6781900 secs]
2013-04-29T09:08:12.458-0400: 873.296: [GC 8356421K->6865736K(16567552K), 0.1423040 secs]
2013-04-29T09:08:18.690-0400: 879.529: [GC 8543560K->6742902K(16567552K), 0.0516470 secs]
2013-04-29T09:08:24.914-0400: 885.752: [GC 8420726K->6725877K(16567552K), 0.0517290 secs]
2013-04-29T09:08:31.008-0400: 891.846: [GC 8403701K->6741781K(16567552K), 0.0532540 secs]
2013-04-29T09:08:37.201-0400: 898.039: [GC 8419605K->6759614K(16567552K), 0.0563290 secs]
2013-04-29T09:08:43.493-0400: 904.331: [GC 8437438K->6772147K(16567552K), 0.0569580 secs]
2013-04-29T09:08:49.757-0400: 910.595: [GC 8449971K->6776883K(16567552K), 0.0558070 secs]
2013-04-29T09:08:55.973-0400: 916.812: [GC 8454707K->6789404K(16567552K), 0.0577230 secs]

...and look what it is today:

2013-05-03T07:17:13.519-0400: 339814.357: [GC 9178946K->9176740K(16567552K), 0.0265830 secs]
2013-05-03T07:17:19.556-0400: 339820.394: [GC 10854564K->9178449K(16567552K), 0.0253180 secs]
2013-05-03T07:17:24.390-0400: 339825.228: [GC 10856273K->9179073K(16567552K), 0.0266450 secs]
2013-05-03T07:17:30.729-0400: 339831.567: [GC 10856897K->9178629K(16567552K), 0.0261150 secs]
2013-05-03T07:17:35.584-0400: 339836.422: [GC 10856453K->9178586K(16567552K), 0.0250870 secs]
2013-05-03T07:17:38.514-0400: 339839.352: [GC 10856410K->9179314K(16567552K), 0.0258120 secs]
2013-05-03T07:17:43.200-0400: 339844.038: [GC 10857138K->9180160K(16567552K), 0.0250150 secs]
2013-05-03T07:17:46.566-0400: 339847.404: [GC 10857984K->9179071K(16567552K), 0.0264420 secs]
2013-05-03T07:17:52.913-0400: 339853.751: [GC 10856895K->9179870K(16567552K), 0.0262430 secs]
2013-05-03T07:17:58.303-0400: 339859.141: [GC 10857694K->9179209K(16567552K), 0.0255130 secs]
2013-05-03T07:18:03.427-0400: 339864.265: [GC 10857033K->9178316K(16567552K), 0.0263140 secs]
2013-05-03T07:18:11.657-0400: 339872.495: [GC 10856140K->9178351K(16567552K), 0.0265340 secs]
2013-05-03T07:18:17.429-0400: 339878.267: [GC 10856175K->9179067K(16567552K), 0.0254820 secs]
2013-05-03T07:18:21.251-0400: 339882.089: [GC 10856891K->9179680K(16567552K), 0.0264210 secs]
2013-05-03T07:18:25.062-0400: 339885.900: [GC 10857504K->9178985K(16567552K), 0.0267200 secs]

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
Re: How much heap does Cassandra 1.1.11 really need ?
What constitutes an extreme write ?

On 2013-05-03 15:45:33, Edward Capriolo said:

If your writes are so extreme that memtables are flushing all the time, the best you can do is turn off all caches, move bloom filters off heap, and then instruct Cassandra to use large portions of the heap as memtables.

On Fri, May 3, 2013 at 11:40 AM, Bryan Talbot btal...@aeriagames.com wrote:

It's true that a 16GB heap is generally not a good idea; however, it's not clear from the data provided what problem you're trying to solve. What is it that you don't like about the default settings?

-Bryan

On Fri, May 3, 2013 at 4:27 AM, Oleg Dulin oleg.du...@gmail.com wrote:

[original question and GC log snipped -- quoted in full in the previous message]

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
Cass 1.1.1 and 1.1.11 Exception during compactions
We saw this exception with 1.1.1 and also with 1.1.11 (we upgraded for unrelated reasons, to fix the FD leak during slice queries) -- name of the CF replaced with * for confidentiality:

ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(138024912283272996716128964353306009224, 61386330356130622d61362d376330612d666531662d373738616630636265396535) >= current key DecoratedKey(127065377405949402743383718901402082101, 64323962636163652d646561372d333039322d386166322d663064346132363963386131) writing into *-tmp-hf-7372-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:160)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Any thoughts ? Should I be concerned about data being lost ?

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
Re: Replication factor and performance questions
Should all be under 400 Gigs on each. My question is -- is there additional overhead with replicas making requests to one another for keys they don't have ? How much of an overhead is that ?

On 2012-11-05 17:00:37, Michael Kjellman said:

Rule of thumb is to try to keep nodes under 400GB. Compactions/repairs/move operations etc. become a nightmare otherwise. How much data do you expect to have on each node? Also depends on caches, bloom filters, etc.

On 11/5/12 8:57 AM, Oleg Dulin oleg.du...@gmail.com wrote:

I have 4 nodes at my disposal. I can configure them like this:

1) RF=1, each node has 25% of the data. On random reads, how big is the performance penalty if a node needs to look for data on another replica ?

2) RF=2, each node has 50% of the data. Same question ?

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/

'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. Visit http://barracudanetworks.com/facebook

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
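Back-of-the-envelope arithmetic for the two layouts in the question (illustrative only; it assumes the random partitioner spreads keys evenly and that a client connects to a random node):

```python
def layout(nodes, rf, total_data_gb):
    # Fraction of the data each node stores, and the chance that the node
    # a client happens to connect to owns a replica of the requested key.
    # When it does not, the coordinator must forward the read to a replica,
    # which is the "additional overhead" asked about above.
    per_node_gb = total_data_gb * rf / nodes
    p_local_read = rf / nodes
    return per_node_gb, p_local_read

# Hypothetical 1 TB of raw data across the 4 nodes from the question:
print(layout(4, 1, 1000))  # RF=1: 250 GB per node, 25% of reads served locally
print(layout(4, 2, 1000))  # RF=2: 500 GB per node, 50% of reads served locally
```

So RF=2 halves the number of forwarded reads at the cost of doubling the data per node; the per-request penalty for a forwarded read is one extra network hop inside the cluster.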
Re: Text searches and free form queries
It works pretty fast.

Cool. Just keep an eye out for how big the Lucene token row gets. Cheers

Indeed, it may get out of hand, but for now we are OK -- for the foreseeable future, I would say. Should it get larger, I can split it up into rows -- i.e. all tokens that start with a, all tokens that start with b, etc.
1.1.1 is repair still needed ?
My understanding is that the repair has to happen within the gc_grace period. But in 1.1.1 you can set gc_grace per CF. A couple of my CFs that are frequently updated have a gc_grace of 1 hour, but we do run a weekly repair. So the question is, is this still needed ? Do we even need to run nodetool repair ? If gc_grace is 10 days on all other CFs, are we saying that as long as we restart that node within the 10-day period we don't need to run nodetool repair ? The reason I bring this up is because the repair once in a while runs for more than a day on some of these nodes (500+ Gigs of data), and it is causing slowness with read requests.

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
Re: Text searches and free form queries
So, what I ended up doing is this -- as I write my records into the main CF, I tokenize some fields that I want to search on using Lucene and write an index into a separate CF, such that my columns are a composite of: luceneToken:record key. I can then search my records by doing a slice for each Lucene token in the search query and then do an intersection of the sets. It works pretty fast.

Regards, Oleg

On 2012-09-05 01:28:44, aaron morton said:

AFAIK if you want to keep it inside Cassandra then DSE, roll your own from scratch, or start with https://github.com/tjake/Solandra . Outside of Cassandra I've heard of people using Elastic Search or Solr which I *think* is now faster at updating the index. Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/09/2012, at 3:00 AM, Andrey V. Panov panov.a...@gmail.com wrote:

Some one did search on Lucene, but for very fresh data they build search index in memory so data become available for search without delays.

On 3 September 2012 22:25, Oleg Dulin oleg.du...@gmail.com wrote: Dear Distinguished Colleagues:

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
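For illustration, the slice-per-token / intersect scheme described above can be sketched with an in-memory stand-in for the index CF (the dict below plays the role of the token rows; the function names and the crude analyzer are illustrative, not the Pelops or Lucene API):

```python
import re

# Stand-in for the index CF: token -> set of record keys. In Cassandra this
# would be one row per token, with composite columns luceneToken:recordKey
# written alongside the record itself.
index = {}

def tokenize(text):
    # Crude analyzer standing in for Lucene's: lowercase word split.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def index_record(key, text):
    for token in tokenize(text):
        index.setdefault(token, set()).add(key)

def search(query):
    # One slice per query token, then intersect the resulting key sets,
    # mirroring the "slice for each token, intersect" approach in the post.
    key_sets = [index.get(token, set()) for token in tokenize(query)]
    if not key_sets:
        return set()
    return set.intersection(*key_sets)

index_record("inv-1", "Red widget for ACME")
index_record("inv-2", "Blue widget for ACME")
print(search("acme widget"))  # both records match
print(search("red widget"))   # only inv-1
```

The intersection step is the expensive part when a token row is hot; that is why the reply above warns about the token row growing large.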
Re: Cassandra 1.1.1 on Java 7
So, my experiment didn't quite work out. I was hoping to use the G1 collector to minimize pauses -- the pauses didn't really go away, but what's worse is I think the memtable memory calculations are driven by CMS, so my memtables would fill up and cause Cassandra to run out of heap :(

On 2012-09-09 19:04:41, Jeremy Hanna said:

Starting with 1.6.0_34, you'll need xss set to 180k. It's updated with the forthcoming 1.1.5 as well as the next minor rev of 1.0.x (1.0.12). https://issues.apache.org/jira/browse/CASSANDRA-4631 See also the comments on https://issues.apache.org/jira/browse/CASSANDRA-4602 for the reference to what required a higher stack.

On Sep 9, 2012, at 12:47 PM, Christopher Keller cnkel...@gmail.com wrote:

This is necessary under the later versions of 1.6v35 as well. Nodetool will show the cluster as being down even though individual nodes will be up.

--Chris

On Sep 9, 2012, at 7:13 AM, dong.yajun dongt...@gmail.com wrote:

After running for a while, you should set -Xss to more than 160k when you are using JDK 1.7.

On Sun, Sep 9, 2012 at 3:39 AM, Peter Schuller peter.schul...@infidyne.com wrote:

Has anyone tried running 1.1.1 on Java 7? Have been running JDK 1.7 on several clusters on 1.1 for a while now.

-- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- Ric Dong, Newegg Ecommerce, MIS department

-- The downside of being better than everyone else is that people tend to assume you're pretentious.

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
Re: Long-life TTL and extending TTL
You should create an index where you store references to your records. You can use composite column names where column name = composite(timestamp, key). Then you would get a slice of all columns where the timestamp part of the composite is <= TTL in the past, iterate through them, and delete the items.

Regards, Oleg

On 2012-09-10 09:47:31, Robin Verlangen said:

Hi there, I'm working on a project that might want to set TTL to roughly 7 years. However it might occur that the TTL should be reduced or extended. Is there any way of updating the TTL without being in need of rewriting the data back again? This would cause way too much overhead for this. If not, is running a Map/Reduce task on the whole data set the best option, or should I think of a different approach for this challenge? My last question is regarding a long-term TTL: does this have any negative impact on the cluster? Maybe during compaction, repair, reading/writing?

Best regards,
Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
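A minimal sketch of the expiry-index idea above, with a sorted list standing in for the timestamp-ordered composite-column row (all names are illustrative; in Cassandra the slice and the deletes would go through your client library):

```python
import bisect

# Stand-in for the index row: (timestamp, record_key) pairs kept sorted,
# playing the role of composite columns ordered by the timestamp component.
expiry_index = []

def record_written(key, ts):
    bisect.insort(expiry_index, (ts, key))

def purge_older_than(cutoff_ts):
    # Equivalent to slicing the row from its start up to composite(cutoff_ts, *),
    # then deleting each hit. chr(0x10FFFF) is a max-valued key sentinel so the
    # slice is inclusive of every key at exactly cutoff_ts.
    i = bisect.bisect_right(expiry_index, (cutoff_ts, chr(0x10FFFF)))
    expired = [key for _, key in expiry_index[:i]]
    del expiry_index[:i]
    return expired  # caller deletes these records and their index columns

record_written("a", 100)
record_written("b", 200)
record_written("c", 300)
print(purge_older_than(200))  # ['a', 'b']
```

Extending or reducing the "TTL" then costs one index-column delete and one insert per record, instead of rewriting the record itself.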
Re: High commit size
It is memory-mapped I/O. I wouldn't worry about it. BTW, Windows might not be the best choice to run Cassandra on. My experience running Cassandra on Windows has not been a positive one. We no longer support Windows as our production platform.

Regards, Oleg

On 2012-09-10 09:00:02, Rene Kochen said:

Hi all, On my test cluster I have three Windows Server 2008 R2 machines running Cassandra 1.0.11. If I use memory-mapped IO (the default), then the nodes freeze after a while. Paging is disabled. The private bytes are OK (8GB); that is the amount I use in the -Xms and -Xmx arguments. The virtual size is big, as expected because of the memory-mapped IO. However, the working set size (size in RAM) is 24 GB (my total RAM usage). If I look with Process Explorer at the physical memory section, I see a very high value in the WS Sharable section. Anyone has a clue what is going on here? Many thanks!

Rene

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
JVM 7, Cass 1.1.1 and G1 garbage collector
I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7. It is my feeble attempt to reduce Full GC pauses. Has anyone had any experience with this ? Anyone tried it ? -- Regards, Oleg Dulin NYC Java Big Data Engineer http://www.olegdulin.com/
Cassandra 1.1.1 on Java 7
Has anyone tried running 1.1.1 on Java 7? I know Datastax does not recommend it for DSE, is there a reason why ? Regards, Oleg
Text searches and free form queries
Dear Distinguished Colleagues:

I need to add full-text search and somewhat free-form queries to my application. Our data is made up of items that are stored in a single column family, and we have a bunch of secondary indices for lookups. An item has header fields and data fields, and the structure of the items CF is a super column family with the row key being the item's natural ID, a super column for the header, and a super column for the data. Our application is made up of several redundant/load-balanced servers all pointing at a Cassandra cluster. Our servers run embedded Jetty.

I need to be able to find items by a combination of field values. Currently I have an index for items by field value which works reasonably well. I could also add support for data types and index items by fields of appropriate types, so we can do range queries on items. Ultimately, though, what we want is full-text search with suggestions and human-language sensitivity. We want to search by date ranges, by field values, etc. I did some homework on this topic, and here is what I see as options:

1) Use an SQL database as a helper. This is rather clunky, and I am not sure what it gets us, since just about anything that can be done in SQL can be done in Cassandra with proper structures. Then the problem here also is: where am I going to get an open source database that can handle the workload ? Probably nowhere, nor do I get natural language support.

2) Each of our servers can index data using Lucene, but again we have to come up with a clunky mechanism where either one of the servers does the indexing and results are replicated, or each server does its own indexing.

3) We can use Solr as is; perhaps with some small modifications it can run within our server JVM -- since we already run embedded Jetty. I like this idea, actually, but I know that Solr indexing doesn't take advantage of Cassandra.
4) Datastax Enterprise with search, presumably, supports Solr indexing of existing column families -- but for the life of me I couldn't figure out how exactly it does that. The Wikipedia example shows that Solr can create column families based on Solr schemas that I can then query using Cassandra itself (which is great), and supposedly I can modify those column families directly and Solr will reindex them (which is even better), but I am not sure how that fits into our server design. The other concern is locking into a commercial product, something I am very much worried about.

So, one possibility I can see is using Solr embedded within our own server solution but storing its indexes in the file system outside of Cassandra. This is not optimal, and maybe over time I can add my own support for storing the Solr index in Cassandra w/o relying on the Datastax solution.

In any case, what are your thoughts and experiences ?

Regards, Oleg
Deleting a row from a counter CF
I get this:

InvalidRequestException(why: invalid operation for commutative columnfamily)

Any thoughts ? We use Pelops...
Data aggregation -- help me design a solution
Here are my requirements. We use Cassandra. I get millions of invoice line items into the system. As I load them I need to build up some data structures:

* Invoice line items by invoice id (each line item has an invoice id on it), with total dollar value
* Invoice line items by customer id, with total dollar value
* Invoice line items by territory, with total dollar value

In all of those cases, what we want is to see the total by a given attribute, that's all there is to it. Line items may change daily, i.e. a territory may change or they may correct the values. In this case I need to update the aggregations accordingly.

Here are my ideas:

- I can use counters and store the data in buckets
- I can just store the data in buckets and do the math in Java

In both cases the challenge is that the items can be updated. Which means I need to look up a current version of an item and decide how to proceed. That puts a huge performance penalty on the application (# of line items we receive is in the millions and we need to process them in a timely fashion).

Help me out here -- any ideas on how I could design this in Cassandra ?

Regards, Oleg
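One way to sketch the "look up the current version, apply the delta" approach mentioned above, in plain Python with dicts standing in for the counter CFs and the item lookup (all names are illustrative, not a Cassandra API):

```python
from collections import defaultdict

# Stand-ins for counter CFs keyed by attribute value.
totals_by_customer = defaultdict(float)
totals_by_territory = defaultdict(float)
# Last-seen version of each line item -- the lookup the post worries
# about, needed to turn an update into a pair of counter deltas.
current = {}

def apply_line_item(item_id, customer, territory, amount):
    prev = current.get(item_id)
    if prev is not None:
        # Back out the old values; this handles both territory moves
        # and corrected dollar amounts.
        totals_by_customer[prev["customer"]] -= prev["amount"]
        totals_by_territory[prev["territory"]] -= prev["amount"]
    totals_by_customer[customer] += amount
    totals_by_territory[territory] += amount
    current[item_id] = {"customer": customer, "territory": territory,
                        "amount": amount}

apply_line_item("li-1", "cust-A", "east", 100.0)
apply_line_item("li-2", "cust-A", "west", 50.0)
apply_line_item("li-1", "cust-A", "west", 80.0)  # correction + territory move
print(totals_by_customer["cust-A"])  # 130.0
print(totals_by_territory["west"])   # 130.0
```

The point of the delta formulation is that the expensive aggregate rows are only ever incremented, which maps onto counter columns; the per-item read remains, but it is a single-row lookup rather than a rescan of any bucket.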
Wide rows and reads
Here is my flow: One process write a really wide row (250K+ supercolumns, each one with 5 subcolumns, for the total of 1K or so per supercolumn) Second process comes in literally 2-3 seconds later and starts reading from it. My observation is that nothing good happens. It is ridiculously slow to read. It seems that if I wait long enough, the reads from that row will be much faster. Could someone enlighten me as to what exactly happens when I do this ? Regards, Oleg
Supercolumn behavior on writes
Does a write to a sub column involve deserialization of the entire super column ? Thanks, Oleg
Disappearing keyspaces in Cassandra 1.1
I am using Cassandra 1.1.0 in a 3-node environment. I just truncated a few column families and then restarted the nodes. Now it says my keyspace doesn't exist. The data for the keyspace is still in the data directory. Does anyone know what could have caused this?
Data corruption issues with 1.1
I can't quite describe what happened, but essentially one day I found that my column values that are supposed to be UTF-8 strings started getting bogus characters. Is there a known data corruption issue with 1.1 ?
nodetool repair -- should I schedule a weekly one ?
We have a 3-node cluster. We use RF of 3 and CL of ONE for both reads and writes…. Is there a reason I should schedule a regular nodetool repair job ? Thanks, Oleg
TimedOutException()
We are using Cassandra 1.1.0 with an older Pelops version, but I don't think that in itself is a problem here. I am getting this exception:

TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:7660)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:570)
    at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:542)
    at org.scale7.cassandra.pelops.Selector$3.execute(Selector.java:683)
    at org.scale7.cassandra.pelops.Selector$3.execute(Selector.java:680)
    at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:82)

Is my understanding correct that this is where Cassandra is telling us it can't accomplish something within that timeout value -- as opposed to a network timeout ? Where is it set ?

Thanks, Oleg
Re: TimedOutException()
Tyler Hobbs ty...@datastax.com wrote:

On Fri, Jun 1, 2012 at 9:39 AM, Oleg Dulin oleg.du...@gmail.com wrote: Is my understanding correct that this is where cassandra is telling us it can't accomplish something within that timeout value -- as opposed to network timeout ? Where is it set ?

That's correct. Basically, the coordinator sees that a replica has not responded (or can not respond) before hitting a timeout. This is controlled by rpc_timeout_in_ms in cassandra.yaml.

-- Tyler Hobbs, DataStax, http://datastax.com/

So if we are using the random partitioner and a read consistency of ONE, what does that mean ? We have a 3-node cluster, use write/read consistency of ONE, and a replication factor of 3. Does the node we are connecting to try to proxy requests ? Wouldn't our configuration ensure all nodes have replicas ?
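For reference, the setting Tyler mentions lives in cassandra.yaml; in the 1.0/1.1 era the stock value looked like this (shown only as an illustration -- check your own yaml for the value actually in effect):

```yaml
# cassandra.yaml -- how long the coordinator waits for replica responses
# before answering the client with a TimedOutException
rpc_timeout_in_ms: 10000
```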
Renaming a keyspace in 1.1
Is it possible ? How ?
Data aggregation - averages, sums, etc.
Dear distinguished colleagues:

I am trying to come up with a data model that lets me do aggregations, such as sums and averages. Here are my requirements:

1. Data may be updated concurrently.
2. I want to avoid changing the schema; we have a multi-tenant cloud solution that is driven by configuration. The schema is the same for all customers.

Here is what I have at my disposal:

1. We have a proprietary distributed in-memory column store that acts as a buffer between the server and Cassandra. Frequent reads are not a problem.
2. I know I have counter columns. I can do sums. But can I do averages ?

One of the ideas is to record data as it comes in, organized by time, and periodically aggregate it. Thoughts ?
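Counters alone can't store an average, but one common workaround (not from this thread -- a standard pattern, sketched here under that assumption) is to keep two counters per bucket, a running sum and a running count, and divide at read time. Plain dicts stand in for the counter CFs; with real Cassandra counters you would increment integer columns, e.g. dollar amounts in cents:

```python
from collections import defaultdict

# Two counter columns per bucket: a running sum and a running count.
sums = defaultdict(int)
counts = defaultdict(int)

def record(bucket, value):
    # Two counter increments. Increments commute, so this stays correct
    # under concurrent writers -- which is requirement 1 above.
    sums[bucket] += value
    counts[bucket] += 1

def average(bucket):
    # The average is derived at read time, never stored.
    return sums[bucket] / counts[bucket] if counts[bucket] else 0.0

record("2012-05", 10)
record("2012-05", 20)
record("2012-05", 30)
print(average("2012-05"))  # 20.0
```

Note the caveat with real counters: the two increments are not atomic together, so a reader can briefly see a sum/count pair that is off by one sample.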
Re: how can we get (a lot) more performance from cassandra
Please do keep us posted. We have a somewhat similar Cassandra utilization pattern, and I would like to know what your solution is...

On 2012-05-16 20:38:37, Yiming Sun said:

Thanks Oleg. Another caveat from our side is, we have a very large data space (imagine picking 100 items out of 3 million; the chance of having 2 items from the same bin is pretty low). We will experiment with row cache, and hopefully it will help, not the opposite (the tuning guide says row cache could be detrimental in some circumstances). -- Y.

On Wed, May 16, 2012 at 4:25 PM, Oleg Dulin oleg.du...@gmail.com wrote:

Indeed. This is how we are trying to solve this problem. Our application has a built-in cache that resembles a supercolumn or standard-column data structure and has an API that resembles a combination of Pelops selector and mutator. You can do something like that for Hector. The cache is constrained and uses LRU to purge unused items and keep memory usage steady. It is not perfect and we still have bugs, but it cuts down on 90% of Cassandra reads.

On 2012-05-16 20:07:11, Mike Peters said:

Hi Yiming, Cassandra is optimized for write-heavy environments. If you have a read-heavy application, you shouldn't be running your reads through Cassandra. On the bright side -- Cassandra read throughput will remain consistent, regardless of your volume. But you are going to have to wrap your reads with memcache (or redis), so that the bulk of your reads can be served from memory.

Thanks, Mike Peters

On 5/16/2012 3:59 PM, Yiming Sun wrote:

Hello, I asked the question as a follow-up under a different thread, so I figure I should ask here instead in case the other one gets buried, and besides, I have a little more information. We find the lack of performance disturbing, as we are only able to get about 3-4MB/sec read performance out of Cassandra. We are using Cassandra as the backend for an IR repository of digital texts. It is a read-mostly repository with occasional writes.
Each row represents a book volume, and each column of a row represents a page of the volume. Granted, the data size is small -- the average size of a column text is 2-3KB, and each row has about 250 columns (varies quite a bit from one volume to another). Currently we are running a 3-node cluster, and will soon be upgraded to a 6-node setup. Each node is a VM with 4 cores and 16GB of memory. All VMs use SAN as disk storage.

To retrieve a volume, a slice query is used via Hector that specifies the row key (the volume) and a list of column keys (pages), and the consistency level is set to ONE. It is typical to retrieve multiple volumes per request. The read rate that I have been seeing is about 3-4 MB/sec, and that is reading the raw bytes... using the string serializer the rate is even lower, about 2.2MB/sec.

The server log shows the GC ParNew frequently gets longer than 200ms, often in the range of 4-5 seconds. But nowhere near 15 seconds (which is an indication that the JVM heap is being swapped out). Currently we have not added JNA. From a blog post, it seems JNA is able to increase the performance by 13%, and we are hoping to increase the performance by something more like 1300% (3-4 MB/sec is just disturbingly low). And we are hesitant to disable swap entirely since one of the nodes is running a couple of other services.

Do you have any suggestions on how we may boost the performance? Thanks! -- Y.
Configuring cassandra cluster with host preferences
I am running my processes on the same nodes as Cassandra. What I'd like is for Pelops, when it hands out a connection, to give preference to the Cassandra node local to the host my process is on. Is it possible ? How ?

Regards,
Oleg Dulin

Please note my new office #: 732-917-0159