Re: Prevent queries from OOM nodes

2012-10-01 Thread Віталій Тимчишин
It's not about columns, it's about rows; see the example statement.
In QueryProcessor#processStatement it reads the rows into a list, then does
list.size().
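
For illustration, a minimal self-contained Java sketch of the two patterns being discussed; the class and method names are made up, and this is not the actual QueryProcessor code:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CountSketch {
    // Problematic pattern: every row is kept on the heap just to call size() at the end.
    static long countByMaterializing(Iterator<?> rows) {
        List<Object> all = new ArrayList<Object>();
        while (rows.hasNext()) {
            all.add(rows.next());
        }
        return all.size();
    }

    // What a paginated count does instead: rows are consumed and discarded as they arrive.
    static long countStreaming(Iterator<?> rows) {
        long count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }
}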

2012/10/1 aaron morton aa...@thelastpickle.com

 CQL will read everything into a List and only then take the count.


 From 1.0 onwards, count paginates reading the columns. What version are you
 on?

 https://issues.apache.org/jira/browse/CASSANDRA-2894

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 26/09/2012, at 8:26 PM, Віталій Тимчишин tiv...@gmail.com wrote:

 Actually an easy way to bring Cassandra down is
 select count(*) from A limit 1000
 CQL will read everything into a List and only then take the count.

 2012/9/26 aaron morton aa...@thelastpickle.com

 Can you provide some information on the queries and the size of the data
 they traversed ?

 The default maximum size for a single thrift message is 16MB, was it
 larger than that ?
 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L375

 Cheers


 On 25/09/2012, at 8:33 AM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:

 Is there anything I can do on the configuration side to prevent nodes
 from going OOM due to queries that will read large amounts of data and
 exceed the heap available?

 For the past few days we have had some nodes consistently freezing/crashing
 with OOM.  We got a heap dump into MAT and figured out the nodes were dying
 due to some queries for a few extremely large data sets.  Tracked it back
 to an app that just didn't prevent users from doing these large queries,
 but it seems like Cassandra could be smart enough to guard against this
 type of thing?

 Basically some kind of setting like "if the data to satisfy the query >
 available heap, then throw an error to the caller and abort the query".  I
 would much rather return errors to clients than crash a node, as the error
 is easier to track down and resolve that way.

 Thanks.





 --
 Best regards,
  Vitalii Tymchyshyn





-- 
Best regards,
 Vitalii Tymchyshyn


Re: downgrade from 1.1.4 to 1.0.X

2012-09-27 Thread Віталій Тимчишин
I suppose the way to do it is to convert all SSTables to JSON, then install the
previous version, convert back and load.

2012/9/24 Arend-Jan Wijtzes ajwyt...@wise-guys.nl

 On Thu, Sep 20, 2012 at 10:13:49AM +1200, aaron morton wrote:
  No.
  They use different minor file versions which are not backwards
 compatible.

 Thanks Aaron.

 Is upgradesstables capable of downgrading the files to 1.0.8?
 Looking for a way to make this work.

 Regards,
 Arend-Jan


  On 18/09/2012, at 11:18 PM, Arend-Jan Wijtzes ajwyt...@wise-guys.nl
 wrote:
 
   Hi,
  
   We are running Cassandra 1.1.4 and like to experiment with
   Datastax Enterprise which uses 1.0.8. Can we safely downgrade
   a production cluster or is it incompatible? Any special steps
   involved?

 --
 Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl




-- 
Best regards,
 Vitalii Tymchyshyn


Re: Prevent queries from OOM nodes

2012-09-26 Thread Віталій Тимчишин
Actually an easy way to bring Cassandra down is
select count(*) from A limit 1000
CQL will read everything into a List and only then take the count.

2012/9/26 aaron morton aa...@thelastpickle.com

 Can you provide some information on the queries and the size of the data
 they traversed ?

 The default maximum size for a single thrift message is 16MB, was it
 larger than that ?
 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L375

 Cheers


 On 25/09/2012, at 8:33 AM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:

 Is there anything I can do on the configuration side to prevent nodes from
 going OOM due to queries that will read large amounts of data and exceed
 the heap available?

 For the past few days we have had some nodes consistently freezing/crashing
 with OOM.  We got a heap dump into MAT and figured out the nodes were dying
 due to some queries for a few extremely large data sets.  Tracked it back
 to an app that just didn’t prevent users from doing these large queries,
 but it seems like Cassandra could be smart enough to guard against this
 type of thing?

 Basically some kind of setting like “if the data to satisfy the query >
 available heap, then throw an error to the caller and abort the query”.  I
 would much rather return errors to clients than crash a node, as the error
 is easier to track down and resolve that way.

 Thanks.





-- 
Best regards,
 Vitalii Tymchyshyn


Re: any ways to have compaction use less disk space?

2012-09-25 Thread Віталій Тимчишин
See my comments inline

2012/9/25 Aaron Turner synfina...@gmail.com

 On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com
 wrote:
  Why so?
  What are the pluses and minuses?
  As for me, I am looking at the number of files in a directory.
  700GB/512MB*5 (files per SSTable) = 7,000 files, which is OK from my view.
  700GB/5MB*5 = ~700,000 files, which is too much for a single directory, too
  much memory used for SSTable data, and too huge a compaction queue (which
  leads to strange pauses, I suppose because of the compactor thinking about
  what to compact next), ...


 Not sure why a lot of files is a problem... modern filesystems deal
 with that pretty well.


Maybe. Maybe it's not the filesystem, but Cassandra. I've seen slowdowns of
compaction when the compaction queue is too large, and it can be too large
if you have a lot of SSTables. Note that each SSTable is both FS metadata
(and the FS metadata cache can be limited) and Cassandra in-memory data.
Anyway, as for me, a performance test would be great in this area; otherwise
it's all speculation.



 Really large sstables mean that compactions now are taking a lot more
 disk IO and time to complete.


As for me, this point is valid only when your flushes are small. Otherwise
you still need to compact the whole key range the flush covers, no matter
whether this is one large file or multiple small ones. One large file can even
be cheaper to compact.


 Remember, Leveled Compaction is more
 disk IO intensive, so using large sstables makes that even worse.
 This is a big reason why the default is 5MB. Also, each level is 10x
 the size of the previous level.  Also, for level compaction, you need
 10x the sstable size worth of free space to do compactions.  So now
 you need 5GB of free disk, vs 50MB of free disk.


I really don't think 5GB of free space is too much :)



 Also, if you're doing deletes in those CF's, that old, deleted data is
 going to stick around a LOT longer with 512MB files, because it can't
 get deleted until you have 10x512MB files to compact to level 2.
 Heaven forbid it doesn't get deleted then because each level is 10x
 bigger so you end up waiting a LOT longer to actually delete that data
 from disk.


But if I have small SSTables, all my data goes to high levels (the 4th for me
when I had the 128MB setting). And it also takes time for updates to reach
this level. I am not sure which way is faster.



 Now, if you're using SSDs then larger sstables are probably doable,
 but even then I'd guesstimate 50MB is far more reasonable than 512MB.


I don't think SSDs are great for writes/compaction. Cassandra does these in a
streaming fashion, and regular HDDs are faster than SSDs for linear
read/write. SSDs are good for random access, which for Cassandra means reads.

P.S. I still think my way is better, yet it would be great to perform some
real tests.


 -Aaron


  2012/9/23 Aaron Turner synfina...@gmail.com
 
  On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com
  wrote:
   If you think about space, use Leveled compaction! This won't only
 allow
   you
   to fill more space, but also will shrink you data much faster in case
 of
   updates. Size compaction can give you 3x-4x more space used than there
   are
   live data. Consider the following (our simplified) scenario:
   1) The data is updated weekly
   2) Each week a large SSTable is written (say, 300GB) after full update
   processing.
   3) In 3 weeks you will have 1.2TB of data in 3 large SSTables.
   4) Only after 4th week they all will be compacted into one 300GB
   SSTable.
  
   Leveled compaction've tamed space for us. Note that you should set
   sstable_size_in_mb to reasonably high value (it is 512 for us with
   ~700GB
   per node) to prevent creating a lot of small files.
 
  512MB per sstable?  Wow, that's freaking huge.  From my conversations
  with various developers 5-10MB seems far more reasonable.   I guess it
  really depends on your usage patterns, but that seems excessive to me-
  especially as sstables are promoted.
 
 
  --
  Best regards,
   Vitalii Tymchyshyn



 --
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
 -- Benjamin Franklin
 carpe diem quam minimum credula postero




-- 
Best regards,
 Vitalii Tymchyshyn


Re: any ways to have compaction use less disk space?

2012-09-24 Thread Віталій Тимчишин
Why so?
What are the pluses and minuses?
As for me, I am looking at the number of files in a directory.
700GB/512MB*5 (files per SSTable) = 7,000 files, which is OK from my view.
700GB/5MB*5 = ~700,000 files, which is too much for a single directory, too much
memory used for SSTable data, and too huge a compaction queue (which leads to
strange pauses, I suppose because of the compactor thinking about what to
compact next), ...

2012/9/23 Aaron Turner synfina...@gmail.com

 On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com
 wrote:
  If you think about space, use Leveled compaction! This won't only allow
 you
  to fill more space, but also will shrink you data much faster in case of
  updates. Size compaction can give you 3x-4x more space used than there
 are
  live data. Consider the following (our simplified) scenario:
  1) The data is updated weekly
  2) Each week a large SSTable is written (say, 300GB) after full update
  processing.
  3) In 3 weeks you will have 1.2TB of data in 3 large SSTables.
  4) Only after 4th week they all will be compacted into one 300GB SSTable.
 
  Leveled compaction've tamed space for us. Note that you should set
  sstable_size_in_mb to reasonably high value (it is 512 for us with ~700GB
  per node) to prevent creating a lot of small files.

 512MB per sstable?  Wow, that's freaking huge.  From my conversations
 with various developers 5-10MB seems far more reasonable.   I guess it
 really depends on your usage patterns, but that seems excessive to me-
 especially as sstables are promoted.


-- 
Best regards,
 Vitalii Tymchyshyn


Re: any ways to have compaction use less disk space?

2012-09-23 Thread Віталій Тимчишин
If you are thinking about space, use Leveled compaction! This will not only allow
you to fill more space, but will also shrink your data much faster in the case of
updates. Size-tiered compaction can give you 3x-4x more space used than there is
live data. Consider the following (our simplified) scenario:
1) The data is updated weekly.
2) Each week a large SSTable is written (say, 300GB) after full update
processing.
3) In 3 weeks you will have 1.2TB of data in 3 large SSTables.
4) Only after the 4th week will they all be compacted into one 300GB SSTable.

Leveled compaction has tamed space for us. Note that you should set
sstable_size_in_mb to a reasonably high value (it is 512 for us, with ~700GB
per node) to prevent creating a lot of small files.

Best regards, Vitalii Tymchyshyn.

2012/9/20 Hiller, Dean dean.hil...@nrel.gov

 While diskspace is cheap, nodes are not that cheap, and usually systems
 have a 1T limit on each node which means we would love to really not add
 more nodes until we hit 70% disk space instead of the normal 50% that we
 have read about due to compaction.

 Is there any way to use less disk space during compactions?
 Is there any work being done so that compactions take less space in the
 future meaning we can buy less nodes?

 Thanks,
 Dean




-- 
Best regards,
 Vitalii Tymchyshyn


Re: persistent compaction issue (1.1.4 and 1.1.5)

2012-09-19 Thread Віталій Тимчишин
I did see problems with schema agreement on 1.1.4, but they went away
after a rolling restart (BTW: it would still be good to check "describe schema"
for the unreachable node). The same rolling restart helped to force compactions
after moving to Leveled compaction. If your compactions still don't run, you can
try removing the *.json files from the data directory of the stopped node to
force moving all SSTables to level 0.

Best regards, Vitalii Tymchyshyn

2012/9/19 Michael Kjellman mkjell...@barracuda.com

 Potentially the pending compactions are a symptom and not the root
 cause/problem.

 When updating a 3rd column family with a larger sstable_size_in_mb it
 looks like the schema may not be in a good state

 [default@] UPDATE COLUMN FAMILY screenshots WITH
 compaction_strategy=LeveledCompactionStrategy AND
 compaction_strategy_options={sstable_size_in_mb: 200};
 290cf619-57b0-3ad1-9ae3-e313290de9c9
 Waiting for schema agreement...
 Warning: unreachable nodes 10.8.30.102
 The schema has not settled in 10 seconds; further migrations are ill-advised
 until it does.
 Versions are UNREACHABLE:[10.8.30.102],
 290cf619-57b0-3ad1-9ae3-e313290de9c9:[10.8.30.15, 10.8.30.14, 10.8.30.13,
 10.8.30.103, 10.8.30.104, 10.8.30.105, 10.8.30.106],
 f1de54f5-8830-31a6-9cdd-aaa6220cccd1:[10.8.30.101]


 However, tpstats looks good. And the schema changes eventually do get
 applied on *all* the nodes (even the ones that seem to have different
 schema versions). There are no communications issues between the nodes and
 they are all in the same rack

 root@:~# nodetool tpstats
 Pool Name               Active   Pending   Completed   Blocked   All time blocked
 ReadStage                    0         0     1254592         0                  0
 RequestResponseStage         0         0     9480827         0                  0
 MutationStage                0         0     8662263         0                  0
 ReadRepairStage              0         0      339158         0                  0
 ReplicateOnWriteStage        0         0           0         0                  0
 GossipStage                  0         0     1469197         0                  0
 AntiEntropyStage             0         0           0         0                  0
 MigrationStage               0         0        1808         0                  0
 MemtablePostFlusher          0         0         248         0                  0
 StreamStage                  0         0           0         0                  0
 FlushWriter                  0         0         248         0                  4
 MiscStage                    0         0           0         0                  0
 commitlog_archiver           0         0           0         0                  0
 InternalResponseStage        0         0        5286         0                  0
 HintedHandoff                0         0          21         0                  0

 Message type   Dropped
 RANGE_SLICE  0
 READ_REPAIR  0
 BINARY   0
 READ 0
 MUTATION 0
 REQUEST_RESPONSE 0

 So I'm guessing maybe the different schema versions may be potentially
 stopping compactions? Will compactions still happen if there are different
 versions of the schema?





 On 9/18/12 11:38 AM, Michael Kjellman mkjell...@barracuda.com wrote:

 Thanks, I just modified the schema on the worst offending column family
 (as determined by the .json) from 10MB to 200MB.
 
 Should I kick off a compaction on this cf now/repair?/scrub?
 
 Thanks
 
 -michael
 
 From: Віталій Тимчишин tiv...@gmail.com
 Reply-To: user@cassandra.apache.org
 To: user@cassandra.apache.org
 Subject: Re: persistent compaction issue (1.1.4 and 1.1.5)
 
 I've started to use LeveledCompaction some time ago and from my
 experience this indicates some SST on lower levels than they should be.
 The compaction is going, moving them up level by level, but total count
 does not change as new data goes in.
 The numbers are pretty high as for me. Such numbers mean a lot of files
 (over 100K in single directory) and a lot of thinking for compaction
 executor to decide what to compact next. I can see numbers like 5K-10K
 and still thing this is high number. If I were you, I'd increase
 sstable_size_in_mb 10-20 times it is now.
 
 2012/9/17 Michael Kjellman mkjell...@barracuda.com
 Hi All,
 
 I have an issue where each one of my nodes (currently all running at
 1.1.5) is reporting around 30,000 pending compactions. I understand that
 a pending compaction doesn't necessarily mean it is a scheduled task
 however I'm confused why this behavior is occurring. It is the same on
 all nodes

Re: persistent compaction issue (1.1.4 and 1.1.5)

2012-09-18 Thread Віталій Тимчишин
I started using LeveledCompaction some time ago, and from my experience
this indicates some SSTables on lower levels than they should be. The compaction
is going, moving them up level by level, but the total count does not change as
new data comes in.
The numbers are pretty high, as for me. Such numbers mean a lot of files
(over 100K in a single directory) and a lot of thinking for the compaction
executor to decide what to compact next. I can see numbers like 5K-10K and
still think this is a high number. If I were you, I'd increase sstable_size_in_mb
to 10-20 times what it is now.

2012/9/17 Michael Kjellman mkjell...@barracuda.com

 Hi All,

 I have an issue where each one of my nodes (currently all running at
 1.1.5) is reporting around 30,000 pending compactions. I understand that a
 pending compaction doesn't necessarily mean it is a scheduled task however
 I'm confused why this behavior is occurring. It is the same on all nodes,
 occasionally goes down 5k pending compaction tasks, and then returns to
 25,000-35,000 compaction tasks pending.

 I have tried a repair operation/scrub operation on two of the nodes and
 while compactions initially happen the number of pending compactions does
 not decrease.

 Any ideas? Thanks for your time.

 Best,
 michael


 'Like' us on Facebook for exclusive content and other resources on all
 Barracuda Networks solutions.

 Visit http://barracudanetworks.com/facebook







-- 
Best regards,
 Vitalii Tymchyshyn


Re: Disk configuration in new cluster node

2012-09-18 Thread Віталій Тимчишин
Network also matters. It would take a lot of time to send 6TB over a 1Gb
link, even fully saturating it. IMHO you can try with 10Gb, but you will
need to raise your streaming/compaction limits a lot.
You will also need to ensure that your compaction can keep up. It is often
done in one thread, and I am not sure whether that will be enough for you. As
for parallel compaction, I don't know its exact limitations or whether it will
work in your case.

2012/9/18 Casey Deccio ca...@deccio.net

 On Tue, Sep 18, 2012 at 1:54 AM, aaron morton aa...@thelastpickle.comwrote:

 each with several disks having large capacity, totaling 10 - 12 TB.  Is
 this (another) bad idea?

 Yes. Very bad.
 If you had 6TB on average system with spinning disks you would measure
 duration of repairs and compactions in days.

 If you want to store 12 TB of data you will need more machines.



 Would it help if I partitioned the computing resources of my physical
 machines into VMs?  For example, I put four VMs on each of three virtual
 machines, each with a dedicated 2TB drive.  I can now have four tokens in
 the ring and a RF of 3.  And of course, I can arrange them into a way that
 makes the most sense.  Is this getting any better, or am I missing the
 point?

 Casey




-- 
Best regards,
 Vitalii Tymchyshyn


Re: Practical node size limits

2012-09-05 Thread Віталій Тимчишин
You can try increasing the streaming throttle.

2012/9/4 Dustin Wenz dustinw...@ebureau.com

 I'm following up on this issue, which I've been monitoring for the last
 several weeks. I thought people might find my observations interesting.

 Ever since increasing the heap size to 64GB, we've had no OOM conditions
 that resulted in a JVM termination. Our nodes have around 2.5TB of data
 each, and the replication factor is four. IO on the cluster seems to be
 fine, though I haven't been paying particular attention to any GC hangs.

 The bottleneck now seems to be the repair time. If any node becomes too
 inconsistent, or needs to be replaced, the rebuild time is over a week.
 That issue alone makes this cluster configuration unsuitable for production
 use.

 - .Dustin

 On Jul 30, 2012, at 2:04 PM, Dustin Wenz dustinw...@ebureau.com wrote:

  Thanks for the pointer! It sounds likely that's what I'm seeing. CFStats
 reports that the bloom filter size is currently several gigabytes. Is there
 any way to estimate how much heap space a repair would require? Is it a
 function of simply adding up the filter file sizes, plus some fraction of
 neighboring nodes?
 
  I'm still curious about the largest heap sizes that people are running
 with on their deployments. I'm considering increasing ours to 64GB (with
 96GB physical memory) to see where that gets us. Would it be necessary to
 keep the young-gen size small to avoid long GC pauses? I also suspect that
 I may need to keep my memtable sizes small to avoid long flushes; maybe in
 the 1-2GB range.
 
- .Dustin
 
  On Jul 29, 2012, at 10:45 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
 
  Yikes. You should read:
 
  http://wiki.apache.org/cassandra/LargeDataSetConsiderations
 
  Essentially what it sounds like your are now running into is this:
 
  The BloomFilters for each SSTable must exist in main memory. Repair
  tends to create some extra data which normally gets compacted away
  later.
 
  Your best bet is to temporarily raise the Xmx heap and adjust the
  index sampling size. If you need to save the data (if it is just test
  data you may want to give up and start fresh)
 
  Generally the issue with the large disk configurations it is hard to
  keep a good ram/disk ratio. Then most reads turn into disk seeks and
  the throughput is low. I get the vibe people believe large stripes are
  going to help Cassandra. The issue is that stripes generally only
  increase sequential throughput, but Cassandra is a random read system.
 
  How much ram/disk you need is case dependent but 1/5 ratio of RAM to
  disk is where I think most people want to be, unless their system is
  carrying SSD disks.
 
  Again you have to keep your bloom filters in Java heap memory, so any
  design that tries to create a quadrillion small rows is going to have
  memory issues as well.
 
  On Sun, Jul 29, 2012 at 10:40 PM, Dustin Wenz dustinw...@ebureau.com
 wrote:
  I'm trying to determine if there are any practical limits on the
 amount of data that a single node can handle efficiently, and if so,
 whether I've hit that limit or not.
 
  We've just set up a new 7-node cluster with Cassandra 1.1.2 running
 under OpenJDK6. Each node is 12-core Xeon with 24GB of RAM and is connected
 to a stripe of 10 3TB disk mirrors (a total of 20 spindles each) and
 connected via dual SATA-3 interconnects. I can read and write around
 900MB/s sequentially on the arrays. I started out with Cassandra tuned with
 all-default values, with the exception of the compaction throughput which
 was increased from 16MB/s to 100MB/s. These defaults will set the heap size
 to 6GB.
 
  Our schema is pretty simple; only 4 column families and each has one
 secondary index. The replication factor was set to four, and compression
 disabled. Our access patterns are intended to be about equal numbers of
 inserts and selects, with no updates, and the occasional delete.
 
  The first thing we did was begin to load data into the cluster. We
 could perform about 3000 inserts per second, which stayed mostly flat.
 Things started to go wrong around the time the nodes exceeded 800GB.
 Cassandra began to generate a lot of "mutation messages dropped" warnings,
 and was complaining that the heap was over 75% capacity.
 
  At that point, we stopped all activity on the cluster and attempted a
 repair. We did this so we could be sure that the data was fully consistent
 before continuing. Our mistake was probably trying to repair all of the
 nodes simultaneously - within an hour, Java terminated on one of the nodes
 with a heap out-of-memory message. I then increased all of the heap sizes
 to 8GB, and reduced the heap_newsize to 800MB. All of the nodes were
 restarted, and there was no no outside activity on the cluster. I then
 began a repair on a single node. Within a few hours, it OOMed again and
 exited. I then increased the heap to 12GB, and attempted the same thing.
 This time, the repair ran for about 7 hours before 

Re: Failing operations repair

2012-06-09 Thread Віталій Тимчишин
Thanks a lot. I was not sure if the coordinator somehow tries to roll back
transactions that failed to reach their consistency level.
(Yet I could not imagine a method to do this without two-phase commit :) )

2012/6/8 aaron morton aa...@thelastpickle.com

 I am making some cassandra presentations in Kyiv and would like to check
 that I am telling people the truth :)

 Thanks for spreading the word :)

 1) A failed (from the client-side view) operation may still be applied to the cluster

 Yes.
 If you fail with UnavailableException it's because, from the coordinator's
 view of the cluster, there are fewer than CL nodes available. So retry.
 Somewhat similar story with TimedOutException.

 2) The coordinator does not try anything to roll back an operation that failed
 because it was processed by fewer than the consistency-level number of nodes.

 Correct.

 3) Hinted handoff works only for successful operations.

 HH will be stored if the coordinator proceeds with the request.
 In 1.X HH is stored on the coordinator if a replica is down when the
 request starts and if the node does not reply in rpc_timeout.

 4) Counters are not reliable because of (1)

 If you get a TimedOutException when writing a counter you should not
 re-send the request.
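
As an aside, a minimal Java sketch of how a client might act on the two answers above; the exception and operation types here are made-up stand-ins, not a real driver API:

public class RetryPolicySketch {
    // Hypothetical client-side exceptions standing in for the Thrift ones discussed above.
    static class UnavailableException extends Exception {}
    static class TimedOutException extends Exception {}

    interface Operation { void execute() throws UnavailableException, TimedOutException; }

    // UnavailableException: the coordinator did not proceed with the write, so retrying is safe.
    // TimedOutException: the write may still have been applied; re-sending a counter add could double-count.
    static void run(Operation op, boolean isCounterWrite, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                op.execute();
                return;
            } catch (UnavailableException e) {
                if (attempt >= maxRetries) throw e;   // nothing was applied, so retrying is fine
            } catch (TimedOutException e) {
                if (isCounterWrite) throw e;          // do not re-send counter increments
                if (attempt >= maxRetries) throw e;
            }
        }
    }
}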

 5) Read-repair may help to propagate an operation that failed its
 consistency level but was persisted to some nodes.

 Yes. It works in the background and by default is only enabled on 10% of
 requests.
 Note that RR is not the same as the Consistency Level for the read. If you work
 at a CL > ONE, the results from CL nodes are always compared and differences are
 resolved. RR is concerned with the replicas not involved in the CL read.

 6) Manual repair is still needed because of (2) and (3)

 Manual repair is *the* way to achieve consistency of data on disk. HH and
 RR are optimisations designed to reduce the chance of a Digest Mismatch
 during a read with CL > ONE.
 It is also essential for distributing Tombstones before they are purged by
 compaction.

 P.S. If some points apply only to some cassandra versions, I will be happy
 to know this too.

 Assume everyone for version 1.X

 Thanks

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 8/06/2012, at 1:20 AM, Віталій Тимчишин wrote:

 Hello.

 I am making some cassandra presentations in Kyiv and would like to check
 that I am telling people the truth :)
 Could the community tell me if the following points are true:
 1) A failed (from the client-side view) operation may still be applied to the cluster
 2) The coordinator does not try anything to roll back an operation that failed
 because it was processed by fewer than the consistency-level number of nodes.
 3) Hinted handoff works only for successful operations.
 4) Counters are not reliable because of (1)
 5) Read-repair may help to propagate an operation that failed its
 consistency level but was persisted to some nodes.
 6) Manual repair is still needed because of (2) and (3)

 P.S. If some points apply only to some cassandra versions, I will be happy
 to know this too.
 --
 Best regards,
  Vitalii Tymchyshyn





-- 
Best regards,
 Vitalii Tymchyshyn


Failing operations repair

2012-06-07 Thread Віталій Тимчишин
Hello.

I am making some cassandra presentations in Kyiv and would like to check
that I am telling people the truth :)
Could the community tell me if the following points are true:
1) A failed (from the client-side view) operation may still be applied to the cluster
2) The coordinator does not try anything to roll back an operation that failed
because it was processed by fewer than the consistency-level number of nodes.
3) Hinted handoff works only for successful operations.
4) Counters are not reliable because of (1)
5) Read-repair may help to propagate an operation that failed its
consistency level but was persisted to some nodes.
6) Manual repair is still needed because of (2) and (3)

P.S. If some points apply only to some cassandra versions, I will be happy
to know this too.
-- 
Best regards,
 Vitalii Tymchyshyn


Re: Query on how to count the total number of rowkeys and columns in them

2012-05-24 Thread Віталій Тимчишин
You should read multiple batches, specifying the last key received from the
previous batch as the first key for the next one.
For large databases I'd recommend using a statistical approach (if it's
feasible). With the random partitioner it works well.
Don't read the whole DB. Knowing the whole keyspace, you can read a part of it,
get the number of records per key (1), then multiply by the keyspace size to get
your total.
You can even implement an algorithm that runs until the required precision
is obtained (simply compare your previous and current estimates after each
batch).
For me it's enough to read ~1% of the DB to get a good result.
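
For illustration, a rough Java sketch of this iterative estimate; the KeyRangeReader interface is a made-up stand-in for whatever client call you use to count rows in a portion of the token range:

public class RowCountEstimator {
    // Stand-in for a client call that returns how many rows fall in a given fraction of the ring.
    interface KeyRangeReader {
        long countRowsInFraction(double startFraction, double endFraction);
    }

    // Read successive slices of the ring until the extrapolated total stabilizes.
    static long estimateTotalRows(KeyRangeReader reader, double sliceSize, double tolerance) {
        double covered = 0;
        long counted = 0;
        double previousEstimate = Double.NaN;
        while (covered < 1.0) {
            double end = Math.min(covered + sliceSize, 1.0);
            counted += reader.countRowsInFraction(covered, end);
            covered = end;
            double estimate = counted / covered;   // extrapolate to the whole ring
            if (!Double.isNaN(previousEstimate)
                    && Math.abs(estimate - previousEstimate) <= tolerance * estimate) {
                return Math.round(estimate);       // required precision reached, e.g. after ~1% of the data
            }
            previousEstimate = estimate;
        }
        return counted;                            // read everything: exact count
    }
}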

Best regards, Vitalii Tymchyshyn

2012/5/24 Prakrati Agrawal prakrati.agra...@mu-sigma.com

  Hi


 I am trying to learn Cassandra and I have one doubt. I am using the Thrift
 API, to count the number of row keys I am using KeyRange to specify the row
 keys. To count all of them, I specify the start and end as “new byte[0]”.
 But the count is set to 100 by default. How do I use this method to count
 the keys if I don’t know the actual number of keys in my Cassandra
 database? Please help me


-- 
Best regards,
 Vitalii Tymchyshyn


Re: Cassandra dying when gets many deletes

2012-05-06 Thread Віталій Тимчишин
Thanks a lot. It seems that a fix is commited now and fix will appear in
the next release, so I won't need my own patched cassandra :)

Best regards, Vitalii Tymchyshyn.

2012/5/3 Andrey Kolyadenko akolyade...@gmail.com

 Hi Vitalii,

 I sent patch.


 2012/4/24 Віталій Тимчишин tiv...@gmail.com

 Glad you've got it working properly. I've tried to make as local
 changes as possible, so changed only single value calculation. But it's
 possible your way is better and will be accepted by cassandra maintainer.
 Could you attach your patch to the ticket. I'd like for any fix to be
 applied to the trunk since currently I have to make my own patched build
 each time I upgrade because of the bug.

 Best regards, Vitalii Tymchyshyn

 On 25 April 2012 at 09:08, crypto five cryptof...@gmail.com wrote:

 I agree with your observations.
  On the other hand, I found that ColumnFamily.size() doesn't calculate
  object size correctly. It doesn't count the sizes of two Object fields and
  returns 0 if there is no object in the columns container.
  I increased the initial size variable value to 24, which is the size of two
  objects (I didn't know the correct value), and Cassandra started
  calculating the live ratio correctly, increasing the throughput value and
  flushing memtables.

 On Tue, Apr 24, 2012 at 2:00 AM, Vitalii Tymchyshyn tiv...@gmail.comwrote:

 Hello.

  For me, "there are no dirty column families" in your message suggests it's
  possibly the same problem.
  The issue is that column families that get only full-row deletes do
  not get ANY SINGLE dirty byte accounted, and so can't be picked by the flusher.
  No ratio can help, simply because it is multiplied by 0. Check your
 cfstats.

  On 24.04.12 at 09:54, crypto five wrote:

 Thank you Vitalii.

  Looking at the Jonathan's answer to your patch I think it's probably
 not my case. I see that LiveRatio is calculated in my case, but
 calculations look strange:

  WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 Memtable.java (line 181)
 setting live ratio to maximum of 64 instead of Infinity
  INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 Memtable.java (line 186)
 CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0
 (just-counted was 64.0).  calculation took 63355ms for 0 columns

   Looking at the comments in the code ("If it gets higher than 64
  something is probably broken."), it looks like this is probably the problem.
 Not sure how to investigate it.

 2012/4/23 Віталій Тимчишин tiv...@gmail.com

 See https://issues.apache.org/jira/browse/CASSANDRA-3741
 I did post a fix there that helped me.


 2012/4/24 crypto five cryptof...@gmail.com

 Hi,

  I have 50 millions of rows in column family on 4G RAM box. I
 allocatedf 2GB to cassandra.
 I have program which is traversing this CF and cleaning some data
 there, it generates about 20k delete statements per second.
 After about of 3 millions deletions cassandra stops responding to
 queries: it doesn't react to CLI, nodetool etc.
 I see in the logs that it tries to free some memory but can't even if
 I wait whole day.
 Also I see following in  the logs:

  INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 StorageService.java
 (line 2647) Unable to reduce heap usage since there are no dirty column
 families

  When I am looking at memory dump I see that memory goes to
 ConcurrentSkipListMap(10%), HeapByteBuffer(13%), DecoratedKey(6%),
 int[](6%), BigInteger(8.2%), ConcurrentSkipListMap$HeadIndex(7.2%),
 ColumnFamily(6.5%), ThreadSafeSortedColumns(13.7%), long[](5.9%).

  What can I do to make cassandra stop dying?
 Why it can't free the memory?
 Any ideas?

  Thank you.




   --
 Best regards,
  Vitalii Tymchyshyn







 --
 Best regards,
  Vitalii Tymchyshyn





-- 
Best regards,
 Vitalii Tymchyshyn


Re: Cassandra dying when gets many deletes

2012-04-25 Thread Віталій Тимчишин
Glad you've got it working properly. I tried to make the changes as local
as possible, so I changed only a single value calculation. But it's possible
your way is better and will be accepted by the Cassandra maintainers. Could you
attach your patch to the ticket? I'd like any fix to be applied to
trunk, since currently I have to make my own patched build each time I
upgrade because of the bug.

Best regards, Vitalii Tymchyshyn

On 25 April 2012 at 09:08, crypto five cryptof...@gmail.com wrote:

 I agree with your observations.
  On the other hand, I found that ColumnFamily.size() doesn't calculate
  object size correctly. It doesn't count the sizes of two Object fields and
  returns 0 if there is no object in the columns container.
  I increased the initial size variable value to 24, which is the size of two
  objects (I didn't know the correct value), and Cassandra started
  calculating the live ratio correctly, increasing the throughput value and
  flushing memtables.

 On Tue, Apr 24, 2012 at 2:00 AM, Vitalii Tymchyshyn tiv...@gmail.comwrote:

 Hello.

  For me, "there are no dirty column families" in your message suggests it's
  possibly the same problem.
  The issue is that column families that get only full-row deletes do not
  get ANY SINGLE dirty byte accounted, and so can't be picked by the flusher. No
  ratio can help, simply because it is multiplied by 0. Check your cfstats.

 On 24.04.12 at 09:54, crypto five wrote:

 Thank you Vitalii.

  Looking at the Jonathan's answer to your patch I think it's probably
 not my case. I see that LiveRatio is calculated in my case, but
 calculations look strange:

  WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 Memtable.java (line 181)
 setting live ratio to maximum of 64 instead of Infinity
  INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 Memtable.java (line 186)
 CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0
 (just-counted was 64.0).  calculation took 63355ms for 0 columns

  Looking at the comments in the code ("If it gets higher than 64
  something is probably broken."), it looks like this is probably the problem.
 Not sure how to investigate it.

 2012/4/23 Віталій Тимчишин tiv...@gmail.com

 See https://issues.apache.org/jira/browse/CASSANDRA-3741
 I did post a fix there that helped me.


 2012/4/24 crypto five cryptof...@gmail.com

 Hi,

  I have 50 millions of rows in column family on 4G RAM box. I
 allocatedf 2GB to cassandra.
 I have program which is traversing this CF and cleaning some data
 there, it generates about 20k delete statements per second.
 After about of 3 millions deletions cassandra stops responding to
 queries: it doesn't react to CLI, nodetool etc.
 I see in the logs that it tries to free some memory but can't even if I
 wait whole day.
 Also I see following in  the logs:

  INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 StorageService.java
 (line 2647) Unable to reduce heap usage since there are no dirty column
 families

  When I am looking at memory dump I see that memory goes to
 ConcurrentSkipListMap(10%), HeapByteBuffer(13%), DecoratedKey(6%),
 int[](6%), BigInteger(8.2%), ConcurrentSkipListMap$HeadIndex(7.2%),
 ColumnFamily(6.5%), ThreadSafeSortedColumns(13.7%), long[](5.9%).

  What can I do to make cassandra stop dying?
 Why it can't free the memory?
 Any ideas?

  Thank you.




   --
 Best regards,
  Vitalii Tymchyshyn







-- 
Best regards,
 Vitalii Tymchyshyn


Re: Cassandra dying when gets many deletes

2012-04-23 Thread Віталій Тимчишин
See https://issues.apache.org/jira/browse/CASSANDRA-3741
I did post a fix there that helped me.

2012/4/24 crypto five cryptof...@gmail.com

 Hi,

 I have 50 million rows in a column family on a 4G RAM box. I allocated
 2GB to Cassandra.
 I have a program which is traversing this CF and cleaning some data there;
 it generates about 20k delete statements per second.
 After about 3 million deletions Cassandra stops responding to queries:
 it doesn't react to the CLI, nodetool, etc.
 I see in the logs that it tries to free some memory but can't, even if I
 wait a whole day.
 Also I see the following in the logs:

 INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 StorageService.java (line
 2647) Unable to reduce heap usage since there are no dirty column families

 When I am looking at memory dump I see that memory goes to
 ConcurrentSkipListMap(10%), HeapByteBuffer(13%), DecoratedKey(6%),
 int[](6%), BigInteger(8.2%), ConcurrentSkipListMap$HeadIndex(7.2%),
 ColumnFamily(6.5%), ThreadSafeSortedColumns(13.7%), long[](5.9%).

 What can I do to make cassandra stop dying?
 Why can't it free the memory?
 Any ideas?

 Thank you.




-- 
Best regards,
 Vitalii Tymchyshyn


Re: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released

2012-04-15 Thread Віталій Тимчишин
Is the on-disk format already settled? I've thought about trying the betas, but
the impossibility of upgrading to the 1.1 release stopped me.

2012/4/13 Sylvain Lebresne sylv...@datastax.com

 The Cassandra team is pleased to announce the release of the first release
 candidate for the future Apache Cassandra 1.1.


-- 
Best regards,
 Vitalii Tymchyshyn


Re: swap grows

2012-04-15 Thread Віталій Тимчишин
BTW: Are you sure the system is doing something wrong? The system may save some
pages to swap without removing them from RAM, simply to have the possibility of
removing them from RAM quickly later if needed.

2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 Hello

 We have a 6 node cluster (cassandra 0.8.10). On one node I increased the Java
 heap size to 6GB, and now on this node swap begins to grow, but the system has
 about 3GB of free memory:


 root@6wd003:~# free
              total       used       free     shared    buffers     cached
 Mem:      24733664   21702812    3030852          0       6792   13794724
 -/+ buffers/cache:    7901296   16832368
 Swap:      1998840       2352    1996488


 And swap space slowly grows, but I don't understand why.


 PS: We have JNA mlock, and set  vm.swappiness = 0
 PS: OS ubuntu 10.0.4(2.6.32-40-generic)





-- 
Best regards,
 Vitalii Tymchyshyn


Re: Compression on client side vs server side

2012-04-03 Thread Віталій Тимчишин
We are using client-side compression because of the following points. Can you
confirm they are valid?
1) Server-side compression uses replication-factor times more CPU (3 times more
with a replication factor of 3).
2) The network is used more by the compression factor (as you are sending
uncompressed data over the wire).
4) Any server utility operations, like repair or move (not sure about the
latter), will decompress/compress.
So, client-side compression/decompression looks way cheaper and can be very
efficient for long columns.
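
For what it's worth, a minimal Java sketch of compressing a long column value on the client, assuming the snappy-java library is on the classpath; treat the exact calls as an assumption rather than a statement about a specific API:

import java.io.IOException;
import java.nio.charset.Charset;
import org.xerial.snappy.Snappy;

public class ClientSideCompression {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Compress a long column value on the client before sending it to the cluster.
    static byte[] compressValue(String longColumnValue) throws IOException {
        return Snappy.compress(longColumnValue.getBytes(UTF8));
    }

    // Decompress on the client after reading the column back.
    static String decompressValue(byte[] storedBytes) throws IOException {
        return new String(Snappy.uncompress(storedBytes), UTF8);
    }
}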

Best regards, Vitalii Tymchyshyn

2012/4/2 Jeremiah Jordan jeremiah.jor...@morningstar.com

  The server side compression can compress across columns/rows so it will
 most likely be more efficient.
 Whether you are CPU bound or IO bound depends on your application and node
 setup.  Unless your working set fits in memory you will be IO bound, and in
 that case server side compression helps because there is less to read from
 disk.  In many cases it is actually faster to read a compressed file from
 disk and decompress it, then to read an uncompressed file from disk.

 See Ed's post:
 Cassandra compression is like more servers for free!

 http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/cassandra_compression_is_like_getting

  --
 *From:* benjamin.j.mcc...@gmail.com [benjamin.j.mcc...@gmail.com] on
 behalf of Ben McCann [b...@benmccann.com]
 *Sent:* Monday, April 02, 2012 10:42 AM
 *To:* user@cassandra.apache.org
 *Subject:* Compression on client side vs server side

  Hi,

  I was curious if I compress my data on the client side with Snappy
 whether there's any difference between doing that and doing it on the
 server side?  The wiki said that compression works best where each row has
 the same columns.  Does this mean the compression will be more efficient on
 the server side since it can look at multiple rows at once instead of only
 the row being inserted?  The reason I was thinking about possibly doing it
 client side was that it would save CPU on the datastore machine.  However,
 does this matter?  Is CPU typically the bottleneck on a machine or is it
 some other resource? (of course this will vary for each person, but
 wondering if there's a rule of thumb.  I'm making a web app, which
 hopefully will store about 5TB of data and have 10s of millions of page
 views per month)

  Thanks,
 Ben




-- 
Best regards,
 Vitalii Tymchyshyn


Re: Write performance compared to Postgresql

2012-04-03 Thread Віталій Тимчишин
Hello.

We are using a Java async Thrift client.
As for Ruby, it seems you need to use something like
http://www.mikeperham.com/2010/02/09/cassandra-and-eventmachine/
(not sure, as I know nothing about Ruby).
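
For illustration, a rough Java sketch of the many-concurrent-writes idea; the CassandraWriter interface is a made-up placeholder for whichever client you use:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelInsertSketch {
    // Placeholder for whatever client call performs a single insert.
    interface CassandraWriter { void insert(String rowKey, String column, String value); }

    // Issue the inserts from a pool of worker threads so the cluster sees many
    // concurrent requests instead of one request at a time over a single connection.
    static void insertAll(final CassandraWriter writer, List<String[]> logEntries, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (final String[] entry : logEntries) {
            pool.submit(new Runnable() {
                public void run() {
                    writer.insert(entry[0], entry[1], entry[2]);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}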

Best regards, Vitalii Tymchyshyn


2012/4/3 Jeff Williams je...@wherethebitsroam.com

 Vitalii,

 Yep, that sounds like a good idea. Do you have any more information about
 how you're doing that? Which client?

 Because even with 3 concurrent client nodes, my single postgresql server
 is still out performing my 2 node cassandra cluster, although the gap is
 narrowing.

 Jeff

 On Apr 3, 2012, at 4:08 PM, Vitalii Tymchyshyn wrote:

  Note that having tons of TCP connections is not good. We are using async
 client to issue multiple calls over single connection at same time. You can
 do the same.
 
  Best regards, Vitalii Tymchyshyn.
 
  03.04.12 16:18, Jeff Williams написав(ла):
  Ok, so you think the write speed is limited by the client and protocol,
 rather than the cassandra backend? This sounds reasonable, and fits with
 our use case, as we will have several servers writing. However, a bit
 harder to test!
 
  Jeff
 
  On Apr 3, 2012, at 1:27 PM, Jake Luciani wrote:
 
  Hi Jeff,
 
  Writing serially over one connection will be slower. If you run many
 threads hitting the server at once you will see throughput improve.
 
  Jake
 
 
 
  On Apr 3, 2012, at 7:08 AM, Jeff Williams je...@wherethebitsroam.com
  wrote:
 
  Hi,
 
  I am looking at cassandra for a logging application. We currently log
 to a Postgresql database.
 
  I set up 2 cassandra servers for testing. I did a benchmark where I
 had 100 hashes representing logs entries, read from a json file. I then
 looped over these to do 10,000 log inserts. I repeated the same writing to
 a postgresql instance on one of the cassandra servers. The script is
 attached. The cassandra writes appear to perform a lot worse. Is this
 expected?
 
  jeff@transcoder01:~$ ruby cassandra-bm.rb
  cassandra
  3.17   0.48   3.65 ( 12.032212)
  jeff@transcoder01:~$ ruby cassandra-bm.rb
  postgres
  2.14   0.33   2.47 (  7.002601)
 
  Regards,
  Jeff
 
  cassandra-bm.rb
 




-- 
Best regards,
 Vitalii Tymchyshyn


Re: About initial token, autobootstraping and load balance

2012-01-16 Thread Віталій Тимчишин
Yep, I think I can. Here you are: https://github.com/tivv/cassandra-balancer

2012/1/15 Carlos Pérez Miguel cperez...@gmail.com

 If you can share it, that would be great.

 Carlos Pérez Miguel



 2012/1/15 Віталій Тимчишин tiv...@gmail.com:
  Yep. Have written groovy script this friday to perform autobalancing :)
 I am
  going to add it to my jenkins soon.
 
 
  2012/1/15 Maxim Potekhin potek...@bnl.gov
 
  I see. Sure, that's a bit more complicated and you'd have to move tokens
  after adding a machine.
 
  Maxim
 
 
 
  On 1/15/2012 4:40 AM, Віталій Тимчишин wrote:
 
  It's nothing wrong for 3 nodes. It's a problem for cluster of 20+ nodes,
  growing.
 
  2012/1/14 Maxim Potekhin potek...@bnl.gov
 
  I'm just wondering -- what's wrong with manual specification of tokens?
  I'm so glad I did it and have not had problems with balancing and all.
 
  Before I was indeed stuck with 25/25/50 setup in a 3 machine cluster,
  when had to move tokens to make it 33/33/33 and I screwed up a little
 in
  that the first one did not start with 0, which is not a good idea.
 
  Maxim
 
 
 
  --
  Best regards,
   Vitalii Tymchyshyn
 
 
 
 
 
  --
  Best regards,
   Vitalii Tymchyshyn




-- 
Best regards,
 Vitalii Tymchyshyn


Re: About initial token, autobootstraping and load balance

2012-01-15 Thread Віталій Тимчишин
There's nothing wrong with it for 3 nodes. It's a problem for a cluster of 20+
nodes that is growing.

2012/1/14 Maxim Potekhin potek...@bnl.gov

  I'm just wondering -- what's wrong with manual specification of tokens?
 I'm so glad I did it and have not had problems with balancing and all.

 Before I was indeed stuck with 25/25/50 setup in a 3 machine cluster, when
 had to move tokens to make it 33/33/33 and I screwed up a little in that
 the first one did not start with 0, which is not a good idea.

 Maxim



-- 
Best regards,
 Vitalii Tymchyshyn


Re: About initial token, autobootstraping and load balance

2012-01-15 Thread Віталій Тимчишин
Yep. I wrote a Groovy script this Friday to perform autobalancing :) I
am going to add it to my Jenkins soon.

2012/1/15 Maxim Potekhin potek...@bnl.gov

  I see. Sure, that's a bit more complicated and you'd have to move tokens
 after adding a machine.

 Maxim



 On 1/15/2012 4:40 AM, Віталій Тимчишин wrote:

 It's nothing wrong for 3 nodes. It's a problem for cluster of 20+ nodes,
 growing.

 2012/1/14 Maxim Potekhin potek...@bnl.gov

  I'm just wondering -- what's wrong with manual specification of tokens?
 I'm so glad I did it and have not had problems with balancing and all.

 Before I was indeed stuck with 25/25/50 setup in a 3 machine cluster,
 when had to move tokens to make it 33/33/33 and I screwed up a little in
 that the first one did not start with 0, which is not a good idea.

 Maxim



  --
 Best regards,
  Vitalii Tymchyshyn





-- 
Best regards,
 Vitalii Tymchyshyn


Re: About initial token, autobootstraping and load balance

2012-01-14 Thread Віталій Тимчишин
Actually, for me it seems that "largest" means the node with the most data, not
the largest range, which, with replication involved, makes the feature useless.

2012/1/13 David McNelis dmcne...@gmail.com

 The documentation for that section needs to be updated...

 What happens is that if you just autobootstrap without setting a token it
 will by default bisect the range of the largest node.

 So if you go through several iterations of adding nodes, then this is what
 you would see:

 Gen 1:
 Node A:  100% of tokens, token range 1-10 (for example)

 Gen 2:
 Node A: 50% of tokens  (1-5)
 Node B: 50% of tokens (6-10)

 Gen 3:
 Node A: 25% of tokens (1-2.5)
 Node B: 50% of tokens (6-10)
 Node C: 25% of tokens (2.6-5)

 In reality, what you'd want in gen 3 is every node to be 33%, but it would
 not be the case without setting the tokens to begin with.

 You'll notice that there are a couple of scripts available to generate a
 list of initial tokens for your particular cluster size; then every time
 you add a node you'll need to update all the nodes with new tokens in order
 to properly load balance.
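
For reference, a small Java sketch of what such a token-generation script typically computes for RandomPartitioner (evenly spaced tokens i * 2^127 / N):

import java.math.BigInteger;

public class InitialTokens {
    // Evenly spaced RandomPartitioner tokens for a cluster of the given size:
    // token(i) = i * 2^127 / nodeCount
    static BigInteger[] evenTokens(int nodeCount) {
        BigInteger ringSize = BigInteger.valueOf(2).pow(127);
        BigInteger[] tokens = new BigInteger[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            tokens[i] = ringSize.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(nodeCount));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (BigInteger token : evenTokens(3)) {
            System.out.println(token);   // set each node's initial_token to one of these
        }
    }
}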

 Does this make sense?

 Other folks, am I explaining this correctly?

 David


 2012/1/13 Carlos Pérez Miguel cperez...@gmail.com

 Hello,

 I have a doubt about how the initial token is determined. In Cassandra's
 documentation it is said that it is better to manually configure the
 initial token for each node in the system, but it is also said that if the
 initial token is not defined and autobootstrap is true, new nodes
 choose an initial token in order to improve the load balance of the
 cluster. But what happens if no initial token is chosen and
 autobootstrap is not activated? How does each node select its initial
 token to balance the ring?

 I ask this because I am making tests with a 20-node Cassandra cluster
 running Cassandra 0.7.9. No node has an initial token, nor
 autobootstrapping. I restart the cluster with each test I want to make,
 and in the end the cluster is always well balanced.

 Thanks

 Carlos Pérez Miguel





-- 
Best regards,
 Vitalii Tymchyshyn


Re: Cassandra OOM

2012-01-13 Thread Віталій Тимчишин
2012/1/4 Vitalii Tymchyshyn tiv...@gmail.com

 On 04.01.12 at 14:25, Radim Kolar wrote:

   So, what are cassandra memory requirement? Is it 1% or 2% of disk data?
 It depends on number of rows you have. if you have lot of rows then
 primary memory eaters are index sampling data and bloom filters. I use
 index sampling 512 and bloom filters set to 4% to cut down memory needed.

 I've raised index sampling, and the bloom filter setting seems not to be on
 trunk yet. For me, memtables are what's eating the heap :(


Hello, all.

I've found and fixed the problem today (after one of my nodes OOMed
constantly during replay on start-up). Full-key deletes are not accounted for,
and so column families with delete-only operations are not flushed. Here is the
Jira: https://issues.apache.org/jira/browse/CASSANDRA-3741 and my pull
request to fix it: https://github.com/apache/cassandra/pull/5
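
The idea behind the fix, as a hypothetical sketch rather than the actual patch: make sure a full-row delete still adds a non-zero amount to the memtable's dirty-byte accounting, so the flusher can select that column family:

import java.util.concurrent.atomic.AtomicLong;

public class MemtableAccountingSketch {
    // Hypothetical minimum charge for a mutation that carries no column data (a full-row delete).
    private static final long MIN_MUTATION_SIZE = 24;

    private final AtomicLong currentThroughput = new AtomicLong();

    // Without such a minimum, a delete-only mutation can be accounted as 0 bytes, so a column
    // family receiving only deletes never looks "dirty" and is never picked for flushing.
    void recordMutation(long serializedSize) {
        currentThroughput.addAndGet(Math.max(serializedSize, MIN_MUTATION_SIZE));
    }

    boolean shouldFlush(long thresholdBytes) {
        return currentThroughput.get() >= thresholdBytes;
    }
}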

Best regards, Vitalii Tymchyshyn


Re: is it bad to have lots of column families?

2012-01-05 Thread Віталій Тимчишин
2012/1/5 Michael Cetrulo mail2sa...@gmail.com

 in a traditional database it's not a good idea to have hundreds of
 tables, but is it also bad to have hundreds of column families in Cassandra?
 thank you.


As far as I can see, this may raise memory requirements for you, since you
need to keep an index sample and bloom filter for each column family in memory.

-- 
Best regards,
 Vitalii Tymchyshyn


Cassandra OOM

2012-01-03 Thread Віталій Тимчишин
Hello.

We have been using Cassandra for some time in our project. Currently we are on
the 1.1 trunk (it was an accidental migration, but since it's hard to migrate
back and it's performing nicely enough, we are currently on 1.1).
During the New Year holidays one of the servers produced a number of OOM
messages in the log.
According to the heap dump taken, most of the memory is taken by the
MutationStage queue (over 2 million items).
So, I am curious now whether Cassandra has any flow control for messages? We
are using Quorum for writes, and it seems to me that one slow server may
start getting more messages than it can consume. The writes will still
succeed, performed by the other servers in the replication set.
If there is no flow control, it will eventually OOM. Is that the case?
Are there any plans to handle this?
BTW: A lot of memory (~half) is taken by Inet4Address objects, so making a
cache of such objects would make this problem less likely.
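
For illustration, a minimal Java sketch of the caching idea: intern equal InetAddress instances so millions of queued messages share a handful of objects. This is a hypothetical helper, not Cassandra code:

import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InetAddressCache {
    private static final ConcurrentMap<InetAddress, InetAddress> CACHE =
            new ConcurrentHashMap<InetAddress, InetAddress>();

    // Return a canonical instance for the given address; equal addresses map to one object,
    // so a queue of millions of messages does not hold millions of duplicate Inet4Address objects.
    static InetAddress intern(InetAddress address) {
        InetAddress existing = CACHE.putIfAbsent(address, address);
        return existing != null ? existing : address;
    }
}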

-- 
Best regards,
 Vitalii Tymchyshyn