It had only been running for 2 hours back then, but it has been a full 24
hours now and our read ping program is still showing the same read times
pretty consistently.

Dean

On 3/21/13 1:51 AM, "Andras Szerdahelyi"
<andras.szerdahe...@ignitionone.com> wrote:

>Wow. So LCS with a bloom filter fp chance of 0.1 and an index sampling rate
>of 512, on a column family of 1.7 billion rows per node, yields 100% of
>reads hitting just one sstable? That sounds amazing. And I assume this is
>cfhistograms output from a node that has been on 512 for a while? ( I
>still think it's unlikely 1.2.x re-samples sstables on startup -- I'm on
>1.1.x though. ) For LCS with the same fp chance and sampling rate, and
>300-500 million rows per node ( 300-400GB ) on 1.1.x, the number of
>sstables read for a single read got pretty much out of control for me.
>
>On 20/03/13 14:35, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>
>>I am using LCS, so the bloom filter fp chance default for 1.2.2 is 0.1,
>>and my bloom filter size is 1.27G of RAM (nodetool cfstats)....1.7 billion
>>rows per node.
>>
>>My cfstats for this CF is attached (since cut and paste screwed up the
>>formatting).  During testing in QA, we were not sure if the index_interval
>>change was working, so we dug into the code to find out; it seems to apply
>>the new sampling immediately on startup, though it doesn't log anything
>>except at the "debug" level, which we don't have on.
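>>
>>For reference, the yaml change itself, plus how you could surface those
>>re-sampling messages if you wanted to ( we didn't have debug on; file
>>locations vary by install ):
>>
>>  # cassandra.yaml -- keep one of every N primary index entries in memory
>>  index_interval: 512
>>
>>  # conf/log4j-server.properties -- INFO -> DEBUG makes the startup
>>  # re-sampling visible ( very noisy, revert afterwards )
>>  log4j.rootLogger=DEBUG,stdout,R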
>>
>>Dean
>>
>>
>>
>>On 3/20/13 6:58 AM, "Andras Szerdahelyi"
>><andras.szerdahe...@ignitionone.com> wrote:
>>
>>>I am curious, thanks. ( I am in the same situation, big nodes choking
>>>under 300-400G data load, 500mil keys )
>>>
>>>What does your "cfhistograms Keyspace CF" output look like? How many
>>>sstable reads per read ?
>>>What is your bloom filter fp chance ?
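>>>
>>>If it's easier, a quick way to pull both on 1.1/1.2 ( keyspace/CF names
>>>below are placeholders ):
>>>
>>>  nodetool cfhistograms MyKeyspace MyCF
>>>      ( the "SSTables" column is the number of sstables touched per read )
>>>  show schema;
>>>      ( in cassandra-cli; bloom_filter_fp_chance shows up there when it
>>>      has been set explicitly )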
>>>
>>>Regards,
>>>Andras
>>>
>>>On 20/03/13 13:54, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>>>
>>>>Oh, and to give you an idea of the memory savings: we had a node at 10G
>>>>RAM usage...we had upped a few nodes to 16G from 8G since we don't have
>>>>our new nodes ready yet (we know we should be at 8G, but we would have a
>>>>dead cluster if we did that).
>>>>
>>>>On startup, the initial RAM is around 6-8G.  Startup with
>>>>index_interval=512 resulted in 2.5G-2.8G of initial RAM, and I have seen
>>>>it grow to 3.3G and back down to 2.8G.  We just rolled this out an hour
>>>>ago.  Our website response time is the same as before as well.
>>>>
>>>>We rolled this to only 2 nodes (out of 6) in our cluster so far, to test
>>>>it out and let it soak a bit.  We will slowly roll to more nodes,
>>>>monitoring performance as we go.  Also, since the dynamic snitch is not
>>>>working with SimpleSnitch, we know that just one slow node affects our
>>>>website (from personal pain/experience of nodes hitting the RAM limit and
>>>>slowing down, making the website really slow).
>>>>
>>>>Dean
>>>>
>>>>On 3/20/13 6:41 AM, "Andras Szerdahelyi"
>>>><andras.szerdahe...@ignitionone.com> wrote:
>>>>
>>>>>2. Upping index_interval from 128 to 512 (this seemed to reduce our
>>>>>memory
>>>>>usage significantly!!!)
>>>>>
>>>>>
>>>>>I'd be very careful with that as a one-stop improvement solution, for
>>>>>two reasons AFAIK:
>>>>>1) you have to rebuild sstables ( not an issue if you are evaluating,
>>>>>doing test writes, etc. -- not so much in production )
>>>>>2) it can affect reads ( the number of sstable reads needed to serve one
>>>>>read ), especially if your key/row cache is ineffective -- a quick check
>>>>>for this is sketched below
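>>>>>
>>>>>A quick way to check whether 2) is hurting you, before and after the
>>>>>change ( keyspace/CF names are placeholders ):
>>>>>
>>>>>  nodetool info
>>>>>      ( the Key Cache / Row Cache lines show the recent hit rates )
>>>>>  nodetool cfhistograms MyKeyspace MyCF
>>>>>      ( watch whether the "SSTables" column distribution gets worse
>>>>>      after the change )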
>>>>>
>>>>>On 20/03/13 13:34, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>>>>>
>>>>>>Also, look at the cassandra logs.  I bet you see the typical "...blah
>>>>>>blah is at 0.85, doing memory cleanup" lines, which are not exactly GC
>>>>>>but cassandra's own memory management...and of course, you have GC on
>>>>>>top of that.
>>>>>>
>>>>>>If you need to get your memory down, there are multiple ways (a sketch
>>>>>>of the changes for 1 and 2 is below):
>>>>>>1. Switching size-tiered compaction to leveled compaction (with 1
>>>>>>billion narrow rows, this helped us quite a bit)
>>>>>>2. Upping index_interval from 128 to 512 (this seemed to reduce our
>>>>>>memory usage significantly!!!)
>>>>>>3. Just adding more nodes, since moving rows to other servers shrinks
>>>>>>the memory from #1 and #2 above as each server holds fewer rows
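>>>>>>
>>>>>>Roughly what 1 and 2 look like ( the column family name is a
>>>>>>placeholder; try it in QA first ):
>>>>>>
>>>>>>  update column family MyCF with compaction_strategy='LeveledCompactionStrategy';
>>>>>>      ( cassandra-cli; expect a burst of compaction while existing
>>>>>>      sstables get leveled )
>>>>>>  index_interval: 512
>>>>>>      ( cassandra.yaml; takes effect on the next restart )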
>>>>>>
>>>>>>Later,
>>>>>>Dean
>>>>>>
>>>>>>On 3/20/13 6:29 AM, "Andras Szerdahelyi"
>>>>>><andras.szerdahe...@ignitionone.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>I'd say GC. Please fill in form CASS-FREEZE-001 below and get back to
>>>>>>>us :-) ( sorry )
>>>>>>>
>>>>>>>How big is your JVM heap ? How many CPUs ?
>>>>>>>Garbage collection taking long ? ( look for log lines from GCInspector )
>>>>>>>Running out of heap ? ( "heap is .. full" log lines )
>>>>>>>Any tasks backing up / being dropped ? ( nodetool tpstats and
>>>>>>>".. dropped in last .. ms" log lines )
>>>>>>>Are writes really slow? ( nodetool cfhistograms Keyspace ColumnFamily )
>>>>>>>
>>>>>>>How much is lots of data? Wide or skinny rows? Mutations/sec ?
>>>>>>>Which Compaction Strategy are you using? Output of show schema
>>>>>>>( cassandra-cli ) for the relevant Keyspace/CF might help as well
>>>>>>>
>>>>>>>What consistency are you doing your writes with ? I assume ONE or ANY
>>>>>>>if you have a single node.
>>>>>>>
>>>>>>>What are the values for these settings in cassandra.yaml?
>>>>>>>
>>>>>>>memtable_total_space_in_mb:
>>>>>>>memtable_flush_writers:
>>>>>>>memtable_flush_queue_size:
>>>>>>>compaction_throughput_mb_per_sec:
>>>>>>>
>>>>>>>concurrent_writes:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Which version of Cassandra?
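>>>>>>>
>>>>>>>For the log questions above, roughly what I'd run ( the log path is the
>>>>>>>packaged-install default, adjust to your layout ):
>>>>>>>
>>>>>>>  nodetool tpstats
>>>>>>>  grep GCInspector /var/log/cassandra/system.log
>>>>>>>  grep -i dropped /var/log/cassandra/system.log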
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Regards,
>>>>>>>Andras
>>>>>>>
>>>>>>>From:  Joel Samuelsson <samuelsson.j...@gmail.com>
>>>>>>>Reply-To:  "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>>>Date:  Wednesday 20 March 2013 13:06
>>>>>>>To:  "user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>>>Subject:  Cassandra freezes
>>>>>>>
>>>>>>>
>>>>>>>Hello,
>>>>>>>
>>>>>>>I've been trying to load test a one-node cassandra cluster. When I add
>>>>>>>lots of data, the Cassandra node freezes for 4-5 minutes, during which
>>>>>>>neither reads nor writes are served.
>>>>>>>During this time, Cassandra takes 100% of a single CPU core.
>>>>>>>My initial thought was that this was Cassandra flushing memtables to
>>>>>>>disk; however, disk i/o is very low during this time.
>>>>>>>Any idea what my problem could be?
>>>>>>>I'm running in a virtual environment in which I have no control of the
>>>>>>>drives, so the commit log and data directory are (probably) on the same
>>>>>>>drive.
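>>>>>>>
>>>>>>>In case it is GC-related, one thing I can capture during the next
>>>>>>>freeze is GC activity on the Cassandra JVM ( the pid below is a
>>>>>>>placeholder ):
>>>>>>>
>>>>>>>  jstat -gcutil <cassandra-pid> 1000
>>>>>>>      ( FGC/FGCT climbing while the node is frozen would point at full
>>>>>>>      GC pauses )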
>>>>>>>
>>>>>>>Best regards,
>>>>>>>Joel Samuelsson
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
