We use counters in an 8-node cluster with RF 2 on Cassandra 1.0.5.
We use phpcassa and execute cql queries through thrift to work with
composite types.
We do not have any problem with overcounts, as we tally against an RDBMS daily.
It works fine, but we are seeing some GC pressure in the young generation.
Per
@Rohit: We also use counters quite a lot (let's say 2000 increments / sec),
but don't see the 50-100KB of garbage per increment. Are you sure that
memory is coming from your counters?
Best regards,
Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl
We have not been trying to create inconsistencies as you describe above, but
it seems plausible that those situations cause problems.
Sometimes you can see log messages that indicate that counters are out of
sync in the cluster and they get repaired. My guess would be that the
repairs actually destroy
It is slowly dawning on me that I need a super-column to use column blooms
effectively, while at the same time I don't want the entire sub-column list
deserialized.
Queries by name use the row level bloom filter, regardless of the CF type.
In fact, for my use-case I also do not need a column
Given the advice to use a single RAID 0 volume, I think that's what I'll do.
By system mirror, you are referring to the volume on which the OS is
installed?
Yes.
I was thinking about a simple RAID 1 OS volume and RAID 0 data volume setup.
With the Commit Log on the OS volume so it does
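For reference, that split is expressed through the directory settings in
cassandra.yaml; a minimal sketch, with hypothetical mount points:

  # cassandra.yaml
  data_file_directories:
      - /mnt/raid0/cassandra/data                     # RAID 0 data volume
  commitlog_directory: /var/lib/cassandra/commitlog   # RAID 1 OS volume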
They are. Can you provide some more information ?
What happens when you read the super column ?
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 18/09/2012, at 5:33 AM, Cyril Auburtin cyril.aubur...@gmail.com wrote:
First sorry but I'm
@Robin
I'm pretty sure the GC issue is due to counters only, since we have
only write-heavy counter-incrementing traffic.
GC frequency also increases linearly with write load.
@Bartlomiej
Under stress testing, we see GC frequency, and consequently write latency,
increase to several milliseconds.
At
Thanks a lot for clarifying. We'll complete the upgrade of all nodes.
Regards.
On Mon, Sep 17, 2012 at 3:51 PM, Sylvain Lebresne sylv...@datastax.com wrote:
On Mon, Sep 17, 2012 at 11:06 AM, B R software.research.w...@gmail.com
wrote:
Could this problem be due to running repair on a node
Any errors in the log ?
The node recovers ?
Do you use secondary indexes? If so, check the comments for
memtable_flush_queue_size in the yaml. If this value is too low, writes may back
up. But I would not expect it to cause dropped messages.
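For reference, this is the cassandra.yaml setting in question; a rough sketch
(the value shown is just the common default, and the comments say to size it
to at least the number of secondary indexes on a single CF):

  # cassandra.yaml
  # Number of full memtables allowed to queue up waiting for a flush writer.
  memtable_flush_queue_size: 4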
nodetool info also shows we have over a gig of
Hi
@Robin, about the log message:
Sometimes you can see log messages that indicate that counters are out of
sync in the cluster and they get repaired. My guess would be that the
repairs actually destroy it; however, I have no knowledge of the underlying
techniques.
Here you got an answer from
@Alain: If you don't have much time to read this, just know that it's a
random error which appears with low frequency but regularly, seems to
appear quite randomly, and nobody yet knows why it appears.
Also, you need to know that it's repaired by taking the highest of the
two
What Compaction Strategy are you using ?
Are there any errors in the logs ?
If you restart a node how long does it take for the numbers to start to rise ?
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 18/09/2012, at 7:39 AM, Michael
Also, I saw a presentation which said that if I don't have rows with more
than a hundred rows in Cassandra, then I am doing something wrong or I
shouldn't be using Cassandra.
I do not agree with that statement. (I read that as rows with more than a
hundred _columns_)
I need to support a
Some more background
http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html
In addition to the SSTable bloom filter for keys, there are row-level bloom
filters for columns.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
Range queries do not use bloom filters. That holds good for composite columns
also, right?
Since I assume you are referring to column's bloom filters (key's bloom filters
are always used) then yes, that holds good for composite columns. Currently,
composite column names are completely opaque to the
Hi,
We are running Cassandra 1.1.4 and would like to experiment with
DataStax Enterprise, which uses 1.0.8. Can we safely downgrade
a production cluster, or is it incompatible? Are there any special steps
involved?
Arend-Jan
--
Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl
What is the query you are using to read the streams?
Can you reduce the fault to "this query is not returning data, but the data is there"?
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 18/09/2012, at 4:11 PM, Ishan Thilina is...@ishans.info
What version are you on ?
TimedOutException is logged for all the nodes.
TimedOutException happens when fewer than CL replica nodes respond to the
coordinator in time.
You could get the error from all nodes in your cluster if the 3 nodes that
store the key are having problems.
On Sep 18, 2012, at 3:06 AM, aaron morton aa...@thelastpickle.com wrote:
select filename from inode where filename > '/tmp' and filename < '/tmq' and
sentinel = 'x';
Wouldn't that return files from directories '/tmp1', '/tmp2', for example? I
thought the goal was to return files and
I wanted to clarify where that statement comes from on wide rows….
Realize some people make the claim that if you don't have 1000's of columns in
some rows in cassandra you are doing something wrong. This is not true, BUT
it comes from the fact that people are setting up indexes. This
Hi Aaron, thank you for your reply. Please read my inline comments.
On Tue, Sep 18, 2012 at 7:36 PM, aaron morton aa...@thelastpickle.com wrote:
What version are you on ?
cassandra version 1.0.8
TimedOutException is logged for all the nodes.
TimedOutException happens when fewer than CL
Aaron,
Thank you very much for the answers! Helped me a lot!
I would like just a bit more clarification about the points below, if
you allow me:
- You can query your data using Hadoop easily enough. You may want to take
a look at DSE from http://datastax.com/; it makes using Hadoop
I will have just 6 columns in my CF, but I will have about a billion writes
per hour. In this case, I think Cassandra applies, going by what you are
saying.
This answer helped a lot too, thanks!
2012/9/18 Hiller, Dean dean.hil...@nrel.gov
I wanted to clarify the where that statement comes from
Until Aaron replies, here are my thoughts on the relational piece…
If everything in my model fits into a relational database, if my
data is structured, would it still be a good idea to use Cassandra? Why?
The playOrm project explores exactly this issue… A query on 1,000,000 rows in a
I would like to understand this issue, or at least do my best to help you
understand it.
I got the following (shortened) logs:
(03a227f0-a5c3-11e1--b7f5e49dceff, 3, 3) and
(03a227f0-a5c3-11e1--b7f5e49dceff, 3, 0)
(03a227f0-a5c3-11e1--b7f5e49dceff, 6, 6) and
I wrote a Hadoop mapper-only job that uses BulkOutputFormat to load a Cassandra
table.
That job would consistently fail with a flurry of exceptions (the primary cause
looks like EOFExceptions while streaming between nodes).
I restructured the job to use an identity mapper and perform the updates in the
Could the repair that takes the highest of two inconsistent values cause
overcounting (higher values)?
If a counter counts backwards (and therefore has negative values), would repair
still choose the larger value? Or does Cassandra take the higher
absolute value? That would result in undercounting
Leveled. Nothing in the logs. Normal compactions seem to be occurring... these
ones just won't go away.
I've tried a rolling restart and literally tried killing our entire cluster and
bringing up one node at a time in case gossip was causing this. Same result.
The compactions are there
To go further, would it maybe be an idea to count everything twice? Once
as a positive value and once as a negative value. When reading the counters, the
application could just compare the negative and positive counters to get an
error margin.
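A minimal sketch of that double-counting idea in CQL, assuming a counter
column family with hypothetical names (page_counts, hits_pos, hits_neg):

  -- apply every increment twice, once positive and once negative
  UPDATE page_counts SET hits_pos = hits_pos + 1, hits_neg = hits_neg - 1
    WHERE key = 'home';
  -- on read, hits_pos and -hits_neg should match; any difference is the error margin
  SELECT hits_pos, hits_neg FROM page_counts WHERE key = 'home';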
This sounds interesting. Maybe someone should implement
In your data directory there should be a .json file for each column family
that holds the manifest.
Do any of those indicate that you have a large number of SSTables in L0?
This number is also indicated in JMX by the UnLeveledSSTables count for
each column family.
If not it's possible that the
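If it helps, a rough way to count the members per level from that manifest,
assuming the 1.1-era layout ({"generations": [{"generation": N, "members":
[...]}, ...]}) and a hypothetical path:

  python -c 'import json, sys; m = json.load(open(sys.argv[1])); print [(g["generation"], len(g["members"])) for g in m["generations"]]' /var/lib/cassandra/data/MyKeyspace/MyCF.json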
On Tue, Sep 18, 2012 at 1:54 AM, aaron morton aa...@thelastpickle.com wrote:
each with several disks having large capacity, totaling 10 - 12 TB. Is
this (another) bad idea?
Yes. Very bad.
If you had 6TB on an average system with spinning disks you would measure the
duration of repairs and
You're talking about this project, right?
https://github.com/deanhiller/playorm
I will take a look. However, I don't think using Cassandra's model itself
(with CFs / key-values) would be a problem; I just need to know where the
advantage lies. By your answer, my guess is it lies in better
Yes,
If you are using 1.1 take a look at:
dclocal_read_repair_chance
and
read_repair_chance
CF settings.
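For example, in cassandra-cli that would be something along these lines (the
CF name and values here are just placeholders):

  [default@MyKeyspace] UPDATE COLUMN FAMILY MyCF WITH
      read_repair_chance = 0.1 AND dclocal_read_repair_chance = 0.1;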
Regards,
/VJ
On Sun, Sep 16, 2012 at 5:03 PM, Raj N raj.cassan...@gmail.com wrote:
Hi,
I have a 2 DC setup (DC1:3, DC2:3). All reads and writes are at
LOCAL_QUORUM. The question is
There are a large number of members in generation 0, which I'm assuming refers
to L0, according to a few of the .json files I checked in my largest column
families.
On this particular node I have already tried a scrub and a repair. What
steps should I take to move these SSTables to the
I started using LeveledCompaction some time ago, and from my experience
this indicates some SSTables sitting on lower levels than they should be.
Compaction keeps going, moving them up level by level, but the total count does
not change as new data comes in.
The numbers look pretty high to me. Such numbers
Thanks, I just modified the schema on the worst-offending column family (as
determined by the .json) from 10MB to 200MB.
Should I kick off a compaction on this CF now? A repair? A scrub?
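For reference, that kind of change looks roughly like this in cassandra-cli
(the CF name here is a placeholder):

  [default@MyKeyspace] UPDATE COLUMN FAMILY MyCF WITH
      compaction_strategy_options = {sstable_size_in_mb: 200};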
Thanks
-michael
From: Віталій Тимчишин tiv...@gmail.com
Reply-To:
Cassandra is fully aware of all tables created with playOrm, and you can still
use DataStax Enterprise features to get real-time analytics. PlayOrm is a
layer on top of Cassandra and, as with any layer, it makes a developer more
productive at a slight cost in performance, just like Hibernate on top
Oh, and yes, that is the correct link.
Dean
From: Marcelo Elias Del Valle mvall...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, September 18, 2012 10:50 AM
To:
Potentially the pending compactions are a symptom and not the root
cause/problem.
When updating a 3rd column family with a larger sstable_size_in_mb it
looks like the schema may not be in a good state
[default@] UPDATE COLUMN FAMILY screenshots WITH
Which version is that? In version 1.1.2, nodetool does take the column
family:
setcachecapacity keyspace cfname keycachecapacity rowcachecapacity
- Set the key and row cache capacities of a given column family
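Based on that help text, usage would be something like the following (keyspace,
CF and capacities are just placeholders):

  nodetool -h localhost setcachecapacity MyKeyspace MyCF 200000 0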
On Wed, Sep 19, 2012 at 2:15 AM, rohit reddy rohit.kommare...@gmail.com wrote:
Hi,
Network also matters. It would take a lot of time to send 6TB over a 1Gb
link, even fully saturating it. IMHO you can try with 10Gb, but you will
need to raise your streaming/compaction limits a lot.
You will also need to ensure that your compaction can keep up. It is often
done in one thread and I
Try increasing vm.max_map_count,
per http://blog.timstoop.nl/2011/04/20/cassandra-java-io-ioerror-java-io-ioexception-map-failed/
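For example (the value is only an illustration, pick one that suits your nodes):

  sysctl -w vm.max_map_count=1048575                      # apply now
  echo "vm.max_map_count = 1048575" >> /etc/sysctl.conf   # persist across reboots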
Feng Qu
From: Raj N raj.cassan...@gmail.com
To: user@cassandra.apache.org
Sent: Tuesday, September 18, 2012 6:37 PM
Subject: