-XX:+CMSIncrementalMode \
-XX:+CMSIncrementalPacing \
This may not be an issue given your other VM opts, but just FYI I
have had some difficulty making the incremental CMS mode perform GC
work sufficiently aggressively to avoid concurrent mode failures
during significant
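In case it helps: the usual way to make (non-incremental) CMS kick in earlier, and so avoid those concurrent mode failures, is to pin the initiating occupancy. These are standard HotSpot flags; the 75% threshold here is just an illustrative value, not a tuned recommendation:

```
-XX:CMSInitiatingOccupancyFraction=75 \
-XX:+UseCMSInitiatingOccupancyOnly \
```

The second flag tells the JVM to always honor the configured threshold instead of its own adaptive heuristic.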
Does Cassandra make any guarantees on the outcome of a scenario like this:
Two clients insert the same key/column with different values at the same
time:
client A does insert(keyspace, key_1,
column_name_1, value_A, timestamp_1, consistency_level.QUORUM)
client B does insert(keyspace,
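As far as I understand Cassandra's reconciliation (this is a sketch of the rule, not the actual server code): each column value carries the client-supplied timestamp; the higher timestamp wins, and when the timestamps are exactly equal the tie is broken deterministically by comparing the values, so all replicas converge on the same winner regardless of arrival order. A minimal Python sketch of that rule:

```python
def reconcile(a, b):
    """Pick the surviving (value, timestamp) pair for one column.

    Higher timestamp wins; on an exact timestamp tie, the
    lexically greater value wins, so every replica picks the
    same winner no matter which write it saw first.
    """
    (val_a, ts_a), (val_b, ts_b) = a, b
    if ts_a != ts_b:
        return a if ts_a > ts_b else b
    return a if val_a >= val_b else b

# Two clients write the same column with the same timestamp:
winner = reconcile(("value_A", 100), ("value_B", 100))
# -> ("value_B", 100): equal timestamps, greater value wins
```

The exact tie-break detail is my reading of the behavior; verify against the source for your version.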
Thanks Jonathan, Brandon and Peter for your quick response. I'm going to test
the issue's workaround. Also I will test batch mode instead of periodic
mode for the commit log and I'll keep you informed.
Thanks!
Daniel Gimenez.
There are other threads linked to this issue. Most notably, I think we're
hitting
https://issues.apache.org/jira/browse/CASSANDRA-1014
here.
2010/4/27 Schubert Zhang zson...@gmail.com
Seems:
ROW-MUTATION-STAGE 32 3349 63897493
is the clue, too many mutation requests are
Thanks all!
The reason I was thinking of having two keyspaces is that I expect them to
evolve at different rates. Our normal column families will change rarely
(hopefully never) but our index column families will change whenever we want
to query the data in a new way, that isn't supported by the
From what you've all said, it doesn't seem like it's worth it.
No. But you will want to follow that
https://issues.apache.org/jira/browse/CASSANDRA-1007
On Wed, Apr 28, 2010 at 1:13 AM, Mark Robson mar...@gmail.com wrote:
I can't see any advantage in using multiple keyspaces. It is highly
I think even though the real deletion is done during compaction,
get/get_range_slices should not return the deletion-marked keys (or
columns).
Schubert
On Wed, Apr 28, 2010 at 1:39 PM, Jeff Zhang zjf...@gmail.com wrote:
Thanks Lu, it's helpful.
On Wed, Apr 28, 2010 at 11:42 AM, Greg Lu
I don't think a secondary index is necessary for the cassandra core; at least
it is not urgent.
I think the most urgent improvements for cassandra currently are:
1. re-clarify the data model.
2. re-implement the storage and index; in particular, the current SSTable
implementation is not good.
In fact, the
I think, at least currently, we should leave the logic of the current
SuperColumn and additional indexing features to the application layer above
the cassandra core.
On Wed, Apr 28, 2010 at 6:44 PM, Schubert Zhang zson...@gmail.com wrote:
I don't think secondary index is necessary for cassandra core, at least
Hi,
The compaction process is very slow when the size of the newly generated
sstable file grows beyond 25GB;
meanwhile, the garbage collector is running frequently.
First, I have a question: is there a limit on the sstable
size? If not, is a 2GB heap not
enough
OK, I have solved my problems with Cassandra data model. Now I am using
Column Families of type Super and SuperColumns with many columns inside.
Thanks!
2010/4/16 Julio Carlos Barrera Juez juliocar...@gmail.com
Hi again,
First of all, obviously, I have omitted the timestamps to simplify the
Hi all!
I am using org.apache.cassandra.auth.SimpleAuthenticator to use
authentication in my cluster with one node (with cassandra 0.6.1). I have
put:
<Authenticator>org.apache.cassandra.auth.SimpleAuthenticator</Authenticator>
in storage-conf.xml file, and:
keyspace=username
in access.properties
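For comparison, here is my understanding of the full 0.6-era SimpleAuthenticator wiring (the user name and password below are placeholders; verify the exact file and property names against the examples in your distribution's conf/ directory):

```
# access.properties — keyspace = comma-separated list of allowed users
Keyspace1=jsmith

# passwd.properties — username = cleartext password
jsmith=havebadpass
```

Both files are pointed to with JVM system properties, e.g.
-Daccess.properties=conf/access.properties -Dpasswd.properties=conf/passwd.properties.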
If I understand correctly, the distinction between supercolumns and
subcolumns is critical to good database design if you want to use random
partitioning: you can do range queries on subcolumns but not on
supercolumns.
Is this correct?
On Mon, Apr 26, 2010 at 7:11 PM, Jonathan Ellis
OK, I have solved my problems with Cassandra data model. Now I am using
Column Families of type Super and SuperColumns with many columns inside.
You need to be aware of the third point of
http://wiki.apache.org/cassandra/CassandraLimitations.
That is, super columns are not indexed. Which means
Hi,
Yesterday I saw a lot of discussion about how to store a (big) file. It
looks like the suggestion is to store it in multiple rows (not even in
multiple columns of a single row).
My question is:
Is there a recommended maximum column size that can help with deciding on
the segment size?
Hi,
Here are some links I collected:
1. http://wiki.apache.org/cassandra/CassandraCli: this shows how to bring it
up and run it
2. http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model is a very
good starting point for understanding the schema
3.
2010/4/28 Даниел Симеонов dsimeo...@gmail.com:
Hi Sylvain,
Thank you very much! I still have some further questions; I couldn't find
how the row cache is configured.
Provided you don't use trunk but something stable like 0.6.1 (which
you should),
it is in storage-conf.xml. It's one option of
I also want to know
2010/4/28 David Boxenhorn da...@lookin2.com
When I change the cluster name in storage-conf.xml, the CLI complains that
the cluster name doesn't equal Test Cluster.
How do I change the cluster name that the CLI looks for?
new try, previous went to wrong place...
Hi all,
i'm trying to run a scenario of adding files from a specific folder to cassandra.
Now I have 64 files (about 15-20 MB per file) and about 1GB of data overall.
I'm able to insert around 40 files, but after that cassandra goes into some
GC loop and
Is there a Cassandra Navigator, or some way that I can see the data in
Cassandra if I don't know what the keys are?
Hello,
our company has a huge table in a relational database which keeps statistics
of some financial operations.
It looks like the following:
SERVER_ID - server, which served the transaction
ACCOUNT_FROM - account1
ACCOUNT_TO - account2
HOUR - time range for this statistics row (from 0 minutes
It sounds like either there is a fairly obvious bug, or you're doing
something wrong. :)
Can you reproduce against a single node?
On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk jo...@openplaces.org wrote:
Update: I ran a test whereby I deleted ALL the rows in a column
family, using a
The thing is that I'm not running close to being out of memory. The data
from nodetool info is showing that only about half of the available heap
space is being used and running free from the command line shows that I have
plenty of RAM available and some usage of the 1G swap space which is
Compaction time is proportional to the size of the sstable, yes. Not
sure how it could be otherwise. And it does generate a lot of
garbage. So unless you are seeing concurrent mode failures in the GC and
corresponding large pause times, your heap should be fine, as long as
the rows you are
Thanks Jonathan, that hits exactly the heart of my question. Unfortunately
it kills my original idea to implement a unique transaction identifier
creation algorithm - for this, even eventual consistency would be
sufficient, but I would need to know if I am consistent at the time of a
read request.
Hi,
What if the upper bound of columns in a row is loosely defined, i.e.
it is OK that we have a maximum of around 100, for example, but not exactly
(maybe 105, 110)?
What if I make a slice query to return, say, 1/5th of the columns in a row? I
believe that such a query again will not deserialize
On Tue, Apr 27, 2010 at 10:49 PM, Jeff Zhang zjf...@gmail.com wrote:
Mark,
Thanks for your suggestion. It's really not a good idea to store one
file in multiple columns in one row; the heap space problem will still
exist. I took your advice to store it in multiple rows instead, and it works.
I can
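The multi-row approach the thread converged on can be sketched like this (a hypothetical helper, not Hector or Thrift code; the key format and chunk size are my own choices for illustration):

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB per row keeps any single read/write small

def split_into_rows(file_name, data, chunk_size=CHUNK_SIZE):
    """Split a blob into (row_key, chunk) pairs, one row per chunk.

    Each chunk becomes its own row, so no single row (or column)
    has to hold the whole file in memory on the server side.
    """
    return [
        ("%s:%06d" % (file_name, i), data[off:off + chunk_size])
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]

rows = split_into_rows("tile.png", b"x" * (2 * CHUNK_SIZE + 5))
# three rows: two full 1 MB chunks plus a 5-byte remainder
```

Reassembly is just fetching the rows in key order and concatenating the chunks.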
One last question (sorry to bother you): isn't the behavior of read repair
strictly deterministic in this case? You say both read requests could try to
read repair the result (each time in the opposite direction). Inside the
read repair algorithm, when we have exactly the same timestamps, what
I think your file (as a cassandra column value) is too large.
And I also think Cassandra is not good at storing files.
On Wed, Apr 28, 2010 at 10:24 PM, Jussi P?öri
ju...@androidconsulting.comwrote:
new try, previous went to wrong place...
Hi all,
i'm trying to run a scenario of adding files
Your schema design is an RDBMS schema, not a Cassandra schema.
On Thu, Apr 15, 2010 at 11:44 PM, Miguel Verde miguelitov...@gmail.comwrote:
Just to nitpick your representation a little bit, columnB/etc... are
supercolumnB/etc..., key1/etc... are column1/etc..., and you can probably
omit
On Wed, Apr 28, 2010 at 5:24 AM, David Boxenhorn da...@lookin2.com wrote:
If I understand correctly, the distinction between supercolumns and
subcolumns is critical to good database design if you want to use random
partitioning: you can do range queries on subcolumns but not on
supercolumns.
I was thinking this too, but I think that the overall insert amount is
not that big.
The data is basically map data, and the files are map tiles, which I can
easily make smaller.
We are currently using this data from multiple nodes (GRID), but we want
to get rid of the file system hassle (basically
On 4/26/10 2:44 AM, dir dir wrote:
Suppose I have an MPEG video file of 15 MB. To save this video file into
the Cassandra database I will store
it as an array of bytes. One day, I feel this video is no
longer needed,
therefore I delete it from the database. My question is, after I
delete this
There is no column size limitation. As for performance due to the size of a
column: at the speeds Cassandra runs at, I don't believe it
would make a bit of difference whether it was 1 byte or a million bytes.
Can anyone here prove me right or wrong?
Regards,
Michael
On Wed, Apr
Hello. I am using Cassandra 0.6.1 on ubuntu 8.04. 3 node cluster.
I notice that when I start making lots of read requests (serially), memory
usage of jsvc keeps climbing until it uses up all memory on the server (happens
for all 3 servers in the cluster). At that point, the box starts
On Wed, Apr 28, 2010 at 12:12 PM, Kyusik Chung kyu...@discovereads.com wrote:
Hello. I am using Cassandra 0.6.1 on ubuntu 8.04. 3 node cluster.
I notice that when I start making lots of read requests (serially), memory
usage of jsvc keeps climbing until it uses up all memory on the server
Hi Ryan,
Do you mean these settings, or other settings?
<SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
<FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
<FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
<ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
<MemtableThroughputInMB>64</MemtableThroughputInMB>
On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn da...@lookin2.com wrote:
When I change the cluster name in storage-conf.xml, the CLI complains that
the cluster name doesn't equal Test Cluster.
What do you mean? I don't see any checks for cluster name equality in
the CLI code.
--
Jonathan
On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn da...@lookin2.com wrote:
When I change the cluster name in storage-conf.xml, the CLI complains that
the cluster name doesn't equal Test Cluster.
How do I change the cluster name that the CLI looks for?
I don't think you mean the CLI, but the
Yes! Reproduced on single-node cluster:
10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884
10/04/28 16:30:24 INFO mapred.JobClient: TOMBSTONES=951083
10/04/28 16:42:49 INFO mapred.JobClient: ROWS=166580
10/04/28 16:42:49 INFO mapred.JobClient: TOMBSTONES=1059387
On Wed, Apr
Ah, now I understand. Supercolumns it is.
On Wed, Apr 28, 2010 at 9:40 AM, Jonathan Ellis jbel...@gmail.com wrote:
I don't think you are missing anything. You'll have to pick your poison.
FWIW, if each BAR has relatively few fields then supercolumns aren't
bad. It's when a BAR has
This sounds similar to /proc/sys/vm/swappiness misconfiguration. Is it zero
or close to zero? If setting it 0 solves your problem, make sure all your
nodes get this:
/etc/sysctl.conf:
vm.swappiness=0
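For what it's worth, the setting can also be applied immediately, without a reboot (standard sysctl usage, run as root):

```shell
sysctl -w vm.swappiness=0      # apply now
cat /proc/sys/vm/swappiness    # verify the current value
```

The /etc/sysctl.conf entry above then makes it stick across reboots.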
On Wed, Apr 28, 2010 at 12:12 PM, Kyusik Chung kyu...@discovereads.comwrote:
Hello. I am
OK, so the issue seems to be that the Maven repo's web server (nginx)
sends the files gzipped regardless of whether or not the client
requested them as such.
Unfortunately I can't work out how to share this information with Ivy.
Switching to the Ibiblio repository leads to another set of problems.
On
Isn't setting swappiness to a lower value a good idea only if you know you have
the physical RAM to support it? What I'm observing on my box is that jsvc uses
up all the physical RAM. Its VM size is 4-5GB right now (not sure if it will
continue to grow).
Apologies if I'm misunderstanding how
http://www.reddit.com/r/programming/comments/bcqhi/reddits_now_running_on_cassandra/
It seems to me that they are still using Cassandra as a persistent storage
layer, as a replacement for memcachedb, not as a cache layer.
I'm new here with Cassandra actually, but now I'm also curious about the
Facebook did a lot of work to keep their huge memcache
cluster consistent and fault-tolerant.
I think a cache infrastructure like Cassandra would make that a lot easier.
On Thu, Apr 29, 2010 at 11:54 AM, Lisen Mu imm...@gmail.com wrote:
On Wed, Apr 21, 2010 at 10:08 PM, Oleg Anastasjev olega...@gmail.comwrote:
Hello,
I am testing how cassandra behaves on single node disk failures to know
what to
expect when things go bad.
I had a cluster of 4 cassandra nodes, stress loaded it with client and made
2
tests:
1. emulated
use get_range_slices, with a start key of '', and page through it
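The paging loop described here can be sketched generically; below, `fetch(start_key, count)` stands in for a get_range_slices call returning key-ordered rows (the helper name and shape are mine, not the Thrift API; note that under RandomPartitioner the iteration order is token order rather than lexical, but the same start-key trick applies):

```python
def iter_all_keys(fetch, batch=100):
    """Page through all keys: start at '', then reuse the last key
    of each batch as the next start key. Since range queries are
    inclusive of the start key, drop it from every batch after the
    first. batch must be at least 2 to make progress."""
    start = ""
    while True:
        keys = fetch(start, batch)
        if start:
            keys = keys[1:]  # drop the inclusive start key we already saw
        if not keys:
            return
        for key in keys:
            yield key
        start = keys[-1]

# Usage with an in-memory stand-in for the cluster:
data = sorted("key%02d" % i for i in range(25))
fetch = lambda start, count: [k for k in data if k >= start][:count]
assert list(iter_all_keys(fetch, batch=10)) == data
```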
On Wed, Apr 28, 2010 at 9:26 AM, David Boxenhorn da...@lookin2.com wrote:
Is there a Cassandra Navigator, or some way that I can see the data in
Cassandra if I don't know what the keys are?
--
Jonathan Ellis
Project Chair,
Interesting. Googling your error turns up
http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error12-cannot-allocate-memory-calling-runt
Why not just leave the swap on? It's usually a Good Thing to be able
to page out unused memory, and use the ram for buffer cache
Good! :)
Can you reproduce w/o map/reduce, with raw get_range_slices?
On Wed, Apr 28, 2010 at 3:56 PM, Joost Ouwerkerk jo...@openplaces.org wrote:
Yes! Reproduced on single-node cluster:
10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884
10/04/28 16:30:24 INFO mapred.JobClient:
key: stock ID, e.g. AAPL+year
column families: closing price and volume, two CFs.
column name: timestamp (LongType)
AAPL+2010 - CF:closingPrice - {'04-13': 242, '04-14': 245}
AAPL+2010 - CF:volume - {'04-13': 242, '04-14': 245}
On Thu, Apr 22, 2010 at 2:00 AM, Miguel Verde
I found Hector is not a good design.
1. We cannot create multiple threads (each thread with a connection to the
cassandra server) to one cassandra server.
As we know, a cassandra client should usually be multi-threaded to
achieve good throughput.
2. The implementation is too fat.
3. Introduce
Hi Schubert, I'm sorry Hector isn't a good fit for you, so let's see what's
missing for you.
On Thu, Apr 29, 2010 at 8:22 AM, Schubert Zhang zson...@gmail.com wrote:
I found hector is not a good design.
1. We cannot create multiple threads (each thread have a connection to
cassandra server)