Re: Problem with JVM? concurrent mode failure

2010-04-28 Thread Peter Schuller
    -XX:+CMSIncrementalMode \     -XX:+CMSIncrementalPacing \ This may not be an issue given your other VM opts, but just FYI I have had some difficulty making the incremental CMS mode perform GC work sufficiently aggressively to avoid concurrent mode failures during significant

Detailed behavior of insert() operation?

2010-04-28 Thread Roland Hänel
Does Cassandra make any guarantees on the outcome of a scenario like this: Two clients insert the same key/column with different values at the same time: client A does insert(keyspace, key_1, column_name_1, value_A, timestamp_1, consistency_level.QUORUM) client B does insert(keyspace,

Re: Problem with JVM? concurrent mode failure

2010-04-28 Thread Daniel Gimenez
Thanks Jonathan, Brandon and Peter for your quick response. I'm going to test the issue's workaround. Also I will test batch mode instead of periodic mode for the commit log and I'll keep you informed. Thanks! Daniel Gimenez. -- View this message in context:

Re: Cassandra cluster runs into OOM when bulk loading data

2010-04-28 Thread Roland Hänel
There are other threads linked to this issue. Most notably, I think we're hitting https://issues.apache.org/jira/browse/CASSANDRA-1014 here. 2010/4/27 Schubert Zhang zson...@gmail.com Seems: ROW-MUTATION-STAGE 32 3349 63897493 is the clue, too many mutation requests are

Re: Multiple keyspaces per application?

2010-04-28 Thread David Boxenhorn
Thanks all! The reason I was thinking of having two keyspaces is that I expect them to evolve at different rates. Our normal column families will change rarely (hopefully never) but our index column families will change whenever we want to query the data in a new way, that isn't supported by the

Re: Multiple keyspaces per application?

2010-04-28 Thread Sylvain Lebresne
From what you've all said, it doesn't seem like it's worth it. No. But you will want to follow that https://issues.apache.org/jira/browse/CASSANDRA-1007 On Wed, Apr 28, 2010 at 1:13 AM, Mark Robson mar...@gmail.com wrote: I can't see any advantage in using multiple keyspaces. It is highly

Re: How to permanently delete one key ?

2010-04-28 Thread Schubert Zhang
I think that even though the real deletion is done at compaction, get/get_range_slices should not return keys (or columns) marked as deleted. Schubert On Wed, Apr 28, 2010 at 1:39 PM, Jeff Zhang zjf...@gmail.com wrote: Thanks Lu, it's helpful. On Wed, Apr 28, 2010 at 11:42 AM, Greg Lu

Re: Is SuperColumn necessary?

2010-04-28 Thread Schubert Zhang
I don't think a secondary index is necessary for cassandra core; at least it is not urgent. I think the most urgent improvements to cassandra currently are: 1. re-clarify the data model. 2. re-implement the storage and index; in particular, the current SSTable implementation is not good. In fact, the

Re: Is SuperColumn necessary?

2010-04-28 Thread Schubert Zhang
I think, at least currently, we should leave the logic of the current SuperColumn and additional indexing features to the application layer of cassandra core. On Wed, Apr 28, 2010 at 6:44 PM, Schubert Zhang zson...@gmail.com wrote: I don't think secondary index is necessary for cassandra core, at least

compaction slow while sstable>25GB, limitation of the sstable size?

2010-04-28 Thread casablinca126.com
Hi, the compaction process is very slow when the size of the newly generated sstable file grows beyond 25GB; in the meantime, the garbage collector runs frequently. First, a question: is there a limit on the sstable size? If not, is a 2GB heap size not enough

Re: inserting rows in columns inside a supercolumn

2010-04-28 Thread Julio Carlos Barrera Juez
OK, I have solved my problems with Cassandra data model. Now I am using Column Families of type Super and SuperColumns with many columns inside. Thanks! 2010/4/16 Julio Carlos Barrera Juez juliocar...@gmail.com Hi again, First of all, obviously, I have omitted the timestamps to make easy the

Login failure with SimpleAuthenticator

2010-04-28 Thread Julio Carlos Barrera Juez
Hi all! I am using org.apache.cassandra.auth.SimpleAuthenticator to use authentication in my cluster with one node (with cassandra 0.6.1). I have put: <Authenticator>org.apache.cassandra.auth.SimpleAuthenticator</Authenticator> in storage-conf.xml file, and: keyspace=username in access.properties

Re: Is SuperColumn necessary?

2010-04-28 Thread David Boxenhorn
If I understand correctly, the distinction between supercolumns and subcolumns is critical to good database design if you want to use random partitioning: you can do range queries on subcolumns but not on supercolumns. Is this correct? On Mon, Apr 26, 2010 at 7:11 PM, Jonathan Ellis

Re: inserting rows in columns inside a supercolumn

2010-04-28 Thread Sylvain Lebresne
OK, I have solved my problems with Cassandra data model. Now I am using Column Families of type Super and SuperColumns with many columns inside. You need to be aware of the third point of http://wiki.apache.org/cassandra/CassandraLimitations. That is, super columns are not indexed. Which means

What's the best maximum size for a single column?

2010-04-28 Thread Dop Sun
Hi, Yesterday, I saw a lot of discussion about how to store a file (a big one). It looks like the suggestion is to store it in multiple rows (not even multiple columns in a single row). My question is: Is there a recommended maximum column size that can help decide on the segment size?
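
A minimal sketch of the segmenting idea raised in this post, under stated assumptions: the row-key scheme `name:index`, the 1 MB default segment size, and both function names are hypothetical and do not come from the thread, and a plain dictionary stands in for Cassandra rows.

```python
def split_into_segments(name, data, segment_size=1024 * 1024):
    """Split `data` (bytes) into fixed-size segments, one per hypothetical row key."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    rows = {}
    for index, offset in enumerate(range(0, len(data), segment_size)):
        # Zero-pad the index so that sorting the keys restores segment order.
        rows[f"{name}:{index:08d}"] = data[offset:offset + segment_size]
    return rows


def join_segments(rows):
    """Reassemble the original bytes by concatenating segments in key order."""
    return b"".join(rows[key] for key in sorted(rows))
```

The segment size is the knob the question asks about; the sketch only shows where it would be applied, not what value is best.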

RE: Long Time relational Database Programmer needing help

2010-04-28 Thread Dop Sun
Hi, Here are some links I collected: 1. http://wiki.apache.org/cassandra/CassandraCli: this is how to bring it up and run it 2. http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model is very good to start to understand the schema 3.

Re: question about how columns are deserialized in memory

2010-04-28 Thread Sylvain Lebresne
2010/4/28 Даниел Симеонов dsimeo...@gmail.com: Hi Sylvain,   Thank you very much! I still have some further questions, I didn't find how row cache is being configured? Provided you don't use trunk but something stable like 0.6.1 (which you should), it is in storage-conf.xml. It's one option of

Re: How do I change the Cluster Name in the CLI?

2010-04-28 Thread yangfeng
I also want to know 2010/4/28 David Boxenhorn da...@lookin2.com When I change the cluster name in storage-conf.xml, the CLI complains that the cluster name doesn't equal Test Cluster. How do I change the cluster name that the CLI looks for?

Inserting files to Cassandra timeouts

2010-04-28 Thread Jussi P?öri
New try, previous went to the wrong place... Hi all, I'm trying to run a scenario of adding files from a specific folder to cassandra. Now I have 64 files (about 15-20 MB per file) and 1GB of data overall. I'm able to insert around 40 files, but after that cassandra goes into some GC loop and

How can I view all data??

2010-04-28 Thread David Boxenhorn
Is there a Cassandra Navigator, or some way that I can see the data in Cassandra if I don't know what the keys are?

Correct data model for Cassandra

2010-04-28 Thread Oleg Ivanov
Hello, our company has a huge table in a relational database which keeps statistics of some financial operations. It looks like the following: SERVER_ID - server, which served the transaction ACCOUNT_FROM - account1 ACCOUNT_TO - account2 HOUR - time range for this statistics row (from 0 minutes

Re: Cassandra reverting deletes?

2010-04-28 Thread Jonathan Ellis
It sounds like either there is a fairly obvious bug, or you're doing something wrong. :) Can you reproduce against a single node? On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk jo...@openplaces.org wrote: Update: I ran a test whereby I deleted ALL the rows in a column family, using a

Re: error during snapshot

2010-04-28 Thread Lee Parker
The thing is, that I'm not running close to being out of memory. The data from nodetool info is showing that only about half of the available heap space is being used and running free from the command line shows that I have plenty of RAM available and some usage of the 1G swap space which is

Re: compaction slow while sstable>25GB, limitation of the sstable size?

2010-04-28 Thread Jonathan Ellis
Compaction time is proportional to the size of the sstable, yes. Not sure how it could be otherwise. And it does generate a lot of garbage. So unless you are seeing concurrent failures in the GC and corresponding large pause times, your heap should be fine, as long as the rows you are

Re: Detailed behavior of insert() operation?

2010-04-28 Thread Roland Hänel
Thanks Jonathan, that hits exactly the heart of my question. Unfortunately it kills my original idea to implement a unique transaction identifier creation algorithm - for this, even eventual consistency would be sufficient, but I would need to know if I am consistent at the time of a read request.

Re: question about how columns are deserialized in memory

2010-04-28 Thread Даниел Симеонов
Hi, What about if the upper bound of columns in a row is loosely defined, i.e. it is ok that we have maximum of around 100 for example, but not exactly (maybe 105, 110)? What if I make a slice query to return say 1/5th of the columns in a row, I believe that such query again will not deserialize

Re: how to store file in the cassandra?

2010-04-28 Thread Tatu Saloranta
On Tue, Apr 27, 2010 at 10:49 PM, Jeff Zhang zjf...@gmail.com wrote: Mark, Thanks for your suggestion, It's really not a good idea to store one file in multiple columns in one row. The heap space problem will still exist. And I take your advice to store it in multiple rows, it works, I can

Re: Detailed behavior of insert() operation?

2010-04-28 Thread Sylvain Lebresne
One last question (sorry to bother you): isn't the behavior of read repair strictly deterministic in this case? You say both read requests could try to read repair the result (each time in the opposite direction). Inside the read repair algorithm, when we have exactly the same timestamps, what

Re: question about how columns are deserialized in memory

2010-04-28 Thread Sylvain Lebresne
Hi,   What about if the upper bound of columns in a row is loosely defined, i.e. it is ok that we have maximum of around 100 for example, but not exactly (maybe 105, 110)? What if I make a slice query to return say 1/5th of the columns in a row, I believe that such query again will not

Re: Inserting files to Cassandra timeouts

2010-04-28 Thread Schubert Zhang
I think your file (as cassandra column value) is too large. And I also think Cassandra is not good at storing files. On Wed, Apr 28, 2010 at 10:24 PM, Jussi P?öri ju...@androidconsulting.com wrote: new try, previous went to wrong place... Hi all, i'm trying to run a scenario of adding files

Re: inserting rows in columns inside a supercolumn

2010-04-28 Thread Schubert Zhang
Your schema design is an RDBMS schema, not a Cassandra schema. On Thu, Apr 15, 2010 at 11:44 PM, Miguel Verde miguelitov...@gmail.com wrote: Just to nitpick your representation a little bit, columnB/etc... are supercolumnB/etc..., key1/etc... are column1/etc..., and you can probably omit

Re: Is SuperColumn necessary?

2010-04-28 Thread Mike Malone
On Wed, Apr 28, 2010 at 5:24 AM, David Boxenhorn da...@lookin2.com wrote: If I understand correctly, the distinction between supercolumns and subcolumns is critical to good database design if you want to use random partitioning: you can do range queries on subcolumns but not on supercolumns.

Re: Inserting files to Cassandra timeouts

2010-04-28 Thread Jussi P?öri
I was thinking this too, but I think that the overall insert amount is not that big. Data is basically map data, and the files are map tiles, which I can easily make smaller. We are currently using this data from multiple nodes (GRID), but we want to get rid of the file system hassle (basically

Re: how to store file in the cassandra?

2010-04-28 Thread Robert Coli
On 4/26/10 2:44 AM, dir dir wrote: Suppose I have a 15 MB MPEG video file. To save this video file into the Cassandra database, I will store it as an array of bytes. One day, I feel this video is no longer necessary, so I delete it from the database. My question is, after I delete this

Re: What's the best maximum size for a single column?

2010-04-28 Thread uncle mantis
There is no column size limitation. As to performance due to the size of a column and with the speeds that Cassandra is running at, I don't believe it would make a bit of a difference if it was 1 byte or a million bytes. Can anyone here prove me right or wrong? Regards, Michael On Wed, Apr

Memory usage continually increases with reads

2010-04-28 Thread Kyusik Chung
Hello. I am using Cassandra 0.6.1 on ubuntu 8.04. 3 node cluster. I notice that when I start making lots of read requests (serially), memory usage of jsvc keeps climbing until it uses up all memory on the server (happens for all 3 servers in the cluster). At that point, the box starts

Re: Memory usage continually increases with reads

2010-04-28 Thread Ryan King
On Wed, Apr 28, 2010 at 12:12 PM, Kyusik Chung kyu...@discovereads.com wrote: Hello.  I am using Cassandra 0.6.1 on ubuntu 8.04.  3 node cluster. I notice that when I start making lots of read requests (serially), memory usage of jsvc keeps climbing until it uses up all memory on the server

Re: Memory usage continually increases with reads

2010-04-28 Thread Kyusik Chung
Hi Ryan, Do you mean these settings, or other settings? <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB> <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB> <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB> <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB> <MemtableThroughputInMB>64</MemtableThroughputInMB>

Re: How do I change the Cluster Name in the CLI?

2010-04-28 Thread Jonathan Ellis
On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn da...@lookin2.com wrote: When I change the cluster name in storage-conf.xml, the CLI complains that the cluster name doesn't equal Test Cluster. What do you mean? I don't see any checks for cluster name equality in the CLI code. -- Jonathan

Re: How do I change the Cluster Name in the CLI?

2010-04-28 Thread Brandon Williams
On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn da...@lookin2.com wrote: When I change the cluster name in storage-conf.xml, the CLI complains that the cluster name doesn't equal Test Cluster. How do I change the cluster name that the CLI looks for? I don't think you mean the CLI, but the

Re: Cassandra reverting deletes?

2010-04-28 Thread Joost Ouwerkerk
Yes! Reproduced on single-node cluster: 10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884 10/04/28 16:30:24 INFO mapred.JobClient: TOMBSTONES=951083 10/04/28 16:42:49 INFO mapred.JobClient: ROWS=166580 10/04/28 16:42:49 INFO mapred.JobClient: TOMBSTONES=1059387 On Wed, Apr

Re: Storage Layout Questions

2010-04-28 Thread Jonathan Shook
Ah, now I understand. Supercolumns it is. On Wed, Apr 28, 2010 at 9:40 AM, Jonathan Ellis jbel...@gmail.com wrote: I don't think you are missing anything. You'll have to pick your poison. FWIW, if each BAR has relatively few fields then supercolumns aren't bad. It's when a BAR has

Re: Memory usage continually increases with reads

2010-04-28 Thread Time Less
This sounds similar to /proc/sys/vm/swappiness misconfiguration. Is it zero or close to zero? If setting it to 0 solves your problem, make sure all your nodes get this: /etc/sysctl.conf: vm.swappiness=0 On Wed, Apr 28, 2010 at 12:12 PM, Kyusik Chung kyu...@discovereads.com wrote: Hello. I am

Re: problem building source

2010-04-28 Thread Colin Taylor
OK so the issue seems to be that the maven repo's web server (nginx) sends files through gzipped regardless of whether or not the client requested it. Unfortunately I can't work out how to share this information with Ivy. Switching to the Ibiblio repository leads to another set of problems. On

Re: Memory usage continually increases with reads

2010-04-28 Thread Kyusik Chung
Isn't setting swappiness to a lower value a good idea only if you know you have the physical RAM to support it? What I'm observing on my box is that jsvc uses up all the physical RAM. Its VM size is 4-5GB right now (not sure if it will continue to grow). Apologies if I'm misunderstanding how

Re: Cassandra use cases: as a datagrid ? as a distributed cache ?

2010-04-28 Thread Lisen Mu
http://www.reddit.com/r/programming/comments/bcqhi/reddits_now_running_on_cassandra/ It seems to me that they are still using Cassandra in the persistent storage layer as a replacement for memcachedb, not in the cache layer. I'm new here with Cassandra actually, but now I'm also curious about the

Re: Cassandra use cases: as a datagrid ? as a distributed cache ?

2010-04-28 Thread Lisen Mu
Facebook did a lot of work to keep their huge memcache cluster consistent and fault-tolerant. I think a cache infrastructure like Cassandra would make that a lot easier. On Thu, Apr 29, 2010 at 11:54 AM, Lisen Mu imm...@gmail.com wrote:

Re: Cassandra's bad behavior on disk failure

2010-04-28 Thread Schubert Zhang
On Wed, Apr 21, 2010 at 10:08 PM, Oleg Anastasjev olega...@gmail.com wrote: Hello, I am testing how cassandra behaves on single node disk failures to know what to expect when things go bad. I had a cluster of 4 cassandra nodes, stress loaded it with client and made 2 tests: 1. emulated

Re: How can I view all data??

2010-04-28 Thread Jonathan Ellis
use get_range_slices, with a start key of '', and page through it On Wed, Apr 28, 2010 at 9:26 AM, David Boxenhorn da...@lookin2.com wrote: Is there a Cassandra Navigator, or some way that I can see the data in Cassandra if I don't know what the keys are? -- Jonathan Ellis Project Chair,
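
The paging approach described in this reply can be sketched roughly as follows. This is not the Thrift API itself: `fetch_range(start_key, count)` is a hypothetical stand-in for a get_range_slices call that returns up to `count` (key, row) pairs in the partitioner's order, starting at `start_key` inclusive, and `make_in_memory_fetch` is a toy backend that uses plain sorted string order for simplicity.

```python
def page_through_rows(fetch_range, page_size=100):
    """Yield every (key, row) pair by repeatedly requesting the next key range.

    `fetch_range` is a hypothetical stand-in for get_range_slices.
    """
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    start_key = ""          # an empty start key means "from the beginning"
    first_page = True
    while True:
        page = fetch_range(start_key, page_size)
        rows = page
        # Later pages begin with the last key of the previous page; skip it.
        if not first_page and page and page[0][0] == start_key:
            rows = page[1:]
        for key, row in rows:
            yield key, row
        if len(page) < page_size:
            return          # a short page means there is nothing left
        start_key = page[-1][0]
        first_page = False


def make_in_memory_fetch(data):
    """Build a toy fetch_range over a dict, ordered by plain string comparison."""
    keys = sorted(data)

    def fetch_range(start_key, count):
        selected = [k for k in keys if k >= start_key][:count]
        return [(k, data[k]) for k in selected]

    return fetch_range
```

For example, `list(page_through_rows(make_in_memory_fetch(data), page_size=3))` walks a dictionary `data` three keys at a time. With a real cluster the key order depends on the partitioner, so the keys would not necessarily come back in string order.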

Re: error during snapshot

2010-04-28 Thread Jonathan Ellis
Interesting. Googling your error turns up http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error12-cannot-allocate-memory-calling-runt Why not just leave the swap on? It's usually a Good Thing to be able to page out unused memory, and use the ram for buffer cache

Re: Cassandra reverting deletes?

2010-04-28 Thread Jonathan Ellis
Good! :) Can you reproduce w/o map/reduce, with raw get_range_slices? On Wed, Apr 28, 2010 at 3:56 PM, Joost Ouwerkerk jo...@openplaces.org wrote: Yes! Reproduced on single-node cluster: 10/04/28 16:30:24 INFO mapred.JobClient:     ROWS=274884 10/04/28 16:30:24 INFO mapred.JobClient:    

Re: Cassandra data model for financial data

2010-04-28 Thread Schubert Zhang
key : stock ID, e.g. AAPL+year column family: closing price and volume, two CFs. column name: timestamp LongType AAPL+2010-> CF:closingPrice -> {'04-13' : 242, '04-14': 245} AAPL+2010-> CF:volume -> {'04-13' : 242, '04-14': 245} On Thu, Apr 22, 2010 at 2:00 AM, Miguel Verde
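
The layout proposed in this post (row key = stock ID plus year, one column family per metric, one column per day) can be mocked up with nested dictionaries. This is only an illustrative sketch: the function names are hypothetical, the dictionaries stand in for column families, and the column names use the 'MM-DD' strings from the example rather than a LongType timestamp.

```python
def new_store():
    """Column family name -> row key -> {column name: value}."""
    return {"closingPrice": {}, "volume": {}}


def row_key(symbol, year):
    """Build the row key in the 'stock ID + year' form used in the post."""
    return f"{symbol}+{year}"


def record(store, column_family, symbol, year, day, value):
    """Write one column (a day) into the row for this symbol and year."""
    row = store[column_family].setdefault(row_key(symbol, year), {})
    row[day] = value


def slice_days(store, column_family, symbol, year, start, end):
    """Return the columns whose names fall in [start, end], in name order."""
    row = store[column_family].get(row_key(symbol, year), {})
    return {day: row[day] for day in sorted(row) if start <= day <= end}
```

Storing the two closing prices from the post under `AAPL+2010` and slicing from '04-13' to '04-13' would return only the first of them, which mirrors a column slice over one row.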

Re: Cassandra Java Client

2010-04-28 Thread Schubert Zhang
I found Hector is not a good design. 1. We cannot create multiple threads (each thread has a connection to the cassandra server) to one cassandra server. As we know, a cassandra client should usually be multi-threaded to achieve good throughput. 2. The implementation is too fat. 3. Introduce

Re: Cassandra Java Client

2010-04-28 Thread Ran Tavory
Hi Schubert, I'm sorry Hector isn't a good fit for you, so let's see what's missing for you. On Thu, Apr 29, 2010 at 8:22 AM, Schubert Zhang zson...@gmail.com wrote: I found hector is not a good design. 1. We cannot create multiple threads (each thread have a connection to cassandra server)