Re: solandra or pig or....?
I can only speak to what I know. I have taken only a quick look at Pig, and maybe some of the guys from Twitter can answer better than me on that particular program. Pig is not for on-demand queries: they are quite slow, so, as you said, you extract the relevant information and append it to another CF from which you can quickly retrieve the statistics.

Solr is purely a search engine. It is not only text based but also time based, etc. To do statistics you need mathematical operations, and Solr won't provide those. It can do simple things in terms of statistics, but it is mostly a search engine.

Personally, for what you are asking I would use Pig and store the results in a CF. I would update those CFs regularly. For simple statistics you can generate them with your favorite language or a specialized language such as R, as long as it concerns small sets.

Hope it helps,
Victor Kabdebon

2011/6/21 Sasha Dolgy sdo...@gmail.com

Folks,

Simple question ... Assuming my current use case is the ability to log lots of trivial and seemingly useless sports statistics ... I want a user to be able to query / compare. For example:

-- Show me all baseball players in cheektowaga and ontario, california who have hit a grandslam on tuesdays where it was just a leap year.

Each baseball player is represented by a single row in a CF: player_uuid, fullname, hometown, game1, game2, game3, game4

Games are UUIDs that are a reference to another row in the same CF that provides information about that game... location, final score, date (unix timestamp or ISO format), and statistics, which are represented as a new column timestamp:player_uuid

I can use PIG, as I understand, to run a query to generate specific information about specific things and populate that data back into Cassandra in another CF ... similar to the hypothetical search above. As the information is structured already, I assume PIG is the right tool for the job, but may not be ideal for a web application and enabling ad-hoc queries ... it could take anywhere from 2-? seconds for that query to generate, populate, and return to the user...?

On the other hand, I have started to read about Solr / Solandra / Lucandra. Can this provide similar functionality or better? Or is it more geared towards full text search and indexing ... I don't want to get into the habit of guessing what my potential users want to search for ... trying to think of ways to offload this to them.

-- Sasha Dolgy sasha.do...@gmail.com
Re: New web client future API
OK, thanks for the update. I thought the query string was translated to Thrift, then sent to a server.

Victor Kabdebon

2011/6/15 Eric Evans eev...@rackspace.com

On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote:

Actually from what I understood (please correct me if I am wrong) CQL is based on Thrift / Avro.

In this project, we tend to use the word Thrift as a sort of shorthand for Cassandra's RPC interface, and not the serialization and RPC framework from the Apache Thrift project. CQL does not (yet) have its own networking protocol, so it uses Thrift as a means of delivering queries and serializing the results, but it is *not* a wrapper around the existing RPC methods. The query string you provide is parsed entirely on the server.

-- Eric Evans eev...@rackspace.com
Re: New web client future API
Hello Markus,

Actually, from what I understood (please correct me if I am wrong), CQL is based on Thrift / Avro.

Victor Kabdebon

2011/6/14 Markus Wiesenbacher | Codefreun.de m...@codefreun.de

Hi,

what is the future API for Cassandra? Thrift, Avro, CQL? I just released an early version of my web client (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I would like to know what the future is ...

Many thanks
MW
Re: When should I use Solandra?
Why do you need Solandra for storing data? If you want to retrieve data, simply use Cassandra. Solandra is for search and indexing; it is a search engine. I do not recommend storing data only in a search engine. Use the following design:

Store ALL data in Cassandra, then extract from Cassandra only the data you need to index in Solandra. For what it matters, you can use Solr instead of Solandra. In Solr you have something called schema.xml where you can set up which fields to index.

My advice is: do not store your passwords in plain text. Add a salt (a random sequence) AND hash it, then insert the bytes in Cassandra. Otherwise you'll end up like Sony, with a massive lawsuit when hackers breach your website and steal the passwords.

If you really want to use Solandra, I guess there is an equivalent to schema.xml where you have lines to tell whether or not to index some fields.

Victor Kabdebon http://www.victorkabdebon.com

2011/6/4 Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com

Hi,

I am planning to use Cassandra to store my users' passwords and at the same time data for my website that needs to be accessible via search. My question is: should I use two DBs, Cassandra (for users' passwords) and Solandra (for the website's data), or can I put everything in Solandra? Is there a way to stop Solandra from indexing my users' passwords?

Thanks in advance for any help.
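The salt-and-hash advice in the reply above can be sketched as follows. This is only an illustration, not something from the thread: it uses PBKDF2 from Python's standard library as one possible salted hash, and the function names and the iteration count are assumptions chosen for the example. The returned bytes are what would be written to Cassandra instead of the plain-text password.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed iteration count for illustration only; tune it for your hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the plain-text password."""
    salt = os.urandom(16)  # random salt, different for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the salted hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The salt is not secret; it can be stored in a column next to the digest so the hash can be recomputed at login.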
Re: Appending to fields
As Jonathan stated I believe that the insert is in O(N + M), unless there are some operations that I don't know. There are other NoSQL database that can be used with Cassandra as buffers for quick access and modification and then after the content can be dumped into Cassandra for long term storage. Here is an example with Redis : http://redis.io/commands/append The append command is said to be in O(1) but it is a little bit suspicious to me... Best regards, Victor Kabdebon http://www.voxnucleus.fr 2011/5/31 Jonathan Ellis jbel...@gmail.com On Tue, May 31, 2011 at 2:22 PM, Marcus Bointon mar...@synchromedia.co.uk wrote: mysql reads the entire value of y, appends the data, then writes the whole thing back, which unfortunately is an O(n^2) operation. Actually, this analysis is incorrect. Appending M bytes to N is O(N + M) which isn't the same as N^2 at all. At least in Cassandra, nor can I think of any possible algorithm which would allow MySQL to achieve N^2, but I don't claim to be an expert there. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Cassandra 0.8 questions
It's not really possible to give a general answer to your second question; it depends on your implementation. Personally I do two things. The first is to map an array to a row: the row key identifies the array, each column name is a key of the array, and each column value holds the data. However, for some applications, as I am using Java, I just serialize my ArrayList (or List) and push all the content into one column. It all depends on what you want to achieve.

Third question: try to design your CFs according to what you want to achieve. I am designing an internal messaging system, and I use only two column families to hold the message lists, message and message box. I would have used one, but I need one that is sorted by TimeUUID and the other by UTF8Type.

I think there is a general consensus here: try to avoid super columns. Two sets of columns can do the same job as one SuperColumn, and it's the preferred scheme.

Again, just experiment and be ready to change your organization if you are beginning with Cassandra; this is the best way to figure out what to do for your data organization.

Victor Kabdebon http://www.voxnucleus.fr http://www.victorkabdebon.net

2011/5/24 Jian Fang jian.fang.subscr...@gmail.com

Does anyone have a good suggestion on my second question? I believe that question is a pretty common one.

My third question is a design question. The same data can be stored in multiple column families or in a single column family with multiple super columns. From the point of view of Cassandra read/write performance, what are the general rules for when to use multiple column families and when to use a single column family?

Thanks again,
John

On Mon, May 23, 2011 at 5:47 PM, Jian Fang jian.fang.subscr...@gmail.com wrote:

Hi,

I am pretty new to Cassandra and am going to use Cassandra 0.8.0. I have two questions (sorry if they are very basic ones):

1) I have a column family to hold many super columns, say 30. When I first insert the data into the column family, do I need to insert each column one at a time, or can I insert the whole column family in one transaction (or call)? The latter seems to be more efficient to me. Does Cassandra support that? For example, I saw the following code to do insertion (with Hector):

    Mutator m = HFactory.createMutator(keyspace, stringSerializer);
    //MutatorString m = HFactory.createMutator(keyspace,stringSerializer);
    m.insert(p.getCassandraKey(), colFamily, HFactory.createStringColumn(type, p.getStringValue()));
    m.insert(p.getCassandraKey(), colFamily, HFactory.createColumn(data, p.getCompressedXML(), StringSerializer.get(), BytesArraySerializer.get()));

Will the insertions be two separate calls to Cassandra? Or are they just one transaction? If it is the former case, is there any way to make them one call to Cassandra?

2) How do I store a list/array of data in Cassandra? For example, I have a data field called categories, which includes zero or more categories, and each category includes a category id and a category description. Usually, how do people handle this scenario when they use Cassandra?

Thanks in advance,
John
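The two ways of storing a list described in the reply above (one column per element, or the whole list serialized into one column) can be sketched like this. It is a rough illustration only: plain Python dicts stand in for the columns of a single row, no Cassandra client is involved, JSON is used in place of Java serialization, and the function names and the column name "categories" are made up for the example.

```python
import json


def categories_to_columns(categories):
    """One column per category: column name = category id, value = description."""
    return {str(cat_id): description for cat_id, description in categories}


def categories_to_single_column(categories):
    """Whole list serialized into a single column value."""
    return {"categories": json.dumps(categories)}


# Hypothetical sample data: (category id, category description) pairs.
sample = [(12, "Sports"), (7, "News")]
per_column = categories_to_columns(sample)
single = categories_to_single_column(sample)
```

With one column per category a single entry can be read or updated on its own; with a single serialized column the whole list is always read and rewritten together.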
Re: CQL v1.0.0: why super column family not described in it?
Hello Eric,

Compound columns seem to be a very interesting feature. Do you have any idea in which Cassandra version they are going to be introduced: 0.8.X or 0.9.X?

Thanks,
Victor

2011/5/5 Eric Evans eev...@rackspace.com

On Thu, 2011-05-05 at 18:19 +0800, Guofeng Zhang wrote:

I read the CQL v1.0 document. There are operations about column families, but it does not describe how to operate on super column families. Why? Does this mean that super column families would not be supported by CQL in this version? Will it be supported in the future?

No, CQL will never support super columns, but later versions (not 1.0.0) will support compound columns. Compound columns are better; instead of a two-deep structure, you can have one of arbitrary depth. What you see is what you get for 1.0.0; there simply wasn't enough time to do everything (you have to start somewhere).

-- Eric Evans eev...@rackspace.com
Re: CQL v1.0.0: why super column family not described in it?
Thank you, I will look into that and I will probably wait until there is an out of the box comparator. But it's an excellent new feature ! Regards, Victor K. 2011/5/5 Eric Evans eev...@rackspace.com On Thu, 2011-05-05 at 10:49 -0400, Victor Kabdebon wrote: Hello Eric, Compound columns seem to be a very interesting feature. Do you have any idea in which Cassandra version it is going to be introduced : 0.8.X or 0.9.X ? You can use these today with a custom comparator[1]. There is an open issue[2] (marked as for-0.8.1) to ship one out-of-the-box. Language support[3] for CQL will probably take a bit longer. [1]: https://github.com/edanuff/CassandraCompositeType [2]: https://issues.apache.org/jira/browse/CASSANDRA-2231 [3]: https://issues.apache.org/jira/browse/CASSANDRA-2474 -- Eric Evans eev...@rackspace.com
Re: database design
Dear Jean-Yves,

You can take a different approach to the problem. You need on one side a relational database (MySQL, PostgreSQL) or Solr (as a very efficient index) and on the other side Cassandra. The relational database or Solr must contain the minimum amount of information possible: a date and only the relevant data. This enabled me to keep a simple model for Cassandra. Cassandra acts as a vault where you keep all the data, and you dispatch the data from Cassandra to the relational database or Solr. When you want to query, you query Solr or the relational database for the key / column / supercolumn, and you retrieve the complete data from Cassandra. The hard thing is to maintain coherence between the query part and the Cassandra part.

I speak from personal experience, but it was very hard for me to use only Cassandra to do everything my (small amateur) website needed. Now I have found an alternative that I use: Cassandra (data vault) + Redis (sessions and other volatile data) + Solr (search engine) + PostgreSQL (for relational queries).

Best regards,
Victor Kabdebon http://www.voxnucleus.fr

2011/4/13 Edward Capriolo edlinuxg...@gmail.com

On Wed, Apr 13, 2011 at 10:39 AM, Jean-Yves LEBLEU jleb...@gmail.com wrote:

Hi all,

Just some thoughts and a question I have about Cassandra data modeling. If I understand well, Cassandra is better at writing than at reading, so you have to think about your queries to design a Cassandra schema. We are doing incremental design, already have our system in production, and have to develop new queries. What do you usually do when you have new queries? Do you write a specific job to update data in the database to match the new query you are writing?

Thanks for your help.
Jean-Yves

Good point. Generally you will need to write some type of range scanning/map reduce application to process and back fill your data.
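The "vault plus index" arrangement described in the reply above can be sketched as follows. This is a toy model under stated assumptions: two in-memory dicts stand in for Cassandra (the vault) and for Solr or the relational database (the index), and the function names and the "date" field are hypothetical.

```python
vault = {}  # stands in for Cassandra: row key -> complete record
index = {}  # stands in for Solr / SQL: date -> list of row keys


def store(key, record):
    """Write the full record to the vault and only the minimum to the index."""
    vault[key] = record
    index.setdefault(record["date"], []).append(key)


def find_by_date(date):
    """Query the index for keys, then fetch the complete data from the vault."""
    return [vault[key] for key in index.get(date, [])]


store("msg-1", {"date": "2011-04-13", "body": "first"})
store("msg-2", {"date": "2011-04-13", "body": "second"})
store("msg-3", {"date": "2011-04-14", "body": "third"})
```

Writing to both stores inside one function is the simplest way to keep them coherent in this sketch; with two real systems that step is the hard part mentioned in the reply.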
Re: Abnormal memory consumption
And about production: is 7 GB of RAM sufficient, or is 11 GB the minimum? Thank you for your input on the JVM, I'll try to tune that.

2011/4/4 Peter Schuller peter.schul...@infidyne.com

You can change VM settings and tweak things like memtable thresholds and in-memory compaction limits to get it down and get away with a smaller heap size, but honestly I don't recommend doing so unless you're willing to spend some time getting that right and probably repeating some of the work in the future with future versions of Cassandra.

That said, if you do want to do so to give it a try, I suggest (1) changing cassandra-env to remove all the GC stuff:

    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

And then setting a fixed heap size, and removing the manual fixation of new gen:

    JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"

Then maybe remove the initial heap size enforcement, but that might not help depending:

    JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"

And then go through cassandra.yaml and tune down all the various limitations: fewer concurrent readers/writers, all the *_mb_* settings way down, and the RPC framing limitations.

But let me re-iterate: I don't recommend running in any such configuration in production. But if you just want it running for testing/for just being available, with no special requirements, and not in production, the above might work. I haven't really tested it myself; there may be gotchas involved.

-- / Peter Schuller
Re: memory consumption
Is it possible to change the maximum JVM heap memory use in 0.6.X?

2011/2/17 Aaron Morton aa...@thelastpickle.com

What are you using for disk_access_mode? Have you tried reducing the JVM heap size? Have you added the jna.jar file to lib/? This will allow Cassandra to lock the JVM memory.

Aaron

On 17/02/2011, at 9:20 PM, ruslan usifov ruslan.usi...@gmail.com wrote:

2011/2/16 Aaron Morton aa...@thelastpickle.com

JVM heap memory is controlled by the settings in conf/cassandra-env.sh. Memory mapped files will use additional virtual memory; this is controlled in conf/cassandra.yaml by disk_access_mode.

And??? The JVM heap in Cassandra 0.7 is by default half of the system memory, in my case 4GB. Here is a part of cassandra-env.sh:

    calculate_heap_size()
    {
        case `uname` in
            Linux)
                system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
                MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M
                return 0
            ;;
            FreeBSD)
                system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'`
                MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M
                return 0
            ;;
            *)
                MAX_HEAP_SIZE=1024M
                return 1
            ;;
        esac
    }

I left all these options at their defaults. All my nodes have 8GB of memory. And I am afraid that after some time all my nodes will go into hard swap, and only a reboot helps them :-(((

PS: as I understand it, some downtime of Cassandra is normal?
Re: memory consumption
Oh right but Cassandra doesn't really respect that, I thought there was another option to set that. Just for your information, I set xms and xmx very low with a small amount of data. I am waiting to be able to connect jconsole, I don't know why it is not reachable at the moment. Here is my result :

105 26115 0.2 27.3 1125328 755316 ? Sl Feb09 23:58 /usr/bin/java -ea -Xms64M -Xmx128M

2011/2/17 Aaron Morton aa...@thelastpickle.com

bin/cassandra.in.sh set Xms and Xmx in the JVM_OPTS

Aaron

On 18 Feb, 2011,at 09:10 AM, Victor Kabdebon victor.kabde...@gmail.com wrote:

Is it possible to change the maximum JVM heap memory use in 0.6.X ?
Re: memory consumption
Sorry I forgot to say that this is the partial result of :

ps aux | grep cassandra

Best regards
Re: Cassandra memory consumption
Yes, I didn't see there were two different parameters. I was personally setting (in Cassandra 0.6.6) MemtableThroughputInMB, but I don't know what BinaryMemtableThroughputInMB is.

And I take this opportunity to ask a question: if you have a small amount of data per key, so that your memtable is maybe a few KB big, is the memory footprint of the memtable going to be MemtableThroughputInMB MB, or a few KB + overhead?

Ruslan, I have seen your question in the other mail and I have the same problem. How many CFs do you have?

2011/2/16 ruslan usifov ruslan.usi...@gmail.com

Each of your 21 column families will have its own memtable; if you have the default memtable settings your memory usage will grow quite large over time. Have you tuned down your memtable size?

Which config parameter does this? binary_memtable_throughput_in_mb?
Re: Cassandra memory consumption
Someone please correct me if I am wrong, but I think the overhead you can expect is something like:

16 * MemtableThroughputInMB

but I don't know when BinaryMemtableThroughputInMB comes into account.

2011/2/16 ruslan usifov ruslan.usi...@gmail.com

2011/2/16 Victor Kabdebon victor.kabde...@gmail.com

Ruslan, I have seen your question in the other mail and I have the same problem. How many CFs do you have?

16
Re: Cassandra memory consumption
Thanks Robert, and do you know if there is a way to control the maximum likely number of memtables? (I'd like to cap it at 2.)

2011/2/16 Robert Coli rc...@digg.com

On Wed, Feb 16, 2011 at 7:12 AM, Victor Kabdebon victor.kabde...@gmail.com wrote:

Someone please correct me if I am wrong, but I think the overhead you can expect is something like :

MemtableThroughputInMB * JavaOverheadFudgeFactor * maximum likely number of such memtables which might exist at once, due to flushing logic

JavaOverheadFudgeFactor is at least 2. The maximum likely number of such memtables is usually roughly 3 when considered across an assortment of columnfamilies with different write patterns.

but I don't know when BinaryMemtableThroughputInMB comes into account.

BinaryMemtable options are only considered when using the Binary Memtable interface. If you don't know what that is, you're not using it.

=Rob
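The rule of thumb quoted above is a simple product, which can be written out as a small helper. This is only a sketch of that arithmetic: the function and parameter names are invented here, the defaults of 2 and 3 are the figures given in the quoted reply, and the 64 MB threshold in the example is an assumed value for illustration, not a recommendation.

```python
def estimate_memtable_heap_mb(throughput_mb, overhead_factor=2, max_memtables=3):
    """Memtable threshold * Java overhead fudge factor * likely memtable count."""
    return throughput_mb * overhead_factor * max_memtables


# Assumed 64 MB memtable threshold, purely for illustration.
example = estimate_memtable_heap_mb(64)
```

With those inputs the product is 64 * 2 * 3 = 384 MB. Capping the memtable count at 2, as asked in the reply, would give 64 * 2 * 2 = 256 MB under the same assumptions.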
Re: online chat scenario
Hello Sasha.

In this sort of real-time application, the way you insert (QUORUM, ONE, etc.) and the way you retrieve are extremely important, because your data may not have had the time to propagate to all your nodes. Be sure to use adequate policies for that: insert to a certain number of nodes, but don't sacrifice too much time doing so, in order to keep the real-time component.

Here is a presentation of how chat is built at Facebook; it may be useful to you:
http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf

It's more focused on Erlang, but it might give you ideas on how to deal with that problem (I am not sure that DBs are the best way to deal with that... but it's just my opinion).

Victor Kabdebon http://www.voxnucleus.fr

2011/2/15 Sasha Dolgy sdo...@gmail.com

thanks for the response. thinking about this, this would not allow for the sorting of messages into a chronological order for end user display. i had thought about having each message as its own column against the room or the user, but i have had some inconsistencies in retrieving the data. sometimes i get 3 columns, sometimes i get 50... (i think this is because of the random partitioner)

i had thought about this structure:

[messages][nickname][message id = message data]
[chatrooms][room_name][message id]

this way i can pull all messages a user ever posted, not specific to a room. what i haven't been able to do so far is print the timestamp on the row or column. does this have to be explicitly added somewhere or can it be returned as part of a 'get' request?

-sd

On Tue, Feb 15, 2011 at 2:12 PM, Michal Augustýn augustyn.mic...@gmail.com wrote:

The schema design depends on chatrooms/users/messages numbers. I.e. you can have one CF, where key is chatroom, column name is username, column value is the message and message time is the same as column timestamp. You can add day-timestamp to the chatroom name to avoid large rows.

Augi

2011/2/15 Andrey V. Panov panov.a...@gmail.com

I never did it. But I suppose you can use chatroom name as key and store messages nicks as columns in JSON and timestamp as columnName.

-- Sasha Dolgy sasha.do...@gmail.com
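The layout suggested in the quoted replies (chatroom name plus a day suffix as the row key, the timestamp as the column name, nick and message as JSON in the column value) can be modelled roughly as below. This is a toy sketch: a nested dict stands in for a column family, sorting the column names mimics a comparator that orders columns by time, and all function names are hypothetical. Two messages with the same timestamp would overwrite each other here; something like a TimeUUID column name would be one way around that.

```python
import json
import time

rows = {}  # row key -> {column name (timestamp): column value (JSON string)}


def row_key(room, ts):
    """Room name plus a UTC day suffix, to keep rows from growing without bound."""
    return room + ":" + time.strftime("%Y%m%d", time.gmtime(ts))


def post(room, nick, text, ts):
    """Store one message as one column, named by its timestamp."""
    value = json.dumps({"nick": nick, "text": text})
    rows.setdefault(row_key(room, ts), {})[ts] = value


def history(room, day):
    """Return (timestamp, message) pairs for one room and day, oldest first."""
    columns = rows.get(room + ":" + day, {})
    return [(ts, json.loads(columns[ts])) for ts in sorted(columns)]


# Timestamps are seconds since the epoch; both fall on 2011-02-15 (UTC).
post("lobby", "sasha", "hello", 1297731600)
post("lobby", "augi", "hi", 1297728060)
```

Because the timestamp is the column name, it comes back with every read, which also covers the question about printing the message time.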
Re: Subscribe
Looks like your wish has been granted. 2011/2/15 Chris Goffinet c...@chrisgoffinet.com I would like to subscribe to your newsletter. On Tue, Feb 15, 2011 at 8:04 AM, A J s5a...@gmail.com wrote:
Re: unique key generation
Yes, I made a mistake, I know! But I hoped nobody would notice :). It is the odds of winning 3 days in a row (standard probability fail). Still, it is totally unlikely. Sorry about this mistake.

Best regards,
Victor K.
Re: Cassandra memory consumption
It is really weird that I am the only one to have this issue. I restarted Cassandra today and already the memory consumption is over the limit:

root 1739 4.0 24.5 664968 494996 pts/4 SLl 15:51 0:12 /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon

It is really an annoying problem if we cannot really foresee memory consumption.

Best regards,
Victor K

2011/2/8 Victor Kabdebon victor.kabde...@gmail.com

Dear all,

Sorry to come back again to this point, but I am really worried about Cassandra memory consumption. I have a single machine that runs one Cassandra server. There is almost no data on it, but I see a crazy memory consumption and it doesn't care at all about the instructions...
Note that I am not using mmap, but Standard, I use also JNA (inside lib folder), i am running on debian 5 64 bits, so a pretty normal configuration. I also use Cassandra 0.6.8. Here are the informations I gathered on Cassandra : 105 16765 0.1 34.1 1089424* 687476* ? Sl Feb02 14:58 /usr/bin/java -ea* -Xms128M* *-Xmx256M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon result of nodetool info : 116024732779488843382476400091948985708 *Load : 1,94 MB* Generation No: 1296673772 Uptime (seconds) : 467550 *Heap Memory (MB) : 120,26 / 253,94* I have about 21 column families, none of them have a lot of information ( as you see I have 2 Mb of text which is really small). 
Even if I set Xmx at 256 there is 687M of memory used. Where does this memory come from ? Bad garbage collection ? Something that I ignore ? Thank you for your help I really need to get rid of that problem. Best regards, Victor Kabdebon
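To make the comparison between the ps figures and the heap setting concrete, the ps aux line quoted in the message above can be pulled apart as follows. This is a small sketch with a made-up helper name; it relies only on the standard ps aux column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND), with VSZ and RSS reported in kilobytes. RSS counts the resident memory of the whole process, of which the Java heap limited by -Xmx is only one part.

```python
def parse_ps_aux_line(line):
    """Split one `ps aux` line into the fields relevant to memory usage."""
    fields = line.split(None, 10)  # the 11th field is the full command line
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "vsz_kb": int(fields[4]),  # virtual size, kilobytes
        "rss_kb": int(fields[5]),  # resident set size, kilobytes
        "command": fields[10],
    }


# Line from the message above, with the command truncated after the heap flags.
line = "105 16765 0.1 34.1 1089424 687476 ? Sl Feb02 14:58 /usr/bin/java -ea -Xms128M -Xmx256M"
info = parse_ps_aux_line(line)
rss_mb = info["rss_kb"] / 1024
```

Here rss_mb comes out at about 671 MB for the whole process, while the nodetool output in the same message reports the heap itself at 120,26 / 253,94 MB.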
Re: Cassandra memory consumption
Sorry Jonathan: most of this information was taken using the command:

sudo ps aux | grep cassandra

For the nodetool information it is:

/bin/nodetool --host localhost --port 8081 info

Regards,
Victor K.

2011/2/8 Jonathan Ellis jbel...@gmail.com

I missed the part where you explained where you're getting your numbers from.

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Cassandra memory consumption
Information on the system: *Debian 5* *JVM:* victor@testhost:~/database/apache-cassandra-0.6.6$ java -version java version 1.6.0_22 Java(TM) SE Runtime Environment (build 1.6.0_22-b04) Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) *RAM:* 2 GB
2011/2/8 Victor Kabdebon victor.kabde...@gmail.com Sorry Jonathan: most of this information was taken using the command: sudo ps aux | grep cassandra For the nodetool information it is: /bin/nodetool --host localhost --port 8081 info Regards, Victor K.
Re: Cassandra memory consumption
I will do that in the future and I will post my results here (I upgraded the server to Debian 6 to see if there is any change, so memory is back to normal). I will report in a few days. In the meantime I am open to any suggestion...
2011/2/8 Aaron Morton aa...@thelastpickle.com When you attach to the JVM with JConsole, how much non-heap memory and how much heap memory is reported on the Memory tab? Xmx controls the total size of the heap memory, which excludes the permanent generation. See http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#generation_sizing and http://blogs.sun.com/jonthecollector/entry/presenting_the_permanent_generation Total non-heap memory on a 0.7 box I have is around 27M. Your numbers seem large but it would be interesting to know what the JVM is reporting. Aaron
Re: Cassandra memory consumption
Yes I have, but I have to add that this is a server where there is so little data (2.0 MB of text, roughly a book) that even if there were an overhead due to those things it would be minimal. I don't understand what's eating up all that memory; is it because Linux has difficulty releasing used memory? I really am puzzled. (By the way, this is not an Amazon EC2 server, it is a dedicated server.) Regards, Victor K.
2011/2/8 Edward Capriolo edlinuxg...@gmail.com On Tue, Feb 8, 2011 at 4:56 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: I will do that in the future and I will post my results here (I upgraded the server to Debian 6 to see if there is any change, so memory is back to normal). I will report in a few days. In the meantime I am open to any suggestion...
Re: unique key generation
Hello Kallin. If you use TimeUUID, the chance of generating the same UUID twice is the following: considering that both clients generate the UUID in the *same millisecond*, the chance of generating the same UUID is 1 / (1.84467441 × 10^19), which is equal to the probability of winning a national lottery for 1e11 days in a row (about 270 million years). Well, if you do have a collision you should play the lottery :). Best regards, Victor Kabdebon http://www.voxnucleus.fr 2011/2/7 Kallin Nagelberg kallin.nagelb...@gmail.com Hey, I am developing a session management system using Cassandra and need to generate unique sessionIDs (Cassandra column family keys). Does anyone know of an elegant/simple way to accomplish this? I am not sure about using time-based UUIDs on the client as there is a chance that multiple clients could generate the same ID. I've heard suggestions of using ZooKeeper as a source for the IDs, but was just hoping that there might be something simpler for my purposes. Thanks, -Kal
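For what it's worth, here is a minimal sketch of generating such IDs on the client, using Python's standard `uuid.uuid1()` (a version 1, time-based UUID). The helper name `new_session_id` is made up for illustration, and the 2^64 figure is simply the denominator quoted in the message above, not something measured here.

```python
import uuid


def new_session_id() -> str:
    """Return a time-based (version 1) UUID as a string (hypothetical helper)."""
    return str(uuid.uuid1())


# The denominator quoted in the message, 1.84467441e19, is 2**64.
COLLISION_SPACE = 2 ** 64

if __name__ == "__main__":
    first, second = new_session_id(), new_session_id()
    print(first)
    print(second)
    print("distinct:", first != second)
    print(f"quoted collision chance: 1 / {COLLISION_SPACE:.8e}")
```

The string form can be used directly as a row key; whether you store it as text or as the 16 raw bytes depends on how the column family is configured.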
Cassandra memory consumption
Dear all, Sorry to come back again to this point but I am really worried about Cassandra memory consumption. I have a single machine that runs one Cassandra server. There is almost no data on it but I see a crazy memory consumption and it doesn't care at all about the instructions... Note that I am not using mmap, but Standard, I use also JNA (inside lib folder), i am running on debian 5 64 bits, so a pretty normal configuration. I also use Cassandra 0.6.8. Here are the informations I gathered on Cassandra : 105 16765 0.1 34.1 1089424* 687476* ? Sl Feb02 14:58 /usr/bin/java -ea* -Xms128M* *-Xmx256M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon result of nodetool info : 
116024732779488843382476400091948985708 *Load : 1,94 MB* Generation No: 1296673772 Uptime (seconds) : 467550 *Heap Memory (MB) : 120,26 / 253,94* I have about 21 column families, none of them holds a lot of information (as you see I have 2 MB of text, which is really small). Even though I set Xmx to 256M, 687M of memory is used. Where does this memory come from? Bad garbage collection? Something I am not aware of? Thank you for your help, I really need to get rid of that problem. Best regards, Victor Kabdebon
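To compare the numbers in the message more easily, here is a small sketch that pulls VSZ, RSS and -Xmx out of one `ps aux` line. It assumes the usual Linux `ps aux` column order (USER PID %CPU %MEM VSZ RSS ...) with VSZ and RSS printed in KiB; the function name `parse_ps_line` and the shortened sample line are only for illustration.

```python
import re


def parse_ps_line(line: str) -> dict:
    """Extract VSZ/RSS (assumed KiB in `ps aux` output) and -Xmx (MiB) from one line."""
    fields = line.split()
    vsz_kib, rss_kib = int(fields[4]), int(fields[5])
    match = re.search(r"-Xmx(\d+)M", line)
    xmx_mib = int(match.group(1)) if match else None
    return {
        "vsz_mib": vsz_kib / 1024,
        "rss_mib": rss_kib / 1024,
        "xmx_mib": xmx_mib,
    }


# Shortened copy of the ps line from the message (classpath omitted).
sample = ("105 16765 0.1 34.1 1089424 687476 ? Sl Feb02 14:58 "
          "/usr/bin/java -ea -Xms128M -Xmx256M")
info = parse_ps_line(sample)

if __name__ == "__main__":
    print(f"resident: {info['rss_mib']:.1f} MiB, max heap: {info['xmx_mib']} MiB")
    print(f"resident memory beyond max heap: {info['rss_mib'] - info['xmx_mib']:.1f} MiB")
```

Note that this only shows how far the resident size is above the heap limit; it says nothing about where the extra memory goes (non-heap JVM areas, native allocations, and so on).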
Re: revisioned data
Hello Raj, No, it actually doesn't make sense from the point of view of Cassandra: the order-preserving partitioner preserves the order of the *keys*. Inside a row, the ordering will be done according to the *supercolumn name*. In that case you can set the ordering with compare_super_with (sorry, I don't remember exactly the new term in Cassandra, but that's the idea). The compare_with will order your columns inside your supercolumn. However, and I think that many will agree here, you should tend to avoid SuperColumns. Rather than using SuperColumns, try to think like this:
CF1 : ObjectStore
Key : ID (long)
Columns : { name, other fields, update time (long [date]), ... }
CF2 : ObjectOrder
Key : myorderedobjects
Columns : { { name : identifier that can be sorted, value : ObjectID }, ... }
Best regards, Victor Kabdebon, http://www.voxnucleus.fr
2011/2/5 Raj Bakhru rbak...@gmail.com Hi all - We're new to Cassandra and have read plenty on the data model, but we wanted to poll for thoughts on how to best handle this structure. We have simple objects that have an ID and we want to maintain a history of all the revisions, e.g. MyObject: ID (long), name, other fields, update time (long [date]). Any time the object changes, we'll store down a new version of the object (same ID, but different update time and other fields). We need to be able to query out what the object was as-of any time historically. We also need to be able to query out what some or all of the items of this object type were as-of any time historically. In SQL, we'd just find the max(id) where update time <= queried_as_of_time. In Cassandra, we were thinking of modeling as follows:
CF: MyObjectType
Super-Column: ID of object (e.g. 625)
Column: updatetime (e.g. 1000245242)
Value: byte[] of serialized object
We were thinking of using the OrderingPartitioner and using range queries against the data. Does this make sense? Are we approaching this in the wrong way? Thanks a lot
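One way to read the two-column-family suggestion above, sketched with plain Python dicts standing in for the column families (no Cassandra client involved). The row-key scheme `"<id>:<update_time>"`, the choice of one ObjectOrder row per object ID, and the function names are all assumptions made for the example, not something stated in the thread.

```python
import bisect

# Stand-ins for the two column families (assumption: plain dicts, no real client).
object_store = {}   # CF1 ObjectStore: row key -> columns of one revision
object_order = {}   # CF2 ObjectOrder: row key -> sorted list of (update_time, store key)


def save_revision(object_id: int, update_time: int, fields: dict) -> None:
    """Store one revision and index it under a sortable column name."""
    store_key = f"{object_id}:{update_time}"          # hypothetical key scheme
    object_store[store_key] = dict(fields, update_time=update_time)
    columns = object_order.setdefault(object_id, [])
    bisect.insort(columns, (update_time, store_key))  # keeps columns sorted by time


def as_of(object_id: int, when: int):
    """Return the latest revision with update_time <= when, or None."""
    columns = object_order.get(object_id, [])
    times = [t for t, _ in columns]
    idx = bisect.bisect_right(times, when)
    if idx == 0:
        return None
    return object_store[columns[idx - 1][1]]


if __name__ == "__main__":
    save_revision(625, 1000, {"name": "first version"})
    save_revision(625, 2000, {"name": "second version"})
    print(as_of(625, 1500))
    print(as_of(625, 2500))
```

With a real cluster the `as_of` lookup would correspond to a reversed column slice on the ObjectOrder row starting at the requested time and limited to one column, followed by a read of the referenced ObjectStore row.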
Re: Using Cassandra to store files
Dear Brendan, I would really be interested in your findings too. I need a system to store various documents; I am thinking of Cassandra (which I am already using), a second type of database, or some other system. Maybe, as Dan suggested, mogilefs. Thank you, Victor Kabdebon http://www.voxnucleus.fr 2011/2/3 Dan Kuebrich dan.kuebr...@gmail.com CouchDB That's not what document-oriented means! (har har) I don't know all the details of your case, but with serving static files I suspect you could do ok with something that has a much smaller memory/cpu footprint as you won't have as great of write throughput / read latency concerns. I've used mogilefs http://www.danga.com/mogilefs/ for this before.
Re: Cassandra and count
Buddhasystem is right. A count returns the columns to the client, which counts them. My advice: do not count big columns / supercolumns. People in the dev team are trying to develop distributed counters but I don't know the state of this research. Best regards, Victor Kabdebon http://www.voxnucleus.fr 2011/1/28 buddhasystem potek...@bnl.gov As far as I know, there are no aggregate operations built into Cassandra, which means you'll have to retrieve all of the data to count it in the client. I had a thread on this topic 2 weeks ago. It's pretty bad.
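If the counting does have to happen in the client, paging through the row keeps each request small. Below is a rough sketch of that loop; `get_slice` is a hypothetical stand-in that works on an in-memory sorted list of column names (it is not a real client API) and mimics a slice call whose start column is inclusive.

```python
def get_slice(row, start, count):
    """Hypothetical stand-in for a slice call: columns >= start, at most `count`."""
    return [name for name in row if name >= start][:count]


def count_columns(row, page_size=100):
    """Count columns page by page instead of fetching the whole row at once."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 (start column is inclusive)")
    total, start, first_page = 0, "", True
    while True:
        page = get_slice(row, start, page_size)
        if not first_page:
            page = page[1:]          # drop the start column, already counted
        if not page:
            return total
        total += len(page)
        start = page[-1]             # next page starts at the last column seen
        first_page = False


if __name__ == "__main__":
    row = sorted(f"col{i:04d}" for i in range(250))
    print(count_columns(row, page_size=100))
```

This still transfers every column name once, so it does not remove the cost described above; it only bounds the size of each individual response.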
Re: cass0.7: Creating colum family Sorting
The comparator only sorts the columns inside a row (key). The sorting of the keys themselves is done by your partitioner. Best regards, Victor Kabdebon
2011/1/16 kh jo jo80la...@yahoo.com I am having some problems with creating column families and sorting them. I want to create a countries column family where I can get a sorted list of countries (by country's name). The following command fails:
create column family Countries with comparator=LongType and column_metadata=[ {column_name: cid, validation_class: LongType, index_type: KEYS}, {column_name: cname, validation_class: UTF8Type}, {column_name: code, validation_class: UTF8Type, index_type: KEYS} ];
It shows: 'id' could not be translated into a LongType. The following works:
create column family Countries with comparator=UTF8Type and column_metadata=[ {column_name: cid, validation_class: LongType, index_type: KEYS}, {column_name: cname, validation_class: UTF8Type}, {column_name: code, validation_class: UTF8Type, index_type: KEYS} ];
but when I insert some columns, they are not sorted as I want:
$countries = new ColumnFamily(Cassandra::con(), 'Countries');
$countries->insert('Afghanistan', array('cid' => '1', 'cname' => 'Afghanistan', 'code' => 'AF'));
$countries->insert('Germany', array('cid' => '2', 'cname' => 'Germany', 'code' => 'DE'));
$countries->insert('Zimbabwe', array('cid' => '3', 'cname' => 'Zimbabwe', 'code' => 'ZM'));
Now: list Countries; shows:
---
RowKey: Germany
=> (column=cid, value=2, timestamp=1295211346716047)
=> (column=cname, value=Germany, timestamp=1295211346716047)
=> (column=code, value=DE, timestamp=1295211346716047)
---
RowKey: Zimbabwe
=> (column=cid, value=3, timestamp=1295211346713570)
=> (column=cname, value=Zimbabwe, timestamp=1295211346713570)
=> (column=code, value=ZM, timestamp=1295211346713570)
---
RowKey: Afghanistan
=> (column=cid, value=1, timestamp=1295211346709448)
=> (column=cname, value=Afghanistan, timestamp=1295211346709448)
=> (column=code, value=AF, timestamp=1295211346709448)
I don't see any sorting here?!
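To illustrate why the row keys do not come back alphabetically, here is a small sketch that orders the three keys by an MD5-derived token. It assumes the cluster uses the RandomPartitioner (the thread does not say which partitioner is configured) and that the token is the absolute value of the MD5 digest read as a signed big-endian integer; `md5_token` is a made-up helper, not Cassandra code.

```python
import hashlib


def md5_token(row_key: str) -> int:
    """Illustrative token: abs() of the MD5 digest read as a signed big-endian int."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


keys = ["Afghanistan", "Germany", "Zimbabwe"]
by_token = sorted(keys, key=md5_token)

if __name__ == "__main__":
    for key in by_token:
        print(f"{md5_token(key):>40d}  {key}")
```

The point is only that hash order has nothing to do with alphabetical order. To get countries sorted by name, the names would have to be column names inside a single row (sorted by the comparator), or the cluster would need an order-preserving partitioner.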
Re: Cassandra in less than 1G of memory?
If it's because of swapping made by Linux, wouldn't I only see the swap memory consumption rise ? Because the problem is (apart from swap becoming bigger and bigger) that cassandra ram memory consumption is going through the roof. However I want to give a try to the proposed method. Thank you very much, Best Regards, Victor Kabdebon PS : memory consumption : root 19093 0.1 35.8 *1362108 722312* ? Sl Jan11 14:01 /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon 2011/1/16 Aaron Morton aa...@thelastpickle.com The OS will make it's best guess as to how much memory if can give over to mmapped files. 
Unfortunately it will not always make the best decision, see the information on adding JNA and mlockall() support in Cassandra 0.6.5: http://www.datastax.com/blog/whats-new-cassandra-065 As Jonathan says, try setting the disk mode to standard to see the difference. WRT the resident memory for the process, not all memory allocation is done on the heap. To see the non-heap usage, connect to the process using JConsole and take a look at the Memory tab. For example, on my box Cassandra now has 110M of heap memory and 20M of non-heap. AFAIK memory such as the class definitions is not included in the heap memory usage. Hope that helps. Aaron On 15 Jan, 2011, at 08:03 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: Hi Jonathan, hi Edward, Jonathan: but it looks like mmapping wants to consume the entire memory of my server. It goes up to 1.7 GB for a ridiculously small amount of data. Am I doing something wrong or is there something I should change to prevent this never-ending increase of memory consumption? Edward: I am not sure, I will try to see that tomorrow but my disk access mode is standard, not mmap. Anyway thank you very much, Victor K. PS: here, some hours later, is the result of ps aux | grep cassandra root 19093 0.1 30.0 1243940 *605060* ? 
Sl Jan11 10:15 /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon 2011/1/15 Jonathan Ellis jbel...@gmail.com mmapping only consumes memory that the OS can afford to feed it. On Fri, Jan 14, 2011 at 7:29 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Fri, Jan 14, 2011 at 2:13 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: Dear rajat
Re: Cassandra in less than 1G of memory?
Dear Rajat, Yes it is possible, I have the same constraints. However I must warn you: from what I see, Cassandra memory consumption is not bounded in 0.6.x on 64-bit Debian. Here is an example of an instance launched on a node: root 19093 0.1 28.3 1210696 *570052* ? Sl Jan11 9:08 /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon Look at the second bold value: Xmx indicates the maximum memory that Cassandra can use; it is set to 512M, so it could easily fit into 1 GB. Now look at the first one: 570 MB > 512 MB. Moreover, if I come back in one day the first value will be even higher, probably around 610 MB. 
Actually it increases to the point where I need to restart it otherwise other program are shut down by Linux for cassandra to further expand its memory usage... By the way it's a call to other cassandra users, am I the only one to encounter this problem ? Best regards, Victor K. 2011/1/14 Rajat Chopra rcho...@makara.com Hello. According to JVM heap size topic at http://wiki.apache.org/cassandra/MemtableThresholds , Cassandra would need atleast 1G of memory to run. Is it possible to have a running Cassandra cluster with machines that have less than that memory… say 512M? I can live with slow transactions, no compactions etc, but do not want an OutOfMemory error. The reason for a smaller bound for Cassandra is that I want to leave room for other processes to run. Please help with specific parameters to tune. Thanks, Rajat
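The comparison Victor makes by eye can be sketched in a few lines of Python. This is only an illustration, not something from the thread: the function name `parse_ps_line` and the abbreviated sample line are assumptions, and it relies on the standard `ps aux` column order, where the sixth column is the resident set size (RSS) in KiB. Keep in mind that `-Xmx` caps the Java heap only, so a process RSS somewhat above it is not by itself proof of a leak.

```python
import re

# Multipliers that convert a -Xmx suffix to KiB.
_SUFFIX_TO_KIB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def parse_ps_line(line):
    """Return (rss_kib, xmx_kib) parsed from one `ps aux` line of a java process."""
    # The mail marks values in bold with asterisks; drop them before parsing.
    clean = line.replace("*", "")
    fields = clean.split()
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    rss_kib = int(fields[5])
    match = re.search(r"-Xmx(\d+)([kKmMgG])", clean)
    if match is None:
        raise ValueError("no -Xmx option with a k/m/g suffix found")
    xmx_kib = int(match.group(1)) * _SUFFIX_TO_KIB[match.group(2).lower()]
    return rss_kib, xmx_kib


# Abbreviated version of the line from the mail (classpath omitted).
SAMPLE = ("root 19093 0.1 28.3 1210696 *570052* ? Sl Jan11 9:08 /usr/bin/java "
          "-ea -Xms128M *-Xmx512M *-XX:+UseParNewGC "
          "org.apache.cassandra.thrift.CassandraDaemon")

if __name__ == "__main__":
    rss, xmx = parse_ps_line(SAMPLE)
    print("RSS exceeds -Xmx by %d KiB" % (rss - xmx))
```

The gap between the two numbers is the part of the process footprint that lives outside the heap, which is the quantity to watch over time.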
Re: Cassandra in less than 1G of memory?
Hi Jonathan, hi Edward,
Jonathan: but it looks like mmapping wants to consume the entire memory of my server. It goes up to 1.7 GB for a ridiculously small amount of data. Am I doing something wrong, or is there something I should change to prevent this never-ending increase in memory consumption?
Edward: I am not sure; I will look into that tomorrow, but my disk access mode is standard, not mmap.
Anyway, thank you very much, Victor K.
PS: here is the result of ps aux | grep cassandra some hours later:
root 19093 0.1 30.0 1243940 *605060* ? Sl Jan11 10:15 /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar org.apache.cassandra.thrift.CassandraDaemon
2011/1/15 Jonathan Ellis jbel...@gmail.com
mmapping only consumes memory that the OS can afford to feed it.
On Fri, Jan 14, 2011 at 7:29 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
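To tell a steady climb from a one-off jump, it may help to sample the resident set size at intervals instead of comparing two `ps` snapshots by hand. The sketch below is an assumption-laden illustration: the helper names `parse_vmrss_kib` and `read_rss_kib` are invented here, and it relies on the Linux `/proc/<pid>/status` file exposing a `VmRSS` line in kB.

```python
def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    605060 kB".
            return int(line.split()[1])
    raise ValueError("VmRSS line not found")


def read_rss_kib(pid):
    """Read the current resident set size of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_vmrss_kib(handle.read())


def growth_kib(earlier_kib, later_kib):
    """Difference between two RSS samples, in KiB."""
    return later_kib - earlier_kib


if __name__ == "__main__":
    # The two RSS values quoted in this thread, a few hours apart.
    print("growth: %d KiB" % growth_kib(570052, 605060))
```

Calling `read_rss_kib(19093)` from a cron job or a loop with `time.sleep` would produce a series that shows whether the footprint plateaus or keeps rising.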
Storing big objects into columns
Dear all, In a project I would like to store big objects in columns, serialized. For example, entire images (several KB to several MB), Flash animations (several MB), etc. Does anyone use Cassandra with such relatively big columns, and if so, does it work well? Are there any drawbacks to this method? Thank you, Victor K.
Re: Storing big objects into columns
Is there any recommended maximum size for a column (not the hard upper limit, which is 2 GB)? Why is it useful to chunk the content into multiple columns? Thank you, Victor K.
2011/1/13 Ryan King r...@twitter.com
On Thu, Jan 13, 2011 at 2:38 PM, Victor Kabdebon victor.kabde...@gmail.com wrote:
Dear all, In a project I would like to store big objects in columns, serialized. For example, entire images (several KB to several MB), Flash animations (several MB), etc. Does anyone use Cassandra with such relatively big columns, and if so, does it work well? Are there any drawbacks to this method?
I haven't benchmarked this myself, but I think you'll want to chunk your content into multiple columns in the same row. -ryan
Re: Storing big objects into columns
OK, thank you very much for this information! If somebody has more insights on this matter, I am still interested! Victor K.
2011/1/13 Ryan King r...@twitter.com
On Thu, Jan 13, 2011 at 2:44 PM, Victor Kabdebon victor.kabde...@gmail.com wrote:
Is there any recommended maximum size for a column (not the hard upper limit, which is 2 GB)? Why is it useful to chunk the content into multiple columns?
I think you're going to have to do some tests yourself. You want to chunk it so that you can pseudo-stream the content. You don't want to have to load the whole content at once. -ryan
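Ryan's suggestion of chunking into multiple columns of the same row can be sketched as follows. This is a rough illustration only: the row is modelled as a plain Python dict rather than a real Cassandra client call, and the names `chunk_into_columns`, `stream_columns` and the `chunk_` column prefix are made up for the example. Column names are zero-padded on the assumption that the column family sorts names lexically, so that name order matches chunk order.

```python
def chunk_into_columns(data, chunk_size):
    """Split a bytes object into {column_name: chunk} for a single row."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        # Zero-padded index keeps lexical order equal to chunk order.
        "chunk_%08d" % (offset // chunk_size): data[offset:offset + chunk_size]
        for offset in range(0, len(data), chunk_size)
    }


def stream_columns(row):
    """Yield the chunks of a row in order, one column at a time."""
    for name in sorted(row):
        yield row[name]


if __name__ == "__main__":
    blob = bytes(range(256)) * 10           # 2560 bytes of sample data
    row = chunk_into_columns(blob, 1000)    # three columns in one row
    restored = b"".join(stream_columns(row))
    print(len(row), restored == blob)
```

With a real client, `stream_columns` would correspond to reading the row's columns in slices, so the reader never holds more than one chunk in memory at a time. The chunk size itself is something to benchmark, as Ryan says.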