Re: solandra or pig or....?

2011-06-21 Thread Victor Kabdebon
I can speak for what I know :

I have only taken a quick look at Pig, and maybe some guys from Twitter can
answer better than me on that particular program. Pig is not for on-demand
queries: they are quite slow, and as you said, you extract the relevant
information and append it to another CF where you can quickly retrieve the

Solr is purely a search engine. It is not only text based but also time
based, etc. To do statistics you need mathematical operations, and
Solr won't provide that. It can do simple things in terms of statistics, but
mostly it is a search engine.

Personally, for what you are asking I would use Pig and store the results in
a CF. I would update those CFs regularly. You can generate simple statistics
with your favorite language or a specialized language such as R, as long as
it concerns small sets.
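
As a rough illustration of the "precompute and store" idea (this is plain Python, not Pig), the sketch below scans player and game records once and builds a lookup table that a web application could read quickly. The record layout, field names and sample values are all made up for the example; only the overall shape (players referencing games) comes from the question.

```python
import calendar
from collections import defaultdict
from datetime import datetime, timezone

TUESDAY = 1  # datetime.weekday(): Monday is 0, Tuesday is 1

# Hypothetical sample data: games keyed by id, players referencing game ids.
games = {
    "g1": {"date": datetime(2008, 4, 1, tzinfo=timezone.utc), "grand_slams": {"p1"}},
    "g2": {"date": datetime(2009, 4, 7, tzinfo=timezone.utc), "grand_slams": {"p2"}},
}
players = {
    "p1": {"fullname": "Ann Example", "hometown": "cheektowaga", "games": ["g1"]},
    "p2": {"fullname": "Bob Example", "hometown": "ontario", "games": ["g2"]},
}

def build_grand_slam_index(players, games):
    """Batch job: map (hometown, weekday, leap_year) -> sorted player names."""
    index = defaultdict(set)
    for player_id, player in players.items():
        for game_id in player["games"]:
            game = games[game_id]
            if player_id in game["grand_slams"]:
                day = game["date"]
                key = (player["hometown"], day.weekday(), calendar.isleap(day.year))
                index[key].add(player["fullname"])
    return {key: sorted(names) for key, names in index.items()}

# The web application only does a cheap lookup against the precomputed table.
index = build_grand_slam_index(players, games)
print(index.get(("cheektowaga", TUESDAY, True), []))
```

The expensive scan runs in the background on a schedule, and the result would be written to its own CF; the limitation is the one raised in the question, namely that you have to know the query shape in advance.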

Hope it helps,
Victor Kabdebon

2011/6/21 Sasha Dolgy


 Simple question ... Assuming my current use case is the ability to log
 lots of trivial and seemingly useless sports statistics ... I want a
 user to be able to query / compare  For example:

 -- Show me all baseball players in cheektowaga and ontario,
 california who have hit a grandslam on tuesdays where it was just a
 leap year.

 Each baseball player is represented by a single row in a CF:

 player_uuid, fullname, hometown, game1, game2, game3, game4

 Games are UUIDs that reference another row in the same CF
 that provides information about that game...

 location, final score, date (unix timestamp or ISO format), and
 statistics, which are represented as a new column timestamp:player_uuid

 I can use PIG, as I understand, to run a query to generate specific
 information about specific things and populate that data back into
 Cassandra in another CF ... similar to the hypothetical search
 above. As the information is structured already, i assume PIG is the
 right tool for the job, but may not be ideal for a web application and
 enabling ad-hoc queries ... it could take anywhere from 2-?
 seconds for that query to generate, populate, and return to the

 On the other hand, I have started to read about Solr / Solandra /
 Lucandra  can this provide similar functionality or better ?  or
 is it more geared towards full text search and indexing ...

 I don't want to get into the habit of guessing what my potential users
 want to search for ... trying to think of ways to offload this to

 Sasha Dolgy

Re: New web client future API

2011-06-15 Thread Victor Kabdebon
Ok thanks for the update. I thought the query string was translated to
Thrift, then sent to a server.

Victor Kabdebon

2011/6/15 Eric Evans

 On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote:
  Actually from what I understood (please correct me if I am wrong) CQL
  is based on Thrift / Avro.

 In this project, we tend to use the word Thrift as a sort of shorthand
 for Cassandra's RPC interface, and not the serialization and RPC
 framework from the Apache Thrift project.

 CQL does not (yet) have its own networking protocol, so it uses Thrift
 as a means of delivering queries, and serializing the results, but it is
 *not* a wrapper around the existing RPC methods.  The query string you
 provide is parsed entirely on the server.

 Eric Evans

Re: New web client future API

2011-06-14 Thread Victor Kabdebon
Hello Markus,

Actually from what I understood (please correct me if I am wrong) CQL is
based on Thrift / Avro.

Victor Kabdebon

2011/6/14 Markus Wiesenbacher


 what is the future API for Cassandra? Thrift, Avro, CQL?

 I just released an early version of my web client 
 ( which is Thrift-based, and therefore I
 would like to know what the future is ...

 Many thanks

Re: When should I use Solandra?

2011-06-04 Thread Victor Kabdebon
Why do you need Solandra for storing data? If you want to retrieve data,
simply use Cassandra. Solandra is for search and indexing; it is a search
engine. I do not recommend storing data solely in a search engine.

Use the following design:

*Store ALL data in Cassandra, then extract from Cassandra only the data you
need to index in Solandra. For that matter, you can use Solr instead of
Solandra. In Solr there is a file called schema.xml where you can set up
which fields to index. My advice is: do not store your passwords in plain
text. Add a salt (a random sequence) AND hash it, then insert the bytes into
Cassandra. Otherwise you'll end up like Sony, facing a massive lawsuit when
hackers break into your website and steal the passwords.*
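
A minimal sketch of the salt-and-hash advice, using only the Python standard library. PBKDF2 is my own choice here as one common way to combine a salt with a slow hash; the thread does not name an algorithm, and the iteration count and function names are assumptions for the example. The two byte strings returned are what you would store in two columns.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest) as bytes, ready to be stored in two columns."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Neither the salt nor the digest needs to be indexed by the search engine, so they can simply be left out of the indexed fields.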

If you really want to use Solandra, I guess there is an equivalent to
schema.xml with lines that tell it whether or not to index each field.

Victor Kabdebon

2011/6/4 Jean-Nicolas Boulay Desjardins


 I am planning to use Cassandra to store my users passwords and at the same
 time data for my website that need to be accessible via search. My Question
 is should I use two DB: Cassandra (for users passwords) and Solandra (for
 the websites data) or can I put everything in Solandra?

 Is there a way to stop Solandra from indexing my users passwords?

 Thanks in advance for any help.

Re: Appending to fields

2011-05-31 Thread Victor Kabdebon
As Jonathan stated, I believe the insert is O(N + M), unless there
are some operations that I don't know about.

There are other NoSQL databases that can be used with Cassandra as buffers
for quick access and modification; afterwards the content can be dumped
into Cassandra for long-term storage. Here is an example with Redis:
The append command is said to be O(1), but that looks a little suspicious
to me...

Best regards,
Victor Kabdebon

2011/5/31 Jonathan Ellis

 On Tue, May 31, 2011 at 2:22 PM, Marcus Bointon wrote:
  mysql reads the entire value of y, appends the data, then writes the
 whole thing back, which unfortunately is an O(n^2) operation.

 Actually, this analysis is incorrect. Appending M bytes to N is O(N +
 M) which isn't the same as N^2 at all.

 At least in Cassandra, nor can I think of any possible algorithm which
 would allow MySQL to achieve N^2, but I don't claim to be an expert

 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
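
To make the O(N + M) point above concrete, here is a small sketch that only counts bytes handled under a read-modify-write model (read the whole value, append, write it back). The cost model and function names are assumptions for illustration, not measurements of Cassandra or MySQL.

```python
def append_cost(existing_len, chunk_len):
    """Bytes handled by one read-modify-write append: O(N + M)."""
    return existing_len + chunk_len

def total_cost(num_appends, chunk_len):
    """Total bytes handled when appending chunk_len bytes num_appends times."""
    total = 0
    length = 0
    for _ in range(num_appends):
        total += append_cost(length, chunk_len)
        length += chunk_len
    return total

print(append_cost(1000, 10))
print(total_cost(100, 10))
```

A single append stays linear in the size of the value. Only the total over many repeated appends to the same value grows quadratically with the number of appends, which may be where the N^2 figure came from.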

Re: Cassandra 0.8 questions

2011-05-24 Thread Victor Kabdebon
It's not really possible to give a general answer to your second question; it
depends on your implementation. Personally I do two things. The first is
to map the array to a row: the row key identifies the array, each column name
is a key of the array, and the column value holds the data. However, for some
applications, as I am using Java, I just serialize my ArrayList (or List) and
push all the content into one column. It all depends on what you want to achieve.
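
The two options can be sketched as follows. Plain dicts stand in for a Cassandra row, and JSON stands in for Java serialization; both are assumptions made to keep the example self-contained, as are the sample categories.

```python
import json

# Hypothetical data: the "categories" field from the question.
categories = [
    {"id": "c1", "description": "Books"},
    {"id": "c2", "description": "Music"},
]

def as_columns(items):
    """Option 1: one column per element (column name = id, value = description)."""
    return {item["id"]: item["description"] for item in items}

def as_single_column(items):
    """Option 2: serialize the whole list into a single column value."""
    return {"categories": json.dumps(items)}

row_a = as_columns(categories)
row_b = as_single_column(categories)
print(row_a)
print(json.loads(row_b["categories"]) == categories)
```

Option 1 lets you read or update one element at a time; option 2 is simpler, but every change rewrites the whole list.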

Third question: try to design your CFs according to what you want to achieve.
I am designing an internal messaging system, and I use only two column
families to hold the message lists, messages and message boxes. I would have
used one, but I need one that is sorted by TimeUUID and another sorted by
UTF8Type. I think there is a general consensus here: try to avoid super
columns. Two sets of columns can do the same job as one SuperColumn, and
that is the preferred scheme.

Again, just experiment and be ready to change your organization if you are
beginning with Cassandra; this is the best way to figure out what to do for your data
Victor Kabdebon

2011/5/24 Jian Fang

 Does anyone have a good suggestion on my second question? I believe that
 question is a pretty common one.

 My third question is a design question. For the same data, we can store
 it in multiple column families or in a single column family with multiple
 super columns.
 From a Cassandra read/write performance point of view, what are the general
 rules for using multiple column families, and when to use a single column

 Thanks again,


 On Mon, May 23, 2011 at 5:47 PM, Jian Fang 


 I am pretty new to Cassandra and am going to use Cassandra 0.8.0. I have
 two questions (sorry if they are very basic ones):

 1) I have a column family to hold many super columns, say 30. When I first
 insert the data to the column family, do I need to insert each column one at
 a time or can I insert the whole column family in one transaction (or
 call?)? The latter one seems to be more efficient to me. Does Cassandra
 support that?

 For example, I saw the following code to do insertion (with Hector),

 Mutator m = HFactory.createMutator(keyspace, stringSerializer);
 //MutatorString m =
 m.insert(p.getCassandraKey(), colFamily,
 m.insert(p.getCassandraKey(), colFamily,
 p.getCompressedXML(), StringSerializer.get(),

 Will the insertions be two separate calls to Cassandra? Or they are just
 one transaction? If it is the former case, is there any way to make them as
 one call to Cassandra?

 2) How to store a list/array of data in Cassandra? For example, I have a
 data field called categories, which include none or many categories and each
 category includes a category id and a category description. Usually, how do
 people handle this scenario when they use Cassandra?

 Thanks in advance,


Re: CQL v1.0.0: why super column family not descirbed in it?

2011-05-05 Thread Victor Kabdebon
Hello Eric,

Compound columns seem to be a very interesting feature. Do you have any idea
in which Cassandra version it is going to be introduced : 0.8.X or 0.9.X ?



2011/5/5 Eric Evans

 On Thu, 2011-05-05 at 18:19 +0800, Guofeng Zhang wrote:
  I read the CQL v1.0 document. There are operations about column
  families, but it does not describe how to operate on super column
  families. Why? Does this mean that super column families would not be
  supported by CQL in this version? Will it be supported in the future?

 No, CQL will never support super columns, but later versions (not 1.0.0)
 will support compound columns.  Compound columns are better; instead of
 a two-deep structure, you can have one of arbitrary depth.

 What you see is what you get for 1.0.0, there simply wasn't enough time
 to do everything (you have to start somewhere).

 Eric Evans

Re: CQL v1.0.0: why super column family not descirbed in it?

2011-05-05 Thread Victor Kabdebon
Thank you, I will look into that and I will probably wait until there is an
out of the box comparator. But it's an excellent new feature !

Victor K.

2011/5/5 Eric Evans

 On Thu, 2011-05-05 at 10:49 -0400, Victor Kabdebon wrote:
  Hello Eric,
  Compound columns seem to be a very interesting feature. Do you have any
  in which Cassandra version it is going to be introduced : 0.8.X or 0.9.X

 You can use these today with a custom comparator[1].  There is an open
 issue[2] (marked as for-0.8.1) to ship one out-of-the-box.

 Language support[3] for CQL will probably take a bit longer.


 Eric Evans

Re: database design

2011-04-13 Thread Victor Kabdebon
Dear Jean-Yves,

You can take a different approach to the problem.
You need on one side a relational database (MySQL, PostgreSQL) or Solr (as
a very efficient index) and on the other side Cassandra. The relational
database or Solr must contain the minimum amount of information possible: a
date and only the relevant data. It enabled me to keep a simple model for
Cassandra will act as a vault where you keep all the data, and then you
dispatch the data from Cassandra to the relational database or Solr. When
you want to query, you query Solr or the relational database for the key /
column / supercolumn, and then you retrieve the complete data from Cassandra.
The hard part is keeping the query side and the Cassandra side consistent.
I speak from personal experience, but it was very hard for me to use only
Cassandra to do everything my (small amateur) website needed. Now I have found
an alternative that I use: Cassandra (data vault) + Redis (sessions and other
volatile data) + Solr (search engine) + PostgreSQL (for relational
Best regards,
Victor Kabdebon

2011/4/13 Edward Capriolo

 On Wed, Apr 13, 2011 at 10:39 AM, Jean-Yves LEBLEU
  Hi all,
  Just some thoughts and question I have about cassandra data modeling.
  If I understand well, cassandra is better on writing than on reading.
  So you have to think about your queries to design cassandra schema. We
  are doing incremental design, and already have our system in
  production and we have to develop new queries.
  How do you usually handle new queries? Do you write a
  specific job to update data in the database to match the new query you
  are writing?
  Thanks for your help.

 Good point, Generally you will need to write some type of range
 scanning/map reduce application to process and back fill your data.

Re: Abnormal memory consumption

2011-04-04 Thread Victor Kabdebon
And for production, is 7 GB of RAM sufficient, or is 11 GB the minimum?
Thank you for your input on the JVM; I'll try to tune that.

2011/4/4 Peter Schuller

  You can change VM settings and tweak things like memtable thresholds
  and in-memory compaction limits to get it down and get away with a
  smaller heap size, but honestly I don't recommend doing so unless
  you're willing to spend some time getting that right and probably
  repeating some of the work in the future with future versions of

 That said, if you do want to do so to give it a try, I suggest (1)
 changing cassandra-env to remove all the GC stuff:

 JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
 JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
 JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
 JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
 JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

 And then setting a fixed heap size, and removing the manual fixation of new


 Then maybe remove the initial heap size enforcement, but that might
 not help depending:


 And then go through cassandra.yaml and tune down all the various
 limitations. Less concurrent readers/writers, all the *_mb_* settings
 way down, and the RPC framing limitations.

 But let me re-iterate: I don't recommend running in any such
 configuration in production. But if you just want it running for
 testing/for just being available, with no special requirements, and
 not in production, the above might work. I haven't really tested it
 myself; there may be gotchas involved.

 / Peter Schuller

Re: memory consuption

2011-02-17 Thread Victor Kabdebon
Is it possible to change the maximum JVM heap memory use in 0.6.X ?

2011/2/17 Aaron Morton

 What are you using for disk_access_mode ?
 Have you tried reducing the JVM heap size?
 Have you added the Jna.jar file to lib/ ? This will allow Cassandra to lock
 the JVM memory.


 On 17/02/2011, at 9:20 PM, ruslan usifov wrote:

 2011/2/16 Aaron Morton

 JVM heap memory is controlled by the settings in conf/

 Memory mapped files will use additional virtual memory, is controlled in
 conf/Cassandra.yaml disk_access_mode

 And??? The JVM heap in Cassandra 0.7 is by default half of the system
 memory, in my case 4GB. Here is a part of

 case "`uname`" in
     Linux)
         system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
         MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M
         return 0
     ;;
     FreeBSD)
         system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'`
         MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M
         return 0
     ;;
     *)
         return 1
     ;;
 esac
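
The arithmetic in the quoted excerpt is simply "half of physical memory". A small sketch of the same rule, with function names of my own choosing:

```python
def default_max_heap_mb(system_memory_mb):
    """Mirror the quoted rule: heap = half of physical memory, in MB."""
    return system_memory_mb // 2

def heap_from_bytes(system_memory_bytes):
    """Same rule when the OS reports bytes (as sysctl hw.physmem does)."""
    return system_memory_bytes // 1024 // 1024 // 2

print(f"{default_max_heap_mb(8192)}M")
print(f"{heap_from_bytes(8 * 1024**3)}M")
```

For a node with 8GB of RAM both calls give a 4096M heap, which matches the 4GB mentioned above.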

 I left all these options at their defaults. All my nodes have 8GB of memory,
 and I am afraid that after some time all my nodes will go into heavy swap,
 and only a reboot helps them :-(((

 PS: as I understand it, Cassandra going down from time to time is normal?

Re: memory consuption

2011-02-17 Thread Victor Kabdebon
Oh right, but Cassandra doesn't really respect that; I thought there was
another option to set that.

Just for your information, I set Xms and Xmx very low with a small amount of
data. I am waiting to be able to connect jconsole; I don't know why it is
not reachable at the moment. Here is my result:

105  26115  0.2 27.3 1125328 755316 ?  Sl   Feb09  23:58
/usr/bin/java -ea -Xms64M -Xmx128M

2011/2/17 Aaron Morton

 set Xms and Xmx in the JVM_OPTS



Re: memory consuption

2011-02-17 Thread Victor Kabdebon
Sorry I forgot to say that this is the partial result of :
ps aux | grep cassandra

Best regards

2011/2/17 Victor Kabdebon

 Oh right but Cassandra doesn't really respect that, I thought there was
 another option to set that.

 Just for your information, I set xms and xmx very low with a small amount
 of data. I am waiting to be able to connect jconsole, I don't know why it is
 not reachable at the moment. Here is my result :

 105  26115  0.2 27.3 1125328 755316 ?  Sl   Feb09  23:58
 /usr/bin/java -ea -Xms64M -Xmx128M


Re: Cassandra memory consumption

2011-02-16 Thread Victor Kabdebon
Yes, I didn't see there were two different parameters. I was personally
setting (in Cassandra 0.6.6) MemtableThroughputInMB, but I don't know what
BinaryMemtableThroughputInMB is.

And I take this opportunity to ask a question:
if you have a small amount of data per key, so that your memtable is maybe a
few KB big, is the memory footprint of the memtable going to be
MemtableThroughputInMB MB, or a few KB + overhead?

Ruslan, I have seen your question in the other mail and I have the same
problem. How many CFs do you have?

2011/2/16 ruslan usifov

 Each of your 21 column families will have its own memtable if you have
 the default memtable settings your memory usage will grow quite large
 over time. Have you tuned down your memtable size?

 Which config parameter make this? binary_memtable_throughput_in_mb?

Re: Cassandra memory consumption

2011-02-16 Thread Victor Kabdebon
Someone please correct me if I am wrong, but I think the overhead you can
expect is something like :

16 * MemtableThroughputInMB
 but I don't know when BinaryMemtableThroughputInMB comes into account..

2011/2/16 ruslan usifov

 2011/2/16 Victor Kabdebon

 Ruslan I have seen your question in the other mail and I have the same
 problem. How many CF do you have ?


Re: Cassandra memory consumption

2011-02-16 Thread Victor Kabdebon
Thanks robert, and do you know if there is a way to control the maximum
likely number of memtables ? (I'd like to cap it at 2)

2011/2/16 Robert Coli

 On Wed, Feb 16, 2011 at 7:12 AM, Victor Kabdebon wrote:
  Someone please correct me if I am wrong, but I think the overhead you can
  expect is something like :

 MemtableThroughputInMB * JavaOverheadFudgeFactor * maximum likely
 number of such memtables which might exist at once, due to flushing

 JavaOverheadFudgeFactor is at least 2.

 The maximum likely number of such memtables is usually roughly 3
 when considered across an assortment of columnfamilies with different
 write patterns.
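
 As a back-of-the-envelope version of the formula above: the 64 MB throughput value is an assumption (I believe it is the shipped 0.6 default, but check your own storage-conf.xml), the 21 column families come from earlier in this thread, and the function name is made up. Multiplying by the number of column families follows from each one having its own memtable.

```python
def memtable_heap_estimate_mb(throughput_mb, column_families,
                              fudge_factor=2, max_memtables=3):
    """Estimate = throughput * fudge factor * memtables in flight, per CF."""
    per_cf = throughput_mb * fudge_factor * max_memtables
    return per_cf * column_families

print(memtable_heap_estimate_mb(64, 1))
print(memtable_heap_estimate_mb(64, 21))
```

 This is a worst-case figure that assumes every memtable actually fills up; with very little data written, the real footprint should be far below it.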

   but I don't know when BinaryMemTableThroughputInMb come into account..

 BinaryMemTable options are only considered when using the Binary
 Memtable interface. If you don't know what that is, you're not using


Re: online chat scenario

2011-02-15 Thread Victor Kabdebon
Hello Sasha.

In this sort of real-time application, the way you insert (QUORUM, ONE,
etc.) and the way you retrieve are extremely important, because your data
may not have had time to propagate to all your nodes. Be sure to use
adequate policies for that: insert to a certain number of nodes, but don't
sacrifice too much time doing so, to keep the real-time component.
Here is a presentation of how chat is built at Facebook; it may be useful
to you:

It's more focused on Erlang, but it might give you ideas on how to deal with
that problem (I am not sure that DBs are the best way to deal with that...
but it's just my opinion).
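
To illustrate the trade-off between consistency levels: a read is guaranteed to reach at least one replica that has the latest write when the number of replicas written plus the number read exceeds the replication factor. The sketch below just encodes that rule; the function names are my own.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1

def read_sees_latest_write(write_replicas, read_replicas, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    return write_replicas + read_replicas > replication_factor

rf = 3
q = quorum(rf)
print(q)
print(read_sees_latest_write(q, q, rf))  # QUORUM writes + QUORUM reads
print(read_sees_latest_write(1, 1, rf))  # ONE + ONE
```

With a replication factor of 3, QUORUM on both sides overlaps while ONE/ONE does not, which is one way to get the "sometimes 3 columns, sometimes 50" effect on reads that follow writes closely.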

Victor Kabdebon

2011/2/15 Sasha Dolgy

 thanks for the response.  thinking about this, this would not allow for the
 sorting of messages into a chronological order for end user display.  i had
 thought about having each message as its own column against the room or the
 user, but i have had some inconsistencies in retrieving the data.  sometimes
 i get 3 columns, sometimes i get 50...( i think this is because of the
 random partitioner)

 i had thought about this structure:

 [messages][nickname][message id = message data]
 [chatrooms][room_name][message id]

 this way i can pull all messages a user ever posted, not specific to a
 room.  what i haven't been able to do so far is print the timestamp on the
 row or column.  does this have to be explicitly added somewhere or can it be
 returned as part of a 'get' request?


 On Tue, Feb 15, 2011 at 2:12 PM, Michal Augustýn wrote:

 The schema design depends on chatrooms/users/messages numbers. I.e. you
 can have one CF, where key is chatroom, column name is username, column
 value is the message and message time is the same as column timestamp.
 You can add day-timestamp to the chatroom name to avoid large rows.


 2011/2/15 Andrey V. Panov

 I never did it. But I suppose you can use chatroom name as key and store
 messages  nicks as columns in JSON and timestamp as columnName.
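
 A rough sketch of that layout, combined with the day-bucket idea from the previous reply. A dict stands in for the column family, and all names are hypothetical. Sorting by column name is done by hand here; in Cassandra the column comparator would do it.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
rows = {}  # hypothetical stand-in for a CF: row key -> {column name: value}

def row_key(room, ts):
    """Bucket each room by day to keep rows from growing without bound."""
    return f"{room}:{ts.strftime('%Y%m%d')}"

def post(room, nick, text, ts):
    """Column name is the timestamp in microseconds; value is a JSON blob."""
    micros = (ts - EPOCH) // timedelta(microseconds=1)
    rows.setdefault(row_key(room, ts), {})[micros] = json.dumps(
        {"nick": nick, "text": text})

def history(room, day):
    """Return the messages for one room and day in chronological order."""
    columns = rows.get(row_key(room, day), {})
    return [json.loads(columns[name]) for name in sorted(columns)]

t1 = datetime(2011, 2, 15, 14, 12, 0, tzinfo=timezone.utc)
t2 = datetime(2011, 2, 15, 14, 12, 5, tzinfo=timezone.utc)
post("lobby", "sasha", "second", t2)   # inserted out of order on purpose
post("lobby", "michal", "first", t1)
print([m["text"] for m in history("lobby", t1)])
```

 Two messages landing on the same microsecond would overwrite each other in this sketch; a time-based UUID as column name avoids that.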

 Sasha Dolgy

Re: Subscribe

2011-02-15 Thread Victor Kabdebon
Looks like your wish has been granted.

2011/2/15 Chris Goffinet

 I would like to subscribe to your newsletter.

 On Tue, Feb 15, 2011 at 8:04 AM, A J wrote:

Re: unique key generation

2011-02-09 Thread Victor Kabdebon
Yes, I made a mistake, I know! But I hoped nobody would notice :).

It is the odds of winning 3 days in a row (standard probability fail). Still,
it is totally unlikely.

Sorry about this mistake,

Best regards,
Victor K.

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
It is really weird that I am the only one to have this issue.
I restarted Cassandra today and already the memory consumption is over the
limit:

root  1739  4.0 24.5 664968 *494996* pts/4   SLl  15:51   0:12
/usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-Dstorage-config=bin/../conf -cp

It is really an annoying problem if we cannot really foresee memory

Best regards,
Victor K

2011/2/8 Victor Kabdebon

 Dear all,

 Sorry to come back to this point again, but I am really worried about
 Cassandra memory consumption. I have a single machine that runs one
 Cassandra server. There is almost no data on it, but I see crazy memory
 consumption, and it doesn't respect the settings at all...
 Note that I am not using mmap but Standard; I also use JNA (inside the lib
 folder), and I am running on 64-bit Debian 5, so a pretty normal configuration.
 I also use Cassandra 0.6.8.

 Here is the information I gathered on Cassandra:

 105  16765  0.1 34.1 1089424* 687476* ?  Sl   Feb02  14:58
 /usr/bin/java -ea* -Xms128M* *-Xmx256M* -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
 -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp

 result of nodetool info :

 *Load : 1,94 MB*
 Generation No: 1296673772
 Uptime (seconds) : 467550
 *Heap Memory (MB) : 120,26 / 253,94*

 I have about 21 column families; none of them holds a lot of information (
 as you see, I have 2 MB of text, which is really small). Even though I set Xmx
 to 256, there is 687M of memory used. Where does this memory come from? Bad
 garbage collection? Something I am not aware of?
 Thank you for your help; I really need to get rid of that problem.

 Best regards,
 Victor Kabdebon

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Sorry Jonathan :

So most of this information was taken using the command:

sudo ps aux | grep cassandra

For the nodetool information it is :

/bin/nodetool --host localhost --port 8081 info


Victor K.

2011/2/8 Jonathan Ellis

 I missed the part where you explained where you're getting your numbers


 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Information on the system :

*Debian 5*
*Jvm :*
victor@testhost:~/database/apache-cassandra-0.6.6$ java -version
java version 1.6.0_22
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)

*RAM :* 2 GB

2011/2/8 Victor Kabdebon

 Sorry Jonathan :

 So most of these informations were taken using the command :

 sudo ps aux | grep cassandra

 For the nodetool information it is :

 /bin/nodetool --host localhost --port 8081 info


 Victor K.


Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
I will do that in the future and I will post my results here (I upgraded
the server to Debian 6 to see if there is any change, so memory is back to
normal). I will report in a few days.
In the meantime I am open to any suggestions...

2011/2/8 Aaron Morton

 When you attach to the JVM with JConsole how much non heap memory and how
 much heap memory is reported on the memory tab?

 Xmx controls the total size of the heap memory, which excludes the
 permanent generation.

 Total non-heap memory on a 0.7 box I have is around 27M. Your numbers seem
 large but it would be interesting to know what the JVM is reporting.
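
 For comparing the `ps aux` figures with the heap limit, a small sketch that pulls the resident set size (6th column, in KB) and the -Xmx value out of one line. The sample line is abbreviated from the output posted earlier in this thread; the function name is my own.

```python
import re

def parse_ps_line(line):
    """Extract RSS (KB, 6th column of `ps aux`) and -Xmx (MB) from one line."""
    fields = line.split()
    rss_kb = int(fields[5])
    match = re.search(r"-Xmx(\d+)M", line)
    xmx_mb = int(match.group(1)) if match else None
    return rss_kb, xmx_mb

line = ("105  16765  0.1 34.1 1089424 687476 ?  Sl   Feb02  14:58 "
        "/usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC")
rss_kb, xmx_mb = parse_ps_line(line)
rss_mb = rss_kb / 1024
print(round(rss_mb, 1), xmx_mb, round(rss_mb - xmx_mb, 1))
```

 Since the heap can account for at most the Xmx value, the difference is a lower bound on resident memory used outside the Java heap, which is the part JConsole's non-heap figure and the OS-level numbers would need to explain.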


 On 09 Feb, 2011,at 05:57 AM, Victor Kabdebon

 Information on the system :

 *Debian 5*
 *Jvm :*
 victor@testhost:~/database/apache-cassandra-0.6.6$ java -version
 java version 1.6.0_22
 Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)

 *RAM :* 2Go

 2011/2/8 Victor Kabdebon

 Sorry Jonathan :

 So most of these informations were taken using the command :

 sudo ps aux | grep cassandra

 For the nodetool information it is :

 /bin/nodetool --host localhost --port 8081 info


 Victor K.

 2011/2/8 Jonathan Ellis

 I missed the part where you explained where you're getting your numbers

 On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon wrote:
  It is really weird that I am the only one to have this issue.
  I restarted Cassandra today and already the memory consumption is over
  limit :
  root  1739  4.0 24.5 664968 494996 pts/4   SLl  15:51   0:12
  /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
  -Dstorage-config=bin/../conf -cp

  It is really an annoying problem if we cannot really foresee memory
  Best regards,
  Victor K
  2011/2/8 Victor Kabdebon
  Dear all,
  Sorry to come back again to this point but I am really worried about
  Cassandra memory consumption. I have a single machine that runs one
  Cassandra server. There is almost no data on it but I see a crazy
  consumption and it doesn't care at all about the instructions...
  Note that I am not using mmap, but Standard, I use also JNA (inside
  folder), i am running on debian 5 64 bits, so a pretty normal
  I also use Cassandra 0.6.8.
  Here are the informations I gathered on Cassandra :
  105  16765  0.1 34.1 1089424 687476 ?  Sl   Feb02  14:58
 I think you are

  /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
  -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Yes I have, but I should add that this server holds so little data (2.0 MB
of text, roughly a book) that even if those things added overhead, it would
be minimal.
I don't understand what is eating up all that memory. Is it because Linux
has difficulty reclaiming used memory? I really am puzzled. (By the way, this
is not an Amazon EC2 server, it is a dedicated server.)

Victor K.

2011/2/8 Edward Capriolo

 On Tue, Feb 8, 2011 at 4:56 PM, Victor Kabdebon wrote:
  I will do that in the future and I will post my results here ( I upgraded
  the server to debian 6 to see if there is any change, so memory is back
  normal). I will report in a few days.
  In the meantime I am open to any suggestion...
  2011/2/8 Aaron Morton
  When you attach to the JVM with JConsole how much non heap memory and
  much heap memory is reported on the memory tab?
  Xmx controls the total size of the heap memory, which excludes the
  permanent generation.
  Total non-heap memory on a 0.7 box I have is around 27M. You numbers
  large but it would be interesting to know what the JVM is reporting.
  On 09 Feb, 2011,at 05:57 AM, Victor Kabdebon
  Information on the system :
  Debian 5
  Jvm :
  victor@testhost:~/database/apache-cassandra-0.6.6$ java -version
  java version 1.6.0_22
  Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
  Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
  RAM : 2Go
  2011/2/8 Victor Kabdebon
  Sorry Jonathan :
  So most of these informations were taken using the command :
  sudo ps aux | grep cassandra
  For the nodetool information it is :
  /bin/nodetool --host localhost --port 8081 info
  Victor K.
  2011/2/8 Jonathan Ellis
  I missed the part where you explained where you're getting your
  On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon wrote:
   It is really weird that I am the only one to have this issue.
   I restarted Cassandra today and already the memory consumption is over
   limit :
   root  1739  4.0 24.5 664968 494996 pts/4   SLl  15:51   0:12
   /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
   -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
   -Dstorage-config=bin/../conf -cp
   It is really an annoying problem if we cannot really foresee memory
   Best regards,
   Victor K
   2011/2/8 Victor Kabdebon
   Dear all,
   Sorry to come back again to this point but I am really worried
   Cassandra memory consumption. I have a single machine that runs one
   Cassandra server. There is almost no data on it but I see a crazy
   consumption and it doesn't care at all about the instructions...
   Note that I am not using mmap, but Standard, I use also JNA
   folder), i am running on debian 5 64 bits, so a pretty normal
   I also use Cassandra 0.6.8.
   Here are the informations I gathered on Cassandra :
   105  16765  0.1 34.1 1089424 687476 ?  Sl   Feb02  14:58
   I think you are
   /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
   -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8

Re: unique key generation

2011-02-07 Thread Victor Kabdebon
Hello Kallin.
If you use a TimeUUID, the chance of generating the same UUID twice is the
following :
assuming that both clients generate the UUID in the *same millisecond*,
the chance of generating the same UUID is :

1 / (1.84467441 × 10^19), i.e. 1 in 2^64.

That is equal to the probability of winning a national
lottery for 1e11 days in a row (for 270 million years).
Well if you do have a collision you should play the lottery :).
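
To make the figure concrete, here is a minimal sketch, assuming Python and its
standard uuid module on the client (nothing in this thread says which client
language is used, and new_session_id is just a name I made up). It only checks
that 1.84467441 × 10^19 is 2^64 and shows how a time-based (version 1) UUID
could be generated as a session key.

```python
import uuid

# The 1.84467441 x 10^19 figure quoted above is 2**64.
SPACE = 2 ** 64
COLLISION_PROBABILITY = 1 / SPACE


def new_session_id() -> str:
    """Return a time-based (version 1) UUID as a string, usable as a row key.

    Hypothetical helper for illustration only.
    """
    return str(uuid.uuid1())


print(f"possible values: {SPACE:.8e}")
print(f"collision probability: {COLLISION_PROBABILITY:.3e}")
print("example session id:", new_session_id())
```

Generating the ID on the client like this avoids a round trip to a central ID
service such as ZooKeeper.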

Best regards,
Victor Kabdebon

2011/2/7 Kallin Nagelberg


 I am developing a session management system using Cassandra and need
 to generate unique sessionIDs (cassandra columnfamily keys). Does
 anyone know of an elegant/simple way to accomplish this? I am not sure
 about using time based uuids on the client as there a chance that
 multiple clients could generate the same ID. I've heard suggestions of
 using zookeeper as a source for the IDs, but was just hoping that
 there might be something simpler for my purposes.


Cassandra memory consumption

2011-02-07 Thread Victor Kabdebon
Dear all,

Sorry to come back to this point again, but I am really worried about
Cassandra's memory consumption. I have a single machine that runs one
Cassandra server. There is almost no data on it, but I see a crazy memory
consumption that does not respect the JVM settings at all...
Note that I am not using mmap but Standard disk access, and I also use JNA
(inside the lib folder). I am running Debian 5 64-bit, so a pretty normal
configuration. I also use Cassandra 0.6.8.

Here is the information I gathered on Cassandra :

105  16765  0.1 34.1 *1089424 687476* ?  Sl   Feb02  14:58
/usr/bin/java -ea *-Xms128M* *-Xmx256M* -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp

result of nodetool info :

*Load : 1,94 MB*
Generation No: 1296673772
Uptime (seconds) : 467550
*Heap Memory (MB) : 120,26 / 253,94*

I have about 21 column families, and none of them holds much data (as you
can see, I have 2 MB of text, which is really small). Even though I set Xmx to
256M, 687M of memory is in use. Where does this memory come from? Bad garbage
collection? Something I am not aware of?
Thank you for your help, I really need to get rid of this problem.
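
For anyone who wants to reproduce the comparison, here is a minimal sketch,
assuming Python and the Linux `ps aux` column layout (VSZ and RSS in KiB in the
5th and 6th fields). parse_ps_line is a hypothetical helper, not part of any
Cassandra tooling. Keep in mind that RSS covers the whole process, not only
the Java heap that -Xmx limits.

```python
import re


def parse_ps_line(line: str) -> dict:
    """Extract VSZ and RSS (converted to MiB) and the -Xmx value (MiB) from one `ps aux` line."""
    fields = line.split()
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
    vsz_kib, rss_kib = int(fields[4]), int(fields[5])
    match = re.search(r"-Xmx(\d+)[Mm]", line)
    xmx_mib = int(match.group(1)) if match else None
    return {
        "vsz_mib": vsz_kib / 1024,
        "rss_mib": rss_kib / 1024,
        "xmx_mib": xmx_mib,
    }


# The ps line from this message, with the classpath left out.
line = (
    "105  16765  0.1 34.1 1089424 687476 ?  Sl   Feb02  14:58 "
    "/usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC"
)
info = parse_ps_line(line)
print(f"RSS {info['rss_mib']:.0f} MiB vs. max heap {info['xmx_mib']} MiB")
```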

Best regards,
Victor Kabdebon

Re: revisioned data

2011-02-05 Thread Victor Kabdebon
Hello Raj,

No, it actually doesn't make sense from the point of view of Cassandra:
the OrderingPartitioner preserves the order of the *keys*. In your model the
ordering will be done according to the *supercolumn name*. You can set that
ordering with compare_with, and compare_subcolumns_with will order the
columns inside each supercolumn (sorry, I don't remember exactly the new
terms in Cassandra, but that's the idea).

However, and I think that many will agree here, I would tend to avoid
SuperColumns. Rather than using SuperColumns, try to think like this :

CF1 : ObjectStore
Key : ID (long)
Columns : {
   other fields
   update time (long [date])
}

CF2 : ObjectOrder
Key : myorderedobjects
Columns : {
   { name : identifier that can be sorted,
     value : ObjectID },
   ...
}
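
To illustrate the idea, here is a rough in-memory sketch in Python, with plain
dicts standing in for the two column families (no Cassandra client involved).
Several things are my own assumptions for the example: each version is stored
under a hypothetical "<id>:<update_time>" row key, ObjectOrder has one row per
object ID rather than a single myorderedobjects row, and the sortable column
name is simply the update time.

```python
import bisect

# CF1 "ObjectStore": row key -> columns of one stored version.
object_store = {}
# CF2 "ObjectOrder": object id -> list of (update_time, ObjectStore row key),
# kept sorted by update_time the way a comparator would sort column names.
object_order = {}


def save_version(obj_id, update_time, fields):
    """Store a new version and index it by update time."""
    row_key = f"{obj_id}:{update_time}"  # hypothetical key format
    object_store[row_key] = dict(fields, update_time=update_time)
    bisect.insort(object_order.setdefault(obj_id, []), (update_time, row_key))


def as_of(obj_id, when):
    """Return the latest version with update_time <= when, or None."""
    columns = object_order.get(obj_id, [])
    times = [update_time for update_time, _ in columns]
    idx = bisect.bisect_right(times, when) - 1
    if idx < 0:
        return None
    return object_store[columns[idx][1]]


save_version(625, 1000245242, {"name": "first"})
save_version(625, 1000300000, {"name": "second"})
print(as_of(625, 1000250000))
```

With a real column family the as_of lookup would be a reversed column slice
starting at the queried time with a count of 1.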

Best regards,
Victor Kabdebon,

2011/2/5 Raj Bakhru

 Hi all -

 We're new to Cassandra and have read plenty on the data model, but we
 wanted to poll for thoughts on how to best handle this structure.

 We have simple objects that have and ID and we want to maintain a history
 of all the revisions.

 ID (long)
 other fields
 update time (long [date])

 Any time the object changes, we'll store down a new version of the object
 (same ID, but different update time and other fields).  We need to be able
 to query out what the object was as-of any time historically.  We also need
 to be able to query out what some or all of the items of this object type
 were as-of any time historically..

 In SQL, we'd just find the max(id) where update time  queried_as_of_time

 In Cassandra, we were thinking of modeling as follows:

 CF:  MyObjectType
 Super-Column: ID of object (e.g. 625)
 Column:  updatetime  (e.g. 1000245242)
 Value: byte[] of serialized object

 We were thinking of using the OrderingPartitioner and using range queries
 against the data.

 Does this make sense?  Are we approaching this in the wrong way?

 Thanks a lot

Re: Using Cassandra to store files

2011-02-03 Thread Victor Kabdebon
Dear Brendan,

I would really be interested in your findings too. I need a system to store
various documents, and I am thinking of either Cassandra (which I am already
using), a second type of database, or some other system. Maybe, as Dan
suggested, MogileFS.

Thank you,
Victor Kabdebon

2011/2/3 Dan Kuebrich


 That's not what document-oriented means! (har har)

 I don't know all the details of your case, but with serving static files I
 suspect you could do ok with something that has a much smaller memory/cpu
 footprint as you won't have as great of write throughput / read latency
 concerns.  I've used mogilefs for this



Re: Cassandra and count

2011-01-28 Thread Victor Kabdebon
buddhasystem is right.
A count returns the columns to the client, which then counts them. My advice :
do not count big columns / supercolumns. People in the dev team are working
on distributed counters, but I don't know the state of that work.
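
If you do have to count on the client, paging keeps memory bounded. Below is a
minimal sketch in Python under stated assumptions: fetch_slice(start, limit)
is a hypothetical stand-in for a column-slice call with an inclusive start
column, and make_fake_row simulates a row in memory so the example runs
without a cluster.

```python
import bisect


def make_fake_row(names):
    """In-memory stand-in for one row: returns a slice function over sorted column names."""
    ordered = sorted(names)

    def fetch_slice(start, limit):
        # Inclusive start, as in a column slice; "" means "from the first column".
        i = bisect.bisect_left(ordered, start) if start else 0
        return ordered[i:i + limit]

    return fetch_slice


def count_columns(fetch_slice, page_size=100):
    """Count the columns of a row by paging, holding at most one page in memory."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    total, start = 0, ""
    while True:
        page = fetch_slice(start, page_size)
        fetched = len(page)
        if start and page and page[0] == start:
            page = page[1:]  # drop the overlapping column, already counted
        total += len(page)
        if fetched < page_size:
            return total
        start = page[-1]


row = make_fake_row([f"col{i:04d}" for i in range(250)])
print(count_columns(row, page_size=100))
```

This still transfers every column name over the network, so it stays slow for
very wide rows.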

Best regards,
Victor Kabdebon

2011/1/28 buddhasystem

 As far as I know, there are no aggregate operations built into Cassandra,
 which means you'll have to retrieve all of the data to count it in the
 client. I had a thread on this topic 2 weeks ago. It's pretty bad.


Re: cass0.7: Creating colum family Sorting

2011-01-16 Thread Victor Kabdebon
The comparator only sorts the columns inside a row (key).
Sorting of the keys themselves is done by your partitioner.
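
A small Python sketch of the difference, assuming you are on the default
RandomPartitioner, which places rows by an MD5 hash of the key. The token
function below is a simplified stand-in, not the exact token computation, so
the row order it prints will not necessarily match your listing. It only shows
that rows come out in hash order while the columns inside each row come out
sorted by name.

```python
import hashlib


def token(row_key: str) -> int:
    """MD5 of the row key as an integer (simplified stand-in for a partitioner token)."""
    return int.from_bytes(hashlib.md5(row_key.encode("utf-8")).digest(), "big")


def list_rows(rows: dict) -> list:
    """Return rows in token order, each with its columns sorted by column name."""
    return [(key, sorted(rows[key].items())) for key in sorted(rows, key=token)]


rows = {
    "Afghanistan": {"cname": "Afghanistan", "code": "AF", "cid": "1"},
    "Germany": {"code": "DE", "cid": "2", "cname": "Germany"},
    "Zimbabwe": {"cid": "3", "cname": "Zimbabwe", "code": "ZM"},
}

for key, columns in list_rows(rows):
    print(key, [name for name, _ in columns])
```

To get countries back in alphabetical order, one option is to store the
country names as column names in a single row with a UTF8Type comparator.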

Best regards,
Victor Kabdebon

2011/1/16 kh jo

 I am having some problems with creating column families and sorting them,

 I want to create a countries column family where I can get a sorted list of
 countries(by country's name)

 the following command fails:

 create column family Countries with comparator=LongType
 and column_metadata=[
 {column_name: cid, validation_class: LongType, index_type: KEYS},
 {column_name: cname, validation_class: UTF8Type},
 {column_name: code, validation_class: UTF8Type, index_type: KEYS}

 IT SHOWS: 'id' could not be translated into a LongType.

 the following works:

 create column family Countries with comparator=UTF8Type
 and column_metadata=[
 {column_name: cid, validation_class: LongType, index_type: KEYS},
 {column_name: cname, validation_class: UTF8Type},
 {column_name: code, validation_class: UTF8Type, index_type: KEYS}

 but when I insert some columns, they are not sorted as I want

 $countries = new ColumnFamily(Cassandra::con(), 'Countries');
 $countries->insert('Afghanistan', array('cid' => '1', 'cname' =>
 'Afghanistan', 'code' => 'AF'));
 $countries->insert('Germany', array('cid' => '2', 'cname' => 'Germany',
 'code' => 'DE'));
 $countries->insert('Zimbabwe', array('cid' => '3', 'cname' => 'Zimbabwe',
 'code' => 'ZM'));

 list Countries;

 RowKey: Germany
 => (column=cid, value=2, timestamp=1295211346716047)
 => (column=cname, value=Germany, timestamp=1295211346716047)
 => (column=code, value=DE, timestamp=1295211346716047)
 RowKey: Zimbabwe
 => (column=cid, value=3, timestamp=1295211346713570)
 => (column=cname, value=Zimbabwe, timestamp=1295211346713570)
 => (column=code, value=ZM, timestamp=1295211346713570)
 RowKey: Afghanistan
 => (column=cid, value=1, timestamp=1295211346709448)
 => (column=cname, value=Afghanistan, timestamp=1295211346709448)
 => (column=code, value=AF, timestamp=1295211346709448)

 I don't see any sorting here?!

Re: Cassandra in less than 1G of memory?

2011-01-16 Thread Victor Kabdebon
If it were caused by Linux swapping, wouldn't I only see swap usage rise?
The problem (apart from swap growing bigger and bigger) is that Cassandra's
RAM consumption is going through the roof.

However, I do want to give the proposed method a try.

Thank you very much,
Best Regards,
Victor Kabdebon

PS : memory consumption :

root 19093  0.1 35.8 *1362108 722312* ?  Sl   Jan11  14:01
/usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp

2011/1/16 Aaron Morton

 The OS will make its best guess as to how much memory it can give over to
 mmapped files. Unfortunately it will not always make the best decision; see
 the information on adding JNA and mlockall() support in Cassandra 0.6.5. As
 Jonathan says, try setting the disk mode to standard to see the difference.

 WRT the resident memory for the process, not all memory allocation is done
 on the heap. To see the non-heap usage, connect to the process using
 JConsole and take a look at the Memory tab. For example, on my box right now
 Cassandra has 110M of heap memory and 20M of non-heap. AFAIK memory such as
 the class definitions is not included in the heap memory usage.

 Hope that helps.

 On 15 Jan, 2011,at 08:03 PM, Victor Kabdebon

 Hi Jonathan, hi Edward,

 Jonathan : but it looks like mmapping wants to consume the entire memory of
 my server. It goes up to 1.7 GB for a ridiculously small amount of data.
 Am I doing something wrong, or is there something I should change to prevent
 this never-ending increase in memory consumption ?
 Edward : I am not sure, I will try to check that tomorrow, but my disk access
 mode is standard, not mmap.

 Anyway thank you very much,
 Victor K.

 PS : here is the result of ps aux | grep cassandra some hours later
 root 19093  0.1 30.0 1243940 *605060* ?  Sl   Jan11  10:15
 /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
 -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp

 2011/1/15 Jonathan Ellis

 mmapping only consumes memory that the OS can afford to feed it.

 On Fri, Jan 14, 2011 at 7:29 PM, Edward Capriolo
  On Fri, Jan 14, 2011 at 2:13 PM, Victor Kabdebon wrote:
  Dear rajat

Re: Cassandra in less than 1G of memory?

2011-01-14 Thread Victor Kabdebon
Dear rajat,

Yes, it is possible; I have the same constraints. However, I must warn you:
from what I see, Cassandra's memory consumption is not bounded in 0.6.X on
64-bit Debian.

Here is an example of an instance launched on a node :

root 19093  0.1 28.3 1210696 *570052* ?  Sl   Jan11   9:08
/usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp

Look at the second bold value: Xmx indicates the maximum memory that
Cassandra can use. It is set to 512 MB, so it should easily fit into 1 GB.
Now look at the first one: 570 MB > 512 MB. Moreover, if I come back in one
day the first value will be even higher, probably around 610 MB. It actually
keeps increasing to the point where I need to restart Cassandra, otherwise
Linux shuts down other programs so that Cassandra can further expand its
memory usage...

By the way, this is a call to other Cassandra users: am I the only one to
encounter this problem ?

Best regards,

Victor K.

2011/1/14 Rajat Chopra


 According to  JVM heap size topic at , Cassandra would need
 at least 1G of memory to run. Is it possible to have a running Cassandra
 cluster with machines that have less than that memory… say 512M?

 I can live with slow transactions, no compactions etc, but do not want an
 OutOfMemory error. The reason for a smaller bound for Cassandra is that I
 want to leave room for other processes to run.

 Please help with specific parameters to tune.



Re: Cassandra in less than 1G of memory?

2011-01-14 Thread Victor Kabdebon
Hi Jonathan, hi Edward,

Jonathan : but it looks like mmapping wants to consume the entire memory of
my server. It goes up to 1.7 GB for a ridiculously small amount of data.
Am I doing something wrong, or is there something I should change to prevent
this never-ending increase in memory consumption ?
Edward : I am not sure, I will try to check that tomorrow, but my disk access
mode is standard, not mmap.

Anyway thank you very much,
Victor K.

PS : here is the result of ps aux | grep cassandra some hours later
root 19093  0.1 30.0 1243940 *605060* ?  Sl   Jan11  10:15
/usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp

2011/1/15 Jonathan Ellis

 mmapping only consumes memory that the OS can afford to feed it.

 On Fri, Jan 14, 2011 at 7:29 PM, Edward Capriolo
  On Fri, Jan 14, 2011 at 2:13 PM, Victor Kabdebon wrote:
  Dear rajat,
  Yes it is possible, I have the same constraints. However I must warn
  from what I see Cassandra memory consumption is not bounded in 0.6.X on
  debian 64 Bit
  Here is an example of an instance launch in a node :
  root 19093  0.1 28.3 1210696 570052 ?  Sl   Jan11   9:08
  /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
  -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
  Look at the second bold value, Xmx indicates the maximum memory that
  cassandra can use; it is set to be 512, so it could easily fit into 1
  Now look at the first one, 570Mb > 512 Mb. Moreover if I come back in
  day the first value will be even higher. Probably around 610 Mb. Actually it
  increases to the point where I need to restart it otherwise other
  are shut down by Linux for cassandra to further expand its memory
  By the way it's a call to other cassandra users, am I the only one to
  encounter this problem ?
  Best regards,
  Victor K.
  2011/1/14 Rajat Chopra
  According to  JVM heap size topic at , Cassandra would
  at least 1G of memory to run. Is it possible to have a running Cassandra
  cluster with machines that have less than that memory… say 512M?
  I can live with slow transactions, no compactions etc, but do not want
  OutOfMemory error. The reason for a smaller bound for Cassandra is that

Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
Dear all,

In a project I would like to store big objects in columns, serialized: for
example entire images (several KB to several MB), flash animations (several
MB), etc.
Does anyone use Cassandra with such relatively big columns and, if so, does
it work well? Are there any drawbacks to this method?

Thank you,
Victor K.

Re: Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
Is there any recommended maximum size for a column? (Not the hard upper
limit, which is 2 GB.)
Why is it useful to chunk the content into multiple columns?

Thank you,
Victor K.

2011/1/13 Ryan King

 On Thu, Jan 13, 2011 at 2:38 PM, Victor Kabdebon wrote:
  Dear all,
  In a project I would like to store big objects in columns, serialized.
  example entire images (several Ko to several Mo), flash animations
  Mo) etc...
  Does someone use Cassandra with those relatively big columns and if yes
  it work well ? Is there any drawbacks using this method ?

 I haven't benchmarked this myself, but I think you'll want to chunk
 your content into multiple columns in the same row.


Re: Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
OK, thank you very much for this information!
If somebody has more insight on this matter, I am still interested!

Victor K.

2011/1/13 Ryan King

 On Thu, Jan 13, 2011 at 2:44 PM, Victor Kabdebon wrote:
  Is there any recommended maximum size for a Column ? (not the very upper
  limit which is 2Gb)
  Why is it useful to chunk the content into multiple columns ?

 I think you're going to have to do some tests yourself.

 You want to chunk it so that you can pseudo-stream the content. You
 don't want to have to load the whole content at once.
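
For reference, here is a minimal Python sketch of the chunking idea. It is an
in-memory illustration only: a dict stands in for the row, and both the 64 KiB
chunk size and the "chunk-NNNNNNNN" column names are my own assumptions, since
no recommended values were given in this thread.

```python
CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not a recommended value


def to_columns(blob: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Split a blob into columns whose zero-padded names sort in chunk order."""
    return {
        f"chunk-{offset // chunk_size:08d}": blob[offset:offset + chunk_size]
        for offset in range(0, len(blob), chunk_size)
    }


def stream_columns(columns: dict):
    """Yield the chunks one at a time in column-name order (pseudo-streaming)."""
    for name in sorted(columns):
        yield columns[name]


blob = bytes(range(256)) * 1000  # 256,000 bytes of sample data
columns = to_columns(blob, chunk_size=1000)
print(len(columns), "columns")
```

A reader can then fetch a few columns at a time and start sending them to the
user without ever holding the whole object in memory.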
