Re: Replica data distributing between racks

2011-05-04 Thread Konstantin Naryshkin
The way that I understand it (and that seems to be consistent with what was said in this discussion) is that each DC has its own data space. Using your simplified 1-10 system:
    DC1   DC2
0   D1R1  D2R2
1   D1R1  D2R1
2   D1R1  D2R1
3   D1R1  D2R1
4   D1R1  D2R1
5   D1R2  D2R1
6   D1R2  D2R2
7   D1R2

Making a custom Cassandra RPM

2011-05-04 Thread Konstantin Naryshkin
I want to create a custom RPM of Cassandra (so I can deploy it pre-configured). There is an RPM in the source tree, but it does not contain any details of the setup required to create the RPM (what files should I have where). I have tried to run rpmbuild -bi on the spec file and I am getting

Re: Making a custom Cassandra RPM

2011-05-06 Thread Konstantin Naryshkin
Your Apache Ant install is too old. The Ant that comes with RHEL/CentOS 5.x isn't new enough to build Cassandra. You will need to install Ant manually. On Wed, May 4, 2011 at 2:01 PM, Konstantin Naryshkin konstant...@a-bb.net wrote: I want to create a custom RPM

Re: Making a custom Cassandra RPM

2011-05-06 Thread Konstantin Naryshkin
From: Konstantin Naryshkin konstant...@a-bb.net To: user@cassandra.apache.org Sent: Friday, May 6, 2011 2:56:43 PM Subject: Re: Making a custom Cassandra RPM Sorry that I did not get back to you on the issue. Your suggestion worked and I was able to get the RPM to build. Unfortunately, it still does not work

Forcing Cassandra to free up some space

2011-05-26 Thread Konstantin Naryshkin
I have a basic understanding of how Cassandra handles the file system (it flushes Memtables out to SSTables, and SSTables get compacted) and I understand that old files are only deleted when a node is restarted, when Java does a GC, or when Cassandra feels like it is running out of space. My

Re: Forcing Cassandra to free up some space

2011-05-26 Thread Konstantin Naryshkin
So, in summary, there is no way to predictably and efficiently tell Cassandra to get rid of all of the extra space it is using on disk? - Original Message - From: Jeffrey Kesselman jef...@gmail.com To: user@cassandra.apache.org Sent: Thursday, May 26, 2011 8:57:49 PM Subject: Re: Forcing

Re: pb deletion

2011-05-27 Thread Konstantin Naryshkin
What is the ConsistencyLevel of your reads? A ConsistencyLevel.ONE remove returns when it has deleted the record from at least 1 replica (and any other ones will be deleted when they can). It could be the case that you are deleting the record off of one node and then reading it off of the other

Re: bring out your rpms...

2011-06-14 Thread Konstantin Naryshkin
You could try to roll your own. I managed to create a custom 0.8 RPM using the spec file from the redhat directory. First check out the source. Then edit the spec file with the following changes: Set the Version and Release variables appropriately. At the end of %install, add the following 2

Re: Unable to access column family in CLI after building CF in CQL

2011-06-16 Thread Konstantin Naryshkin
The second error (the CQL select) is because you have different Key Validation Class values for your two user columns. users is org.apache.cassandra.db.marshal.BytesType, while users2 is org.apache.cassandra.db.marshal.UTF8Type. The select is failing because you are comparing a String to a

Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-06-30 Thread Konstantin Naryshkin
As I understand, it has to do with a node being up but missing the delete message (remember, if you apply the delete at CL.QUORUM, you can have almost half the replicas miss it and still succeed). Imagine that you have 3 nodes A, B, and C, each of which has a column 'foo' with a value 'bar'.
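A toy Python model of that scenario, to show why the repair has to happen before tombstones are purged. Replicas are plain dictionaries, and `delete`, `purge_tombstones`, `repair`, and `run` are hypothetical helpers written for this sketch, not Cassandra APIs.

```python
TOMBSTONE = None  # marker value for a deleted column

def make_replica():
    # column name -> (value, timestamp)
    return {"foo": ("bar", 1)}

def delete(replica, column, ts):
    """Record a tombstone instead of removing the column outright."""
    replica[column] = (TOMBSTONE, ts)

def purge_tombstones(replica):
    """What happens once the grace period has passed: tombstones are dropped."""
    for column in [c for c, (value, _) in replica.items() if value is TOMBSTONE]:
        del replica[column]

def repair(replicas):
    """Copy the newest version of every column to all replicas."""
    for column in set().union(*replicas):
        newest = max((r[column] for r in replicas if column in r), key=lambda cell: cell[1])
        for r in replicas:
            r[column] = newest

def run(repair_before_purge: bool):
    a, b, c = make_replica(), make_replica(), make_replica()
    delete(a, "foo", ts=2)
    delete(b, "foo", ts=2)       # C was up but missed the delete
    if repair_before_purge:
        repair([a, b, c])        # C learns about the tombstone in time
    for r in (a, b, c):
        purge_tombstones(r)      # the grace period has elapsed
    repair([a, b, c])            # a later repair
    return a, b, c

print(run(repair_before_purge=True))
print(run(repair_before_purge=False))
```

In the second run the tombstones are gone from A and B before C ever hears of them, so the later repair copies the old value from C back to everyone and the deleted column reappears.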

Re: insert a super column

2011-07-13 Thread Konstantin Naryshkin
A ColumnPath can contain a super column, so you should be fine inserting a super column family (in fact I do that). Quoting cassandra.thrift:
struct ColumnPath {
    3: required string column_family,
    4: optional binary super_column,
    5: optional binary column,
}
- Original Message

Re: deletion questions

2011-07-19 Thread Konstantin Naryshkin
2. Trying to reduce disk occupation I deleted a CF which used 90% of available space. After issuing a drop column family User; command no *User*.db files were deleted. nodetool compact hasn't helped either. How can that deletion be triggered? You have to wait for a garbage collect (or do a rolling

Re: best example of indexing

2011-07-20 Thread Konstantin Naryshkin
In the Cassandra CLI tutorial (http://wiki.apache.org/cassandra/CassandraCli), there is an example of creating a secondary index. Konstantin - Original Message - From: CASSANDRA learner cassandralear...@gmail.com To: user@cassandra.apache.org Sent: Wednesday, July 20, 2011 9:47:28 AM

Re: Cassandra start/stop scripts

2011-08-02 Thread Konstantin Naryshkin
As mentioned, there is an init.d script in the RPM package to start and stop Cassandra (it is what we use). If you do not use the RPM and don't want to or cannot install the full package, you can get just the script at: https://svn.apache.org/repos/asf/cassandra/trunk/redhat/cassandra -

Re: Question about eventually consistent in Cassandra

2011-08-02 Thread Konstantin Naryshkin
I believe that what would happen is that whichever data center has the later clock will win. Every modification you make gets a time stamp (generally set by your client to the current time, if you are using one). I believe that whatever modification happened with the last time stamp is
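A minimal Python sketch of that "latest time stamp wins" rule. The `Write` tuple, the `resolve` function, and the sample time stamps are assumptions made for the example; real time stamps come from whatever the client supplies.

```python
from typing import NamedTuple

class Write(NamedTuple):
    value: str
    timestamp: int  # client-supplied, e.g. microseconds since the epoch
    origin: str

def resolve(writes):
    """Return the write with the highest time stamp (last write wins)."""
    return max(writes, key=lambda w: w.timestamp)

dc1 = Write("from-dc1", 1_312_300_000_000_000, "DC1")
dc2 = Write("from-dc2", 1_312_300_000_000_500, "DC2")  # this client's clock is slightly ahead

winner = resolve([dc1, dc2])
print(winner.origin, winner.value)
```

The order in which the writes arrive does not matter, only the time stamps do, which is why a client with a fast clock can win even if its write actually happened first.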

Re: Problems using Thrift API in C

2011-08-04 Thread Konstantin Naryshkin
I have had similar issues when I generated Cassandra for Erlang. It seems that Thrift 0.6.1 (the latest stable version) does not work with Cassandra. Using Thrift 0.7 does. I had issues where it would give me runtime errors when trying to send an insert (it would not serialize correctly).

Re: Problems using Thrift API in C

2011-08-04 Thread Konstantin Naryshkin
- Original Message - From: Konstantin Naryshkin konstant...@a-bb.net To: user@cassandra.apache.org Cc: Sent: Thursday, August 4, 2011 10:36 AM Subject: Re: Problems using Thrift API in C I have had similar issues when I generated Cassandra for Erlang. It seems

Re: How to release a customised Cassandra from Eclipse?

2011-08-10 Thread Konstantin Naryshkin
When I build Cassandra, I use:
# ant
# ant release
It does produce a working cassandra.jar, though I am not sure if it will fulfill your needs since I make mine to create an RPM out of it. - Original Message - From: Norman Maurer norman.mau...@googlemail.com To: user@cassandra.apache.org

Re: Planet Cassandra is now live

2011-08-12 Thread Konstantin Naryshkin
Would you consider adding an RSS feed to the site for the benefit of those who like to use feed readers to keep track of unread posts and whatnot? - Original Message - From: Lynn Bender line...@gmail.com To: user@cassandra.apache.org Sent: Friday, August 12, 2011 2:18:45 PM Subject:

Re: Planet Cassandra is now live

2011-08-15 Thread Konstantin Naryshkin
Thanks. I did not see a link to it when I was sending my message. - Original Message - From: Zhu Han schumi@gmail.com To: user@cassandra.apache.org Sent: Saturday, August 13, 2011 12:11:37 AM Subject: Re: Planet Cassandra is now live On Sat, Aug 13, 2011 at 4:35 AM, Konstantin

Re: Reg row limit sorting

2011-08-17 Thread Konstantin Naryshkin
1. The 100 row limit is for listing (i.e. how many rows the list command will print). You can give list another limit: list User limit 1000; This limit has nothing to do with any internal Cassandra limitation. I am not aware of any limitation on the number of rows that you can have. 2. I

Re: Customized Secondary Index Schema

2011-08-25 Thread Konstantin Naryshkin
Why are you keeping all your indexes in the same row? We do a similar thing (maintain several indexes over the same data) and we just have an index column family with keys like dest192.168.0.1 which means destination index of 192.168.0.1. You can do rows like User_Keys_By_Last_Name_adams and

Re: Customized Secondary Index Schema

2011-08-25 Thread Konstantin Naryshkin
starting with adams_. Am I right? I want to know what's the cost difference between a range query and a slice query? If I can use either a composite key or a composite column name, which one gives me less query cost? 2011/8/25 Konstantin Naryshkin konstant...@a-bb.net Why are you keeping all your indexes

Re: Replicate On Write behavior

2011-09-01 Thread Konstantin Naryshkin
Yeah, I believe that Yan has a typo in his post. A CF is not read in one go, a row is. As for the scalability of having all the columns being read at once, I do not believe that it was ever meant to be. All the columns in a row are stored together, on the same set of machines. This means that if

Re: HUnavailableException: : May not be enough replicas present to handle consistency level.

2011-09-02 Thread Konstantin Naryshkin
I think that Oleg may have misunderstood how replicas are selected. If you have 3 nodes in your cluster and an RF of 2, Cassandra first selects which two nodes out of the 3 will get the data, and only then does it write it out. The selection is based on the row key, the token of the node, and
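A rough Python sketch of that selection step, under loudly stated assumptions: the MD5-based `token_for`, the evenly spaced tokens, the node names, and the rule "owning node plus the next ones around the ring" are all illustrative choices, not something taken from this thread or from Cassandra's source.

```python
import bisect
import hashlib

RING_SIZE = 2**127  # assumption: size of the token space

def token_for(row_key: bytes) -> int:
    """Map a row key to a token (MD5 is an illustrative stand-in for the partitioner)."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big") % RING_SIZE

def replicas_for(row_key: bytes, ring, rf: int):
    """Pick rf nodes for a key; ring is a list of (token, node) pairs sorted by token."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping to the start of the ring.
    start = bisect.bisect_left(tokens, token_for(row_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

# Hypothetical 3-node ring with evenly spaced tokens.
ring = [(i * RING_SIZE // 3, f"node{i + 1}") for i in range(3)]
for key in (b"alpha", b"beta", b"gamma"):
    print(key, replicas_for(key, ring, rf=2))
```

The point of the sketch is that the two replicas are fixed by the key before anything is written, so with one node down some keys still have both replicas available and others have only one.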

Re: Replace Live Node

2011-09-12 Thread Konstantin Naryshkin
The ring wraps around, so the value before 0 is the max possible token. I believe that it is 2**127 - 1. - Original Message - From: Kyle Gibson kyle.gib...@frozenonline.com To: user@cassandra.apache.org Sent: Monday, September 12, 2011 3:30:20 PM Subject: Re: Replace Live Node What
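The wrap-around is just modular arithmetic. A small Python sketch, assuming (as the message above believes) that tokens run from 0 to 2**127 - 1; `token_before` and `token_after` are hypothetical helper names:

```python
RING_SIZE = 2**127  # assumption: tokens run from 0 to 2**127 - 1

def token_before(token: int) -> int:
    """Token immediately before the given one, wrapping around the ring."""
    return (token - 1) % RING_SIZE

def token_after(token: int) -> int:
    """Token immediately after the given one, wrapping around the ring."""
    return (token + 1) % RING_SIZE

print(token_before(0) == 2**127 - 1)
print(token_after(2**127 - 1) == 0)
```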

Re: Configuring multi DC cluster

2011-09-15 Thread Konstantin Naryshkin
Wait, his nodes are going SC, SC, AT, AT. Shouldn't they go SC, AT, SC, AT? By which I mean that if he adds another node to the ring (or lowers the replication factor), he will have a node that is under-utilized. The rings in his data centers have the tokens: SC: 0, 1 AT:

Re: Column family has more SSTables than threshold but no minorcompaction is running

2011-09-20 Thread Konstantin Naryshkin
I believe that minor compactions work on tables of the same or similar size, so as long as your tables do not fall within a small range of each other in terms of size, Cassandra does not see an opportunity to run a minor compaction. - Original Message - From: myreasoner myreaso...@gmail.com
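A toy Python sketch of why that can happen. It groups table sizes into buckets of similar size and only treats a bucket as a compaction candidate once it holds enough tables. The 0.5x to 1.5x range, the threshold of 4, and the function names are assumptions made for illustration, not values confirmed in this thread.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group sizes so each new size joins a bucket whose average it is close to."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough, so start a new one.
            buckets.append([size])
    return buckets

def compaction_candidates(sizes, min_threshold=4):
    """Buckets that hold at least min_threshold similarly sized tables."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]

# Five tables, but every one is about 10x the previous: nothing to compact.
print(compaction_candidates([1, 10, 100, 1000, 10000]))
# Four tables of similar size plus one large one: one candidate bucket.
print(compaction_candidates([10, 11, 12, 9, 100]))
```

In the first call the table count exceeds the threshold, yet no bucket reaches it, which is the situation described in the subject line.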

Re: Search over composite Column and Super Column name

2011-09-22 Thread Konstantin Naryshkin
One thing you can do is search over the range from username: to username;. username: is the first possible string starting with username:. username; is the first possible string after all of the strings that start with username: . This works because ; is the character right after : in ASCII. I
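The trick can be checked outside Cassandra with a sorted list of strings. A minimal Python sketch, assuming column names compare as plain ASCII strings; `prefix_range` and the sample names are made up for the example:

```python
import bisect

def prefix_range(prefix: str):
    """Return (start, end) with start <= s < end for every string s that begins with prefix."""
    # Bump the last character by one: ':' (58) becomes ';' (59).
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

# Sorted names stand in for the sorted columns of a row.
names = sorted(["alice:1", "username", "username:a", "username:zzz", "username;", "usernamf"])

start, end = prefix_range("username:")
lo = bisect.bisect_left(names, start)
hi = bisect.bisect_left(names, end)
matches = names[lo:hi]
print(start, end, matches)
```

Note that the end of the range is treated as exclusive here; if the slice is inclusive on both ends, a column named exactly username; would also come back.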

Re: unable to start as a service on Ubuntu server

2011-09-27 Thread Konstantin Naryshkin
Yes, they start Cassandra as a daemon in the background. It is running. You can connect to it from the CLI or any other client. You can see what it is doing by reading the logs. cassandra -f starts Cassandra in the foreground; that is why it does not return a prompt when the server starts.

Re: node selection for replication factor 3

2011-10-03 Thread Konstantin Naryshkin
It picks sequentially (the two previous ones, I believe). So in your example it would be 105.12 and 105.11 - Original Message - From: Ramesh Natarajan rames...@gmail.com To: user@cassandra.apache.org Sent: Monday, October 3, 2011 5:06:10 PM Subject: node selection for replication factor

Re: Question about sharding of rows and atomicity

2011-10-05 Thread Konstantin Naryshkin
Cassandra does not break apart a row. All of the columns of a row are kept on the same nodes. I believe that writing multiple columns of the same row is transactional, but not atomic. By which I mean that if one column is written all the other ones will be written as well, but if a read

Changing the replication factor of a keyspace

2011-10-24 Thread Konstantin Naryshkin
We are setting up my application around Cassandra 0.8.0 (will move to Cassandra 1.0 in the near future). In production the application will be running in a two (or more) node cluster with RF 2. In development, we do not always have 2 machines to test on, so we may have to run a Cassandra cluster

Re: Best way to search content in Cassandra

2011-10-28 Thread Konstantin Naryshkin
You can do a column slice for columns between image/ (the first ASCII string that starts with that sub-string) and image/~ (the last printable ASCII string that starts with that sub-string). On Thu, Oct 27, 2011 at 21:10, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: Normally in SQL

Re: Second Cassandra users survey

2011-11-03 Thread Konstantin Naryshkin
I realize that it is not realistic to expect it, but it would be good to have a Partitioner that supports both range slices and automatic load balancing. On Thu, Nov 3, 2011 at 13:57, Ertio Lew ertio...@gmail.com wrote: Provide an option to sort columns by timestamp i.e, in the order they have

Re: Physical data layout of columns in super column family

2011-11-09 Thread Konstantin Naryshkin
I assume that Reports is the super column family, that the first 1: is the report id (the row key in the Cassandra topology), that the second 1: is the report line (the super column in the Cassandra topology), and that value 1 is the column name. If this is not the case, maybe explain the topology better.

Re: questions on frequency and timing of async replication between DCs

2011-11-14 Thread Konstantin Naryshkin
It may be the case that your CL is the issue. You are writing it at ONE, which means that out of the 4 replicas of that key (two in each data center), you are only putting it on one of them. When you read at CL ONE, it only looks at a single replica to see if the data is there. In other words, if
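The usual rule of thumb behind this is that a read is only certain to see an earlier write when the replicas read plus the replicas written exceed the total number of replicas. A small Python sketch of that arithmetic, using the 4 replicas from the message; `quorum` and `read_sees_write` are hypothetical helper names:

```python
def quorum(replicas: int) -> int:
    """Smallest majority of the given number of replicas."""
    return replicas // 2 + 1

def read_sees_write(replicas: int, write_acks: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_acks > replicas

N = 4  # two replicas in each of two data centers, as in the message above

print(read_sees_write(N, write_acks=1, read_replicas=1))                  # ONE / ONE
print(read_sees_write(N, write_acks=quorum(N), read_replicas=quorum(N)))  # QUORUM / QUORUM
```

With ONE on both sides the write is only certain to be on one replica when the call returns, and the read may land on any of the other three.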

Re: Fast lookups for userId to username and vice versa

2011-11-16 Thread Konstantin Naryshkin
Or just have two column families to do it: A CF idToName that has the userIds as keys and the userName as the only column and a CF nameToId that has the userNames as keys and the userId as the only column On Mon, Nov 14, 2011 at 03:50, chovatia jaydeep chovatia_jayd...@yahoo.co.in wrote: Check
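A minimal Python sketch of the two-column-family idea, with plain dictionaries standing in for the idToName and nameToId column families. The helper names and sample data are invented for the example, and the two writes are not atomic with each other here, just as they would be two separate inserts in Cassandra.

```python
id_to_name = {}  # stands in for the idToName column family (key: userId)
name_to_id = {}  # stands in for the nameToId column family (key: userName)

def add_user(user_id: str, user_name: str) -> None:
    """Write both directions of the mapping."""
    id_to_name[user_id] = user_name
    name_to_id[user_name] = user_id

def rename_user(user_id: str, new_name: str) -> None:
    """Remove the stale reverse entry, then write the new pair."""
    old_name = id_to_name[user_id]
    del name_to_id[old_name]
    add_user(user_id, new_name)

add_user("42", "konstantin")
rename_user("42", "kostya")
print(id_to_name["42"], name_to_id["kostya"])
```

Each lookup direction is then a single read by key, at the cost of keeping the two column families in sync in the application.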

cqlsh not returning the column name of the first column when reversed

2011-12-06 Thread Konstantin Naryshkin
I am running Cassandra 1.0.0. I am using cqlsh for inspecting my data (very useful tool, thank you whoever wrote it). I notice that when I query for the FIRST N REVERSED columns, it is omitting the column name on the first column. For example, cqlsh> SELECT FIRST 1 REVERSED * FROM netflow_raw;