Re: Hector Problem Basic one
I have only seen this error message when all Cassandra nodes are down. How do you get the Cluster, and how do you set the hosts?

From: CASSANDRA learner [mailto:cassandralear...@gmail.com]
Sent: 12 October 2011 14:30
To: user@cassandra.apache.org
Subject: Re: Hector Problem Basic one

Thanks for the reply, Ben. The problem is that I am not able to run a basic Hector example from Eclipse. It throws me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client. Can you please let me know why I am getting this?

On Tue, Oct 11, 2011 at 3:54 PM, Ben Ashton <b...@bossastudios.com> wrote:

Hey, we had this one. Even though the Hector documentation says it retries failed servers every 30 seconds by default, it doesn't. Once we explicitly set it to X seconds, whenever there is a failure, e.g. with the network (AWS), it will retry the host and add it back into the pool. Ben

On 11 October 2011 11:09, CASSANDRA learner <cassandralear...@gmail.com> wrote:

Hi everyone, I was using Cassandra a long time back, and when I tried again today I ran into a problem from Eclipse. When I run a basic Hector (Java) example, I get the exception me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client. But my server is up, and nodetool also shows that it is up. I don't know what is happening.
1.) Is it anything to do with the JMX port?
2.) What are the storage port in cassandra.yaml and the JMX port in cassandra-env.sh?
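The retry behaviour Ben describes can be sketched independently of Hector. This is a minimal, self-contained model (class and method names are illustrative, not Hector's actual API) of a host pool that marks hosts down and re-adds them once a configured retry delay has elapsed:

```java
import java.util.*;

// Minimal sketch (not Hector's real implementation) of a client-side host
// pool: failed hosts are marked down and retried after a configurable delay.
public class HostPoolSketch {
    private final Set<String> up = new LinkedHashSet<>();
    private final Map<String, Long> downSince = new HashMap<>();
    private final long retryDelayMillis;

    public HostPoolSketch(long retryDelayMillis, String... hosts) {
        this.retryDelayMillis = retryDelayMillis;
        up.addAll(Arrays.asList(hosts));
    }

    public void markDown(String host, long nowMillis) {
        up.remove(host);
        downSince.put(host, nowMillis);
    }

    // Called periodically: re-add hosts whose retry delay has elapsed.
    public void retryDownedHosts(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = downSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= retryDelayMillis) {
                up.add(e.getKey());
                it.remove();
            }
        }
    }

    public boolean allDown() { return up.isEmpty(); }

    public static void main(String[] args) {
        HostPoolSketch pool = new HostPoolSketch(30_000, "10.0.0.1:9160");
        pool.markDown("10.0.0.1:9160", 0);
        // With every host down, a request fails with
        // "All host pools marked down. Retry burden pushed out to client".
        System.out.println("all down: " + pool.allDown());
        pool.retryDownedHosts(30_000);   // 30 s later the host is retried
        System.out.println("all down: " + pool.allDown());
    }
}
```

If no retry delay is configured (the situation Ben hit), the `retryDownedHosts` step never runs and the pool stays empty until the client is restarted.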
Re: Indexes on heterogeneous rows
Does get_indexed_slices in 0.7.4 already plan queries that way? It seems to always take the first indexed column with EQ. Or is this a new feature of the coming 0.7.5 or 0.8?

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: 15 April 2011 0:21
To: user@cassandra.apache.org
Cc: David Boxenhorn; aaron morton
Subject: Re: Indexes on heterogeneous rows

This should work reasonably well with 0.7 indexes. Cassandra tracks statistics on index selectivity, so it would plan that query as an index lookup on e=5, then iterate over those results and return only rows that also have type=2.

On Thu, Apr 14, 2011 at 5:33 AM, David Boxenhorn <da...@taotown.com> wrote:

Thank you for your answer, and sorry about the sloppy terminology. I'm thinking of the scenario where there are a small number of results in the result set, but billions of rows in the first of your secondary indexes. That is, I want to do something like this (not sure of the CQL syntax):

select * where type=2 and e=5

where there are billions of rows of type 2, but some manageable number of those rows have e=5. As I understand it, secondary indexes are like column families where each value is a column. So the billions of rows where type=2 would go into a single row of the secondary index. That sounds like a problem to me; is it? I'm assuming that the billions of rows that don't have column e at all (the rows of other types) are not a problem...

On Thu, Apr 14, 2011 at 12:12 PM, aaron morton <aa...@thelastpickle.com> wrote:

We need to clear up some terminology here. Rows have a key and can be retrieved by key. This is *sort of* the primary index, but not "primary" in the normal RDBMS sense. Rows can have different columns, and the column names are sorted and can be efficiently selected.
There are secondary indexes in Cassandra 0.7, based on column values: http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes So you could create secondary indexes on the a, e, and h columns and get rows that have specific values. There are some limitations to secondary indexes; read the linked article. Or you can build your own secondary indexes, using row keys as the index values. If you have billions of rows, how many do you need to read back at once? Hope that helps. Aaron

On 14 Apr 2011, at 04:23, David Boxenhorn wrote:

Is it possible in 0.7.x to have indexes on heterogeneous rows, which have different sets of columns? For example, say you have three types of objects (1, 2, 3), each with three members. If your rows had the following pattern:

type=1 a=? b=? c=?
type=2 d=? e=? f=?
type=3 g=? h=? i=?

could you index type as your primary index, and also index a, e, h as secondary indexes, to get the objects of that type that you are looking for? Would it work if you had billions of rows of each type?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
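The plan Jonathan describes (index lookup on the most selective clause, then in-memory filtering of the rest) can be sketched with plain collections. This is an illustrative model, not Cassandra's actual code; the data layout and names are assumptions:

```java
import java.util.*;

// Sketch of selectivity-based index planning: look up candidates via the
// secondary index on e, then filter the remaining clause (type=2) in memory.
public class IndexPlanSketch {
    static List<String> query(Map<String, Map<String, Integer>> rows,
                              int eVal, int typeVal) {
        // Secondary index on e: value -> row keys (one "index row" per value).
        Map<Integer, List<String>> indexOnE = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> r : rows.entrySet()) {
            Integer e = r.getValue().get("e");
            if (e != null)
                indexOnE.computeIfAbsent(e, x -> new ArrayList<>()).add(r.getKey());
        }
        // Index lookup on e=eVal, then filter rows that also have type=typeVal.
        List<String> hits = new ArrayList<>();
        for (String key : indexOnE.getOrDefault(eVal, List.of()))
            if (Integer.valueOf(typeVal).equals(rows.get(key).get("type")))
                hits.add(key);
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> rows = new HashMap<>();
        rows.put("r1", Map.of("type", 2, "e", 5));
        rows.put("r2", Map.of("type", 2, "e", 7));   // wrong e
        rows.put("r3", Map.of("type", 1, "e", 5));   // wrong type
        System.out.println(query(rows, 5, 2));       // [r1]
    }
}
```

Starting from the e=5 index row keeps the candidate set small even when billions of rows match type=2, which is exactly why the selectivity statistics matter.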
Re: result of get_indexed_slices() seems wrong
Thanks Aaron. Maybe we need to do more checking in ThriftValidation.validateIndexClauses(), adding something like this:

Map<ByteBuffer, ColumnDefinition> colDefs = DatabaseDescriptor.getTableDefinition(keyspace).cfMetaData().get(columnFamily).getColumn_metadata();
for (IndexExpression expression : index_clause.expressions)
{
    if (!colDefs.containsKey(expression.column_name))
        throw new InvalidRequestException("No column definition for " + expression.column_name);
}

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 24 March 2011 12:24
To: user@cassandra.apache.org
Subject: Re: result of get_indexed_slices() seems wrong

Looks like this: https://issues.apache.org/jira/browse/CASSANDRA-2347
From this discussion: http://www.mail-archive.com/user@cassandra.apache.org/msg11291.html
Aaron

On 24 Mar 2011, at 17:17, Wangpei (Peter) wrote:

Hi, this problem occurs when the clause has multiple expressions and an expression with an operator other than EQ. Has anyone met the same problem? I traced the code and saw this in the ColumnFamilyStore.satisfies() method:

int v = data.getComparator().compare(column.value(), expression.value);

Where the type of the column value is needed, it uses the type of my column names, which is UTF8Type, and so gives the wrong result. To fix it, the expression needs an optional comparator_type attribute, so that satisfies() can get the correct type to compare with. Please point out if I am wrong.
result of get_indexed_slices() seems wrong
Hi, this problem occurs when the clause has multiple expressions and an expression with an operator other than EQ. Has anyone met the same problem? I traced the code and saw this in the ColumnFamilyStore.satisfies() method:

int v = data.getComparator().compare(column.value(), expression.value);

Where the type of the column value is needed, it uses the type of my column names, which is UTF8Type, and so gives the wrong result. To fix it, the expression needs an optional comparator_type attribute, so that satisfies() can get the correct type to compare with. Please point out if I am wrong.
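The effect of comparing column values with the column-name comparator can be shown in isolation. This is a hedged sketch of why a UTF8Type-style (bytewise) comparison misorders numeric values; it is not Cassandra's code:

```java
import java.nio.charset.StandardCharsets;

// Demonstrates the bug described above: comparing column *values* with a
// comparator meant for column *names* (UTF8Type) orders them lexically,
// which is wrong for numbers, so GT/LT index expressions can match the
// wrong rows.
public class ComparatorMismatch {
    // UTF8Type-style comparison: bytewise on the UTF-8 encoding.
    static int utf8Compare(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = (x[i] & 0xff) - (y[i] & 0xff);
            if (c != 0) return c;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        // Is 9 < 10?  Numerically yes; bytewise no ("9" sorts after "10").
        System.out.println(Long.compare(9, 10) < 0);      // true
        System.out.println(utf8Compare("9", "10") < 0);   // false
    }
}
```

With the right value type attached to the expression, satisfies() could pick the numeric comparison instead of the name comparator.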
Re: understanding tombstones
My question: what would the client get when the following happens (RF=3, N=3)?

1. write with timestamp T, which succeeds on all nodes.
2. delete with timestamp T+1 at CL=QUORUM, which succeeds on node1 and node2 but fails on node3.
3. force flush + compaction
4. read at CL=QUORUM

Will the client get the row back, with read repair fixing the data? If not, how does Cassandra prevent this?

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: 10 March 2011 10:19
To: user@cassandra.apache.org
Subject: Re: understanding tombstones

On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang <jw...@palantir.com> wrote:

> insert row X with timestamp T
> delete row X with timestamp T+1
> force flush + compaction
> insert row X with timestamp T
>
> My understanding is that the tombstone created by the delete (and row X) will disappear with the flush + compaction, which means the last insertion should show up.

Right.

> I believe I have traced this to the fact that the markedForDeleteAt field on the ColumnFamily does not get reset after a compaction (after gc_grace_seconds has passed); is this desirable? I think it introduces an inconsistency in how tombstoned columns work versus tombstoned CFs. Thanks.

That does sound like a bug. Can you create a ticket?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
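Jeffrey's sequence follows from timestamp reconciliation: the highest timestamp wins, and once compaction past gc_grace purges a tombstone, an older re-insert has nothing left to shadow it. A simplified model (not Cassandra's actual classes) of that logic:

```java
// Sketch of last-write-wins reconciliation plus tombstone garbage
// collection, as in the insert(T) / delete(T+1) / compact / insert(T)
// sequence above. Simplified model; names are illustrative.
public class TombstoneSketch {
    record Cell(long timestamp, boolean tombstone) {}

    // Last write wins: the cell with the higher timestamp survives.
    static Cell reconcile(Cell a, Cell b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.timestamp() >= b.timestamp() ? a : b;
    }

    public static void main(String[] args) {
        long T = 100;
        Cell insert = new Cell(T, false);
        Cell delete = new Cell(T + 1, true);

        // Before compaction: the tombstone at T+1 shadows the insert at T.
        Cell merged = reconcile(insert, delete);
        System.out.println("deleted? " + merged.tombstone());   // true

        // Compaction past gc_grace purges both the shadowed cell and the
        // tombstone, so a re-insert with the *old* timestamp T now wins.
        Cell afterCompaction = null;
        Cell reinsert = new Cell(T, false);
        merged = reconcile(afterCompaction, reinsert);
        System.out.println("deleted? " + merged.tombstone());   // false
    }
}
```

In Peter's RF=3 scenario, the same rule is what read repair relies on: node3's live cell at T loses to the quorum's tombstone at T+1, provided the tombstone has not yet been garbage collected.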
Re: managing a limited-length list as a value
Maybe you can try this: use (MAX - time) as your column name, then get the first `limit` columns.

-----Original Message-----
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: 19 February 2011 2:11
To: user@cassandra.apache.org
Subject: managing a limited-length list as a value

The following is derived from the Redis list operations. The data model is that a key maps to a list of items. The operation is to push a new item onto the front and discard any items from the end above a threshold number of items. Of course, this can be done by reading the value, fiddling with it, and writing it back. I write this email to wonder if there's any native trickery that avoids having to read the value, instead permitting some sort of 'push' operation.
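The reversed-timestamp trick above can be sketched with a sorted map standing in for a row's columns (TreeMap here is only a stand-in; the point is the column-name encoding):

```java
import java.util.*;

// Sketch of the (MAX - time) column-name trick: newest items sort first,
// so a capped list is just "read the first `limit` columns".
public class ReversedTimeColumns {
    // "Push": store under (Long.MAX_VALUE - time) so newest sorts first.
    static void push(TreeMap<Long, String> row, long time, String item) {
        row.put(Long.MAX_VALUE - time, item);
    }

    // "Get first `limit` columns" == the newest `limit` items.
    static List<String> firstColumns(TreeMap<Long, String> row, int limit) {
        List<String> out = new ArrayList<>();
        for (String v : row.values()) {
            if (out.size() == limit) break;
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> row = new TreeMap<>();  // stands in for a row
        push(row, 1000, "a");
        push(row, 2000, "b");
        push(row, 3000, "c");
        push(row, 4000, "d");
        System.out.println(firstColumns(row, 2));     // [d, c]
    }
}
```

The write stays a pure insert (no read-modify-write); old entries past the threshold can be trimmed lazily or left to a TTL rather than deleted on every push.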
Re: Partitioning
I have the same question. I read the source code of NetworkTopologyStrategy; it seems to always put replicas on the first nodes on the ring in the DC. If I am not misunderstanding, those nodes will become hot spots. Why does NetworkTopologyStrategy work that way? Is there an alternative that avoids this shortcoming? Thanks in advance. Peter

From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: 16 February 2011 3:56
To: user@cassandra.apache.org
Subject: Re: Partitioning

You can, using the NetworkTopologyStrategy; see http://wiki.apache.org/cassandra/Operations?highlight=(topology)|(network)#Network_topology and NetworkTopologyStrategy in the conf/cassandra.yaml file. You can control the number of replicas in each DC. Also look at conf/cassandra-topology.properties for information on how to tell Cassandra about your network topology. Aaron

On 16 Feb, 2011, at 05:10 AM, RWN <s5a...@gmail.com> wrote:

Hi, I am new to Cassandra and am evaluating it. The following diagram shows my setup: http://bit.ly/gJZlhw Here each oval represents one data center. I want to keep N=4, i.e. four copies of every column family, with one copy in each data center. In other words, the COMPLETE database must be contained in each of the data centers.

Question: 1. Is this possible? If so, how do I configure it (partitioner, replicas, etc.)?

Thanks, AJ

P.S. Excuse my multiple postings of the same message. I am unable to subscribe for some reason.
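Peter's reading of the placement logic can be sketched as a ring walk: starting at the row's token, move clockwise and take the first node(s) found in the target DC. This is a simplified illustration of that reading, not NetworkTopologyStrategy's actual code, and the tokens and DC names are made up:

```java
import java.util.*;

// Sketch of per-DC ring-walk replica placement: walk clockwise from the
// row's token, collecting the first `rf` distinct nodes in the target DC.
// If a DC has few, unevenly spaced tokens, the same nodes win repeatedly,
// which is the hot-spot concern raised above.
public class RingWalkSketch {
    static List<String> replicas(TreeMap<Integer, String> ring,
                                 Map<String, String> dcOf,
                                 int token, String dc, int rf) {
        List<String> out = new ArrayList<>();
        // Tokens >= the row token first, then wrap around the ring.
        List<Integer> tokens = new ArrayList<>(ring.tailMap(token).keySet());
        tokens.addAll(ring.headMap(token).keySet());
        for (int t : tokens) {
            String node = ring.get(t);
            if (dc.equals(dcOf.get(node)) && !out.contains(node)) out.add(node);
            if (out.size() == rf) break;
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();  // token -> node
        Map<String, String> dcOf = new HashMap<>();
        ring.put(0, "n1");   dcOf.put("n1", "DC1");
        ring.put(50, "n2");  dcOf.put("n2", "DC2");
        ring.put(100, "n3"); dcOf.put("n3", "DC1");
        ring.put(150, "n4"); dcOf.put("n4", "DC2");

        System.out.println(replicas(ring, dcOf, 60, "DC1", 1));  // [n3]
    }
}
```

The usual remedy is choosing tokens so that each DC's nodes alternate evenly around the ring; then the walk distributes row tokens across all of a DC's nodes rather than piling onto the first few.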
Re: time to live rows
AFAIK the secondary index only works with the EQ operator.

-----Original Message-----
From: Kallin Nagelberg [mailto:kallin.nagelb...@gmail.com]
Sent: 9 February 2011 3:36
To: user@cassandra.apache.org
Subject: Re: time to live rows

I'm thinking that if this row-expiry notion doesn't pan out, I might create a 'lastAccessed' column with a secondary index on it (I think that's right). Then I can periodically run a query to find all lastAccessed columns less than a certain value and manually delete them. Sound reasonable? -Kal
Re: Row Key Types
Did you set the compare_with attribute of your ColumnFamily to TimeUUIDType?

-----Original Message-----
From: Bill Speirs [mailto:bill.spe...@gmail.com]
Sent: 2 February 2011 0:47
To: Cassandra Usergroup
Subject: Row Key Types

What is the type of a row key? Can you define how they are compared? I ask because I'm using TimeUUIDs as my row keys, but when I make a call to get a range of row keys (get_range in phpcassa) I have to specify the UTF8 range of '' to '----' instead of the TimeUUID range of '----' to '----'. This works, but feels wrong/inefficient... thoughts? Thanks... Bill-
Re: Cassandra + Thrift on RedHat Enterprise 5
Hector documentation: http://www.riptano.com/sites/default/files/hector-v2-client-doc.pdf

From: Vedarth Kulkarni [mailto:vedar...@gmail.com]
Sent: 30 January 2011 14:03
To: user@cassandra.apache.org
Subject: Re: Cassandra + Thrift on RedHat Enterprise 5

How can I use Hector? Can you please explain it to me in detail? I am new to these things.
- Vedarth Kulkarni, TYBSc (Computer Science).

On Sun, Jan 30, 2011 at 11:20 AM, Andrey V. Panov <panov.a...@gmail.com> wrote:

Use Hector instead of pure Thrift: https://github.com/rantav/hector/ And check out the wiki.
Re: Schema Design
I am also working on a system that stores logs from hundreds of systems. In my scenario, most queries look like this: get the login logs (category EQ) of a given proxy (host EQ) between this Monday and Wednesday (time range). My data model looks like this:

- Only one CF; that's enough for this scenario.
- Group the logs from each host and day into one row. The key format is hostname.category.date.
- Store each log entry as a super column. The super column name is the TimeUUID of the log entry, and each attribute is a column.

Then this query can be done as 3 GETs, with no need for a key range scan, so I can use RP instead of OPP. (If I used OPP, I would have to worry about load balancing myself. I hate that.) If I need time-range access within a day, I can still use a column slice. An additional benefit is that I can clean out old logs very easily: we only keep logs for one year, and deleting by key does this job well.

I think storing all logs for a host in a single row is not a good choice, for two reasons: 1) too few keys, so your data will not distribute well; 2) the data under a key always grows, so Cassandra has to do more SSTable compaction.

-----Original Message-----
From: William R Speirs [mailto:bill.spe...@gmail.com]
Sent: 27 January 2011 9:15
To: user@cassandra.apache.org
Subject: Re: Schema Design

It makes sense that the single row for a system (with a growing number of columns) will reside on a single machine. With that in mind, here is my updated schema:

- A single column family for all the messages. The row keys will be the TimeUUID of the message, with the following columns: date/time (in UTC POSIX), system name/id (with an index for fast/easy gets), and the actual message payload.
- A column family for each system. The row keys will be UTC POSIX time with 1-second (maybe 1-minute) bucketing, and the column names will be the TimeUUIDs of any messages that were logged during that time bucket.

My only hesitation with this design is that buddhasystem warned that each column family is allocated a piece of memory on the server.
I'm not sure what the implications of this are, or whether it would be a problem if I had a number of systems on the order of hundreds. Thanks... Bill-

On 01/26/2011 06:51 PM, Shu Zhang wrote:

Each row can have a maximum of 2 billion columns, which a logging system will probably hit eventually. More importantly, you'll only have one row per set of system logs. Every row is stored on the same machine(s), which means you'll definitely not be able to distribute your load very well.

From: Bill Speirs [bill.spe...@gmail.com]
Sent: Wednesday, January 26, 2011 1:23 PM
To: user@cassandra.apache.org
Subject: Re: Schema Design

I like this approach, but I have two questions: 1) What are the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time? 2) Maybe it's just a restriction of the CLI, but how do I issue a slice request? Also, what if the start (or end) columns don't exist? I'm guessing it's smart enough to get the columns in that range. Thanks! Bill-

On Wed, Jan 26, 2011 at 4:12 PM, David McNelis <dmcne...@agentisenergy.com> wrote:

I would say in that case you might want to try a single column family where the key is the system name. Then you could name your columns with the timestamp. When retrieving information from the data store you can, in your slice request, specify your start column as X and your end column as Y, and use the stored column name to know when an event occurred.

On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs <bill.spe...@gmail.com> wrote:

I'm looking to use Cassandra to store log messages from various systems. A log message only has a message (UTF8Type) and a date/time. My thought is to create a column family for each system. The row key will be a TimeUUIDType. Each row will have 7 columns: year, month, day, hour, minute, second, and message.
I then have indexes set up for each of the date/time columns. I was hoping this would allow me to answer queries like: what are all the log messages that were generated between X and Y? The problem is that I can ONLY use the equals operator on these column values. For example, issuing:

get system_x where month > 1;

gives me this error: No indexed columns present in index clause with operator EQ. The equals operator works as expected, though:

get system_x where month = 1;

What schema would allow me to get date ranges? Thanks in advance... Bill-

ColumnFamily description:

ColumnFamily: system_x_msg
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/3600
  Memtable thresholds: 1.1671875/249/60
  GC grace seconds: 864000
  Compaction min/max
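The two key schemes discussed in this thread can be sketched side by side: Peter's hostname.category.date row keys, where a date range becomes a handful of direct GETs, and Bill's POSIX-time bucketing, where a timestamp is rounded down to its bucket start. The host, category, and dates below are made-up examples:

```java
import java.util.*;

// Sketch of the thread's two key schemes. Names and dates are illustrative.
public class LogKeySketch {
    // One row key per host/category/day, e.g. "proxy1.login.2011-01-24".
    static List<String> keysForRange(String host, String category,
                                     List<String> days) {
        List<String> keys = new ArrayList<>();
        for (String day : days) keys.add(host + "." + category + "." + day);
        return keys;
    }

    // Round a POSIX timestamp (in seconds) down to its bucket start.
    static long bucket(long posixSeconds, long bucketSeconds) {
        return posixSeconds - (posixSeconds % bucketSeconds);
    }

    public static void main(String[] args) {
        // Monday..Wednesday -> 3 GETs, no key range scan, RP-friendly.
        System.out.println(keysForRange("proxy1", "login",
                List.of("2011-01-24", "2011-01-25", "2011-01-26")));

        // 1-minute bucketing of an event timestamp.
        System.out.println(bucket(1296064923L, 60));   // 1296064920
    }
}
```

Either way, the time range is resolved by enumerating keys (or buckets) on the client, so no non-EQ index expression is ever needed.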
Re: Basic question on a write operation immediately followed by a read
What is the ConsistencyLevel value? Is it ConsistencyLevel.ANY? From the Javadoc, write consistency levels make the following guarantees before reporting success to the client:

* ANY: ensure that the write has been written at least once somewhere, possibly as a hint on a non-target node.
* ONE: ensure that the write has been written to at least one node's commit log and memory table.
* QUORUM: ensure that the write has been written to ReplicationFactor / 2 + 1 nodes.
* LOCAL_QUORUM: ensure that the write has been written to ReplicationFactor / 2 + 1 nodes within the local datacenter (requires NetworkTopologyStrategy).
* EACH_QUORUM: ensure that the write has been written to ReplicationFactor / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
* ALL: ensure that the write is written to <ReplicationFactor> nodes before responding to the client.

From: Roshan Dawrani [mailto:roshandawr...@gmail.com]
Sent: 25 January 2011 10:57
To: user@cassandra.apache.org; hector-us...@googlegroups.com
Subject: Basic question on a write operation immediately followed by a read

Hi, I have a basic question, maybe a silly one too. Say I have a 1-node Cassandra setup (no replication, eventual consistency, etc.), I insert into a column family, and then, very close in time to the insert, I read the same data back. Is there a possibility that my read operation will miss the data that just got inserted? Since there are no DB transactions in Cassandra, are writes immediately visible to readers, even partially, as they get written? Or can there sometimes be a delay due to flushing to SSTables, etc.? Or are writes first in-memory and immediately visible to readers, with flushing happening independently in the background?

Thanks.
--
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani
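The quorum arithmetic in the Javadoc above is worth making concrete: QUORUM is RF/2 + 1 with integer division, and a QUORUM write followed by a QUORUM read is guaranteed to overlap on at least one replica whenever W + R > RF.

```java
// The quorum arithmetic from the consistency-level Javadoc above.
public class QuorumMath {
    // QUORUM = ReplicationFactor / 2 + 1 (integer division).
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(quorum(1));   // 1: on a 1-node setup, QUORUM == ONE
        System.out.println(quorum(3));   // 2
        System.out.println(quorum(4));   // 3

        // QUORUM writes + QUORUM reads always overlap: W + R > RF.
        int rf = 3, w = quorum(rf), r = quorum(rf);
        System.out.println(w + r > rf);  // true
    }
}
```

For Roshan's 1-node case the arithmetic is trivial (every level except ANY reduces to that one node, and writes go to the memtable before success is reported, so an immediate read sees them); the overlap argument is what matters once RF > 1.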