Re: composite table with cassandra without using cql3?
I had to give up on using CQL with thrift, and go to composites created and accessed in thrift. I'm using Hector for Java access and python for scripting access. -g

On Sun, Aug 19, 2012 at 5:19 AM, Georg Köster georg.koes...@gmail.com wrote:
Hi all, I had problems creating a table with composite keys with CQL 3 and accessing it via thrift. AFAIK the comparators weren't set up in a compatible way, probably due to the betaness of CQL 3. So I'm now creating and using CFs with Composite Columns exclusively via thrift/Astyanax. Pelops works too. Astyanax composite help: https://github.com/Netflix/astyanax/wiki/Examples - search for Composite. Beware that the serializer currently doesn't find annotated fields if they are inherited. Cheers! Georg

On Sat, Aug 18, 2012 at 1:29 AM, Ben Frank b...@airlust.com wrote:
Hi Dean, I'm interested in this too, but I get a 404 with the link below; it looks like I can't see your nosqlORM project. -Ben

On Thu, Aug 2, 2012 at 9:04 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
For how to do it with Astyanax, you can see lines 310 and 335 here: https://github.com/deanhiller/nosqlORM/blob/indexing/input/javasrc/com/alvazan/orm/layer3/spi/db/cassandra/CassandraSession.java
For how to do it with thrift, you could look at Astyanax. I use it on that project for indexing for the ORM layer we use (which is not listed on the Cassandra ORMs page as of yet ;) ). Later, Dean

On 8/2/12 9:50 AM, Greg Fausak g...@named.com wrote:
I've been using CQL 3 to create a composite table. Can I use the thrift interface to accomplish the same thing? In other words, do I have to use CQL 3 to get a composite table type? (The same behavior as multiple PRIMARY KEY columns.) Thanks, ---greg
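For reference, a minimal sketch of the Astyanax composite-column approach Georg describes (not code from the thread; Astyanax 1.x is assumed, the column family, field names and values are hypothetical, and note his caveat about annotated fields on parent classes not being picked up):

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.annotations.Component;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.AnnotatedCompositeSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class CompositeColumnSketch {

    // Composite column name: (creation time, attribute name).
    public static class EventColumn {
        @Component(ordinal = 0) public Long creation;
        @Component(ordinal = 1) public String name;

        public EventColumn() {
        }

        public EventColumn(Long creation, String name) {
            this.creation = creation;
            this.name = name;
        }
    }

    private static final AnnotatedCompositeSerializer<EventColumn> COLUMN_SERIALIZER =
            new AnnotatedCompositeSerializer<EventColumn>(EventColumn.class);

    // Hypothetical CF created via thrift/cli with a CompositeType(LongType, UTF8Type) comparator.
    private static final ColumnFamily<String, EventColumn> CF_EVENTS =
            new ColumnFamily<String, EventColumn>("at_event", StringSerializer.get(), COLUMN_SERIALIZER);

    public static void writeEvent(Keyspace keyspace, String rowKey, long creation)
            throws ConnectionException {
        // Both columns for this event go out in a single mutation.
        MutationBatch batch = keyspace.prepareMutationBatch();
        batch.withRow(CF_EVENTS, rowKey)
             .putColumn(new EventColumn(creation, "ac_action"), "UP", null)
             .putColumn(new EventColumn(creation, "ac_id"), "some id", null);
        batch.execute();
    }
}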
CQL / cli question
I think I'm having a major brain fart here, I am just not getting something. I have the following CF declared in CQL 3:

create columnfamily testCQL5 ( ac_event_id int, ac_c text, ac_mtcreation bigint, ac_action text, ac_id text, PRIMARY KEY (ac_c, ac_mtcreation));

I can do this:

cqlsh:op2> select * from testCQL5 where ac_c = '2099.t4.l1.cisco.com';
 ac_c                 | ac_mtcreation | ac_action | ac_event_id | ac_id
----------------------+---------------+-----------+-------------+----------------
 2099.t4.l1.cisco.com |        767473 |  Suspense |         101 | a common comme
 2099.t4.l1.cisco.com |     987987987 |        UP |         123 | ThisIsMe

and this...

cqlsh:op2> select * from testCQL5 where ac_mtcreation > 60 and ac_mtcreation < 80;
 ac_c                 | ac_mtcreation | ac_action | ac_event_id | ac_id
----------------------+---------------+-----------+-------------+----------------
 2099.t4.l1.cisco.com |        767473 |  Suspense |         101 | a common comme
 1171.t4.l1.cisco.com |        757473 |   JACKSON |         100 | this is the ID

I can do this:

cqlsh:op2> select * from testCQL5 where ac_c = '2099.t4.l1.cisco.com' and ac_mtcreation > 60 and ac_mtcreation < 80;
 ac_c                 | ac_mtcreation | ac_action | ac_event_id | ac_id
----------------------+---------------+-----------+-------------+----------------
 2099.t4.l1.cisco.com |        767473 |  Suspense |         101 | a common comme

I can do the former query in the cli, using this statement:

[default@op2] get testCQL5['2099.t4.l1.cisco.com'];
=> (column=767473:ac_action, value=Suspense, timestamp=1344354369232000)
=> (column=767473:ac_event_id, value=e, timestamp=1344354369232001)
=> (column=767473:ac_id, value=a common comme, timestamp=1344354369232002)
=> (column=987987987:ac_action, value=UP, timestamp=1344354733386000)
=> (column=987987987:ac_event_id, value={, timestamp=1344355513787000)
=> (column=987987987:ac_id, value=ThisIsMe, timestamp=1344354749195000)
Returned 6 results.

How can I do the next two CQL queries using the cli? Or even a query that tells me the ac_mtcreation time of 767473? For reference, this is how the cli describes the CF:

create column family testcql5 with column_type = 'Standard' and comparator = 'CompositeType(org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.UTF8Type)' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' and caching = 'KEYS_ONLY' and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

I am really pulling my hair out over this. Any help would be appreciated! Thanks, -greg
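As a side note (not from the thread): while I'm not aware of cli syntax for slicing on just the first composite component, a thrift client such as Hector can express it. A rough sketch against the testCQL5 layout above, assuming Hector 1.x; the row key, bounds and column count are placeholders:

import java.util.List;

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class CompositeSliceSketch {

    // Returns the columns of one row whose first composite component (ac_mtcreation)
    // lies between the two bounds - roughly what the CQL range query above asks for.
    // Inclusive/exclusive behaviour at the bounds depends on the component equality
    // used, which is glossed over here.
    public static List<HColumn<Composite, String>> sliceByCreation(
            Keyspace keyspace, String rowKey, long from, long to) {
        Composite start = new Composite();
        start.addComponent(from, LongSerializer.get());
        Composite finish = new Composite();
        finish.addComponent(to, LongSerializer.get());

        SliceQuery<String, Composite, String> query = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), CompositeSerializer.get(), StringSerializer.get());
        query.setColumnFamily("testCQL5");
        query.setKey(rowKey);
        query.setRange(start, finish, false, 1000);
        return query.execute().get().getColumns();
    }
}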
composite table with cassandra without using cql3?
I've been using CQL 3 to create a composite table. Can I use the thrift interface to accomplish the same thing? In other words, do I have to use CQL 3 to get a composite table type? (The same behavior as multiple PRIMARY KEY columns.) Thanks, ---greg
Re: Does Cassandra support operations in a transaction?
Hi Ivan, No, Cassandra does not support transactions. I believe each operation is atomic; if that operation returns a successful result, then it worked. You can't do things like bind two operations together and guarantee that if either fails, they both fail. You will find that Cassandra doesn't do a lot of things compared to a SQL DB :-) But it does write a lot of data quickly. -g

On Wed, Aug 1, 2012 at 5:21 AM, Ivan Jiang wiwi1...@gmail.com wrote:
Hi, I am a new guy to Cassandra. I wonder if it is possible to call Cassandra in one transaction such as in a relational DB. Thanks in advance. Best Regards, Ivan Jiang
Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0
Mina, Thanks for that post. Very interesting :-) What sort of things are you graphing? Standard *nix stuff (mem/cpu/etc)? Or do you have some hooks into the C* process (I saw something about port 1414 in the .yaml file)? Best, -g

On Thu, Jul 26, 2012 at 9:27 AM, Mina Naguib mina.nag...@bloomdigital.com wrote:
Hi Thomas. On a modern 64-bit server, I recommend you pay little attention to the virtual size. It's made up of almost everything within the process's address space, including on-disk files mmap()ed in for zero-copy access. It's not unreasonable for a machine with N amount of RAM to have a process whose virtual size is several times the value of N. That in and of itself is not problematic. In a default cassandra 1.1.x setup, the bulk of that will be your sstables' data and index files. On linux you can invoke the pmap tool on the cassandra process's PID to see what's in there. Much of it will be anonymous memory allocations (the JVM heap itself, off-heap data structures, etc), but lots of it will be references to files on disk (binaries, libraries, mmap()ed files, etc).

What's more important to keep an eye on is the JVM heap - typically statically allocated to a fixed size at cassandra startup. You can get info about its used/capacity values via "nodetool -h localhost info". You can also hook up jconsole and trend it over time. The other critical piece is the process's RESident memory size, which includes the JVM heap but also other off-heap data structures and miscellanea.

Cassandra has recently been making more use of off-heap structures (for example, row caching via SerializingCacheProvider). This is done as a matter of efficiency - a serialized off-heap row is much smaller than a classical object sitting in the JVM heap - so you can do more with less. Unfortunately, in my experience, it's not perfect. They still have a cost, in terms of on-heap usage, as well as off-heap growth over time. Specifically, my experience with cassandra 1.1.0 showed that off-heap row caches incurred a very high on-heap cost (ironic) - see my post at http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E - as documented in that email, I managed that with regularly scheduled full GC runs via System.gc().

I have, since then, moved away from scheduled System.gc() to scheduled row cache invalidations. While this had the same effect as the System.gc() I described in my email, it eliminated the 20-30 second pause associated with it. It did however introduce (or maybe I never noticed it earlier) a slow creep in memory usage outside of the heap. It's typical in my case, for example, for a process configured with 6G of JVM heap to start up, stabilize at 6.5-7GB RESident usage, then creep up slowly throughout a week to the 10-11GB range. Depending on what else the box is doing, I've experienced the linux OOM killer killing cassandra as you've described, or heavy swap usage bringing everything down (we're latency-sensitive), etc.

And now for the good news. Since I've upgraded to 1.1.2:
1. There's no more need for regularly scheduled System.gc()
2. There's no more need for regularly scheduled row cache invalidation
3. The HEAP usage within the JVM is stable over time
4. The RESident size of the process appears also stable over time
Point #4 above is still pending as I only have 3-day graphs since the upgrade, but they show promising results compared to the slope of the same graph before the upgrade to 1.1.2. So my advice is give 1.1.2 a shot - just be mindful of https://issues.apache.org/jira/browse/CASSANDRA-4411

On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:
I saw this. All works fine up to version 1.1.0: the 0.8.x takes 5GB of memory of an 8GB machine, the 1.0.x takes between 6 and 7 GB on an 8GB machine, and the 1.1.0 takes all, and it is a problem for me. It is no solution to wait for the OOM killer from the linux kernel and restart the cassandra process; when my machine has less than 100MB ram available then I have a problem.

On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
Are you actually seeing any problems from this? High virtual memory usage on its own really doesn't mean anything. See http://wiki.apache.org/cassandra/FAQ#mmap

On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler thomas.speng...@toptarif.de wrote:
No one has any idea? We tried updating to 1.1.2, DiskAccessMode standard, indexAccessMode standard, row_cache_size_in_mb: 0, key_cache_size_in_mb: 0. Our next try will be to change SerializingCacheProvider to ConcurrentLinkedHashCacheProvider. Any other proposals are welcome.

On 07/04/2012 02:13 PM, Thomas Spengler wrote:
Hi @all, since our upgrade from cassandra 1.0.3 to 1.1.0 the virtual memory usage of the cassandra-nodes explodes. Our setup is: * 5 - centos 5.8 nodes * each 4
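Tangentially, for anyone who wants to graph the heap numbers Mina mentions without jconsole, here is a minimal sketch (not from the thread) that polls a node's memory MXBean over JMX using only the standard java.lang.management / javax.management APIs; the host and port are assumptions, adjust them to wherever your node exposes JMX:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapTrend {
    public static void main(String[] args) throws Exception {
        // Assumed JMX endpoint of the Cassandra node; change host/port as needed.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection connection = connector.getMBeanServerConnection();
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        try {
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                // Log used vs. committed vs. max so the values can be graphed over time.
                System.out.printf("heap used=%d committed=%d max=%d%n",
                        heap.getUsed(), heap.getCommitted(), heap.getMax());
                Thread.sleep(60000L);
            }
        } finally {
            connector.close();
        }
    }
}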
Re: Dynamic CF
I thought I'd give this a try:

create columnfamily at_event__ac_c ( ac_c text, ac_creation bigint, ac_name text, ac_value text, PRIMARY KEY (ac_c, ac_creation) ) with compression_parameters:sstable_compression = '';

Then, insert a few columns:

begin batch using consistency quorum
insert into at_event__ac_c (ac_c, ac_creation, ac_name, ac_value) values ('83433.361.t4.l1.cisco.com', 1303167920747402, 'ac_event_type', 'SERV.CPE.CONN')
insert into at_event__ac_c (ac_c, ac_creation, ac_name, ac_value) values ('83433.361.t4.l1.cisco.com', 1303167920747402, 'ac_vid', '')
insert into at_event__ac_c (ac_c, ac_creation, ac_name, ac_value) values ('83433.361.t4.l1.cisco.com', 1303167920747402, 'ac_state', 'UP')
apply batch;

then followed up with a query:

cqlsh:op2> select * from at_event__ac_c;
 ac_c                      | ac_creation      | ac_name  | ac_value
---------------------------+------------------+----------+----------
 83433.361.t4.l1.cisco.com | 1303167920747402 | ac_state | UP

So, I can't get the same index behavior from the ac_creation column as I get from the ac_c column. That is, ac_c is additive, ac_creation overwrites. When I turn this around and create a real wide row, like this:

create columnfamily at_event__ac_c ( ac_c text, ac_creation bigint, ac_event_type text, ac_vid text, ac_state text, PRIMARY KEY (ac_c, ac_creation) ) with compression_parameters:sstable_compression = '';

I get the behavior I want, which is a composite row which is indexed by ac_c and ac_creation. If I have 100 columns defined as a static CF (like above), when I do an insert does it just use the space for the columns inserted, or the columns declared? -g

On Tue, Jul 10, 2012 at 7:23 AM, Sylvain Lebresne sylv...@datastax.com wrote:
On Fri, Jul 6, 2012 at 10:49 PM, Leonid Ilyevsky lilyev...@mooncapital.com wrote:
At this point I am really confused about what direction Cassandra is going. CQL 3 has the benefit of composite keys, but no dynamic columns. I thought the whole point of Cassandra was to provide dynamic tables.

CQL3 absolutely provides dynamic tables/wide rows, the syntax is just different. The typical example for wide rows is a time series, for instance keeping all the events for a given event_kind in the same C* row ordered by time. You declare that in CQL3 using:

CREATE TABLE events ( event_kind text, time timestamp, event_name text, event_details text, PRIMARY KEY (event_kind, time) )

The important part in such a definition is that one CQL row (i.e. a given event_kind, time, event_name, event_details) does not map to an internal Cassandra row. More precisely, all events sharing the same event_kind will be in the same internal row. This is a wide row/dynamic table in the sense of thrift.

I need to have a huge table to store market quotes, and be able to query it by name and timestamp (t1 <= t <= t2), therefore I wanted the composite key. Loading data into such a table using prepared statements (CQL 3-based) was very slow, because it makes a server call for each row.

You should use a BATCH statement, which is the equivalent of batch_mutate. -- Sylvain
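To make Sylvain's closing advice concrete, here is a rough sketch (not from the thread) of grouping many inserts into a single CQL3 BATCH so the client makes one server call per group. The Quote holder class is hypothetical, and building the statement by string formatting is only for illustration; a real loader would bind values through its driver rather than interpolate strings:

import java.util.List;

public class BatchBuilder {

    // Hypothetical value holder for one market quote / event row.
    public static class Quote {
        public String kind;
        public long time;
        public String name;
        public String details;
    }

    // Builds one BATCH statement covering every quote in the list, against the
    // "events" table from Sylvain's example, so the whole group travels in one call
    // instead of one round trip per row.
    public static String buildBatch(List<Quote> quotes) {
        StringBuilder sb = new StringBuilder("BEGIN BATCH\n");
        for (Quote q : quotes) {
            sb.append(String.format(
                "  INSERT INTO events (event_kind, time, event_name, event_details)"
                + " VALUES ('%s', %d, '%s', '%s')%n",
                q.kind, q.time, q.name, q.details));
        }
        sb.append("APPLY BATCH;");
        return sb.toString();
    }
}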
Re: Starting cassandra with -D option
I did something similar for my installation, but I used ENV variables. I created a directory on a machine (call this the master) with directories for all of the distributions (call them slaves). So, consider:

/master/slave1
/master/slave2
...
/master/slaven

then I rdist this to all of my slaves. In the /master directory is all of the standard cassandra distribution. In the /master/slave* directories is all of the machine-dependent stuff. Also in /master I have a .profile with:

-bash-4.1$ cat /master/.profile
#
export CASSANDRA_HOME=$HOME/run
SHOST=`hostname | sed s'/\..*//'`
export CASSANDRA_CONF=$CASSANDRA_HOME/conf/$SHOST
export CASSANDRA_INCLUDE=$CASSANDRA_HOME/conf/$SHOST/cassandra.in.sh
. $CASSANDRA_HOME/conf/cassandra-env.sh
PATH=$HOME/run/bin:$PATH
echo 'to start cassandra type cassandra'

this leaves me with this environment on each slave (slave1 example):

-bash-4.1$ env | grep CAS
CASSANDRA_HOME=/usr/share/cassandra/run
CASSANDRA_CONF=/usr/share/cassandra/run/conf/slave1
CASSANDRA_INCLUDE=/usr/share/cassandra/run/conf/slave1/cassandra.in.sh

Using this technique I maintain my Cassandra cluster on 1 machine and rdist to the participants. Rdist makes each node independent. -greg

On Sun, Jun 24, 2012 at 1:11 PM, aaron morton aa...@thelastpickle.com wrote:
"Idea is to avoid having the copies of cassandra code in each node" - if you run cassandra from the NAS you are adding a single point of failure into the system. Better to use some form of deployment automation and install all the required components onto each node. Cheers - Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 22/06/2012, at 12:29 AM, Flavio Baronti wrote:
The option must actually also include the name of the yaml file: -Dcassandra.config=file:///Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf/cassandra.yaml
Flavio

On 6/21/2012 13:16 PM, Roshni Rajagopal wrote:
Hi Folks, We wanted to have a single cassandra installation, and use it to start cassandra on other nodes by passing it the cassandra configuration directories as a parameter. The idea is to avoid having copies of the cassandra code on each node, and starting each node by getting into bin/cassandra of that node. As per http://www.datastax.com/docs/1.0/references/cassandra, we have an option -D where we can supply some parameters to cassandra. Has anyone tried this? I'm getting an error as below.
walmarts-MacBook-Pro-2:Node1-Cassandra1.1.0 walmart$ bin/cassandra -Dcassandra.config=file:///Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf
walmarts-MacBook-Pro-2:Node1-Cassandra1.1.0 walmart$
INFO 15:38:01,763 Logging initialized
INFO 15:38:01,766 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_31
INFO 15:38:01,766 Heap size: 1052770304/1052770304
INFO 15:38:01,766 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.1.0.jar:bin/../lib/apache-cassandra-clientutil-1.1.0.jar:bin/../lib/apache-cassandra-thrift-1.1.0.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/metrics-core-2.0.3.jar:bin/../lib/mx4j-tools-3.0.1.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/snaptree-0.1.jar:bin/../lib/jamm-0.2.5.jar
INFO 15:38:01,768 JNA not found. Native methods will be disabled.
INFO 15:38:01,826 Loading settings from file:/Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf
ERROR 15:38:01,873 Fatal configuration error
error Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=No single argument constructor found for class org.apache.cassandra.config.Config in reader, line 1, column 1: cassandra.yaml

The other option would be to modify cassandra.in.sh. Has anyone tried this?? Regards, Roshni
Re: Supercolumn behavior on writes
Derek, Thanks for that! Yes, I am aware of that technique. I am currently using something very similar on an SQL database. I think one of the great benefits of Cassandra is that you can invent these on the fly. I also think there is great benefit in keeping all of the columns in the same row. Anyway, I didn't mean to hijack Oleg's thread. I am interested in the original question about the serialization/deserialization on write. Does anybody know? -g

On Wed, Jun 13, 2012 at 11:45 PM, Derek Williams de...@fyrie.net wrote:
On Wed, Jun 13, 2012 at 9:08 PM, Greg Fausak g...@named.com wrote:
Interesting. How do you do it? I have a version 2 CF, that works fine. A version 3 table won't let me invent columns that don't exist yet (for composite tables). What's the trick?

You are able to get the same behaviour as non-CQL by doing something like this:

CREATE TABLE mytable ( id bigint, name text, value text, PRIMARY KEY (id, name) ) WITH COMPACT STORAGE;

This table will work exactly like a standard column family with no defined columns. For example:

cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (1, 'firstname', 'Alice');
cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (1, 'email', 'al...@example.org');
cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (2, 'firstname', 'Bob');
cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (2, 'webpage', 'http://bob.example.org');
cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (2, 'email', 'b...@example.org');
cqlsh:testing> SELECT name, value FROM mytable WHERE id = 2;
 name      | value
-----------+------------------------
 email     | b...@example.org
 firstname | Bob
 webpage   | http://bob.example.org

Not very exciting, but when you take a look with cassandra-cli:

[default@testing] get mytable[2];
=> (column=email, value=b...@example.org, timestamp=1339648270284000)
=> (column=firstname, value=Bob, timestamp=1339648270275000)
=> (column=webpage, value=http://bob.example.org, timestamp=133964827028)
Returned 3 results. Elapsed time: 11 msec(s).

which is exactly what you would expect from a normal cassandra column family. So the trick is to separate your static columns and your dynamic columns into separate column families. Column names and types can of course be something different than in my example, and inserts can be done within a BATCH to avoid multiple round trips. Also, I'm not trying to advocate this as being a better solution than just using the old thrift interface, I'm just showing an example of how to do it. I personally do prefer this way as it is more predictable, but of course others will have a different opinion. -- Derek Williams
cql 3 qualification failing?
I have been playing around with composite CFs. I have one declared:

create columnfamily at_event_ac_c ( ac_event_id int, ac_creation timestamp, ac_action text, ac_addr text, ac_advisory_id text, ac_c text, ... ev_sev text, ... ev_total text, ev_url text, ev_used text, toast text, fw text, name text, resp text, size text, PRIMARY KEY (ac_c, ac_creation) ) with compression_parameters:sstable_compression = '';

So, my main primary key is on the ac_c column (text), and the secondary composite key is on ac_creation, which is a date. These queries perform correctly:

select * from at_event_ac_c where ac_c = '1234';
select * from at_event_ac_c where ac_c = '1234' and ac_creation > '2012-07-15' and ac_creation < '2012-07-18';

What's weird is I can't qualify on a non-indexed column, like:

select * from at_event_ac_c where ac_c = '1234' and ac_creation > '2012-07-15' and ac_creation < '2012-07-18' and ev_sev = 2;

I get an error:

Bad Request: No indexed columns present in by-columns clause with Equal operator

But I just attended a class on this. I thought that once I used my indices the remaining qualifications would be satisfied via a filtering method. Obviously this is incorrect. Is there a way to 'filter' results? -g
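Later CQL versions grew an explicit ALLOW FILTERING clause for exactly this situation; another common option is to let the server do the ac_c / ac_creation slicing and apply the extra predicate client-side. A minimal sketch of that (not from the thread), where the Row holder and its fields are hypothetical stand-ins for whatever your client library returns:

import java.util.ArrayList;
import java.util.List;

public class ClientSideFilter {

    // Hypothetical row holder standing in for whatever the client library returns.
    public static class Row {
        public String acC;
        public long acCreation;
        public int evSev;
    }

    // Keeps only the rows matching the predicate the server refused to evaluate
    // (here: ev_sev = wantedSeverity); the server still does the key/range slicing.
    public static List<Row> filterBySeverity(List<Row> sliced, int wantedSeverity) {
        List<Row> out = new ArrayList<Row>();
        for (Row row : sliced) {
            if (row.evSev == wantedSeverity) {
                out.add(row);
            }
        }
        return out;
    }
}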
Re: Supercolumn behavior on writes
That's a good question. I just went to a class, and Ben was saying that any action on a super column requires de-/re-serialization. But it would be nice if a write had this sort of efficiency. I have been playing with the 1.1.1 version; in that one there are 'composite' columns, which I think are like super columns, but they don't require serialization and deserialization. However, there seems to be a catch. You can't 'invent' columns on the fly; everything has to be declared when you declare the column family. ---greg

On Wed, Jun 13, 2012 at 6:52 PM, Oleg Dulin oleg.du...@gmail.com wrote:
Does a write to a sub column involve deserialization of the entire super column? Thanks, Oleg
Re: Supercolumn behavior on writes
Interesting. How do you do it? I have a version 2 CF, that works fine. A version 3 table won't let me invent columns that don't exist yet (for composite tables). What's the trick?

cqlsh -3 cas1
cqlsh> use onplus;
cqlsh:onplus> select * from at_event where ac_event_id = 7690254;
 ac_event_id | ac_creation           | ac_event_type | ac_id | ev_sev
-------------+-----------------------+---------------+-------+--------
     7690254 | 2011-07-23 00:11:47+  | SERV.CPE.CONN |    \N |      5
cqlsh:onplus> update at_event set wingy = 'toto' where ac_event_id = 7690254;
Bad Request: Unknown identifier wingy

This is what I used to create it:

//
// create the event column family, this contains the static
// part of the definition. many additional columns can be specified
// in the port from relational, these would be mainly the at_event table
//
use onplus;
create columnfamily at_event ( ac_event_id int PRIMARY KEY, ac_event_type text, ev_sev int, ac_id text, ac_creation timestamp ) with compression_parameters:sstable_compression = '';

-g

On Wed, Jun 13, 2012 at 9:36 PM, samal samalgo...@gmail.com wrote:
"You can't 'invent' columns on the fly, everything has to be declared when you declare the column family."
That's incorrect. You can define names on the fly. Only the validation must be defined when declaring the CF.
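To illustrate samal's point (a sketch, not from the thread): against a thrift-style CF whose comparator and validators are, say, UTF8Type, a client such as Hector will happily write a column name that was never declared anywhere. The 'user_attributes' CF and the keyspace handle below are hypothetical:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class DynamicColumnSketch {

    // Writes a column whose name was never declared in the schema; only the
    // CF's comparator and validation classes constrain what is allowed.
    public static void addColumn(Keyspace keyspace, String rowKey,
                                 String columnName, String value) {
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert(rowKey, "user_attributes",
                HFactory.createStringColumn(columnName, value));
    }
}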