Re: cql 3 qualification failing?
On Thu, Jun 14, 2012 at 7:04 PM, Greg Fausak g...@named.com wrote: But, I just attended a class on this. I thought that once I used my indices the remaining qualifications would be satisfied via a filtering method. The actual rule is that you need to qualify at least one of the indexed columns with an EQUAL. Then, if that is satisfied, you can indeed add filtering on non-indexed columns. In your example, the EQUAL is on a non-indexed column, so this won't work. For the request to work, you would need to index ac_c (but then you wouldn't need to index ac_creation for that specific request). -- Sylvain
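A minimal sketch of the rule Sylvain describes (table, column, and index names here are hypothetical, not from the original thread):

```sql
-- Assume ac_creation is indexed but ac_c is not:
CREATE INDEX ac_creation_idx ON accounts (ac_creation);

-- Works: EQ restriction on the indexed column, extra filtering on others
SELECT * FROM accounts WHERE ac_creation = 1339800000 AND ac_c > 5;

-- Rejected: the only EQ restriction is on the non-indexed column ac_c;
-- to make this work you would index ac_c instead
SELECT * FROM accounts WHERE ac_c = 5 AND ac_creation > 1339800000;
```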
Re: GCInspector works every 10 seconds!
Hi, after I enabled the key cache and row cache, the problem is gone. I guess it's because we have lots of data in the SSTables, and it takes more time, memory and CPU to search the data. BRs //Tang Weiqiang 2012/6/18 aaron morton aa...@thelastpickle.com It is also strange that although no data in Cassandra can fulfill the query conditions, it takes more time if we have more data in Cassandra. These log messages: DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408920e049c22:true:4@1339865451865018 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408a0eeab052a:true:4@1339865451866000 say that the slice query read columns from disk that were deleted. Have you tried your test with a clean (no files on disk) database? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/06/2012, at 12:36 AM, Jason Tang wrote: Hi, after I changed the log level to DEBUG, I found some logs. Although we don't have traffic to Cassandra, we have scheduled a task that performs the sliceQuery. We use a timestamp as the index, and we run the query every second to check if we have tasks to do. After 24 hours, we have 40G of data in Cassandra, and we configure Cassandra with a max JVM heap of 6G, memtable 1G, disk_access_mode: mmap_index_only. It is also strange that although no data in Cassandra can fulfill the query conditions, it takes more time if we have more data in Cassandra. We have 20 million records in total in Cassandra, indexed by timestamp, and we query with MultigetSubSliceQuery, setting the range to a value that matches no data in Cassandra, so it should return fast. But since we have 20 million records, it takes 2 seconds to get the query result. Is the GC caused by the scheduled query operation, and why does it take so much memory? Could we improve it?
System.log:
INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line 123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max is 6274678784
DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line 60) Read key 3331; sending response to 158060445@/192.168.0.3
DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line 60) Read key 3233; sending response to 158060447@/192.168.0.3
DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 75) digest is d41d8cd98f00b204e9800998ecf8427e
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 60) Read key 3139; sending response to 158060448@/192.168.0.3
DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408920e049c22:true:4@1339865451865018
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408a0eeab052a:true:4@1339865451866000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408b1319577c9:true:4@1339865451867003
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408c081e0b8a3:true:4@1339865451867004
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340deefb8a0627:true:4@1339865451920001
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340df9c21e9979:true:4@1339865451923002
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340e095ead1498:true:4@1339865451928000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340e1af16cf151:true:4@1339865451935000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) collecting
cassandra 1.0.x and java 1.7
Hello! Is it safe to use Java 1.7 with Cassandra 1.0.x? The reason I want to do that is that Java 1.7 adds options for rotating the GC log: http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=ff824681055961e1f62393b68deb5?bug_id=6941923
RE: cassandra 1.0.x and java 1.7
We have used 7u3 in production long enough with no problems. 7u4 requires a larger minimum stack size, 160KB vs 128KB, but 160KB is still not enough for Cassandra; 192KB is better, but needs more testing. https://issues.apache.org/jira/browse/CASSANDRA-4275 suggests 256KB. Best regards / Pagarbiai, Viktor Jevdokimov, Senior Developer, Adform -Original Message- From: ruslan usifov [mailto:ruslan.usi...@gmail.com] Sent: Monday, June 18, 2012 15:13 To: user@cassandra.apache.org Subject: cassandra 1.0.x and java 1.7 [...]
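For reference, the GC-log rotation options that bug 6941923 added (in recent Java 7 builds), plus the stack-size bump discussed above, would go into cassandra-env.sh roughly like this. This is a sketch only; the log path and the sizes shown are assumptions, not values from the thread:

```shell
# GC log rotation flags (Java 7; added by JDK bug 6941923)
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"
JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=10M"

# Larger per-thread stack for 7u4+, per CASSANDRA-4275
JVM_OPTS="$JVM_OPTS -Xss256k"
```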
What determines the memory that used by key cache??
Dear all! In my cluster, I found that every key needs 192 bytes in the key cache. So I want to know what determines the memory used by the key cache, and how to calculate the value. Thanks in advance.
Re[2]: Problem with streaming with sstableloader into ubuntu node
Okay, we investigated the problem and found its source in package org.apache.cassandra.io.sstable, class Descriptor:

public static Pair<Descriptor,String> fromFilename(File directory, String name)
{
    // tokenize the filename
    StringTokenizer st = new StringTokenizer(name, String.valueOf(separator));
    String nexttok;
    ...

If the bulkloader runs from Windows and Cassandra runs under Ubuntu, the name is KeySpaceName\\ColumnFamilyName\\KeySpaceName-ColumnFamilyName-hc-177-Data.db, so at the next rows

    String ksname = st.nextToken();
    String cfname = st.nextToken();

ksname becomes KeySpaceName\\ColumnFamilyName\\KeySpaceName. Sincerely, Nury. Mon, 18 Jun 2012 15:40:17 +1200 от aaron morton aa...@thelastpickle.com: Cross platform clusters are not really supported. That said, it sounds like a bug. If you can create some steps to reproduce it please create a ticket here https://issues.apache.org/jira/browse/CASSANDRA and it may get looked at. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/06/2012, at 12:41 AM, Nury Redjepow wrote: Good day, everyone. We are using sstableloader to bulk insert data into Cassandra. The script is executed on a developer's machine with Windows against a single-node Cassandra: %JAVA_HOME%\bin\java -ea -cp %CASSANDRA_CLASSPATH% -Xmx256M -Dlog4j.configuration=log4j-tools.properties org.apache.cassandra.tools.BulkLoader -d 10.0.3.37 --debug -v DestinationPrices/PricesByHotel This works fine if the destination Cassandra is running under Windows, but doesn't work with an Ubuntu instance. Cli is able to connect, but sstableloader seems to have a problem with the keyspace name. Logs on the Ubuntu instance show error messages like: ERROR [Thread-41] 2012-06-15 16:05:47,620 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[Thread-41,5,main] java.lang.AssertionError: Unknown keyspace DestinationPrices\PricesByHotel\DestinationPrices In our schema we have keyspace DestinationPrices and column family PricesByHotel.
Somehow it's not accepted properly. So my question is, how should I specify keyspace name in command, to make it work correctly with Ubuntu?
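The failure mode Nury describes can be reproduced outside Cassandra. The sketch below (Python, purely illustrative) mimics what tokenizing the name on '-' does when a Windows-style relative path leaks into the first token:

```python
def from_filename(name):
    """Mimic Descriptor.fromFilename: tokenize the SSTable file name on '-'
    and take the first two tokens as keyspace and column family."""
    parts = name.split("-")
    ksname, cfname = parts[0], parts[1]
    return ksname, cfname

# Name as the Ubuntu node sees it when the Windows bulkloader leaks
# backslash-separated path components into the first token:
ks, cf = from_filename(r"DestinationPrices\PricesByHotel\DestinationPrices-PricesByHotel-hc-177-Data.db")
print(ks)  # DestinationPrices\PricesByHotel\DestinationPrices -> "Unknown keyspace"

# With just the bare file name, the tokens come out right:
ks, cf = from_filename("DestinationPrices-PricesByHotel-hc-177-Data.db")
print(ks, cf)  # DestinationPrices PricesByHotel
```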
Re: What determines the memory that used by key cache??
On Mon, Jun 18, 2012 at 8:53 AM, mich.hph mich@gmail.com wrote: Dear all! In my cluster, I found every key needs 192 bytes in the key cache. So I want to know what determines the memory used by the key cache. How to calculate the value. According to http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Calculate-memory-used-for-keycache-tp6170528p6170814.html the formula is: key cache memory use = key cache size * (8 bytes for position i.e. value + X bytes for key + 16 bytes for token (RP) + 8-byte reference for DecoratedKey + 8 bytes for descriptor reference), which simplifies to key cache size * (key size in bytes + 40). Are your row keys 152 bytes? Jim
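The arithmetic above is easy to sanity-check (a sketch; the 40-byte overhead is taken straight from Jim's formula):

```python
def key_cache_bytes(entries, key_size_bytes):
    """Estimate key cache memory: entries * (key size + fixed overhead)."""
    # 8 (position/value) + 16 (RandomPartitioner token)
    # + 8 (DecoratedKey reference) + 8 (descriptor reference) = 40 bytes
    overhead = 8 + 16 + 8 + 8
    return entries * (key_size_bytes + overhead)

# 192 bytes per cached key implies a 152-byte row key:
print(key_cache_bytes(1, 152))  # 192
```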
large Rows Vs multinodes ring
To what extent will having possibly very large rows (many columns, sorted by timestamp, geohash, ...) be harmful for a multi-node ring? I guess a row can only be read/written on one node; if so, it's more likely to fail (than having one row per timestamp ...). Thanks for the explanations.
Re: Not solr based type
For DSE specific questions try http://www.datastax.com/support-forums/ Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/06/2012, at 1:09 AM, Abhijit Chanda wrote: Hi All, I am using the latest Solr feature of DataStax Enterprise edition 2.1. I have a keyspace named ABC and a column family named XYZ. I want to search using Solr on the field A_NO, which is of UTF8Type. For that I have created a schema.xml on my Solr node. It looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="ABC" version="1.1">
  <types>
    <fieldType name="text" class="solr.TextField"/>
  </types>
  <fields>
    <field name="A_NO" type="text" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>A_NO</uniqueKey>
</schema>

After creating the schema I made a shell script which looks like this:

#!/bin/sh
SOLRCONFIG_URL=http://192.168.2.41:8983/solr/resource/ABC.XYZ/solrconfig.xml
SOLRCONFIG=solrconfig.xml
curl $SOLRCONFIG_URL --data-binary @$SOLRCONFIG -H 'Content-type:text/xml; charset=utf-8'
echo Posted $SOLRCONFIG to $SOLRCONFIG_URL
SCHEMA_URL=http://192.168.2.41:8983/solr/resource/ABC.XYZ/schema.xml
SCHEMA=schema.xml
curl $SCHEMA_URL --data-binary @$SCHEMA -H 'Content-type:text/xml; charset=utf-8'
echo Posted $SCHEMA to $SCHEMA_URL

Now when I execute the script it throws the warning message WARNING: java.lang.RuntimeException: javax.xml.parsers.ParserConfigurationException: Not Solr-based type of index is found: ABC_XYZ_A_NO_index Posted schema.xml to http://192.168.2.41:8983/solr/resource/ABC.XYZ/schema.xml and proper indexing is not done. Can anyone help me, please? Thanks in Advance -- Abhijit Chanda VeHere Interactive Pvt. Ltd. +91-974395
Setting up a cluster
I am new to Cassandra and am setting up a cluster for the first time with 1.1.1. There are three nodes; one acts as a seed node, and all three have the IP address of that node as their seed. I have set the listen address to the address of each node and the rpc address to 0.0.0.0. I turned tracing on on all three and see the GOSSIP messages between the seed node and the other two, but not between the two non-seed nodes; sometimes I see connection timeouts between the seed node and the other nodes, but not very often. However, nodetool -h address ring only shows one node on each machine (the localhost), and when I define a keyspace with any replication factor, the begin and end token of the keyspace is the localhost token. P.S. I have generated tokens for each node. What did I miss here? Thanks Shahryar Sedghi -- Life is what happens while you are making other plans. ~ John Lennon
Re: large Rows Vs multinodes ring
It's not an exact science. Some general guidelines though:
* A row normally represents an entity
* Rows wider than thrift_max_message_length_in_mb (16MB) cannot be retrieved in a single call
* Wide rows (in the 10's of MB) can make repair do more work than is needed
* Rows wider than in_memory_compaction_limit_in_mb (64) make compaction run slower
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/06/2012, at 5:18 AM, Cyril Auburtin wrote: [...]
Re: Setting up a cluster
Did you set the cluster name to be the same ? Check the logs on the machines for errors or warnings. Finally check that each node can telnet to port 7000 on the others. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/06/2012, at 6:29 AM, Shahryar Sedghi wrote: [...]
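The telnet check can also be scripted. A small sketch (the host IPs below are placeholders for your node addresses, and 7000 is the default storage/gossip port):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):  # replace with your node IPs
    print(host, "gossip port 7000 reachable:", can_connect(host, 7000, timeout=1.0))
```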
Re: Setting up a cluster
I did all you said. No errors and warnings. On Mon, Jun 18, 2012 at 2:31 PM, aaron morton aa...@thelastpickle.com wrote: [...] -- Life is what happens while you are making other plans. ~ John Lennon
Re: cassandra 1.0.9 error - Read an invalid frame size of 0
I found a fix for this one, or rather a workaround. I changed the rpc_server_type in cassandra.yaml from hsha to sync, and the error went away. I guess there is some issue with the Thrift nonblocking server. Thanks Gurpreet On Wed, May 16, 2012 at 7:04 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote: Thanks Aaron. Will do! On Mon, May 14, 2012 at 1:14 PM, aaron morton aa...@thelastpickle.com wrote: Are you using framed transport on the client side? Try the Hector user list for Hector specific help https://groups.google.com/forum/?fromgroups#!searchin/hector-users Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/05/2012, at 5:44 AM, Gurpreet Singh wrote: This is hampering our testing of Cassandra a lot, and our move to Cassandra 1.0.9. Has anyone seen this before? Should I be trying a different version of Cassandra? /G On Thu, May 10, 2012 at 11:29 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote: Hi, I have created a 1-node cluster of Cassandra 1.0.9. I am setting this up for testing reads/writes. I am seeing the following error in the server system.log: ERROR [Selector-Thread-7] 2012-05-10 22:44:02,607 TNonblockingServer.java (line 467) Read an invalid frame size of 0. Are you using TFramedTransport on the client side? Initially I was using an old Hector 0.7.x, but even after switching to Hector 1.0-5 and Thrift version 0.6.1, I still see this error. I am using 20 threads writing/reading from Cassandra. The max write batch size is 10 with a constant payload size per key of 600 bytes. On the client side, I see Hector exceptions happening that coincide with these messages on the server. Any ideas why these errors are happening? Thanks Gurpreet
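For anyone hitting the same thing, the workaround is a one-line change in cassandra.yaml (sketch of the relevant line only):

```yaml
# cassandra.yaml: fall back to the threaded sync server while the
# hsha (Thrift nonblocking) issue is investigated
rpc_server_type: sync   # was: hsha
```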
Re: kswapd0 causing read timeouts
On Mon, 18 Jun 2012 11:57:17 -0700, Gurpreet Singh wrote: Thanks for all the information Holger. Will do the jvm updates, kernel updates will be slow to come by. I see that with disk access mode standard, the performance is stable and better than in mmap mode, so i will probably stick to that. Please let us know how things work out. Are you suggesting i try out mongodb? Uhm, no. :) I meant that it also uses mmap exclusively (!), and consequently can also have pretty bad/irregular performance when the (active) data set grows much larger than RAM. To be fair, that is a pretty hard problem in general. -h
Change of behaviour in multiget_slice query for unknown keys between 0.7 and 1.1?
Hi all, Was there a change of behaviour in multiget_slice query in Cassandra or Hector between 0.7 and 1.1 when dealing with a key that doesn't exist? We've just upgraded and our in memory unit test is failing (although just on my machine). The test code is looking for a key that doesn't exist and expects to get null. Instead it gets a ColumnSlice with a single column called val. If there were something there then we'd expect columns with names like bytes, int or string. Other rows in the column family have those columns as well as val. Is there a reason for this behaviour? I'd like to see if there was an explanation before I change the unit test for it. Many thanks in advance, Edward -- Edward Sargisson senior java developer Global Relay edward.sargis...@globalrelay.net
Re: Row caching in Cassandra 1.1 by column family
Check out the rows_cached CF attribute. On 06/18/2012 06:01 PM, Oleg Dulin wrote: Dear distinguished colleagues: I don't want all of my CFs cached, but one in particular I do. How can I configure that ? Thanks, Oleg
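If I remember right, in 1.1 the cache sizes became global and the per-CF knob is the caching attribute rather than rows_cached; something like this from cassandra-cli (the column family name is hypothetical):

```
update column family MyCF with caching = 'rows_only';
```

Worth double-checking against the 1.1 docs, since the attribute names changed between releases.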
Re: Setting up a cluster
Are you sure all your settings are correct? If so, then please follow these steps:

./nodetool disablethrift
./nodetool disablegossip
./nodetool drain

Stop the service and then delete all the data, saved_caches and commitlog files. Then restart the service. Repeat these steps for all the nodes. I hope it will work. Regards, -- Abhijit Chanda VeHere Interactive Pvt. Ltd. +91-974395
Re: large Rows Vs multinodes ring
OK, so it would be better to cut those large rows, inserting rows keyed by row + monthId, or row + week, with all the corresponding columns inside. That will drastically reduce row size, but to retrieve results overlapping weeks or months I have to do a multiget, which is less simple than a get. Thanks for the answer. 2012/6/18 aaron morton aa...@thelastpickle.com [...]
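A sketch of the bucketing idea Cyril describes (all names hypothetical): derive the row key from the entity plus a time bucket, then fan a multiget out over every bucket a query range overlaps.

```python
from datetime import datetime

def bucket_key(row_id, ts):
    """Row key = entity id + monthly time bucket, e.g. 'sensor42:2012-06'."""
    return f"{row_id}:{ts.strftime('%Y-%m')}"

def buckets_for_range(row_id, start, end):
    """All monthly row keys a [start, end] query must multiget across."""
    keys, cur = [], datetime(start.year, start.month, 1)
    while cur <= end:
        keys.append(bucket_key(row_id, cur))
        # advance one month, rolling over the year in December
        cur = datetime(cur.year + cur.month // 12, cur.month % 12 + 1, 1)
    return keys

print(buckets_for_range("sensor42",
                        datetime(2012, 5, 20),
                        datetime(2012, 7, 2)))
# ['sensor42:2012-05', 'sensor42:2012-06', 'sensor42:2012-07']
```

A weekly bucket works the same way with a finer strftime format; the trade-off is smaller rows versus more keys per multiget.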