Re: Performance deterioration while building secondary index
Well, the problem is still there: I tried to add one more index, and the 3-node cluster just goes spastic and becomes unresponsive. These boxes have plenty of CPU and memory.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Performance-deterioration-while-building-secondary-index-tp6564401p6801680.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node added, no performance boost -- are the tokens correct?
On two different clusters, if I set the token to zero on a node, its ownership drops to zero after migration. After I added the third one and moved tokens, I now have this:

33.33%  56713727820156410577229101238628035242
33.33%  113427455640312821154458202477256070484
33.33%  170141183460469231731687303715884105727

No zeroes.

Eric Gilmore-3 wrote:
> A script that I have says the following:
>
> $ python ctokens.py
> How many nodes are in your cluster? 2
> node 0: 0
> node 1: 85070591730234615865843651857942052864
>
> The first token should be zero, for the reasons discussed here:
> http://www.datastax.com/dev/tutorials/getting_started_0_7/configuring#initial-token-values
> More details are available in http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity
> The DS docs have some weak areas, but these two pages have been pretty well vetted over the past months :)
>
> On Thu, Mar 31, 2011 at 3:06 PM, buddhasystem <potek...@bnl.gov> wrote:
>> I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance.
>>
>> Address          Status  State   Load       Owns    Token
>>                                                      170141183460469231731687303715884105728
>> 130.199.185.194  Up      Normal  153.52 GB  50.00%  85070591730234615865843651857942052864
>> 130.199.185.193  Up      Normal  199.82 GB  50.00%  170141183460469231731687303715884105728
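For illustration, the evenly spaced tokens in this thread follow the standard RandomPartitioner formula, token_i = i * 2**127 / N. A minimal sketch (the function name is hypothetical; this is not the ctokens.py script quoted above):

```python
def balanced_tokens(n):
    # Evenly spaced initial tokens on RandomPartitioner's 0..2**127 ring.
    return [i * (2 ** 127) // n for i in range(n)]

# Reproduces the two-node example from the quoted script:
# node 0: 0
# node 1: 85070591730234615865843651857942052864
for i, token in enumerate(balanced_tokens(2)):
    print("node %d: %d" % (i, token))
```

The key point the thread arrives at: the first token should be 0, and the others are evenly spaced multiples of 2**127/N.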
Netstats out of sync?
I'm rebalancing a cluster of 2 nodes at this point. Netstats on the source node reports progress of the stream, whereas on the receiving end netstats states that progress = 0. Did anyone see that? Do I need both nodes listed as seeds in cassandra.yaml? TIA
Node added, no performance boost -- are the tokens correct?
I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance.

Address          Status  State   Load       Owns    Token
                                                     170141183460469231731687303715884105728
130.199.185.194  Up      Normal  153.52 GB  50.00%  85070591730234615865843651857942052864
130.199.185.193  Up      Normal  199.82 GB  50.00%  170141183460469231731687303715884105728
Re: Node added, no performance boost -- are the tokens correct?
Yup, I screwed up the token setting, my bad. Now I've moved the tokens. I still observe that read latency deteriorated with 3 machines vs. the original one. Replication factor is 1, Cassandra version 0.7.2 (didn't have time to upgrade as I need results by this weekend). Key and row caching were disabled to get worst-case test results.
Re: data aggregation in Cassandra
Hello Saurabh, I have a similar situation, with a more complex data model, and I do an equivalent of map-reduce by hand. The redeeming value is that you have complete freedom in how you hash, and you design the way you store indexes and similar structures. If there is a pattern in the data store, you use it to your advantage. In the end, you get good performance.
Re: cassandra nodes with mixed hard disk sizes
aaron morton wrote:
> Also a node is responsible for storing its token range and acting as a replica for other token ranges. So reducing the token range may not have a dramatic effect on the storage requirements.

Aaron, is there a way to configure wimpy nodes such that the replicas are elsewhere?
Re: Deleting old SSTables
Jonathan, for all of us just tinkering with test clusters, building confidence in the product, it would be nice to be able to do the same with nodetool, without jconsole -- just my 0.5 penny. Thanks.

Jonathan Ellis-3 wrote:
> From the next paragraph of the same wiki page: "SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary, but Cassandra will force one itself if it detects that it is low on space. A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted."
>
> On Tue, Mar 22, 2011 at 8:30 AM, Jonathan Colby <jonathan.co...@gmail.com> wrote:
>> According to the Wiki Page on compaction: "once compaction is finished, the old SSTable files may be deleted"
>>
>> http://wiki.apache.org/cassandra/MemtableSSTable
>>
>> I thought the old SSTables would be deleted automatically, but this wiki page got me thinking otherwise.
>>
>> Question is, if it is true that old SSTables must be manually deleted, how can one safely identify which SSTables can be deleted?
>>
>> Jon
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
Re: 0.7.2 choking on a 5 MB column
Jonathan, wide rows have been discussed. I thought the limit on the number of columns was way bigger than 45k. What can one expect in reality?
Re: 0.7.2 choking on a 5 MB column
I see. I'm doing something even more drastic then, because I'm only inserting one row in this case, and just using cf.insert() without a batch mutator. It didn't occur to me that that was a bad idea. So I take it this method will fail. Hmm.
Re: Reading whole row vs a range of columns (pycassa)
Aaron, thanks for chiming in. I'm doing what you said, i.e. all data for a single object (which is quite lean, with about 100 attributes of ~10 bytes each) goes into a single column, as opposed to the previous version of my application, which had all attributes of each small object mapped to individual columns. So yes, I considered having 100 objects in a single column, but that is suboptimal for many reasons (hard to add an object later).

My reference to OPP was this: if I were sticking with the original design, it could have been advantageous to have OPP, since statistically it's likely that requests for objects are serial, e.g. often people don't query for just one object with id=123, but for a series like id=[123..145]. If I bunch these into rows containing 100 objects each, that promises some efficiency right there, as I read one row as opposed to, say, 50.

aaron morton wrote:
> I'd collapse all the data for a single object into a single column, not sure about storing 100 objects in a single column though. Have you considered any concurrency issues? e.g. multiple threads / processes wanting to update different objects in the same group of 100?
>
> Dont understand your reference to the OOP in the context of reading 100 columns from a row.
>
> Aaron
>
> On 19 Mar 2011, at 16:22, buddhasystem wrote:
>> As I'm working on this further, I want to understand this:
>>
>> Is it advantageous to flatten data in blocks (strings) each containing a series of objects, if I know that a serial object read is often likely, but don't want to resort to OPP? I worked out the optimal granularity, it seems. Is it better to read a serialized single column with 100 objects than a row consisting of a hundred columns each modeling an object?
Undead rows after nodetool compact
This has been discussed once, but I don't remember the outcome. I insert a row and then delete the key immediately. I then run nodetool compact. In cassandra-cli, list cf still returns 1 empty row. This is not a showstopper, but damn unpretty. Is there a way to make deleted rows go away immediately?
Reading whole row vs a range of columns (pycassa)
Is there a noticeable difference in speed between reading the whole row through pycassa vs. a range of columns? Both rows and columns are pretty slim.
Re: Reading whole row vs a range of columns (pycassa)
As I'm working on this further, I want to understand this: Is it advantageous to flatten data in blocks (strings) each containing a series of objects, if I know that a serial object read is often likely, but don't want to resort to OPP? I worked out the optimal granularity, it seems. Is it better to read a serialized single column with 100 objects than a row consisting of a hundred columns each modeling an object?
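To make the tradeoff concrete, here is a toy sketch of the two layouts being compared (the serialization format and names are illustrative assumptions, not the poster's actual code):

```python
import json

# Illustrative small objects; the real ones have ~100 short attributes.
objects = [{"id": i, "value": "v%d" % i} for i in range(100)]

# Layout A: one column per object -- reading a block means 100 column reads.
columns = {"obj%d" % o["id"]: json.dumps(o) for o in objects}

# Layout B: one serialized column holding the whole block of 100 --
# a single column read plus one decode, at the cost of rewriting the
# entire block whenever any one object changes.
block = json.dumps(objects)
decoded = json.loads(block)
```

The "hard to add an object later" objection in the reply above corresponds to Layout B's rewrite-the-whole-block update cost.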
Does concurrent_reads relate to number of drives in RAID0?
Hello, in the instructions I need to link concurrent_reads to the number of drives. Is this the number of physical drives that I have in my RAID0, or something else?
Re: Does concurrent_reads relate to number of drives in RAID0?
Thanks to all for replying, but frankly I didn't get the answer I wanted. Does the number of disks mean the number of spindles in RAID0? Or something else, like a separate disk for the commitlog and for data?
Re: Does concurrent_reads relate to number of drives in RAID0?
Thanks Peter, I can see it better now.
Re: Does concurrent_reads relate to number of drives in RAID0?
Where and how do I choose it?
Please help decipher /proc/cpuinfo for optimal Cassandra config
Dear All, this is from my new Cassandra server. It obviously uses hyperthreading; I just don't know how to translate this to concurrent readers and writers in cassandra.yaml -- can somebody take a look and tell me what number of cores I need to assume for concurrent_reads and concurrent_writes? Is it 24? Thanks!

[cassandra@cassandra01 bin]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 12288 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5333.91
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[the entries for processors 1-4 are identical apart from the processor, core id, apicid, and bogomips values; the dump is truncated in the archive]
Re: Is column update column-atomic or row atomic?
Hello Peter, thanks for the note. I'm not looking for anything fancy. It's just that when I'm looking at the following bit of the pycassa docs, it's not 100% clear to me that it won't overwrite the entire row for the key, if I want to simply add an extra column {'foo': 'bar'} to an already existing row. I don't care about cross-node consistency at this point.

insert(key, columns[, timestamp][, ttl][, write_consistency_level])
    Insert or update columns in the row with key key. columns should be a dictionary of columns or super columns to insert or update. If this is a standard column family, columns should look like {column_name: column_value}. If this is a super column family, columns should look like {super_column_name: {sub_column_name: value}}
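As the quoted docs imply, insert() merges at column granularity. A toy in-memory model of that merge behavior (this is a sketch of the semantics, not pycassa itself):

```python
# Toy model: an insert merges the given columns into the existing row;
# columns not named in the insert are left untouched.
store = {}

def insert(key, columns):
    store.setdefault(key, {}).update(columns)

insert("row1", {"a": "1", "b": "2"})
insert("row1", {"foo": "bar"})  # adds one column; does not overwrite the row

print(store["row1"])  # {'a': '1', 'b': '2', 'foo': 'bar'}
```

So adding {'foo': 'bar'} to an existing row leaves the other columns in place; there is no need to re-read and re-insert the whole row.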
Re: Is column update column-atomic or row atomic?
Thanks for the clarification, Tyler, and sorry again for the basic question. I've been doing straight inserts from Oracle so far, but now I need to update rows with new columns.
Re: Please help decipher /proc/cpuinfo for optimal Cassandra config
Thanks! Docs say it's good to set it to 8*Ncores -- are you saying you see 8 cores in this output? I know I need to go way above the default of 32 with this setup.
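For what it's worth, the 0.7-era rules of thumb can be sketched as below. The socket count and drive count here are assumptions for illustration, not facts taken from the cpuinfo dump; check your own hardware and your version's docs:

```python
# Commonly cited 0.7-era rules of thumb (verify against current docs):
#   concurrent_writes ~ 8 * physical cores
#   concurrent_reads  ~ 16 * data drives
physical_cores = 2 * 6  # assumed: two X5650 sockets, 6 cores each (HT not counted)
data_drives = 1         # assumed: a single data volume

concurrent_writes = 8 * physical_cores
concurrent_reads = 16 * data_drives
print(concurrent_writes, concurrent_reads)  # 96 16
```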
Is column update column-atomic or row atomic?
Sorry for the rather primitive question, but it's not clear to me if I need to fetch the whole row, add a column as a dictionary entry and re-insert it if I want to expand the row by one column. Help will be appreciated.
Re: Is column update column-atomic or row atomic?
Thanks. Can you give me a pycassa example, if possible? Thanks!
Re: Cassandra LongType data insertion problem for secondary index usage
Tyler, as a collateral issue -- I've been wondering for a while what advantage, if any, it buys me if I declare a value 'long' (which it roughly is) as opposed to passing around strings. A string is flattened onto a replica of itself, I assume? No conversion? Maybe it even means better speed. Thanks, Maxim
null vs value not found?
I'm doing insertion with a pycassa client. It seems to work in most cases, but sometimes, when I go to cassandra-cli and query with a key and column that I inserted, I get null whereas I shouldn't. What could be the causes of that?
Re: null vs value not found?
Thanks Tyler,

ColumnFamily: index1
  Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
  Row cache size / save period: 0.0/0
  Key cache size / save period: 1.0/3600
  Memtable thresholds: 0.8765625/50/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: []

I pretty much went with the default settings, and the column name is 'CATALOG'.

Maxim

Tyler Hobbs-2 wrote:
> On Thu, Feb 24, 2011 at 2:27 PM, buddhasystem <potek...@bnl.gov> wrote:
>> I'm doing insertion with a pycassa client. It seems to work in most cases, but sometimes, when I go to cassandra-cli and query with a key and column that I inserted, I get null whereas I shouldn't. What could be the causes of that?
>
> Could you clarify what column name and value you are using, as well as the comparator and validator types?
>
> --
> Tyler Hobbs
> Software Engineer, DataStax (http://datastax.com/)
> Maintainer of the pycassa (http://github.com/pycassa/pycassa) Cassandra Python client library
Re: null vs value not found?
Thanks! You are right. I see an exception but have no idea what went wrong.

ERROR [ReadStage:14] 2011-02-24 21:51:29,374 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:14,5,main]
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1316)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1205)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1134)
        at org.apache.cassandra.db.Table.getRow(Table.java:386)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:70)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
        ... 12 more
Re: Homebrew CF-indexing vs secondary indexing
FWIW, for me the advantage of homebrew indexes is that they can be a lot more sophisticated than the standard ones -- I can hash combinations of column values into whatever I want. I also put counters on column values in the index, so there is lots of functionality. Of course, I can do this because my data becomes read-only; I know it's a luxury.
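A homebrew index of that shape can be modeled roughly like this (the hashing scheme and field names are made up for illustration; a real version would store these as rows in an index CF):

```python
from collections import defaultdict

# Toy model of a homebrew index CF: index keys are combinations of column
# values; each index row holds the matching data-row keys, plus a counter.
index = defaultdict(set)
counts = defaultdict(int)

def index_row(row_key, status, site):
    key = "%s|%s" % (status, site)  # combined-value index key
    index[key].add(row_key)
    counts[key] += 1

index_row("job1", "done", "BNL")
index_row("job2", "done", "BNL")
index_row("job3", "failed", "CERN")

print(sorted(index["done|BNL"]), counts["done|BNL"])  # ['job1', 'job2'] 2
```

Because combined-value keys like this are computed at write time, they only stay consistent cheaply when the data is write-once, which is the luxury mentioned above.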
Will the large datafile size affect the performance?
I know that theoretically it should not (apart from compaction issues), but maybe somebody has experience showing otherwise: my test cluster now has 250GB of data and will have 1.5TB in its reincarnation. If all this data is in a single CF -- will it cause read or write performance problems? Should I shard it? One advantage of splitting the data would be reducing the impact of compactions and repairs (or so I naively assume). TIA, Maxim
Can I count on Super Column Families when planning 3 years out?
There was a discussion here on how well (or not so well) super CFs are supported. I now need to make a strategic decision as to how I plan my data. What's the consensus -- will super CFs still be there 3 years out? TIA, Maxim
How come key cache increases speed by x4?
Well, I know the cache is there for a reason; I just can't explain the factor of 4 when I run my queries on a hot vs. cold cache. My queries are actually a chain: one on an inverted index, which produces a tuple of keys to be used in the main query. The inverted index query should be downright trivial. I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing something? Why such a large factor? TIA, Maxim
Virtues and pitfall of using TYPES?
I've been too smart for my own good trying to type columns, on the theory that it would later increase performance by having more efficient comparators in place. So if a string represents an integer, I would convert it to an integer and declare the column as such. Same for LONG. What I found is that during the write operation, the type conversion kills the performance. It's really not a trivial amount of time. Has anyone had a similar experience?
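For context, this is roughly where the per-column conversion cost lands on the client side: typed values are packed into a fixed binary encoding on every write. A sketch of LongType's 8-byte big-endian packing (an illustration of the wire format, not pycassa's actual code):

```python
import struct

# A LongType value travels as an 8-byte big-endian integer; a string
# column value is sent as-is, with no per-write conversion step.
def pack_long(value):
    return struct.pack(">q", value)

assert pack_long(42) == b"\x00\x00\x00\x00\x00\x00\x00*"
assert len(pack_long(2 ** 40)) == 8
```

Done once per column per insert, this packing (plus the int() conversion from the source strings) is the client-side overhead being described.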
Re: Virtues and pitfall of using TYPES?
Dude, I never mentioned the server side, sorry if that wasn't obvious. As for Python being slow, I'm not going away from it. It performs amazingly well in other circumstances.

Jonathan Ellis-3 wrote:
> That doesn't make sense to me. IntegerType validation is a no-op and LongType validation is pretty close (just a size check). If you meant that the conversion is killing performance on your client, you should switch to a more performant client language. :)
>
> On Fri, Feb 18, 2011 at 9:56 PM, buddhasystem <potek...@bnl.gov> wrote:
>> I've been too smart for my own good trying to type columns, on the theory that it would later increase performance by having more efficient comparators in place. So if a string represents an integer, I would convert it to an integer and declare the column as such. Same for LONG. What I found is that during the write operation, the type conversion kills the performance. It's really not a trivial amount of time. Has anyone had a similar experience?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
Re: create additional secondary index
I sidestep this problem by using a Python script (pycassa-based) where I configure my CFs. This way, it's reproducible and documented.
What is the most solid version of Cassandra? No secondary indexes needed.
Hello, we are acquiring new hardware for our cluster and will be installing it soon. It's likely that I won't need to rely on secondary index functionality, as the data will be write-once, read-many, and I can get away with inverse index creation at load time; plus I have some more complex indexing in mind than comes packaged (too much to explain here). So, if I don't need indexes, what is the most stable, reliable version of Cassandra that I can put in production? I'm seeing bug reports here, and some sound quite serious; I just want something that works day in, day out. Thank you, Maxim
Re: What is the most solid version of Cassandra? No secondary indexes needed.
Thank you! It's just that 0.7.1 seems to be the bleeding edge now (a serious bug was fixed today). Would you still trust it as a production-level service? I'm just slightly concerned. I don't want to create a perception among our IT that the product is not ready for prime time. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029047.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: What is the most solid version of Cassandra? No secondary indexes needed.
Thank you Attila! We will indeed have a few months of breaking in. I suppose I'll keep my fingers crossed and hope that 0.7.X turns out to be very stable. So I'll deploy 0.7.1 -- I will need to apply all the patches; there is no cumulative download, is that correct? Attila Babo wrote: 0.6.8 is stable and production ready; the later versions of the 0.6 branch have issues. No offense, but the 0.7 branch is fairly unstable from my experience. I have reproduced all the open bugs with a production dataset, even when I tried to rebuild it from scratch after a complete loss. If you have a few months before going to production, your best bet is still 0.7.1, as it will stabilize, but the switch between versions is painful. /Attila -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029622.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Column name size
I've been thinking about this as well. I'm migrating data from a large Oracle database, and the RDBMS column names are descriptive (good) and long (bad). For now I just keep them when populating Cassandra, but I can shave off about 30% of storage by hashing names. I don't need any automation and can just maintain a dictionary of serial numbers to strings and vice versa; it's still under 100 items. When you start building inverse indexes and other auxiliary structures, the size effect may be amplified. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Column-name-size-tp6015127p6016109.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
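A hypothetical sketch of the dictionary approach described above: long, descriptive RDBMS names are swapped for short serial codes on write and restored on read (the names here are made up for illustration):

```python
# Descriptive Oracle-side names; the short codes are generated once and
# kept in a small, hand-maintained dictionary (well under 100 entries).
LONG_NAMES = ["job_submission_time", "cloud_identifier", "error_diagnostic_text"]

SHORT = {name: "c%d" % i for i, name in enumerate(LONG_NAMES)}
LONG = {code: name for name, code in SHORT.items()}

def shrink(row):
    """Replace descriptive column names with short codes before writing."""
    return {SHORT[k]: v for k, v in row.items()}

def expand(row):
    """Restore the descriptive names after reading back."""
    return {LONG[k]: v for k, v in row.items()}
```

Since Cassandra stores the column name with every column instance, the saving applies per cell, not per table, which is why it compounds in inverse indexes.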
Re: Limit on amount of CFs
I asked a similar question (but didn't receive an answer). I'm trying to see if a large number of CFs might be beneficial. One thing I can think of is the amount of extra storage needed for compaction -- obviously it will be smaller with many smaller CFs. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Limit-on-amount-of-CFs-tp6013702p6016125.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Calculating the size of rows in KBs
Does it also mean that the whole row will be deserialized when a query comes just for one column? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Calculating-the-size-of-rows-in-KBs-tp6011243p6017870.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Specifying row caching on per query basis ?
Jonathan, what if the data is really homogeneous, but spans a long period of time? I decided that the users who hit the database for the recent past should have a better ride. Splitting into a separate CF also has costs, right? In fact, if I were to go this way, do you think I can crank down the key caches? If yes, down to what level, zero? Thanks! Jonathan Ellis-3 wrote: Not really, no. If you can't trust LRU to cache the hottest rows perhaps you should split the data into different ColumnFamilies. On Wed, Feb 9, 2011 at 1:43 PM, Ertio Lew ertio...@gmail.com wrote: Is this under consideration for future releases? Or being thought about? On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis jbel...@gmail.com wrote: Currently there is not. On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew ertio...@gmail.com wrote: Is there any way to specify on a per query basis (like we specify the Consistency level) which rows should be cached while you're reading them, from a row_cache enabled CF? I believe this could lead to much more efficient use of the cache space! (if you use the same data for different features/parts of your application which have different caching needs). -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Specifying-row-caching-on-per-query-basis-tp6008838p6009462.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
What will happen if I try to compact with insufficient headroom?
One of my nodes is 76% full. I know that one of CFs represents 90% of the data, others are really minor. Can I still compact under these conditions? Will it crash and lose the data? Will it try to create one very large file out of fragments, for that dominating CF? TIA -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-will-happen-if-I-try-to-compact-with-insufficient-headroom-tp6009619p6009619.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Can serialized objects in columns serve as ersatz superCFs?
Seeing the discussion here about indexes not being supported in superCFs, and the less than clear future of superCFs altogether, I was thinking about getting a modicum of the same functionality with serialized objects inside columns. This way the column key becomes a sort of analog of the supercolumn key, and I handle the dictionaries I receive in the client. Does this sound OK? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-serialized-objects-in-columns-serve-as-ersatz-superCFs-tp6003775p6003775.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
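A minimal sketch of the idea, assuming JSON as the serialization (any client-side format would do): each column name plays the role of a supercolumn key, and the value packs the subcolumn dict, handled entirely in the client.

```python
import json

def pack(subcolumns):
    """Serialize a dict of would-be subcolumns into one column value."""
    return json.dumps(subcolumns, sort_keys=True)

def unpack(blob):
    """Recover the subcolumn dict on the client after a read."""
    return json.loads(blob)

# An ersatz "super row": two would-be supercolumns in one ordinary row.
row = {
    "20101204": pack({"cloud": "US", "errors": 17}),
    "20101205": pack({"cloud": "EU", "errors": 3}),
}
```

The trade-off is that a subcolumn can no longer be read or updated individually; every access deserializes the whole packed value.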
Re: Can serialized objects in columns serve as ersatz superCFs?
Thanks for the comment! In my case, I want to store various time slices as indexes, so the content can be serialized as comma-separated concatenation of unique object IDs. Example: on 20101204, multiple clouds experienced a variety of errors in job execution. In addition, multiple users ran (or failed) on different clouds. If I combine user id, cloud id and error code, I can relatively easily drill for errors on a particular date. So each CF maps to a date, and each column in it is a compound index. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-serialized-objects-in-columns-serve-as-ersatz-superCFs-tp6003775p6004834.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
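A sketch of this compound-index scheme; the "|" separator and the field order (user, cloud, error code) are assumptions for illustration, not a fixed format:

```python
def index_key(user_id, cloud_id, error_code):
    """Build a compound column name like 'u42|cloudA|E13'."""
    return "%s|%s|%s" % (user_id, cloud_id, error_code)

def pack_ids(object_ids):
    """Comma-separated concatenation of unique object IDs, as described."""
    return ",".join(sorted(set(object_ids)))

def unpack_ids(value):
    return set(value.split(",")) if value else set()

def from_cloud(column_names, cloud_id):
    """Client-side drill-down: keep only compound keys for one cloud."""
    return [k for k in column_names if k.split("|")[1] == str(cloud_id)]
```

With one row (or CF) per date, drilling for errors on a particular date then reduces to slicing that row and filtering the compound keys client-side.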
Java bombs during compaction, please help
Hello, one node in my 3-machine cluster cannot perform compaction. I tried multiple times, it ran out of heap space once and I increased it. Now I'm getting the dump below (after it does run for a few minutes). I hope somebody can shed a little light on what's going on, because I'm at a loss and this is a real show stopper. [me@mymachine]~/cassandra-test% Error occured while compacting keyspace Tracer java.util.concurrent.ExecutionException: java.lang.NullPointerException at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source) at java.util.concurrent.FutureTask.get(Unknown Source) at org.apache.cassandra.db.CompactionManager.performMajor(CompactionManager.java:186) at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1766) at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1236) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source) at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source) at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source) at sun.rmi.transport.Transport$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Unknown Source) at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.NullPointerException at org.apache.cassandra.io.util.ColumnIterator$1.getKey(ColumnSortedMap.java:276) at org.apache.cassandra.io.util.ColumnIterator$1.getKey(ColumnSortedMap.java:263) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.init(Unknown Source) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:384) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:332) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137) at org.apache.cassandra.io.PrecompactedRow.init(PrecompactedRow.java:78) at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108) at 
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:427) at
Re: Java bombs during compaction, please help
Thanks Jonathan -- does it mean that the machine is experiencing IO problems? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Java-bombs-during-compaction-please-help-tp6001773p6002320.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Finding the intersection results of column sets of two rows
Hello, If the amount of data is _that_ small, you'll have a much easier life with MySQL, which supports joins -- because that's exactly what you want to achieve. asil klin wrote: Hi all, I want to procure the intersection of the column sets of two rows (from 2 different column families). To achieve this, can I first retrieve all columns (around 300) from the first row and just query by those column names in the second row (which contains at most 100,000 columns)? I am using the results at write time, not before presentation to the user, so latency won't be much of a concern while writing. Is this the proper way to procure the intersection results of two rows? Would love to hear your comments. - Regards, Asil -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
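The two-step fetch the question describes can be sketched client-side, assuming a pycassa-like `get(key, columns=..., column_count=...)` interface on each column family (the function and variable names are illustrative):

```python
def intersect_columns(cf_a, row_a, cf_b, row_b, width=300):
    """Fetch ~300 column names from row_a, then probe row_b for exactly those."""
    names = list(cf_a.get(row_a, column_count=width))
    # Naming the columns explicitly avoids paging through the 100,000-column
    # row; pycassa-style clients return only the columns that exist.
    found = cf_b.get(row_b, columns=names)
    return set(found)
```

This keeps the transferred data proportional to the smaller column set, which is the point of doing the small row first.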
How bad is the impact of compaction on performance?
Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly is performance affected? I know this all seems too generic but then again no two clusters are created equal anyhow. Just wanted to get a feel. Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: How bad is the impact of compaction on performance?
Thanks Edward. In our usage scenario, there is never downtime, it's a global 24/7 operation. What is impacted worse, reads or writes? How does a node handle compaction when there is a spike of writes coming to it? Edward Capriolo wrote: On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem potek...@bnl.gov wrote: Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly is performance affected? I know this all seems too generic but then again no two clusters are created equal anyhow. Just wanted to get a feel. Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com. This is an interesting topic. Cassandra can now remove tombstones on non-major compaction. For some use cases you may not have to trigger nodetool compact yourself to remove tombstones. Use cases that do not do many updates or deletes may have the least need to run compaction yourself. !However! If you have smaller SSTables, or fewer SSTables, your read operations will be more efficient. If you have downtime, such as from 1AM-6AM, going through a major compaction might shrink your dataset significantly and that will make reads better. Compaction can be more or less intensive. The largest factor is row size. Users with large rows probably see faster compaction while smaller rows see it take a long time. You can lower the priority of the compaction thread for experimentation. As to performance, you want to get your cluster to a state where it is not compacting often. This may mean you need more nodes to handle writes. 
I graph the compaction information from JMX http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp to get a feel for how often a node is compacting on average. Also I cross reference the compaction with Read latency and IO graphs I have to see what impact compaction has on reads. Forcing a major compaction also lowers the chances a compaction will happen during the day at peak time. I major compact a few cluster nodes each night through cron (gc time 3 days). This has been good for keeping our data on disk as small as possible. Forcing the major compact at night uses IO, but I find it saves IO over the course of the day because each read seeks less on disk. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-the-impact-of-compaction-on-performance-tp5995868p5995978.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: order of index expressions
Jonathan, what's the implementation of that? I.e. is it a product of indexes or nested loops? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/order-of-index-expressions-tp5995909p5996488.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra to store files
Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Moving data
FWIW, I'm working on migrating a large amount of data out of Oracle into my test cluster. The data has been warehoused as CSV files on Amazon S3. Having that in place allows me to avoid putting extra load on the production service when doing many repeated tests. I then parse the data using the Python csv module and, as Jonathan says, use threads to batch upload data into Cassandra. Notable points: since the data is relatively sparse (i.e. many zeros for integers and empty strings for strings etc), I establish a default-value dictionary and don't write default values to Cassandra at all -- they can be reconstructed as needed when reading back. Also, make sure you wrap Cassandra writes in exception handling. When load is high, you might get timeouts at the TSocket level etc. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Moving-data-tp5992669p5993443.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra to store files
CouchDB -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5989122.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Slow network writes
Dude, are you asking me to unsubscribe? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-network-writes-tp5985757p5991488.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Commit log compaction
How often and by what criteria is the commit log compacted/truncated? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Commit-log-compaction-tp5985221p5985221.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Commit log compaction
Thank you. So what is exactly the condition that causes the older commit log files to actually be removed? I observe that indeed they are rotated out when the threshold is reached, but then new ones are placed in the directory and the older ones are still there. Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Commit-log-compaction-tp5985221p5986399.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Counters in 0.8 -- conditional?
Thanks. Just wanted to note that counting the number of rows where foo=bar is a fairly ubiquitous task in db applications. With big data, shipping all that data to the client just to count something isn't optimal at all. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counters-in-0-8-conditional-tp5985214p5986442.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Counters in 0.8 -- conditional?
Thanks. Yes I know it's by no means trivial. I thought in case there was an index on the column on which I want to place condition, the index machinery itself can do the counting (i.e. when the index is updated, the counter is incremented). It doesn't seem too orthogonal to the current implementation, at least from my very limited experience. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counters-in-0-8-conditional-tp5985214p5986871.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra memory needs
Oleg, I just wanted to add that I confirmed the importance of that rule of thumb the hard way. I created two extra CFs and was able to reliably crash the nodes during writes. I guess for the final setting I'll rely on the results of my testing. But it's also important not to cause the swap death of your machine (i.e. when you go too high on JVM memory). Regards Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-needs-tp5986663p5986911.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
How do I get 0.7.1?
Thanks. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-do-I-get-0-7-1-tp5986927p5986927.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Slow network writes
Jonathan, where do I find that contrib/stress? Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-network-writes-tp5985757p5986937.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: How do I get 0.7.1?
Stephen, sorry I didn't understand your missive. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-do-I-get-0-7-1-tp5986927p5987184.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: cassandra as session store
Most if not all modern web application frameworks support sessions. This applies to Django (with which I have most experience and also run it with X.509 security layer) but also to Ruby on Rails and Pylons. So, why would you re-invent the wheel? Too messy. It's all out there for you to use. Regards, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/cassandra-as-session-store-tp5981871p5981961.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: cassandra as session store
For completeness: http://stackoverflow.com/questions/3746685/running-django-site-in-multiserver-environment-how-to-handle-sessions http://docs.djangoproject.com/en/dev/topics/http/sessions/#using-cached-sessions I guess your approach does make sense, one only wishes that the servlet in question did more work for you. If I read correctly, Django can cache sessions transparently in memcached. So memcached becomes your Session Management System. Is it better or worse than Cassandra? My feeling is that it's probably faster and easier to set up. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/cassandra-as-session-store-tp5981871p5982024.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
TSocket timing out
When I do a lot of inserts into my cluster (10k at a time) I get timeouts from Thrift, in the TSocket.py module. What do I do? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/TSocket-timing-out-tp5973548p5973548.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
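One common way to cope, sketched generically (the helper name and the choice of exception types are assumptions; a real client would list its Thrift timeout exceptions in `retriable`): back off and retry the batch a few times before giving up, and consider smaller batches.

```python
import time

def with_retries(op, attempts=3, backoff=0.5, retriable=(IOError,)):
    """Run op(); on a retriable error (e.g. a socket timeout), wait with
    exponential backoff and try again before surfacing the failure."""
    for attempt in range(attempts):
        try:
            return op()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller see the timeout
            time.sleep(backoff * (2 ** attempt))
```

Usage would wrap each batch insert, e.g. `with_retries(lambda: cf.batch_insert(rows))`, where `cf` and `rows` stand in for whatever client objects are in play.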
Re: Cassandra and count
As far as I know, there are no aggregate operations built into Cassandra, which means you'll have to retrieve all of the data to count it in the client. I had a thread on this topic 2 weeks ago. It's pretty bad. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
Sorry Aaron but this doesn't help. As I said, the machine is dead, kaput, finished. So I can't do decommission. I can run removetoken from any other node, but the dead machine is going to hang around in my ring reports like a zombie. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5971349.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
It does remove tokens, and the ring shows that the problematic node owns 0 tokens, which is OK. However, it's still there, listed. It's not a bug but kind of like a feature -- you can move that node back in two days and move tokens in the same or a different way. What I wish happened was that the API allowed nodetool to issue a command: nodetool --host foobar removeempty Which would then really scratch the node with zero tokens from the ring, no questions asked. Even if the flaky node physically disappeared. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5971851.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra for storing large objects
Will it work for a billion rows? Because that's where eventually I'll end up being. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-for-storing-large-objects-tp5965418p5966284.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra for storing large objects
I would ask myself a different question, which is what media-hosting sites use (YouTube and all others). Cassandra still may have its usefulness here as a mapper between a logical id and physical file location. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-for-storing-large-objects-tp5965418p5967730.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
OK, after running repair and waiting overnight the rebalancing worked and now 3 nodes share the load as I expected. However, one node that is broken is still listed in the ring. I have no intention of reviving it. What's the optimal way to get rid of it as far as the ring configuration is concerned (it's still listed as down but I would like to really scratch it)? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5968075.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Node going down when streaming data, what next?
I was moving a node and at some point it started streaming data to 2 other nodes. Later, that node keeled over and let's assume I can't fix it for the next 3 days and just want to move tokens on the remaining three to even out and see if I can live with it. But I can't do that! The node that was on the receiving end of the stream refuses to move, because it's still receiving. What do I do? Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5962944.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Schema Design
Having separate columns for Year, Month etc seems redundant. It's tons more efficient to keep, say, UTC time in POSIX format (basically an integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use the Order Preserving Partitioner, and sort out which systems logged later in the client. Read up on the consequences of using OPP. Whether to shard data per system depends on how many you have. If more than a few, don't do that, there are memory considerations. Cheers Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
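The single-integer-column conversion is a one-liner each way with the standard library; a minimal sketch:

```python
import calendar
from datetime import datetime, timedelta

def to_posix(dt):
    """UTC datetime -> integer seconds since the epoch (one column, not five)."""
    return calendar.timegm(dt.timetuple())

def from_posix(ts):
    """Inverse conversion, back to a UTC datetime."""
    return datetime(1970, 1, 1) + timedelta(seconds=ts)
```

Storing the integer also makes range queries over timestamps a simple numeric comparison in the client.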
Re: Node going down when streaming data, what next?
Bump. I still don't know what the best thing to do is, please help. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964231.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Schema Design
I used the term sharding a bit frivolously. Sorry. It's just that splitting semantically homogeneous data among CFs doesn't scale too well, as each CF is allocated a piece of memory on the server. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964326.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
Hello, from what I know, you don't really have to restart simultaneously, although of course you don't want to wait. I finally decided to use the removetoken command to actually scratch out the sickly node from the cluster. I'll bootstrap it later when it's fixed. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964804.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Why does cassandra stream data when moving tokens?
Sorry if this sounds silly, but I can't get my brain around this one: if all nodes contain replicas, why does the cluster stream data every time I move or remove a token? If the data is already there, what needs to be streamed? Thanks Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964839.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
RE: Why does cassandra stream data when moving tokens?
Thanks, I'll look at the configuration again. In the meantime, I can't move the first node in the ring (after I removed the previous node's token) -- it throws an exception and says data is being streamed to it -- however, this is not what netstats says! Weirdness continues... Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964883.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Forcing GC w/o jconsole
Thanks! It doesn't seem to have any effect on GCing dropped CFs, though. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Forcing-GC-w-o-jconsole-tp5956747p5960100.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Stress test inconsistencies
Oleg, I'm a novice at this, but for what it's worth I can't imagine you can have a _sustained_ 1kHz insertion rate on a single machine which also does some reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem to square with a typical seek time on a hard drive. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Stress-test-inconsistencies-tp5957467p5960182.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
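The seek-time argument above, as back-of-the-envelope arithmetic (the 10 ms figure is an assumed typical seek plus rotational latency for a spinning disk, supplied here for illustration, not taken from the thread):

```python
# Rough arithmetic: what seek time implies for seek-bound I/O on one disk.
seek_ms = 10                          # assumed average seek + rotational latency
random_ops_per_sec = 1000 // seek_ms  # random seeks per second, per spindle
target_rate = 1000                    # the 1 kHz rate under discussion

print(random_ops_per_sec)                 # -> 100
print(target_rate // random_ops_per_sec)  # -> 10 (the load is ~10x the budget)
```

Worth noting, though, that Cassandra writes go to a sequential commit log and in-memory memtables, so pure inserts are not seek-bound; it is the concurrent random reads that run into this limit.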
Re-partitioning the cluster with nodetool: what's happening?
I'm trying to re-partition my 4-node cluster to make the load exactly 25% on each node. As per the recipes found in the documentation, I calculate:

>>> for x in xrange(4):
...     print 2**127/4*x
...
0
42535295865117307932921825928971026432
85070591730234615865843651857942052864
127605887595351923798765477786913079296

And I need to move the first node to token 0, then the second one to 42535295865117307932921825928971026432, etc. Once I start the procedure, I see no progress when I look at nodetool netstats. Nothing's happening. What am I doing wrong? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-partitioning-the-cluster-with-nodetool-what-s-happening-tp5960843p5960843.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
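The calculation in the message above, written as a small self-contained function (Python 3 syntax, so `//` and `print()` replace the 2.x forms; `balanced_tokens` is just a name chosen for this sketch). Under the RandomPartitioner the token space runs from 0 to 2**127 - 1, and evenly spaced tokens give each node an equal share of the ring:

```python
# Evenly spaced initial tokens for an n-node RandomPartitioner ring.
def balanced_tokens(n):
    # token i sits at i/n of the way around the 2**127-wide ring
    return [(2**127 // n) * i for i in range(n)]

for i, t in enumerate(balanced_tokens(4)):
    print("node %d: %d" % (i, t))
```

Running this for n=4 reproduces the four values quoted in the message.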
Re: Re-partitioning the cluster with nodetool: what's happening?
Correction -- what I meant to say that I do see announcements about streaming in the output, but these are stuck at 0%. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-partitioning-the-cluster-with-nodetool-what-s-happening-tp5960843p5960851.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Forcing GC w/o jconsole
My situation is similar to the one described at this link: http://stackoverflow.com/questions/4155696/how-to-trigger-manual-java-gc-from-linux-console-with-no-x11 I'm trying the following command, but it fails (connection refused): java -jar cmdline-jmxclient-0.10.3.jar - localhost:8081 java.lang:type=Memory gc What port number do I actually need? I really have no experience in doing this; if somebody can give me the correct recipe, it will be much appreciated. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Forcing-GC-w-o-jconsole-tp5956747p5956747.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
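For reference: the JMX port is whatever JMX_PORT is set to in conf/cassandra-env.sh -- 8080 by default in the 0.7-era releases (later versions moved to 7199). 8081 is not a Cassandra default, which would explain the connection refused. A sketch of the invocation, assuming the 0.7-era default:

```shell
# Trigger a JVM GC over JMX with cmdline-jmxclient.
# Assumes the 0.7-era default JMX port; check JMX_PORT in
# conf/cassandra-env.sh for the value your install actually uses.
JMX_PORT=8080
java -jar cmdline-jmxclient-0.10.3.jar - localhost:${JMX_PORT} \
    java.lang:type=Memory gc
```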
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
OK, so I'm looking at this page: http://wiki.apache.org/cassandra/MemtableSSTable This looks promising: A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted. So it would seem that if I restart the server, the obsoleted data should be GCd out of existence, don't you think? But it's not happening. I brought down one node, restarted it and the old data is still there. Ideas? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5957155.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
Thanks for the note; yes, I do know which files I don't need anymore. And I do realize the difference between the grace period of CFs and garbage collection (or at least I hope I do). At face value, the documentation wasn't precise enough about JVM GC taking care of dropped CFs. I understand this is why nodetool compact didn't have the desired effect. I guess I'll have to do manual deletion after all. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5957252.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
Thanks Aaron. As I remarked earlier (and it seems it's not uncommon), none of the nodes have X11 installed (I think I could arrange this, but it's a bit of a hassle). So if I understand correctly, jconsole is an X11 app, and I'm out of luck with that. I would agree with you that having a proper nodetool command to zap the data you know you don't need would be quite ideal. The reason I'm so retentive about it is that I plan to test scaling up to 250 million rows, and disk space matters. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5957426.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Multiple indexes - how does Cassandra handle these internally?
Greetings -- if I use multiple secondary indexes in a query, what will Cassandra do? Some examples say it will index on the first EQ clause and then loop over the others. Does it ever do a proper index product to avoid inner loops? Thanks Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Multiple-indexes-how-does-Cassandra-handle-these-internally-tp5947533p5947533.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
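The strategy described in those examples -- one index drives the scan, the remaining clauses become an inner filtering loop -- can be modeled in a toy sketch. This is NOT Cassandra's actual code; the function, its arguments, and the data layout are all simplifications invented for illustration:

```python
# Toy model of a multi-clause secondary-index query: pick the most
# selective EQ-indexed clause, scan that one index, and check the other
# predicates against each candidate row (a loop, not an index product).
def query_with_indexes(rows, indexes, clauses):
    """rows:    {row_key: {column: value}}
    indexes: {column: {value: set_of_row_keys}} for indexed columns
    clauses: list of (column, value) equality predicates"""
    indexed = [c for c in clauses if c[0] in indexes]
    # drive the scan off the index with the fewest candidate rows
    col, val = min(indexed, key=lambda c: len(indexes[c[0]].get(c[1], ())))
    candidates = indexes[col].get(val, set())
    others = [c for c in clauses if c != (col, val)]
    # the remaining predicates are an inner loop over candidate rows
    return {k for k in candidates
            if all(rows[k].get(c) == v for c, v in others)}

rows = {'a': {'x': 1, 'y': 2}, 'b': {'x': 1, 'y': 3}, 'c': {'x': 2, 'y': 2}}
indexes = {'x': {1: {'a', 'b'}, 2: {'c'}}, 'y': {2: {'a', 'c'}, 3: {'b'}}}
print(query_with_indexes(rows, indexes, [('x', 1), ('y', 2)]))  # -> {'a'}
```

The cost of the query is therefore dominated by how selective the driving clause is, which is why the choice of which index to scan matters more than the number of clauses.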
Does Major Compaction work on dropped CFs? Doesn't seem so.
Greetings, I just used nodetool to force a major compaction on my cluster. It seems like the CFs currently in service were indeed compacted, while the old test materials (which I dropped from the CLI) were still there as tombstones. Is that the expected behavior? Hmm... TIA. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5946031.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
Thanks! What's strange anyhow is that the GC grace period for these CFs expired some days ago. I thought that a compaction would take care of these tombstones. I used nodetool to compact. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5946231.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.