Re: Performance deterioration while building secondary index

2011-09-16 Thread buddhasystem
Well, the problem is still there: I tried to add one more index, and the
3-node cluster just goes haywire and becomes unresponsive. These boxes
have plenty of CPU and memory.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Performance-deterioration-while-building-secondary-index-tp6564401p6801680.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Node added, no performance boost -- are the tokens correct?

2011-04-01 Thread buddhasystem
On two different clusters, if I set the token to zero on a node, its
ownership drops to zero after migration.



After I added the third one and moved tokens, I now have this:

33.33%  56713727820156410577229101238628035242
33.33%  113427455640312821154458202477256070484
33.33%  170141183460469231731687303715884105727

No zeroes.
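For reference, evenly spaced RandomPartitioner tokens are just multiples of 2**127/N; a toy script in the spirit of the ctokens.py quoted below (a hypothetical reimplementation, not Eric's actual script):

```python
def balanced_tokens(n_nodes):
    """Evenly spaced initial tokens for RandomPartitioner (ring size 2**127)."""
    return [i * 2**127 // n_nodes for i in range(n_nodes)]

for i, token in enumerate(balanced_tokens(3)):
    print(f"node {i}: {token}")
```

Adding the same offset to every token keeps the ring balanced, which is why the three tokens above each own 33.33% even though none of them is zero.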





Eric Gilmore-3 wrote:
 
 A script that I have says the following:
 
 $ python ctokens.py
 How many nodes are in your cluster? 2
 node 0: 0
 node 1: 85070591730234615865843651857942052864
 
 The first token should be zero, for the reasons discussed here:
 http://www.datastax.com/dev/tutorials/getting_started_0_7/configuring#initial-token-values
 
 More details are available in
 http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity
 
 The DS docs have some weak areas, but these two pages have been pretty
 well vetted over the past months :)
 
 
 
 On Thu, Mar 31, 2011 at 3:06 PM, buddhasystem <potek...@bnl.gov>
 wrote:
 
 I just configured a cluster of two nodes -- do these token values make
 sense?
 The reason I'm asking is that so far I don't see load balancing
 happening, judging from performance.

 Address Status State   LoadOwnsToken

 170141183460469231731687303715884105728
 130.199.185.194 Up Normal  153.52 GB   50.00%
 85070591730234615865843651857942052864
 130.199.185.193 Up Normal  199.82 GB   50.00%
 170141183460469231731687303715884105728


 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-added-no-performance-boost-are-the-tokens-correct-tp6228872p6228872.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.

 


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-added-no-performance-boost-are-the-tokens-correct-tp6228872p6231845.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Netstats out of sync?

2011-03-31 Thread buddhasystem
I'm rebalancing a cluster of 2 nodes at this point. Netstats on the source
node reports progress of the stream, whereas on the receiving end netstats
claims that progress = 0. Has anyone seen that?

Do I need both nodes listed as seeds in cassandra.yaml?

TIA/


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Netstats-out-of-sync-tp6227986p6227986.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
I just configured a cluster of two nodes -- do these token values make sense?
The reason I'm asking is that so far I don't see load balancing
happening, judging from performance.

Address Status State   LoadOwnsToken
  
170141183460469231731687303715884105728
130.199.185.194 Up Normal  153.52 GB   50.00% 
85070591730234615865843651857942052864
130.199.185.193 Up Normal  199.82 GB   50.00% 
170141183460469231731687303715884105728


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-added-no-performance-boost-are-the-tokens-correct-tp6228872p6228872.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
Yup, I screwed up the token setting, my bad.

Now I've moved the tokens. I still observe that read latency deteriorated with
3 machines vs. the original one. The replication factor is 1, Cassandra version
0.7.2 (didn't have time to upgrade, as I need results by this weekend).

Key and row caching were disabled to get worst-case-scenario test
results.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-added-no-performance-boost-are-the-tokens-correct-tp6228872p6229564.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: data aggregation in Cassandra

2011-03-25 Thread buddhasystem
Hello Saurabh,

I have a similar situation, with a more complex data model, and I do an
equivalent of map-reduce by hand. The redeeming value is that you have
complete freedom in how you hash, and you design the way you store indexes
and similar structures. If there is a pattern in the data you store, you can
use it to your advantage. In the end, you get good performance.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/data-aggregation-in-Cassandra-tp6206994p6207879.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: cassandra nodes with mixed hard disk sizes

2011-03-22 Thread buddhasystem

aaron morton wrote:
 
 
 Also, a node is responsible for storing its token range and acting as a
 replica for other token ranges. So reducing the token range may not have a
 dramatic effect on the storage requirements. 
 

Aaron,

is there a way to configure wimpy nodes such that the replicas are
elsewhere?


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/cassandra-nodes-with-mixed-hard-disk-sizes-tp6194071p6195543.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Deleting old SSTables

2011-03-22 Thread buddhasystem
Jonathan,

for all of us just tinkering with test clusters, building confidence in the
product, it would be nice to be able to do the same with nodetool, without
jconsole; just my 0.5 penny.  Thanks.
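For the record, the compaction-marker mechanism quoted below can be scripted around; a hedged sketch, assuming the 0.7-era convention that an obsoleted sstable leaves a sibling marker file ending in "-Compacted" next to its "-Data.db" file:

```python
import os
import tempfile

def obsolete_sstables(data_dir):
    """List Data.db files that carry a compaction marker.

    Assumption (0.7-era layout): a compacted-away sstable leaves a sibling
    marker file ending in "-Compacted" beside its "-Data.db" file.
    """
    names = os.listdir(data_dir)
    marked = {n[:-len("-Compacted")] for n in names if n.endswith("-Compacted")}
    return sorted(n for n in names
                  if n.endswith("-Data.db") and n[:-len("-Data.db")] in marked)

# Demo on a throwaway directory: one obsoleted sstable, one live one.
demo = tempfile.mkdtemp()
for name in ("cf-e-1-Data.db", "cf-e-1-Compacted", "cf-e-2-Data.db"):
    open(os.path.join(demo, name), "w").close()
print(obsolete_sstables(demo))
```

As the quoted wiki text says, Cassandra deletes these itself on GC or on restart, so a script like this would only be for inspection, not cleanup.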


Jonathan Ellis-3 wrote:
 
 From the next paragraph of the same wiki page:
 
 SSTables that are obsoleted by a compaction are deleted asynchronously
 when the JVM performs a GC. You can force a GC from jconsole if
 necessary, but Cassandra will force one itself if it detects that it
 is low on space. A compaction marker is also added to obsolete
 sstables so they can be deleted on startup if the server does not
 perform a GC before being restarted.
 
 On Tue, Mar 22, 2011 at 8:30 AM, Jonathan Colby
 <jonathan.co...@gmail.com> wrote:
 > According to the Wiki Page on compaction: "once compaction is
 > finished, the old SSTable files may be deleted"*
 >
 > * http://wiki.apache.org/cassandra/MemtableSSTable
 >
 > I thought the old SSTables would be deleted automatically, but this
 > wiki page got me thinking otherwise.
 >
 > Question is, if it is true that old SSTables must be manually
 > deleted, how can one safely identify which SSTables can be deleted?
 >
 > Jon
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Deleting-old-SSTables-tp6196113p6198172.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: 0.7.2 choking on a 5 MB column

2011-03-22 Thread buddhasystem
Jonathan, wide rows have been discussed. I thought that the limit on the
number of columns is way bigger than 45k. What can one expect in reality?

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/0-7-2-choking-on-a-5-MB-column-tp6198387p6198548.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: 0.7.2 choking on a 5 MB column

2011-03-22 Thread buddhasystem
I see. I'm doing something even more drastic, then, because I'm only inserting
one row in this case, and just using cf.insert() without a batch mutator. It
didn't occur to me that that was a bad idea.

So I take it, this method will fail. Hmm.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/0-7-2-choking-on-a-5-MB-column-tp6198387p6198618.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Reading whole row vs a range of columns (pycassa)

2011-03-20 Thread buddhasystem
Aaron, thanks for chiming in.

I'm doing what you said, i.e. all the data for a single object (which is quite
lean, with about 100 attributes of 10 bytes each) goes into a single
column, as opposed to the previous version of my application, which had all
attributes of each small object mapped to individual columns.

So yes, I did consider having 100 objects in a single column, but that
is suboptimal for many reasons (hard to add objects later).

My reference to OPP (the order-preserving partitioner) was this: if I were
sticking with the original design, it could have been advantageous to have OPP,
since statistically requests for objects are often serial, e.g. people often
don't query for just one object with id=123, but for a series like id=[123..145].
If I bunch these into rows containing 100 objects each, that promises some
efficiency right there, as I read one row as opposed to, say, 50.
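The bucketing arithmetic is trivial; a sketch, assuming integer object ids and the 100-objects-per-row granularity discussed above:

```python
BUCKET = 100  # objects per row; the granularity worked out above

def row_key(obj_id):
    # Objects with nearby ids land in the same row, so serial reads stay cheap
    # even under RandomPartitioner (the row key, not the id, gets hashed).
    return str(obj_id // BUCKET)

# The serial request id=[123..145] touches a single row instead of 23.
rows = {row_key(i) for i in range(123, 146)}
print(rows)
```

This is how grouping recovers locality without resorting to OPP: the locality lives inside the row, not in the ring.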




aaron morton wrote:
 
 I'd collapse all the data for a single object into a single column; not
 sure about storing 100 objects in a single column though. 
 
 Have you considered any concurrency issues? e.g. multiple threads /
 processes wanting to update different objects in the same group of 100? 
 
 Don't understand your reference to OPP in the context of reading 100
 columns from a row. 
 
 Aaron
 
  
 On 19 Mar 2011, at 16:22, buddhasystem wrote:
 
 > As I'm working on this further, I want to understand this:
 >
 > Is it advantageous to flatten data in blocks (strings) each containing a
 > series of objects, if I know that a serial object read is often likely, but
 > don't want to resort to OPP? I worked out the optimal granularity, it seems.
 > Is it better to read a serialized single column with 100 objects than a row
 > consisting of a hundred columns each modeling an object?
 >
 > --
 > View this message in context:
 > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reading-whole-row-vs-a-range-of-columns-pycassa-tp6186518p6186782.html
 > Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
 


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reading-whole-row-vs-a-range-of-columns-pycassa-tp6186518p6190639.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Undead rows after nodetool compact

2011-03-18 Thread buddhasystem
This has been discussed once, but I don't remember the outcome. I insert a
row and then delete the key immediately. I then run nodetool compact. In
cassandra-cli, list cf still returns 1 empty row. This is not a showstopper,
but damn unpretty. Is there a way to make deleted rows go away immediately?
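For context, my understanding (to be checked) is that the delete leaves a tombstone, and the empty row only disappears once GCGraceSeconds has elapsed and a compaction runs. On a throwaway test cluster one could presumably shorten that window first, with something like this 0.7-era cassandra-cli incantation (unverified syntax, and certainly not for production, where a short grace period risks resurrecting deletes):

```
update column family cf with gc_grace = 0;
```

followed by another nodetool compact.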


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Undead-rows-after-nodetool-compact-tp6186021p6186021.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Reading whole row vs a range of columns (pycassa)

2011-03-18 Thread buddhasystem
Is there a noticeable difference in speed between reading the whole row
through pycassa vs. a range of columns? Both the rows and the columns are
pretty slim.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reading-whole-row-vs-a-range-of-columns-pycassa-tp6186518p6186518.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Reading whole row vs a range of columns (pycassa)

2011-03-18 Thread buddhasystem
As I'm working on this further, I want to understand this:

Is it advantageous to flatten data in blocks (strings) each containing a
series of objects, if I know that a serial object read is often likely, but
don't want to resort to OPP? I worked out the optimal granularity, it seems.
Is it better to read a serialized single column with 100 objects than a row
consisting of a hundred columns each modeling an object?

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reading-whole-row-vs-a-range-of-columns-pycassa-tp6186518p6186782.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Hello, in the instructions I need to tie concurrent_reads to the number of
drives. Is this the number of physical drives that I have in my
RAID0, or something else?

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-concurrent-reads-relate-to-number-of-drives-in-RAID0-tp6182346p6182346.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks to all for replying, but frankly I didn't get the answer I wanted.
Does the number of disks refer to the number of spindles in the RAID0? Or
to something else, like a separate disk for the commitlog and one for data?
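For what it's worth, the rule of thumb I've seen floated is roughly 16 x data spindles for concurrent_reads and 8 x cores for concurrent_writes. Under that reading, a hypothetical box with a 4-spindle RAID0 data volume and 8 cores would get:

```yaml
# cassandra.yaml -- hypothetical box: 4 data spindles in RAID0, 8 cores
concurrent_reads: 64    # 16 * 4 spindles
concurrent_writes: 64   # 8 * 8 cores
```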


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-concurrent-reads-relate-to-number-of-drives-in-RAID0-tp6182346p6183033.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks Peter, I can see it better now.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-concurrent-reads-relate-to-number-of-drives-in-RAID0-tp6182346p6183051.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Where and how do I choose it?


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-concurrent-reads-relate-to-number-of-drives-in-RAID0-tp6182346p6183069.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Dear All,
this is from my new Cassandra server. It obviously uses hyperthreading; I
just don't know how to translate this into concurrent readers and writers in
cassandra.yaml. Can somebody take a look and tell me what number of cores
I need to assume for concurrent_reads and concurrent_writes? Is it 24?
Thanks!

[cassandra@cassandra01 bin]$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
stepping: 2
cpu MHz : 1596.000
cache size  : 12288 KB
physical id : 0
siblings: 12
core id : 0
cpu cores   : 6
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm arat tpr_shadow vnmi
flexpriority ept vpid
bogomips: 5333.91
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[remaining processor entries omitted: each is identical to processor 0 apart
from the processor, core id, apicid, and initial apicid fields]
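Hyperthreads can be told apart from physical cores by de-duplicating the (physical id, core id) pairs in output like the above; a small parsing sketch, no Cassandra involved:

```python
def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo output.

    Hyperthread siblings report the same pair, so they collapse to one core.
    """
    cores = set()
    phys = None
    for line in cpuinfo_text.splitlines():
        key, _, val = line.partition(":")
        key, val = key.strip(), val.strip()
        if key == "physical id":
            phys = val
        elif key == "core id":
            cores.add((phys, val))
    return len(cores)

# Two logical processors sharing physical id 0 / core id 0 -> one real core.
sample = """processor : 0
physical id : 0
core id : 0

processor : 12
physical id : 0
core id : 0"""
print(count_physical_cores(sample))
```

On the box above (siblings: 12, cpu cores: 6 per package), 24 logical processors would mean two X5650 packages, i.e. 12 physical cores; if the sizing rules of thumb are meant per physical core, the Ncores to assume would be 12, not 24.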

Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Hello Peter, thanks for the note.

I'm not looking for anything fancy. It's just that when I'm looking at the
following bit of the pycassa docs, it's not 100% clear to me that it won't
overwrite the entire row for the key if I simply want to add an extra
column {'foo': 'bar'} to an already existing row. I don't care about
cross-node consistency at this point.

insert(key, columns[, timestamp][, ttl][, write_consistency_level])

Insert or update columns in the row with key key.

columns should be a dictionary of columns or super columns to insert or
update. If this is a standard column family, columns should look like
{column_name: column_value}. If this is a super column family, columns
should look like {super_column_name: {sub_column_name: value}}
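My mental model, which I'd like confirmed, is that insert is a per-column upsert rather than a row overwrite; a toy pure-Python model of those semantics (not pycassa itself):

```python
def insert(store, key, columns):
    """Toy model of the semantics I expect from pycassa's insert():
    a per-column upsert that merges the given columns into the row,
    leaving columns that are not named untouched."""
    store.setdefault(key, {}).update(columns)

rows = {}
insert(rows, "k1", {"name": "maxim", "site": "bnl"})
insert(rows, "k1", {"foo": "bar"})  # adds one column; nothing else is touched
print(rows["k1"])
```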

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-column-update-column-atomic-or-row-atomic-tp6174445p6179492.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Thanks for the clarification, Tyler, and sorry again for the basic question.
I've been doing straight inserts from Oracle so far, but now I need to update
rows with new columns.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-column-update-column-atomic-or-row-atomic-tp6174445p6179536.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Thanks! The docs say it's good to set it to 8*Ncores; are you saying you see 8
cores in this output? I know I need to go way above the default of 32 with this
setup.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Please-help-decipher-proc-cpuinfo-for-optimal-Cassandra-config-tp6179487p6179539.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Is column update column-atomic or row atomic?

2011-03-15 Thread buddhasystem
Sorry for the rather primitive question, but it's not clear to me whether, if I
want to expand a row by one column, I need to fetch the whole row, add the
column as a dictionary entry, and re-insert it. Help will be appreciated.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-column-update-column-atomic-or-row-atomic-tp6174445p6174445.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Is column update column-atomic or row atomic?

2011-03-15 Thread buddhasystem
Thanks. Can you give me a pycassa example, if possible?

Thanks!


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-column-update-column-atomic-or-row-atomic-tp6174445p6174487.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread buddhasystem
Tyler, as a collateral issue: I've been wondering for a while what advantage,
if any, it buys me if I declare a value 'long' (which it roughly is) as
opposed to passing around strings. A string is serialized as a copy of
itself, I assume? No conversion? Maybe it even means better speed.
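For what it's worth, my understanding (to be confirmed) is that a declared long travels as a fixed 8-byte big-endian value, while a string of digits travels as its raw bytes; a sketch of the difference:

```python
import struct

value = 1234567890123  # a hypothetical long-ish identifier

as_long = struct.pack(">q", value)    # long wire form: fixed 8 bytes, big-endian
as_text = str(value).encode("ascii")  # string form: one byte per digit

print(len(as_long), len(as_text))
```

Beyond size, big-endian longs compare numerically byte-for-byte, whereas digit strings compare lexicographically ("10" sorts before "9"), which matters for comparators.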

Thanks,
Maxim

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-LongType-data-insertion-problem-for-secondary-index-usage-tp6158486p6159840.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


null vs value not found?

2011-02-24 Thread buddhasystem

I'm doing insertions with a pycassa client. It seems to work in most cases,
but sometimes, when I go to cassandra-cli and query with a key and column
that I inserted, I get null when I shouldn't. What could be the causes of
that?
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/null-vs-value-not-found-tp6061828p6061828.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: null vs value not found?

2011-02-24 Thread buddhasystem

Thanks Tyler,

ColumnFamily: index1
  Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
  Row cache size / save period: 0.0/0
  Key cache size / save period: 1.0/3600
  Memtable thresholds: 0.8765625/50/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: []

I pretty much went with the default settings, and the column name is
'CATALOG'.

Maxim




Tyler Hobbs-2 wrote:
 
 On Thu, Feb 24, 2011 at 2:27 PM, buddhasystem <potek...@bnl.gov> wrote:
 

 I'm doing insertion with a pycassa client. It seems to work in most
 cases,
 but sometimes, when I go to Cassandra-cli, and query with key and column
 that I inserted, I get null whereas I shouldn't. What could be causes
 for
 that?

 
 Could you clarify what column name and value you are using as well as the
 comparator and validator types?
 
 -- 
 Tyler Hobbs
 Software Engineer, DataStax http://datastax.com/
 Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra
 Python client library
 
 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/null-vs-value-not-found-tp6061828p6061900.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: null vs value not found?

2011-02-24 Thread buddhasystem

Thanks! You are right. I see an exception but have no idea what went wrong.


ERROR [ReadStage:14] 2011-02-24 21:51:29,374 AbstractCassandraDaemon.java
(line 113) Fatal exception in thread Thread[ReadStage:14,5,main]
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.&lt;init&gt;(SSTableNamesIterator.java:75)
at
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
at
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1316)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1205)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1134)
at org.apache.cassandra.db.Table.getRow(Table.java:386)
at
org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:70)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(Unknown Source)
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
at
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.&lt;init&gt;(SSTableNamesIterator.java:71)
... 12 more

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/null-vs-value-not-found-tp6061828p6061983.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Homebrew CF-indexing vs secondary indexing

2011-02-24 Thread buddhasystem

FWIW, for me the advantage of homebrew indexes is that they can be a lot more
sophisticated than the standard ones: I can hash combinations of column values
into whatever I want. I also put counters on column values in the index, so
there is lots of functionality. Of course, I can do this because my data
becomes read-only; I know that's a luxury.
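As a concrete toy illustration of what I mean (the column names are made up, and the dicts stand in for index CF rows):

```python
from collections import defaultdict

# Toy stand-in for a homebrew index CF: an index row keyed on a hashed
# combination of column values, with a per-bucket counter alongside.
index = defaultdict(set)
counts = defaultdict(int)

def index_object(obj_key, row):
    # Combine any column values you like into one index row key.
    bucket = f"{row['site']}:{row['status']}"
    index[bucket].add(obj_key)
    counts[bucket] += 1

index_object("job1", {"site": "BNL", "status": "done"})
index_object("job2", {"site": "BNL", "status": "done"})
print(counts["BNL:done"], sorted(index["BNL:done"]))
```

With read-only data these structures never need updating in place, which is what makes the approach cheap.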

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Homebrew-CF-indexing-vs-secondary-indexing-tp6062677p6062705.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Will the large datafile size affect the performance?

2011-02-23 Thread buddhasystem

I know that theoretically it should not (apart from compaction issues), but
maybe somebody has experience showing otherwise:

My test cluster now has 250GB of data and will have 1.5TB in its
reincarnation. If all this data is in a single CF, will it cause read or
write performance problems? Should I shard it? One advantage of splitting
the data would be reducing the impact of compactions and repairs (or so I
naively assume).

TIA

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Will-the-large-datafile-size-affect-the-performance-tp6057991p6057991.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Can I count on Super Column Families when planning 3 years out?

2011-02-23 Thread buddhasystem

There was a discussion here on how well (or not so well) Super CFs are
supported. I now need to make a strategic decision as to how I plan my data.
What's the consensus: will super CFs still be there 3 years out?


TIA
Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-I-count-on-Super-Column-Families-why-planing-3-years-out-tp6057997p6057997.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


How come key cache increases speed by x4?

2011-02-23 Thread buddhasystem

Well, I know the cache is there for a reason; I just can't explain the factor
of 4 when I run my queries on a hot vs. cold cache. My queries are actually a
chain: one on an inverted index, which produces a tuple of keys to be used
in the main query. The inverted-index query should be downright trivial.

I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing
something? Why such a large factor?

TIA

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-come-key-cache-increases-speed-by-x4-tp6058435p6058435.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem

I've been too smart for my own good trying to type columns, on the theory
that it would later increase performance by having more efficient
comparators in place. So if a string represents an integer, I would convert
it to an integer and declare the column as such. Same for LONG.

What I found is that during the write operation, the type conversion kills
the performance. It's really not a trivial amount of time.

Has anyone had a similar experience?

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Virtues-and-pitfall-of-using-TYPES-tp6042432p6042432.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem

Dude, I never mentioned the server side; sorry if it wasn't obvious.
As for Python being slow, I'm not moving away from it. It performs
amazingly well in other circumstances.


Jonathan Ellis-3 wrote:
 
 That doesn't make sense to me.  IntegerType validation is a no-op and
 LongType validation is pretty close (just a size check).
 
 If you meant that the conversion is killing performance on your
 client, you should switch to a more performant client language. :)
 
 On Fri, Feb 18, 2011 at 9:56 PM, buddhasystem <potek...@bnl.gov> wrote:

 I've been too smart for my own good trying to type columns, on the theory
 that it would later increase performance by having more efficient
 comparators in place. So if a string represents an integer, I would
 convert
 it to an integer and declare the column as such. Same for LONG.

 What I found is that during the write operation, the type conversion
 kills the performance. It's really not a trivial amount of time.

 Has anyone had a similar experience?

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Virtues-and-pitfall-of-using-TYPES-tp6042432p6042432.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.

 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 
 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Virtues-and-pitfall-of-using-TYPES-tp6042432p6042601.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: create additional secondary index

2011-02-16 Thread buddhasystem

I sidestep this problem by using a Python script (pycassa-based) where I
configure my CFs. This way, it's reproducible and documented.
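A minimal sketch of such a script, assuming pycassa's SystemManager API; the keyspace, CF names, and options below are invented for illustration. Keeping the schema as plain data in code is what makes it reproducible and documented:

```python
# Schema lives in code: version-controlled, reproducible, self-documenting.
# CF names and options are illustrative, not from the original post.
COLUMN_FAMILIES = {
    "Users": {"comparator_type": "UTF8Type"},
    "EventsByDay": {"comparator_type": "LongType"},
}

def create_schema(sys_mgr, keyspace):
    # sys_mgr is expected to look like pycassa.system_manager.SystemManager,
    # whose create_column_family(keyspace, name, **options) defines a CF.
    for name, options in sorted(COLUMN_FAMILIES.items()):
        sys_mgr.create_column_family(keyspace, name, **options)
```

Re-running the script against a fresh cluster recreates the exact same schema, which is the whole point of sidestepping ad-hoc CLI changes.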
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/create-additional-secondary-index-tp6033574p6033683.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem

Hello,

we are acquiring new hardware for our cluster and will be installing it
soon. It's likely that I won't need to rely on secondary index
functionality, as data will be write-once read-many and I can get away with
inverse index creation at load time, plus I have some more complex indexing
in mind than comes packaged (too much to explain here).

So, if I don't need indexes, what is the most stable, reliable version of
Cassandra that I can put in production? I'm seeing bug reports here and some
sound quite serious, I just want something that works day in, day out.

Thank you,

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6028966.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem

Thank you! It's just that 0.7.1 seems to be the bleeding edge now (a serious bug
was fixed today). Would you still trust it as a production-level service? I'm
just slightly concerned. I don't want to create a perception among our IT
that the product is not ready for prime time.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029047.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem

Thank you Attila!

We will indeed have a few months of breaking in. I suppose I'll
keep my fingers crossed and see that 0.7.X is very stable. So I'll
deploy 0.7.1 -- I will need to apply all the patches, there is no
cumulative download, is that correct?


Attila Babo wrote:
 
 0.6.8 is stable and production ready; the later versions of the 0.6
 branch have issues. No offense, but the 0.7 branch is fairly unstable
 from my experience. I have reproduced all the open bugs with a
 production dataset, even when tried to rebuild it from scratch after a
 complete loss.
 
 If you have a few months before going to production your best bet is
 still 0.7.1 as it will stabilize but the switch between versions is
 painful.
 
 /Attila
 
 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029622.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Column name size

2011-02-11 Thread buddhasystem

I've been thinking about this as well. I'm migrating data from a large Oracle
database, and the RDBMS column names are descriptive (good) and long (bad).
For now I just keep them when populating Cassandra, but I can shave off
about 30% of storage by hashing names. I don't need any automation and can
just maintain a dictionary of serial numbers to strings and vice versa; it's
still under 100 items. When you start building inverse indexes and other
auxiliary structures, the size effect may be amplified.
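A minimal sketch of such a two-way name dictionary; the long column names here are made up, and the real mapping would use the actual Oracle column names:

```python
# Illustrative long RDBMS column names mapped to short serial-number IDs.
NAMES = ["experiment_run_identifier", "detector_calibration_status"]
name_to_id = {name: str(i) for i, name in enumerate(NAMES)}
id_to_name = {i: name for name, i in name_to_id.items()}

def shrink(columns):
    # Rewrite a column dict to short IDs before writing to Cassandra.
    return {name_to_id[k]: v for k, v in columns.items()}

def expand(columns):
    # Restore the descriptive names when reading back.
    return {id_to_name[k]: v for k, v in columns.items()}
```

Since column names are stored with every column instance on disk, shortening them pays off across every row, index, and auxiliary structure that repeats them.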
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Column-name-size-tp6015127p6016109.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Limit on amount of CFs

2011-02-11 Thread buddhasystem

I asked a similar question (but didn't receive an answer). I'm trying to see
if a large number of CFs might be beneficial. One thing I can think about is
the size of extra storage needed for compaction -- obviously it will be
smaller in case of many smaller CFs.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Limit-on-amount-of-CFs-tp6013702p6016125.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Calculating the size of rows in KBs

2011-02-11 Thread buddhasystem

Does it also mean that the whole row will be deserialized when a query comes
just for one column?

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Calculating-the-size-of-rows-in-KBs-tp6011243p6017870.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Specifying row caching on per query basis ?

2011-02-09 Thread buddhasystem

Jonathan, what if the data is really homogeneous, but over a long period of
time. I decided that the users who hit the database for recent past should
have a better ride. Splitting into a separate CF also has costs, right?

In fact, if I were to go this way, do you think I can crank down the key
caches? If yes, down to what level, zero?

Thanks!



Jonathan Ellis-3 wrote:
 
 Not really, no.  If you can't trust LRU to cache the hottest rows
 perhaps you should split the data into different ColumnFamilies.
 
 On Wed, Feb 9, 2011 at 1:43 PM, Ertio Lew ertio...@gmail.com wrote:
 Is this under consideration for future releases ? or being thought
 about!?



 On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis jbel...@gmail.com
 wrote:
 Currently there is not.

 On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew ertio...@gmail.com wrote:
 Is there any way to specify on per query basis(like we specify the
 Consistency level), what rows be cached while you're reading them,
 from a row_cache enabled CF. I believe, this could lead to much more
 efficient use of the cache space!!( if you use same data for different
 features/ parts in your application which have different caching
 needs).




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 
 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Specifying-row-caching-on-per-query-basis-tp6008838p6009462.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


What will happen if I try to compact with insufficient headroom?

2011-02-09 Thread buddhasystem

One of my nodes is 76% full. I know that one of CFs represents 90% of the
data, others are really minor. Can I still compact under these conditions?
Will it crash and lose the data? Will it try to create one very large file
out of fragments, for that dominating CF?

TIA

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-will-happen-if-I-try-to-compact-with-insufficient-headroom-tp6009619p6009619.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem

Seeing the discussion here about indexes not being supported in superCFs, and the
less-than-clear future of superCFs altogether, I was thinking about getting a
modicum of the same functionality with serialized objects inside columns. This
way the column key becomes a sort of analog of the supercolumn key, and I handle
the dictionaries I receive in the client.

Does this sound OK?

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-serialized-objects-in-columns-serve-as-ersatz-superCFs-tp6003775p6003775.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem

Thanks for the comment! In my case, I want to store various time slices as
indexes, so the content can be serialized as comma-separated concatenation
of unique object IDs. Example: on 20101204, multiple clouds experienced a
variety of errors in job execution. In addition, multiple users ran (or
failed) on different clouds. If I combine user id, cloud id and error code,
I can relatively easily drill for errors on a particular date. So each CF
maps to a date, and each column in it is a compound index.
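As a sketch of that idea (the helper names are hypothetical, and a plain dict stands in for the date-keyed row):

```python
def compound_key(user_id, cloud_id, error_code):
    # The column name encodes the (user, cloud, error) combination, so a
    # column slice on one date row answers a drill-down query directly.
    return "%s:%s:%s" % (user_id, cloud_id, error_code)

def add_jobs(row, user_id, cloud_id, error_code, job_ids):
    key = compound_key(user_id, cloud_id, error_code)
    existing = row.get(key, "")
    ids = [i for i in existing.split(",") if i] + list(job_ids)
    # The column value is a comma-separated concatenation of unique job IDs.
    row[key] = ",".join(sorted(set(ids)))

day_row = {}  # stands in for the row keyed by a date such as "20101204"
add_jobs(day_row, "alice", "cloud1", "E42", ["j1", "j2"])
add_jobs(day_row, "alice", "cloud1", "E42", ["j2", "j3"])
```

Reading back the columns for a given date and splitting the values in the client recovers the per-(user, cloud, error) job sets without needing supercolumns.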

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-serialized-objects-in-columns-serve-as-ersatz-superCFs-tp6003775p6004834.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Java bombs during compaction, please help

2011-02-07 Thread buddhasystem

Hello,
one node in my 3-machine cluster cannot perform compaction. I tried multiple
times, it ran out of heap space once and I increased it. Now I'm getting the
dump below (after it does run for a few minutes). I hope somebody can shed a
little light on what's going on, because I'm at a loss and this is a real
show stopper.


[me@mymachine]~/cassandra-test% Error occured while compacting keyspace
Tracer
java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at
org.apache.cassandra.db.CompactionManager.performMajor(CompactionManager.java:186)
at
org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1766)
at
org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown
Source)
at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown
Source)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown
Source)
at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown
Source)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown
Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
at sun.rmi.transport.Transport$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Unknown Source)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown
Source)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
at
org.apache.cassandra.io.util.ColumnIterator$1.getKey(ColumnSortedMap.java:276)
at
org.apache.cassandra.io.util.ColumnIterator$1.getKey(ColumnSortedMap.java:263)
at
java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown Source)
at java.util.concurrent.ConcurrentSkipListMap.&lt;init&gt;(Unknown Source)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:384)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:332)
at
org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137)
at
org.apache.cassandra.io.PrecompactedRow.&lt;init&gt;(PrecompactedRow.java:78)
at
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:427)
at

Re: Java bombs during compaction, please help

2011-02-07 Thread buddhasystem

Thanks Jonathan -- does it mean that the machine is experiencing IO problems?

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Java-bombs-during-compaction-please-help-tp6001773p6002320.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Finding the intersection results of column sets of two rows

2011-02-06 Thread buddhasystem

Hello,

If the amount of data is _that_ small, you'll have a much easier life with
MySQL, which supports joins -- because that's exactly what
you want to achieve.


asil klin wrote:
 
 Hi all,
 
 I want to procure the intersection of columns set of two rows (from 2
 different column families).
 
 To achieve the intersection results, Can I, first retrieve all
 columns(around 300) from first row and just query by those column
 names in the second row(which contains maximum 100 000 columns) ?
 
 I am using the results during the write time, not before presentation
 to the user, so latency wont be much concern while writing.
 
 Is it the proper way to procure intersection results of two rows ?
 
 Would love to hear your comments..
 
 
 -
 
 Regards,
 Asil
 
 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem

Just wanted to see if someone with experience in running an actual service
can advise me:

how often do you run nodetool compact on your nodes? Do you stagger it in
time, for each node? How badly is performance affected?

I know this all seems too generic but then again no two clusters are created
equal anyhow. Just wanted to get a feel.

Thanks,
Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem

Thanks Edward. In our usage scenario, there is never downtime, it's a global
24/7 operation.

What is impacted the worst, the read or write?

How does a node handle compaction when there is a spike of writes coming to
it?



Edward Capriolo wrote:
 
 On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem potek...@bnl.gov wrote:

 Just wanted to see if someone with experience in running an actual
 service
 can advise me:

 how often do you run nodetool compact on your nodes? Do you stagger it in
 time, for each node? How badly is performance affected?

 I know this all seems too generic but then again no two clusters are
 created
 equal anyhow. Just wanted to get a feel.

 Thanks,
 Maxim

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.

 
 This is an interesting topic. Cassandra can now remove tombstones on
 non-major compaction. For some use cases you may not have to trigger
 nodetool compact yourself to remove tombstones. Use cases that do not
 do many updates or deletes may have the least need to run compaction
 yourself.
 
 !However! If you have smaller SSTables, or fewer SSTables, your read
 operations will be more efficient.
 
 if you have downtime such as from 1AM-6AM. Going through a major
 compaction might shrink you dataset significantly and that will make
 reads better.
 
 Compaction can be more or less intensive. The largest factor is is row
 size.  Users with large rows probably see faster compaction while
 smaller rows see it take a long time. You can lower the priority of
 the compaction thread for experimentation.
 
 As to the performance you want to get your cluster to the state where
 it is not compacting often. This may mean you need more nodes to
 handle writes.
 
 I graph the compaction information from JMX
 http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp
 to get a feel for how often a node is compacting on average. Also I
 cross reference the compaction with Read latency and IO graphs I have
 to see what impact compaction has on reads.
 
 Forcing a major compaction also lowers the chances a compaction will
 happen during the day on peak time. I major compact a few cluster
 nodes each night through cron (gc time 3 days). This has been good for
 keeping our data on disk as small as possible. Forcing the major
 compact at night uses IO, but i find it saves IO over the course of
 the day because each read seeks less on disk.
 
 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-the-impact-of-compaction-on-performance-tp5995868p5995978.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: order of index expressions

2011-02-05 Thread buddhasystem

Jonathan,

what's the implementation of that? I.e. is it a product of indexes or nested
loops?

Thanks,

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/order-of-index-expressions-tp5995909p5996488.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Using Cassandra to store files

2011-02-04 Thread buddhasystem

Even when storage is in NFS, Cassandra can still be quite useful as a file
catalog. Your physical storage can change, move etc. Therefore, it's a good
idea to provide mapping of logical names to physical store points (which in
fact can be many). This is a standard technique used in mass storage.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Moving data

2011-02-04 Thread buddhasystem

FWIW, I'm working on migrating a large amount of data out of Oracle into my
test cluster. The data has been warehoused as CSV files on Amazon S3. Having
that in place allows me to not put extra load on the production service when
doing many repeated tests. I then parse the data using Python's csv module
and, as Jonathan says, use threads to batch-upload data into Cassandra.
Notable points: since the data is relatively sparse (i.e. many zeros for
integers and empty strings for strings etc), I establish a default value
dictionary, and don't write these to Cassandra at all -- they can be
reconstructed as needed when reading back.

Also, make sure you wrap Cassandra writes in exception handling. When load is
high, you might get timeouts at the TSocket level etc.
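A condensed sketch of those two points (default-value stripping and wrapped writes); the field names and defaults are invented, and `cf` stands in for a pycassa-style ColumnFamily object:

```python
import csv
import io

# Illustrative defaults for sparse fields: these values are NOT written to
# Cassandra and are reconstructed on read.
DEFAULTS = {"njobs": "0", "site": ""}

def sparse_rows(csv_text):
    # Yield each CSV record with default-valued columns dropped.
    for record in csv.DictReader(io.StringIO(csv_text)):
        yield {k: v for k, v in record.items() if DEFAULTS.get(k) != v}

def safe_insert(cf, key, columns, retries=3):
    # Wrap the write: under heavy load, Thrift can time out at the socket
    # level, so retry a few times before giving up.
    for attempt in range(retries):
        try:
            return cf.insert(key, columns)
        except Exception:
            if attempt == retries - 1:
                raise
```

Skipping the default values keeps sparse rows small on disk; the retry wrapper keeps a transient socket timeout from aborting a multi-hour load.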

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Moving-data-tp5992669p5993443.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Using Cassandra to store files

2011-02-03 Thread buddhasystem

CouchDB

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5989122.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Slow network writes

2011-02-03 Thread buddhasystem

Dude, are you asking me to unsubscribe?

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-network-writes-tp5985757p5991488.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Commit log compaction

2011-02-02 Thread buddhasystem

How often and by what criteria is the commit log compacted/truncated?

Thanks,

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Commit-log-compaction-tp5985221p5985221.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Commit log compaction

2011-02-02 Thread buddhasystem

Thank you. So what is exactly the condition that causes the older commit log
files to actually be removed? I observe that indeed they are rotated out
when the threshold is reached, but then new ones are placed in the directory
and the older ones are still there.

Thanks,
Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Commit-log-compaction-tp5985221p5986399.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Counters in 0.8 -- conditional?

2011-02-02 Thread buddhasystem

Thanks. Just wanted to note that counting the number of rows where foo=bar is
a fairly ubiquitous task in db applications. With big data,
shipping all this data to the client just to count something isn't optimal
at all.

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counters-in-0-8-conditional-tp5985214p5986442.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Counters in 0.8 -- conditional?

2011-02-02 Thread buddhasystem

Thanks. Yes I know it's by no means trivial. I thought in case there was an
index on the column on which I want to place condition, the index machinery
itself can do the counting (i.e. when the index is updated, the counter is
incremented). It doesn't seem too orthogonal to the current implementation,
at least from my very limited experience.

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counters-in-0-8-conditional-tp5985214p5986871.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Cassandra memory needs

2011-02-02 Thread buddhasystem

Oleg,

I just wanted to add that I confirmed the importance of that rule of thumb
the hard way. I created two extra CFs and was able to reliably crash the
nodes during writes. I guess for the final setting I'll rely on results of
my testing.

But it's also important to not cause the swap death of your machine (i.e.
when you go too high on JVM memory).

Regards

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-needs-tp5986663p5986911.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


How do I get 0.7.1?

2011-02-02 Thread buddhasystem

Thanks.

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-do-I-get-0-7-1-tp5986927p5986927.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Slow network writes

2011-02-02 Thread buddhasystem

Jonathan,

where do I find that contrib/stress?

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-network-writes-tp5985757p5986937.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How do I get 0.7.1?

2011-02-02 Thread buddhasystem

Stephen, sorry I didn't understand your missive.

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-do-I-get-0-7-1-tp5986927p5987184.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: cassandra as session store

2011-02-01 Thread buddhasystem

Most if not all modern web application frameworks support sessions. This
applies to Django (with which I have most experience and also run it with
X.509 security layer) but also to Ruby on Rails and Pylons.

So, why would you re-invent the wheel? Too messy. It's all out there for you
to use.

Regards,
Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/cassandra-as-session-store-tp5981871p5981961.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: cassandra as session store

2011-02-01 Thread buddhasystem

For completeness:

http://stackoverflow.com/questions/3746685/running-django-site-in-multiserver-environment-how-to-handle-sessions
http://docs.djangoproject.com/en/dev/topics/http/sessions/#using-cached-sessions

I guess your approach does make sense, one only wishes that the servlet in
question did more work for you. If I read correctly, Django can cache
sessions transparently in memcached. So memcached becomes your Session
Management System. Is it better or worse than Cassandra? My feeling is that
it's probably faster and easier to set up.


-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/cassandra-as-session-store-tp5981871p5982024.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


TSocket timing out

2011-01-29 Thread buddhasystem

When I do a lot of inserts into my cluster (10k at a time) I get timeouts
from Thrift, in the TSocket.py module.

What do I do?

Thanks,

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/TSocket-timing-out-tp5973548p5973548.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Cassandra and count

2011-01-28 Thread buddhasystem

As far as I know, there are no aggregate operations built into Cassandra,
which means you'll have to retrieve all of the data to count it in the
client. I had a thread on this topic 2 weeks ago. It's pretty bad.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Node going down when streaming data, what next?

2011-01-28 Thread buddhasystem

Sorry Aaron but this doesn't help. As I said, machine is dead, kaput,
finished. So I can't do decommission. I can remove token to any other
node, but -- the dead machine is going to hang around in my ring reports
like a zombie.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5971349.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Node going down when streaming data, what next?

2011-01-28 Thread buddhasystem

It does remove tokens, and the ring shows that the problematic node owns 0
tokens, which is OK. However, it's still there, listed.

It's not a bug but kind of like a feature -- you can move that node back in
two days later and move tokens in same or different way.

What I wish happened was that API allowed for the nodetool to issue a
command:

nodetool --host foobar removeempty

Which would then really scratch the node with zero tokens from the ring, no
questions asked. Even if the flaky node physically disappeared.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5971851.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem

Will it work for a billion rows? Because that's where I'll eventually end up.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-for-storing-large-objects-tp5965418p5966284.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem

I would ask myself a different question, which is what media-hosting sites
use (YouTube and all others). Cassandra still may have its usefulness here
as a mapper between a logical id and physical file location.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-for-storing-large-objects-tp5965418p5967730.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Node going down when streaming data, what next?

2011-01-27 Thread buddhasystem

OK, after running repair and waiting overnight the rebalancing worked and
now 3 nodes share the load as I expected. However, one node that is broken
is still listed in the ring. I have no intention of reviving it. What's the
optimal way to get rid of it as far as the ring configuration is concerned
(it's still listed as down but I would like to really scratch it)?

Thanks,

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5968075.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem

I was moving a node and at some point it started streaming data to 2 other
nodes. Later, that node keeled over and let's assume I can't fix it for the
next 3 days and just want to move tokens on the remaining three to even out
and see if I can live with it.

But I can't do that! The node that was on the receiving end of the stream
refuses to move, because it's still receiving.

What do I do?

Maxim



Re: Schema Design

2011-01-26 Thread buddhasystem

Having separate columns for Year, Month, etc. seems redundant. It's far more
efficient to keep, say, UTC time as a POSIX timestamp (basically an integer).
It's easy to convert back and forth.
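A minimal sketch of that round trip, assuming UTC and nothing beyond Python's standard library (how you name the column is up to you):

```python
import calendar
from datetime import datetime, timezone

def to_posix(dt):
    # Treat the (naive) datetime as UTC and return an integer POSIX timestamp.
    return calendar.timegm(dt.utctimetuple())

def from_posix(ts):
    # Convert the integer back to an aware UTC datetime.
    return datetime.fromtimestamp(ts, tz=timezone.utc)

ts = to_posix(datetime(2011, 1, 26, 12, 0, 0))
print(ts)              # 1296043200
print(from_posix(ts))  # 2011-01-26 12:00:00+00:00
```

Storing the integer also makes range slices on time trivial, since the numeric order is the chronological order.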

If you want to get a range of dates, you might use the Order Preserving
Partitioner and sort out which systems logged later on the client side. Read
up on the consequences of using OPP first.

Whether to shard the data per system depends on how many systems you have. If
more than a few, don't do it; there are memory considerations.

Cheers

Maxim



Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem

Bump. I still don't know what the best thing to do is, please help.


Re: Schema Design

2011-01-26 Thread buddhasystem

I used the term sharding a bit frivolously. Sorry. It's just that splitting
semantically homogeneous data among CFs doesn't scale too well, since each CF
is allocated its own piece of memory on the server.


Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem

Hello,

From what I know, you don't really have to restart simultaneously, although
of course you don't want to wait.

I finally decided to use the removetoken command to actually scratch the
sickly node out of the cluster. I'll bootstrap it later once it's fixed.




Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem

Sorry if this sounds silly, but I can't get my brain around this one: if all
nodes contain replicas, why does the cluster stream data every time I move
or remove a token? If the data is already there, what needs to be streamed?

Thanks
Maxim



RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem

Thanks, I'll look at the configuration again.

In the meantime, I can't move the first node in the ring (after I removed the
previous node's token) -- it throws an exception saying data is being
streamed to it. However, that is not what netstats says! Weirdness
continues...

Maxim



Re: Forcing GC w/o jconsole

2011-01-25 Thread buddhasystem

Thanks! It doesn't seem to have any effect on GCing dropped CFs, though.

Maxim



Re: Stress test inconsistencies

2011-01-25 Thread buddhasystem

Oleg,

I'm a novice at this, but for what it's worth, I can't imagine you can have a
_sustained_ 1 kHz insertion rate on a single machine that also does some
reads. If I'm wrong, I'll be glad to learn it. It just doesn't seem to square
with the typical seek time of a hard drive.
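For scale, the back-of-envelope arithmetic behind that hunch, assuming an illustrative ~10 ms average seek plus rotational latency for a spinning disk (note that Cassandra's writes go through a sequential commit log and memtables, so the real write path may not be seek-bound at all):

```python
# Roughly how many random disk I/Os per second a single spinning disk
# can sustain, given an assumed ~10 ms average access time.
avg_seek_ms = 10
random_ios_per_sec = 1000 / avg_seek_ms
print(random_ios_per_sec)  # 100.0 -- an order of magnitude below 1 kHz
```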

Maxim



Re-partitioning the cluster with nodetool: what's happening?

2011-01-25 Thread buddhasystem

I'm trying to re-partition my 4-node cluster to make the load exactly 25% on
each node. As per the recipes found in the documentation, I calculate:
>>> for x in xrange(4):
...     print 2**127/4*x
...
0
42535295865117307932921825928971026432
85070591730234615865843651857942052864
127605887595351923798765477786913079296

And I need to move the first one to 0, then the second one to
42535295865117307932921825928971026432 etc.
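The same calculation as a self-contained Python 3 sketch (the session above used Python 2, where / between ints is floor division; Python 3 needs //):

```python
def initial_tokens(n):
    # Evenly spaced RandomPartitioner tokens for an n-node cluster,
    # dividing the 2**127 token space into n equal arcs starting at 0.
    return [2**127 // n * i for i in range(n)]

for token in initial_tokens(4):
    print(token)
```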

Once I start the procedure, I see no progress when I look at nodetool
netstats. Nothing's happening. What am I doing wrong?

Thanks,

Maxim



Re: Re-partitioning the cluster with nodetool: what's happening?

2011-01-25 Thread buddhasystem

Correction -- what I meant to say is that I do see announcements about
streaming in the output, but they are stuck at 0%.



Forcing GC w/o jconsole

2011-01-24 Thread buddhasystem

My situation is similar to one described at this link:
http://stackoverflow.com/questions/4155696/how-to-trigger-manual-java-gc-from-linux-console-with-no-x11

I'm trying the following command but it fails (connection refused)

java -jar cmdline-jmxclient-0.10.3.jar - localhost:8081
java.lang:type=Memory gc

What port number do I actually need?

I really have no experience with this; if somebody can give me the correct
recipe, it will be much appreciated.

Maxim



Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem

OK, so I'm looking at this page:

http://wiki.apache.org/cassandra/MemtableSSTable

This looks promising:
A compaction marker is also added to obsolete sstables so they can be
deleted on startup if the server does not perform a GC before being
restarted.

So it would seem that if I restart the server, the obsoleted data should be
GCd out of existence, don't you think? But it's not happening. I brought one
node down, restarted it, and the old data is still there.

Ideas?



Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem

Thanks for the note,

yes, I do know which files I don't need anymore. And I do realize the
difference between the grace period of CFs and garbage collection (or at
least I hope I do).

At face value, the documentation wasn't precise enough about the JVM GC
taking care of dropped CFs. I understand this is why nodetool compact didn't
have the desired effect. I guess I'll have to do a manual deletion after all.

Maxim



Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem

Thanks Aaron. As I remarked earlier (and it seems that's not uncommon), none
of the nodes have X11 installed (I think I could arrange it, but it's a bit
of a hassle). So if I understand correctly, jconsole is an X11 app, and I'm
out of luck with that.

I would agree with you that having a proper nodetool command to zap the data
you know you don't need would be quite ideal. The reason I'm so retentive
about it is that I plan to test scaling up to 250 million rows, and disk
space matters.


Multiple indexes - how does Cassandra handle these internally?

2011-01-21 Thread buddhasystem

Greetings --

if I use multiple secondary indexes in the query, what will Cassandra do?
Some examples say it will index on first EQ and then loop on others. Does it
ever do a proper index product to avoid inner loops?
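A toy model of the strategy those examples describe (an assumption about the behavior based on them, not Cassandra's actual code): use a secondary index only for the first EQ clause, then filter the candidate rows on the remaining clauses in a loop.

```python
def query(rows, index, predicates):
    """rows: {key: {col: val}}; index: {(col, val): [keys]} covering the
    first EQ clause; predicates: list of (col, val) equality clauses."""
    first_col, first_val = predicates[0]
    # Index lookup for the first clause only...
    candidates = index.get((first_col, first_val), [])
    # ...then a plain inner loop over the remaining clauses.
    return [k for k in candidates
            if all(rows[k].get(col) == val for col, val in predicates[1:])]

rows = {"a": {"x": 1, "y": 2}, "b": {"x": 1, "y": 3}}
index = {("x", 1): ["a", "b"]}
print(query(rows, index, [("x", 1), ("y", 2)]))  # ['a']
```

The cost implication is visible in the sketch: only the first clause is index-assisted, so picking the most selective EQ clause first keeps the candidate loop short.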

Thanks

Maxim



Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem

Greetings,

I just used the nodetool to force a major compaction on my cluster. It seems
like the CFs currently in service were indeed compacted, while the old test
materials (which I dropped from the CLI) were still there as tombstones.

Is that the expected behavior? Hmm...

TIA.



Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem

Thanks!

What's strange anyhow is that the GC grace period for these CFs expired some
days ago. I thought a compaction would take care of these tombstones. I used
nodetool to compact.
