Re: Performance deterioration while building secondary index
Well, the problem is still there: I tried to add one more index, and the 3-node cluster just goes spastic and becomes unresponsive. These boxes have plenty of CPU and memory.

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Performance-deterioration-while-building-secondary-index-tp6564401p6801680.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node added, no performance boost -- are the tokens correct?
On two different clusters, if I set the token to zero on a node, its ownership drops to zero after migration. After I added the third one and moved tokens, I now have this:

33.33%  56713727820156410577229101238628035242
33.33%  113427455640312821154458202477256070484
33.33%  170141183460469231731687303715884105727

No zeroes.

Eric Gilmore-3 wrote:
> A script that I have says the following:
>
> $ python ctokens.py
> How many nodes are in your cluster? 2
> node 0: 0
> node 1: 85070591730234615865843651857942052864
>
> The first token should be zero, for the reasons discussed here:
> http://www.datastax.com/dev/tutorials/getting_started_0_7/configuring#initial-token-values
> More details are available in http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity
> The DS docs have some weak areas, but these two pages have been pretty well vetted over the past months :)
>
> On Thu, Mar 31, 2011 at 3:06 PM, buddhasystem <potek...@bnl.gov> wrote:
>> I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance.
>>
>> Address          Status  State   Load       Owns    Token
>>                                                      170141183460469231731687303715884105728
>> 130.199.185.194  Up      Normal  153.52 GB  50.00%  85070591730234615865843651857942052864
>> 130.199.185.193  Up      Normal  199.82 GB  50.00%  170141183460469231731687303715884105728
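For illustration, the evenly spaced tokens in this thread follow the standard RandomPartitioner formula, token_i = i * 2**127 / N. A minimal sketch (the function name is hypothetical; this is not the ctokens.py script quoted above):

```python
def balanced_tokens(n):
    # Evenly spaced initial tokens on RandomPartitioner's 0..2**127 ring.
    return [i * (2 ** 127) // n for i in range(n)]

# Reproduces the two-node example from the quoted script:
# node 0: 0
# node 1: 85070591730234615865843651857942052864
for i, token in enumerate(balanced_tokens(2)):
    print("node %d: %d" % (i, token))
```

The key point the thread arrives at: the first token should be 0, and the others are evenly spaced multiples of 2**127/N.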
Netstats out of sync?
I'm rebalancing a cluster of 2 nodes at this point. Netstats on the source node reports progress of the stream, whereas on the receiving end netstats states that progress = 0. Did anyone see that? Do I need both nodes listed as seeds in cassandra.yaml? TIA
Node added, no performance boost -- are the tokens correct?
I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance.

Address          Status  State   Load       Owns    Token
                                                     170141183460469231731687303715884105728
130.199.185.194  Up      Normal  153.52 GB  50.00%  85070591730234615865843651857942052864
130.199.185.193  Up      Normal  199.82 GB  50.00%  170141183460469231731687303715884105728
Re: Node added, no performance boost -- are the tokens correct?
Yup, I screwed up the token setting, my bad. Now I've moved the tokens. I still observe that read latency deteriorated with 3 machines vs. the original one. Replication factor is 1, Cassandra version 0.7.2 (didn't have time to upgrade as I need results by this weekend). Key and row caching were disabled to get worst-case test results.
Re: data aggregation in Cassandra
Hello Saurabh, I have a similar situation, with a more complex data model, and I do an equivalent of map-reduce by hand. The redeeming value is that you have complete freedom in how you hash, and you design the way you store indexes and similar structures. If there is a pattern in the data store, you use it to your advantage. In the end, you get good performance.
Re: cassandra nodes with mixed hard disk sizes
aaron morton wrote:
> Also a node is responsible for storing its token range and acting as a replica for other token ranges. So reducing the token range may not have a dramatic effect on the storage requirements.

Aaron, is there a way to configure wimpy nodes such that the replicas are elsewhere?
Re: Deleting old SSTables
Jonathan, for all of us just tinkering with test clusters, building confidence in the product, it would be nice to be able to do the same with nodetool, without jconsole -- just my 0.5 penny. Thanks.

Jonathan Ellis-3 wrote:
> From the next paragraph of the same wiki page: "SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary, but Cassandra will force one itself if it detects that it is low on space. A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted."
>
> On Tue, Mar 22, 2011 at 8:30 AM, Jonathan Colby <jonathan.co...@gmail.com> wrote:
>> According to the Wiki Page on compaction: "once compaction is finished, the old SSTable files may be deleted"
>>
>> http://wiki.apache.org/cassandra/MemtableSSTable
>>
>> I thought the old SSTables would be deleted automatically, but this wiki page got me thinking otherwise.
>>
>> Question is, if it is true that old SSTables must be manually deleted, how can one safely identify which SSTables can be deleted?
>>
>> Jon
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
Re: 0.7.2 choking on a 5 MB column
Jonathan, wide rows have been discussed. I thought the limit on the number of columns was way bigger than 45k. What can one expect in reality?
Re: 0.7.2 choking on a 5 MB column
I see. I'm doing something even more drastic then, because I'm only inserting one row in this case, and just using cf.insert() without a batch mutator. It didn't occur to me that that was a bad idea. So I take it this method will fail. Hmm.
Re: Reading whole row vs a range of columns (pycassa)
Aaron, thanks for chiming in. I'm doing what you said, i.e. all data for a single object (which is quite lean, with about 100 attributes of ~10 bytes each) goes into a single column, as opposed to the previous version of my application, which had all attributes of each small object mapped to individual columns. So yes, I considered having 100 objects in a single column, but that is suboptimal for many reasons (hard to add an object later).

My reference to OPP was this: if I were sticking with the original design, it could have been advantageous to have OPP, since statistically it's likely that requests for objects are serial, e.g. often people don't query for just one object with id=123, but for a series like id=[123..145]. If I bunch these into rows containing 100 objects each, that promises some efficiency right there, as I read one row as opposed to, say, 50.

aaron morton wrote:
> I'd collapse all the data for a single object into a single column, not sure about storing 100 objects in a single column though. Have you considered any concurrency issues? e.g. multiple threads / processes wanting to update different objects in the same group of 100?
>
> Dont understand your reference to the OOP in the context of reading 100 columns from a row.
>
> Aaron
>
> On 19 Mar 2011, at 16:22, buddhasystem wrote:
>> As I'm working on this further, I want to understand this:
>>
>> Is it advantageous to flatten data in blocks (strings) each containing a series of objects, if I know that a serial object read is often likely, but don't want to resort to OPP? I worked out the optimal granularity, it seems. Is it better to read a serialized single column with 100 objects than a row consisting of a hundred columns each modeling an object?
Undead rows after nodetool compact
This has been discussed once, but I don't remember the outcome. I insert a row and then delete the key immediately. I then run nodetool compact. In cassandra-cli, list cf still returns 1 empty row. This is not a showstopper, but damn unpretty. Is there a way to make deleted rows go away immediately?
Reading whole row vs a range of columns (pycassa)
Is there a noticeable difference in speed between reading the whole row through pycassa vs. a range of columns? Both rows and columns are pretty slim.
Re: Reading whole row vs a range of columns (pycassa)
As I'm working on this further, I want to understand this: Is it advantageous to flatten data in blocks (strings) each containing a series of objects, if I know that a serial object read is often likely, but don't want to resort to OPP? I worked out the optimal granularity, it seems. Is it better to read a serialized single column with 100 objects than a row consisting of a hundred columns each modeling an object?
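To make the tradeoff concrete, here is a toy sketch of the two layouts being compared (the serialization format and names are illustrative assumptions, not the poster's actual code):

```python
import json

# Illustrative small objects; the real ones have ~100 short attributes.
objects = [{"id": i, "value": "v%d" % i} for i in range(100)]

# Layout A: one column per object -- reading a block means 100 column reads.
columns = {"obj%d" % o["id"]: json.dumps(o) for o in objects}

# Layout B: one serialized column holding the whole block of 100 --
# a single column read plus one decode, at the cost of rewriting the
# entire block whenever any one object changes.
block = json.dumps(objects)
decoded = json.loads(block)
```

The "hard to add an object later" objection in the reply above corresponds to Layout B's rewrite-the-whole-block update cost.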
Does concurrent_reads relate to number of drives in RAID0?
Hello, in the instructions I need to link concurrent_reads to the number of drives. Is this the number of physical drives that I have in my RAID0, or something else?
Re: Does concurrent_reads relate to number of drives in RAID0?
Thanks to all for replying, but frankly I didn't get the answer I wanted. Does the number of disks mean the number of spindles in RAID0? Or something else, like a separate disk for the commitlog and for data?
Re: Does concurrent_reads relate to number of drives in RAID0?
Thanks Peter, I can see it better now.
Re: Does concurrent_reads relate to number of drives in RAID0?
Where and how do I choose it?
Please help decipher /proc/cpuinfo for optimal Cassandra config
Dear All, this is from my new Cassandra server. It obviously uses hyperthreading; I just don't know how to translate this to concurrent readers and writers in cassandra.yaml -- can somebody take a look and tell me what number of cores I need to assume for concurrent_reads and concurrent_writes? Is it 24? Thanks!

[cassandra@cassandra01 bin]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 12288 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5333.91
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[the entries for processors 1-4 are identical apart from the processor, core id, apicid, and bogomips values; the dump is truncated in the archive]
Re: Is column update column-atomic or row atomic?
Hello Peter, thanks for the note. I'm not looking for anything fancy. It's just that when I'm looking at the following bit of the pycassa docs, it's not 100% clear to me that it won't overwrite the entire row for the key, if I want to simply add an extra column {'foo': 'bar'} to an already existing row. I don't care about cross-node consistency at this point.

insert(key, columns[, timestamp][, ttl][, write_consistency_level])
    Insert or update columns in the row with key key. columns should be a dictionary of columns or super columns to insert or update. If this is a standard column family, columns should look like {column_name: column_value}. If this is a super column family, columns should look like {super_column_name: {sub_column_name: value}}
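As the quoted docs imply, insert() merges at column granularity. A toy in-memory model of that merge behavior (this is a sketch of the semantics, not pycassa itself):

```python
# Toy model: an insert merges the given columns into the existing row;
# columns not named in the insert are left untouched.
store = {}

def insert(key, columns):
    store.setdefault(key, {}).update(columns)

insert("row1", {"a": "1", "b": "2"})
insert("row1", {"foo": "bar"})  # adds one column; does not overwrite the row

print(store["row1"])  # {'a': '1', 'b': '2', 'foo': 'bar'}
```

So adding {'foo': 'bar'} to an existing row leaves the other columns in place; there is no need to re-read and re-insert the whole row.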
Re: Is column update column-atomic or row atomic?
Thanks for the clarification, Tyler, and sorry again for the basic question. I've been doing straight inserts from Oracle so far, but now I need to update rows with new columns.
Re: Please help decipher /proc/cpuinfo for optimal Cassandra config
Thanks! Docs say it's good to set it to 8*Ncores -- are you saying you see 8 cores in this output? I know I need to go way above the default of 32 with this setup.
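For what it's worth, the 0.7-era rules of thumb can be sketched as below. The socket count and drive count here are assumptions for illustration, not facts taken from the cpuinfo dump; check your own hardware and your version's docs:

```python
# Commonly cited 0.7-era rules of thumb (verify against current docs):
#   concurrent_writes ~ 8 * physical cores
#   concurrent_reads  ~ 16 * data drives
physical_cores = 2 * 6  # assumed: two X5650 sockets, 6 cores each (HT not counted)
data_drives = 1         # assumed: a single data volume

concurrent_writes = 8 * physical_cores
concurrent_reads = 16 * data_drives
print(concurrent_writes, concurrent_reads)  # 96 16
```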
Is column update column-atomic or row atomic?
Sorry for the rather primitive question, but it's not clear to me if I need to fetch the whole row, add a column as a dictionary entry and re-insert it if I want to expand the row by one column. Help will be appreciated.
Re: Is column update column-atomic or row atomic?
Thanks. Can you give me a pycassa example, if possible? Thanks!
Re: Cassandra LongType data insertion problem for secondary index usage
Tyler, as a collateral issue -- I've been wondering for a while what advantage, if any, it buys me if I declare a value 'long' (which it roughly is) as opposed to passing around strings. A string is flattened onto a replica of itself, I assume? No conversion? Maybe it even means better speed. Thanks, Maxim
null vs value not found?
I'm doing insertion with a pycassa client. It seems to work in most cases, but sometimes, when I go to cassandra-cli and query with a key and column that I inserted, I get null whereas I shouldn't. What could be the causes of that?
Re: null vs value not found?
Thanks Tyler,

ColumnFamily: index1
  Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
  Row cache size / save period: 0.0/0
  Key cache size / save period: 1.0/3600
  Memtable thresholds: 0.8765625/50/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: []

I pretty much went with the default settings, and the column name is 'CATALOG'.

Maxim

Tyler Hobbs-2 wrote:
> On Thu, Feb 24, 2011 at 2:27 PM, buddhasystem <potek...@bnl.gov> wrote:
>> I'm doing insertion with a pycassa client. It seems to work in most cases, but sometimes, when I go to cassandra-cli and query with a key and column that I inserted, I get null whereas I shouldn't. What could be the causes of that?
>
> Could you clarify what column name and value you are using, as well as the comparator and validator types?
>
> --
> Tyler Hobbs
> Software Engineer, DataStax (http://datastax.com/)
> Maintainer of the pycassa (http://github.com/pycassa/pycassa) Cassandra Python client library
Re: null vs value not found?
Thanks! You are right. I see an exception but have no idea what went wrong.

ERROR [ReadStage:14] 2011-02-24 21:51:29,374 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:14,5,main]
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1316)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1205)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1134)
        at org.apache.cassandra.db.Table.getRow(Table.java:386)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:70)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
        ... 12 more
Re: Homebrew CF-indexing vs secondary indexing
FWIW, for me the advantage of homebrew indexes is that they can be a lot more sophisticated than the standard ones -- I can hash combinations of column values into whatever I want. I also put counters on column values in the index, so there is lots of functionality. Of course, I can do this because my data becomes read-only; I know it's a luxury.
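A homebrew index of that shape can be modeled roughly like this (the hashing scheme and field names are made up for illustration; a real version would store these as rows in an index CF):

```python
from collections import defaultdict

# Toy model of a homebrew index CF: index keys are combinations of column
# values; each index row holds the matching data-row keys, plus a counter.
index = defaultdict(set)
counts = defaultdict(int)

def index_row(row_key, status, site):
    key = "%s|%s" % (status, site)  # combined-value index key
    index[key].add(row_key)
    counts[key] += 1

index_row("job1", "done", "BNL")
index_row("job2", "done", "BNL")
index_row("job3", "failed", "CERN")

print(sorted(index["done|BNL"]), counts["done|BNL"])  # ['job1', 'job2'] 2
```

Because combined-value keys like this are computed at write time, they only stay consistent cheaply when the data is write-once, which is the luxury mentioned above.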
Will the large datafile size affect the performance?
I know that theoretically it should not (apart from compaction issues), but maybe somebody has experience showing otherwise: my test cluster now has 250GB of data and will have 1.5TB in its reincarnation. If all this data is in a single CF -- will it cause read or write performance problems? Should I shard it? One advantage of splitting the data would be reducing the impact of compactions and repairs (or so I naively assume). TIA, Maxim
Can I count on Super Column Families when planning 3 years out?
There was a discussion here on how well (or not so well) super CFs are supported. I now need to make a strategic decision as to how I plan my data. What's the consensus -- will super CFs still be there 3 years out? TIA, Maxim
How come key cache increases speed by x4?
Well, I know the cache is there for a reason; I just can't explain the factor of 4 when I run my queries on a hot vs. cold cache. My queries are actually a chain: one on an inverted index, which produces a tuple of keys to be used in the main query. The inverted index query should be downright trivial. I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing something? Why such a large factor? TIA, Maxim
Virtues and pitfall of using TYPES?
I've been too smart for my own good trying to type columns, on the theory that it would later increase performance by having more efficient comparators in place. So if a string represents an integer, I would convert it to an integer and declare the column as such. Same for LONG. What I found is that during the write operation, the type conversion kills the performance. It's really not a trivial amount of time. Has anyone had a similar experience?
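For context, this is roughly where the per-column conversion cost lands on the client side: typed values are packed into a fixed binary encoding on every write. A sketch of LongType's 8-byte big-endian packing (an illustration of the wire format, not pycassa's actual code):

```python
import struct

# A LongType value travels as an 8-byte big-endian integer; a string
# column value is sent as-is, with no per-write conversion step.
def pack_long(value):
    return struct.pack(">q", value)

assert pack_long(42) == b"\x00\x00\x00\x00\x00\x00\x00*"
assert len(pack_long(2 ** 40)) == 8
```

Done once per column per insert, this packing (plus the int() conversion from the source strings) is the client-side overhead being described.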
Re: Virtues and pitfall of using TYPES?
Dude, I never mentioned the server side, sorry if that wasn't obvious. As for Python being slow, I'm not going away from it. It performs amazingly well in other circumstances.

Jonathan Ellis-3 wrote:
> That doesn't make sense to me. IntegerType validation is a no-op and LongType validation is pretty close (just a size check). If you meant that the conversion is killing performance on your client, you should switch to a more performant client language. :)
>
> On Fri, Feb 18, 2011 at 9:56 PM, buddhasystem <potek...@bnl.gov> wrote:
>> I've been too smart for my own good trying to type columns, on the theory that it would later increase performance by having more efficient comparators in place. So if a string represents an integer, I would convert it to an integer and declare the column as such. Same for LONG. What I found is that during the write operation, the type conversion kills the performance. It's really not a trivial amount of time. Has anyone had a similar experience?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
Re: create additional secondary index
I sidestep this problem by using a Python script (pycassa-based) where I configure my CFs. This way, it's reproducible and documented.
What is the most solid version of Cassandra? No secondary indexes needed.
Hello, we are acquiring new hardware for our cluster and will be installing it soon. It's likely that I won't need to rely on secondary index functionality, as the data will be write-once, read-many, and I can get away with inverse index creation at load time; plus I have some more complex indexing in mind than comes packaged (too much to explain here). So, if I don't need indexes, what is the most stable, reliable version of Cassandra that I can put in production? I'm seeing bug reports here, and some sound quite serious; I just want something that works day in, day out. Thank you, Maxim
Re: What is the most solid version of Cassandra? No secondary indexes needed.
Thank you! It's just that 0.7.1 seems to be the bleeding edge now (a serious bug was fixed today). Would you still trust it as a production-level service? I'm just slightly concerned. I don't want to create a perception among our IT that the product is not ready for prime time. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029047.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: What is the most solid version of Cassandra? No secondary indexes needed.
Thank you Attila! We will indeed have a few months of breaking in. I suppose I'll keep my fingers crossed and hope that 0.7.X turns out to be very stable. So I'll deploy 0.7.1 -- I will need to apply all the patches; there is no cumulative download, is that correct? Attila Babo wrote: 0.6.8 is stable and production ready; the later versions of the 0.6 branch have issues. No offense, but the 0.7 branch is fairly unstable from my experience. I have reproduced all the open bugs with a production dataset, even when I tried to rebuild it from scratch after a complete loss. If you have a few months before going to production, your best bet is still 0.7.1, as it will stabilize, but the switch between versions is painful. /Attila -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029622.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Column name size
I've been thinking about this as well. I'm migrating data from a large Oracle database, and the RDBMS column names are descriptive (good) and long (bad). For now I just keep them when populating Cassandra, but I can shave off about 30% of storage by hashing names. I don't need any automation and can just maintain a dictionary of serial numbers to strings and vice versa; it's still under 100 items. When you start building inverse indexes and other auxiliary structures, the size effect may be amplified. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Column-name-size-tp6015127p6016109.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
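A hypothetical sketch of the dictionary approach described above: long, descriptive RDBMS names are swapped for short serial codes on write and restored on read (the names here are made up for illustration):

```python
# Descriptive Oracle-side names; the short codes are generated once and
# kept in a small, hand-maintained dictionary (well under 100 entries).
LONG_NAMES = ["job_submission_time", "cloud_identifier", "error_diagnostic_text"]

SHORT = {name: "c%d" % i for i, name in enumerate(LONG_NAMES)}
LONG = {code: name for name, code in SHORT.items()}

def shrink(row):
    """Replace descriptive column names with short codes before writing."""
    return {SHORT[k]: v for k, v in row.items()}

def expand(row):
    """Restore the descriptive names after reading back."""
    return {LONG[k]: v for k, v in row.items()}
```

Since Cassandra stores the column name with every column instance, the saving applies per cell, not per table, which is why it compounds in inverse indexes.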
Re: Limit on amount of CFs
I asked a similar question (but didn't receive an answer). I'm trying to see if a large number of CFs might be beneficial. One thing I can think of is the amount of extra storage needed for compaction -- obviously it will be smaller with many smaller CFs. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Limit-on-amount-of-CFs-tp6013702p6016125.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Calculating the size of rows in KBs
Does it also mean that the whole row will be deserialized when a query comes just for one column? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Calculating-the-size-of-rows-in-KBs-tp6011243p6017870.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Specifying row caching on per query basis ?
Jonathan, what if the data is really homogeneous, but spans a long period of time? I decided that the users who hit the database for the recent past should have a better ride. Splitting into a separate CF also has costs, right? In fact, if I were to go this way, do you think I can crank down the key caches? If yes, down to what level, zero? Thanks! Jonathan Ellis-3 wrote: Not really, no. If you can't trust LRU to cache the hottest rows perhaps you should split the data into different ColumnFamilies. On Wed, Feb 9, 2011 at 1:43 PM, Ertio Lew ertio...@gmail.com wrote: Is this under consideration for future releases? Or being thought about? On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis jbel...@gmail.com wrote: Currently there is not. On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew ertio...@gmail.com wrote: Is there any way to specify on a per query basis (like we specify the Consistency level) which rows should be cached while you're reading them, from a row_cache enabled CF? I believe this could lead to much more efficient use of the cache space! (if you use the same data for different features/parts of your application which have different caching needs). -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Specifying-row-caching-on-per-query-basis-tp6008838p6009462.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
What will happen if I try to compact with insufficient headroom?
One of my nodes is 76% full. I know that one of CFs represents 90% of the data, others are really minor. Can I still compact under these conditions? Will it crash and lose the data? Will it try to create one very large file out of fragments, for that dominating CF? TIA -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-will-happen-if-I-try-to-compact-with-insufficient-headroom-tp6009619p6009619.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Can serialized objects in columns serve as ersatz superCFs?
Seeing the discussion here about indexes not being supported in superCFs, and the less than clear future of superCFs altogether, I was thinking about getting a modicum of the same functionality with serialized objects inside columns. This way the column key becomes a sort of analog of the supercolumn key, and I handle the dictionaries I receive in the client. Does this sound OK? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-serialized-objects-in-columns-serve-as-ersatz-superCFs-tp6003775p6003775.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
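A minimal sketch of the idea, assuming JSON as the serialization (any client-side format would do): each column name plays the role of a supercolumn key, and the value packs the subcolumn dict, handled entirely in the client.

```python
import json

def pack(subcolumns):
    """Serialize a dict of would-be subcolumns into one column value."""
    return json.dumps(subcolumns, sort_keys=True)

def unpack(blob):
    """Recover the subcolumn dict on the client after a read."""
    return json.loads(blob)

# An ersatz "super row": two would-be supercolumns in one ordinary row.
row = {
    "20101204": pack({"cloud": "US", "errors": 17}),
    "20101205": pack({"cloud": "EU", "errors": 3}),
}
```

The trade-off is that a subcolumn can no longer be read or updated individually; every access deserializes the whole packed value.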
Re: Can serialized objects in columns serve as ersatz superCFs?
Thanks for the comment! In my case, I want to store various time slices as indexes, so the content can be serialized as comma-separated concatenation of unique object IDs. Example: on 20101204, multiple clouds experienced a variety of errors in job execution. In addition, multiple users ran (or failed) on different clouds. If I combine user id, cloud id and error code, I can relatively easily drill for errors on a particular date. So each CF maps to a date, and each column in it is a compound index. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-serialized-objects-in-columns-serve-as-ersatz-superCFs-tp6003775p6004834.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
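A sketch of this compound-index scheme; the "|" separator and the field order (user, cloud, error code) are assumptions for illustration, not a fixed format:

```python
def index_key(user_id, cloud_id, error_code):
    """Build a compound column name like 'u42|cloudA|E13'."""
    return "%s|%s|%s" % (user_id, cloud_id, error_code)

def pack_ids(object_ids):
    """Comma-separated concatenation of unique object IDs, as described."""
    return ",".join(sorted(set(object_ids)))

def unpack_ids(value):
    return set(value.split(",")) if value else set()

def from_cloud(column_names, cloud_id):
    """Client-side drill-down: keep only compound keys for one cloud."""
    return [k for k in column_names if k.split("|")[1] == str(cloud_id)]
```

With one row (or CF) per date, drilling for errors on a particular date then reduces to slicing that row and filtering the compound keys client-side.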
Java bombs during compaction, please help
Hello, one node in my 3-machine cluster cannot perform compaction. I tried multiple times, it ran out of heap space once and I increased it. Now I'm getting the dump below (after it does run for a few minutes). I hope somebody can shed a little light on what's going on, because I'm at a loss and this is a real show stopper. [me@mymachine]~/cassandra-test% Error occured while compacting keyspace Tracer java.util.concurrent.ExecutionException: java.lang.NullPointerException at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source) at java.util.concurrent.FutureTask.get(Unknown Source) at org.apache.cassandra.db.CompactionManager.performMajor(CompactionManager.java:186) at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1766) at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1236) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source) at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source) at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source) at sun.rmi.transport.Transport$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Unknown Source) at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.NullPointerException at org.apache.cassandra.io.util.ColumnIterator$1.getKey(ColumnSortedMap.java:276) at org.apache.cassandra.io.util.ColumnIterator$1.getKey(ColumnSortedMap.java:263) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.init(Unknown Source) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:384) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:332) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137) at org.apache.cassandra.io.PrecompactedRow.init(PrecompactedRow.java:78) at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108) at 
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:427) at
Re: Java bombs during compaction, please help
Thanks Jonathan -- does it mean that the machine is experiencing IO problems? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Java-bombs-during-compaction-please-help-tp6001773p6002320.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Finding the intersection results of column sets of two rows
Hello, If the amount of data is _that_ small, you'll have a much easier life with MySQL, which supports joins -- because that's exactly what you want to achieve. asil klin wrote: Hi all, I want to procure the intersection of the column sets of two rows (from 2 different column families). To achieve this, can I first retrieve all columns (around 300) from the first row and just query by those column names in the second row (which contains at most 100,000 columns)? I am using the results at write time, not before presentation to the user, so latency won't be much of a concern while writing. Is this the proper way to procure the intersection results of two rows? Would love to hear your comments. - Regards, Asil -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Finding-the-intersection-results-of-column-sets-of-two-rows-tp5997248p5997743.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
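The two-step fetch the question describes can be sketched client-side, assuming a pycassa-like `get(key, columns=..., column_count=...)` interface on each column family (the function and variable names are illustrative):

```python
def intersect_columns(cf_a, row_a, cf_b, row_b, width=300):
    """Fetch ~300 column names from row_a, then probe row_b for exactly those."""
    names = list(cf_a.get(row_a, column_count=width))
    # Naming the columns explicitly avoids paging through the 100,000-column
    # row; pycassa-style clients return only the columns that exist.
    found = cf_b.get(row_b, columns=names)
    return set(found)
```

This keeps the transferred data proportional to the smaller column set, which is the point of doing the small row first.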
How bad is the impact of compaction on performance?
Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly is performance affected? I know this all seems too generic but then again no two clusters are created equal anyhow. Just wanted to get a feel. Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: How bad is the impact of compaction on performance?
Thanks Edward. In our usage scenario, there is never downtime, it's a global 24/7 operation. What is impacted worse, reads or writes? How does a node handle compaction when there is a spike of writes coming to it? Edward Capriolo wrote: On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem potek...@bnl.gov wrote: Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly is performance affected? I know this all seems too generic but then again no two clusters are created equal anyhow. Just wanted to get a feel. Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com. This is an interesting topic. Cassandra can now remove tombstones on non-major compaction. For some use cases you may not have to trigger nodetool compact yourself to remove tombstones. Use cases that do not do many updates or deletes may have the least need to run compaction yourself. !However! If you have smaller SSTables, or fewer SSTables, your read operations will be more efficient. If you have downtime, such as from 1AM-6AM, going through a major compaction might shrink your dataset significantly and that will make reads better. Compaction can be more or less intensive. The largest factor is row size. Users with large rows probably see faster compaction while smaller rows see it take a long time. You can lower the priority of the compaction thread for experimentation. As to performance, you want to get your cluster to a state where it is not compacting often. This may mean you need more nodes to handle writes. 
I graph the compaction information from JMX http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp to get a feel for how often a node is compacting on average. Also I cross reference the compaction with Read latency and IO graphs I have to see what impact compaction has on reads. Forcing a major compaction also lowers the chances a compaction will happen during the day at peak time. I major compact a few cluster nodes each night through cron (gc time 3 days). This has been good for keeping our data on disk as small as possible. Forcing the major compact at night uses IO, but I find it saves IO over the course of the day because each read seeks less on disk. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-the-impact-of-compaction-on-performance-tp5995868p5995978.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: order of index expressions
Jonathan, what's the implementation of that? I.e. is it a product of indexes or nested loops? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/order-of-index-expressions-tp5995909p5996488.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra to store files
Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Moving data
FWIW, I'm working on migrating a large amount of data out of Oracle into my test cluster. The data has been warehoused as CSV files on Amazon S3. Having that in place allows me to avoid putting extra load on the production service when doing many repeated tests. I then parse the data using the Python csv module and, as Jonathan says, use threads to batch upload data into Cassandra. Notable points: since the data is relatively sparse (i.e. many zeros for integers and empty strings for strings etc), I establish a default-value dictionary and don't write default values to Cassandra at all -- they can be reconstructed as needed when reading back. Also, make sure you wrap Cassandra writes in exception handling. When load is high, you might get timeouts at the TSocket level etc. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Moving-data-tp5992669p5993443.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra to store files
CouchDB -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5989122.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Slow network writes
Dude, are you asking me to unsubscribe? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-network-writes-tp5985757p5991488.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Commit log compaction
How often and by what criteria is the commit log compacted/truncated? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Commit-log-compaction-tp5985221p5985221.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Commit log compaction
Thank you. So what is exactly the condition that causes the older commit log files to actually be removed? I observe that indeed they are rotated out when the threshold is reached, but then new ones are placed in the directory and the older ones are still there. Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Commit-log-compaction-tp5985221p5986399.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Counters in 0.8 -- conditional?
Thanks. Just wanted to note that counting the number of rows where foo=bar is a fairly ubiquitous task in db applications. With big data, shipping all that data to the client just to count something isn't optimal at all. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counters-in-0-8-conditional-tp5985214p5986442.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Counters in 0.8 -- conditional?
Thanks. Yes I know it's by no means trivial. I thought in case there was an index on the column on which I want to place condition, the index machinery itself can do the counting (i.e. when the index is updated, the counter is incremented). It doesn't seem too orthogonal to the current implementation, at least from my very limited experience. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Counters-in-0-8-conditional-tp5985214p5986871.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra memory needs
Oleg, I just wanted to add that I confirmed the importance of that rule of thumb the hard way. I created two extra CFs and was able to reliably crash the nodes during writes. I guess for the final setting I'll rely on the results of my testing. But it's also important not to cause the swap death of your machine (i.e. when you go too high on JVM memory). Regards Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-needs-tp5986663p5986911.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
How do I get 0.7.1?
Thanks. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-do-I-get-0-7-1-tp5986927p5986927.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Slow network writes
Jonathan, where do I find that contrib/stress? Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-network-writes-tp5985757p5986937.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: How do I get 0.7.1?
Stephen, sorry I didn't understand your missive. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-do-I-get-0-7-1-tp5986927p5987184.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: cassandra as session store
Most if not all modern web application frameworks support sessions. This applies to Django (with which I have most experience and also run it with X.509 security layer) but also to Ruby on Rails and Pylons. So, why would you re-invent the wheel? Too messy. It's all out there for you to use. Regards, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/cassandra-as-session-store-tp5981871p5981961.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: cassandra as session store
For completeness: http://stackoverflow.com/questions/3746685/running-django-site-in-multiserver-environment-how-to-handle-sessions http://docs.djangoproject.com/en/dev/topics/http/sessions/#using-cached-sessions I guess your approach does make sense, one only wishes that the servlet in question did more work for you. If I read correctly, Django can cache sessions transparently in memcached. So memcached becomes your Session Management System. Is it better or worse than Cassandra? My feeling is that it's probably faster and easier to set up. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/cassandra-as-session-store-tp5981871p5982024.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
TSocket timing out
When I do a lot of inserts into my cluster (10k at a time) I get timeouts from Thrift, in the TSocket.py module. What do I do? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/TSocket-timing-out-tp5973548p5973548.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
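One common way to cope, sketched generically (the helper name and the choice of exception types are assumptions; a real client would list its Thrift timeout exceptions in `retriable`): back off and retry the batch a few times before giving up, and consider smaller batches.

```python
import time

def with_retries(op, attempts=3, backoff=0.5, retriable=(IOError,)):
    """Run op(); on a retriable error (e.g. a socket timeout), wait with
    exponential backoff and try again before surfacing the failure."""
    for attempt in range(attempts):
        try:
            return op()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller see the timeout
            time.sleep(backoff * (2 ** attempt))
```

Usage would wrap each batch insert, e.g. `with_retries(lambda: cf.batch_insert(rows))`, where `cf` and `rows` stand in for whatever client objects are in play.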
Re: Cassandra and count
As far as I know, there are no aggregate operations built into Cassandra, which means you'll have to retrieve all of the data to count it in the client. I had a thread on this topic 2 weeks ago. It's pretty bad. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
Sorry Aaron but this doesn't help. As I said, the machine is dead, kaput, finished. So I can't do decommission. I can run removetoken from any other node, but the dead machine is going to hang around in my ring reports like a zombie. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5971349.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
It does remove tokens, and the ring shows that the problematic node owns 0 tokens, which is OK. However, it's still there, listed. It's not a bug but kind of like a feature -- you can move that node back in two days and move tokens in the same or a different way. What I wish happened was that the API allowed nodetool to issue a command: nodetool --host foobar removeempty Which would then really scratch the node with zero tokens from the ring, no questions asked. Even if the flaky node physically disappeared. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5971851.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra for storing large objects
Will it work for a billion rows? Because that's where eventually I'll end up being. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-for-storing-large-objects-tp5965418p5966284.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Using Cassandra for storing large objects
I would ask myself a different question, which is what media-hosting sites use (YouTube and all others). Cassandra still may have its usefulness here as a mapper between a logical id and physical file location. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-for-storing-large-objects-tp5965418p5967730.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
OK, after running repair and waiting overnight the rebalancing worked and now 3 nodes share the load as I expected. However, one node that is broken is still listed in the ring. I have no intention of reviving it. What's the optimal way to get rid of it as far as the ring configuration is concerned (it's still listed as down but I would like to really scratch it)? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5968075.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Node going down when streaming data, what next?
I was moving a node and at some point it started streaming data to 2 other nodes. Later, that node keeled over and let's assume I can't fix it for the next 3 days and just want to move tokens on the remaining three to even out and see if I can live with it. But I can't do that! The node that was on the receiving end of the stream refuses to move, because it's still receiving. What do I do? Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5962944.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Schema Design
Having separate columns for Year, Month etc seems redundant. It's tons more efficient to keep, say, UTC time in POSIX format (basically an integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use the Order Preserving Partitioner, and sort out which systems logged later in the client. Read up on the consequences of using OPP. Whether to shard data per system depends on how many you have. If more than a few, don't do that, there are memory considerations. Cheers Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
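The single-integer-column conversion is a one-liner each way with the standard library; a minimal sketch:

```python
import calendar
from datetime import datetime, timedelta

def to_posix(dt):
    """UTC datetime -> integer seconds since the epoch (one column, not five)."""
    return calendar.timegm(dt.timetuple())

def from_posix(ts):
    """Inverse conversion, back to a UTC datetime."""
    return datetime(1970, 1, 1) + timedelta(seconds=ts)
```

Storing the integer also makes range queries over timestamps a simple numeric comparison in the client.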
Re: Node going down when streaming data, what next?
Bump. I still don't know what the best thing to do is, please help. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964231.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Schema Design
I used the term sharding a bit frivolously. Sorry. It's just that splitting semantically homogeneous data among CFs doesn't scale too well, as each CF is allocated a piece of memory on the server. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964326.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
Hello, from what I know, you don't really have to restart simultaneously, although of course you don't want to wait. I finally decided to use the removetoken command to actually scratch out the sickly node from the cluster. I'll bootstrap it later when it's fixed. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964804.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Why does cassandra stream data when moving tokens?
Sorry if this sounds silly, but I can't get my brain around this one: if all nodes contain replicas, why does the cluster stream data every time I move or remove a token? If the data is already there, what needs to be streamed? Thanks Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964839.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
RE: Why does cassandra stream data when moving tokens?
Thanks, I'll look at the configuration again. In the meantime, I can't move the first node in the ring (after I removed the previous node's token) -- it throws an exception and says data is being streamed to it -- however, this is not what netstats says! Weirdness continues... Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964883.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Forcing GC w/o jconsole
Thanks! It doesn't seem to have any effect on GCing dropped CFs, though. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Forcing-GC-w-o-jconsole-tp5956747p5960100.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Stress test inconsistencies
Oleg, I'm a novice at this, but for what it's worth I can't imagine you can have a _sustained_ 1kHz insertion rate on a single machine which also does some reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem to square with a typical seek time on a hard drive. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Stress-test-inconsistencies-tp5957467p5960182.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
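The seek-time argument above, as back-of-the-envelope arithmetic (the 10 ms figure is an assumed typical seek plus rotational latency for a spinning disk, supplied here for illustration, not taken from the thread):

```python
# Rough arithmetic: what seek time implies for seek-bound I/O on one disk.
seek_ms = 10                          # assumed average seek + rotational latency
random_ops_per_sec = 1000 // seek_ms  # random seeks per second, per spindle
target_rate = 1000                    # the 1 kHz rate under discussion

print(random_ops_per_sec)                 # -> 100
print(target_rate // random_ops_per_sec)  # -> 10 (the load is ~10x the budget)
```

Worth noting, though, that Cassandra writes go to a sequential commit log and in-memory memtables, so pure inserts are not seek-bound; it is the concurrent random reads that run into this limit.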
Re-partitioning the cluster with nodetool: what's happening?
I'm trying to re-partition my 4-node cluster to make the load exactly 25% on each node. As per the recipes found in the documentation, I calculate:

>>> for x in xrange(4):
...     print 2**127/4*x
...
0
42535295865117307932921825928971026432
85070591730234615865843651857942052864
127605887595351923798765477786913079296

And I need to move the first node to token 0, then the second one to 42535295865117307932921825928971026432, etc. Once I start the procedure, I see no progress when I look at nodetool netstats. Nothing's happening. What am I doing wrong? Thanks, Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-partitioning-the-cluster-with-nodetool-what-s-happening-tp5960843p5960843.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
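The calculation in the message above, written as a small self-contained function (Python 3 syntax, so `//` and `print()` replace the 2.x forms; `balanced_tokens` is just a name chosen for this sketch). Under the RandomPartitioner the token space runs from 0 to 2**127 - 1, and evenly spaced tokens give each node an equal share of the ring:

```python
# Evenly spaced initial tokens for an n-node RandomPartitioner ring.
def balanced_tokens(n):
    # token i sits at i/n of the way around the 2**127-wide ring
    return [(2**127 // n) * i for i in range(n)]

for i, t in enumerate(balanced_tokens(4)):
    print("node %d: %d" % (i, t))
```

Running this for n=4 reproduces the four values quoted in the message.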
Re: Re-partitioning the cluster with nodetool: what's happening?
Correction -- what I meant to say that I do see announcements about streaming in the output, but these are stuck at 0%. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-partitioning-the-cluster-with-nodetool-what-s-happening-tp5960843p5960851.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Forcing GC w/o jconsole
My situation is similar to the one described at this link: http://stackoverflow.com/questions/4155696/how-to-trigger-manual-java-gc-from-linux-console-with-no-x11 I'm trying the following command, but it fails (connection refused): java -jar cmdline-jmxclient-0.10.3.jar - localhost:8081 java.lang:type=Memory gc What port number do I actually need? I really have no experience in doing this; if somebody can give me the correct recipe, it will be much appreciated. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Forcing-GC-w-o-jconsole-tp5956747p5956747.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
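For reference: the JMX port is whatever JMX_PORT is set to in conf/cassandra-env.sh -- 8080 by default in the 0.7-era releases (later versions moved to 7199). 8081 is not a Cassandra default, which would explain the connection refused. A sketch of the invocation, assuming the 0.7-era default:

```shell
# Trigger a JVM GC over JMX with cmdline-jmxclient.
# Assumes the 0.7-era default JMX port; check JMX_PORT in
# conf/cassandra-env.sh for the value your install actually uses.
JMX_PORT=8080
java -jar cmdline-jmxclient-0.10.3.jar - localhost:${JMX_PORT} \
    java.lang:type=Memory gc
```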
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
OK, so I'm looking at this page: http://wiki.apache.org/cassandra/MemtableSSTable This looks promising: A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted. So it would seem that if I restart the server, the obsoleted data should be GCd out of existence, don't you think? But it's not happening. I brought down one node, restarted it and the old data is still there. Ideas? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5957155.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
Thanks for the note; yes, I do know which files I don't need anymore. And I do realize the difference between the grace period of CFs and garbage collection (or at least I hope I do). At face value, the documentation wasn't precise enough about JVM GC taking care of dropped CFs. I understand this is why nodetool compact didn't have the desired effect. I guess I'll have to do manual deletion after all. Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5957252.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
Thanks Aaron. As I remarked earlier (and it seems it's not uncommon), none of the nodes have X11 installed (I think I could arrange this, but it's a bit of a hassle). So if I understand correctly, jconsole is an X11 app, and I'm out of luck with that. I would agree with you that having a proper nodetool command to zap the data you know you don't need would be quite ideal. The reason I'm so retentive about it is that I plan to test scaling up to 250 million rows, and disk space matters. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5957426.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Multiple indexes - how does Cassandra handle these internally?
Greetings -- if I use multiple secondary indexes in a query, what will Cassandra do? Some examples say it will index on the first EQ clause and then loop over the others. Does it ever do a proper index product to avoid inner loops? Thanks Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Multiple-indexes-how-does-Cassandra-handle-these-internally-tp5947533p5947533.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
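The strategy described in those examples -- one index drives the scan, the remaining clauses become an inner filtering loop -- can be modeled in a toy sketch. This is NOT Cassandra's actual code; the function, its arguments, and the data layout are all simplifications invented for illustration:

```python
# Toy model of a multi-clause secondary-index query: pick the most
# selective EQ-indexed clause, scan that one index, and check the other
# predicates against each candidate row (a loop, not an index product).
def query_with_indexes(rows, indexes, clauses):
    """rows:    {row_key: {column: value}}
    indexes: {column: {value: set_of_row_keys}} for indexed columns
    clauses: list of (column, value) equality predicates"""
    indexed = [c for c in clauses if c[0] in indexes]
    # drive the scan off the index with the fewest candidate rows
    col, val = min(indexed, key=lambda c: len(indexes[c[0]].get(c[1], ())))
    candidates = indexes[col].get(val, set())
    others = [c for c in clauses if c != (col, val)]
    # the remaining predicates are an inner loop over candidate rows
    return {k for k in candidates
            if all(rows[k].get(c) == v for c, v in others)}

rows = {'a': {'x': 1, 'y': 2}, 'b': {'x': 1, 'y': 3}, 'c': {'x': 2, 'y': 2}}
indexes = {'x': {1: {'a', 'b'}, 2: {'c'}}, 'y': {2: {'a', 'c'}, 3: {'b'}}}
print(query_with_indexes(rows, indexes, [('x', 1), ('y', 2)]))  # -> {'a'}
```

The cost of the query is therefore dominated by how selective the driving clause is, which is why the choice of which index to scan matters more than the number of clauses.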
Does Major Compaction work on dropped CFs? Doesn't seem so.
Greetings, I just used nodetool to force a major compaction on my cluster. It seems like the CFs currently in service were indeed compacted, while the old test materials (which I dropped from the CLI) were still there as tombstones. Is that the expected behavior? Hmm... TIA. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5946031.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Does Major Compaction work on dropped CFs? Doesn't seem so.
Thanks! What's strange anyhow is that the GC grace period for these CFs expired some days ago. I thought that a compaction would take care of these tombstones. I used nodetool to compact. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Does-Major-Compaction-work-on-dropped-CFs-Doesn-t-seem-so-tp5946031p5946231.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.