Re: Chunking if size > 64MB

2011-06-29 Thread aaron morton
AFAIK there is no server-side chunking of column values. This link http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage just suggests that your app should not store more than 64MB per column. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://
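
Since the reply says any chunking has to happen in the application, here is a minimal sketch of what that could look like on the client side. The column naming scheme (`chunk-000000`, `chunk-000001`, ...) and the function names are hypothetical and do not come from the thread or from any Cassandra client library; only the 64MB figure is taken from the FAQ link above.

```python
# Client-side chunking sketch. Column names and helpers are hypothetical.
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the per-column ceiling suggested by the FAQ


def split_into_chunks(blob: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Map zero-padded column names to consecutive slices of blob."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        "chunk-%06d" % index: blob[offset:offset + chunk_size]
        for index, offset in enumerate(range(0, len(blob), chunk_size))
    }


def join_chunks(columns: dict) -> bytes:
    """Reassemble the blob; zero-padded names sort in chunk order."""
    return b"".join(columns[name] for name in sorted(columns))
```

Each entry of the returned dict would be written as one column of the same row, so the application, not the server, decides how and when a value is split.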

Re: hadoop results

2011-06-29 Thread aaron morton
How about get_slice() with reversed == true and count = 1 to get the highest time UUID? Or you can also store a column with a magic name that has the value of the timeuuid that is the current metric to use. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton h
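
To illustrate why a reversed slice with count = 1 returns the newest column, here is a small in-memory model. It is not the Thrift API: `make_timeuuid` and this `get_slice` are hypothetical helpers that only mimic ordering by the version 1 UUID timestamp, which is the assumption made here about how a TimeUUIDType comparator orders columns.

```python
import uuid


def make_timeuuid(timestamp_100ns: int, node: int = 0x123456789ABC) -> uuid.UUID:
    """Build a version 1 UUID carrying the given 60-bit timestamp."""
    time_low = timestamp_100ns & 0xFFFFFFFF
    time_mid = (timestamp_100ns >> 32) & 0xFFFF
    time_hi_version = ((timestamp_100ns >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))


def get_slice(row: dict, reverse: bool = False, count: int = 100) -> list:
    """Return up to count (name, value) pairs ordered by UUID timestamp."""
    ordered = sorted(row.items(), key=lambda item: item[0].time, reverse=reverse)
    return ordered[:count]


row = {
    make_timeuuid(100): "oldest",
    make_timeuuid(300): "newest",
    make_timeuuid(200): "middle",
}
latest_name, latest_value = get_slice(row, reverse=True, count=1)[0]
```

With a real client the sort happens on the server, so only a single column crosses the wire.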

Re: Cannot set column value to zero

2011-06-29 Thread aaron morton
The extra () in the describe keyspace output is only there if the column comparator is the BytesType; the client tries to format the data as UTF8. Don't forget truncate is doing snapshots, so check the snapshots dir and delete things if you are using it a lot for testing. The 0 == 1 thing does

Re: custom reconciling columns?

2011-06-29 Thread Jonathan Ellis
On Tue, Jun 28, 2011 at 10:06 PM, Yang wrote: > I'm trying to see whether there are some easy magic bullets for a drop-in > replacement for concurrentSkipListMap... I'm highly interested if you find one. :) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for

Re: No Transactions: An Example

2011-06-29 Thread AJ
On 6/22/2011 9:18 AM, Trevor Smith wrote: Right -- that's the part that I am more interested in fleshing out in this post. Here is one way. Use MVCC . A single global clean-up process would be acceptable since it's not a sin

Re: api to extract gossiper results

2011-06-29 Thread Edward Capriolo
A simple solution is to set up log4j at DEBUG level for Gossip events. You can also use the StorageProxy/Fat client and then participate in gossip. Each system has its own converging view of the ring, thus what your local gossip thinks is the topology may not be the same across the cluster. Edwar

Cassandra client loses connectivity to cluster

2011-06-29 Thread Jim Ancona
In reviewing client logs as part of our Cassandra testing, I noticed several Hector "All host pools marked down" exceptions in the logs. Further investigation showed a consistent pattern of "java.net.SocketException: Broken pipe" and "java.net.SocketException: Connection reset" messages. These erro

api to extract gossiper results

2011-06-29 Thread A J
Cassandra uses an accrual failure detector to interpret gossip. Is it somehow possible to extract these (gossip values and results of the failure detector) into an external system? Thanks

Chunking if size > 64MB

2011-06-29 Thread A J
From what I read, Cassandra allows a single column value to be up to 2GB but would chunk the data if greater than 64MB. Is the chunking transparent to the application, or does the app need to know if/how/when the chunking happened for a specific column value that happened to be > 64MB? Thank you.

RE: RAID or no RAID

2011-06-29 Thread Jeremiah Jordan
With multiple data dirs you are still limited by the space free on any one drive. So if you have two data dirs with 40GB free on each, and you have 50GB to be compacted, it won't work, but if you had a raid, you would have 80GB free and could compact... -Original Message- From: mcasandra
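
The arithmetic in this reply can be written as a quick check. This is a sketch under two assumptions: a compaction needs all of its free space on one volume (as the message states), and the RAID volume is modeled as a single volume whose free space is the sum of its members. The function name is hypothetical.

```python
def can_compact(free_per_volume_gb: list, needed_gb: float) -> bool:
    """True if at least one volume alone has enough free space."""
    return max(free_per_volume_gb, default=0) >= needed_gb


separate_dirs = [40, 40]            # two data dirs, 40GB free on each
raid_volume = [sum(separate_dirs)]  # modeled as one volume with 80GB free

fits_on_separate_dirs = can_compact(separate_dirs, 50)
fits_on_raid = can_compact(raid_volume, 50)
```

The check uses the largest single free amount rather than the total, which is why the two layouts differ even though the total free space is the same.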

CQL injection attacks?

2011-06-29 Thread dnallsopp
Someone asked a while ago whether Cassandra was vulnerable to injection attacks: http://stackoverflow.com/questions/5998838/nosql-injection-php-phpcassa-cassandra With Thrift, the answer was 'no'. With CQL, presumably the situation is different, at least until prepared statements are possible (

hadoop results

2011-06-29 Thread William Oberman
I'll start with my question: given a CF with comparator TimeUUIDType, what is the most efficient way to get the greatest column's value? Context: I've been running cassandra for a couple of months now, so obviously it's time to start layering more on top :-) In my test environment, I managed to g

Re: custom reconciling columns? (improve performance of long rows )

2011-06-29 Thread Yang
I hacked around the code, and first I thought that the cost on map put and get was due to the synchronization cost, so I tried replacing ConcurrentSkipListMap with TreeMap. I created a subclass of ColumnFamily and use the subclass only in pure read path: interestingly on the read path, no more th

Re: Data storage security

2011-06-29 Thread Eric tamme
On Wed, Jun 29, 2011 at 12:37 PM, A J wrote: > Are there any options to encrypt the column families when they are > stored in the database. Say in a given keyspace some CF has sensitive > info and I don't want a 'select *' of that CF to layout the data in > plain text. > > Thanks. > I think this

Data storage security

2011-06-29 Thread A J
Are there any options to encrypt the column families when they are stored in the database? Say in a given keyspace some CF has sensitive info and I don't want a 'select *' of that CF to lay out the data in plain text. Thanks.

Re: question on capacity planning

2011-06-29 Thread Ryan King
On Wed, Jun 29, 2011 at 5:36 AM, Jacob, Arun wrote: > if I'm planning to store 20TB of new data per week, and expire all data > every 2 weeks, with a replication factor of 3, do I only need approximately > 120 TB of disk? I'm going to use ttl in my column values to automatically > expire data. Or

Cannot set column value to zero

2011-06-29 Thread dnallsopp
I had a strange problem recently where I was unable to set the value of a column to '0' (it always returned '1') but setting it to other values worked fine: [default@Test] set Urls['rowkey']['status']='1'; Value inserted. [default@Test] get Urls['rowkey']; => (column=status, value=1, timestamp=130

Re: Ec2 snitch with network topology strategy

2011-06-29 Thread pankaj soni
Hmm... Just tested the config. It works, got confused with the options, my bad. On Wed, Jun 29, 2011 at 2:26 PM, pankajsoni0126 wrote: > I was thinking of leveraging ec2 snitch. But my question is then how do I > give replica placement options? > > Or can I give snitch as ec2snitch and write the

question on capacity planning

2011-06-29 Thread Jacob, Arun
if I'm planning to store 20TB of new data per week, and expire all data every 2 weeks, with a replication factor of 3, do I only need approximately 120 TB of disk? I'm going to use ttl in my column values to automatically expire data. Or would I need more capacity to handle sstable merges? Given
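
The 120 TB figure in the question is ingest rate times retention times replication factor. A minimal sketch of that calculation follows; the `headroom` multiplier is a hypothetical parameter added here to show where extra space for sstable merges would enter, and the 1.5 value is purely illustrative since the thread preview gives no figure for it.

```python
def raw_disk_tb(ingest_tb_per_week: float,
                retention_weeks: float,
                replication_factor: int,
                headroom: float = 1.0) -> float:
    """Disk needed for live data across all replicas, scaled by headroom."""
    return ingest_tb_per_week * retention_weeks * replication_factor * headroom


baseline_tb = raw_disk_tb(20, 2, 3)                     # 20 TB/week, 2 weeks, RF 3
with_headroom_tb = raw_disk_tb(20, 2, 3, headroom=1.5)  # illustrative multiplier only
```

The baseline reproduces the number in the question; anything above it depends on how much temporary space compaction needs in practice.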

Ec2 snitch with network topology strategy

2011-06-29 Thread pankajsoni0126
I was thinking of leveraging ec2 snitch. But my question is then how do I give replica placement options? Or can I give snitch as ec2snitch and write the nodes cassandra-topology.prop and in give locator strategy at time of creating keyspace as network topology strategy. But will it work? And th