Re: Limit what nodes are writeable

2011-07-11 Thread Maki Watanabe
Cassandra has an authentication interface, but doesn't have authorization.
So you need to implement authorization in your application layer.

maki


2011/7/11 David McNelis dmcne...@agentisenergy.com:
 I've been looking in the documentation and haven't found anything about
 this...  but is there support for making a node  read-only?
 For example, you have a cluster set up in two different data centers / racks
 / whatever, with your replication strategy set up so that the data is
 redundant between the two places.  In one of the places all of the incoming
 data will be  processed and inserted into your cluster.  In the other data
 center you plan to allow people to run analytics, but you want to restrict
 the permissions so that the people running analytics can connect to
 Cassandra in whatever way makes the most sense for them, but you don't want
 those people to be able to edit/update data.
 Is it currently possible to configure your cluster in this manner? Or would
 it only be possible through a third-party solution, like wrapping one of the
 access libraries in a way that does not support write operations?

 --
 David McNelis
 Lead Software Engineer
 Agentis Energy
 www.agentisenergy.com
 o: 630.359.6395
 c: 219.384.5143
 A Smart Grid technology company focused on helping consumers of energy
 control an often under-managed resource.





-- 
w3m


Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-11 Thread Chris Burroughs
On 07/10/2011 01:09 PM, Aditya Narayan wrote:
 Is there any target version in near future for which this has been promised
 ?

The ticket is problematic in that it would -- unless someone has a
clever new idea -- require breaking thrift compatibility to add it to
the API.  Which is unfortunate, since it would be so useful.

If it's in the 0.8.x series it will only be through CQL.


Re: Limit what nodes are writeable

2011-07-11 Thread Yuki Morishita
I have never used the feature, but there is a way to control access based
on user name.
Configure both conf/passwd.properties and conf/access.properties, then
modify cassandra.yaml as follows.

# authentication backend, implementing IAuthenticator; used to identify users
authenticator: org.apache.cassandra.auth.SimpleAuthenticator

# authorization backend, implementing IAuthority; used to limit
access/provide permissions
authority: org.apache.cassandra.auth.SimpleAuthority
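
For reference, a rough sketch of what those two property files look like (format
from memory, so treat the user names and keyspace as placeholders):

# conf/passwd.properties -- one "user=password" entry per line
jsmith=havebadpass
analytics=readonlypass

# conf/access.properties -- per-keyspace read-only / read-write grants
Keyspace1.<ro>=analytics
Keyspace1.<rw>=jsmith

If I remember correctly you also have to point the JVM at the files, e.g. by adding
-Dpasswd.properties=conf/passwd.properties -Daccess.properties=conf/access.properties
to the options in conf/cassandra-env.sh.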

2011/7/11 Maki Watanabe watanabe.m...@gmail.com:
 Cassandra has authentication interface, but doesn't have authorization.
 So you need to implement authorization in your application layer.

 maki


 2011/7/11 David McNelis dmcne...@agentisenergy.com:
 I've been looking in the documentation and haven't found anything about
 this...  but is there support for making a node  read-only?
 For example, you have a cluster set up in two different data centers / racks
 / whatever, with your replication strategy set up so that the data is
 redundant between the two places.  In one of the places all of the incoming
 data will be  processed and inserted into your cluster.  In the other data
 center you plan to allow people to run analytics, but you want to restrict
 the permissions so that the people running analytics can connect to
 Cassandra in whatever way makes the most sense for them, but you don't want
 those people to be able to edit/update data.
 Is it currently possible to configure your cluster in this manner?  Or would
 it only be possible through a third-party solution like wrapping one of the
 access libraries in a way that does not support write operations.

 --
 David McNelis
 Lead Software Engineer
 Agentis Energy
 www.agentisenergy.com
 o: 630.359.6395
 c: 219.384.5143
 A Smart Grid technology company focused on helping consumers of energy
 control an often under-managed resource.





 --
 w3m




-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


AntiEntropy?

2011-07-11 Thread Yang
I looked around in the code, and it seems that AntiEntropy operations are
not run automatically in the server daemon, but are only invoked manually
through nodetool. Am I correct?

If this is the case, I guess the reason it is not automatic is just the load
impact it brings to the servers?

Thanks
Yang


Re: AntiEntropy?

2011-07-11 Thread Peter Schuller
 I looked around in the code, it seems that AntiEntropy operations are
 not automatically run in the server daemon, but only
 manually invoked through nodetool, am I correct?

Yes, and it's important that you do run repair:
http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
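
For example, a common approach is a staggered cron entry per node, scheduled well
inside GCGraceSeconds (path and keyspace name below are illustrative):

# run repair on this node every Sunday at 03:00
0 3 * * 0  /opt/cassandra/bin/nodetool -h localhost repair MyKeyspace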

-- 
/ Peter Schuller


Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-11 Thread A J
Instead of doing nodetool repair, is it not a cheaper operation to
keep tabs on failed writes (be they deletes, inserts or updates) and
read these failed writes at a set frequency in some batch job? By
reading them, RR would get triggered and they would get to a
consistent state.

Because these would be targeted reads (only for rows that failed during
writes), it should be a shorter list and quicker to repair than a full
nodetool repair.


On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo edlinuxg...@gmail.com 
 wrote:
 Read repair does NOT repair tombstones.

 It does, but you can't rely on RR to repair _all_ tombstones, because
 RR only happens if the row in question is requested by a client.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-11 Thread Aditya Narayan
Oops, that's really disheartening, and it could seriously impact our
plans for going live in the near future. Without this facility, I guess
counters currently have very little usefulness.

On Mon, Jul 11, 2011 at 8:16 PM, Chris Burroughs
chris.burrou...@gmail.comwrote:

 On 07/10/2011 01:09 PM, Aditya Narayan wrote:
  Is there any target version in near future for which this has been
 promised
  ?

 The ticket is problematic in that it would -- unless someone has a
 clever new idea -- require breaking thrift compatibility to add it to
 the API.  Which is unfortunate, since it would be so useful.

 If it's in the 0.8.x series it will only be through CQL.



Secondary Index doesn't work with LOCAL_QUORUM

2011-07-11 Thread Hefeng Yuan
Hi,

We're using Cassandra with 2 DC
- one OLTP Cassandra, 6 nodes, with RF3
- the other is a Brisk, 3 nodes, with RF1

We noticed that when we do a write-then-read operation on the Cassandra DC, the
read fails with the following message (from cqlsh):
Unable to complete request: one or more nodes were unavailable.
- write: LOCAL_QUORUM, successful
- read: LOCAL_QUORUM, using the secondary indexed column, fails

It seems to spend a long while working on the query; when I retry the same query
after ~10 minutes, it actually succeeds.
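
For context, the failing pattern is roughly the following (illustrative column
family and column names; CQL 2 / cqlsh syntax as I recall it in 0.8):

UPDATE users USING CONSISTENCY LOCAL_QUORUM SET state = 'CA' WHERE KEY = 'user1';
SELECT * FROM users USING CONSISTENCY LOCAL_QUORUM WHERE state = 'CA';

The UPDATE succeeds, while the indexed SELECT returns the "unavailable" error above
until it is retried later.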

Any help is appreciated.

Thanks,
Hefeng

Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-11 Thread A J
Never mind. I see the issue with this. I will be able to catch the
writes as failed only if I set CL=ALL. For other CLs, I may not know
that it failed on some node.

On Mon, Jul 11, 2011 at 2:33 PM, A J s5a...@gmail.com wrote:
 Instead of doing nodetool repair, is it not a cheaper operation to
 keep tab of failed writes (be it deletes or inserts or updates) and
 read these failed writes at a set frequency in some batch job ? By
 reading them, RR would get triggered and they would get to a
 consistent state.

 Because these would targeted reads (only for those that failed during
 writes), it should be a shorter list and quick to repair (than
 nodetool repair).


 On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo edlinuxg...@gmail.com 
 wrote:
 Read repair does NOT repair tombstones.

 It does, but you can't rely on RR to repair _all_ tombstones, because
 RR only happens if the row in question is requested by a client.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




custom StoragePort?

2011-07-11 Thread Yang
I tried to run multiple cassandra daemons on the same host, using
different ports, for a test env.

I thought this would work, but it turns out that the StoragePort used
by OutboundTcpConnection is always assumed to be the one specified
in .yaml, i.e. the code assumes that the storage port is the same
everywhere. In fact this assumption seems deeply held in many places
in the code, so it's a bit difficult to refactor, for example by
substituting InetAddress with InetSocketAddress.


I am just wondering, do you see any other value in a custom
storage port, besides testing? If there is real value, maybe someone
more familiar with the code could do the refactoring.


Thanks
yang


Node repair questions

2011-07-11 Thread A J
Hello,
Have the following questions related to nodetool repair:
1. I know that the Nodetool Repair Interval has to be less than
GCGraceSeconds. How do I come up with exact values for GCGraceSeconds
and the 'Nodetool Repair Interval'? What factors would make me change
the default of 10 days for GCGraceSeconds? Similarly, what factors would
make me keep the Nodetool Repair Interval only slightly less than
GCGraceSeconds (say, a day less)?

2. Does a Nodetool Repair block any reads and writes on the node
while the repair is going on? During repair, if I try to do an
insert, will the insert wait for the repair to complete first?

3. I read that repair can impact your workload as it causes additional
disk and cpu activity. But are there any details of the impact mechanism, and
any ballpark on how much read/write performance deteriorates?

Thanks.
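
For reference, GCGraceSeconds is a per-column-family setting; a sketch of changing
it from cassandra-cli (attribute name as I recall it in 0.7/0.8, column family name
illustrative):

update column family Users with gc_grace = 864000;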


Re: custom StoragePort?

2011-07-11 Thread Yang
never mind, found this..
https://issues.apache.org/jira/browse/CASSANDRA-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

On Mon, Jul 11, 2011 at 12:39 PM, Yang tedd...@gmail.com wrote:
 I tried to run multiple cassandra daemons on the same host, using
 different ports, for a test env.

 I thought this would work, but it turns out that the StoragePort used
 by outputTcpConnection is always assumed to be the one specified
 in .yaml, i.e. the code assumes that the storageport is same
 everywhere. in fact this assumption seems deeply held in many places
 in the code, so it's a bit difficult to refactor it , for example by
 substituting InetAddress with InetSocketAddress.


 I am just wondering, do you see any other value to a custom
 storageport, besides testing? if there is real value, maybe someone
 more familiar with the code could do the refactoring


 Thanks
 yang



Out of memory error in cassandra

2011-07-11 Thread Anurag Gujral
Hi All,
   I am getting the following error from Cassandra:
ERROR [ReadStage:23] 2011-07-10 17:19:18,300
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.OutOfMemoryError: Java heap space
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:49)
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
at
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:117)
at
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:94)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:107)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:72)
at
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
at
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1311)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1203)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)
at org.apache.cassandra.db.Table.getRow(Table.java:333)
at
org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
 INFO [ScheduledTasks:1] 2011-07-10 17:19:18,306 StatusLogger.java (line 66)
RequestResponseStage  0 0
ERROR [ReadStage:23] 2011-07-10 17:19:18,306 AbstractCassandraDaemon.java
(line 114) Fatal exception in thread Thread[ReadStage:23,5,main]
java.lang.OutOfMemoryError: Java heap space
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:49)
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
at
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:117)
at
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:94)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:107)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:72)
at
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
at
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1311)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1203)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)


Can someone please help debug this? The maximum heap size is 28G.

I am not sure why Cassandra is giving an out of memory error here.

Thanks
Anurag


Re: Out of memory error in cassandra

2011-07-11 Thread Jeffrey Kesselman
Are you on a 64-bit VM?  A 32-bit VM will basically ignore any setting over
2GB.
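
A quick way to check, and where the heap is normally configured (the 28G value
just mirrors the setting reported above; HEAP_NEWSIZE is illustrative):

$ java -version     # a 64-bit JVM reports "64-Bit Server VM"

# conf/cassandra-env.sh
MAX_HEAP_SIZE="28G"
HEAP_NEWSIZE="800M"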

On Mon, Jul 11, 2011 at 4:55 PM, Anurag Gujral anurag.guj...@gmail.comwrote:

 Hi All,
I am getting following error from cassandra:
 ERROR [ReadStage:23] 2011-07-10 17:19:18,300
 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:49)
 at
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
 at
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:117)
 at
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:94)
 at
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:107)
 at
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:72)
 at
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1311)
 at
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1203)
 at
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)
 at org.apache.cassandra.db.Table.getRow(Table.java:333)
 at
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
 at
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69)
 at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)
  INFO [ScheduledTasks:1] 2011-07-10 17:19:18,306 StatusLogger.java (line
 66) RequestResponseStage  0 0
 ERROR [ReadStage:23] 2011-07-10 17:19:18,306 AbstractCassandraDaemon.java
 (line 114) Fatal exception in thread Thread[ReadStage:23,5,main]
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:49)
 at
 org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
 at
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:117)
 at
 org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:94)
 at
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:107)
 at
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:72)
 at
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
 at
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
 at
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1311)
 at
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1203)
 at
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1131)


 Can someone please help debug this? The maximum heap size is 28G .

 I am not sure why cassandra is giving Out of memory error here.

 Thanks
 Anurag




-- 
It's always darkest just before you are eaten by a grue.


RE: custom StoragePort?

2011-07-11 Thread Jeremiah Jordan
If you are on linux see:
https://github.com/pcmanus/ccm 
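
Roughly, ccm usage looks like this (flags from memory -- check the README for the
exact syntax):

ccm create test -v 0.8.1      # or point it at a local Cassandra build
ccm populate -n 3             # three loopback nodes
ccm start
ccm node1 ring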

-Original Message-
From: Yang [mailto:tedd...@gmail.com] 
Sent: Monday, July 11, 2011 3:08 PM
To: user@cassandra.apache.org
Subject: Re: custom StoragePort?

never mind, found this..
https://issues.apache.org/jira/browse/CASSANDRA-200?page=com.atlassian.j
ira.plugin.system.issuetabpanels:all-tabpanel

On Mon, Jul 11, 2011 at 12:39 PM, Yang tedd...@gmail.com wrote:
 I tried to run multiple cassandra daemons on the same host, using 
 different ports, for a test env.

 I thought this would work, but it turns out that the StoragePort used 
 by outputTcpConnection is always assumed to be the one specified in 
 .yaml, i.e. the code assumes that the storageport is same everywhere. 
 in fact this assumption seems deeply held in many places in the code, 
 so it's a bit difficult to refactor it , for example by substituting 
 InetAddress with InetSocketAddress.


 I am just wondering, do you see any other value to a custom 
 storageport, besides testing? if there is real value, maybe someone 
 more familiar with the code could do the refactoring


 Thanks
 yang



RE: Node repair questions

2011-07-11 Thread Jeremiah Jordan
The more often you repair, the quicker it will be.  The more often your
nodes go down the longer it will be.

Repair streams data that is missing between nodes.  So the more data
that is different the longer it will take.  Your workload is impacted
because the node has to scan the data it has to be able to compare with
other nodes, and if there are differences, it has to send/receive data
from other nodes.


-Original Message-
From: A J [mailto:s5a...@gmail.com] 
Sent: Monday, July 11, 2011 2:43 PM
To: user@cassandra.apache.org
Subject: Node repair questions

Hello,
Have the following questions related to nodetool repair:
1. I know that Nodetool Repair Interval has to be less than
GCGraceSeconds. How do I come up with an exact value of GCGraceSeconds
and 'Nodetool Repair Interval'. What factors would want me to change the
default of 10 days of GCGraceSeconds. Similarly what factors would want
me to keep Nodetool Repair Interval to be just slightly less than
GCGraceSeconds (say a day less).

2. Does a Nodetool Repair block any reads and writes on the node, while
the repair is going on ? During repair, if I try to do an insert, will
the insert wait for repair to complete first ?

3. I read that repair can impact your workload as it causes additional
disk and cpu activity. But any details of the impact mechanism and any
ballpark on how much the read/write performance deteriorates ?

Thanks.


Re: Cassandra Secondary index/Twissandra

2011-07-11 Thread Eldad Yamin
Hi Aaron,
Thank you again for your response.

I've read the article, but I didn't understand everything. It would be great
if the benchmark included the actual CLI/Python commands (that way it would be
easier to understand the queries). In addition, an explanation of row pages
would help - what are they?

Anyway, for a sense of scale, we can take as an example the average
Facebook/Twitter user, which can mean ~100K columns per user (Userline).
So what is needed is to take the first 50 columns (ordered by TimeUUID), then
columns 51 to 100, 101 to 150, etc.
Any suggestion on how fast this will be, how you would recommend configuring
Cassandra, or even a different way of achieving that goal?
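
A minimal sketch of that kind of paged, reversed slice, assuming pycassa and
Twissandra-style names (all identifiers below are illustrative):

import pycassa

pool = pycassa.ConnectionPool('Twissandra', ['localhost:9160'])
userline = pycassa.ColumnFamily(pool, 'Userline')

# first page: newest 50 tweet ids for a user, newest first
page = userline.get('some_user', column_count=50, column_reversed=True)

# next page: restart from the oldest column seen, fetch one extra
# and drop the overlapping column
last_seen = list(page.keys())[-1]
next_page = userline.get('some_user', column_start=last_seen,
                         column_count=51, column_reversed=True)
next_page.pop(last_seen, None)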

Thanks,
Eldad.

On Sun, Jul 10, 2011 at 8:31 PM, aaron morton aa...@thelastpickle.comwrote:

 Can you recommend on a better way of doing that or a way to tune Cassandra
 to support those 2 CF?

 A select with no start or finish column name, a column count and not in
 reversed order is about the fastest read query.

 You will need to do a reversed query, which will be a little slower. But
 may still be plenty fast enough, depending on scale and throughput and all
 those other things. see
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

 Cheers


 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 10 Jul 2011, at 00:14, Eldad Yamin wrote:

 Aaron - Thank you for the fast response!


1. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row 
 has
lots of columns?

 Depends on what sort of operations you are doing. Some read operations
 have to pay a constant cost to decode the row level column index, this can
 be tuned though. AFAIK the comparator type has very little to do with the
 performance.

 In Twissandra, the columns are used as alternative index for the
 Userline/Timeline. therefore the operation I'm going to do is slice_range.
 I'm going to get (for example) the first 50  columns (using comparator of
 TimeUUID/LONG).
 Can you recommend on a better way of doing that or a way to tune Cassandra
 to support those 2 CF?


 Thanks!

 On Sun, Jul 10, 2011 at 3:26 AM, aaron morton aa...@thelastpickle.comwrote:


1. Is there a limit on the number of columns in a single column family
that serve as secondary indexes?

 AFAIK there is no coded limit, however every index is implemented as
 another (hidden) Column Family that inherits the settings of the parent CF.
 So under 0.7 you may run out of memory, under 0.8 you may flush  a lot.
 Also, when an indexed column is updated there are potentially 3 operations
 that have to happen: read the old value, delete the old value, write the new
 value. More indexes == more index updating, just like any other database.


1. Does performance decrease (significantly) if the uniqueness of the
column’s values is high?

 Low cardinality is recommended

 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html


1. The CF for Userline/Timeline - have comparator of LONG_TYPE
and not TimeUUID?

 Probably just to make the demo easier. It's used to order tweets in the
 user and public timelines by the current time
 https://github.com/twissandra/twissandra/blob/master/cass.py#L204


1. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row 
 has
lots of columns?

 Depends on what sort of operations you are doing. Some read operations
 have to pay a constant cost to decode the row level column index, this can
 be tuned though. AFAIK the comparator type has very little to do with the
 performance.

 Hope that helps.

 -
  -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 9 Jul 2011, at 12:15, Eldad Yamin wrote:

 Hi,
 I have few questions:

 *Secondary index*

1. Is there a limit on the number of columns in a single column family
that serve as secondary indexes?
2. Does performance decrease (significantly) if the uniqueness of the
column’s values is high?


 *Twissandra*

1. Why in the source (or any tutorial I've read):
The CF for Userline/Timeline - have comparator of LONG_TYPE and
not TimeUUID?


 https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
2. Does performance decrease (significantly) if the uniqueness of the
column’s name is high when comparator is LONG_TYPE/TimeUUID and each row 
 has
lots of columns?


 Thanks!
 Eldad







Re: Strong Consistency with ONE read/writes

2011-07-11 Thread Yang
I'm not proposing any changes to be made, but this looks like a very
interesting topic for thought/hacking/learning, so the following is only
a thought exercise.


HBase enforces a single write/read entry point, so you can achieve
strong consistency by writing/reading only one node. But just writing
to one node exposes you to loss of data if that node fails, so the
region server HLog is replicated to 3 HDFS data nodes. The
interesting thing here is that each replica sees a complete *prefix*
of the HLog: it won't miss a record. If a record sync() to a data node
fails, all the existing bytes in the block are replicated to a new
data node.

If we employ a similar leader node among the N replicas of
Cassandra (the coordinator always waits for the reply from the leader, but the
leader does not do further replication like in HBase or counters), the
leader sees all writes onto the key range, but the other replicas
could miss some writes. As a result, each of the non-leader replicas'
write history has some holes, so when the leader dies and we
elect a new one, no one is going to have a complete history. You'd
then have to do a repair amongst all the replicas to reconstruct the full
history, which is slow.

It seems possible that we could utilize the FIFO property of the
IncomingTcpConnection to simplify history reconstruction, just like
Zookeeper. If the IncomingTcpConnection of a replica fails, that means
it may have missed some edits; when it reconnects, we force
it to talk to the active leader first, to catch up to date. When the
leader dies, the next leader is elected to be the replica with the
most recent history. By maintaining the property that each node has a
complete prefix of the history, we only need to catch up on the tail of
the history, and avoid doing a complete repair on the entire
memtable+SSTable. One issue is that the history at the leader has
to be kept really long - if a non-leader replica goes off for 2
days, the leader has to keep all the history for 2 days to feed it
to the replica when it comes back online. But possibly this could be
limited to some max length, so that beyond that length the woken replica
simply does a complete bootstrap.


thanks
yang
On Sun, Jul 3, 2011 at 8:25 PM, AJ a...@dude.podzone.net wrote:
 We seem to be having a fundamental misunderstanding.  Thanks for your
 comments. aj

 On 7/3/2011 8:28 PM, William Oberman wrote:

 I'm using cassandra as a tool, like a black box with a certain contract to
 the world.  Without modifying the core, C* will send the updates to all
 replicas, so your plan would cause the extra write (for the placeholder).  I
 wasn't assuming a modification to how C* fundamentally works.
 Sounds like you are hacking (or at least looking) at the source, so all the
 power to you if/when you try these kind of changes.
 will
 On Sun, Jul 3, 2011 at 8:45 PM, AJ a...@dude.podzone.net wrote:

 On 7/3/2011 6:32 PM, William Oberman wrote:

 Was just going off of:  Send the value to the primary replica and send
 placeholder values to the other replicas.  Sounded like you wanted to write
 the value to one, and write the placeholder to N-1 to me.

 Yes, that is what I was suggesting.  The point of the placeholders is to
 handle the crash case that I talked about... like a WAL does.

 But, C* will propagate the value to N-1 eventually anyways, 'cause that's
 just what it does anyways :-)
 will

 On Sun, Jul 3, 2011 at 7:47 PM, AJ a...@dude.podzone.net wrote:

 On 7/3/2011 3:49 PM, Will Oberman wrote:

 Why not send the value itself instead of a placeholder?  Now it takes 2x
 writes on a random node to do a single update (write placeholder, write
 update) and N*x writes from the client (write value, write placeholder to
 N-1). Where N is replication factor.  Seems like extra network and IO
 instead of less...

 To send the value to each node is 1.) unnecessary, 2.) will only cause a
 large burst of network traffic.  Think about if it's a large data value,
 such as a document.  Just let C* do it's thing.  The extra messages are tiny
 and doesn't significantly increase latency since they are all sent
 asynchronously.


 Of course, I still think this sounds like reimplementing Cassandra
 internals in a Cassandra client (just guessing, I'm not a cassandra dev)

 I don't see how.  Maybe you should take a peek at the source.


 On Jul 3, 2011, at 5:20 PM, AJ a...@dude.podzone.net wrote:

 Yang,

 How would you deal with the problem when the 1st node responds success
 but then crashes before completely forwarding any replicas?  Then, after
 switching to the next primary, a read would return stale data.

 Here's a quick-n-dirty way:  Send the value to the primary replica and
 send placeholder values to the other replicas.  The placeholder value is
 something like, PENDING_UPDATE.  The placeholder values are sent with
 timestamps 1 less than the timestamp for the actual value that went to the
 primary.  Later, when the changes propagate, the actual values 

Re: Node repair questions

2011-07-11 Thread Peter Schuller
(not answering (1) right now, because it's more involved)

 2. Does a Nodetool Repair block any reads and writes on the node,
 while the repair is going on ? During repair, if I try to do an
 insert, will the insert wait for repair to complete first ?

It doesn't imply any blocking. It's roughly similar to compaction in
its impact on nodes; in addition when data is streamed (if any) the
impact should be similar to node bootstrapping.

 3. I read that repair can impact your workload as it causes additional
 disk and cpu activity. But any details of the impact mechanism and any
 ballpark on how much the read/write performance deteriorates ?

The compaction part will have an impact similar to regular compaction
except it's read-only (no writing of new sstables). It is subject to
compaction throttling if you run a version of Cassandra with
compaction throttling.
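
(In 0.8 that throttle is the cassandra.yaml setting below; 16 MB/s is the shipped
default as far as I recall.)

compaction_throughput_mb_per_sec: 16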

Streaming causes disk/networking load and is not yet rate limited like
compaction.

In addition be aware that repair can cause disk space usage to
temporarily increase if there are significant differences to be
repaired.

-- 
/ Peter Schuller


Re: Node repair questions

2011-07-11 Thread Peter Schuller
 The more often you repair, the quicker it will be.  The more often your
 nodes go down the longer it will be.

Going to have to disagree a bit here. In most cases the cost of
running through the data and calculating the merkle tree should be
quite significant, and hopefully the differences should be fairly
limited.

The actual data being streamed can be a problem, but unless you have a
situation where you are consistently going significantly out-of-synch
and there is no read-repair, I wouldn't recommend more frequent
repairs if your aim is to minimize the impact on the cluster. (In the
general case, there will be exceptions.)

Also to the OP: in general, expect repairs to have more impact on your
cluster the bigger your data is in comparison to the available memory used
for caching. Basically, the more cache-reliant you are, the greater the
impact of repairs (and compaction) will tend to be.

-- 
/ Peter Schuller


Re: Corrupted data

2011-07-11 Thread Jonathan Ellis
That looks a lot like what I've seen from machines with bad ram.

2011/7/8 Héctor Izquierdo Seliva izquie...@strands.com:
 Hi everyone,

 I'm having thousands of these errors:

  WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
 CompactionManager.java (line 737) Non-fatal error reading row
 (stacktrace follows)
 java.io.IOError: java.io.IOException: Impossible row size
 6292724931198053
        at
 org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
        at
 org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
        at org.apache.cassandra.db.compaction.CompactionManager.access
 $600(CompactionManager.java:65)
        at org.apache.cassandra.db.compaction.CompactionManager
 $3.call(CompactionManager.java:250)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor
 $Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor
 $Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Impossible row size 6292724931198053
        ... 9 more
  INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705
 CompactionManager.java (line 743) Retrying from row index; data is -8
 bytes starting at 4735525245
  WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
 CompactionManager.java (line 767) Retry failed too.  Skipping to next
 row (retry's stacktrace follows)
 java.io.IOError: java.io.EOFException: bloom filter claims to be
 863794556 bytes, longer than entire row size -8


 THis is during scrub, as I saw similar errors while in normal operation.
 Is there anything I can do? It looks like I'm going to lose a ton of
 data





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


thrift API

2011-07-11 Thread 魏金仙
Hi, can anyone explain why APIs including multiget, batch_insert, and get_range_slice
were removed in versions above 0.7?
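
For what it's worth, those calls were replaced rather than dropped outright:
multiget by multiget_slice, get_range_slice by get_range_slices, and batch_insert
by batch_mutate. A rough pycassa sketch of the batch_mutate path (keyspace and
column family names are placeholders):

import pycassa

pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Standard1')

# pycassa's batch() wraps the thrift batch_mutate call
with cf.batch() as b:
    b.insert('row1', {'col1': 'val1'})
    b.insert('row2', {'col2': 'val2'})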

commitlog replay missing data

2011-07-11 Thread Jeffrey Wang
Hey all,

 

Recently upgraded to 0.8.1 and noticed what seems to be missing data after a
commitlog replay on a single-node cluster. I start the node, insert a bunch
of stuff (~600MB), stop it, and restart it. There are log messages
pertaining to the commitlog replay and no errors, but some of the data is
missing. If I flush before stopping the node, everything is fine, and
running cfstats in the two cases shows different amounts of data in the
SSTables. Moreover, the amount of data that is missing is nondeterministic.
Has anyone run into this? Thanks.
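
For what it's worth, the flush workaround mentioned above corresponds to (path
illustrative):

bin/nodetool -h localhost flush    # write all memtables out as sstables
bin/nodetool -h localhost drain    # flush and stop accepting writes before shutdown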

 

Here is the output of a side-by-side diff between cfstats outputs for a
single CF before restarting (left) and after (right). Somehow a 37MB
memtable became a 2.9MB SSTable (note the difference in write count as
well)?

 

Column Family: Blocks                before restart      after restart

SSTable count:                       0                   1
Space used (live):                   0                   2907637
Space used (total):                  0                   2907637
Memtable Columns Count:              8198                0
Memtable Data Size:                  37550510            0
Memtable Switch Count:               0                   1
Read Count:                          0                   0
Read Latency:                        NaN ms.             NaN ms.
Write Count:                         8198                1526
Write Latency:                       0.018 ms.           0.011 ms.
Pending Tasks:                       0                   0
Key cache capacity:                  20                  20
Key cache size:                      0                   0
Key cache hit rate:                  NaN                 NaN
Row cache:                           disabled            disabled
Compacted row minimum size:          0                   1110
Compacted row maximum size:          0                   2299
Compacted row mean size:             0                   1960

 

Note that I patched https://issues.apache.org/jira/browse/CASSANDRA-2317 in
my version, but there are no deletions involved so I don't think it's
relevant unless I messed something up while patching.

 

-Jeffrey





Re: CassandraFS in 1.0?

2011-07-11 Thread David Strauss
It's not, currently, but I'm happy to answer questions about its architecture.

On Thu, Jul 7, 2011 at 10:35, Norman Maurer
norman.mau...@googlemail.com wrote:
 May I ask if its opensource by any chance ?

 bye
 norman

 Am Donnerstag, 7. Juli 2011 schrieb David Strauss da...@davidstrauss.net:
 I'm not sure HDFS has the right properties for a media-storage file
 system. We have, however, built a WebDAV server on top of Cassandra
 that avoids any pretension of being a general-purpose, POSIX-compliant
 file system. We mount it on our servers using davfs2, which is also
 nice for a few reasons:

 * We can use standard HTTP load-balancing and dead host avoidance
 strategies with WebDAV.
 * Encrypting access and authenticating clients with PKI/HTTPS works 
 seamlessly.
 * WebDAV + davfs2 is etag-header aware, allowing clients to
 efficiently validate cached items.
 * HTTP is browser and CDN/reverse proxy cache friendly for
 distributing content to people who don't need to mount the file
 system.
 * We could extend the server's support to allow connections from a
 broad variety of interactive desktop clients.

 On Wed, Jul 6, 2011 at 13:11, Joseph Stein crypt...@gmail.com wrote:
 Hey folks, I am going to start prototyping our media tier using Cassandra as
 a file system (meaning: upload video/audio/images to a web server, save them in
 Cassandra, and then stream them out).
 Has anyone done this before?
 I was thinking brisk's CassandraFS might be a fantastic implementation for
 this but then I feel that I need to run another/different Cassandra cluster
 outside of what our ops folks do with Apache Cassandra 0.8.X
 Am I best to just compress files uploaded to the web server and then start
 chunking and saving chunks in rows and columns so the mem issue does not
 smack me in the face?  And use our existing cluster and build it out
 accordingly?
 I am sure our ops people would like the command line aspect of CassandraFS
 but looking for something that makes the most sense all around.
 It seems to me there is a REALLY great thing in CassandraFS and would love
 to see it as part of 1.0 =8^)  or at a minimum some streamlined
 implementation to-do the same thing.
 If comparing to HDFS that is part of Hadoop project even though Cloudera has
 a distribution of Hadoop :) maybe that can work here too _fingers_crosed_
 (or mongodb-gridfs)
 happy to help as I am moving down this road in general
 Thanks!

 /*
 Joe Stein
 http://www.linkedin.com/in/charmalloc
 Twitter: @allthingshadoop
 */




 --
 David Strauss
    | da...@davidstrauss.net
    | +1 512 577 5827 [mobile]





-- 
David Strauss
   | da...@davidstrauss.net
   | +1 512 577 5827 [mobile]