Re: OutOfMemory on count on cassandra 0.6.8 for large number of columns

2010-12-12 Thread Dave Martin
Thanks Tyler. I was unaware of counters.

The use case for column counts is really from an operational perspective:
to allow a sysadmin to do ad hoc checks on columns to see if something
has gone wrong in software outside of Cassandra.

I think a cassandra-cli command such as count that makes Cassandra fall
over is not ideal, unless we can say that for X number of columns
Cassandra needs at least Y memory allocation for stability.

Cheers

Dave


On Sun, Dec 12, 2010 at 6:39 PM, Tyler Hobbs ty...@riptano.com wrote:
 Cassandra has to deserialize all of the columns in the row for get_count().
 So from Cassandra's perspective, it's almost as much work as getting the
 entire row; it just doesn't have to send everything back over the network.

 If you're frequently counting 8 million columns (or really, anything
 significant), you need to use counters instead.  If this is a rare
 occurrence, you can do the count in multiple chunks by using a starting and
 ending column in the SlicePredicate for each chunk, but this requires some
 rough knowledge about the distribution of the column names in the row.

 - Tyler
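
For illustration, here is a rough sketch of counting a huge row in chunks
over the 0.6-era Thrift API. Instead of the precomputed start/end chunks
Tyler describes, it pages by the last column seen, so it needs no knowledge
of the column-name distribution. Method signatures are from memory and
approximate, and pageSize is an arbitrary choice:

    import org.apache.cassandra.thrift.{Cassandra, ColumnParent,
      SlicePredicate, SliceRange, ConsistencyLevel}
    import scala.collection.JavaConversions._

    def countInChunks(client: Cassandra.Client, keyspace: String, key: String,
                      cf: String, pageSize: Int = 10000): Long = {
      val parent = new ColumnParent(cf)
      var start = Array.empty[Byte]   // an empty name means "start of row"
      var total = 0L
      var done = false
      while (!done) {
        val range = new SliceRange(start, Array.empty[Byte], false, pageSize)
        val pred = new SlicePredicate().setSlice_range(range)
        val cols = client.get_slice(keyspace, key, parent, pred, ConsistencyLevel.ONE)
        // after the first page, the start column is returned again, so discount it
        total += (if (start.isEmpty) cols.size else cols.size - 1)
        if (cols.size < pageSize) done = true
        else start = cols.last.column.name   // resume from the last column seen
      }
      total
    }

Each call deserializes only pageSize columns on the server, which is what
keeps the heap bounded compared with a single get_count over the whole row.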


Re: N to N relationships

2010-12-12 Thread David Boxenhorn
You want to store every value twice? That would be a pain to maintain, and
possibly lead to inconsistent data.

On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey n...@riptano.com wrote:

 I would also recommend two column families. Storing the key as NxN would
 require you to hit multiple machines to query for an entire row or column
 with RandomPartitioner. Even with OPP, you would need to pick either rows or
 columns to order by, and the other would require hitting multiple machines.
 Using two column families avoids this and avoids any problems with choosing OPP.


 On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton aa...@thelastpickle.com wrote:

 I'm assuming you have one matrix and you know the dimensions. Also, as you
 say, the most important queries are to get an entire column or an entire row.

 I would consider using a standard CF for the Columns and one for the Rows.
 The key for each would be the col / row number, each cassandra column name
 would be the id of the other dimension, and the value whatever you want.

 - when storing the data, update both the Column and Row CF
 - reading a whole row/col would simply be a read from the appropriate CF.
 - reading an intersection is a get_slice to either the col or row CF, using
 the column_names field to identify the other dimension.

 You would not need secondary indexes to serve these queries.

 Hope that helps.
 Aaron
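
To make the dual-write scheme concrete, here is a minimal sketch (not the
poster's code) against the raw 0.6 Thrift API, signatures approximate. Every
cell (row i, col j) is written twice, once per dimension, so a whole matrix
row or column becomes a single-key read. The CF names MatrixRows/MatrixCols
and the use of QUORUM are assumptions for the example:

    import org.apache.cassandra.thrift.{Cassandra, ColumnPath, ConsistencyLevel}

    def putCell(client: Cassandra.Client, keyspace: String,
                i: Int, j: Int, value: Array[Byte]): Unit = {
      val ts = System.currentTimeMillis
      // Row CF: key = row number, column name = column number
      val rowPath = new ColumnPath("MatrixRows").setColumn(j.toString.getBytes)
      client.insert(keyspace, i.toString, rowPath, value, ts, ConsistencyLevel.QUORUM)
      // Column CF: key = column number, column name = row number
      val colPath = new ColumnPath("MatrixCols").setColumn(i.toString.getBytes)
      client.insert(keyspace, j.toString, colPath, value, ts, ConsistencyLevel.QUORUM)
    }

The cost of this design is David's objection above: every value exists twice,
and the two writes are not atomic, so the application has to tolerate or
repair brief disagreement between the two CFs.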

 On 10 Dec 2010, at 07:02 AM, Sébastien Druon sdr...@spotuse.com wrote:

 I mean if I have secondary indexes. Apparently they are calculated in the
 background...

 On 9 December 2010 18:33, David Boxenhorn da...@lookin2.com wrote:

 What do you mean by indexing?


 On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon sdr...@spotuse.com wrote:

 Thanks a lot for the answer

 What about the indexing when adding a new element? Is it incremental?

 Thanks again



 On 9 December 2010 14:38, David Boxenhorn da...@lookin2.com wrote:

 How about a regular CF where keys are n@n ?

 Then, getting a matrix row would be the same cost as getting a matrix
 column (N gets), and it would be very easy to add element N+1.



 On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon sdr...@spotuse.com wrote:

 Hello,

 For a specific case, we are thinking about representing an N to N
 relationship with an NxN matrix in Cassandra.
 The relations will exist only between a subset of elements, so the matrix
 will mostly contain empty elements.

 We have a set of questions concerning this:
 - what is the best way to represent this matrix? what would have the
 best performance in reading? in writing?
   . a super column family with n column families, with n columns each
   . a column family with n columns and n lines

 In the second case, we would need to extract 2 kinds of information:
 - all the relations for a line: this should pose no specific problem;
 - all the relations for a column: in that case we would need an index
 for the columns, right? And then get all the lines where the value of the
 column in question is not null... is that the correct way to do it?
 When using indexes, say we want to add another element N+1. What
 impact in terms of time would it have on the indexation job?

 Thanks a lot for the answers,

 Best regards,

 Sébastien Druon









Quorum and Datacenter loss

2010-12-12 Thread Jonathan Colby
Hi cassandra experts -

We're planning a cassandra cluster across 2 datacenters
(datacenter-aware, random partitioning) with QUORUM consistency.

It seems to me that with 2 datacenters, if one datacenter is lost, the
reads/writes to cassandra will fail in the surviving datacenter
because of the N/2 + 1 distribution of replicas. In other words, you
need more than half of the replicas to respond, but in the case of a
datacenter loss you would only ever get 1/2 to respond at best.

Is my logic wrong here?  Is there a way to ensure the nodes in the
alive datacenter respond successfully if the second datacenter is
lost?  Anyone have experience with this kind of problem?

Thanks.


Re: Quorum and Datacenter loss

2010-12-12 Thread Peter Schuller
 Is my logic wrong here?  Is there a way to ensure the nodes in the
 alive datacenter respond successfully if the second datacenter is
 lost?  Anyone have experience with this kind of problem?

It's impossible to achieve both consistency and availability at the
same time. See:

   http://en.wikipedia.org/wiki/CAP_theorem

-- 
/ Peter Schuller


Re: Quorum and Datacenter loss

2010-12-12 Thread Peter Schuller
 Is my logic wrong here?  Is there a way to ensure the nodes in the
 alive datacenter respond successfully if the second datacenter is
 lost?  Anyone have experience with this kind of problem?

 It's impossible to achieve both consistency and availability at the
 same time. See:

(Assuming partition tolerance)

Anyway, to expand a bit: the final consequence is that if you have a
cluster that really does need QUORUM consistency, you won't be able to
survive (in terms of availability, i.e., the cluster serving your
traffic) a datacenter going down. If you want to continue operating in
the case of a partition, you (1) cannot use QUORUM and (2) your
application must be designed to work with, and survive seeing,
inconsistent data.

-- 
/ Peter Schuller
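
A toy check of the arithmetic in this thread (a sketch; the replica counts
per DC are parameters, not anything prescribed in the posts): QUORUM needs
floor(RF/2) + 1 replicas to respond, and the question is whether a quorum
survives losing any single datacenter. With only two datacenters the answer
is no for every split, because the larger (or equal) DC always holds at
least half of the replicas:

    def quorum(rf: Int): Int = rf / 2 + 1

    def quorumSurvivesAnyDcLoss(replicasPerDc: Seq[Int]): Boolean = {
      val rf = replicasPerDc.sum
      // after losing any one DC, do the remaining replicas still form a quorum?
      replicasPerDc.forall(lost => rf - lost >= quorum(rf))
    }

    // quorumSurvivesAnyDcLoss(Seq(2, 2))    == false  (RF=4: 4-2=2 < 3)
    // quorumSurvivesAnyDcLoss(Seq(2, 1))    == false  (RF=3: losing the 2-replica DC leaves 1 < 2)
    // quorumSurvivesAnyDcLoss(Seq(2, 2, 2)) == true   (RF=6: 6-2=4 >= 4)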


Unsubscribe

2010-12-12 Thread Colin
Unsubscribe

Please

Sent from my iPad

On Dec 12, 2010, at 1:26 AM, Dave Martin moyesys...@googlemail.com wrote:

 Hi there,
 
 I see the following:
 
 1) Add 8,000,000 columns to a single row. Each column name is a UUID.
 2) Use cassandra-cli to run count keyspace.cf['myGUID']
 
 The following is reported in the logs:
 
 ERROR [DroppedMessagesLogger] 2010-12-12 18:17:36,046 CassandraDaemon.java 
 (line 87) Uncaught exception in thread Thread[DroppedMessagesLogger,5,main]
 java.lang.OutOfMemoryError: Java heap space
 ERROR [pool-1-thread-2] 2010-12-12 18:17:36,046 Cassandra.java (line 1407) 
 Internal error processing get_count
 java.lang.OutOfMemoryError: Java heap space
 
 and Cassandra falls over. I see the same behaviour with 0.6.6.
 
 Increasing the memory allocation with the -Xmx and -Xms args to 4GB allows the 
 count to return in this particular example (i.e. no OutOfMemory is thrown).
 
 Here's the Scala code that was run to load the columns, which uses the AKKA 
 persistence API:
 
 object ColumnTest {
   def main(args: Array[String]): Unit = {
     println("Super column test starting")
     val hosts = Array("localhost")
     val sessions = new CassandraSessionPool("occurrence",
       StackPool(SocketProvider("localhost", 9160)),
       Protocol.Binary, ConsistencyLevel.ONE)
     val session = sessions.newSession
     loadRow("myGUID", 8000000, session)
     session.close
   }
 
   def loadRow(key: String, noOfColumns: Int, session: CassandraSession) {
     print("loading: " + key + ", with columns: " + noOfColumns)
     val start = System.currentTimeMillis
     val rawPath = new ColumnPath("dr")
     for (i <- 0 until noOfColumns) {
       val recordUuid = UUID.randomUUID.toString
       session ++| (key, rawPath.setColumn(recordUuid.getBytes),
         "1".getBytes, System.currentTimeMillis)
       session.flush
     }
     val finish = System.currentTimeMillis
     print(", Time taken (secs): " + ((finish - start) / 1000) + " seconds.\n")
   }
 }
 
 Heres the configuration used:
 
 # Arguments to pass to the JVM
 JVM_OPTS=" \
     -ea \
     -Xms1G \
     -Xmx2G \
     -XX:+UseParNewGC \
     -XX:+UseConcMarkSweepGC \
     -XX:+CMSParallelRemarkEnabled \
     -XX:SurvivorRatio=8 \
     -XX:MaxTenuringThreshold=1 \
     -XX:CMSInitiatingOccupancyFraction=75 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -XX:+HeapDumpOnOutOfMemoryError \
     -Dcom.sun.management.jmxremote.port=8080 \
     -Dcom.sun.management.jmxremote.ssl=false \
     -Dcom.sun.management.jmxremote.authenticate=false"
 
 Admittedly the resource allocation is small, but I wondered if there should 
 be some configuration guidelines (e.g. memory allocation vs number of columns 
 supported).
 
 I'm running this on my MBP with a single node, and java is as follows:
 
 $ java -version
 java version "1.6.0_22"
 Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
 Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
 
 Here's the CF definition:
 
    <Keyspace Name="occurrence">
      <ColumnFamily Name="dr"
                    CompareWith="UTF8Type"
                    Comment="The column family for dataset tracking"/>
 
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
      <ReplicationFactor>1</ReplicationFactor>
 
      <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
 
 Apologies in advance if this is a known issue or a known limitation of 0.6.x.
 I had wondered if I was hitting the 2GB row limit for the 0.6.x releases, but
 8 million columns is only roughly 300MB in this particular case.
 I guess it may also be a result of the limitations of Thrift (i.e. no
 streaming capabilities).
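 
 (Back-of-the-envelope for that estimate, with the assumption that the
 serialized size is dominated by the column names:
 
     val columns = 8000000L
     val uuidNameBytes = 36L   // a UUID rendered as a 36-char string
     val mb = columns * uuidNameBytes / (1024 * 1024)   // ~274 MB in names alone
 
 before timestamps, values, and per-column overhead; once deserialized, the
 per-column Java object overhead inflates that considerably in the heap.)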
 
 Any thoughts appreciated,
 
 Dave
 
 
 
 
 
 
 
 


Re: Unsubscribe

2010-12-12 Thread Peter Schuller
 Unsubscribe

http://wiki.apache.org/cassandra/FAQ#unsubscribe


-- 
/ Peter Schuller


Re: Quorum and Datacenter loss

2010-12-12 Thread Jonathan Colby
Thanks a lot Peter. So basically we would need to choose a
consistency other than QUORUM. I think in our case consistency is
not necessarily an issue, since our data is write-once, read-many
(immutable data). I suppose having a replication factor of 4 would
result in two nodes in each datacenter having a copy of the data. If
there's a flaw in my logic, please let me know : ]

On Sun, Dec 12, 2010 at 2:04 PM, Peter Schuller
peter.schul...@infidyne.com wrote:
 It's impossible to achieve both consistency and availability at the
 same time. [...]

 --
 / Peter Schuller



Re: Memory leak with Sun Java 1.6 ?

2010-12-12 Thread Timo Nentwig

On Dec 10, 2010, at 19:37, Peter Schuller wrote:

 To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17
 in lenny or some such.) If it is a JVM issue, ensuring you're using a
 reasonably recent JVM is probably much easier than to start tracking
 it down...

I had OOM problems with OpenJDK, switched to Sun/Oracle's recent 1.6.0_23, 
and... still have the same problem :-\ The stack trace always looks the same:

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
at 
org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:261)
at 
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
at 
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
at 
org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
at 
org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
at 
org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
at 
org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
at 
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

I'm writing from 1 client with 50 threads to a cluster of 4 machines (with 
hector). With both QUORUM and ONE, 2 machines will quite reliably die with 
OOM soon. What may cause this? Won't cassandra block/reject writes when a 
memtable is full and being flushed to disk, rather than keep growing and, 
if flushing to disk isn't fast enough, run out of memory?

Re: Memory leak with Sun Java 1.6 ?

2010-12-12 Thread Jonathan Ellis
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
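
For 0.6.x, the memtable sizing knobs that bound how much un-flushed write
data sits in the heap live in storage-conf.xml. A hedged sketch (the values
are illustrative, and defaults vary by release):

    <MemtableThroughputInMB>64</MemtableThroughputInMB>
    <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
    <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>

Writes are not blocked while a memtable flushes, so if flush I/O cannot keep
up with the write rate, pending memtables queue up in the heap, which matches
the symptom Timo describes.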

On Sun, Dec 12, 2010 at 9:52 AM, Timo Nentwig timo.nent...@toptarif.de wrote:


 On Dec 10, 2010, at 19:37, Peter Schuller wrote:

  To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17
  in lenny or some such.) If it is a JVM issue, ensuring you're using a
  reasonably recent JVM is probably much easier than to start tracking
  it down...

 I had OOM problems with OpenJDK, switched to Sun/Oracle's recent 1.6.0_23
 and...still have the same problem :-\ Stack trace always looks the same:

 java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
at
 org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:261)
at
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
at
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
at
 org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
at
 org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
at
 org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
at
 org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
at
 org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
at
 org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

 I'm writing from 1 client with 50 threads to a cluster of 4 machines (with
 hector). With QUORUM and ONE 2 machines quite reliably will soon die with
 OOM. What may cause this? Won't cassandra block/reject when memtable is full
 and being flushed to disk but grow and if flushing to disk isn't fast enough
 will run out of memory?




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Dynamic Snitch / Read Path Questions

2010-12-12 Thread Daniel Doubleday

Hi again.

It would be great if someone could comment on whether the following is 
true or not.
I tried to understand the consequences of using 
-Dcassandra.dynamic_snitch=true for the read path, and here is what I 
came up with:


1) If using CL > 1, then using the dynamic snitch will result in a data 
read from the node with the lowest latency (slightly simplified), even if 
the proxy node contains the data but has a higher latency than other 
possible nodes. This means that it is not necessary to do load-based 
balancing on the client side.


2) If using CL = 1, then the proxy node will always return the data itself, 
even when there is another node with less load.


3) Digest requests will be sent to all other living peer nodes for that 
key, and will result in a data read on all nodes to calculate the digest. 
The only difference is that the data is not sent back, but IO-wise it is 
just as expensive.



The next one goes a little further:

We read / write with quorum / rf = 3.

It seems to me that it wouldn't be hard to patch the StorageProxy to 
send only one read request and one digest request. Only if one of the 
requests fails would we have to query the remaining node. We don't need 
read repair because we have to repair once a week anyway, and quorum 
guarantees consistency. This way we could reduce read load significantly, 
which should compensate for the latency increase from failing reads. Am I 
missing something?



Best,
Daniel





Re: Quorum and Datacenter loss

2010-12-12 Thread Peter Schuller
 Thanks a lot Peter.   So basically we would need to choose a
 consistency other than QUORUM.    I think in our case consistency is
 not necessarily an issue since our data is write-once, read-many
 (immutable data).   I suppose having a replication factor of 4 would
 result in two nodes in each datacenter having a copy of the data.   If
 there's a flaw in my logic, please let me know : ]

It would, but note that if you're writing at consistency level ONE,
only a single copy of the data is required to exist before your write
is ACK:ed back to the client (but it will still be replicated).

-- 
/ Peter Schuller


iterate over all the rows with RP

2010-12-12 Thread shimi
Is the same connection required when iterating over all the rows with the
RandomPartitioner, or is it possible to use a different connection for each
iteration?

Shimi


Re: iterate over all the rows with RP

2010-12-12 Thread Peter Schuller
 Is the same connection required when iterating over all the rows with the
 RandomPartitioner, or is it possible to use a different connection for each
 iteration?

In general, the choice of RPC connection (I assume you mean the
underlying thrift connection) does not affect the semantics of the RPC
calls.

-- 
/ Peter Schuller


Re: iterate over all the rows with RP

2010-12-12 Thread shimi
So if I use a different connection (thrift via Hector), will I get the
same results? It makes sense when you use OPP, and I assume it is the same
with RP. I just wanted to make sure this is the case and that there is no
state which is kept.

Shimi

On Sun, Dec 12, 2010 at 8:14 PM, Peter Schuller peter.schul...@infidyne.com
 wrote:

  Is the same connection required when iterating over all the rows with
  the RandomPartitioner, or is it possible to use a different connection
  for each iteration?

 In general, the choice of RPC connection (I assume you mean the
 underlying thrift connection) does not affect the semantics of the RPC
 calls.

 --
 / Peter Schuller
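
For illustration, a rough sketch of iterating all rows under the
RandomPartitioner with the 0.6-era Thrift API (signatures from memory, so
treat them as approximate). The only paging state is the last key seen,
which the client passes back as the next start_key, so every page can go
over a brand-new connection:

    import org.apache.cassandra.thrift.{Cassandra, ColumnParent, KeyRange,
      SlicePredicate, SliceRange, ConsistencyLevel}
    import scala.collection.JavaConversions._

    def foreachRow(connect: () => Cassandra.Client, keyspace: String,
                   cf: String, pageSize: Int = 1000)(f: String => Unit) {
      val parent = new ColumnParent(cf)
      val pred = new SlicePredicate().setSlice_range(
        new SliceRange(Array.empty[Byte], Array.empty[Byte], false, 1))
      var start = ""
      var done = false
      while (!done) {
        val client = connect()   // a fresh connection per page works fine
        val range = new KeyRange(pageSize).setStart_key(start).setEnd_key("")
        val rows = client.get_range_slices(keyspace, parent, pred, range,
                                           ConsistencyLevel.ONE)
        // start_key is inclusive, so drop the repeated first row on later pages
        val fresh = if (start.isEmpty) rows.toList else rows.toList.drop(1)
        fresh.foreach(ks => f(ks.key))
        if (rows.size < pageSize) done = true else start = rows.last.key
      }
    }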



Re: N to N relationships

2010-12-12 Thread Edward Capriolo
On Sun, Dec 12, 2010 at 3:20 AM, David Boxenhorn da...@lookin2.com wrote:
 You want to store every value twice? That would be a pain to maintain, and
 possibly lead to inconsistent data.
 [...]

Before secondary indexes, the only option was to store the data twice.
Yes, you have to maintain this yourself. The data model only provides
fast searches on the key. An index is normally a separate entity with a
different ordering; it is almost the same here.


Re: OutOfMemory on count on cassandra 0.6.8 for large number of columns

2010-12-12 Thread Tyler Hobbs
Well, in this case I would say you probably need about 300MB of space in the
heap, since that's what you've calculated.

The APIs are designed to let you do what you think is best, and they
definitely won't stop you from shooting yourself in the foot.  Counting a
huge row, or trying to grab every row in a large column family, are examples
of this.  Some of the clients try to protect you from these, but there is
only so much that can be done without specific knowledge of the data, and
get_count() is a case in point.

While we're on the topic of large rows, if your row is essentially unbounded
in size, you need to consider splitting it. This is especially true if you
stay with 0.6, where compactions of large rows can OOM you pretty easily.

- Tyler
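
One common way to split an essentially unbounded row (a sketch of the idea,
not something prescribed in this thread) is to shard it into a fixed number
of bucket rows by hashing the column name, so no single row grows without
bound; numBuckets below is an arbitrary assumption:

    val numBuckets = 16   // pick based on expected row growth

    def bucketKey(baseKey: String, columnName: String): String = {
      val bucket = (columnName.hashCode & 0x7fffffff) % numBuckets
      baseKey + ":" + bucket   // e.g. "myGUID:5"
    }

    // insert under bucketKey("myGUID", recordUuid) instead of "myGUID";
    // a count becomes the sum of counts over "myGUID:0" .. "myGUID:15"

Single-column reads still hash straight to the right bucket; only full-row
scans and counts have to visit all the buckets.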

On Sun, Dec 12, 2010 at 2:07 AM, Dave Martin moyesys...@googlemail.com wrote:

 Thanks Tyler. I was unaware of counters.
 [...]



Re: iterate over all the rows with RP

2010-12-12 Thread Ran Tavory
This should be the case, yes: semantics aren't affected by the
connection, and no state is kept. What might happen is that if you
read/write with low consistency levels, then when you hit a different
host on the ring it might have an inconsistent state in case of a
partition.

On Sunday, December 12, 2010, shimi shim...@gmail.com wrote:
 So if I use a different connection (thrift via Hector), will I get the
 same results? [...]




-- 
/Ran