Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Giving the private IP to rpc_address gives the same exception, and keeping it blank while giving the public IP to listen_address also fails. I tried keeping both blank and ran telnet on port 7000; I get the following output:
 
[root@ip-10-166-223-150 bin]# telnet 122.248.193.37 7000
Trying 122.248.193.37...
Connected to 122.248.193.37.
Escape character is '^]'.
 
Similarly, from another machine:
 
[root@ip-10-136-75-201 bin]# telnet 184.72.22.87 7000
Trying 184.72.22.87...
Connected to 184.72.22.87.
Escape character is '^]'.
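For scripting this check across several nodes, here is a minimal sketch (a hypothetical helper, not part of Cassandra or its tooling) that tests TCP reachability the same way the telnet sessions above do:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Same check as `telnet 122.248.193.37 7000`, but usable in a loop over nodes:
# can_connect("122.248.193.37", 7000)
```

Note that a successful connect only proves the storage (gossip) port is reachable; it says nothing about whether the node could bind its own listen_address at startup.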
 
-Dave Viner wrote: -

To: user@cassandra.apache.org
From: Dave Viner
Date: 02/24/2011 11:59AM
Cc: Himanshi Sharma
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

Try using the private ipv4 address in the rpc_address field, and the public ipv4 (NOT the elastic ip) in the listen_address.

If that fails, go back to rpc_address empty, and start up cassandra. 

Then from the other node, please telnet to port 7000 on the first node.  And show the output of that session in your reply. 

I haven't actually constructed a cross-region cluster nor have I used v0.7, but this really sounds like it should be easy. 
On Wed, Feb 23, 2011 at 10:22 PM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote: 

Hi Dave, 
  
I tried with the public IPs. If I put the public IP in the rpc_address field, Cassandra gives the same exception; if I leave it blank then Cassandra runs, but nodetool with the ring option still doesn't show the node in the other region.
  
Thanks, 
Himanshi

-Dave Viner wrote: -


From: Dave Viner < davevi...@gmail.com >
To: user@cassandra.apache.org
Date: 02/24/2011 10:43AM


Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

That looks like it's not an issue of communicating between nodes.  It appears that the node can not bind to the address on the localhost that you're asking for.

" java.net.BindException: Cannot assign requested address  " 

I think the issue is that the Elastic IP address is not actually an IP address that's on the localhost, so the daemon can not bind to that IP.  Instead of using the EIP, use the local IP address for the rpc_address (I think that's what you need, since that is what Thrift binds to).  Then the listen_address should be the IP address that is routable from the other node.  I would first try with the actual public IP address (not the Elastic IP).  Once you get that to work, shut down the cluster, change the listen_address to the EIP, boot up and try again.

Dave Viner 

On Wed, Feb 23, 2011 at 8:54 PM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote: 
Hey Dave,

Sorry, I forgot to mention the non-seed configuration.

For the first node, in us-west, it is as below, i.e. its own elastic IP:

listen_address: 50.18.60.117
rpc_address: 50.18.60.117

and for the second node, in ap-southeast-1, again its own elastic IP:

listen_address: 175.41.143.192
rpc_address: 175.41.143.192

Thanks,
Himanshi



From: Dave Viner < davevi...@gmail.com >
To: user@cassandra.apache.org
Date: 02/23/2011 11:01 PM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

internal EC2 ips (10.xxx.xxx.xxx) work across availability zones (e.g., from us-east-1a to us-east-1b) but do not work across regions (e.g., us-east to us-west).  To do regions, you must use the public ip address assigned by amazon.

Himanshi, when you log into one node and telnet to port 7000 on the other node, which IP address did you use - the 10.x address or the public ip address? And what is the seed/non-seed configuration in both cassandra.yaml files?

Dave Viner

On Wed, Feb 23, 2011 at 8:12 AM, Frank LoVecchio < fr...@isidorey.com > wrote:
The internal Amazon IP address is what you will want to use so you don't have to go through DNS anyway; not sure if this works from US-East to US-West, but it does make things quicker between zones, e.g. us-east-1a to us-east-1b.

On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner < davevi...@gmail.com > wrote:
Try using the IP address, not the DNS name, in the cassandra.yaml. If you can telnet from one to the other on port 7000, and both nodes have the other node in their config, it should work.

Dave Viner

On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote:
Yes they do. I have specified the Public DNS in the seed field of each node in cassandra.yaml... not able to figure out what the problem is.



From: Sasha Dolgy < sdo...@gmail.com >
To: user@cassandra.apache.org
Date: 02/23/2011 02:56 PM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

did you define the other host in the cassandra.yaml? On both servers, they need to know about each other.

On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote:
Thanks Dave, but I am able to telnet to the other instances on port 7000, and when I run ./nodetool --host ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring, I can see only one node.

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Dave Viner
Try using the private ipv4 address in the rpc_address field, and the public
ipv4 (NOT the elastic ip) in the listen_address.

If that fails, go back to rpc_address empty, and start up cassandra.

Then from the other node, please telnet to port 7000 on the first node.  And
show the output of that session in your reply.

I haven't actually constructed a cross-region cluster nor have I used v0.7,
but this really sounds like it should be easy.
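To make that suggestion concrete, a rough cassandra.yaml sketch for one node (all addresses here are documentation placeholders, not real nodes; the field names are the 0.7 ones shown elsewhere in this thread):

```yaml
# Illustrative cassandra.yaml fragment for a cross-region 0.7 node.
# 203.0.113.10, 10.0.0.10 and 198.51.100.20 are placeholder addresses.
listen_address: 203.0.113.10   # the node's public IPv4 (not its Elastic IP),
                               # so gossip is routable from the other region
rpc_address: 10.0.0.10         # private IPv4 for Thrift clients, or leave
                               # empty to fall back to the hostname's address
seeds:
    - 198.51.100.20            # public IPv4 of a seed node in the other region
```

The key point is that listen_address must be an address the host actually owns, which an Elastic IP is not.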

On Wed, Feb 23, 2011 at 10:22 PM, Himanshi Sharma
wrote:

> Hi Dave,
>
> I tried with the public IPs. If I put the public IP in the rpc_address
> field, Cassandra gives the same exception; if I leave it blank then
> Cassandra runs, but nodetool with the ring option still doesn't
> show the node in the other region.
>
> Thanks,
> Himanshi
>
>
> -Dave Viner wrote: -
>
> To: user@cassandra.apache.org
> From: Dave Viner 
> Date: 02/24/2011 10:43AM
>
> Subject: Re: Cassandra nodes on EC2 in two different regions not
> communicating
>
> That looks like it's not an issue of communicating between nodes.  It
> appears that the node can not bind to the address on the localhost that
> you're asking for.
>
> " java.net.BindException: Cannot assign requested address  "
>
> I think the issue is that the Elastic IP address is not actually an IP
> address that's on the localhost, so the daemon can not bind to that IP.
> Instead of using the EIP, use the local IP address for the rpc_address (I
> think that's what you need, since that is what Thrift binds to).  Then
> the listen_address should be the IP address that is routable from the
> other node.  I would first try with the actual public IP address (not the
> Elastic IP).  Once you get that to work, shut down the cluster, change
> the listen_address to the EIP, boot up and try again.
>
> Dave Viner
>
>
> On Wed, Feb 23, 2011 at 8:54 PM, Himanshi Sharma < himanshi.sha...@tcs.com
> > wrote:
>
>>
>> Hey Dave,
>>
>> Sorry i forgot to mention the Non-seed configuration.
>>
>> for the first node, in us-west, it is as below, i.e. its own elastic IP
>>
>> listen_address: 50.18.60.117
>> rpc_address: 50.18.60.117
>>
>> and for the second node, in ap-southeast-1, it is as below, i.e. again its own
>> elastic IP
>>
>> listen_address: 175.41.143.192
>> rpc_address: 175.41.143.192
>>
>> Thanks,
>> Himanshi
>>
>>
>>
>>
>>
>> From: Dave Viner < davevi...@gmail.com >
>> To: user@cassandra.apache.org
>> Date: 02/23/2011 11:01 PM
>> Subject: Re: Cassandra nodes on EC2 in two different regions not communicating
>>
>>
>>
>> internal EC2 ips (10.xxx.xxx.xxx) work across availability zones (e.g.,
>> from us-east-1a to us-east-1b) but do not work across regions (e.g., us-east
>> to us-west).  To do regions, you must use the public ip address assigned by
>> amazon.
>>
>> Himanshi, when you log into 1 node, and telnet to port 7000 on the other
>> node, which IP address did you use - the 10.x address or the public ip
>> address?
>> And what is the seed/non-seed configuration in both cassandra.yaml files?
>>
>> Dave Viner
>>
>>
>> On Wed, Feb 23, 2011 at 8:12 AM, Frank LoVecchio < fr...@isidorey.com >
>> wrote:
>> The internal Amazon IP address is what you will want to use so you don't
>> have to go through DNS anyways; not sure if this works from US-East to
>> US-West, but it does make things quicker in between zones, e.g. us-east-1a
>> to us-east-1b.
>>
>>
>> On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner < davevi...@gmail.com >
>> wrote:
>> Try using the IP address, not the dns name in the cassandra.yaml.
>>
>> If you can telnet from one to the other on port 7000, and both nodes have
>> the other node in their config, it should work.
>>
>> Dave Viner
>>
>>
>> On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma < himanshi.sha...@tcs.com >
>> wrote:
>>
>> Yes they do. I have specified the Public DNS in the seed field of each node in
>> cassandra.yaml... not able to figure out what the problem is.
>>
>>
>>
>> From: Sasha Dolgy < sdo...@gmail.com >
>> To: user@cassandra.apache.org
>> Date: 02/23/2011 02:56 PM
>> Subject: Re: Cassandra nodes on EC2 in two different regions not communicating
>>
>>
>>
>> did you define the other host in the cassandra.yaml ?  on both servers
>>  they need to know about each other
>>
>> On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma < himanshi.sha...@tcs.com >
>> wrote:
>>
>> Thanks Dave but I am able to telnet to other instances on port 7000
>> and when I run ./nodetool --host
>> ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring... I
>> can see only one node.
>>
>> Do we need to configure anything else in Cassandra.yaml or
>> Cassandra-env.sh ???
>>
>>
>>
>>
>>
>> From: Dave Viner < davevi...@gmail.com >
>> To: user@cassandra.apache.org
>> Cc: Himanshi Sharma < himanshi.sha...@tcs.com >
>> Date: 02/23/2011 11:36 AM
>> Subject: Re: C

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Hi Dave,
 
I tried with the public IPs. If I put the public IP in the rpc_address field, Cassandra gives the same exception; if I leave it blank then Cassandra runs, but nodetool with the ring option still doesn't show the node in the other region.
 
Thanks, 
Himanshi

-Dave Viner wrote: -

From: Dave Viner
To: user@cassandra.apache.org
Date: 02/24/2011 10:43AM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

That looks like it's not an issue of communicating between nodes.  It appears that the node can not bind to the address on the localhost that you're asking for.

" java.net.BindException: Cannot assign requested address  " 

I think the issue is that the Elastic IP address is not actually an IP address that's on the localhost, so the daemon can not bind to that IP.  Instead of using the EIP, use the local IP address for the rpc_address (I think that's what you need, since that is what Thrift binds to).  Then the listen_address should be the IP address that is routable from the other node.  I would first try with the actual public IP address (not the Elastic IP).  Once you get that to work, shut down the cluster, change the listen_address to the EIP, boot up and try again.

Dave Viner 

On Wed, Feb 23, 2011 at 8:54 PM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote: 
Hey Dave,

Sorry, I forgot to mention the non-seed configuration.

For the first node, in us-west, it is as below, i.e. its own elastic IP:

listen_address: 50.18.60.117
rpc_address: 50.18.60.117

and for the second node, in ap-southeast-1, again its own elastic IP:

listen_address: 175.41.143.192
rpc_address: 175.41.143.192

Thanks,
Himanshi



From: Dave Viner < davevi...@gmail.com >
To: user@cassandra.apache.org
Date: 02/23/2011 11:01 PM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

internal EC2 ips (10.xxx.xxx.xxx) work across availability zones (e.g., from us-east-1a to us-east-1b) but do not work across regions (e.g., us-east to us-west).  To do regions, you must use the public ip address assigned by amazon.

Himanshi, when you log into one node and telnet to port 7000 on the other node, which IP address did you use - the 10.x address or the public ip address? And what is the seed/non-seed configuration in both cassandra.yaml files?

Dave Viner

On Wed, Feb 23, 2011 at 8:12 AM, Frank LoVecchio < fr...@isidorey.com > wrote:
The internal Amazon IP address is what you will want to use so you don't have to go through DNS anyway; not sure if this works from US-East to US-West, but it does make things quicker between zones, e.g. us-east-1a to us-east-1b.

On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner < davevi...@gmail.com > wrote:
Try using the IP address, not the DNS name, in the cassandra.yaml. If you can telnet from one to the other on port 7000, and both nodes have the other node in their config, it should work.

Dave Viner

On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote:
Yes they do. I have specified the Public DNS in the seed field of each node in cassandra.yaml... not able to figure out what the problem is.



From: Sasha Dolgy < sdo...@gmail.com >
To: user@cassandra.apache.org
Date: 02/23/2011 02:56 PM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

did you define the other host in the cassandra.yaml? On both servers, they need to know about each other.

On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote:
Thanks Dave, but I am able to telnet to the other instances on port 7000, and when I run ./nodetool --host ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring, I can see only one node.

Do we need to configure anything else in cassandra.yaml or cassandra-env.sh?



From: Dave Viner < davevi...@gmail.com >
To: user@cassandra.apache.org
Cc: Himanshi Sharma < himanshi.sha...@tcs.com >
Date: 02/23/2011 11:36 AM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating

If you login to one of the nodes, can you telnet to port 7000 on the other node?

If not, then almost certainly it's a firewall/Security Group issue.

You can find out the security groups for any node by logging in and then running:

% curl "http://169.254.169.254/latest/meta-data/security-groups"

Assuming that both nodes are in the same security group, ensure that the SG is configured to allow other members of the SG to communicate on port 7000 with each other.

HTH,
Dave Viner

On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma < himanshi.sha...@tcs.com > wrote:
Hi,

I am new to Cassandra. I'm running Cassandra on EC2. I configured a Cassandra cluster on two instances in different regions. But when I try the nodetool command with the ring option, I get only a single node.

How do I make these two nodes communicate with each other? I have already opened the required ports, i.e. 7000, 8080, 9160, in the respective security groups. Please help me with this.

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
>>c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
that was written to node1 will be returned.

>>In this case - N1 will be identified as a discrepancy and the change will
be discarded via read repair

[Naren] How will Cassandra know this is a discrepancy?
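For what it's worth, the mechanical answer is the column timestamp: every column carries a client-supplied timestamp, and when a quorum read sees replicas disagree, the coordinator keeps the highest-timestamped version and writes it back to the stale replicas. A toy sketch of that last-write-wins resolution (illustrative names, not Cassandra's actual code):

```python
from dataclasses import dataclass

@dataclass
class Column:
    value: str
    timestamp: int  # client-supplied write timestamp (microseconds by convention)

def resolve(versions):
    """Last-write-wins: the version with the highest timestamp is the winner."""
    return max(versions, key=lambda c: c.timestamp)

def read_repair(replicas):
    """Return the winning version plus the replicas that need repairing."""
    winner = resolve(replicas.values())
    stale = [name for name, col in replicas.items()
             if col.timestamp < winner.timestamp]
    return winner, stale

# The scenario above: node1 took the new write; node2/node3 still have old data.
replicas = {
    "node1": Column("new", 200),
    "node2": Column("old", 100),
    "node3": Column("old", 100),
}
winner, stale = read_repair(replicas)
# winner is node1's version; node2 and node3 get rewritten in the background.
```

This is also why the timestamp being "just an application-provided artifact" matters: whatever the client wrote with the highest timestamp wins, whether or not it ever reached a quorum.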

On Wed, Feb 23, 2011 at 6:05 PM, Anthony John  wrote:

> >Remember the simple rule. Column with highest timestamp is the one that
> will be considered correct EVENTUALLY. So consider following case:
>
> I am sorry, that will return inconsistent results even at Quorum. Timestamps
> have nothing to do with this. It is just an application-provided artifact and
> could be anything.
>
> >c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
> that was written to node1 will be returned.
>
> In this case - N1 will be identified as a discrepancy and the change will
> be discarded via read repair
>
> On Wed, Feb 23, 2011 at 6:47 PM, Narendra Sharma <
> narendra.sha...@gmail.com> wrote:
>
>> Remember the simple rule. Column with highest timestamp is the one that
>> will be considered correct EVENTUALLY. So consider following case:
>>
>> Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
>> QUORUM
>> a. QUORUM in this case requires 2 nodes. Write failed with successful
>> write to only 1 node say node1.
>> b. Read with CL = QUORUM. If read hits node2 and node3, old data will be
>> returned with read repair triggered in background. On next read you will get
>> the data that was written to node1.
>> c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
>> that was written to node1 will be returned.
>>
>> HTH!
>>
>> Thanks,
>> Naren
>>
>>
>>
>> On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala <
>> tijoriwala.rit...@gmail.com> wrote:
>>
>>> Hi Anthony,
>>> I am not talking about the case of CL ANY. I am talking about the case
>>> where your consistency level is  R + W > N and you want to write to W nodes
>>> but only succeed in writing to X ( where X < W) nodes and hence fail the
>>> write to the client.
>>>
>>> thanks,
>>> Ritesh
>>>
>>> On Wed, Feb 23, 2011 at 2:48 PM, Anthony John wrote:
>>>
 Ritesh,

 At CL ANY - if all endpoints are down - a HH is written. And it is a
 successful write - not a failed write.

 Now that does not guarantee a READ of the value just written - but that
 is a risk that you take when you use the ANY CL!

 HTH,

 -JA


 On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
 tijoriwala.rit...@gmail.com> wrote:

> hi Anthony,
> While you stated the facts right, I don't see how it relates to the
> question I ask. Can you elaborate specifically what happens in the case I
> mentioned above to Dave?
>
> thanks,
> Ritesh
>
>
> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John 
> wrote:
>
>> Seems to me that the explanations are getting incredibly complicated -
>> while I submit the real issue is not!
>>
>> Salient points here:-
>> 1. To be guaranteed data consistency - the writes and reads have to be
>> at Quorum CL or more
>> 2. Any W/R at lesser CL means that the application has to handle the
>> inconsistency, or has to be tolerant of it
>> 3. Writing at "ANY" CL - a special case - means that writes will
>> always go through (as long as any node is up), even if the destination 
>> nodes
>> are not up. This is done via hinted handoff. But this can result in
>> inconsistent reads, and yes that is a problem but refer to pt-2 above
>> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
>> handle that case where a particular node is down and the write needs to 
>> be
>> replicated to it. But this will not cause inconsistent R as the hinted
>> handoff (in this case) only applies after Quorum is met - so a Quorum R 
>> is
>> not dependent on the down node being up, and having got the hint.
>>
>> Hope I state this appropriately!
>>
>> HTH,
>>
>> -JA
>>
>>
>> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
>> tijoriwala.rit...@gmail.com> wrote:
>>
>>> > Read repair will probably occur at that point (depending on your
>>> config), which would cause the newest value to propagate to more 
>>> replicas.
>>>
>>> Is the newest value the "quorum" value which means it is the old
>>> value that will be written back to the nodes having "newer non-quorum" 
>>> value
>>> or the newest value is the real new value? :) If later, than this seems 
>>> kind
>>> of odd to me and how it will be useful to any application. A bug?
>>>
>>> Thanks,
>>> Ritesh
>>>
>>>
>>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell wrote:
>>>
 Ritesh,

 You have seen the problem. Clients may read the newly written value
 even though the client performing the write saw it as

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Dave Viner
That looks like it's not an issue of communicating between nodes.  It
appears that the node can not bind to the address on the localhost that
you're asking for.

"java.net.BindException: Cannot assign requested address "

I think the issue is that the Elastic IP address is not actually an IP
address that's on the localhost, so the daemon can not bind to that IP.
Instead of using the EIP, use the local IP address for the rpc_address (I
think that's what you need, since that is what Thrift binds to).  Then
the listen_address should be the IP address that is routable from the
other node.  I would first try with the actual public IP address (not the
Elastic IP).  Once you get that to work, shut down the cluster, change
the listen_address to the EIP, boot up and try again.

Dave Viner


On Wed, Feb 23, 2011 at 8:54 PM, Himanshi Sharma wrote:

>
> Hey Dave,
>
> Sorry i forgot to mention the Non-seed configuration.
>
> for the first node, in us-west, it is as below, i.e. its own elastic IP
>
> listen_address: 50.18.60.117
> rpc_address: 50.18.60.117
>
> and for the second node, in ap-southeast-1, it is as below, i.e. again its own
> elastic IP
>
> listen_address: 175.41.143.192
> rpc_address: 175.41.143.192
>
> Thanks,
> Himanshi
>
>
>
>
>
> From: Dave Viner
> To: user@cassandra.apache.org
> Date: 02/23/2011 11:01 PM
> Subject: Re: Cassandra nodes on EC2 in two different regions not communicating
>
>
>
> internal EC2 ips (10.xxx.xxx.xxx) work across availability zones (e.g.,
> from us-east-1a to us-east-1b) but do not work across regions (e.g., us-east
> to us-west).  To do regions, you must use the public ip address assigned by
> amazon.
>
> Himanshi, when you log into 1 node, and telnet to port 7000 on the other
> node, which IP address did you use - the 10.x address or the public ip
> address?
> And what is the seed/non-seed configuration in both cassandra.yaml files?
>
> Dave Viner
>
>
> On Wed, Feb 23, 2011 at 8:12 AM, Frank LoVecchio < fr...@isidorey.com >
> wrote:
> The internal Amazon IP address is what you will want to use so you don't
> have to go through DNS anyways; not sure if this works from US-East to
> US-West, but it does make things quicker in between zones, e.g. us-east-1a
> to us-east-1b.
>
>
> On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner < davevi...@gmail.com >
> wrote:
> Try using the IP address, not the dns name in the cassandra.yaml.
>
> If you can telnet from one to the other on port 7000, and both nodes have
> the other node in their config, it should work.
>
> Dave Viner
>
>
> On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma < himanshi.sha...@tcs.com >
> wrote:
>
> Yes they do. I have specified the Public DNS in the seed field of each node in
> cassandra.yaml... not able to figure out what the problem is.
>
>
>
> From: Sasha Dolgy < sdo...@gmail.com >
> To: user@cassandra.apache.org
> Date: 02/23/2011 02:56 PM
> Subject: Re: Cassandra nodes on EC2 in two different regions not communicating
>
>
>
> did you define the other host in the cassandra.yaml ?  on both servers 
> they need to know about each other
>
> On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma < himanshi.sha...@tcs.com >
> wrote:
>
> Thanks Dave but I am able to telnet to other instances on port 7000
> and when I run ./nodetool --host
> ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring... I can see only one node.
>
> Do we need to configure anything else in Cassandra.yaml or Cassandra-env.sh
> ???
>
>
>
>
>
> From: Dave Viner < davevi...@gmail.com >
> To: user@cassandra.apache.org
> Cc: Himanshi Sharma < himanshi.sha...@tcs.com >
> Date: 02/23/2011 11:36 AM
> Subject: Re: Cassandra nodes on EC2 in two different regions not communicating
>
>
>
> If you login to one of the nodes, can you telnet to port 7000 on the other
> node?
>
> If not, then almost certainly it's a firewall/Security Group issue.
>
> You can find out the security groups for any node by logging in, and then
> running:
>
> % curl "http://169.254.169.254/latest/meta-data/security-groups"
>
>
> Assuming that both nodes are in the same security group, ensure that the SG
> is configured to allow other members of the SG to communicate on port 7000
> to each other.
>
> HTH,
> Dave Viner
>
>
> On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma < himanshi.sha...@tcs.com >
> wrote:
>
> Hi,
>
> I am new to Cassandra. I'm running Cassandra on EC2. I configured a Cassandra
> cluster on two instances in different regions.
> But when I try the nodetool command with the ring option, I get
> only a single node.
>
> How do I make these two nodes communicate with each other? I have already
> opened the required ports, i.e. 7000, 8080, 9160, in the respective
> security groups. Please help me with this.
>
> Regards,
> Himanshi 

Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Hey Dave,

Sorry i forgot to mention the Non-seed configuration.

for the first node, in us-west, it is as below, i.e. its own elastic IP

listen_address: 50.18.60.117
rpc_address: 50.18.60.117

and for the second node, in ap-southeast-1, it is as below, i.e. again its own
elastic IP

listen_address: 175.41.143.192
rpc_address: 175.41.143.192

Thanks,
Himanshi






From: Dave Viner
To: user@cassandra.apache.org
Date: 02/23/2011 11:01 PM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating



internal EC2 ips (10.xxx.xxx.xxx) work across availability zones (e.g., 
from us-east-1a to us-east-1b) but do not work across regions (e.g., 
us-east to us-west).  To do regions, you must use the public ip address 
assigned by amazon.

Himanshi, when you log into 1 node, and telnet to port 7000 on the other 
node, which IP address did you use - the 10.x address or the public ip 
address?
And what is the seed/non-seed configuration in both cassandra.yaml files?

Dave Viner


On Wed, Feb 23, 2011 at 8:12 AM, Frank LoVecchio  
wrote:
The internal Amazon IP address is what you will want to use so you don't 
have to go through DNS anyways; not sure if this works from US-East to 
US-West, but it does make things quicker in between zones, e.g. us-east-1a 
to us-east-1b.


On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner  wrote:
Try using the IP address, not the dns name in the cassandra.yaml.

If you can telnet from one to the other on port 7000, and both nodes have 
the other node in their config, it should work.

Dave Viner


On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma  
wrote:

Yes they do. I have specified the Public DNS in the seed field of each node in
cassandra.yaml... not able to figure out what the problem is.




From: Sasha Dolgy
To: user@cassandra.apache.org
Date: 02/23/2011 02:56 PM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating




did you define the other host in the cassandra.yaml? On both servers,
they need to know about each other.

On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma  wrote: 

Thanks Dave, but I am able to telnet to the other instances on port 7000,
and when I run ./nodetool --host
ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring, I can see only
one node.

Do we need to configure anything else in Cassandra.yaml or 
Cassandra-env.sh ??? 






From: Dave Viner
To: user@cassandra.apache.org
Cc: Himanshi Sharma
Date: 02/23/2011 11:36 AM
Subject: Re: Cassandra nodes on EC2 in two different regions not communicating





If you login to one of the nodes, can you telnet to port 7000 on the other 
node? 

If not, then almost certainly it's a firewall/Security Group issue. 

You can find out the security groups for any node by logging in, and then 
running: 

% curl "http://169.254.169.254/latest/meta-data/security-groups"

Assuming that both nodes are in the same security group, ensure that the 
SG is configured to allow other members of the SG to communicate on port 
7000 to each other. 

HTH, 
Dave Viner 


On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma  
wrote: 

Hi, 

I am new to Cassandra. I'm running Cassandra on EC2. I configured a
Cassandra cluster on two instances in different regions.
But when I try the nodetool command with the ring option, I get
only a single node.

How do I make these two nodes communicate with each other? I have already
opened the required ports, i.e. 7000, 8080, 9160, in the respective
security groups. Please help me with this.

Regards, 
Himanshi Sharma 


=-=-=
Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you









-- 
Sasha Dolgy
sasha.do...@gmail.com 


Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Hi Dave,

Thanks for your reply. I tried using elastic IPs.

Below is the configuration of cassandra.yaml on both nodes.

seeds:
   - 50.18.60.117
   - 175.41.143.192

Now when I run Cassandra I get the following exception:

INFO 04:30:56,680 Heap size: 878116864/879165440
 INFO 04:30:56,684 JNA not found. Native methods will be disabled.
 INFO 04:30:56,691 Loading settings from 
file:/opt/cassandra/apache-cassandra-0.7.0/conf/cassandra.yaml
 INFO 04:30:56,898 DiskAccessMode 'auto' determined to be standard, 
indexAccessMode is standard
 INFO 04:30:57,092 Creating new commitlog segment 
/var/lib/cassandra/commitlog/CommitLog-1298521857092.log
 INFO 04:30:57,265 reading saved cache 
/var/lib/cassandra/saved_caches/system-IndexInfo-KeyCache
 INFO 04:30:57,270 reading saved cache 
/var/lib/cassandra/saved_caches/system-Schema-KeyCache
 INFO 04:30:57,305 Opening /var/lib/cassandra/data/system/Schema-e-2
 INFO 04:30:57,315 Opening /var/lib/cassandra/data/system/Schema-e-1
 INFO 04:30:57,326 reading saved cache 
/var/lib/cassandra/saved_caches/system-Migrations-KeyCache
 INFO 04:30:57,327 Opening /var/lib/cassandra/data/system/Migrations-e-2
 INFO 04:30:57,331 Opening /var/lib/cassandra/data/system/Migrations-e-1
 INFO 04:30:57,412 reading saved cache 
/var/lib/cassandra/saved_caches/system-LocationInfo-KeyCache
 INFO 04:30:57,413 Opening 
/var/lib/cassandra/data/system/LocationInfo-e-65
 INFO 04:30:57,416 Opening 
/var/lib/cassandra/data/system/LocationInfo-e-67
 INFO 04:30:57,421 Opening 
/var/lib/cassandra/data/system/LocationInfo-e-66
 INFO 04:30:57,427 reading saved cache 
/var/lib/cassandra/saved_caches/system-HintsColumnFamily-KeyCache
 INFO 04:30:57,454 Loading schema version 
f13dfb89-3db9-11e0-8bbe-e700f669bcfc
 WARN 04:30:57,831 Schema definitions were defined both locally and in 
cassandra.yaml. Definitions in cassandra.yaml were ignored.
 INFO 04:30:57,841 reading saved cache 
/var/lib/cassandra/saved_caches/AddressBook-Himanshi-KeyCache
 INFO 04:30:57,842 Opening 
/var/lib/cassandra/data/AddressBook/Himanshi-e-1
 INFO 04:30:57,854 Replaying 
/var/lib/cassandra/commitlog/CommitLog-1298521699165.log
 INFO 04:30:57,854 Finished reading 
/var/lib/cassandra/commitlog/CommitLog-1298521699165.log
 INFO 04:30:57,855 Log replay complete
 INFO 04:30:57,981 Cassandra version: 0.7.0
 INFO 04:30:57,981 Thrift API version: 19.4.0
 INFO 04:30:57,989 Loading persisted ring state
 INFO 04:30:58,025 Starting up server gossip
 INFO 04:30:58,046 switching in a fresh Memtable for LocationInfo at 
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1298521857092.log',
 
position=148)
 INFO 04:30:58,047 Enqueuing flush of Memtable-LocationInfo@22845412(29 
bytes, 1 operations)
 INFO 04:30:58,047 Writing Memtable-LocationInfo@22845412(29 bytes, 1 
operations)
 INFO 04:30:58,425 Completed flushing 
/var/lib/cassandra/data/system/LocationInfo-e-68-Data.db (149 bytes)
 INFO 04:30:58,426 Compacting 
[org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-65-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-66-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-67-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-e-68-Data.db')]
ERROR 04:30:58,513 Exception encountered during startup.
java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:181)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:334)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:161)
at 
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:217)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
Exception encountered during startup.
java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at 
sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:181)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:334)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(Abstr
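The BindException above is exactly what you get when listen_address is set to an IP the machine does not actually own: on EC2 the public/elastic IP is NAT-ed and never appears on the instance's own network interface, so the OS refuses the bind. A minimal Python sketch of the same failure (192.0.2.1 is a reserved TEST-NET address used here as a stand-in for a public IP; the helper name `try_bind` is mine):

```python
import errno
import socket

def try_bind(address, port=0):
    """Attempt to bind a listening socket; return None on success,
    or the errno on failure."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((address, port))
        return None  # bind succeeded
    except OSError as e:
        return e.errno
    finally:
        s.close()

# Binding to an address the host actually owns works:
assert try_bind("127.0.0.1") is None

# Binding to an address not assigned to any local interface fails the same
# way Cassandra's MessagingService does: "Cannot assign requested address".
# 192.0.2.1 (TEST-NET-1) stands in for an EC2 *public* IP, which is NAT-ed
# and never present on the instance's own interfaces.
assert try_bind("192.0.2.1") == errno.EADDRNOTAVAIL
```

This is why telnet to port 7000 on the public IP can succeed from outside (NAT forwards it to the private address) while Cassandra itself cannot bind to that same public IP.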

Re: Fill disks more than 50%

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen
 wrote:
> Hi,
> Given that you have always-increasing key values (timestamps), never
> delete, and hardly ever overwrite data, and you want to minimize
> rebalancing work by statically assigning (new) token ranges to new nodes
> as you add them so they always get the latest data.
> Let's say you add a new node each year to handle next year's data.
> In a scenario like this, would you with 0.7 be able to safely fill disks
> significantly more than 50% and still manage things like repair/recovery of
> faulty nodes?
>
> Regards,
> Terje

Since all your data for a given day/month/year would sit on the same
server, your servers with old data would be idle while the servers with
current data would be very busy. This is probably not a good way to go.

There is a ticket open for 0.8 for efficient node moves/joins, and it is
already a lot better in 0.7. Pretend you did not see this (you can join
nodes using rsync if you know some tricks) if you are really afraid of
joins, which you really should not be.

As for the 50% statement: in the worst case, a major compaction requires
double the disk space of the column family being compacted. So if you
have more than one column family you do NOT need 50% overhead.
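The worst-case arithmetic above can be put into numbers: the headroom a major compaction needs is driven by the largest single column family, not by half the total data volume. A rough sketch with made-up sizes (the helper name and the figures are illustrative, not measurements):

```python
def required_headroom_gb(cf_sizes_gb):
    """Worst-case major compaction rewrites one column family at a time,
    so the free space needed is roughly the size of the largest CF,
    not 50% of the node's total data."""
    return max(cf_sizes_gb)

# Hypothetical node holding four column families (sizes are made up):
cfs = [200, 150, 150, 100]  # GB
total = sum(cfs)            # 600 GB of data on the node

# A single-CF node would need ~50% of the disk free; here the largest CF
# dominates instead, so the required overhead is well under half:
assert required_headroom_gb(cfs) == 200
assert required_headroom_gb(cfs) / total < 0.5
```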


A simple script that creates multi node clusters on a single machine.

2011-02-23 Thread Edward Capriolo
On the mailing list and IRC there are many questions about Cassandra
internals. I understand where the questions are coming from because it
took me a while to get a grip on it.

However, if you have a laptop with a decent amount of RAM (2 GB is enough
for 3-5 nodes; 4 GB is better), you can kick up a multi-node cluster
right on your laptop. Then you can test failure/eventual-consistency
scenarios such as (insert to node A, kill node B, join node C) to your
heart's content.

http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/lauching_5_node_cassandra_clusters


Re: Is it possible to get list of row keys?

2011-02-23 Thread Joshua Partogi
Thanks Roshan,

I think I understand now. The setRowCount() is in the Java Cassandra
driver. I'll try to find a similar method in the Ruby API.

Kind regards,
Joshua

On Thu, Feb 24, 2011 at 1:04 PM, Roshan Dawrani  wrote:
> On Thu, Feb 24, 2011 at 6:54 AM, Joshua Partogi 
> wrote:
>>
>> I am sorry for not making it clear in my original
>> post that what I am looking for is the list of keys in the database
>> assuming that the client application does not know the keys. From what
>> I understand, RangeSliceQuery requires you to pass the startKey, which
>> means the client application have to know beforehand the key that will
>> be used as startkey.
>
> I think it was quite clear anyway that your client app does not know any
> specific keys, and you don't have to pass an existing key as the start / end
> key.
> You can pass start and end keys as empty values with setRowCount() left to
> its default of 100 or another specific value that you want. After that first
> batch, you have to pick up the last key of the batch, make that the start
> key of the next batch query, and keep moving along like that (as described
> previously in this thread)
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani
> Skype: roshandawrani
>
>



-- 
http://twitter.com/jpartogi


Fill disks more than 50%

2011-02-23 Thread Terje Marthinussen
Hi,

Given that you have always-increasing key values (timestamps), never
delete, and hardly ever overwrite data, and you want to minimize
rebalancing work by statically assigning (new) token ranges to new nodes
as you add them so they always get the latest data.

Let's say you add a new node each year to handle next year's data.

In a scenario like this, would you with 0.7 be able to safely fill disks
significantly more than 50% and still manage things like repair/recovery of
faulty nodes?


Regards,
Terje


Re: New Chain for : Does Cassandra use vector clocks

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 9:28 PM, Ritesh Tijoriwala
 wrote:
> I was about to ask what Anthony's latest post below captures - if we don't
> have vector clocks and no locking, how does cassandra prevent/detect
> conflicts? This is somewhat related to the question I asked in last post
> - http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-td6055152.html
> Thanks,
> Ritesh
>
>
>
> On Wed, Feb 23, 2011 at 6:22 PM, Anthony John  wrote:
>>
>> Apologies : For some reason my response on the original mail keeps
>> bouncing back, thus this new one!
>>
>> > From the other hand, the same article says:
>> > "For conditional writes to work, the condition must be evaluated at all
>> > update
>> > sites before the write can be allowed to succeed."
>> >
>> > This means, that when doing such an update CL=ALL must be used
>>
>> Sorry, but I am confused by that entire thread!
>> Questions:-
>> 1. Does Cassandra implement any kind of data locking - at any granularity
>> whether it be row/colF/Col ?
>> 2. If the answer to 1 above is NO! - how does CL ALL prevent conflicts.
>> Concurrent updates on exactly the same piece of data on different nodes can
>> still mess each other up, right ?
>> -JA
>

Cassandra does not provide any built-in locking. It cannot protect
against "lost updates" caused by multiple independent entities reading
and writing the same data.

The cages library handles locking externally and is really easy to use.
http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/
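The lost-update problem described above can be shown in a few lines. This is a toy in-process model, not the Cages API (Cages actually takes its locks via ZooKeeper); it just illustrates why an unserialized read-modify-write loses writes and why an external lock fixes it:

```python
import threading

class TinyStore:
    """Toy last-write-wins store standing in for a Cassandra column."""
    def __init__(self):
        self.value = 0

# Two clients interleave a read-modify-write with no lock:
store = TinyStore()
a = store.value          # client A reads 0
b = store.value          # client B reads 0
store.value = a + 1      # A writes 1
store.value = b + 1      # B also writes 1 -- A's update is silently lost
assert store.value == 1  # a classic lost update: two increments, value is 1

# Serializing the read-modify-write with an external lock (the role a
# library like Cages plays, using ZooKeeper instead of this in-process
# lock) prevents the loss:
store = TinyStore()
lock = threading.Lock()
for _ in range(2):
    with lock:
        store.value = store.value + 1
assert store.value == 2
```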


Re: New Chain for : Does Cassandra use vector clocks

2011-02-23 Thread Ritesh Tijoriwala
I was about to ask what Anthony's latest post below captures - if we don't
have vector clocks and no locking, how does cassandra prevent/detect
conflicts? This is somewhat related to the question I asked in last post -
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-td6055152.html


Thanks,
Ritesh



On Wed, Feb 23, 2011 at 6:22 PM, Anthony John  wrote:

> Apologies : For some reason my response on the original mail keeps bouncing
> back, thus this new one!
> > From the other hand, the same article says:
> > "For conditional writes to work, the condition must be evaluated at all
> update
> > sites before the write can be allowed to succeed."
> >
> > This means, that when doing such an update CL=ALL must be used
>
> Sorry, but I am confused by that entire thread!
>
> Questions:-
> 1. Does Cassandra implement any kind of data locking - at any granularity
> whether it be row/colF/Col ?
> 2. If the answer to 1 above is NO! - how does CL ALL prevent conflicts.
> Concurrent updates on exactly the same piece of data on different nodes can
> still mess each other up, right ?
>
> -JA
>


Re: Splitting a single row into multiple

2011-02-23 Thread Aditya Narayan
Thanks Aaron. I was looking at splitting the rows so that I could use
a standard CF instead of a super CF, but your argument also makes sense.



On Thu, Feb 24, 2011 at 1:19 AM, Aaron Morton  wrote:
> AFAIK performance in the single row case will be better. Multiget may require
> multiple seeks and reads in an sstable, versus obviously a single seek and
> read for a single row. Multiplied by the number of sstables that contain row
> data.
>
> Using the key cache would reduce the seeks.
>
> If it makes sense in your app do it. In general though try to model data so a 
> single row read gets what you need.
>
> Aaron
>
> On 24/02/2011, at 5:59 AM, Aditya Narayan  wrote:
>
>> Does it make any difference if I split a row, that needs to be
>> accessed together, into two or three rows and then read those multiple
>> rows ??
>> (Assume the keys of all the three rows are known to me programatically
>> since I split columns by certain categories).
>> Would the performance be any better if all the three were just a single row 
>> ??
>>
>> I guess the performance should be the same in both cases; the columns
>> remain the same in quantity & they're spread into several SST files.
>


New Chain for : Does Cassandra use vector clocks

2011-02-23 Thread Anthony John
Apologies : For some reason my response on the original mail keeps bouncing
back, thus this new one!
> On the other hand, the same article says:
> "For conditional writes to work, the condition must be evaluated at all
update
> sites before the write can be allowed to succeed."
>
> This means, that when doing such an update CL=ALL must be used

Sorry, but I am confused by that entire thread!

Questions:-
1. Does Cassandra implement any kind of data locking - at any granularity
whether it be row/colF/Col ?
2. If the answer to 1 above is NO! - how does CL ALL prevent conflicts.
Concurrent updates on exactly the same piece of data on different nodes can
still mess each other up, right ?

-JA


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
>In this case - N1 will be identified as a discrepancy and the change will
be discarded via read repair

Brilliant. This does sound correct :)

One more related question: how are read repairs protected against a quorum
write that is in progress? For example, say nodes A, B, C, and client C1
intends to write K = X at QUORUM (= 2 nodes), say on A & B, and meanwhile,
just after it finishes writing on A and before writing on B, client C2
reads at QUORUM. Does that trigger a read repair that races with C1?

Also, when a client reads at QUORUM, does it read all nodes (A, B, C in
this case) or just a quorum, and if it cannot figure out a consistent value,
does it then read more? What is the process here? For example, in the case
above, if C2 were to read A and C, it will get X and W, which do not achieve
a quorum, so would that trigger a read on B? And does this continue for some
number of attempts until a quorum is achieved or a timeout occurs? For
example, under high concurrency for a specific value, values might be
changing fast.

Thanks,
Ritesh

On Wed, Feb 23, 2011 at 6:05 PM, Anthony John  wrote:

> >Remember the simple rule. Column with highest timestamp is the one that
> will be considered correct EVENTUALLY. So consider following case:
>
> I am sorry, that will return inconsistent results even at Quorum. Timestamps
> have nothing to do with this; a timestamp is just an application-provided
> artifact and could be anything.
>
> >c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
> that was written to node1 will be returned.
>
> In this case - N1 will be identified as a discrepancy and the change will
> be discarded via read repair
>
> On Wed, Feb 23, 2011 at 6:47 PM, Narendra Sharma <
> narendra.sha...@gmail.com> wrote:
>
>> Remember the simple rule. Column with highest timestamp is the one that
>> will be considered correct EVENTUALLY. So consider following case:
>>
>> Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
>> QUORUM
>> a. QUORUM in this case requires 2 nodes. Write failed with successful
>> write to only 1 node say node1.
>> b. Read with CL = QUORUM. If read hits node2 and node3, old data will be
>> returned with read repair triggered in background. On next read you will get
>> the data that was written to node1.
>> c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
>> that was written to node1 will be returned.
>>
>> HTH!
>>
>> Thanks,
>> Naren
>>
>>
>>
>> On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala <
>> tijoriwala.rit...@gmail.com> wrote:
>>
>>> Hi Anthony,
>>> I am not talking about the case of CL ANY. I am talking about the case
>>> where your consistency level is  R + W > N and you want to write to W nodes
>>> but only succeed in writing to X ( where X < W) nodes and hence fail the
>>> write to the client.
>>>
>>> thanks,
>>> Ritesh
>>>
>>> On Wed, Feb 23, 2011 at 2:48 PM, Anthony John wrote:
>>>
 Ritesh,

 At CL ANY - if all endpoints are down - a HH is written. And it is a
 successful write - not a failed write.

 Now that does not guarantee a READ of the value just written - but that
 is a risk that you take when you use the ANY CL!

 HTH,

 -JA


 On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
 tijoriwala.rit...@gmail.com> wrote:

> hi Anthony,
> While you stated the facts right, I don't see how it relates to the
> question I ask. Can you elaborate specifically what happens in the case I
> mentioned above to Dave?
>
> thanks,
> Ritesh
>
>
> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John 
> wrote:
>
>> Seems to me that the explanations are getting incredibly complicated -
>> while I submit the real issue is not!
>>
>> Salient points here:-
>> 1. To be guaranteed data consistency - the writes and reads have to be
>> at Quorum CL or more
>> 2. Any W/R at lesser CL means that the application has to handle the
>> inconsistency, or has to be tolerant of it
>> 3. Writing at "ANY" CL - a special case - means that writes will
>> always go through (as long as any node is up), even if the destination 
>> nodes
>> are not up. This is done via hinted handoff. But this can result in
>> inconsistent reads, and yes that is a problem but refer to pt-2 above
>> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
>> handle that case where a particular node is down and the write needs to 
>> be
>> replicated to it. But this will not cause inconsistent R as the hinted
>> handoff (in this case) only applies after Quorum is met - so a Quorum R 
>> is
>> not dependent on the down node being up, and having got the hint.
>>
>> Hope I state this appropriately!
>>
>> HTH,
>>
>> -JA
>>
>>
>> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
>> tijoriwala.rit...@gmail.com> wrote:
>>
>>> > Read repair will

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
>Remember the simple rule. Column with highest timestamp is the one that
will be considered correct EVENTUALLY. So consider following case:

I am sorry, that will return inconsistent results even at Quorum. Timestamps
have nothing to do with this; a timestamp is just an application-provided
artifact and could be anything.

>c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that
was written to node1 will be returned.

In this case - N1 will be identified as a discrepancy and the change will be
discarded via read repair

On Wed, Feb 23, 2011 at 6:47 PM, Narendra Sharma
wrote:

> Remember the simple rule. Column with highest timestamp is the one that
> will be considered correct EVENTUALLY. So consider following case:
>
> Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
> QUORUM
> a. QUORUM in this case requires 2 nodes. Write failed with successful write
> to only 1 node say node1.
> b. Read with CL = QUORUM. If read hits node2 and node3, old data will be
> returned with read repair triggered in background. On next read you will get
> the data that was written to node1.
> c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that
> was written to node1 will be returned.
>
> HTH!
>
> Thanks,
> Naren
>
>
>
> On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala <
> tijoriwala.rit...@gmail.com> wrote:
>
>> Hi Anthony,
>> I am not talking about the case of CL ANY. I am talking about the case
>> where your consistency level is  R + W > N and you want to write to W nodes
>> but only succeed in writing to X ( where X < W) nodes and hence fail the
>> write to the client.
>>
>> thanks,
>> Ritesh
>>
>> On Wed, Feb 23, 2011 at 2:48 PM, Anthony John wrote:
>>
>>> Ritesh,
>>>
>>> At CL ANY - if all endpoints are down - a HH is written. And it is a
>>> successful write - not a failed write.
>>>
>>> Now that does not guarantee a READ of the value just written - but that
>>> is a risk that you take when you use the ANY CL!
>>>
>>> HTH,
>>>
>>> -JA
>>>
>>>
>>> On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
>>> tijoriwala.rit...@gmail.com> wrote:
>>>
 hi Anthony,
 While you stated the facts right, I don't see how it relates to the
 question I ask. Can you elaborate specifically what happens in the case I
 mentioned above to Dave?

 thanks,
 Ritesh


 On Wed, Feb 23, 2011 at 1:57 PM, Anthony John wrote:

> Seems to me that the explanations are getting incredibly complicated -
> while I submit the real issue is not!
>
> Salient points here:-
> 1. To be guaranteed data consistency - the writes and reads have to be
> at Quorum CL or more
> 2. Any W/R at lesser CL means that the application has to handle the
> inconsistency, or has to be tolerant of it
> 3. Writing at "ANY" CL - a special case - means that writes will always
> go through (as long as any node is up), even if the destination nodes are
> not up. This is done via hinted handoff. But this can result in 
> inconsistent
> reads, and yes that is a problem but refer to pt-2 above
> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
> handle that case where a particular node is down and the write needs to be
> replicated to it. But this will not cause inconsistent R as the hinted
> handoff (in this case) only applies after Quorum is met - so a Quorum R is
> not dependent on the down node being up, and having got the hint.
>
> Hope I state this appropriately!
>
> HTH,
>
> -JA
>
>
> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
> tijoriwala.rit...@gmail.com> wrote:
>
>> > Read repair will probably occur at that point (depending on your
>> config), which would cause the newest value to propagate to more 
>> replicas.
>>
>> Is the newest value the "quorum" value which means it is the old value
>> that will be written back to the nodes having "newer non-quorum" value or
>> the newest value is the real new value? :) If later, than this seems 
>> kind of
>> odd to me and how it will be useful to any application. A bug?
>>
>> Thanks,
>> Ritesh
>>
>>
>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell wrote:
>>
>>> Ritesh,
>>>
>>> You have seen the problem. Clients may read the newly written value
>>> even though the client performing the write saw it as a failure. When 
>>> the
>>> client reads, it will use the correct number of replicas for the chosen 
>>> CL,
>>> then return the newest value seen at any replica. This "newest value" 
>>> could
>>> be the result of a failed write.
>>>
>>> Read repair will probably occur at that point (depending on your
>>> config), which would cause the newest value to propagate to more 
>>> replicas.
>>>
>>> R+W>N guarantees serial order of operations: any read at CL=R that
>>> occurs after
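The R + W > N rule cited in Dave's reply is a pigeonhole argument: any set of R replicas consulted on a read must intersect any set of W replicas that acknowledged a write, so a quorum read always sees at least one copy of the latest quorum write. A brute-force check of that claim (the function name is mine):

```python
from itertools import combinations

def overlap_guaranteed(n, r, w):
    """True if every possible read set of r replicas intersects every
    possible write set of w replicas on an n-replica group. By the
    pigeonhole principle this holds exactly when r + w > n."""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(nodes, r)
               for ws in combinations(nodes, w))

# RF=3 with QUORUM reads and writes (R=W=2): 2 + 2 > 3, overlap always holds.
assert overlap_guaranteed(3, 2, 2)
# CL=ONE reads and writes on RF=3: 1 + 1 <= 3, disjoint read/write sets exist.
assert not overlap_guaranteed(3, 1, 1)
```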

Re: Is it possible to get list of row keys?

2011-02-23 Thread Roshan Dawrani
On Thu, Feb 24, 2011 at 6:54 AM, Joshua Partogi wrote:

>
> I am sorry for not making it clear in my original
> post that what I am looking for is the list of keys in the database
> assuming that the client application does not know the keys. From what
> I understand, RangeSliceQuery requires you to pass the startKey, which
> means the client application have to know beforehand the key that will
> be used as startkey.
>

I think it was quite clear anyway that your client app does not know any
specific keys, and you don't have to pass an existing key as the start / end
key.

You can pass start and end keys as empty values with setRowCount() left to
its default of 100 or another specific value that you want. After that first
batch, you have to pick up the last key of the batch, make that the start
key of the next batch query, and keep moving along like that (as described
previously in this thread).

-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani 
Skype: roshandawrani
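The batching scheme Roshan describes can be sketched as follows. The in-memory list stands in for a RandomPartitioner ring (a real RP returns keys in token order, not sorted key order, but the iteration logic is the same); `range_slice` and `all_keys` are illustrative names, not Hector or Thrift calls:

```python
def range_slice(sorted_keys, start_key, row_count):
    """Stand-in for a RangeSliceQuery: returns up to row_count keys
    starting at start_key (inclusive), in the partitioner's order
    (modeled here as plain sorted order)."""
    if start_key == "":
        return sorted_keys[:row_count]
    i = sorted_keys.index(start_key)
    return sorted_keys[i:i + row_count]

def all_keys(sorted_keys, row_count=3):
    """Iterate every key: first batch with an empty start key, then use
    the last key seen as the next start key."""
    batch = range_slice(sorted_keys, "", row_count)
    keys = list(batch)
    while len(batch) > 1:
        # start_key is inclusive, so drop the first (repeated) key of
        # every subsequent batch.
        batch = range_slice(sorted_keys, batch[-1], row_count)
        keys.extend(batch[1:])
    return keys

data = ["k%02d" % i for i in range(10)]
assert all_keys(data) == data  # every key retrieved exactly once
```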


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Thanks Narendra. This is exactly what I was looking for. So the read will
return the old value, but at the same time a repair will occur and
subsequent reads will return the new value. But the new value was never
written successfully in the first place, as quorum was never achieved.
Isn't that semantically incorrect?
Taking configuration of cluster size = 3 and RF = 3 as you described with
Read/Write CL = Quorum,

0. Current value for some key K = W.
1. Client writes K = X. Unfortunately, due to an intermittent network error,
the write cannot complete on a quorum of nodes (it fails on node2 and
node3). Node1 has successfully written the value X for K; hence a failure is
returned to the client. If this X gets written anyway behind the scenes, how
is the client supposed to know? This sounds like a major design flaw. For
example, consider withdrawing $500 from account B: if the client is told
that the withdrawal cannot succeed, he will try again, just to find out that
his account is overdrawn even though he was reading and writing at QUORUM.

In step 2 after 1, when the client asks for K, I agree that W should be
returned, but I don't know if silently propagating the failed value to the
rest of the nodes is the right behavior.

Thanks,
Ritesh


On Wed, Feb 23, 2011 at 4:47 PM, Narendra Sharma
wrote:

> Remember the simple rule. Column with highest timestamp is the one that
> will be considered correct EVENTUALLY. So consider following case:
>
> Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
> QUORUM
> a. QUORUM in this case requires 2 nodes. Write failed with successful write
> to only 1 node say node1.
> b. Read with CL = QUORUM. If read hits node2 and node3, old data will be
> returned with read repair triggered in background. On next read you will get
> the data that was written to node1.
> c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that
> was written to node1 will be returned.
>
> HTH!
>
> Thanks,
> Naren
>
>
>
> On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala <
> tijoriwala.rit...@gmail.com> wrote:
>
>> Hi Anthony,
>> I am not talking about the case of CL ANY. I am talking about the case
>> where your consistency level is  R + W > N and you want to write to W nodes
>> but only succeed in writing to X ( where X < W) nodes and hence fail the
>> write to the client.
>>
>> thanks,
>> Ritesh
>>
>> On Wed, Feb 23, 2011 at 2:48 PM, Anthony John wrote:
>>
>>> Ritesh,
>>>
>>> At CL ANY - if all endpoints are down - a HH is written. And it is a
>>> successful write - not a failed write.
>>>
>>> Now that does not guarantee a READ of the value just written - but that
>>> is a risk that you take when you use the ANY CL!
>>>
>>> HTH,
>>>
>>> -JA
>>>
>>>
>>> On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
>>> tijoriwala.rit...@gmail.com> wrote:
>>>
 hi Anthony,
 While you stated the facts right, I don't see how it relates to the
 question I ask. Can you elaborate specifically what happens in the case I
 mentioned above to Dave?

 thanks,
 Ritesh


 On Wed, Feb 23, 2011 at 1:57 PM, Anthony John wrote:

> Seems to me that the explanations are getting incredibly complicated -
> while I submit the real issue is not!
>
> Salient points here:-
> 1. To be guaranteed data consistency - the writes and reads have to be
> at Quorum CL or more
> 2. Any W/R at lesser CL means that the application has to handle the
> inconsistency, or has to be tolerant of it
> 3. Writing at "ANY" CL - a special case - means that writes will always
> go through (as long as any node is up), even if the destination nodes are
> not up. This is done via hinted handoff. But this can result in 
> inconsistent
> reads, and yes that is a problem but refer to pt-2 above
> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
> handle that case where a particular node is down and the write needs to be
> replicated to it. But this will not cause inconsistent R as the hinted
> handoff (in this case) only applies after Quorum is met - so a Quorum R is
> not dependent on the down node being up, and having got the hint.
>
> Hope I state this appropriately!
>
> HTH,
>
> -JA
>
>
> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
> tijoriwala.rit...@gmail.com> wrote:
>
>> > Read repair will probably occur at that point (depending on your
>> config), which would cause the newest value to propagate to more 
>> replicas.
>>
>> Is the newest value the "quorum" value which means it is the old value
>> that will be written back to the nodes having "newer non-quorum" value or
>> the newest value is the real new value? :) If later, than this seems 
>> kind of
>> odd to me and how it will be useful to any application. A bug?
>>
>> Thanks,
>> Ri

Re: Is it possible to get list of row keys?

2011-02-23 Thread Joshua Partogi
Hi buddhasystem,

It is updated.

Kind regards,
Joshua.

On Thu, Feb 24, 2011 at 12:41 PM, buddhasystem  wrote:
>
> Is your data updated or large chunks are read-only?
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-possible-to-get-list-of-row-keys-tp6055419p6058764.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>



-- 
http://twitter.com/jpartogi


Re: Is it possible to get list of row keys?

2011-02-23 Thread buddhasystem

Is your data updated or large chunks are read-only?
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-possible-to-get-list-of-row-keys-tp6055419p6058764.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Is it possible to get list of row keys?

2011-02-23 Thread Joshua Partogi
Hi everyone,

Thank you to everyone who has responded to my email; I really appreciate
it. I am sorry for not making it clear in my original post that what I am
looking for is the list of keys in the database, assuming that the client
application does not know the keys. From what I understand, RangeSliceQuery
requires you to pass the startKey, which means the client application has
to know beforehand the key that will be used as the startKey.

So, I am trying to do this in cassandra:

select id from table_name;

while RangeSliceQuery would be something like this in SQL (CMIIW),
which is not what I want:

select id from table_name where id between 100 and 1000;

Please let me know whether what I am after is achievable in cassandra.

Kind regards,
Joshua.

On Thu, Feb 24, 2011 at 12:47 AM, Ching-Cheng Chen
 wrote:
> Actually, if you want to get ALL keys, I believe you can still use
> RangeSliceQuery with RP.
> Just use setKeys("","") as the first batch call.
> Then use the last key from the previous batch as the startKey for the next
> batch.
> Beware: since startKey is inclusive, you'll need to ignore the first key of
> each subsequent batch.
> Keep going until you finish all batches. You will know you need to stop
> when setKeys(key_xyz,"") returns only one key.
> This should get you all keys even with RP.
> Regards,
> Chen
> www.evidentsoftware.com
>
> On Wed, Feb 23, 2011 at 8:23 AM, Norman Maurer  wrote:
>>
>> Querying by ranges is only possible with OPP or BPP.
>>
>> Bye,
>> Norman
>>
>>
>> 2011/2/23 Sasha Dolgy :
>> > What if i want 20 rows and the next 20 rows in a subsequent query?  can
>> > this
>> > only be achieved with OPP?
>> >
>> > --
>> > Sasha Dolgy
>> > sasha.do...@gmail.com
>> >
>> > On 23 Feb 2011 13:54, "Ching-Cheng Chen" 
>> > wrote:
>> >
>
>



-- 
http://twitter.com/jpartogi


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Narendra Sharma
Remember the simple rule: the column with the highest timestamp is the one
that will be considered correct EVENTUALLY. So consider the following case:

Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL =
QUORUM
a. QUORUM in this case requires 2 nodes. Write failed with successful write
to only 1 node say node1.
b. Read with CL = QUORUM. If read hits node2 and node3, old data will be
returned with read repair triggered in background. On next read you will get
the data that was written to node1.
c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that
was written to node1 will be returned.

HTH!

Thanks,
Naren
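The a/b/c scenarios above can be simulated directly. This is a toy model of the coordinator's resolve-by-timestamp step and of read repair, not Cassandra's actual code paths; the class and function names are mine:

```python
class Replica:
    def __init__(self, column=None):
        self.column = column  # (value, timestamp) or None

def read_quorum(replicas_hit):
    """Return the column with the highest timestamp among the replicas
    consulted; this is what the coordinator hands back, and the value
    read repair then pushes out to stale replicas."""
    cols = [r.column for r in replicas_hit if r.column is not None]
    return max(cols, key=lambda c: c[1]) if cols else None

# RF=3; a write of ("new", ts=2) succeeded only on node1 before failing:
node1 = Replica(("new", 2))
node2 = Replica(("old", 1))
node3 = Replica(("old", 1))

# b. A quorum read hitting node2 + node3 returns the old data...
assert read_quorum([node2, node3]) == ("old", 1)
# ...while background read repair copies the winning column around:
for n in (node2, node3):
    n.column = max(n.column, node1.column, key=lambda c: c[1])

# c. A quorum read hitting node1 + node2 returns the "failed" write's value:
assert read_quorum([node1, node2]) == ("new", 2)
```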


On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala <
tijoriwala.rit...@gmail.com> wrote:

> Hi Anthony,
> I am not talking about the case of CL ANY. I am talking about the case
> where your consistency level is  R + W > N and you want to write to W nodes
> but only succeed in writing to X ( where X < W) nodes and hence fail the
> write to the client.
>
> thanks,
> Ritesh
>
> On Wed, Feb 23, 2011 at 2:48 PM, Anthony John wrote:
>
>> Ritesh,
>>
>> At CL ANY - if all endpoints are down - a HH is written. And it is a
>> successful write - not a failed write.
>>
>> Now that does not guarantee a READ of the value just written - but that is
>> a risk that you take when you use the ANY CL!
>>
>> HTH,
>>
>> -JA
>>
>>
>> On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
>> tijoriwala.rit...@gmail.com> wrote:
>>
>>> hi Anthony,
>>> While you stated the facts right, I don't see how it relates to the
>>> question I ask. Can you elaborate specifically what happens in the case I
>>> mentioned above to Dave?
>>>
>>> thanks,
>>> Ritesh
>>>
>>>
>>> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John wrote:
>>>
 Seems to me that the explanations are getting incredibly complicated -
 while I submit the real issue is not!

 Salient points here:-
 1. To be guaranteed data consistency - the writes and reads have to be
 at Quorum CL or more
 2. Any W/R at lesser CL means that the application has to handle the
 inconsistency, or has to be tolerant of it
 3. Writing at "ANY" CL - a special case - means that writes will always
 go through (as long as any node is up), even if the destination nodes are
 not up. This is done via hinted handoff. But this can result in 
 inconsistent
 reads, and yes that is a problem but refer to pt-2 above
 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
 handle that case where a particular node is down and the write needs to be
 replicated to it. But this will not cause inconsistent R as the hinted
 handoff (in this case) only applies after Quorum is met - so a Quorum R is
 not dependent on the down node being up, and having got the hint.

 Hope I state this appropriately!

 HTH,

 -JA


 On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
 tijoriwala.rit...@gmail.com> wrote:

> > Read repair will probably occur at that point (depending on your
> config), which would cause the newest value to propagate to more replicas.
>
> Is the newest value the "quorum" value which means it is the old value
> that will be written back to the nodes having "newer non-quorum" value or
> the newest value is the real new value? :) If the latter, then this seems
> kind of odd to me, and how will it be useful to any application? A bug?
>
> Thanks,
> Ritesh
>
>
> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell wrote:
>
>> Ritesh,
>>
>> You have seen the problem. Clients may read the newly written value
>> even though the client performing the write saw it as a failure. When the
>> client reads, it will use the correct number of replicas for the chosen 
>> CL,
>> then return the newest value seen at any replica. This "newest value" 
>> could
>> be the result of a failed write.
>>
>> Read repair will probably occur at that point (depending on your
>> config), which would cause the newest value to propagate to more 
>> replicas.
>>
>> R+W>N guarantees serial order of operations: any read at CL=R that
>> occurs after a write at CL=W will observe the write. I don't think this
>> property is relevant to your current question, though.
>>
>> Cassandra has no mechanism to "roll back" the partial write, other
>> than to simply write again. This may also fail.
>>
>> Best,
>> Dave
>>
>>
>> On Wed, Feb 23, 2011 at 10:12 AM, wrote:
>>
>>> Hi Dave,
>>> Thanks for your input. In the steps you mention, what happens when
>>> client tries to read the value at step 6? Is it possible that the 
>>> client may
>>> see the new value? My understanding was if R + W > N, then client will 
>>> not
>>> see the new value as Quorum nodes will not agree on the new value. If 
>>> that
>>> is the

Re: Understand eventually consistent

2011-02-23 Thread mcasandra

I am reading this again http://wiki.apache.org/cassandra/HintedHandoff and
got little confused. This is my understanding about how HH should work based
on what I read in Dynamo Paper:

1) Say node A, B, C, D, E are in the cluster in a ring (in that order). 
2) For a given key K RF=3.
3) Node B holds the hash of that key K, which means when K is written it
will be written to B (owner of the hash) + C + D since RF = 3.
4) If node D goes down and there is a write again to key K, then this time
the key K row will be written with W=1 to B (owner) + C + E (HH) since RF=3
needs to be satisfied. Is this correct?
5) In the above scenario where node D is down, if we are writing at W=2 and
reading at R=2, would it fail even though the original nodes B + C are up?
Here I am thinking W=2 and R=2 means that 2 nodes that hold the key K are
up, so it satisfies the CL and thus writes and reads will not fail.
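The replica placement in steps 3-4 can be modeled with a short ring walk. This is a simplified Dynamo-style illustration with the node names from the question, not Cassandra's actual replication code:

```python
# Simplified model of picking RF replicas on a ring, skipping down nodes.
# A skipped node's copy goes to the next live node, which stores a hint
# for the down node (hinted handoff).

def pick_replicas(ring, owner, rf, down=()):
    """Walk the ring clockwise from the key's owner, collecting rf live nodes."""
    start = ring.index(owner)
    live = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)]
        if node not in down:
            live.append(node)
        if len(live) == rf:
            break
    return live

ring = ["A", "B", "C", "D", "E"]

# All nodes up: key K owned by B is written to B, C, D (RF=3), as in step 3.
assert pick_replicas(ring, "B", 3) == ["B", "C", "D"]

# D down: E takes the third copy (holding a hint for D), as in step 4.
assert pick_replicas(ring, "B", 3, down=("D",)) == ["B", "C", "E"]
```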
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6058576.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Changing comparators

2011-02-23 Thread Narendra Sharma
Today it is not possible to change the comparators (compare_with and
compare_subcolumns_with). I went through the discussion on thread
http://comments.gmane.org/gmane.comp.db.cassandra.user/12466.

Does it make sense to at least allow a one-way change, i.e. from specific
types to a generic type? For example, change from TimeUUIDType or UTF8 to
BytesType. This could be a manual process where users do the schema change
and then run major compaction on all the nodes to fix the ordering.

Thanks,
Naren


Re: How come key cache increases speed by x4?

2011-02-23 Thread Robert Coli
On Wed, Feb 23, 2011 at 4:04 PM, buddhasystem  wrote:

> Well I know the cache is there for a reason, I just can't explain the factor
> of 4 when I run my queries on a hot vs cold cache. My queries are actually a
> chain of one on an inverted index, which produces a tuple of keys to be used
> in the "main" query. The inverted index query should be downright trivial.
>
> I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing
> something? Why such a large factor?

(simplified for discussion purposes; not necessarily an exhaustive
description)

Path in the cold key cache case :

a) check all bloom filters, 1 per sstable in the CF, which is in memory
b) read the index file (not in memory) and traverse index for every
sstable which returns positive in a)
c) read the actual data file once for every sstable

Path in the hot key cache case :

a) read list of filenames and offsets from key cache
b) read the actual data file

You will notice that the former involves a lot more seeking than the
latter, especially if you have "many" sstables. This seeking almost
certainly is the cause of your observed difference. If you graph I/O
throughput in the two different cases, you will almost certainly see
yourself doing more (slow) I/O in the cold cache case. Memory spent on
key cache is usually relatively well spent, for this reason.
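As a rough illustration of why the gap can be this large, here is a toy I/O-count model of the two paths described above. The costs are arbitrary illustrative units, not measurements:

```python
# Back-of-the-envelope model of the cold vs hot key-cache read paths.

def cold_read_ops(num_sstables, bloom_positives):
    # a) bloom filter checks are in memory (treated as free here),
    # b) one index-file seek per bloom-positive sstable,
    # c) one data-file seek per bloom-positive sstable.
    return 2 * bloom_positives

def hot_read_ops():
    # a) key cache lookup is in memory, b) one data-file seek.
    return 1

# With 4 sstables and, say, 2 bloom-filter hits, the cold path does
# 4 disk seeks vs 1 for the hot path -- the kind of gap that can explain
# a ~4x latency difference on seek-bound hardware.
assert cold_read_ops(4, 2) == 4
assert hot_read_ops() == 1
```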

=Rob


How come key cache increases speed by x4?

2011-02-23 Thread buddhasystem

Well I know the cache is there for a reason, I just can't explain the factor
of 4 when I run my queries on a hot vs cold cache. My queries are actually a
chain of one on an inverted index, which produces a tuple of keys to be used
in the "main" query. The inverted index query should be downright trivial.

I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing
something? Why such a large factor?

TIA

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-come-key-cache-increases-speed-by-x4-tp6058435p6058435.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
Hi Anthony,
I am not talking about the case of CL ANY. I am talking about the case where
your consistency level is R + W > N and you want to write to W nodes but
only succeed in writing to X (where X < W) nodes and hence fail the write
to the client.

thanks,
Ritesh

On Wed, Feb 23, 2011 at 2:48 PM, Anthony John  wrote:

> Ritesh,
>
> At CL ANY - if all endpoints are down - a HH is written. And it is a
> successful write - not a failed write.
>
> Now that does not guarantee a READ of the value just written - but that is
> a risk that you take when you use the ANY CL!
>
> HTH,
>
> -JA
>
>
> On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
> tijoriwala.rit...@gmail.com> wrote:
>
>> hi Anthony,
>> While you stated the facts right, I don't see how it relates to the
>> question I ask. Can you elaborate specifically what happens in the case I
>> mentioned above to Dave?
>>
>> thanks,
>> Ritesh
>>
>>
>> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John wrote:
>>
>>> Seems to me that the explanations are getting incredibly complicated -
>>> while I submit the real issue is not!
>>>
>>> Salient points here:-
>>> 1. To be guaranteed data consistency - the writes and reads have to be at
>>> Quorum CL or more
>>> 2. Any W/R at lesser CL means that the application has to handle the
>>> inconsistency, or has to be tolerant of it
>>> 3. Writing at "ANY" CL - a special case - means that writes will always
>>> go through (as long as any node is up), even if the destination nodes are
>>> not up. This is done via hinted handoff. But this can result in inconsistent
>>> reads, and yes that is a problem but refer to pt-2 above
>>> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
>>> handle that case where a particular node is down and the write needs to be
>>> replicated to it. But this will not cause inconsistent R as the hinted
>>> handoff (in this case) only applies after Quorum is met - so a Quorum R is
>>> not dependent on the down node being up, and having got the hint.
>>>
>>> Hope I state this appropriately!
>>>
>>> HTH,
>>>
>>> -JA
>>>
>>>
>>> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
>>> tijoriwala.rit...@gmail.com> wrote:
>>>
 > Read repair will probably occur at that point (depending on your
 config), which would cause the newest value to propagate to more replicas.

 Is the newest value the "quorum" value which means it is the old value
 that will be written back to the nodes having "newer non-quorum" value or
 the newest value is the real new value? :) If the latter, then this seems
 kind of odd to me, and how will it be useful to any application? A bug?

 Thanks,
 Ritesh


 On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell wrote:

> Ritesh,
>
> You have seen the problem. Clients may read the newly written value
> even though the client performing the write saw it as a failure. When the
> client reads, it will use the correct number of replicas for the chosen 
> CL,
> then return the newest value seen at any replica. This "newest value" 
> could
> be the result of a failed write.
>
> Read repair will probably occur at that point (depending on your
> config), which would cause the newest value to propagate to more replicas.
>
> R+W>N guarantees serial order of operations: any read at CL=R that
> occurs after a write at CL=W will observe the write. I don't think this
> property is relevant to your current question, though.
>
> Cassandra has no mechanism to "roll back" the partial write, other than
> to simply write again. This may also fail.
>
> Best,
> Dave
>
>
> On Wed, Feb 23, 2011 at 10:12 AM,  wrote:
>
>> Hi Dave,
>> Thanks for your input. In the steps you mention, what happens when
>> client tries to read the value at step 6? Is it possible that the client 
>> may
>> see the new value? My understanding was if R + W > N, then client will 
>> not
>> see the new value as Quorum nodes will not agree on the new value. If 
>> that
>> is the case, then its alright to return failure to the client. However, 
>> if
>> not, then it is difficult to program as after every failure, you as an
>> client are not sure if failure is a pseudo failure with some side 
>> effects or
>> real failure.
>>
>> Thanks,
>> Ritesh
>>
>> 
>>
>> Ritesh,
>>
>> There is no commit protocol. Writes may be persisted on some replicas
>> even
>> though the quorum fails. Here's a sequence of events that shows the
>> "problem:"
>>
>> 1. Some replica R fails, but recently, so its failure has not yet been
>> detected
>> 2. A client writes with consistency > 1
>> 3. The write goes to all replicas, all replicas except R persist the
>> write
>> to disk
>> 4. Replica R never responds
>> 5. Failure is returned to the client, but the 

cassandra as user-profile data store

2011-02-23 Thread Dave Viner
Hi all,

I'm wondering if anyone has used cassandra as a datastore for a user-profile
service.  I'm thinking of applications like behavioral targeting, where
there are lots & lots of users (10s to 100s of millions), and lots & lots of
data about them intermixed in, say, weblogs (probably TBs worth).  The idea
would be to use Cassandra as a datastore for distributed parallel processing
of the TBs of files (say on hadoop).  Then the resulting user-profiles would
be query-able quickly.

Anyone know of that sort of application of Cassandra?  I'm trying to puzzle
out just what the column family might look like.  Seems like a mix of
time-oriented information (user x visits site y at time z), location
information (user x appeared from ip x.y.z.a which is geo-location 31.20309,
120.10923), and derived information (because user x visited site y 15 times
within a 10 day window, user x must be interested in buying a car).

I don't have specifics as yet... just some general thoughts.  But this feels
like a Cassandra type problem.  (User profile can have lots of columns per
user, but the exact columns might differ from user to user... very scalable,
etc)
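To make the idea concrete, here is one hypothetical shape such a row could take, mixing the three kinds of data described above. All column names and values are invented for illustration:

```python
# Hypothetical sketch of one row in a "UserProfile" column family.

user_profile_row = {
    "user:12345": {
        # time-oriented: one column per visit, named by site + timestamp
        "visit:example.com:1298534400": "1",
        # location: derived from the request IP
        "geo:last_ip": "10.1.2.3",
        "geo:lat_lon": "31.20309,120.10923",
        # derived: output of the batch (e.g. hadoop) job over the weblogs
        "derived:interest:auto": "0.87",
    }
}

row = user_profile_row["user:12345"]
# Columns can differ per user, and each user can have very many of them --
# the sparse, wide-row shape Cassandra handles well.
assert row["derived:interest:auto"] == "0.87"
```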

Thanks
Dave Viner


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
Ritesh,

At CL ANY - if all endpoints are down - a HH is written. And it is a
successful write - not a failed write.

Now that does not guarantee a READ of the value just written - but that is a
risk that you take when you use the ANY CL!

HTH,

-JA

On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
tijoriwala.rit...@gmail.com> wrote:

> hi Anthony,
> While you stated the facts right, I don't see how it relates to the
> question I ask. Can you elaborate specifically what happens in the case I
> mentioned above to Dave?
>
> thanks,
> Ritesh
>
>
> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John wrote:
>
>> Seems to me that the explanations are getting incredibly complicated -
>> while I submit the real issue is not!
>>
>> Salient points here:-
>> 1. To be guaranteed data consistency - the writes and reads have to be at
>> Quorum CL or more
>> 2. Any W/R at lesser CL means that the application has to handle the
>> inconsistency, or has to be tolerant of it
>> 3. Writing at "ANY" CL - a special case - means that writes will always go
>> through (as long as any node is up), even if the destination nodes are not
>> up. This is done via hinted handoff. But this can result in inconsistent
>> reads, and yes that is a problem but refer to pt-2 above
>> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
>> handle that case where a particular node is down and the write needs to be
>> replicated to it. But this will not cause inconsistent R as the hinted
>> handoff (in this case) only applies after Quorum is met - so a Quorum R is
>> not dependent on the down node being up, and having got the hint.
>>
>> Hope I state this appropriately!
>>
>> HTH,
>>
>> -JA
>>
>>
>> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
>> tijoriwala.rit...@gmail.com> wrote:
>>
>>> > Read repair will probably occur at that point (depending on your
>>> config), which would cause the newest value to propagate to more replicas.
>>>
>>> Is the newest value the "quorum" value which means it is the old value
>>> that will be written back to the nodes having "newer non-quorum" value or
>>> the newest value is the real new value? :) If the latter, then this seems
>>> kind of odd to me, and how will it be useful to any application? A bug?
>>>
>>> Thanks,
>>> Ritesh
>>>
>>>
>>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell wrote:
>>>
 Ritesh,

 You have seen the problem. Clients may read the newly written value even
 though the client performing the write saw it as a failure. When the client
 reads, it will use the correct number of replicas for the chosen CL, then
 return the newest value seen at any replica. This "newest value" could be
 the result of a failed write.

 Read repair will probably occur at that point (depending on your
 config), which would cause the newest value to propagate to more replicas.

 R+W>N guarantees serial order of operations: any read at CL=R that
 occurs after a write at CL=W will observe the write. I don't think this
 property is relevant to your current question, though.

 Cassandra has no mechanism to "roll back" the partial write, other than
 to simply write again. This may also fail.

 Best,
 Dave


 On Wed, Feb 23, 2011 at 10:12 AM,  wrote:

> Hi Dave,
> Thanks for your input. In the steps you mention, what happens when
> client tries to read the value at step 6? Is it possible that the client 
> may
> see the new value? My understanding was if R + W > N, then client will not
> see the new value as Quorum nodes will not agree on the new value. If that
> is the case, then its alright to return failure to the client. However, if
> not, then it is difficult to program as after every failure, you as an
> client are not sure if failure is a pseudo failure with some side effects 
> or
> real failure.
>
> Thanks,
> Ritesh
>
> 
>
> Ritesh,
>
> There is no commit protocol. Writes may be persisted on some replicas
> even
> though the quorum fails. Here's a sequence of events that shows the
> "problem:"
>
> 1. Some replica R fails, but recently, so its failure has not yet been
> detected
> 2. A client writes with consistency > 1
> 3. The write goes to all replicas, all replicas except R persist the
> write
> to disk
> 4. Replica R never responds
> 5. Failure is returned to the client, but the new value is still in the
> cluster, on all replicas except R.
>
> Something very similar could happen for CL QUORUM.
>
> This is a conscious design decision because a commit protocol would
> constitute tight coupling between nodes, which goes against the
> Cassandra
> philosophy. But unfortunately you do have to write your app with this
> case
> in mind.
>
> Best,
> Dave
>
> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <
> tijori

Understanding Indexes

2011-02-23 Thread mcasandra

So far my understanding about indexes is that you can create indexes only on
column values (username in below eg).

Does it make sense to also have an index on the keys that the column family
uses to store rows (row key "abc" in the example below)? I am thinking that,
as rows keep growing, would a search be faster with an index on row keys if
you want to retrieve, for example, only "def" out of tons of rows?

UserProfile = { // this is a ColumnFamily
abc: {   // this is the key to this Row inside the CF
// now we have an infinite # of columns in this row
username: "phatduckk",
email: "phatdu...@example.com",
phone: "(900) 976-"
}, // end row
def: {   // this is the key to another row in the CF
// now we have another infinite # of columns in this row
username: "ieure",
email: "ie...@example.com",
phone: "(888) 555-1212",
age: "66",
gender: "undecided"
},
}


2) Is the hash of the column key or the row key used by RandomPartitioner to
distribute it across the Cassandra nodes?
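On question 2: RandomPartitioner hashes the row key (with MD5), never column keys, so all columns of a row land on that row's replicas. A simplified sketch; the modulo placement below is illustrative, not real token-range math:

```python
import hashlib

def token(row_key):
    """MD5 digest of the ROW key, interpreted as a big integer
    (a simplification of what RandomPartitioner computes)."""
    return int.from_bytes(hashlib.md5(row_key.encode()).digest(), "big")

def node_for(row_key, num_nodes=4):
    # Toy placement: modulo instead of real token ranges on the ring.
    return token(row_key) % num_nodes

# Column names like "username" play no part in placement; only the
# row keys ("abc", "def") are hashed, and placement is deterministic.
assert node_for("abc") == node_for("abc")
```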
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understanding-Indexes-tp6058238p6058238.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
hi Anthony,
While you stated the facts right, I don't see how it relates to the question
I ask. Can you elaborate specifically what happens in the case I mentioned
above to Dave?

thanks,
Ritesh

On Wed, Feb 23, 2011 at 1:57 PM, Anthony John  wrote:

> Seems to me that the explanations are getting incredibly complicated -
> while I submit the real issue is not!
>
> Salient points here:-
> 1. To be guaranteed data consistency - the writes and reads have to be at
> Quorum CL or more
> 2. Any W/R at lesser CL means that the application has to handle the
> inconsistency, or has to be tolerant of it
> 3. Writing at "ANY" CL - a special case - means that writes will always go
> through (as long as any node is up), even if the destination nodes are not
> up. This is done via hinted handoff. But this can result in inconsistent
> reads, and yes that is a problem but refer to pt-2 above
> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
> handle that case where a particular node is down and the write needs to be
> replicated to it. But this will not cause inconsistent R as the hinted
> handoff (in this case) only applies after Quorum is met - so a Quorum R is
> not dependent on the down node being up, and having got the hint.
>
> Hope I state this appropriately!
>
> HTH,
>
> -JA
>
>
> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
> tijoriwala.rit...@gmail.com> wrote:
>
>> > Read repair will probably occur at that point (depending on your
>> config), which would cause the newest value to propagate to more replicas.
>>
>> Is the newest value the "quorum" value which means it is the old value
>> that will be written back to the nodes having "newer non-quorum" value or
>> the newest value is the real new value? :) If the latter, then this seems
>> kind of odd to me, and how will it be useful to any application? A bug?
>>
>> Thanks,
>> Ritesh
>>
>>
>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell  wrote:
>>
>>> Ritesh,
>>>
>>> You have seen the problem. Clients may read the newly written value even
>>> though the client performing the write saw it as a failure. When the client
>>> reads, it will use the correct number of replicas for the chosen CL, then
>>> return the newest value seen at any replica. This "newest value" could be
>>> the result of a failed write.
>>>
>>> Read repair will probably occur at that point (depending on your config),
>>> which would cause the newest value to propagate to more replicas.
>>>
>>> R+W>N guarantees serial order of operations: any read at CL=R that occurs
>>> after a write at CL=W will observe the write. I don't think this property is
>>> relevant to your current question, though.
>>>
>>> Cassandra has no mechanism to "roll back" the partial write, other than
>>> to simply write again. This may also fail.
>>>
>>> Best,
>>> Dave
>>>
>>>
>>> On Wed, Feb 23, 2011 at 10:12 AM,  wrote:
>>>
 Hi Dave,
 Thanks for your input. In the steps you mention, what happens when
 client tries to read the value at step 6? Is it possible that the client 
 may
 see the new value? My understanding was if R + W > N, then client will not
 see the new value as Quorum nodes will not agree on the new value. If that
 is the case, then its alright to return failure to the client. However, if
 not, then it is difficult to program as after every failure, you as an
 client are not sure if failure is a pseudo failure with some side effects 
 or
 real failure.

 Thanks,
 Ritesh

 

 Ritesh,

 There is no commit protocol. Writes may be persisted on some replicas
 even
 though the quorum fails. Here's a sequence of events that shows the
 "problem:"

 1. Some replica R fails, but recently, so its failure has not yet been
 detected
 2. A client writes with consistency > 1
 3. The write goes to all replicas, all replicas except R persist the
 write
 to disk
 4. Replica R never responds
 5. Failure is returned to the client, but the new value is still in the
 cluster, on all replicas except R.

 Something very similar could happen for CL QUORUM.

 This is a conscious design decision because a commit protocol would
 constitute tight coupling between nodes, which goes against the
 Cassandra
 philosophy. But unfortunately you do have to write your app with this
 case
 in mind.

 Best,
 Dave

 On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <
 tijoriwala.rit...@gmail.com> wrote:

 >
 > Hi,
 > I wanted to get details on how does cassandra do synchronous writes to
 W
 > replicas (out of N)? Does it do a 2PC? If not, how does it deal with
 > failures of of nodes before it gets to write to W replicas? If the
 > orchestrating node cannot write to W nodes successfully, I guess it
 will
 > fail the write operation but what happens to the completed writes on X
 (W
 > >

map reduce job over indexed range of keys

2011-02-23 Thread Matt Kennedy
Let me start out by saying that I think I'm going to have to write a patch
to get what I want, but I'm fine with that.  I just wanted to check here
first to make sure that I'm not missing something obvious.

I'd like to be able to run a MapReduce job that takes a value in an indexed
column as a parameter, and use that to select the data that the MapReduce
job operates on.  Right now, it looks like this isn't possible because
org.apache.cassandra.hadoop.ColumnFamilyRecordReader will only fetch data
with get_range_slices, not get_indexed_slices.

An example might be useful.  Let's say I want to run a map reduce job over
all the data for a particular country.  Right now I can do this in Map
Reduce by simply discarding all the data that is not from the country I want
to process on. I suspect it will be faster if I can reduce the size of the
Map Reduce job by only selecting the data I want by using secondary indexes
in Cassandra.

So, first question: Am I wrong?  Is there some clever way to enable the
behavior I'm looking for (without modifying the cassandra codebase)?

Second question: If I'm not wrong, should I open a JIRA issue for this and
start coding up this feature?

Finally, the real reason that I want to get this working is so that I can
enhance the CassandraStorage pig loadfunc so that it can take query
parameters on in the URL string that is used to specify the keyspace and
column family.  So for example, you might load data into Pig with this
sytax:

rows = LOAD 'cassandra://mykeyspace/mycolumnfamily?country=UK' using
CassandraStorage();

I'd like to get some feedback on that syntax.
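For what it's worth, splitting such a URL into keyspace, column family, and index filters is straightforward. A parsing sketch in Python for illustration (the real loadfunc would be Java, and `country` is the hypothetical parameter from the example, not existing CassandraStorage behavior):

```python
from urllib.parse import urlparse, parse_qs

def parse_cassandra_url(url):
    """Split cassandra://keyspace/columnfamily?col=val into its parts."""
    parts = urlparse(url)
    keyspace = parts.netloc
    columnfamily = parts.path.lstrip("/")
    # query parameters become the secondary-index filter expressions
    index_filters = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return keyspace, columnfamily, index_filters

ks, cf, filters = parse_cassandra_url(
    "cassandra://mykeyspace/mycolumnfamily?country=UK")
assert (ks, cf, filters) == ("mykeyspace", "mycolumnfamily", {"country": "UK"})
```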

Thanks,
Matt Kennedy


Re: Will the large datafile size affect the performance?

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 4:51 PM, buddhasystem  wrote:
>
> I know that theoretically it should not (apart from compaction issues), but
> maybe somebody has experience showing otherwise:
>
> My test cluster now has 250GB of data and will have 1.5TB in its
> reincarnation. If all these data is in a single CF -- will it cause read or
> write performance problems? Should I "shard" it? One advantage of splitting
> the data would be reducing the impact of compaction and repairs (or so I
> naively assume).
>
> TIA
>
> Maxim
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Will-the-large-datafile-size-affect-the-performance-tp6057991p6057991.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>

http://wiki.apache.org/cassandra/LargeDataSetConsiderations

By dividing your data you get the benefits of being able to apply two
different settings at the Column Family or keyspace level. For example
you might have some batch data that you only want to replicate twice,
or some small subset of data that needs to be read frequently that is
highly cached. Also, as you said, having several smaller CFs helps you
avoid a single very long-running and intensive operation like repair
or major compaction.

If you always need to read both CFs to satisfy your application, it is
not a good idea.


Re: Does Cassandra use vector clocks

2011-02-23 Thread Oleg Anastastasyev
Jonathan Ellis  gmail.com> writes:

> 
> IMO if you only get CL.ALL it's not superior enough to pessimistic
> locking to justify the complexity of adding it.
> 
Yes, maybe you're right, but CL.ALL is necessary only to solve this problem in
a generic way.

In some (most?) cases, conflict detection, even at the read/compaction stage,
is enough. Such conflicts can then be resolved in a non-generic, app-specific
way (e.g., by specifying some ConflictResolver class in a per-CF config). And
for app-specific resolution, lower CLs are OK, because they don't require
detecting the conflict during the update operation.

For example, in our heavily loaded messaging cluster, only 0.015% of updates
write to the same column. Employing pessimistic locking for 100% of updates to
resolve that tiny percent of conflicts (and significantly raising the latency
of all updates) looked like overkill, but all conflicts could be resolved in
an app-specific way.

It may be even more important to help people diagnose that the
conflicting-updates problem exists in their application. Cassandra's current
conflict resolution silently ignores (and overwrites) conflicting updates,
making this problem hidden for a prolonged time and thus difficult to diagnose
and fix at early stages of development.
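A sketch of what such a per-CF hook might look like. Nothing below exists in Cassandra; the class name and interface are invented purely to illustrate the proposal:

```python
# Hypothetical shape of the per-CF ConflictResolver described above.
# Cassandra normally keeps only the highest-timestamp version of a column;
# a resolver hook could instead merge conflicting versions app-specifically.

class MessageCountResolver:
    """Example app-specific rule: treat the column as a counter and
    sum the conflicting increments instead of dropping one of them."""

    def resolve(self, versions):
        # versions: [(value, timestamp), ...] for the same column key
        total = sum(value for value, _ in versions)
        newest_ts = max(ts for _, ts in versions)
        return (total, newest_ts)

# Two concurrent writes that timestamp-only resolution would silently
# collapse to the single newest one (losing the increment of 3):
assert MessageCountResolver().resolve([(3, 100), (2, 101)]) == (5, 101)
```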




Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 3:28 PM,   wrote:
> So does Cassandra monitor the config file for changes? If it doesn't, how else 
> would it know you had added a new seed unless you restart?
>
> -Original Message-
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Wednesday, February 23, 2011 3:23 PM
> To: user@cassandra.apache.org
> Cc: Truelove, Jeremy: IT (NYK)
> Subject: Re: Multiple Seeds
>
> On Wed, Feb 23, 2011 at 2:59 PM,   wrote:
>> To add a host to the seeds list after it has had the data streamed to it I
>> need to
>>
>>
>>
>> 1.   stop it
>>
>> 2.   edit the yaml file to
>>
>> a.   include it in the seeds list
>>
>> b.  set auto_bootstrap to false
>>
>> 3.    restart it
>>
>>
>>
>> correct? Additionally you would need to add it to the other nodes seed lists
>> and restart them as well.
>>
>>
>>
>> From: Eric Gilmore [mailto:e...@datastax.com]
>> Sent: Wednesday, February 23, 2011 2:47 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Multiple Seeds
>>
>>
>>
>> Well -- when you first bring a node into a ring, you will probably want to
>> stream data to it with auto_bootstrap: true.
>>
>> If you want that node to be a seed, then add it to the seeds list AFTER it
>> has joined the ring.
>>
>> I'd refer you to the "Seed List" and "Autobootstrapping" sections of the
>> Getting Started guide, which contain the following blurbs:
>>
>> There is no strict rule to determine which hosts need to be listed as seeds,
>> but all nodes in a cluster need the same seed list. For a production
>> deployment, DataStax recommends two seeds per data center.
>>
>> An autobootstrapping node cannot have itself in the list of seeds nor can it
>> contain an initial_token already claimed by another node. To add new seeds,
>> autobootstrap the nodes first, and then configure them as seeds.
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Feb 23, 2011 at 11:39 AM, 
>> wrote:
>>
>> So all seeds should always be set to 'auto_bootstrap: false' in their .yaml
>> file.
>>
>> -Original Message-
>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> Sent: Wednesday, February 23, 2011 2:36 PM
>> To: user@cassandra.apache.org
>>
>> Cc: Truelove, Jeremy: IT (NYK)
>> Subject: Re: Multiple Seeds
>>
>> On Wed, Feb 23, 2011 at 2:30 PM,  
>> wrote:
>>> Yeah I set the tokens. I'm more asking: if I start the first seed node with
>>> autobootstrap set to false, should the second seed have it set to true, as
>>> well as all the slave nodes? I didn't see this in the docs but I may have
>>> just missed it.
>>>
>>>
>>>
>>> From: Eric Gilmore [mailto:e...@datastax.com]
>>> Sent: Wednesday, February 23, 2011 2:24 PM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Multiple Seeds
>>>
>>>
>>>
>>> The DataStax documentation offers some answers to those questions in the
>>> Getting Started section and the Clustering reference docs.
>>>
>>> Autobootstrap should be true, but with the important caveat that
>>> intial_token values should be specified.  Have a look at those docs, and
>>> please give feedback on how helpful they are/aren't.
>>>
>>> Regards,
>>>
>>> Eric Gilmore
>>>
>>> On Wed, Feb 23, 2011 at 11:15 AM, 
>>> wrote:
>>>
>>> What's the best way to bring multiple seeds up, should only one of them
>>> have
>>> auto bootstrap set to true or should neither of them? Should they list
>>> themselves and the other seed in their seed section in the yaml config?
>>>
>>> ___
>>>
>>>
>>>
>>> This e-mail may contain information that is confidential, privileged or
>>> otherwise protected from disclosure. If you are not an intended recipient
>>> of
>>> this e-mail, do not duplicate or redistribute it by any means. Please
>>> delete
>>> it and any attachments and notify the sender that you have received it in
>>> error. Unless specifically indicated, this e-mail is not an offer to buy
>>> or
>>> sell or a solicitation to buy or sell any securities, investment products
>>> or
>>> other financial product or service, an official confirmation of any
>>> transaction, or an official statement of Barclays. Any views or opinions
>>> presented are solely those of the author and do not necessarily represent
>>> those of Barclays. This e-mail is subject to terms available at the
>>> following link: www.barcap.com/emaildisclaimer. By messaging with Barclays
>>> you consent to the foregoing.  Barclays Capital is the investment banking
>>> division of Barclays Bank PLC, a company registered in England (number
>>> 1026167) with its registered office at 1 Churchill Place, London, E14 5HP.
>>> This email may relate to or be sent from other members of the Barclays
>>> Group.
>>>
>>> ___
>>>
>>>
>>
>> If a node is defined as a seed it will never auto bootstrap. After it
>> has bootstrapped and has a system table you can set its yaml file as a
>> seed if you wish.
>>
>>
>>

Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Anthony John
Seems to me that the explanations are getting incredibly complicated - while
I submit the real issue is not!

Salient points here:-
1. To be guaranteed data consistency - the writes and reads have to be at
Quorum CL or more
2. Any W/R at lesser CL means that the application has to handle the
inconsistency, or has to be tolerant of it
3. Writing at "ANY" CL - a special case - means that writes will always go
through (as long as any node is up), even if the destination nodes are not
up. This is done via hinted handoff. But this can result in inconsistent
reads, and yes that is a problem but refer to pt-2 above
4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
handle the case where a particular node is down and the write needs to be
replicated to it. But this will not cause inconsistent R as the hinted
handoff (in this case) only applies after Quorum is met - so a Quorum R is
not dependent on the down node being up, and having got the hint.
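The overlap guarantee in point 1 can be sanity-checked with a small sketch (illustrative Python, not Cassandra code): with N replicas, any R-replica read set must intersect any W-replica write set whenever R + W > N.

```python
# Illustrative sketch (not Cassandra code): brute-force check that R + W > N
# forces every read set to share at least one replica with every write set.
from itertools import combinations

def quorums_always_overlap(n, r, w):
    """True if every r-replica read set intersects every w-replica write set."""
    nodes = range(n)
    return all(set(reads) & set(writes)
               for reads in combinations(nodes, r)
               for writes in combinations(nodes, w))
```

For N=3, QUORUM on both sides (R=W=2) always overlaps, while R=W=1 does not — matching point 2's warning about lesser CLs.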

Hope I state this appropriately!

HTH,

-JA

On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
tijoriwala.rit...@gmail.com> wrote:

> > Read repair will probably occur at that point (depending on your config),
> which would cause the newest value to propagate to more replicas.
>
> Is the newest value the "quorum" value which means it is the old value that
> will be written back to the nodes having "newer non-quorum" value or the
> newest value is the real new value? :) If the latter, then this seems kind of odd
> to me and how it will be useful to any application. A bug?
>
> Thanks,
> Ritesh
>
>
> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell  wrote:
>
>> Ritesh,
>>
>> You have seen the problem. Clients may read the newly written value even
>> though the client performing the write saw it as a failure. When the client
>> reads, it will use the correct number of replicas for the chosen CL, then
>> return the newest value seen at any replica. This "newest value" could be
>> the result of a failed write.
>>
>> Read repair will probably occur at that point (depending on your config),
>> which would cause the newest value to propagate to more replicas.
>>
>> R+W>N guarantees serial order of operations: any read at CL=R that occurs
>> after a write at CL=W will observe the write. I don't think this property is
>> relevant to your current question, though.
>>
>> Cassandra has no mechanism to "roll back" the partial write, other than to
>> simply write again. This may also fail.
>>
>> Best,
>> Dave
>>
>>
>> On Wed, Feb 23, 2011 at 10:12 AM,  wrote:
>>
>>> Hi Dave,
>>> Thanks for your input. In the steps you mention, what happens when client
>>> tries to read the value at step 6? Is it possible that the client may see
>>> the new value? My understanding was if R + W > N, then client will not see
>>> the new value as Quorum nodes will not agree on the new value. If that is
>>> the case, then it's alright to return failure to the client. However, if not,
>>> then it is difficult to program as after every failure, you as a client are
>>> not sure if failure is a pseudo failure with some side effects or real
>>> failure.
>>>
>>> Thanks,
>>> Ritesh
>>>
>>> 
>>>
>>> Ritesh,
>>>
>>> There is no commit protocol. Writes may be persisted on some replicas
>>> even
>>> though the quorum fails. Here's a sequence of events that shows the
>>> "problem:"
>>>
>>> 1. Some replica R fails, but recently, so its failure has not yet been
>>> detected
>>> 2. A client writes with consistency > 1
>>> 3. The write goes to all replicas, all replicas except R persist the
>>> write
>>> to disk
>>> 4. Replica R never responds
>>> 5. Failure is returned to the client, but the new value is still in the
>>> cluster, on all replicas except R.
>>>
>>> Something very similar could happen for CL QUORUM.
>>>
>>> This is a conscious design decision because a commit protocol would
>>> constitute tight coupling between nodes, which goes against the Cassandra
>>> philosophy. But unfortunately you do have to write your app with this
>>> case
>>> in mind.
>>>
>>> Best,
>>> Dave
>>>
>>> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <
>>> tijoriwala.rit...@gmail.com> wrote:
>>>
>>> >
>>> > Hi,
>>> > I wanted to get details on how does cassandra do synchronous writes to
>>> W
>>> > replicas (out of N)? Does it do a 2PC? If not, how does it deal with
>>> > failures of nodes before it gets to write to W replicas? If the
>>> > orchestrating node cannot write to W nodes successfully, I guess it will
>>> > fail the write operation but what happens to the completed writes on X
>>> > (W > X) nodes?
>>> >
>>> > Thanks,
>>> > Ritesh
>>> > --
>>> > View this message in context:
>>> >
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>>> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
>>> at
>>> > Nabble.com.
>>> >
>>>
>>> 
>>> Quoted from:
>>>
>>> http://cassandra-user-i

Can I count on Super Column Families when planning 3 years out?

2011-02-23 Thread buddhasystem

There was a discussion here on how well (or not so well) the Super CFs are
supported. I now need to make a strategic decision as to how I plan my data.
What's the consensus -- will the super CF be there 3 years out?


TIA
Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-I-count-on-Super-Column-Families-why-planing-3-years-out-tp6057997p6057997.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Will the large datafile size affect the performance?

2011-02-23 Thread buddhasystem

I know that theoretically it should not (apart from compaction issues), but
maybe somebody has experience showing otherwise:

My test cluster now has 250GB of data and will have 1.5TB in its
reincarnation. If all this data is in a single CF -- will it cause read or
write performance problems? Should I "shard" it? One advantage of splitting
the data would be reducing the impact of compaction and repairs (or so I
naively assume).

TIA

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Will-the-large-datafile-size-affect-the-performance-tp6057991p6057991.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: Is Cassandra suitable for my problem?

2011-02-23 Thread Prasanna Jayapalan
Hi Alexandru,

   We @ EvidentSoftware (http://www.evidentsoftware.com/) have a monitoring
solution that stores time-series information in Cassandra and also
Neo4j. Check this blog post:
http://www.evidentsoftware.com/evident-clearstone-5-implements-cassandra-and-neo4j-as-an-elastic-data-store/.
Can you share more details about your use case, so we can give you some
guidance?



Prasanna



*From:* Alexandru Dan Sicoe [mailto:sicoe.alexan...@googlemail.com]
*Sent:* Wednesday, February 23, 2011 3:45 PM
*To:* user@cassandra.apache.org
*Subject:* Is Cassandra suitable for my problem?



Hello,

I'm currently doing my masters project. I need to store lots of time series
data of any type (String, int, booleans, arrays of the previous) with a high
writing rate (20 MBytes/sec -> 170 TBytes/year; note: not running continuously)
but less strict read requirements. This is monitoring data from a vast
distributed network. The queries will be something like: give me this data
between Time1 and Time2.

The hardware that I have available is between 2 and 5 hosts.

Questions:

   Should I use Cassandra?

   Suggestions of how to structure the data? (I read
Cloudkick's blog
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ but I
found that it doesn't give too much detail)



Any help is much appreciated,

Alex


Re: I: Re: Are row-keys sorted by the compareWith?

2011-02-23 Thread Dan Washusen
Hi Matthew,
As you mention, the map returned from multiget_slice is not order-preserving; 
Pelops is doing this on the client side...

Cheers,
Dan

-- 
Dan Washusen
Sent with Sparrow
On Wednesday, 23 February 2011 at 8:38 PM, Matthew Dennis wrote: 
> The map returned by multiget_slice (what I suspect is the underlying thrift 
> call for getColumnsFromRows) is not an order-preserving map; it's a HashMap, so 
> the order of the returned results cannot be depended on.  Even if it were an 
> order-preserving map, not all languages would be able to make use of the 
> results since not all languages have ordered maps (though many, including 
> Java, certainly do).
> 
> That being said, it would be fairly easy to change this on the C* side to 
> preserve the order the keys were requested in, though as mentioned not all 
> clients could take advantage of it.
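The client-side reordering described above — reimposing the requested key order on an unordered multiget-style result — can be sketched like this (a hypothetical helper, not actual Pelops or Thrift code):

```python
# Hypothetical helper (not Pelops API): restore request order over an
# unordered multiget-style result map.
def reorder_rows(requested_keys, unordered_result):
    """Return (key, columns) pairs in the exact order the keys were requested."""
    return [(key, unordered_result.get(key)) for key in requested_keys]
```

A missing key simply comes back as None, so the caller can still detect rows the server didn't return.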
> 
>  On Mon, Feb 21, 2011 at 4:09 PM, cbert...@libero.it  
> wrote:
> > > 
> > > As Jonathan mentions, the compareWith on a column family definition defines 
> > > the order for the columns *within* a row... In order to control the 
> > > ordering of rows you'll need to use the OrderPreservingPartitioner 
> > > (http://www.datastax.com/docs/0.7/operations/clustering#tokens-partitioners-ring).
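Why the default partitioner scrambles row order: with RandomPartitioner a row's position on the ring is its MD5 token, not its key, so key order is lost. A small sketch (assuming the 0.6/0.7 default of MD5 tokens):

```python
# Sketch of RandomPartitioner-style ordering: rows sort by MD5 token,
# not by row key, so lexicographic/time order of keys is not preserved.
import hashlib

def token(key: bytes) -> int:
    """Token for a row key: its MD5 digest interpreted as a big integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")

def ring_order(keys):
    """Order row keys the way they sit on the ring (by token, not by key)."""
    return sorted(keys, key=token)
```

For example, ring_order([b"a", b"b", b"c"]) comes back as [b"a", b"c", b"b"] — token order, not key order — which is why only OrderPreservingPartitioner gives key-sorted range scans.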
> > > 
> > > Thanks for your answer and for your time, I will take a look at this.
> > > 
> > > As for getColumnsFromRows; it should be returning you a map of lists. The 
> > > map is insertion-order-preserving and populated based on the provided 
> > > list of row keys (so if you iterate over the entries in the map they 
> > > should be in the same order as the list of row keys). 
> > > 
> > > mmm ... well it didn't happen like this. In my code I had a CF named 
> > > comments and also a CF called usercomments. UserComments uses a UUID as 
> > > row-key to keep, TimeUUID sorted, the "pointers" to the comments of the 
> > > user. When I get the sorted list of keys from the UserComments and I use 
> > > this list as row-keys-list in the GetColumnsFromRows I don't get back the 
> > > data sorted as I expect them to be. 
> > > It looks as if Cassandra/Pelops does not care how I provide the 
> > > row-keys-list. I am sure about that because I did something different: I 
> > > iterated over my row-keys-list and made many GetColumnFromRow calls instead of 
> > > one GetColumnsFromRows, and when I iterate the data is correctly sorted. But 
> > > this cannot be the solution ...
> > > 
> > > I am using Cassandra 0.6.9
> > > 
> > > Let me take advantage of your knowledge of Pelops to ask you something: I am 
> > > evaluating the migration to Cassandra 0.7 ... as far as you know, in 
> > > terms of code changes, is it a heavy job? 
> > > 
> > > Best Regards
> > > 
> > > Carlo
> > > 
> > > >  Messaggio originale
> > > >  Da: d...@reactive.org
> > > > 
> > > > On Saturday, 19 February 2011 at 8:16 AM, cbert...@libero.it wrote:
> > > > > Hi all,
> > > > > I created a CF in which i need to get, sorted by time, the Rows 
> > > > > inside. Each 
> > > > > Row represents a comment.
> > > > > 
> > > > > 
> > > > > 
> > > > > I've created a few rows using as Row Key a generated TimeUUID but 
> > > > > when I call 
> > > > > the Pelops method "GetColumnsFromRows" I don't get the data back as I 
> > > > > expect: 
> > > > > rows are not sorted by TimeUUID.
> > > > >  I thought it was probably because of the random part of the TimeUUID, so 
> > > > > I created 
> > > > > a new CF ...
> > > > > 
> > > > > 
> > > > > 
> > > > > This time I created a few rows using the Java 
> > > > > System.currentTimeMillis() method, which 
> > > > > returns a long. I call "GetColumnsFromRows" again and get the same 
> > > > > result: data is not sorted!
> > > > > I've read many times that Rows are sorted as specified in the 
> > > > > compareWith but 
> > > > > I can't see it. 
> > > > >  To solve this problem for the moment I've used a SuperColumnFamily 
> > > > > with an 
> > > > > UNIQUE ROW ... but I think this is just a workaround and not the 
> > > > > solution.
> > > > > 
> > > > > <ColumnFamily ... CompareSubcolumnsWith="BytesType"/>
> > > > > 
> > > > > Now when I call the "GetSuperColumnsFromRow" I get all the 
> > > > > SuperColumns as I 
> > > > > expected: sorted by TimeUUID. Why does the same not happen with 
> > > > > the Rows? 
> > > > >  I'm confused.
> > > > > 
> > > > > TIA for any help.
> > > > > 
> > > > > Best Regards
> > > > > 
> > > > > Carlo
> > > > > 
> > > > 
> > > > 
> > > 
> > > 
> 
> 


Re: Is Cassandra suitable for my problem?

2011-02-23 Thread Ritesh Tijoriwala
Hi Alexandru,
I feel Cassandra can certainly be used to solve the problem you have, but if
your requirements are not very strict, you need very high throughput, and it's
okay for you to lose some data occasionally due to a machine crash, then I
recommend you look at Redis (http://redis.io/). It is a high-performance
key/value store with very high throughput, often used for analytics.

thanks,
Ritesh

On Wed, Feb 23, 2011 at 12:45 PM, Alexandru Dan Sicoe <
sicoe.alexan...@googlemail.com> wrote:

> Hello,
>
> I'm currently doing my masters project. I need to store lots of time series
> data of any type (String, int, booleans, arrays of the previous) with a high
> writing rate (20 MBytes/sec -> 170 TBytes/year; note: not running continuously)
> but less strict read requirements. This is monitoring data from a vast
> distributed network. The queries will be something like: give me this data
> between Time1 and Time2.
>
> The hardware that I have available is between 2 and 5 hosts.
>
> Questions:
>
>Should I use Cassandra?
>
>Suggestions of how to structure the data? (I read
> Cloudkick's blog
> https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ but I
> found that it doesn't give too much detail)
>
>
> Any help is much appreciated,
>
> Alex
>


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Ritesh Tijoriwala
> Read repair will probably occur at that point (depending on your config),
which would cause the newest value to propagate to more replicas.

Is the newest value the "quorum" value which means it is the old value that
will be written back to the nodes having "newer non-quorum" value or the
newest value is the real new value? :) If the latter, then this seems kind of odd
to me and how it will be useful to any application. A bug?

Thanks,
Ritesh

On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell  wrote:

> Ritesh,
>
> You have seen the problem. Clients may read the newly written value even
> though the client performing the write saw it as a failure. When the client
> reads, it will use the correct number of replicas for the chosen CL, then
> return the newest value seen at any replica. This "newest value" could be
> the result of a failed write.
>
> Read repair will probably occur at that point (depending on your config),
> which would cause the newest value to propagate to more replicas.
>
> R+W>N guarantees serial order of operations: any read at CL=R that occurs
> after a write at CL=W will observe the write. I don't think this property is
> relevant to your current question, though.
>
> Cassandra has no mechanism to "roll back" the partial write, other than to
> simply write again. This may also fail.
>
> Best,
> Dave
>
>
> On Wed, Feb 23, 2011 at 10:12 AM,  wrote:
>
>> Hi Dave,
>> Thanks for your input. In the steps you mention, what happens when client
>> tries to read the value at step 6? Is it possible that the client may see
>> the new value? My understanding was if R + W > N, then client will not see
>> the new value as Quorum nodes will not agree on the new value. If that is
>> the case, then it's alright to return failure to the client. However, if not,
>> then it is difficult to program as after every failure, you as a client are
>> not sure if failure is a pseudo failure with some side effects or real
>> failure.
>>
>> Thanks,
>> Ritesh
>>
>> 
>>
>> Ritesh,
>>
>> There is no commit protocol. Writes may be persisted on some replicas even
>> though the quorum fails. Here's a sequence of events that shows the
>> "problem:"
>>
>> 1. Some replica R fails, but recently, so its failure has not yet been
>> detected
>> 2. A client writes with consistency > 1
>> 3. The write goes to all replicas, all replicas except R persist the write
>> to disk
>> 4. Replica R never responds
>> 5. Failure is returned to the client, but the new value is still in the
>> cluster, on all replicas except R.
>>
>> Something very similar could happen for CL QUORUM.
>>
>> This is a conscious design decision because a commit protocol would
>> constitute tight coupling between nodes, which goes against the Cassandra
>> philosophy. But unfortunately you do have to write your app with this case
>> in mind.
>>
>> Best,
>> Dave
>>
>> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <
>> tijoriwala.rit...@gmail.com> wrote:
>>
>> >
>> > Hi,
>> > I wanted to get details on how does cassandra do synchronous writes to W
>> > replicas (out of N)? Does it do a 2PC? If not, how does it deal with
>> > failures of nodes before it gets to write to W replicas? If the
>> > orchestrating node cannot write to W nodes successfully, I guess it will
>> > fail the write operation but what happens to the completed writes on X
>> > (W > X) nodes?
>> >
>> > Thanks,
>> > Ritesh
>> > --
>> > View this message in context:
>> >
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
>> at
>> > Nabble.com.
>> >
>>
>> 
>> Quoted from:
>>
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055408.html
>>
>
>


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Dave Revell
Ritesh,

You have seen the problem. Clients may read the newly written value even
though the client performing the write saw it as a failure. When the client
reads, it will use the correct number of replicas for the chosen CL, then
return the newest value seen at any replica. This "newest value" could be
the result of a failed write.

Read repair will probably occur at that point (depending on your config),
which would cause the newest value to propagate to more replicas.

R+W>N guarantees serial order of operations: any read at CL=R that occurs
after a write at CL=W will observe the write. I don't think this property is
relevant to your current question, though.

Cassandra has no mechanism to "roll back" the partial write, other than to
simply write again. This may also fail.
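The sequence above can be modeled in a few lines — a toy sketch, not Cassandra internals — showing how a write that fails its consistency level can still be returned by a later read, because reads take the newest timestamp among the replicas consulted:

```python
# Toy model (not Cassandra internals): a write persisted on some replicas is
# reported as failed when too few replicas ack, yet a later read that touches
# any of those replicas still observes it via newest-timestamp-wins.
def write(replicas, up, value, ts, required):
    acks = 0
    for name in replicas:
        if name in up:
            replicas[name] = (ts, value)  # persisted even if the CL fails
            acks += 1
    return acks >= required               # False => client sees a failure

def read(replicas, consulted):
    return max(replicas[name] for name in consulted)[1]  # newest wins
```

With three replicas holding (0, "old"), a write of "new" that reaches only A and B while the CL demands all three returns failure to the client, yet a subsequent read of A and B returns "new".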

Best,
Dave


On Wed, Feb 23, 2011 at 10:12 AM,  wrote:

> Hi Dave,
> Thanks for your input. In the steps you mention, what happens when client
> tries to read the value at step 6? Is it possible that the client may see
> the new value? My understanding was if R + W > N, then client will not see
> the new value as Quorum nodes will not agree on the new value. If that is
> the case, then it's alright to return failure to the client. However, if not,
> then it is difficult to program as after every failure, you as a client are
> not sure if failure is a pseudo failure with some side effects or real
> failure.
>
> Thanks,
> Ritesh
>
> 
> Ritesh,
>
> There is no commit protocol. Writes may be persisted on some replicas even
> though the quorum fails. Here's a sequence of events that shows the
> "problem:"
>
> 1. Some replica R fails, but recently, so its failure has not yet been
> detected
> 2. A client writes with consistency > 1
> 3. The write goes to all replicas, all replicas except R persist the write
> to disk
> 4. Replica R never responds
> 5. Failure is returned to the client, but the new value is still in the
> cluster, on all replicas except R.
>
> Something very similar could happen for CL QUORUM.
>
> This is a conscious design decision because a commit protocol would
> constitute tight coupling between nodes, which goes against the Cassandra
> philosophy. But unfortunately you do have to write your app with this case
> in mind.
>
> Best,
> Dave
>
> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <
> tijoriwala.rit...@gmail.com> wrote:
>
> >
> > Hi,
> > I wanted to get details on how does cassandra do synchronous writes to W
> > replicas (out of N)? Does it do a 2PC? If not, how does it deal with
> > failures of nodes before it gets to write to W replicas? If the
> > orchestrating node cannot write to W nodes successfully, I guess it will
> > fail the write operation but what happens to the completed writes on X
> > (W > X) nodes?
> >
> > Thanks,
> > Ritesh
> > --
> > View this message in context:
> >
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
> at
> > Nabble.com.
> >
>
> 
> Quoted from:
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055408.html
>


Is Cassandra suitable for my problem?

2011-02-23 Thread Alexandru Dan Sicoe
Hello,

I'm currently doing my masters project. I need to store lots of time series
data of any type (String, int, booleans, arrays of the previous) with a high
writing rate (20 MBytes/sec -> 170 TBytes/year; note: not running continuously)
but less strict read requirements. This is monitoring data from a vast
distributed network. The queries will be something like: give me this data
between Time1 and Time2.

The hardware that I have available is between 2 and 5 hosts.

Questions:

   Should I use Cassandra?

   Suggestions of how to structure the data? (I read
Cloudkick's blog
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ but I
found that it doesn't give too much detail)


Any help is much appreciated,

Alex


RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
So does Cassandra monitor the config file for changes? If it doesn't, how else 
would it know you had added a new seed unless you restart?

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Wednesday, February 23, 2011 3:23 PM
To: user@cassandra.apache.org
Cc: Truelove, Jeremy: IT (NYK)
Subject: Re: Multiple Seeds

On Wed, Feb 23, 2011 at 2:59 PM,   wrote:
> To add a host to the seeds list after it has had the data streamed to it I
> need to
>
>
>
> 1.   stop it
>
> 2.   edit the yaml file to
>
> a.   include it in the seeds list
>
> b.  set auto_bootstrap to false
>
> 3.    restart it
>
>
>
> correct? Additionally you would need to add it to the other nodes seed lists
> and restart them as well.
>
>
>
> From: Eric Gilmore [mailto:e...@datastax.com]
> Sent: Wednesday, February 23, 2011 2:47 PM
> To: user@cassandra.apache.org
> Subject: Re: Multiple Seeds
>
>
>
> Well -- when you first bring a node into a ring, you will probably want to
> stream data to it with auto_bootstrap: true.
>
> If you want that node to be a seed, then add it to the seeds list AFTER it
> has joined the ring.
>
> I'd refer you to the "Seed List" and "Autobootstrapping" sections of the
> Getting Started guide, which contain the following blurbs:
>
> There is no strict rule to determine which hosts need to be listed as seeds,
> but all nodes in a cluster need the same seed list. For a production
> deployment, DataStax recommends two seeds per data center.
>
> An autobootstrapping node cannot have itself in the list of seeds nor can it
> contain an initial_token already claimed by another node. To add new seeds,
> autobootstrap the nodes first, and then configure them as seeds.
>
>
>
>
>
>
>
> On Wed, Feb 23, 2011 at 11:39 AM, 
> wrote:
>
> So all seeds should always be set to 'auto_bootstrap: false' in their .yaml
> file.
>
> -Original Message-
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Wednesday, February 23, 2011 2:36 PM
> To: user@cassandra.apache.org
>
> Cc: Truelove, Jeremy: IT (NYK)
> Subject: Re: Multiple Seeds
>
> On Wed, Feb 23, 2011 at 2:30 PM,  
> wrote:
>> Yeah I set the tokens, I'm more asking if I start the first seed node with
>> autobootstrap set to false the second seed should have it set to true as
>> well as all the slave nodes correct? I didn't see this in the docs but I
>> may
>> have just missed it.
>>
>>
>>
>> From: Eric Gilmore [mailto:e...@datastax.com]
>> Sent: Wednesday, February 23, 2011 2:24 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Multiple Seeds
>>
>>
>>
>> The DataStax documentation offers some answers to those questions in the
>> Getting Started section and the Clustering reference docs.
>>
>> Autobootstrap should be true, but with the important caveat that
>> initial_token values should be specified.  Have a look at those docs, and
>> please give feedback on how helpful they are/aren't.
>>
>> Regards,
>>
>> Eric Gilmore
>>
>> On Wed, Feb 23, 2011 at 11:15 AM, 
>> wrote:
>>
>> What's the best way to bring multiple seeds up, should only one of them
>> have
>> auto bootstrap set to true or should neither of them? Should they list
>> themselves and the other seed in their seed section in the yaml config?
>>
>>
>>
>
> If a node is defined as a seed it will never auto bootstrap. After it
> has bootstrapped and has a system table you can set its yaml file as a
> seed if you wish.
>
>
>

RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
Also, should non-seed hosts be perpetually set to auto_bootstrap: true?

From: Truelove, Jeremy: IT (NYK)
Sent: Wednesday, February 23, 2011 3:00 PM
To: user@cassandra.apache.org
Subject: RE: Multiple Seeds

To add a host to the seeds list after it has had the data streamed to it I need 
to


1.   stop it

2.   edit the yaml file to

a.   include it in the seeds list

b.  set auto_bootstrap to false

3.   restart it

correct? Additionally you would need to add it to the other nodes seed lists 
and restart them as well.
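For reference, the resulting cassandra.yaml after step 2 would look roughly like this — an assumed 0.7-style fragment with placeholder addresses, not copied from any real cluster:

```yaml
# cassandra.yaml fragment (0.7-era keys; addresses are placeholders)
auto_bootstrap: false    # this node is now a seed, so it must not bootstrap
seeds:
    - 10.0.0.1           # existing seed
    - 10.0.0.2           # this node, newly added to the seed list
```

The same seeds list then has to be pushed to every other node's yaml, since all nodes in a cluster need the same seed list.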

From: Eric Gilmore [mailto:e...@datastax.com]
Sent: Wednesday, February 23, 2011 2:47 PM
To: user@cassandra.apache.org
Subject: Re: Multiple Seeds

Well -- when you first bring a node into a ring, you will probably want to 
stream data to it with auto_bootstrap: true.

If you want that node to be a seed, then add it to the seeds list AFTER it has 
joined the ring.

I'd refer you to the "Seed List" and "Autobootstrapping" sections of the Getting 
Started guide, which contain the following blurbs:

There is no strict rule to determine which hosts need to be listed as seeds, 
but all nodes in a cluster need the same seed list. For a production 
deployment, DataStax recommends two seeds per data center.

An autobootstrapping node cannot have itself in the list of seeds nor can it 
contain an initial_token already claimed by another node. To add new seeds, 
autobootstrap the nodes first, and then configure them as seeds.





On Wed, Feb 23, 2011 at 11:39 AM, <jeremy.truel...@barclayscapital.com> wrote:
So all seeds should always be set to 'auto_bootstrap: false' in their .yaml 
file.

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, February 23, 2011 2:36 PM
To: user@cassandra.apache.org
Cc: Truelove, Jeremy: IT (NYK)
Subject: Re: Multiple Seeds

On Wed, Feb 23, 2011 at 2:30 PM, <jeremy.truel...@barclayscapital.com> wrote:
> Yeah I set the tokens, I'm more asking if I start the first seed node with
> autobootstrap set to false the second seed should have it set to true as
> well as all the slave nodes correct? I didn't see this in the docs but I may
> have just missed it.
>
>
>
> From: Eric Gilmore [mailto:e...@datastax.com]
> Sent: Wednesday, February 23, 2011 2:24 PM
> To: user@cassandra.apache.org
> Subject: Re: Multiple Seeds
>
>
>
> The DataStax documentation offers some answers to those questions in the
> Getting Started section and the Clustering reference docs.
>
> Autobootstrap should be true, but with the important caveat that
> initial_token values should be specified.  Have a look at those docs, and
> please give feedback on how helpful they are/aren't.
>
> Regards,
>
> Eric Gilmore
>
> On Wed, Feb 23, 2011 at 11:15 AM, <jeremy.truel...@barclayscapital.com>
> wrote:
>
> What's the best way to bring multiple seeds up, should only one of them have
> auto bootstrap set to true or should neither of them? Should they list
> themselves and the other seed in their seed section in the yaml config?
>
>
>

If a node is defined as a seed it will never auto bootstrap. After it
has bootstrapped and has a system table, you can list it as a seed in
its yaml file if you wish.


Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 2:59 PM,   wrote:
> To add a host to the seeds list after it has had the data streamed to it I
> need to
>
>
>
> 1.   stop it
>
> 2.   edit the yaml file to
>
> a.   include it in the seeds list
>
> b.  set auto bootstrap to false
>
> 3.    restart it
>
>
>
> correct? Additionally you would need to add it to the other nodes' seed lists
> and restart them as well.
>
>
>
> From: Eric Gilmore [mailto:e...@datastax.com]
> Sent: Wednesday, February 23, 2011 2:47 PM
> To: user@cassandra.apache.org
> Subject: Re: Multiple Seeds
>
>
>
> Well -- when you first bring a node into a ring, you will probably want to
> stream data to it with auto_bootstrap: true.
>
> If you want that node to be a seed, then add it to the seeds list AFTER it
> has joined the ring.
>
> I'd refer you to the "Seed List" and "Autobootstrapping" sections of the
> Getting Started guide, which contain the following blurbs:
>
> There is no strict rule to determine which hosts need to be listed as seeds,
> but all nodes in a cluster need the same seed list. For a production
> deployment, DataStax recommends two seeds per data center.
>
> An autobootstrapping node cannot have itself in the list of seeds nor can it
> contain an initial_token already claimed by another node. To add new seeds,
> autobootstrap the nodes first, and then configure them as seeds.
>
>
>
>
>
>
>
> On Wed, Feb 23, 2011 at 11:39 AM, 
> wrote:
>
> So all seeds should always be set to 'auto_bootstrap: false' in their .yaml
> file.
>
> -Original Message-
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Wednesday, February 23, 2011 2:36 PM
> To: user@cassandra.apache.org
>
> Cc: Truelove, Jeremy: IT (NYK)
> Subject: Re: Multiple Seeds
>
> On Wed, Feb 23, 2011 at 2:30 PM,  
> wrote:
>> Yeah I set the tokens, I'm more asking if I start the first seed node with
>> autobootstrap set to false the second seed should have it set to true as
>> well as all the slave nodes correct? I didn't see this in the docs but I
>> may
>> have just missed it.
>>
>>
>>
>> From: Eric Gilmore [mailto:e...@datastax.com]
>> Sent: Wednesday, February 23, 2011 2:24 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Multiple Seeds
>>
>>
>>
>> The DataStax documentation offers some answers to those questions in the
>> Getting Started section and the Clustering reference docs.
>>
>> Autobootstrap should be true, but with the important caveat that
>> initial_token values should be specified.  Have a look at those docs, and
>> please give feedback on how helpful they are/aren't.
>>
>> Regards,
>>
>> Eric Gilmore
>>
>> On Wed, Feb 23, 2011 at 11:15 AM, 
>> wrote:
>>
>> What's the best way to bring multiple seeds up, should only one of them
>> have
>> auto bootstrap set to true or should neither of them? Should they list
>> themselves and the other seed in their seed section in the yaml config?
>>
>
> If a node is defined as a seed it will never auto bootstrap. After it
> has bootstrapped and has a system table, you can list it as a seed in
> its yaml file if you wish.
>
>
>

RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
To add a host to the seeds list after it has had the data streamed to it I need 
to


1.   stop it

2.   edit the yaml file to

a.   include it in the seeds list

b.  set auto bootstrap to false

3.   restart it

correct? Additionally you would need to add it to the other nodes' seed lists 
and restart them as well.
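Sketched as a cassandra.yaml fragment (0.7-era syntax; the hosts and token below are placeholders of mine, not values from this thread), the end state for the promoted node would look roughly like:

```yaml
auto_bootstrap: false    # node already holds its data; must not re-bootstrap
initial_token: 85070591730234615865843651857942052864   # placeholder token
seeds:
    - 10.0.0.1           # existing seed
    - 10.0.0.2           # this node, now added to every node's seed list
```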

From: Eric Gilmore [mailto:e...@datastax.com]
Sent: Wednesday, February 23, 2011 2:47 PM
To: user@cassandra.apache.org
Subject: Re: Multiple Seeds

Well -- when you first bring a node into a ring, you will probably want to 
stream data to it with auto_bootstrap: true.

If you want that node to be a seed, then add it to the seeds list AFTER it has 
joined the ring.

I'd refer you to the "Seed List" and "Autobootstrapping" sections of the Getting 
Started guide, which contain the following blurbs:

There is no strict rule to determine which hosts need to be listed as seeds, 
but all nodes in a cluster need the same seed list. For a production 
deployment, DataStax recommends two seeds per data center.

An autobootstrapping node cannot have itself in the list of seeds nor can it 
contain an initial_token already claimed by another node. To add new seeds, 
autobootstrap the nodes first, and then configure them as seeds.





On Wed, Feb 23, 2011 at 11:39 AM, <jeremy.truel...@barclayscapital.com> wrote:
So all seeds should always be set to 'auto_bootstrap: false' in their .yaml 
file.

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, February 23, 2011 2:36 PM
To: user@cassandra.apache.org
Cc: Truelove, Jeremy: IT (NYK)
Subject: Re: Multiple Seeds

On Wed, Feb 23, 2011 at 2:30 PM, <jeremy.truel...@barclayscapital.com> wrote:
> Yeah I set the tokens, I'm more asking if I start the first seed node with
> autobootstrap set to false the second seed should have it set to true as
> well as all the slave nodes correct? I didn't see this in the docs but I may
> have just missed it.
>
>
>
> From: Eric Gilmore [mailto:e...@datastax.com]
> Sent: Wednesday, February 23, 2011 2:24 PM
> To: user@cassandra.apache.org
> Subject: Re: Multiple Seeds
>
>
>
> The DataStax documentation offers some answers to those questions in the
> Getting Started section and the Clustering reference docs.
>
> Autobootstrap should be true, but with the important caveat that
> intial_token values should be specified.  Have a look at those docs, and
> please give feedback on how helpful they are/aren't.
>
> Regards,
>
> Eric Gilmore
>
> On Wed, Feb 23, 2011 at 11:15 AM, <jeremy.truel...@barclayscapital.com> wrote:
>
> What's the best way to bring multiple seeds up, should only one of them have
> auto bootstrap set to true or should neither of them? Should they list
> themselves and the other seed in their seed section in the yaml config?
>

If a node is defined as a seed it will never auto bootstrap. After it
has bootstrapped and has a system table, you can list it as a seed in
its yaml file if you wish.



Re: Splitting a single row into multiple

2011-02-23 Thread Aaron Morton
AFAIK performance in the single row case will be better. A multiget may require 
multiple seeks and reads in an SSTable, versus a single seek and read for a 
single row, multiplied by the number of SSTables that contain row data.

Using the key cache would reduce the seeks.

If it makes sense in your app, do it. In general, though, try to model data so a 
single row read gets what you need.

Aaron

On 24/02/2011, at 5:59 AM, Aditya Narayan  wrote:

> Does it make any difference if I split a row, that needs to be
> accessed together, into two or three rows and then read those multiple
> rows ??
> (Assume the keys of all the three rows are known to me programatically
> since I split columns by certain categories).
> Would the performance be any better if all the three were just a single row ??
> 
> I guess the performance should be the same in both cases; the columns
> remain the same in quantity & they're spread into several SSTable files.
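Aaron's seek argument can be put in rough numbers. The model below is my own illustration, not Cassandra code: assume each row read costs about one seek per SSTable holding data for that row, ignoring the key cache and OS page cache.

```python
# Rough cost model for Aaron's point (illustrative only): each row read
# costs ~1 seek per SSTable that holds data for the requested row.

def worst_case_seeks(rows_read, sstables_per_row):
    """Worst-case disk seeks for a read, ignoring caches."""
    return rows_read * sstables_per_row

one_wide_row = worst_case_seeks(1, 3)   # all columns in a single row
three_rows   = worst_case_seeks(3, 3)   # same columns split across 3 rows

assert one_wide_row == 3
assert three_rows == 9   # a multiget can cost several times more seeks
```

Under this model, splitting one row into k rows multiplies the worst-case seek count by k, which is why modeling data for single-row reads is the general advice.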


Re: Multiple Seeds

2011-02-23 Thread Eric Gilmore
Well -- when you first bring a node into a ring, you will probably want to
stream data to it with auto_bootstrap: true.

If you want that node to be a seed, then add it to the seeds list AFTER it
has joined the ring.

I'd refer you to the "Seed List" and "Autobootstrapping" sections of the
Getting Started guide, which contain the following blurbs:

*There is no strict rule to determine which hosts need to be listed as
seeds, but all nodes in a cluster need the same seed list. For a production
deployment, DataStax recommends two seeds per data center.*

*An autobootstrapping node cannot have itself in the list of seeds nor can
it contain an initial_token already claimed by another node. To add new
seeds, autobootstrap the nodes first, and then configure them as seeds.*





On Wed, Feb 23, 2011 at 11:39 AM, wrote:

> So all seeds should always be set to 'auto_bootstrap: false' in their .yaml
> file.
>
> -Original Message-
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Wednesday, February 23, 2011 2:36 PM
> To: user@cassandra.apache.org
> Cc: Truelove, Jeremy: IT (NYK)
> Subject: Re: Multiple Seeds
>
> On Wed, Feb 23, 2011 at 2:30 PM,  
> wrote:
> > Yeah I set the tokens, I'm more asking if I start the first seed node
> with
> > autobootstrap set to false the second seed should have it set to true as
> > well as all the slave nodes correct? I didn't see this in the docs but I
> may
> > have just missed it.
> >
> >
> >
> > From: Eric Gilmore [mailto:e...@datastax.com]
> > Sent: Wednesday, February 23, 2011 2:24 PM
> > To: user@cassandra.apache.org
> > Subject: Re: Multiple Seeds
> >
> >
> >
> > The DataStax documentation offers some answers to those questions in the
> > Getting Started section and the Clustering reference docs.
> >
> > Autobootstrap should be true, but with the important caveat that
> > initial_token values should be specified.  Have a look at those docs, and
> > please give feedback on how helpful they are/aren't.
> >
> > Regards,
> >
> > Eric Gilmore
> >
> > On Wed, Feb 23, 2011 at 11:15 AM, 
> > wrote:
> >
> > What's the best way to bring multiple seeds up, should only one of them
> have
> > auto bootstrap set to true or should neither of them? Should they list
> > themselves and the other seed in their seed section in the yaml config?
> >
>
> If a node is defined as a seed it will never auto bootstrap. After it
> has bootstrapped and has a system table, you can list it as a seed in
> its yaml file if you wish.
>


Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Aaron Morton
At CL levels higher than ANY, hinted handoff will be used if enabled. It does not 
contribute to the number of replicas considered written by the coordinator, 
though. E.g. if you ask for QUORUM, and this is 3 nodes, and only 2 are up, the 
write will fail without starting. In this case the HH is included in the 
message sent to one of the up nodes.

At CL ANY, a HH is accepted as a viable replica. Even if all the natural 
endpoints are down, the coordinator node will store the HH.

aaron
On 24/02/2011, at 3:28 AM, Javier Canillas  wrote:
> There is something called Hinted Handoff. Suppose that you WRITE something with 
> ConsistencyLevel.ONE on a cluster of 4 nodes. Then the write is done 
> on the corresponding node and an OK is returned to the client, even if the 
> ReplicationFactor of the destination Keyspace is set to a higher value.
> 
> If in that write one of the replica nodes is down, the coordinator 
> node (the one that would hold the value in the first place) will mark that 
> replication message as not sent and will retry eventually, making the 
> replication happen.
> 
> Please, if I have explained it wrongly correct me. 
> 
> On Wed, Feb 23, 2011 at 5:45 AM, Aaron Morton  wrote:
> In the case described below if less than CL nodes respond in rpc_timeout 
> (from conf yaml) the client will get a timeout error. I think most higher 
> level clients will automatically retry in this case.
> 
> If there are not enough nodes to start the request you will get an 
> Unavailable exception. Again the client can retry safely.
> 
> Aaron
> 
> 
> On 23/02/2011, at 8:07 PM, Dave Revell  wrote:
> 
>> Ritesh,
>> 
>> There is no commit protocol. Writes may be persisted on some replicas even 
>> though the quorum fails. Here's a sequence of events that shows the 
>> "problem:"
>> 
>> 1. Some replica R fails, but recently, so its failure has not yet been 
>> detected
>> 2. A client writes with consistency > 1
>> 3. The write goes to all replicas, all replicas except R persist the write 
>> to disk
>> 4. Replica R never responds
>> 5. Failure is returned to the client, but the new value is still in the 
>> cluster, on all replicas except R.
>> 
>> Something very similar could happen for CL QUORUM.
>> 
>> This is a conscious design decision because a commit protocol would 
>> constitute tight coupling between nodes, which goes against the Cassandra 
>> philosophy. But unfortunately you do have to write your app with this case 
>> in mind.
>> 
>> Best,
>> Dave
>> 
>> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh 
>>  wrote:
>> 
>> Hi,
>> I wanted to get details on how does cassandra do synchronous writes to W
>> replicas (out of N)? Does it do a 2PC? If not, how does it deal with
>> failures of of nodes before it gets to write to W replicas? If the
>> orchestrating node cannot write to W nodes successfully, I guess it will
>> fail the write operation but what happens to the completed writes on X (W >
>> X) nodes?
>> 
>> Thanks,
>> Ritesh
>> --
>> View this message in context: 
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
>> Nabble.com.
>> 
> 
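Dave's sequence of events can be sketched as a toy simulation (my own illustration with invented names, not Cassandra's actual write path): the coordinator counts acks and reports failure below the required count, even though the responding replicas have already persisted the value.

```python
# Toy model of the failure sequence described above: a write that misses
# its consistency level can still persist on the replicas that responded.

def write(replicas, up, required):
    """Send a write to every replica; only 'up' nodes persist and ack.

    Returns (succeeded, replicas_that_persisted)."""
    persisted = {r for r in replicas if r in up}   # down nodes never persist
    return len(persisted) >= required, persisted

replicas = {"A", "B", "C"}                               # RF = 3
ok, stored = write(replicas, up={"A", "B"}, required=3)  # CL.ALL, C is down

assert not ok                  # the client is told the write failed...
assert stored == {"A", "B"}    # ...yet the value now lives on A and B
```

This is the "no commit protocol" point: there is no rollback of the writes that did land, so the app must tolerate a failed write having partially taken effect.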


RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
So all seeds should always be set to 'auto_bootstrap: false' in their .yaml 
file.

-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Wednesday, February 23, 2011 2:36 PM
To: user@cassandra.apache.org
Cc: Truelove, Jeremy: IT (NYK)
Subject: Re: Multiple Seeds

On Wed, Feb 23, 2011 at 2:30 PM,   wrote:
> Yeah I set the tokens, I'm more asking if I start the first seed node with
> autobootstrap set to false the second seed should have it set to true as
> well as all the slave nodes correct? I didn't see this in the docs but I may
> have just missed it.
>
>
>
> From: Eric Gilmore [mailto:e...@datastax.com]
> Sent: Wednesday, February 23, 2011 2:24 PM
> To: user@cassandra.apache.org
> Subject: Re: Multiple Seeds
>
>
>
> The DataStax documentation offers some answers to those questions in the
> Getting Started section and the Clustering reference docs.
>
> Autobootstrap should be true, but with the important caveat that
> initial_token values should be specified.  Have a look at those docs, and
> please give feedback on how helpful they are/aren't.
>
> Regards,
>
> Eric Gilmore
>
> On Wed, Feb 23, 2011 at 11:15 AM, 
> wrote:
>
> What's the best way to bring multiple seeds up, should only one of them have
> auto bootstrap set to true or should neither of them? Should they list
> themselves and the other seed in their seed section in the yaml config?
>

If a node is defined as a seed it will never auto bootstrap. After it
has bootstrapped and has a system table, you can list it as a seed in
its yaml file if you wish.


Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 2:30 PM,   wrote:
> Yeah I set the tokens, I’m more asking if I start the first seed node with
> autobootstrap set to false the second seed should have it set to true as
> well as all the slave nodes correct? I didn’t see this in the docs but I may
> have just missed it.
>
>
>
> From: Eric Gilmore [mailto:e...@datastax.com]
> Sent: Wednesday, February 23, 2011 2:24 PM
> To: user@cassandra.apache.org
> Subject: Re: Multiple Seeds
>
>
>
> The DataStax documentation offers some answers to those questions in the
> Getting Started section and the Clustering reference docs.
>
> Autobootstrap should be true, but with the important caveat that
> initial_token values should be specified.  Have a look at those docs, and
> please give feedback on how helpful they are/aren't.
>
> Regards,
>
> Eric Gilmore
>
> On Wed, Feb 23, 2011 at 11:15 AM, 
> wrote:
>
> What’s the best way to bring multiple seeds up, should only one of them have
> auto bootstrap set to true or should neither of them? Should they list
> themselves and the other seed in their seed section in the yaml config?
>

If a node is defined as a seed it will never auto bootstrap. After it
has bootstrapped and has a system table, you can list it as a seed in
its yaml file if you wish.


RE: Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
Yeah I set the tokens, I'm more asking if I start the first seed node with 
autobootstrap set to false the second seed should have it set to true as well 
as all the slave nodes correct? I didn't see this in the docs but I may have 
just missed it.

From: Eric Gilmore [mailto:e...@datastax.com]
Sent: Wednesday, February 23, 2011 2:24 PM
To: user@cassandra.apache.org
Subject: Re: Multiple Seeds

The DataStax documentation offers some answers to those questions in the 
Getting Started section and the Clustering reference docs.

Autobootstrap should be true, but with the important caveat that initial_token 
values should be specified.  Have a look at those docs, and please give 
feedback on how helpful they are/aren't.

Regards,

Eric Gilmore

On Wed, Feb 23, 2011 at 11:15 AM, <jeremy.truel...@barclayscapital.com> wrote:
What's the best way to bring multiple seeds up, should only one of them have 
auto bootstrap set to true or should neither of them? Should they list 
themselves and the other seed in their seed section in the yaml config?



Re: Multiple Seeds

2011-02-23 Thread Eric Gilmore
The DataStax documentation offers some answers to those questions in
the Getting Started section and the Clustering reference docs.

Autobootstrap should be true, but with the important caveat that
initial_token values should be specified.  Have a look at those docs, and
please give feedback on how helpful they are/aren't.
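For the RandomPartitioner (token space 0 to 2**127), the evenly spaced initial_token values mentioned here are commonly computed as i * 2**127 / N for node i of N; a sketch of that formula (my own, not code from the docs):

```python
# Evenly spaced initial_token values for the RandomPartitioner, whose
# token space is [0, 2**127). One token per node, assigned in ring order.

def initial_tokens(node_count):
    return [i * 2**127 // node_count for i in range(node_count)]

tokens = initial_tokens(4)
assert tokens[0] == 0
assert tokens[2] == 2**126        # halfway around the ring
assert len(set(tokens)) == 4      # every node gets a distinct token
```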

Regards,

Eric Gilmore


On Wed, Feb 23, 2011 at 11:15 AM, wrote:

> What’s the best way to bring multiple seeds up, should only one of them
> have auto bootstrap set to true or should neither of them? Should they list
> themselves and the other seed in their seed section in the yaml config?
>


Multiple Seeds

2011-02-23 Thread Jeremy.Truelove
What's the best way to bring multiple seeds up, should only one of them have 
auto bootstrap set to true or should neither of them? Should they list 
themselves and the other seed in their seed section in the yaml config?



Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Peter Fales
I posted on this topic last September.   (See
http://www.mail-archive.com/user@cassandra.apache.org/msg05692.html)

I was able to use Cassandra across EC2 regions.  However, the trick is 
that you must use the "external" addresses in your storage-conf.xml, 
but since you don't have a NIC that can actually bind to those addresses, you
need to listen on the "internal" addresses (or, more simply, all interfaces).

At the time, I was  not able to get the cross-region cluster to work
without making changes to the Cassandra code.  Perhaps things have
evolved so that there are other ways to do it now.

On Wed, Feb 23, 2011 at 11:31:05AM -0600, Dave Viner wrote:
> 
>internal EC2 ips (10.xxx.xxx.xxx) work across availability zones
>(e.g., from us-east-1a to us-east-1b) but do not work across regions
>(e.g., us-east to us-west).  To do regions, you must use the public ip
>address assigned by amazon.
> 
>Himanshi, when you log into 1 node, and telnet to port 7000 on the
>other node, which IP address did you use - the 10.x address or the
>public ip address?
> 
>And what is the seed/non-seed configuration in both cassandra.yaml
>files?
> 
>Dave Viner
> 
>On Wed, Feb 23, 2011 at 8:12 AM, Frank LoVecchio
><[1]fr...@isidorey.com> wrote:
> 
>  The internal Amazon IP address is what you will want to use so you
>  don't have to go through DNS anyways; not sure if this works from
>  US-East to US-West, but it does make things quicker in between
>  zones, e.g. us-east-1a to us-east-1b.
> 
>On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner <[2]davevi...@gmail.com>
>wrote:
> 
>  Try using the IP address, not the dns name in the cassandra.yaml.
> 
>If you can telnet from one to the other on port 7000, and both nodes
>have the other node in their config, it should work.
> 
>Dave Viner
> 
>On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma
><[3]himanshi.sha...@tcs.com> wrote:
> 
>  Ya they do. Have specified the Public DNS in the seed field of each node in
>  cassandra.yaml... not able to figure out what the problem is ???
> 
>From: Sasha Dolgy <[4]sdo...@gmail.com>
>To: [5]user@cassandra.apache.org
>Date: 02/23/2011 02:56 PM
>Subject: Re: Cassandra nodes on EC2 in two different regions not
>communicating
>__
> 
>  did you define the other host in the cassandra.yaml ?  on both
>  servers  they need to know about each other
>  On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma
>  <[6]himanshi.sha...@tcs.com> wrote:
>  Thanks Dave but I am able to telnet to other instances on port 7000
>  and when i run  ./nodetool --host
>  ec2-50-18-60-117.us-west-1.compute.amazonaws.com  ring... I can
>  see only one node.
>  Do we need to configure anything else in Cassandra.yaml or
>  Cassandra-env.sh ???
> 
>From: Dave Viner <[8]davevi...@gmail.com>
>To: [9]user@cassandra.apache.org
>Cc: Himanshi Sharma <[10]himanshi.sha...@tcs.com>
>Date: 02/23/2011 11:36 AM
>Subject: Re: Cassandra nodes on EC2 in two different regions not
>communicating
>__
> 
>  If you login to one of the nodes, can you telnet to port 7000 on
>  the other node?
>  If not, then almost certainly it's a firewall/Security Group issue.
>  You can find out the security groups for any node by logging in,
>  and then running:
>  % curl
>  "[11]http://169.254.169.254/latest/meta-data/security-groups";
>  Assuming that both nodes are in the same security group, ensure
>  that the SG is configured to allow other members of the SG to
>  communicate on port 7000 to each other.
>  HTH,
>  Dave Viner
>  On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma
>  <[12]himanshi.sha...@tcs.com> wrote:
>  Hi,
>  I am new to Cassandra. I m running Cassandra on EC2. I configured
>  Cassandra cluster on two instances in different regions.
>  But when I am trying the nodetool command with ring option, I am
>  getting only single node.
>  How to make these two nodes communicate with each other. I have
>  already opened required ports. i.e 7000, 8080, 9160 in respective
>  security groups. Plz help me with this.
>  Regards,
>  Himanshi Sharma
>  =-=-=
>  Notice: The information contained in this e-mail
>  message and/or attachments to it may contain
>  confidential or privileged information. If you are
>  not the intended recipient, any dissemination, use,
>  review, distribution, printing or copying of the
>  information contained in this e-mail message
>  and/or attachments to it are strictly prohibited. If
>  you have received this communication in error,
>  please notify us by reply e-mail or telephone and
>  immediately and permanently 

Re: Migrate from 0.6.5 to 0.7.2

2011-02-23 Thread Jonathan Ellis
On Wed, Feb 23, 2011 at 11:18 AM, Zhong Li  wrote:
> Hi all,
>
> We want migrate from version 0.6.5 to version 0.7.2. Is there step by step
>  guide or document  we can follow?

NEWS.txt

> Also there is new branch cassandra-0.7.2 on svn, what is purpose to create
> the new branch instead of one branch cassandra-0.7? Will you maintain both
> branches?

We create temporary branches during the release process in case we
need to abort the release and retry with bug fixes.  They are removed
eventually.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Dave Viner
Internal EC2 IPs (10.xxx.xxx.xxx) work across availability zones (e.g., from
us-east-1a to us-east-1b) but do not work across regions (e.g., us-east to
us-west).  To go across regions, you must use the public IP address assigned
by Amazon.

Himanshi, when you log into one node and telnet to port 7000 on the other
node, which IP address did you use - the 10.x address or the public IP
address?
And what is the seed/non-seed configuration in both cassandra.yaml files?

Dave Viner


On Wed, Feb 23, 2011 at 8:12 AM, Frank LoVecchio  wrote:

> The internal Amazon IP address is what you will want to use so you don't
> have to go through DNS anyways; not sure if this works from US-East to
> US-West, but it does make things quicker in between zones, e.g. us-east-1a
> to us-east-1b.
>
>
> On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner  wrote:
>
>> Try using the IP address, not the dns name in the cassandra.yaml.
>>
>> If you can telnet from one to the other on port 7000, and both nodes have
>> the other node in their config, it should work.
>>
>>  Dave Viner
>>
>>
>> On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma > > wrote:
>>
>>>
>>> Ya they do. Have specified Public DNS in seed field of each node in
>>> Cassandra.yaml...nt able to figure out what the problem is ???
>>>
>>>
>>>
>>>
>>>  From: Sasha Dolgy  To: user@cassandra.apache.org
>>> Date: 02/23/2011 02:56 PM Subject: Re: Cassandra nodes on EC2 in two
>>> different regions not communicating
>>> --
>>>
>>>
>>>
>>> did you define the other host in the cassandra.yaml ?  on both servers
>>>  they need to know about each other
>>>
>>> On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma <*
>>> himanshi.sha...@tcs.com* > wrote:
>>>
>>> Thanks Dave but I am able to telnet to other instances on port 7000
>>> and when i run  ./nodetool --host *
>>> ec2-50-18-60-117.us-west-1.compute.amazonaws.com*
>>>  ring... I can see only one node.
>>>
>>> Do we need to configure anything else in Cassandra.yaml or
>>> Cassandra-env.sh ???
>>>
>>>
>>>
>>>
>>>
>>>
>>>   From: Dave Viner <*davevi...@gmail.com* >  To: *
>>> user@cassandra.apache.org*   Cc: Himanshi
>>> Sharma <*himanshi.sha...@tcs.com* >  Date: 
>>> 02/23/2011
>>> 11:36 AM  Subject: Re: Cassandra nodes on EC2 in two different regions
>>> not communicating
>>>
>>>  --
>>>
>>>
>>>
>>> If you login to one of the nodes, can you telnet to port 7000 on the
>>> other node?
>>>
>>> If not, then almost certainly it's a firewall/Security Group issue.
>>>
>>> You can find out the security groups for any node by logging in, and then
>>> running:
>>>
>>> % curl "http://169.254.169.254/latest/meta-data/security-groups"
>>>
>>>
>>> Assuming that both nodes are in the same security group, ensure that the
>>> SG is configured to allow other members of the SG to communicate on port
>>> 7000 to each other.
>>>
>>> HTH,
>>> Dave Viner
>>>
>>>
>>> On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma <*
>>> himanshi.sha...@tcs.com* > wrote:
>>>
>>> Hi,
>>>
>>> I am new to Cassandra. I m running Cassandra on EC2. I configured
>>> Cassandra cluster on two instances in different regions.
>>> But when I am trying the nodetool command with ring option, I am getting
>>> only single node.
>>>
>>> How to make these two nodes communicate with each other. I have already
>>> opened required ports. i.e 7000, 8080, 9160 in respective
>>> security groups. Plz help me with this.
>>>
>>> Regards,
>>> Himanshi Sharma
>>>
>>>
>>> =-=-=
>>> Notice: The information contained in this e-mail
>>> message and/or attachments to it may contain
>>> confidential or privileged information. If you are
>>>
>>> not the intended recipient, any dissemination, use,
>>> review, distribution, printing or copying of the
>>> information contained in this e-mail message
>>> and/or attachments to it are strictly prohibited. If
>>> you have received this communication in error,
>>>
>>> please notify us by reply e-mail or telephone and
>>> immediately and permanently delete the message
>>> and any attachments. Thank you
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sasha Dolgy*
>>> **sasha.do...@gmail.com* 
>>>

Re: Does Cassandra use vector clocks

2011-02-23 Thread Jonathan Ellis
On Wed, Feb 23, 2011 at 9:57 AM, Oleg Anastasyev  wrote:
> From the other hand, the same article says:
> "For conditional writes to work, the condition must be evaluated at all update
> sites before the write can be allowed to succeed."
>
> This means, that when doing such an update CL=ALL must be used.

Exactly.

> This could be a
> bearable price to pay for conflict detection.

IMO if you only get CL.ALL it's not superior enough to pessimistic
locking to justify the complexity of adding it.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Migrate from 0.6.5 to 0.7.2

2011-02-23 Thread Zhong Li

Hi all,

We want to migrate from version 0.6.5 to version 0.7.2. Is there a
step-by-step guide or document we can follow?


Also, there is a new branch cassandra-0.7.2 on svn. What is the purpose of
creating the new branch instead of the one branch cassandra-0.7? Will you
maintain both branches?


Thanks,

Zhong Li


Splitting a single row into multiple

2011-02-23 Thread Aditya Narayan
Does it make any difference if I split a row that needs to be
accessed together into two or three rows and then read those multiple
rows?
(Assume the keys of all three rows are known to me programmatically,
since I split columns by certain categories.)
Would the performance be any better if all three were just a single row?

I guess the performance should be the same in both cases: the columns
remain the same in quantity, and they're spread into several SSTable files.


Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Frank LoVecchio
The internal Amazon IP address is what you will want to use so you don't
have to go through DNS anyway; I'm not sure if this works from US-East to
US-West, but it does make things quicker between zones, e.g. us-east-1a
to us-east-1b.

On Wed, Feb 23, 2011 at 9:09 AM, Dave Viner  wrote:

> Try using the IP address, not the dns name in the cassandra.yaml.
>
> If you can telnet from one to the other on port 7000, and both nodes have
> the other node in their config, it should work.
>
> Dave Viner
>
>
> On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma 
> wrote:
>
>>
>> Ya they do. Have specified Public DNS in seed field of each node in
>> Cassandra.yaml...nt able to figure out what the problem is ???
>>
>>
>>
>>
>>  From: Sasha Dolgy  To: user@cassandra.apache.org Date: 
>> 02/23/2011
>> 02:56 PM Subject: Re: Cassandra nodes on EC2 in two different regions not
>> communicating
>> --
>>
>>
>>
>> did you define the other host in the cassandra.yaml ?  on both servers
>>  they need to know about each other
>>
>> On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma <*
>> himanshi.sha...@tcs.com* > wrote:
>>
>> Thanks Dave but I am able to telnet to other instances on port 7000
>> and when i run  ./nodetool --host *
>> ec2-50-18-60-117.us-west-1.compute.amazonaws.com*
>>  ring... I can see only one node.
>>
>> Do we need to configure anything else in Cassandra.yaml or
>> Cassandra-env.sh ???
>>
>>
>>
>>
>>
>>
>>   From: Dave Viner <*davevi...@gmail.com* >  To: *
>> user@cassandra.apache.org*   Cc: Himanshi
>> Sharma <*himanshi.sha...@tcs.com* >  Date: 
>> 02/23/2011
>> 11:36 AM  Subject: Re: Cassandra nodes on EC2 in two different regions
>> not communicating
>>
>>  --
>>
>>
>>
>> If you login to one of the nodes, can you telnet to port 7000 on the other
>> node?
>>
>> If not, then almost certainly it's a firewall/Security Group issue.
>>
>> You can find out the security groups for any node by logging in, and then
>> running:
>>
>> % curl "http://169.254.169.254/latest/meta-data/security-groups"
>>
>>
>> Assuming that both nodes are in the same security group, ensure that the
>> SG is configured to allow other members of the SG to communicate on port
>> 7000 to each other.
>>
>> HTH,
>> Dave Viner
>>
>>
>> On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma <*
>> himanshi.sha...@tcs.com* > wrote:
>>
>> Hi,
>>
>> I am new to Cassandra. I m running Cassandra on EC2. I configured
>> Cassandra cluster on two instances in different regions.
>> But when I am trying the nodetool command with ring option, I am getting
>> only single node.
>>
>> How to make these two nodes communicate with each other. I have already
>> opened required ports. i.e 7000, 8080, 9160 in respective
>> security groups. Plz help me with this.
>>
>> Regards,
>> Himanshi Sharma
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sasha Dolgy*
>> **sasha.do...@gmail.com* 
>>
>>
>>
>>
>


-- 
Frank LoVecchio
Senior Software Engineer | Isidorey, LLC
Google Voice +1.720.295.9179
isidorey.com | facebook.com/franklovecchio | franklovecchio.com


Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Dave Viner
Try using the IP address, not the dns name in the cassandra.yaml.

If you can telnet from one to the other on port 7000, and both nodes have
the other node in their config, it should work.

Dave Viner


On Wed, Feb 23, 2011 at 1:43 AM, Himanshi Sharma wrote:

>
> Ya they do. Have specified Public DNS in seed field of each node in
> Cassandra.yaml...nt able to figure out what the problem is ???
>
>
>
>
>  From: Sasha Dolgy  To: user@cassandra.apache.org Date: 
> 02/23/2011
> 02:56 PM Subject: Re: Cassandra nodes on EC2 in two different regions not
> communicating
> --
>
>
>
> did you define the other host in the cassandra.yaml ?  on both servers 
> they need to know about each other
>
> On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma <*
> himanshi.sha...@tcs.com* > wrote:
>
> Thanks Dave but I am able to telnet to other instances on port 7000
> and when i run  ./nodetool --host *
> ec2-50-18-60-117.us-west-1.compute.amazonaws.com*
>  ring... I can see only one node.
>
> Do we need to configure anything else in Cassandra.yaml or Cassandra-env.sh
> ???
>
>
>
>
>
>
>   From: Dave Viner <*davevi...@gmail.com* >  To: *
> user@cassandra.apache.org*   Cc: Himanshi
> Sharma <*himanshi.sha...@tcs.com* >  Date: 02/23/2011
> 11:36 AM  Subject: Re: Cassandra nodes on EC2 in two different regions not
> communicating
>
>  --
>
>
>
> If you login to one of the nodes, can you telnet to port 7000 on the other
> node?
>
> If not, then almost certainly it's a firewall/Security Group issue.
>
> You can find out the security groups for any node by logging in, and then
> running:
>
> % curl "http://169.254.169.254/latest/meta-data/security-groups"
>
>
> Assuming that both nodes are in the same security group, ensure that the SG
> is configured to allow other members of the SG to communicate on port 7000
> to each other.
>
> HTH,
> Dave Viner
>
>
> On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma <*himanshi.sha...@tcs.com
> * > wrote:
>
> Hi,
>
> I am new to Cassandra. I m running Cassandra on EC2. I configured Cassandra
> cluster on two instances in different regions.
> But when I am trying the nodetool command with ring option, I am getting
> only single node.
>
> How to make these two nodes communicate with each other. I have already
> opened required ports. i.e 7000, 8080, 9160 in respective
> security groups. Plz help me with this.
>
> Regards,
> Himanshi Sharma
>
>
>
>
>
>
>
> --
> Sasha Dolgy*
> **sasha.do...@gmail.com* 
>
>
>
>


Re: Does Cassandra use vector clocks

2011-02-23 Thread Oleg Anastasyev
> From the article I linked:
> 
> "But wait, some might say, you can avoid all this by using vectors in
> a different way – to prevent update conflicts by issuing conditional
> writes which specify a version (vector) and only succeed if that
> version is still current. Sorry, but no, or at least not generally. In
> a partition-tolerant system, nodes on each side of a partition may
> simultaneously accept conflicting writes against the same initial
> version, and you’ll still have to resolve the conflict when you
> resolve the partition."
> 

On the other hand, the same article says:
"For conditional writes to work, the condition must be evaluated at all update
sites before the write can be allowed to succeed."

This means that when doing such an update, CL=ALL must be used. This could be
a bearable price to pay for conflict detection. Of course, for Cassandra this
could be not so easy to implement because, AFAIK, it performs conflict
resolution during reads and compactions, not during writes, like, e.g.,
Voldemort does.
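The CL=ALL requirement can be shown with a small sketch (a hypothetical model with invented names, not a real Cassandra API). Each replica stores key -> (value, version); the condition is evaluated at *all* update sites before the write is allowed to succeed:

```python
# Hypothetical sketch of a version-conditioned write (not Cassandra code).
# Every replica must agree that the expected version is still current,
# which is exactly why this only works at CL=ALL.
def conditional_write(replicas, key, expected_version, value):
    # Evaluate the condition at all update sites first.
    if any(r[key][1] != expected_version for r in replicas):
        return False  # a concurrent write bumped the version: reject
    # Only then apply the update everywhere, advancing the version.
    for r in replicas:
        r[key] = (value, expected_version + 1)
    return True
```

If even one replica is unreachable, the condition cannot be evaluated there and the write cannot safely proceed, which is the latency cost discussed above.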




Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Javier Canillas
There is something called Hinted Handoff. Suppose that you WRITE something
with ConsistencyLevel.ONE on a cluster of 4 nodes. The write is done on the
corresponding node and an OK is returned to the client, even if the
replication factor of the destination keyspace is set to a higher value.

If, during that write, one of the replica nodes is down, then the coordinator
node (the one that would hold the value in the first place) will mark that
replication message as not sent and will eventually retry, making the
replication happen.
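A toy sketch of that flow (illustrative names only, not Cassandra's actual implementation): the coordinator acks a CL.ONE write as soon as one replica persists it, stores a hint for any down replica, and replays hints later.

```python
# Minimal hinted-handoff sketch (invented names, not Cassandra internals).
class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas  # name -> dict store, or None if node is down
        self.hints = []           # (replica, key, value) waiting to be replayed

    def write(self, key, value):
        acked = 0
        for name, store in self.replicas.items():
            if store is None:
                self.hints.append((name, key, value))  # node down: keep a hint
            else:
                store[key] = value
                acked += 1
        return acked >= 1  # CL.ONE: one successful replica write is enough

    def replay_hints(self):
        remaining = []
        for name, key, value in self.hints:
            store = self.replicas[name]
            if store is None:
                remaining.append((name, key, value))  # still down: retry later
            else:
                store[key] = value                    # replication happens
        self.hints = remaining
```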

Please, if I have explained it wrongly correct me.

On Wed, Feb 23, 2011 at 5:45 AM, Aaron Morton wrote:

> In the case described below if less than CL nodes respond in rpc_timeout
> (from conf yaml) the client will get a timeout error. I think most higher
> level clients will automatically retry in this case.
>
> If there are not enough nodes to start the request you will get an
> Unavailable exception. Again the client can retry safely.
>
> Aaron
>
>
> On 23/02/2011, at 8:07 PM, Dave Revell  wrote:
>
> Ritesh,
>
> There is no commit protocol. Writes may be persisted on some replicas even
> though the quorum fails. Here's a sequence of events that shows the
> "problem:"
>
> 1. Some replica R fails, but recently, so its failure has not yet been
> detected
> 2. A client writes with consistency > 1
> 3. The write goes to all replicas, all replicas except R persist the write
> to disk
> 4. Replica R never responds
> 5. Failure is returned to the client, but the new value is still in the
> cluster, on all replicas except R.
>
> Something very similar could happen for CL QUORUM.
>
> This is a conscious design decision because a commit protocol would
> constitute tight coupling between nodes, which goes against the Cassandra
> philosophy. But unfortunately you do have to write your app with this case
> in mind.
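The five steps above reduce to a few lines (a schematic sketch, not Cassandra internals): the client is told the write failed, yet the replicas that did respond have already persisted the new value.

```python
# Schematic sketch of a failed write at consistency > 1 (not real code).
def write(replicas, key, value, required_acks):
    acks = 0
    for replica in replicas:
        if replica.get("down"):
            continue                  # replica R never responds
        replica["data"][key] = value  # live replicas persist the write anyway
        acks += 1
    return acks >= required_acks      # False -> timeout reported to the client
```

There is no rollback step: a False return does not undo the writes already persisted on the live replicas, which is the case the application must handle.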
>
> Best,
> Dave
>
> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh 
> <
> tijoriwala.rit...@gmail.com> wrote:
>
>>
>> Hi,
>> I wanted to get details on how Cassandra does synchronous writes to W
>> replicas (out of N). Does it do a 2PC? If not, how does it deal with
>> failures of nodes before it gets to write to W replicas? If the
>> orchestrating node cannot write to W nodes successfully, I guess it will
>> fail the write operation, but what happens to the completed writes on X
>> (X < W) nodes?
>>
>> Thanks,
>> Ritesh
>> --
>> View this message in context:
>> 
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>> Sent from the 
>> cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
>>
>
>


Re: Is it possible to get list of row keys?

2011-02-23 Thread Daniel Lundin
They are, however, in *stable* order, which is important.

On Wed, Feb 23, 2011 at 3:20 PM, Norman Maurer  wrote:
> yes but be aware that the keys will not in the "right order".
>
> Bye,
> Norman
>
> 2011/2/23 Roshan Dawrani :
>> On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen
>>  wrote:
>>>
>>> Actually, if you want to get ALL keys, I believe you can still use
>>> RangeSliceQuery with RP.
>>> Just use setKeys("","") as first batch call.
>>> Then use the last key from previous batch as startKey for next batch.
>>> Beware that since startKey is inclusive, so you'd need to ignore first key
>>> from now on.
>>> Keep going until you finish all batches.  You will know you'd need to stop
>>> when setKeys(key_xyz,"") return you only one key.
>>
>> This is what I meant to suggest when I earlier said "So, if you want all,
>> you will need to keep paging forward and collecting the keys." :-)
>


Re: Is it possible to get list of row keys?

2011-02-23 Thread Roshan Dawrani
Yes. But I don't think retrieving the keys in the "right order" was part of
the original question. :-)

On Wed, Feb 23, 2011 at 7:50 PM, Norman Maurer  wrote:

> yes but be aware that the keys will not in the "right order".
>
> Bye,
> Norman
>
> 2011/2/23 Roshan Dawrani :
> > On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen
> >  wrote:
> >>
> >> Actually, if you want to get ALL keys, I believe you can still use
> >> RangeSliceQuery with RP.
> >> Just use setKeys("","") as first batch call.
> >> Then use the last key from previous batch as startKey for next batch.
> >> Beware that since startKey is inclusive, so you'd need to ignore first
> key
> >> from now on.
> >> Keep going until you finish all batches.  You will know you'd need to
> stop
> >> when setKeys(key_xyz,"") return you only one key.
> >
> > This is what I meant to suggest when I earlier said "So, if you want all,
> > you will need to keep paging forward and collecting the keys." :-)
>


Re: Does Cassandra use vector clocks

2011-02-23 Thread Jonathan Ellis
On Wed, Feb 23, 2011 at 3:32 AM, Oleg Anastasyev  wrote:
>> Basically: vector clocks tell you there was a conflict, but not how to
>> resolve it (that is, you simply don't have enough information to
>> resolve it even if you push that back to the client a la Dynamo).
>> What dynamo-like systems mostly VC for is the trivial case of "client
>> X updated field 1, client Y updated field 2, so I can resolve that
>> into a merged value containing both updates."  But Cassandra already
>> handles that by splitting the row up into column-per-field, so VC
>> doesn't add anything for us.
>
> Still, vector clocks are very useful in read-modify-update scenarios. In these
> scenarios Cassandra requires external pessimistic locking servers, which makes
> cassandra almost unusable for these scenarios, because of introduced latency,
> especially when you have low concurrency for a single column, where optimistic
> locking is a perfect fit.

From the article I linked:

"But wait, some might say, you can avoid all this by using vectors in
a different way – to prevent update conflicts by issuing conditional
writes which specify a version (vector) and only succeed if that
version is still current. Sorry, but no, or at least not generally. In
a partition-tolerant system, nodes on each side of a partition may
simultaneously accept conflicting writes against the same initial
version, and you’ll still have to resolve the conflict when you
resolve the partition."

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Is it possible to get list of row keys?

2011-02-23 Thread Norman Maurer
Yes, but be aware that the keys will not be in the "right order".

Bye,
Norman

2011/2/23 Roshan Dawrani :
> On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen
>  wrote:
>>
>> Actually, if you want to get ALL keys, I believe you can still use
>> RangeSliceQuery with RP.
>> Just use setKeys("","") as first batch call.
>> Then use the last key from previous batch as startKey for next batch.
>> Beware that since startKey is inclusive, so you'd need to ignore first key
>> from now on.
>> Keep going until you finish all batches.  You will know you'd need to stop
>> when setKeys(key_xyz,"") return you only one key.
>
> This is what I meant to suggest when I earlier said "So, if you want all,
> you will need to keep paging forward and collecting the keys." :-)


Re: Is it possible to get list of row keys?

2011-02-23 Thread Roshan Dawrani
On Wed, Feb 23, 2011 at 7:17 PM, Ching-Cheng Chen  wrote:

> Actually, if you want to get ALL keys, I believe you can still use
> RangeSliceQuery with RP.
>
> Just use setKeys("","") as first batch call.
>
> Then use the last key from previous batch as startKey for next batch.
> Beware that since startKey is inclusive, so you'd need to ignore first key
> from now on.
>
> Keep going until you finish all batches.  You will know you'd need to stop
> when setKeys(key_xyz,"") return you only one key.
>

This is what I meant to suggest when I earlier said "So, if you want all,
you will need to keep paging forward and collecting the keys." :-)


Re: Is it possible to get list of row keys?

2011-02-23 Thread Ching-Cheng Chen
Actually, if you want to get ALL keys, I believe you can still use
RangeSliceQuery with RP.

Just use setKeys("","") as first batch call.

Then use the last key from the previous batch as the startKey for the next
batch. Beware that since the startKey is inclusive, you'd need to ignore the
first key from then on.

Keep going until you finish all batches.  You will know you need to stop
when setKeys(key_xyz,"") returns only one key.

This should get you all keys even with RP.
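Chen's loop can be sketched in a few lines. This is a language-agnostic simulation: the hypothetical get_range_slice() below stands in for Hector's RangeSliceQuery/setKeys call (inclusive start key, fixed page size) and is not Hector code; sorted_keys models the stable token order the partitioner returns.

```python
def get_range_slice(sorted_keys, start, count):
    # Stand-in for a range-slice call: returns up to `count` keys starting
    # at `start` (inclusive). An empty start means "from the beginning".
    if start == "":
        return sorted_keys[:count]
    i = sorted_keys.index(start)
    return sorted_keys[i:i + count]

def all_keys(sorted_keys, page_size=100):
    keys, start = [], ""
    while True:
        page = get_range_slice(sorted_keys, start, page_size)
        if start != "":
            page = page[1:]  # startKey is inclusive: drop the repeated key
        if not page:         # only the start key came back: we are done
            break
        keys.extend(page)
        start = page[-1]     # last key of this batch starts the next one
    return keys
```

The loop terminates when a batch returns only its own start key, mirroring the stop condition described above.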

Regards,

Chen

www.evidentsoftware.com

On Wed, Feb 23, 2011 at 8:23 AM, Norman Maurer  wrote:

> query per ranges is only possible with OPP or BPP.
>
> Bye,
> Norman
>
>
> 2011/2/23 Sasha Dolgy :
> > What if i want 20 rows and the next 20 rows in a subsequent query?  can
> this
> > only be achieved with OPP?
> >
> > --
> > Sasha Dolgy
> > sasha.do...@gmail.com
> >
> > On 23 Feb 2011 13:54, "Ching-Cheng Chen" 
> wrote:
> >
>


Re: Is it possible to get list of row keys?

2011-02-23 Thread Norman Maurer
Querying by ranges is only possible with OPP or BPP.

Bye,
Norman


2011/2/23 Sasha Dolgy :
> What if i want 20 rows and the next 20 rows in a subsequent query?  can this
> only be achieved with OPP?
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>
> On 23 Feb 2011 13:54, "Ching-Cheng Chen"  wrote:
>


Re: Is it possible to get list of row keys?

2011-02-23 Thread Sasha Dolgy
What if I want 20 rows, and the next 20 rows in a subsequent query? Can this
only be achieved with OPP?

--
Sasha Dolgy
sasha.do...@gmail.com
On 23 Feb 2011 13:54, "Ching-Cheng Chen"  wrote:


Re: Is it possible to get list of row keys?

2011-02-23 Thread Ching-Cheng Chen
You can use the setRowCount() method to specify how many keys to return per
call. The default is 100.

Beware: don't set it too high, since it might cause an OOM.

The underlying code will pre-allocate an array list with the size you specify
in setRowCount(), so you might get an OOM if you use something like
Integer.MAX_VALUE.

Regards,

Chen

www.evidentsoftware.com

On Wed, Feb 23, 2011 at 2:24 AM, Roshan Dawrani wrote:

> Does it help:
> https://github.com/rantav/hector/blob/master/core/src/test/java/me/prettyprint/cassandra/model/RangeSlicesQueryTest.java
>
>
> It
> uses setReturnKeysOnly()...
>
> Same for index queries in:
> https://github.com/rantav/hector/blob/master/core/src/test/java/me/prettyprint/cassandra/model/IndexedSlicesQueryTest.java
>
> I think it will not return all keys from the ColumnFamily at one shot (as
> with rows)
>
> So, if you want all, you will need to keep paging forward and collecting
> the keys.
>
> On Wed, Feb 23, 2011 at 12:41 PM, Joshua Partogi wrote:
>
>> Hi,
>>
>> Assuming the application does not know the list of keys that is stored
>> inside cassandra, how would it be possible to get list of row keys?
>> This list of row keys is going to be used to get a range of slices.
>>
>> Thank you for your help.
>>
>> --
>> http://twitter.com/jpartogi
>>
>
>
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani 
> Skype: roshandawrani
>
>


Re: Reads and memory usage clarification

2011-02-23 Thread Viktor Jevdokimov
Everything as I thought, thank you!

2011/2/23 Matthew Dennis 

> Data is in Memtables from writes before they get flushed (based on first
> threshold of ops/size/time exceeded; all are configurable) to SSTables on
> disk.
>
> There is a keycache and a rowcache.  The keycache caches offsets into
> SSTables for the rows.  The rowcache caches the entire row.  There is also
> the OS page cache, which is heavily used.
>
> When a read happens, the keycache is updated with the information for the
> SSTables the row was eventually found in.  If there are too many entries now
> in the keycache, some are ejected.  Overall the keycache uses very little
> memory per entry and can cut your disk IO in half so it's a pretty big win.
>
> If you read an entire row it goes in the row cache.  Like the keycache,
> this may result in older entries being ejected from the cache.  If you
> insert lots of really large rows in the rowcache you can OOM your JVM.  The
> rowcache is kept up to date with the memtables as writes come in.
>
> When a read comes in, C* will collect the data from the SSTables and
> Memtables and merge them together but data only goes into Memtables from
> writes.
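The description above can be condensed into a toy model (illustrative structure with invented names, not Cassandra's actual classes): reads merge memtable and SSTable data, the key cache remembers which SSTables hold a row, and only writes populate the memtable.

```python
# Toy model of the read path described above (not Cassandra internals).
class ToyNode:
    def __init__(self):
        self.memtable = {}   # row_key -> {column: value}; filled by writes
        self.sstables = []   # flushed memtables: list of {row_key: {col: val}}
        self.key_cache = {}  # row_key -> list of SSTable indices holding it

    def write(self, key, columns):
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def read(self, key):
        # A key-cache hit lets us skip SSTables that can't contain the row.
        cached = self.key_cache.get(key)
        indices = cached if cached is not None else range(len(self.sstables))
        row, found = {}, []
        for i in indices:
            if key in self.sstables[i]:
                row.update(self.sstables[i][key])
                found.append(i)
        self.key_cache[key] = found              # cache updated on every read
        row.update(self.memtable.get(key, {}))   # memtable data is newest
        return row
```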
>
>
> On Tue, Feb 22, 2011 at 3:32 AM, Viktor Jevdokimov 
> wrote:
>
>> Hello,
>>
>> Write path is perfectly documented in architecture overview.
>>
>> I need Reads to be clarified:
>>
>> How memory is used
>> 1. When data is in the Memtable
>> 2. When data is in the SSTable
>>
>> How cache is used alongside with Memtable?
>>
>> Are records created in the Memtable from writes only or from reads also?
>>
>> What I need to know is, how Cassandra uses memory and Memtables for reads?
>>
>>
>> Thank you,
>> Viktor
>>
>
>


Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Yes, they do. I have specified the public DNS in the seeds field of each node's
cassandra.yaml, but I'm not able to figure out what the problem is.





From:
Sasha Dolgy 
To:
user@cassandra.apache.org
Date:
02/23/2011 02:56 PM
Subject:
Re: Cassandra nodes on EC2 in two different regions not communicating



Did you define the other host in cassandra.yaml, on both servers? They need to
know about each other.

On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma  wrote:

Thanks Dave, but I am able to telnet to the other instances on port 7000,
and when I run ./nodetool --host
ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring I can see only
one node.

Do we need to configure anything else in cassandra.yaml or
cassandra-env.sh?







From: 
Dave Viner  
To: 
user@cassandra.apache.org 
Cc: 
Himanshi Sharma  
Date: 
02/23/2011 11:36 AM 
Subject: 
Re: Cassandra nodes on EC2 in two different regions not communicating




If you login to one of the nodes, can you telnet to port 7000 on the other 
node? 

If not, then almost certainly it's a firewall/Security Group issue. 

You can find out the security groups for any node by logging in, and then 
running: 

% curl "http://169.254.169.254/latest/meta-data/security-groups"

Assuming that both nodes are in the same security group, ensure that the 
SG is configured to allow other members of the SG to communicate on port 
7000 to each other. 

HTH, 
Dave Viner 


On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma  
wrote: 

Hi, 

I am new to Cassandra. I'm running Cassandra on EC2. I configured a
Cassandra cluster on two instances in different regions.
But when I try the nodetool command with the ring option, I see
only a single node.

How do I make these two nodes communicate with each other? I have already
opened the required ports (7000, 8080, 9160) in the respective
security groups. Please help me with this.

Regards, 
Himanshi Sharma 


=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you









-- 
Sasha Dolgy
sasha.do...@gmail.com





Re: I: Re: Are row-keys sorted by the compareWith?

2011-02-23 Thread Matthew Dennis
The map returned by multiget_slice (what I suspect is the underlying thrift
call for getColumnsFromRows) is not an order-preserving map; it's a HashMap,
so the order of the returned results cannot be depended on. Even if it were
an order-preserving map, not all languages would be able to make use of the
results, since not all languages have ordered maps (though many, including
Java, certainly do).

That being said, it would be fairly easy to change this on the C* side to
preserve the order the keys were requested in, though as mentioned not all
clients could take advantage of it.
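In the meantime, a client can re-impose request order itself after the call returns. A minimal sketch of that post-processing in Java; the result map below simply stands in for whatever the client library hands back, so the names here are illustrative, not any real client API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of re-imposing the request order on an unordered multiget result.
public class OrderRows {
    static <K, V> Map<K, V> inRequestOrder(List<K> requestedKeys, Map<K, V> unordered) {
        Map<K, V> ordered = new LinkedHashMap<>(); // preserves insertion order
        for (K key : requestedKeys) {
            if (unordered.containsKey(key)) {      // rows missing from the result are skipped
                ordered.put(key, unordered.get(key));
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Simulate a HashMap result whose iteration order we cannot rely on.
        Map<String, String> result = new HashMap<>();
        result.put("k3", "c");
        result.put("k1", "a");
        result.put("k2", "b");
        System.out.println(inRequestOrder(Arrays.asList("k1", "k2", "k3"), result).keySet());
        // prints [k1, k2, k3] regardless of the HashMap's iteration order
    }
}
```

This costs one extra pass over the keys on the client, which is negligible next to the network round trip.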

On Mon, Feb 21, 2011 at 4:09 PM, cbert...@libero.it wrote:

>
> As Jonathan mentions, the compareWith on a column family definition defines
> the order for the columns *within* a row. In order to control the ordering
> of rows you'll need to use the OrderPreservingPartitioner (
> http://www.datastax.com/docs/0.7/operations/clustering#tokens-partitioners-ring
> ).
>
> Thanks for your answer and for your time, I will take a look at this.
>
> As for getColumnsFromRows: it should be returning you a map of lists.
> The map is insertion-order-preserving and populated based on the provided
> list of row keys (so if you iterate over the entries in the map they should
> be in the same order as the list of row keys).
>
>
> Mmm... well, it didn't happen like this. In my code I had a CF named
> Comments and also a CF called UserComments. UserComments uses a UUID as
> row key to keep, TimeUUID-sorted, the "pointers" to the comments of the
> user. When I get the sorted list of keys from UserComments and I use
> this list as the row-keys list in GetColumnsFromRows, I don't get back the
> data sorted as I expect them to be.
>
> It looks as if Cassandra/Pelops does not care how I provide the
> row-keys list. I am sure about that because I tried something different: I
> iterated over my row-keys list and made many GetColumnFromRow calls instead
> of one GetColumnsFromRows, and when I iterate, the data are correctly
> sorted. But this cannot be a solution ...
>
>
> I am using Cassandra 0.6.9
>
>
> Let me take advantage of your knowledge of Pelops to ask you something: I am
> evaluating the migration to Cassandra 0.7... as far as you know, in terms of
> written code, is it a heavy job?
>
>
> Best Regards
>
>
> Carlo
>
>
>  Messaggio originale
> Da: d...@reactive.org
>
> On Saturday, 19 February 2011 at 8:16 AM, cbert...@libero.it wrote:
>
> Hi all,
> I created a CF in which i need to get, sorted by time, the Rows inside.
> Each
> Row represents a comment.
>
> 
>
> I've created a few rows using a generated TimeUUID as the row key, but when
> I call the Pelops method "GetColumnsFromRows" I don't get the data back as I
> expect: rows are not sorted by TimeUUID.
> I thought it was probably because of the random part of the TimeUUID, so I
> created a new CF ...
>
> 
>
> This time I created a few rows using the Java System.currentTimeMillis(),
> which returns a long. I called "GetColumnsFromRows" again, and again the
> same result: the data are not sorted!
> I've read many times that Rows are sorted as specified in the compareWith
> but
> I can't see it.
> To solve this problem, for the moment I've used a SuperColumnFamily with a
> UNIQUE ROW ... but I think this is just a workaround and not the solution.
>
>  CompareSubcolumnsWith="BytesType"/ >
>
> Now when I call "GetSuperColumnsFromRow" I get all the SuperColumns as I
> expected: sorted by TimeUUID. Why does the same not happen with the rows?
> I'm confused.
>
> TIA for any help.
>
> Best Regards
>
> Carlo
>
>
>
>
>
>
>


Re: dedication to read & write

2011-02-23 Thread Aaron Morton
Not necessary. All the nodes have the same function and the same access to data.

Aaron

On 23 Feb 2011, at 10:27 PM, Sasha Dolgy wrote:

Hi,

Is there benefit to delegate some nodes specifically for read
operations, and others specifically for write?  When designing a web
app, I could create a connection pool for reads and one for writes ...
or is this me falling back to my rdbms way of thinking?

-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Does Cassandra use vector clocks

2011-02-23 Thread Oleg Anastasyev
> Basically: vector clocks tell you there was a conflict, but not how to
> resolve it (that is, you simply don't have enough information to
> resolve it even if you push that back to the client a la Dynamo).
> resolve it even if you push that back to the client a la Dynamo).
> What dynamo-like systems mostly use VC for is the trivial case of "client
> X updated field 1, client Y updated field 2, so I can resolve that
> into a merged value containing both updates."  But Cassandra already
> handles that by splitting the row up into column-per-field, so VC
> doesn't add anything for us.

Still, vector clocks are very useful in read-modify-update scenarios. In these
scenarios Cassandra requires external pessimistic locking servers, which makes
Cassandra almost unusable for them because of the introduced latency,
especially when you have low concurrency on a single column, where optimistic
locking is a perfect fit.
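For illustration, the optimistic read-modify-update pattern being argued for here can be sketched in Java, using AtomicReference as a stand-in for a hypothetical server-side compare-and-swap; Cassandra 0.7 itself has no such primitive, so every name below is illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// Optimistic concurrency sketch: read a value, compute the update locally,
// and apply it only if nothing changed in between, retrying on conflict.
public class OptimisticUpdate {
    static void increment(AtomicReference<Integer> cell) {
        while (true) {
            Integer current = cell.get();   // "read" the current version
            Integer updated = current + 1;  // modify locally
            if (cell.compareAndSet(current, updated)) {
                return;                     // no concurrent writer won the race: done
            }
            // else another writer changed the cell first; re-read and retry
        }
    }

    public static void main(String[] args) {
        AtomicReference<Integer> cell = new AtomicReference<>(0);
        for (int i = 0; i < 10; i++) increment(cell);
        System.out.println(cell.get()); // prints 10
    }
}
```

Under low contention the retry loop almost never runs more than once, which is exactly why optimistic locking beats an external lock server on latency in that regime.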





dedication to read & write

2011-02-23 Thread Sasha Dolgy
Hi,

Is there benefit to delegate some nodes specifically for read
operations, and others specifically for write?  When designing a web
app, I could create a connection pool for reads and one for writes ...
or is this me falling back to my rdbms way of thinking?

-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Sasha Dolgy
Did you define the other host in cassandra.yaml, on both servers? They need to
know about each other.

On Wed, Feb 23, 2011 at 10:16 AM, Himanshi Sharma
wrote:

>
> Thanks Dave, but I am able to telnet to the other instances on port 7000,
> and when I run ./nodetool --host
> ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring I can see only
> one node.
>
> Do we need to configure anything else in cassandra.yaml or
> cassandra-env.sh?
>
>
>
>
>
>
>
> From: Dave Viner
> To: user@cassandra.apache.org
> Cc: Himanshi Sharma
> Date: 02/23/2011 11:36 AM
> Subject: Re: Cassandra nodes on EC2 in two different regions not communicating
>
>
>
> If you login to one of the nodes, can you telnet to port 7000 on the other
> node?
>
> If not, then almost certainly it's a firewall/Security Group issue.
>
> You can find out the security groups for any node by logging in, and then
> running:
>
> % curl "http://169.254.169.254/latest/meta-data/security-groups"
>
> Assuming that both nodes are in the same security group, ensure that the SG
> is configured to allow other members of the SG to communicate on port 7000
> to each other.
>
> HTH,
> Dave Viner
>
>
> On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma <himanshi.sha...@tcs.com> wrote:
>
> Hi,
>
> I am new to Cassandra. I'm running Cassandra on EC2. I configured a
> Cassandra cluster on two instances in different regions.
> But when I try the nodetool command with the ring option, I see
> only a single node.
>
> How do I make these two nodes communicate with each other? I have already
> opened the required ports (7000, 8080, 9160) in the respective
> security groups. Please help me with this.
>
> Regards,
> Himanshi Sharma
>
>
>
>
>


-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Reads and memory usage clarification

2011-02-23 Thread Matthew Dennis
Data is in Memtables from writes before they get flushed (based on first
threshold of ops/size/time exceeded; all are configurable) to SSTables on
disk.

There is a keycache and a rowcache.  The keycache caches offsets into
SSTables for the rows. The rowcache caches the entire row. There is also
the OS page cache which is heavily used.

When a read happens, the keycache is updated with the information for the
SSTables the row was eventually found in.  If there are too many entries now
in the keycache, some are ejected.  Overall the keycache uses very little
memory per entry and can cut your disk IO in half so it's a pretty big win.

If you read an entire row it goes in the row cache.  Like the keycache, this
may result in older entries being ejected from the cache.  If you insert
lots of really large rows in the rowcache you can OOM your JVM.  The
rowcache is kept up to date with the memtables as writes come in.

When a read comes in, C* will collect the data from the SSTables and
Memtables and merge them together but data only goes into Memtables from
writes.
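As an illustration of the eviction behavior described above, here is a minimal LRU-cache sketch in Java. The class and method names are illustrative only; this is not Cassandra's actual keycache/rowcache code, just the "too many entries, eject the oldest" idea:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of an LRU cache like the keycache/rowcache behavior described above.
public class LruCacheSketch {
    static <K, V> Map<K, V> newLruCache(final int capacity) {
        // accessOrder=true makes iteration order least-recently-used first
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // eject the oldest entry when over capacity
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Long> keycache = newLruCache(2); // e.g. row key -> SSTable offset
        keycache.put("row1", 0L);
        keycache.put("row2", 4096L);
        keycache.get("row1");          // touch row1 so it becomes most recently used
        keycache.put("row3", 8192L);   // capacity exceeded: row2 is ejected
        System.out.println(keycache.keySet()); // prints [row1, row3]
    }
}
```

The same eviction logic applies whether the cached value is a small offset (keycache) or a whole row (rowcache); only the per-entry memory cost differs, which is why large rows in the rowcache can OOM the JVM while the keycache stays cheap.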

On Tue, Feb 22, 2011 at 3:32 AM, Viktor Jevdokimov wrote:

> Hello,
>
> Write path is perfectly documented in architecture overview.
>
> I need Reads to be clarified:
>
> How memory is used
> 1. When data is in the Memtable
> 2. When data is in the SSTable
>
> How cache is used alongside with Memtable?
>
> Are records created in the Memtable from writes only or from reads also?
>
> What I need to know is, how Cassandra uses memory and Memtables for reads?
>
>
> Thank you,
> Viktor
>


Re: Cassandra nodes on EC2 in two different regions not communicating

2011-02-23 Thread Himanshi Sharma
Thanks Dave, but I am able to telnet to the other instances on port 7000,
and when I run ./nodetool --host
ec2-50-18-60-117.us-west-1.compute.amazonaws.com ring I can see only
one node.

Do we need to configure anything else in cassandra.yaml or
cassandra-env.sh?








From:
Dave Viner 
To:
user@cassandra.apache.org
Cc:
Himanshi Sharma 
Date:
02/23/2011 11:36 AM
Subject:
Re: Cassandra nodes on EC2 in two different regions not communicating



If you login to one of the nodes, can you telnet to port 7000 on the other 
node?

If not, then almost certainly it's a firewall/Security Group issue.

You can find out the security groups for any node by logging in, and then 
running:

% curl "http://169.254.169.254/latest/meta-data/security-groups"

Assuming that both nodes are in the same security group, ensure that the 
SG is configured to allow other members of the SG to communicate on port 
7000 to each other.

HTH,
Dave Viner


On Tue, Feb 22, 2011 at 8:59 PM, Himanshi Sharma  
wrote:

Hi, 

I am new to Cassandra. I'm running Cassandra on EC2. I configured a
Cassandra cluster on two instances in different regions.
But when I try the nodetool command with the ring option, I see
only a single node.

How do I make these two nodes communicate with each other? I have already
opened the required ports (7000, 8080, 9160) in the respective
security groups. Please help me with this.

Regards, 
Himanshi Sharma 






Re: How does Cassandra handle failure during synchronous writes

2011-02-23 Thread Aaron Morton
In the case described below if less than CL nodes respond in rpc_timeout (from 
conf yaml) the client will get a timeout error. I think most higher level 
clients will automatically retry in this case.

If there are not enough nodes to start the request you will get an Unavailable 
exception. Again the client can retry safely.

Aaron

On 23/02/2011, at 8:07 PM, Dave Revell  wrote:

> Ritesh,
> 
> There is no commit protocol. Writes may be persisted on some replicas even 
> though the quorum fails. Here's a sequence of events that shows the "problem:"
> 
> 1. Some replica R fails, but recently, so its failure has not yet been 
> detected
> 2. A client writes with consistency > 1
> 3. The write goes to all replicas, all replicas except R persist the write to 
> disk
> 4. Replica R never responds
> 5. Failure is returned to the client, but the new value is still in the 
> cluster, on all replicas except R.
> 
> Something very similar could happen for CL QUORUM.
> 
> This is a conscious design decision because a commit protocol would 
> constitute tight coupling between nodes, which goes against the Cassandra 
> philosophy. But unfortunately you do have to write your app with this case in 
> mind.
> 
> Best,
> Dave
> 
> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh 
>  wrote:
> 
> Hi,
> I wanted to get details on how Cassandra does synchronous writes to W
> replicas (out of N). Does it do a 2PC? If not, how does it deal with
> failures of nodes before it gets to write to W replicas? If the
> orchestrating node cannot write to W nodes successfully, I guess it will
> fail the write operation, but what happens to the completed writes on X (W >
> X) nodes?
> 
> Thanks,
> Ritesh
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
> 
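Given the failure sequence Dave describes, a client should treat a timeout as "maybe written" and retry, which is safe because Cassandra writes are idempotent (they carry timestamps). A minimal sketch of such a retry loop in Java; TimeoutException and the write callback here are stand-ins for a real client library's types, not any actual API:

```java
import java.util.concurrent.Callable;

// Illustrative retry loop for the "quorum failed but some replicas persisted
// the write" case. Retrying is only safe for idempotent operations.
public class RetryingWriter {
    static class TimeoutException extends Exception {}

    static void writeWithRetry(Callable<Void> write, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                write.call();
                return; // success: enough replicas acknowledged this time
            } catch (TimeoutException e) {
                if (attempt >= maxAttempts) throw e; // give up, surface the error
                // otherwise retry; the earlier attempt may or may not have
                // already been persisted on some replicas, and re-applying
                // the same timestamped write is harmless
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        writeWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new TimeoutException(); // first two attempts time out
            return null;
        }, 5);
        System.out.println("succeeded after " + calls[0] + " attempts"); // prints 3 attempts
    }
}
```

As Aaron notes above, most higher-level clients already do something like this automatically on timeout.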


Re: How scalable are automatic secondary indexes in Cassandra 0.7?

2011-02-23 Thread Stu Hood
In practice, local secondary indexes scale to {RF * the limit of a single
machine} for *low-cardinality* values (ex: users living in a certain state),
since the first node is likely to be able to answer your question. This also
means they are good for performing filtering for analytics.

On the other hand, they are not very useful for high cardinality values (ex:
users born at a particular second), because in the worst case you have to
query every node in your cluster, and you are much more likely to hit the
worst case with rare values.

If you have high cardinality values, it is currently recommended to build
your own secondary indexes from the client side, as you suggested. Triggers
may help you perform this distributed indexing in the near future: see
CASSANDRA-1311.
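The "build your own secondary index" approach amounts to maintaining a second column family whose row key is the indexed value and whose column names are the matching row keys. A toy in-memory model of that layout in Java, with plain maps standing in for column families; all names here are illustrative, not a client API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch of a client-maintained inverted index: one "CF" for the
// data, one "CF" keyed by the indexed value. Maps stand in for column families.
public class InvertedIndexSketch {
    final Map<String, Map<String, String>> users = new HashMap<>();  // userId -> columns
    final Map<String, Set<String>> usersByState = new HashMap<>();   // state -> userIds

    void insertUser(String userId, String state) {
        Map<String, String> row = new TreeMap<>();
        row.put("state", state);
        users.put(userId, row);
        // the second write keeps the index in step with the data
        usersByState.computeIfAbsent(state, s -> new TreeSet<>()).add(userId);
    }

    Set<String> findByState(String state) {
        // one index-row read answers the query, instead of fanning
        // out to every node holding part of the data CF
        return usersByState.getOrDefault(state, new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.insertUser("u1", "CA");
        idx.insertUser("u2", "NY");
        idx.insertUser("u3", "CA");
        System.out.println(idx.findByState("CA")); // prints [u1, u3]
    }
}
```

The trade-off is that the client must do two writes per insert (and handle deletes/updates consistently), which is the bookkeeping the triggers work in CASSANDRA-1311 aims to move server-side.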

On Tue, Feb 22, 2011 at 4:45 PM, Piotr J.  wrote:

> Hi, As far as I understand automatic secondary indexes are generated for
> node local data.
>
> In this case a query by secondary index involves all nodes storing part of
> the column family to get results (?), so (if I am right) if data is spread
> across 50 nodes then 50 nodes are involved in a single query?
>
> How far can this scale? Is this more scalable than manual secondary indexes
> (an inverted-index column family)? A few nodes or a hundred nodes?
>
> Regards
>


Re: Exceptions on 0.7.0

2011-02-23 Thread Stu Hood
I expect that this problem was due to
https://issues.apache.org/jira/browse/CASSANDRA-2216 : I'll make noise to
try and get it released soon as 0.7.3

On Tue, Feb 22, 2011 at 5:41 AM, David Boxenhorn  wrote:

> Thanks, Shimi. I'll keep you posted if we make progress. Riptano is working
> on this problem too.
>
> On Tue, Feb 22, 2011 at 3:30 PM, shimi  wrote:
>
>> I didn't solve it.
>> Since it is a test cluster, I deleted all the data. I copied some sstables
>> from my production cluster and tried again, and this time I didn't have this
>> problem.
>> I am planning on removing everything from this test cluster. I will start
>> all over again with 0.6.x, then I will load it with tens of GB of data (not
>> an sstable copy) and test the upgrade again.
>>
>> My mistake was that I didn't back up the data files before I upgraded.
>>
>> Shimi
>>
>> On Tue, Feb 22, 2011 at 2:24 PM, David Boxenhorn wrote:
>>
>>> Shimi,
>>>
>>> I am getting the same error that you report here. What did you do to
>>> solve it?
>>>
>>> David
>>>
>>>
>>> On Thu, Feb 10, 2011 at 2:54 PM, shimi  wrote:
>>>
 I upgraded the version on all the nodes but I still get the exceptions.
 I ran cleanup on one of the nodes but I don't think there is any cleanup
 going on.

 Another weird thing that I see is:
 INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353
 CompactionIterator.java (line 135) Compacting large row
 333531353730363835363237353338383836383035363036393135323132383
 73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273
 (725849473109 bytes) incrementally

 In my production version the largest row is 10259. It shouldn't be
 different in this case.

 The first exception is being thrown on 3 nodes during compaction.
 The second exception (Internal error processing get_range_slices) is
 being thrown all the time by a fourth node. I disabled gossip and any client
 traffic to it and I still get the exceptions.
 Is it possible to boot a node with gossip disabled?

 Shimi

 On Thu, Feb 10, 2011 at 11:11 AM, aaron morton >>> > wrote:

> I should be able to repair, install the new version and kick off
> nodetool repair .
>
> If you are uncertain search for cassandra-1992 on the list, there has
> been some discussion. You can also wait till some peeps in the states wake
> up if you want to be extra sure.
>
>  The number is the number of columns the iterator is going to return
> from the row. I'm guessing that because this is happening during compaction
> it's asked for the maximum possible number of columns.
>
> Aaron
>
>
>
> On 10 Feb 2011, at 21:37, shimi wrote:
>
> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>
>  Out of curiosity, do you really have on the order of 1,986,622,313
> elements (I believe elements=keys) in the cf?
>
> Dan
>
> No. I was too puzzled by the numbers
>
>
> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton <
> aa...@thelastpickle.com> wrote:
>
>> Shimi,
>> You may be seeing the result of CASSANDRA-1992, are you able to test
>> with the most recent 0.7 build ?
>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>
>>
>> Aaron
>>
> I will. I hope the data was not corrupted.
>
>
>
>>
>> From: shimi [mailto:shim...@gmail.com]
>> Sent: February-09-11 15:06
>> To: user@cassandra.apache.org
>> Subject: Exceptions on 0.7.0
>>
>> I have a 4-node test cluster where I test the port to 0.7.0 from 0.6.X
>> On 3 out of the 4 nodes I get exceptions in the log.
>> I am using RP.
>> Changes that I did:
>> 1. changed the replication factor from 3 to 4
>> 2. configured the nodes to use Dynamic Snitch
>> 3. RR of 0.33
>>
>> I ran repair on 2 nodes before I noticed the errors. One of them is
>> getting the first error and the other the second.
>> I restarted the nodes, but I still get the exceptions.
>>
>> The following Exception I get from 2 nodes:
>>  WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java
>> (line 84) Cannot provide an optimal Bloom
>> Filter for 1986622313 elements (1/4 buckets per element).
>> ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
>> AbstractCassandraDaemon.java (line 91) Fatal excep