Re: Want inputs about super column family vs map/list

2016-02-04 Thread Atul Saroha
Hmm, I have the same issue.
I would like to know how you are migrating data from an RDBMS to
Cassandra in this way, i.e. making a column's value the column name.

Are you using a programming script or the DataStax Sqoop support for this?

-
Atul Saroha
*Sr. Software Engineer*
*M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA

On Thu, Feb 4, 2016 at 4:07 PM, Bhuvan Rawal  wrote:

> Hi All,
>
> There are two ways to achieve this :
> 1. Using super column family:
>
> raman | atul | bhuvan
> ---
> 1234  | 5678 | 2345
>
> OR
> Using a single collection column:
> Phone Number
> ---
> Map<text, int>
> {'raman': 1234, 'atul': 5678, 'bhuvan': 2345}
>
>
> I would like to know which approach would be better in the below use cases:
>
>1. First Case - Frequent complete map Update
>2. Second Case - Frequent complete map Read
>3. Frequent Update only for specific fields.
>4. Frequent Read only for specific fields.
>
> Also is there any way to configure cassandra-stress tool for testing this
> scenario?
>
> Thanks & Regards,
> Bhuvan
>


Atomic Batch: Maintaining consistency between tables

2016-02-04 Thread aeljami.ext
Hello,

I read in the DataStax documentation:

"The coordinator node might also need to work hard to process a logged batch 
while maintaining consistency between tables"

Does it mean that the coordinator sends the mutations to all replica nodes and
waits for RF acknowledgements, or only for one node of the set of replicas?

thx




Re: Duplicated key with an IN statement

2016-02-04 Thread Alain RODRIGUEZ
Hi,

This is interesting.

It seems rational that if you are looking up 2 keys and both exist (which
is the case here) it returns you 2 rows. Yet, I just checked this kind of
command on MySQL and it gives a one-line result. So here CQL differs from
SQL (at least MySQL). I know we are trying to fit as much as possible with
SQL to avoid losing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure anyone
ever tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 8:36 GMT+00:00 Edouard COLE :

> Hello,
>
> I just discovered this, and I think this is weird:
>
> ed@debian:~$ cqlsh 192.168.10.8
> Connected to _CLUSTER_ at 192.168.10.8:9160.
> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
> 19.39.0]
> Use HELP for help.
> cqlsh> USE ks-test ;
> cqlsh:ks-test> CREATE TABLE t (
> ... key int,
> ... value int,
> ... PRIMARY KEY (key)
> ... );
> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
> cqlsh:ks-test> SELECT * FROM t ;
>
>  key | value
> -+---
>  123 |   456
>
> (1 rows)
>
> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>
>  key | value
> -+---
>  123 |   456
>  123 |   456 <- WTF?
>
> (2 rows)
>
> Adding the same key multiple times to an IN statement makes the query
> return the tuple multiple times.
>
> This looks weird to me, can anyone give me some feedback on such a
> behavior?
>
> Edouard COLE
>
>


Want inputs about super column family vs map/list

2016-02-04 Thread Bhuvan Rawal
Hi All,

There are two ways to achieve this :
1. Using super column family:

raman | atul | bhuvan
---
1234  | 5678 | 2345

OR
Using a single collection column:
Phone Number
---
Map<text, int>
{'raman': 1234, 'atul': 5678, 'bhuvan': 2345}

I would like to know which approach would be better in the below use cases:

   1. First Case - Frequent complete map Update
   2. Second Case - Frequent complete map Read
   3. Frequent Update only for specific fields.
   4. Frequent Read only for specific fields.

Also is there any way to configure cassandra-stress tool for testing this
scenario?
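
For the per-field cases, the map operations are expressible directly in CQL.
A minimal sketch, assuming a hypothetical table keyed by user_id:

-- Hypothetical table: one map column holding name -> phone number
CREATE TABLE users (
    user_id int PRIMARY KEY,
    phones map<text, int>
);

-- Case 1: overwrite the complete map
UPDATE users SET phones = {'raman': 1234, 'atul': 5678, 'bhuvan': 2345}
WHERE user_id = 1;

-- Case 3: update a single entry, no read-before-write needed
UPDATE users SET phones['atul'] = 5678 WHERE user_id = 1;

-- Cases 2 and 4: a SELECT always returns the whole map; reading a single
-- entry server-side is not supported here, so the client has to filter
SELECT phones FROM users WHERE user_id = 1;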

Thanks & Regards,
Bhuvan


Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Bryan Cheng
Hey Flavien!

Did your reboot come with any other changes (schema, configuration,
topology, version)?

On Thu, Feb 4, 2016 at 2:06 PM, Flavien Charlon 
wrote:

> I'm using the C# driver 2.5.2. I did try to restart the client
> application, but that didn't make any difference, I still get the same
> error after restart.
>
> On 4 February 2016 at 21:54,  wrote:
>
>> What client are you using?
>>
>>
>>
>> It is possible that the client saw nodes down and has kept them marked
>> that way (without retrying). Depending on the client, you may have options
>> to set in RetryPolicy, FailoverPolicy, etc. A bounce of the client will
>> probably fix the problem for now.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Flavien Charlon [mailto:flavien.char...@gmail.com]
>> *Sent:* Thursday, February 04, 2016 4:06 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: "Not enough replicas available for query" after reboot
>>
>>
>>
>> Yes, all three nodes see all three nodes as UN.
>>
>>
>>
>> Also, connecting from a local Cassandra machine using cqlsh, I can run
>> the same query just fine (with QUORUM consistency level).
>>
>>
>>
>> On 4 February 2016 at 21:02, Robert Coli  wrote:
>>
>> On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
>> flavien.char...@gmail.com> wrote:
>>
>> My cluster was running fine. I rebooted all three nodes (one by one), and
>> now all nodes are back up and running. "nodetool status" shows UP for all
>> three nodes on all three nodes:
>>
>>
>>
>> --  AddressLoad   Tokens  OwnsHost ID
>>   Rack
>>
>> UN  xx.xx.xx.xx331.84 GB  1   ?
>> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
>>
>> UN  xx.xx.xx.xx317.2 GB   1   ?
>> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
>>
>> UN  xx.xx.xx.xx  291.61 GB  1   ?
>> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>>
>>
>>
>> However, now the client application fails to run queries on the cluster
>> with:
>>
>>
>>
>> Cassandra.UnavailableException: Not enough replicas available for query
>> at consistency Quorum (2 required but only 1 alive)
>>
>>
>>
>> Do *all* nodes see each other as UP/UN?
>>
>>
>>
>> =Rob
>>
>>
>>
>>
>>
>>
>
>


Re: Any tips on how to track down why Cassandra won't cluster?

2016-02-04 Thread Alain RODRIGUEZ
Hi Richard,

I think you just can't use EC2Snitch with public IPs.

See
https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html

Precisely "Because private IPs are used, this snitch does not work across
multiple regions"

54.*.*.* looks like a public one.

You can stick with the private IPs (with the limitation written above, even if
you can work around it with a VPN tunnel across regions). In this case set
listen_address to the private IP and comment out broadcast_address. You can
also use the EC2MultiRegionSnitch, but then be careful with broadcast_address
(public IP) and listen_address (private IP) configuration in the
cassandra.yaml files, and also with ports management in the AWS console.
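
As a minimal cassandra.yaml sketch for the multi-region case (the addresses
are placeholders, not values from this cluster):

endpoint_snitch: Ec2MultiRegionSnitch
listen_address: 172.31.0.10   # this node's private IP
broadcast_address: 54.0.0.10  # this node's public IP
# the seed list must then use public IPs as well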

Also, as your nodes already bootstrapped, you might have to clean the
Cassandra data folder, usually something like rm -rf /var/lib/cassandra/*
*warning:* you will lose all the data, but this "cluster" doesn't look
like a running cluster; only you can know :-).

Any suggestions on how to track down what might trigger this problem


This kind of issue might be due to:

- Different cluster names
- *Bad configuration* (IPs, Snitch + configuration files, ...) <-- probably
your case
- Ports (firewall, AWS rules...) <-- telnet might be useful here
- Seeds being different on the nodes <-- make sure that your seeds are the
same on every node

Hope this will be enough to get you out of this,

C*heers,
-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 16:35 GMT+00:00 Victor Chen :

> Along the lines of what Ben and Bryan suggested, what are you using to
> verify ports are open? If you do something like:
>
> node1$ nc -zv node2 9042
> node2$ nc -zv node1 9042
>
> does it succeed from both nodes?
> Does the first node 'know' that it is a seed? i.e. do you have first node
> listed in its own seed's list?
> What does the system.log show as both nodes are spun up?
>
>
> On Wed, Feb 3, 2016 at 7:20 PM, Bryan Cheng  wrote:
>
>>
>>
>>
>>> On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III 
>>> wrote:
>>>

 Any suggestions on how to track down what might trigger this problem?
 I'm not receiving any exceptions.

>>>
>> You're not getting "Unable to gossip with any seeds" on the second node?
>> What does nodetool status show on both machines?
>>
>
>


Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Robert Coli
On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon 
wrote:

> My cluster was running fine. I rebooted all three nodes (one by one), and
> now all nodes are back up and running. "nodetool status" shows UP for all
> three nodes on all three nodes:
>
> --  AddressLoad   Tokens  OwnsHost ID
>   Rack
> UN  xx.xx.xx.xx331.84 GB  1   ?
> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
> UN  xx.xx.xx.xx317.2 GB   1   ?
> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
> UN  xx.xx.xx.xx  291.61 GB  1   ?
> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>
> However, now the client application fails to run queries on the cluster
> with:
>
> Cassandra.UnavailableException: Not enough replicas available for query at
>> consistency Quorum (2 required but only 1 alive)
>
>
Do *all* nodes see each other as UP/UN?
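
One way to check from each node (all three are standard nodetool subcommands):

nodetool status           # UN state for every peer, as this node sees it
nodetool gossipinfo       # raw gossip state per endpoint
nodetool describecluster  # cluster name, snitch and schema agreement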

=Rob


RE: "Not enough replicas available for query" after reboot

2016-02-04 Thread SEAN_R_DURITY
What client are you using?

It is possible that the client saw nodes down and has kept them marked that way 
(without retrying). Depending on the client, you may have options to set in 
RetryPolicy, FailoverPolicy, etc. A bounce of the client will probably fix the 
problem for now.


Sean Durity

From: Flavien Charlon [mailto:flavien.char...@gmail.com]
Sent: Thursday, February 04, 2016 4:06 PM
To: user@cassandra.apache.org
Subject: Re: "Not enough replicas available for query" after reboot

Yes, all three nodes see all three nodes as UN.

Also, connecting from a local Cassandra machine using cqlsh, I can run the same 
query just fine (with QUORUM consistency level).

On 4 February 2016 at 21:02, Robert Coli 
> wrote:
On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon 
> wrote:
My cluster was running fine. I rebooted all three nodes (one by one), and now 
all nodes are back up and running. "nodetool status" shows UP for all three 
nodes on all three nodes:

--  AddressLoad   Tokens  OwnsHost ID   
Rack
UN  xx.xx.xx.xx331.84 GB  1   ?   
d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
UN  xx.xx.xx.xx317.2 GB   1   ?   
de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
UN  xx.xx.xx.xx  291.61 GB  1   ?   
b489c970-68db-44a7-90c6-be734b41475f  RAC1

However, now the client application fails to run queries on the cluster with:

Cassandra.UnavailableException: Not enough replicas available for query at 
consistency Quorum (2 required but only 1 alive)

Do *all* nodes see each other as UP/UN?

=Rob







Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
No, there was no other change. I did run "apt-get upgrade" before
rebooting, but Cassandra has not been upgraded.

On 4 February 2016 at 22:48, Bryan Cheng  wrote:

> Hey Flavien!
>
> Did your reboot come with any other changes (schema, configuration,
> topology, version)?
>
> On Thu, Feb 4, 2016 at 2:06 PM, Flavien Charlon  > wrote:
>
>> I'm using the C# driver 2.5.2. I did try to restart the client
>> application, but that didn't make any difference, I still get the same
>> error after restart.
>>
>> On 4 February 2016 at 21:54,  wrote:
>>
>>> What client are you using?
>>>
>>>
>>>
>>> It is possible that the client saw nodes down and has kept them marked
>>> that way (without retrying). Depending on the client, you may have options
>>> to set in RetryPolicy, FailoverPolicy, etc. A bounce of the client will
>>> probably fix the problem for now.
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity
>>>
>>>
>>>
>>> *From:* Flavien Charlon [mailto:flavien.char...@gmail.com]
>>> *Sent:* Thursday, February 04, 2016 4:06 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: "Not enough replicas available for query" after reboot
>>>
>>>
>>>
>>> Yes, all three nodes see all three nodes as UN.
>>>
>>>
>>>
>>> Also, connecting from a local Cassandra machine using cqlsh, I can run
>>> the same query just fine (with QUORUM consistency level).
>>>
>>>
>>>
>>> On 4 February 2016 at 21:02, Robert Coli  wrote:
>>>
>>> On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
>>> flavien.char...@gmail.com> wrote:
>>>
>>> My cluster was running fine. I rebooted all three nodes (one by one),
>>> and now all nodes are back up and running. "nodetool status" shows UP for
>>> all three nodes on all three nodes:
>>>
>>>
>>>
>>> --  AddressLoad   Tokens  OwnsHost ID
>>> Rack
>>>
>>> UN  xx.xx.xx.xx331.84 GB  1   ?
>>> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
>>>
>>> UN  xx.xx.xx.xx317.2 GB   1   ?
>>> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
>>>
>>> UN  xx.xx.xx.xx  291.61 GB  1   ?
>>> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>>>
>>>
>>>
>>> However, now the client application fails to run queries on the cluster
>>> with:
>>>
>>>
>>>
>>> Cassandra.UnavailableException: Not enough replicas available for query
>>> at consistency Quorum (2 required but only 1 alive)
>>>
>>>
>>>
>>> Do *all* nodes see each other as UP/UN?
>>>
>>>
>>>
>>> =Rob
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>


System block cache vs. disk access and metrics

2016-02-04 Thread Jeff Ferland
We struggled for a while to upgrade due to an out-of-order SSTables bug. During
this time, load continued to increase and we were eventually accessing the disk
a lot. When we could finally expand the cluster, disk access went down by an
order of magnitude. This leads me to conclude that we had blown out the block
cache.

Linux unfortunately doesn’t have a metric for tracking the block cache hit
ratio. There is SystemTap, which may be the way we have to go, but I’m
wondering about Cassandra counters as well. If I can track the ratio of SSTable
reads vs. actual disk reads, I’ll have sufficiently good data to not spend my
time writing a SystemTap script.

This brings about the following specific questions:
 * Which metric, if any, corresponds to the number of queries made by clients?
 * Which metric, if any, corresponds to the number of SSTable reads performed?

Metrics such as cassandra.ReadCount aren’t perfectly clear as to what they do
and don’t indicate, so feedback on that before I go on another source code
reading adventure is welcomed.
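
For what it's worth, two places where related counters surface in 2.x (a
sketch, not a definitive answer to the above):

nodetool cfstats <keyspace>               # "Read Count" = reads served
                                          # locally by this node, per table
nodetool cfhistograms <keyspace> <table>  # "SSTables" column = sstables
                                          # touched per read

Coordinator-level query counts are exposed separately via the
org.apache.cassandra.metrics ClientRequest (scope=Read) JMX metrics.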

Cheers all,
-Jeff

Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
Yes, all three nodes see all three nodes as UN.

Also, connecting from a local Cassandra machine using cqlsh, I can run the
same query just fine (with QUORUM consistency level).

On 4 February 2016 at 21:02, Robert Coli  wrote:

> On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
> flavien.char...@gmail.com> wrote:
>
>> My cluster was running fine. I rebooted all three nodes (one by one), and
>> now all nodes are back up and running. "nodetool status" shows UP for all
>> three nodes on all three nodes:
>>
>> --  AddressLoad   Tokens  OwnsHost ID
>>   Rack
>> UN  xx.xx.xx.xx331.84 GB  1   ?
>> d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
>> UN  xx.xx.xx.xx317.2 GB   1   ?
>> de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
>> UN  xx.xx.xx.xx  291.61 GB  1   ?
>> b489c970-68db-44a7-90c6-be734b41475f  RAC1
>>
>> However, now the client application fails to run queries on the cluster
>> with:
>>
>> Cassandra.UnavailableException: Not enough replicas available for query
>>> at consistency Quorum (2 required but only 1 alive)
>>
>>
> Do *all* nodes see each other as UP/UN?
>
> =Rob
>
>


Re: Clustering key values not distributed

2016-02-04 Thread Alain RODRIGUEZ
Hi Ralf,

I am not familiar with the "columnspec" but I'll try to help.

First, are you sure that the result is not the one expected ? Did you try a
select query specifying a partition key, to check the number of rows
returned ? Partitions aren't ordered when fetched, so something like the
query below would probably be a better approach than fetching all and
limiting to 30 rows.

$ cqlsh> select user_id, event_type, session_type, created_at from
stresscql.batch_too_large WHERE user_id = '%\x7f\x03/.d29' ;

- name: created_at
  cluster: uniform(10..10)


This is very restrictive and looks to be what you finally achieve (like
exactly 10 distinct created_at values per partition). If I understand your
need you would like to have distinct values for other clustering keys too ?
Is that correct ?
The order of your clustering columns matters in Cassandra; it might be the
case for this test, maybe saying uniform(10..10) on the last column means
the previous part of the key should be the same for all the rows. This is
an assumption, probably wrong, that you might want to check. Or maybe you
are defining keys as part of the partition key - like ((user_id,
event_type, session_type), created_at)? The schema would help here.

This key being the last one, and you saying you want 10 of those seems to
be forcing other clustering columns to stick with one value somehow, but
once again, I am not sure about how it works :/. So I would basically
play with the numbers to try to understand the behavior of this tool while
using multiple clustering keys.

What about the result using the following (for example) ?

 - name: created_at
   cluster: uniform(3..20)

I am really uncertain about this, but I imagine it is better to have
something to try than no answer :-).

Let us know how it goes.

C*heers,
-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-02-02 8:55 GMT+00:00 Ralf Steppacher :

> I am trying to get the stress tool to generate random values for three
> clustering keys. I am trying to simulate collecting events per user id
> (text, partition key). Events have a session type (text), event type
> (text), and creation time (timestamp) (clustering keys, in that order). For
> testing purposes I ended up with the following column spec:
>
> columnspec:
>  - name: created_at
>cluster: uniform(10..10)
>  - name: event_type
>size: uniform(5..10)
>population: uniform(1..30)
>cluster: uniform(1..30)
>  - name: session_type
>size: fixed(5)
>population: uniform(1..4)
>cluster: uniform(1..4)
>  - name: user_id
>size: fixed(15)
>population: uniform(1..100)
>  - name: message
>size: uniform(10..100)
>population: uniform(1..100B)
>
> My expectation was that this would lead to anywhere between 10 and 1200
> rows to be created per partition key. But it seems that exactly 10 rows are
> being created, with the created_at timestamp being the only variable that
> is assigned variable values (per partition key). The session_type and
> event_type variables are assigned fixed values. This is even the case if I
> set the cluster distribution to uniform(1..30) and uniform(4..4)
> respectively. With this setting I expected 1200 rows per partition key to
> be created, as announced when running the stress tool, but it is still 10.
>
> [rsteppac@centos bin]$ ./cassandra-stress user
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200]
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
>
>
> Sample of generated data:
>
> cqlsh> select user_id, event_type, session_type, created_at from
> stresscql.batch_too_large LIMIT 30 ;
>
> user_id | event_type   | session_type | created_at
>
> -+--+--+--
>%\x7f\x03/.d29 08:14:11+
>%\x7f\x03/.d29 04:04:56+
>%\x7f\x03/.d29 00:39:23+
>%\x7f\x03/.d29 19:56:30+
>%\x7f\x03/.d29 20:46:26+
>%\x7f\x03/.d29 03:27:17+
>%\x7f\x03/.d29 23:30:34+
>%\x7f\x03/.d29 02:41:28+
>%\x7f\x03/.d29 07:23:48+
>%\x7f\x03/.d29

How to migrate MySQL to Cassandra - special case in Sqoop+DataStax

2016-02-04 Thread Atul Saroha
MySQL Table:

User   | PhoneNumber
-------+------------
raman  | 1234
bhuvan | 2345
atul   | 5678

Using a single collection column with a map:

Phone Number
---
Map<text, int>
{'raman': 1234, 'bhuvan': 2345, 'atul': 5678}

I want to transform the data in this way, i.e. the map key is taken from the
value of the "user" column.

Any help will be appreciated.
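
A minimal CQL sketch of what a migration script could issue; the table name,
column names, and the single-row layout keyed by a constant id are assumptions
for illustration, not a recommendation:

CREATE TABLE phone_numbers (
    id int PRIMARY KEY,
    numbers map<text, int>
);

-- For each MySQL row (user, phonenumber), append one map entry:
UPDATE phone_numbers SET numbers = numbers + {'raman': 1234} WHERE id = 1;
UPDATE phone_numbers SET numbers = numbers + {'bhuvan': 2345} WHERE id = 1;
UPDATE phone_numbers SET numbers = numbers + {'atul': 5678} WHERE id = 1;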



-
Atul Saroha
*Sr. Software Engineer*
*M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA


Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Peddi, Praveen
Are you able to run queries using cqlsh with consistency ALL?

On Feb 4, 2016, at 6:32 PM, Flavien Charlon 
> wrote:

No, there was no other change. I did run "apt-get upgrade" before rebooting, 
but Cassandra has not been upgraded.

On 4 February 2016 at 22:48, Bryan Cheng 
> wrote:
Hey Flavien!

Did your reboot come with any other changes (schema, configuration, topology, 
version)?

On Thu, Feb 4, 2016 at 2:06 PM, Flavien Charlon 
> wrote:
I'm using the C# driver 2.5.2. I did try to restart the client application, but 
that didn't make any difference, I still get the same error after restart.

On 4 February 2016 at 21:54, 
> wrote:
What client are you using?

It is possible that the client saw nodes down and has kept them marked that way 
(without retrying). Depending on the client, you may have options to set in 
RetryPolicy, FailoverPolicy, etc. A bounce of the client will probably fix the 
problem for now.


Sean Durity

From: Flavien Charlon 
[mailto:flavien.char...@gmail.com]
Sent: Thursday, February 04, 2016 4:06 PM
To: user@cassandra.apache.org
Subject: Re: "Not enough replicas available for query" after reboot

Yes, all three nodes see all three nodes as UN.

Also, connecting from a local Cassandra machine using cqlsh, I can run the same 
query just fine (with QUORUM consistency level).

On 4 February 2016 at 21:02, Robert Coli 
> wrote:
On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon 
> wrote:
My cluster was running fine. I rebooted all three nodes (one by one), and now 
all nodes are back up and running. "nodetool status" shows UP for all three 
nodes on all three nodes:

--  AddressLoad   Tokens  OwnsHost ID   
Rack
UN  xx.xx.xx.xx331.84 GB  1   ?   
d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
UN  xx.xx.xx.xx317.2 GB   1   ?   
de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
UN  xx.xx.xx.xx  291.61 GB  1   ?   
b489c970-68db-44a7-90c6-be734b41475f  RAC1

However, now the client application fails to run queries on the cluster with:

Cassandra.UnavailableException: Not enough replicas available for query at 
consistency Quorum (2 required but only 1 alive)

Do *all* nodes see each other as UP/UN?

=Rob










Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
Yes, that works with consistency ALL.

I restarted one of the Cassandra instances, and it seems to be working again
now. I'm not sure what happened.

On 4 February 2016 at 23:48, Peddi, Praveen  wrote:

> Are you able to run queries using cqlsh with consistency ALL?
>
> On Feb 4, 2016, at 6:32 PM, Flavien Charlon 
> wrote:
>
> No, there was no other change. I did run "apt-get upgrade" before
> rebooting, but Cassandra has not been upgraded.
>
> On 4 February 2016 at 22:48, Bryan Cheng  wrote:
>
>> Hey Flavien!
>>
>> Did your reboot come with any other changes (schema, configuration,
>> topology, version)?
>>
>> On Thu, Feb 4, 2016 at 2:06 PM, Flavien Charlon <
>> flavien.char...@gmail.com> wrote:
>>
>>> I'm using the C# driver 2.5.2. I did try to restart the client
>>> application, but that didn't make any difference, I still get the same
>>> error after restart.
>>>
>>> On 4 February 2016 at 21:54,  wrote:
>>>
 What client are you using?



 It is possible that the client saw nodes down and has kept them marked
 that way (without retrying). Depending on the client, you may have options
 to set in RetryPolicy, FailoverPolicy, etc. A bounce of the client will
 probably fix the problem for now.





 Sean Durity



 *From:* Flavien Charlon [mailto:flavien.char...@gmail.com]
 *Sent:* Thursday, February 04, 2016 4:06 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: "Not enough replicas available for query" after reboot



 Yes, all three nodes see all three nodes as UN.



 Also, connecting from a local Cassandra machine using cqlsh, I can run
 the same query just fine (with QUORUM consistency level).



 On 4 February 2016 at 21:02, Robert Coli  wrote:

 On Thu, Feb 4, 2016 at 12:53 PM, Flavien Charlon <
 flavien.char...@gmail.com> wrote:

 My cluster was running fine. I rebooted all three nodes (one by one),
 and now all nodes are back up and running. "nodetool status" shows UP for
 all three nodes on all three nodes:



 --  AddressLoad   Tokens  OwnsHost ID
 Rack

 UN  xx.xx.xx.xx331.84 GB  1   ?
 d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1

 UN  xx.xx.xx.xx317.2 GB   1   ?
 de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1

 UN  xx.xx.xx.xx  291.61 GB  1   ?
 b489c970-68db-44a7-90c6-be734b41475f  RAC1



 However, now the client application fails to run queries on the cluster
 with:



 Cassandra.UnavailableException: Not enough replicas available for query
 at consistency Quorum (2 required but only 1 alive)



 Do *all* nodes see each other as UP/UN?



 =Rob






>>>
>>>
>>
>


Re: Atomic Batch: Maintaining consistency between tables

2016-02-04 Thread Carlos Alonso
Hi,

The coordinator will send the mutations to all required replicas and wait for
the acknowledgements needed to fulfil the consistency level.
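
For instance, a logged batch keeping two tables in sync is just a group of
mutations; a minimal sketch with assumed table names:

BEGIN BATCH
    INSERT INTO users (id, name) VALUES (1, 'alice');
    INSERT INTO users_by_name (name, id) VALUES ('alice', 1);
APPLY BATCH;

Each mutation in the batch is replicated as usual, and the coordinator waits
for as many acknowledgements as the consistency level requires, not for all
RF replicas.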

Regards

Carlos Alonso | Software Engineer | @calonso 

On 4 February 2016 at 11:56,  wrote:

> Hello,
>
>
>
> I read in the DataStax documentation:
>
>
>
> “The coordinator node might also need to work hard to process a logged
> batch while maintaining consistency between tables”
>
>
>
> Does it mean that the coordinator sends the mutations to all replica nodes
> and waits for RF acknowledgements, or only for one node of the set of
> replicas?
>
>
>
> thx
>
>
>


RE: Duplicated key with an IN statement

2016-02-04 Thread Edouard COLE
Hello,

When running that kind of query with TRACING ON, I noticed the coordinator is
also performing the same query multiple times.

Because the elements in the IN statement can involve many nodes, it makes
sense to map/reduce the query, but running the same sub-query multiple times
should not happen. What if the result set changes? Let’s imagine this query:
SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while this query
runs, the data for 123 changes?

 key | value
-----+-------
 123 |   456
 123 |   456
 123 |   456
 123 |   789 <-- Change here ☹
 123 |   789


There’s also something very important: when your table defines a tuple as
unique for a specific key, it is a real problem for a result set to contain
that supposedly unique key multiple times. This is why this does not happen
in any SQL implementation.

I think this is a bug

Edouard COLE


From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Thursday, February 04, 2016 11:55 AM
To: Edouard COLE
Cc: user@cassandra.apache.org
Subject: Re: Duplicated key with an IN statement

Hi,

This is interesting.

It seems rational that if you are looking up 2 keys and both exist (which is
the case here) it returns you 2 rows. Yet, I just checked this kind of command
on MySQL and it gives a one-line result. So here CQL differs from SQL (at least
MySQL). I know we are trying to fit as much as possible with SQL to avoid
losing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure anyone ever
tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 8:36 GMT+00:00 Edouard COLE 
>:
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
... key int,
... value int,
... PRIMARY KEY (key)
... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-+---
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-+---
 123 |   456
 123 |   456 <- WTF?

(2 rows)

Adding the same key multiple times to an IN statement makes the query return
the tuple multiple times.

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE



Re: Duplicated key with an IN statement

2016-02-04 Thread Sylvain Lebresne
That behavior has been changed in 2.2 and upwards. If you don't like it,
upgrade. In the meantime, it's probably not hard to avoid passing duplicate
keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE 
wrote:

> Hello,
>
>
>
> When running that kind of query with TRACING ON, I noticed the coordinator
> is also performing the same query multiple times.
>
>
>
> Because the elements in the IN statement can involve many nodes, it makes
> sense to map/reduce the query, but running the same sub-query multiple times
> should not happen. What if the result set changes? Let’s imagine this query:
> SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while this
> query runs, the data for 123 changes?
>
>
>
> key | value
>
> -+---
>
> 123 |   456
>
> 123 |   456
>
>  123 |   456
>
>  123 |   789 <-- Change here :(
>
> 123 |   789
>
>
>
>
>
> There’s also something very important: when your table defines a tuple as
> unique for a specific key, it is a real problem for a result set to contain
> that supposedly unique key multiple times. This is why this does not happen
> in any SQL implementation.
>
>
>
> I think this is a bug
>
>
>
> Edouard COLE
>
>
>
>
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Thursday, February 04, 2016 11:55 AM
> *To:* Edouard COLE
> *Cc:* user@cassandra.apache.org
> *Subject:* Re: Duplicated key with an IN statement
>
>
>
> Hi,
>
>
>
> This is interesting.
>
>
>
> It seems rational that if you are looking up 2 keys and both exist (which
> is the case here) it returns you 2 rows. Yet, I just checked this kind of
> command on MySQL and it gives a one-line result. So here CQL differs from
> SQL (at least MySQL). I know we are trying to fit as much as possible with
> SQL to avoid losing people, so we might want to change this.
>
> Not sure if this behavior is intentional / known. Not even sure anyone
> ever tried to do this kind of query actually :).
>
>
>
> Does anyone know about that ? Should we raise a ticket ?
>
>
>
> -
>
> Alain Rodriguez
>
> France
>
>
>
> The Last Pickle
>
> http://www.thelastpickle.com
>
>
>
>
>
>
>
> 2016-02-04 8:36 GMT+00:00 Edouard COLE :
>
> Hello,
>
> I just discovered this, and I think this is weird:
>
> ed@debian:~$ cqlsh 192.168.10.8
> Connected to _CLUSTER_ at 192.168.10.8:9160.
> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
> 19.39.0]
> Use HELP for help.
> cqlsh> USE ks-test ;
> cqlsh:ks-test> CREATE TABLE t (
> ... key int,
> ... value int,
> ... PRIMARY KEY (key)
> ... );
> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
> cqlsh:ks-test> SELECT * FROM t ;
>
>  key | value
> -+---
>  123 |   456
>
> (1 rows)
>
> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>
>  key | value
> -+---
>  123 |   456
>  123 |   456 <- WTF?
>
> (2 rows)
>
> Adding the same key multiple times to an IN statement makes the query
> return the tuple multiple times.
>
> This looks weird to me, can anyone give me some feedback on such a
> behavior?
>
> Edouard COLE
>
>
>


Restart Cassandra automatically

2016-02-04 Thread Debraj Manna
Hi,

What is the best way to keep Cassandra running? My requirement is that if
Cassandra stops for some reason, it should get started automatically.

I tried to achieve this by adding cassandra to supervisord. My supervisor
conf for cassandra looks like below:-

[program:cassandra]
command=/bin/bash -c 'sleep 10 && bin/cassandra'
directory=/opt/cassandra/
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/cassandra_supervisor.err.log
stdout_logfile=/var/log/cassandra_supervisor.out.log

But it does not seem to work properly. Even if I stop Cassandra from
supervisor, the Cassandra process still appears to be running when I do

ps -ef | grep cassandra

I also tried the configuration mentioned in this question, but still no luck.

Can someone let me know what is the best way to keep Cassandra running in a
production environment?
Environment

   - Cassandra 2.2.4
   - Debian 8
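
Note that bin/cassandra without -f forks and detaches, which most process
supervisors cannot track (adding -f to the supervisord command line is one
fix). An alternative on Debian 8 is a systemd unit with Restart=always; a
minimal sketch, assuming the /opt/cassandra path from the config above and a
cassandra user:

[Unit]
Description=Apache Cassandra
After=network.target

[Service]
Type=simple
User=cassandra
ExecStart=/opt/cassandra/bin/cassandra -f
Restart=always
RestartSec=10
LimitNOFILE=100000

[Install]
WantedBy=multi-user.target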

Thanks,


Re: Duplicated key with an IN statement

2016-02-04 Thread Jack Krupansky
Sylvain, there's a bug in CHANGES.TXT for this issue. It says: "Duplicate
rows returned when in clause has repeated values (CASSANDRA-6707)", but the
issue number is really 6706.

-- Jack Krupansky

On Thu, Feb 4, 2016 at 9:54 AM, Sylvain Lebresne 
wrote:

> That behavior has been changed in 2.2 and upwards. If you don't like it,
> upgrade. In the meantime, it's probably not hard to avoid passing duplicate
> keys in IN.
>
> On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE 
> wrote:
>
>> Hello,
>>
>>
>>
>> When running that kind of query with TRACING ON, I noticed the
>> coordinator is also performing the same query multiple times.
>>
>>
>>
>> Because the elements in the IN statement can involve many nodes, it makes
>> sense to map/reduce the query, but running the same sub-query multiple
>> times should not happen. What if the result set changes? Let’s imagine this
>> query: SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while
>> this query runs, the data for 123 changes?
>>
>>
>>
>> key | value
>>
>> -+---
>>
>> 123 |   456
>>
>> 123 |   456
>>
>>  123 |   456
>>
>>  123 |   789 <-- Change here :(
>>
>> 123 |   789
>>
>>
>>
>>
>>
>> There’s also something very important: when your table defines a tuple as
>> unique for a specific key, it is a real problem for a result set to contain
>> that supposedly unique key multiple times. This is why this does not happen
>> in any SQL implementation.
>>
>>
>>
>> I think this is a bug
>>
>>
>>
>> Edouard COLE
>>
>>
>>
>>
>>
>> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
>> *Sent:* Thursday, February 04, 2016 11:55 AM
>> *To:* Edouard COLE
>> *Cc:* user@cassandra.apache.org
>> *Subject:* Re: Duplicated key with an IN statement
>>
>>
>>
>> Hi,
>>
>>
>>
>> This is interesting.
>>
>>
>>
>> It seems rational that if you are looking up 2 keys and both exist (which
>> is the case here) it returns you 2 rows. Yet, I just checked this kind of
>> command on MySQL and it gives a one-line result. So here CQL differs from
>> SQL (at least MySQL). I know we are trying to fit as much as possible with
>> SQL to avoid losing people, so we might want to change this.
>>
>> Not sure if this behavior is intentional / known. Not even sure anyone
>> ever tried to do this kind of query actually :).
>>
>>
>>
>> Does anyone know about that ? Should we raise a ticket ?
>>
>>
>>
>> -
>>
>> Alain Rodriguez
>>
>> France
>>
>>
>>
>> The Last Pickle
>>
>> http://www.thelastpickle.com
>>
>>
>>
>>
>>
>>
>>
>> 2016-02-04 8:36 GMT+00:00 Edouard COLE :
>>
>> Hello,
>>
>> I just discovered this, and I think this is weird:
>>
>> ed@debian:~$ cqlsh 192.168.10.8
>> Connected to _CLUSTER_ at 192.168.10.8:9160.
>> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
>> 19.39.0]
>> Use HELP for help.
>> cqlsh> USE ks-test ;
>> cqlsh:ks-test> CREATE TABLE t (
>> ... key int,
>> ... value int,
>> ... PRIMARY KEY (key)
>> ... );
>> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
>> cqlsh:ks-test> SELECT * FROM t ;
>>
>>  key | value
>> -+---
>>  123 |   456
>>
>> (1 rows)
>>
>> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>>
>>  key | value
>> -+---
>>  123 |   456
>>  123 |   456 <- WTF?
>>
>> (2 rows)
>>
>> Adding the same key multiple times to an IN statement makes the query
>> return the tuple multiple times.
>>
>> This looks weird to me, can anyone give me some feedback on such a
>> behavior?
>>
>> Edouard COLE
>>
>>
>>
>
>


Re: Duplicated key with an IN statement

2016-02-04 Thread Robert Wille
You shouldn’t be using IN anyway. It is better to issue multiple queries, each 
for a single key, and issue them in parallel. Better performance. Less GC 
pressure.

On Feb 4, 2016, at 7:54 AM, Sylvain Lebresne 
> wrote:

That behavior has been changed in 2.2 and upwards. If you don't like it, 
upgrade. In the meantime, it's probably not hard to avoid passing duplicate 
keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE 
> wrote:
Hello,

When running that kind of query with TRACING ON, I noticed the coordinator is
also performing the same query multiple times.

Because the elements in the IN statement can involve many nodes, it makes sense
to map/reduce the query, but running the same sub-query multiple times should
not happen. What if the result set changes? Let’s imagine this query: SELECT *
FROM t WHERE key IN (123, 123, …. X1000, 123), and while this query runs, the
data for 123 changes?

key | value
-+---
123 |   456
123 |   456
 123 |   456
 123 |   789 <-- Change here :(
123 |   789


There’s also something very important: when your table defines a tuple as
unique for a specific key, it is a real problem for a result set to contain
that supposedly unique key multiple times. This is why this does not happen
in any SQL implementation.

I think this is a bug

Edouard COLE


From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Thursday, February 04, 2016 11:55 AM
To: Edouard COLE
Cc: user@cassandra.apache.org
Subject: Re: Duplicated key with an IN statement

Hi,

This is interesting.

It seems rational that if you are looking up 2 keys and both exist (which is
the case here) it returns you 2 rows. Yet, I just checked this kind of command
on MySQL and it gives a one-line result. So here CQL differs from SQL (at least
MySQL). I know we are trying to fit as much as possible with SQL to avoid
losing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure anyone ever
tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 8:36 GMT+00:00 Edouard COLE 
>:
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
... key int,
... value int,
... PRIMARY KEY (key)
... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-+---
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-+---
 123 |   456
 123 |   456 <- WTF?

(2 rows)

Adding the same key multiple times to an IN statement makes the query return
the tuple multiple times.

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE





Re: Any tips on how to track down why Cassandra won't cluster?

2016-02-04 Thread Victor Chen
Along the lines of what Ben and Bryan suggested, what are you using to
verify ports are open? If you do something like:

node1$ nc -zv node2 9042
node2$ nc -zv node1 9042

does it succeed from both nodes?
Does the first node 'know' that it is a seed? i.e. do you have first node
listed in its own seed's list?
What does the system.log show as both nodes are spun up?


On Wed, Feb 3, 2016 at 7:20 PM, Bryan Cheng  wrote:

>
>
>
>> On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III 
>> wrote:
>>
>>>
>>> Any suggestions on how to track down what might trigger this problem?
>>> I'm not receiving any exceptions.
>>>
>>
> You're not getting "Unable to gossip with any seeds" on the second node?
> What does nodetool status show on both machines?
>


RE: Duplicated key with an IN statement

2016-02-04 Thread Edouard COLE
Thanks :)

From: Robert Wille [mailto:rwi...@fold3.com]
Sent: Thursday, February 04, 2016 4:37 PM
To: user@cassandra.apache.org
Subject: Re: Duplicated key with an IN statement

You shouldn't be using IN anyway. It is better to issue multiple queries, each 
for a single key, and issue them in parallel. Better performance. Less GC 
pressure.

On Feb 4, 2016, at 7:54 AM, Sylvain Lebresne 
> wrote:


That behavior has been changed in 2.2 and upwards. If you don't like it, 
upgrade. In the meantime, it's probably not hard to avoid passing duplicate 
keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE 
> wrote:
Hello,

When running that kind of query with TRACING ON, I noticed the coordinator is
also performing the same query multiple times.

Because the elements in the IN statement can involve many nodes, it makes sense
to map/reduce the query, but running the same sub-query multiple times should
not happen. What if the result set changes? Let's imagine this query: SELECT *
FROM t WHERE key IN (123, 123, …. X1000, 123), and while this query runs, the
data for 123 changes?

key | value
-+---
123 |   456
123 |   456
 123 |   456
 123 |   789 <-- Change here :(
123 |   789


There's also something very important: when your table defines a tuple as
unique for a specific key, it is a real problem for a result set to contain
that supposedly unique key multiple times. This is why this does not happen
in any SQL implementation.

I think this is a bug

Edouard COLE


From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Thursday, February 04, 2016 11:55 AM
To: Edouard COLE
Cc: user@cassandra.apache.org
Subject: Re: Duplicated key with an IN statement

Hi,

This is interesting.

It seems rational that if you are looking up 2 keys and both exist (which is
the case here) it returns you 2 rows. Yet, I just checked this kind of command
on MySQL and it gives a one-line result. So here CQL differs from SQL (at least
MySQL). I know we are trying to fit as much as possible with SQL to avoid
losing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure anyone ever
tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 8:36 GMT+00:00 Edouard COLE 
>:
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
... key int,
... value int,
... PRIMARY KEY (key)
... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-+---
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-+---
 123 |   456
 123 |   456 <- WTF?

(2 rows)

Adding the same key multiple times to an IN statement makes the query return
the tuple multiple times.

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE





Re: Want inputs about super column family vs map/list

2016-02-04 Thread Robert Coli
On Thu, Feb 4, 2016 at 2:37 AM, Bhuvan Rawal  wrote:

> 1. Using super column family:
>

Super columns have been not-recommended for use for about five years now.

=Rob


Re: Duplicated key with an IN statement

2016-02-04 Thread Tyler Hobbs
On Thu, Feb 4, 2016 at 9:57 AM, Jack Krupansky 
wrote:

> there's a bug in CHANGES.TXT for this issue. It says: "Duplicate rows
> returned when in clause has repeated values (CASSANDRA-6707)", but the
> issue number is really 6706.
>

Thanks, I've fixed this.


-- 
Tyler Hobbs
DataStax 


"Not enough replicas available for query" after reboot

2016-02-04 Thread Flavien Charlon
Hi,

My cluster was running fine. I rebooted all three nodes (one by one), and
now all nodes are back up and running. "nodetool status" shows UP for all
three nodes on all three nodes:

--  AddressLoad   Tokens  OwnsHost ID
Rack
UN  xx.xx.xx.xx331.84 GB  1   ?
d3d3a79b-9ca5-43f9-88c4-c3c7f08ca538  RAC1
UN  xx.xx.xx.xx317.2 GB   1   ?
de7917ed-0de9-434d-be88-bc91eb4f8713  RAC1
UN  xx.xx.xx.xx  291.61 GB  1   ?
b489c970-68db-44a7-90c6-be734b41475f  RAC1

However, now the client application fails to run queries on the cluster
with:

Cassandra.UnavailableException: Not enough replicas available for query at
> consistency Quorum (2 required but only 1 alive)


The replication factor is 3. I am running Cassandra 2.1.7.

Any idea where that could come from or how to troubleshoot this further?

Best,
Flavien


Duplicated key with an IN statement

2016-02-04 Thread Edouard COLE
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
... key int,
... value int,
... PRIMARY KEY (key)
... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-+---
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-+---
 123 |   456
 123 |   456 <- WTF?

(2 rows)

Adding the same key multiple times to an IN statement makes the query return
the tuple multiple times.

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE