Re: Performance impact with ALLOW FILTERING clause.

2019-07-25 Thread Jacques-Henri Berthemet
Hi Asad,

That’s because of the way Spark works. Essentially, when you execute a Spark 
job, it pulls the full content of the datastore (Cassandra in your case) into 
its RDDs and works with it “in memory”. While Spark uses “data locality” to 
read data from the nodes that have the required data on their local disks, it’s 
still reading all data from the Cassandra tables. To do so it sends a ‘select * 
from Table ALLOW FILTERING’ query to Cassandra.

From Spark you don’t have much control over the initial query used to fill the 
RDDs; sometimes you’ll read the whole table even if you only need one row.
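For context, the connector usually does not run this as one giant query: it splits the table into token ranges and runs one query per range on the executor closest to a replica. A hedged sketch of what each task sends (keyspace, table and column names are placeholders):

```sql
SELECT * FROM my_ks.my_table
WHERE token(partition_key) > ?    -- range start for this Spark partition
  AND token(partition_key) <= ?   -- range end
ALLOW FILTERING;
```

ALLOW FILTERING is appended so that any extra predicates pushed down from Spark are accepted by Cassandra; the net effect is still a scan of the whole table.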

Regards,
Jacques-Henri Berthemet

From: "ZAIDI, ASAD A" 
Reply to: "user@cassandra.apache.org" 
Date: Thursday 25 July 2019 at 15:49
To: "user@cassandra.apache.org" 
Subject: Performance impact with ALLOW FILTERING clause.

Hello Folks,

I was going through the documentation and saw in many places that ALLOW 
FILTERING causes performance unpredictability. Our developers say the ALLOW 
FILTERING clause is implicitly added to a bunch of queries by the 
Spark-Cassandra connector and they cannot control it; at the same time, we see 
unpredictability in application performance – just as the documentation says.

I’m trying to understand why a connector would add a clause to a query when it 
can negatively impact database/application performance. Is it the data model 
that drives the connector’s decision to add ALLOW FILTERING automatically, or 
are there other reasons this clause is added? I’m not a developer, but I want 
to know why developers don’t have any control over this.

I’ll appreciate your guidance here.

Thanks
Asad




Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jacques-Henri Berthemet
So how much data can you safely fit per node using SSDs with Cassandra 3.11? 
How much free space do you need on your disks?

There should be some recommendations on node sizes on:

http://cassandra.apache.org/doc/latest/operating/hardware.html






From: Jon Haddad 
Sent: Thursday, April 18, 2019 6:43:15 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] multiple Cassandra instances per server, possible?

Agreed with Jeff here.  The whole "community recommends no more than
1TB" has been around, and inaccurate, for a long time.

The biggest issue with dense nodes is how long it takes to replace
them.  4.0 should help with that under certain circumstances.


On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa  wrote:
>
> Agreed that you can go larger than 1T on ssd
>
> You can do this safely with both instances in the same cluster if you 
> guarantee two replicas aren’t on the same machine. Cassandra provides a 
> primitive to do this - rack awareness through the network topology snitch.
>
> The limitation (until 4.0) is that you’ll need two IPs per machine as both 
> instances have to run on the same port.
>
>
> --
> Jeff Jirsa
>
>
> On Apr 18, 2019, at 6:45 AM, Durity, Sean R  
> wrote:
>
> What is the data problem that you are trying to solve with Cassandra? Is it 
> high availability? Low latency queries? Large data volumes? High concurrent 
> users? I would design the solution to fit the problem(s) you are solving.
>
>
>
> For example, if high availability is the goal, I would be very cautious about 
> 2 nodes/machine. If you need the full amount of the disk – you *can* have 
> larger nodes than 1 TB. I agree that administration tasks (like 
> adding/removing nodes, etc.) are more painful with large nodes – but not 
> impossible. For large amounts of data, I like nodes that have about 2.5 – 3 
> TB of usable SSD disk.
>
>
>
> It is possible that your nodes might be under-utilized, especially at first. 
> But if the hardware is already available, you have to use what you have.
>
>
>
> We have done multiple nodes on single physical hardware, but they were two 
> separate clusters (for the same application). In that case, we had  a 
> different install location and different ports for one of the clusters.
>
>
>
> Sean Durity
>
>
>
> From: William R 
> Sent: Thursday, April 18, 2019 9:14 AM
> To: user@cassandra.apache.org
> Subject: [EXTERNAL] multiple Cassandra instances per server, possible?
>
>
>
> Hi all,
>
>
>
> In our small company we have 10 nodes of (2 x 3 TB HD) 6 TB each, 128 GB RAM 
> and 64 cores, and we are thinking of using them as Cassandra nodes. From what 
> I am reading around, the community recommends that every node should not keep 
> more than 1 TB of data, so I am wondering if it is possible to install 2 
> instances per node using Docker, so each Docker instance can write to its own 
> physical disk and utilise the rest of the hardware (CPU & RAM) more 
> efficiently.
>
>
>
> I understand that with this setup there is the danger of creating a single 
> point of failure for 2 Cassandra nodes, but apart from that, do you think 
> this is a feasible setup to start the cluster with?
>
>
>
> Apart from the Docker solution, do you recommend any other way to split the 
> physical node into 2 instances? (VMware? Or maybe even 2 separate 
> installations of Cassandra?)
>
>
>
> Eventually we are aiming for a cluster consisting of 2 DCs with 10 nodes 
> each (5 bare-metal nodes with 2 Cassandra instances each).
>
>
>
> Later, when we start introducing more nodes to the cluster, we can 
> decommission the "double-instanced" ones and aim for a more homogeneous 
> solution.
>
>
>
> Thank you,
>
>
>
> Wil
>
>
> 
>

Re: cass-2.2 trigger - how to get clustering columns and value?

2019-04-11 Thread Jacques-Henri Berthemet
Hi,

You should take a look at how Stratio’s Lucene index decodes CFs and keys, 
start from RowService.doIndex() implementations:
https://github.com/Stratio/cassandra-lucene-index/tree/branch-2.2.13/plugin/src/main/java/com/stratio/cassandra/lucene/service

Note that in some cases an update without values is a delete of the Cell.

Regards,
Jacques-Henri Berthemet

From: Carl Mueller 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday 10 April 2019 at 23:53
To: "user@cassandra.apache.org" 
Subject: cass-2.2 trigger - how to get clustering columns and value?

We have a multitenant cluster that we can't upgrade to 3.x easily, and we'd 
like to migrate some apps off of the shared cluster to dedicated clusters.

This is a 2.2 cluster.

So I'm trying a trigger to track updates while we transition, and will send 
them via Kafka. Right now I'm just trying to extract all the data from the 
incoming updates.

so for

public Collection<Mutation> augment(ByteBuffer key, ColumnFamily update) {

the names returned by update.getColumnNames() for an update of a table with 
two clustering columns and one updated regular column were two CellName/Cells:

one has no name, and no apparent raw value (bytebuffer is empty)

the other is the data column.

I can extract the partition key from the key field.

But how do I get the values of the two clustering columns? They aren't listed 
in the iterator, and they don't appear to be in the key field. Since clustering 
columns are encoded into the name of a cell, I'd imagine there might be some 
"unpacking" trick for that.


Re: Cassandra Possible read/write race condition in LOCAL_ONE?

2019-03-29 Thread Jacques-Henri Berthemet
If you use LOCAL_ONE for both write and read and you have RF > 1, it means the 
two operations could go to different replicas that do not have the data yet.
Try to use LOCAL_QUORUM instead; as usual, check your clocks as well.
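The rule of thumb is that a read is guaranteed to see a prior write when read CL + write CL > RF. A hedged cqlsh sketch (column names are placeholders; CONSISTENCY is a cqlsh command, not CQL):

```sql
CONSISTENCY LOCAL_QUORUM;  -- applies to both statements below

INSERT INTO table_A (id, val) VALUES (1, 'x');
-- with RF=3, a quorum read overlaps the quorum write on at least one replica
SELECT val FROM table_A WHERE id = 1;
```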

From: Jeff Jirsa 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday 28 March 2019 at 23:29
To: cassandra 
Subject: Re: Cassandra Possible read/write race condition in LOCAL_ONE?

Yes it can race; if you don't want to race, you'd want to use SERIAL or 
LOCAL_SERIAL.

On Thu, Mar 28, 2019 at 3:04 PM Richard Xin  
wrote:
Hi,
Our Cassandra consistency level is currently set to LOCAL_ONE. We have a 
script doing the following:
1) insert one record into table_A
2) select the last inserted record from table_A and do something ...

Steps #1 and #2 run sequentially without pause, and I assume #1 and #2 are 
supposed to run in the same DC.

We are facing sporadic issues where step #2 doesn't see the data inserted by #1.
Is it possible to have a race condition with LOCAL_ONE such that #2 might not 
see the data inserted in step #1?

Thanks in advance!
Richard


Re: SASI queries- cqlsh vs java driver

2019-02-04 Thread Jacques-Henri Berthemet
I’m not sure why it’s not allowed by the DataStax driver, but maybe you could 
try to use OR instead of IN:
SELECT blah FROM foo WHERE <column> = :val1 OR <column> = :val2 ALLOW FILTERING

It should be the same as the IN query, but I don’t know if it makes a 
difference for performance.

From: Peter Heitman 
Reply-To: "user@cassandra.apache.org" 
Date: Monday 4 February 2019 at 07:17
To: "user@cassandra.apache.org" 
Subject: SASI queries- cqlsh vs java driver

When I create a SASI index on a secondary column, from cqlsh I can execute a 
query

SELECT blah FROM foo WHERE <column> IN ('mytext') ALLOW FILTERING;

but not from the java driver:

SELECT blah FROM foo WHERE <column> IN :val ALLOW FILTERING

Here I get an exception

com.datastax.driver.core.exceptions.InvalidQueryException: IN predicates on 
non-primary-key columns () is not yet supported
at 
com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
 ~[cassandra-driver-core-3.6.0.jar:na]

Why are they different? Is there anything I can do with the java driver to get 
past this exception?

Peter




Re: OpenJDK and Windows Service Error

2019-01-02 Thread Jacques-Henri Berthemet
Apache Cassandra has no service executable, so you're probably using DataStax 
DDAC instead; if that's the case, you should contact them for support.



From: Michael Shuler  on behalf of Michael Shuler 

Sent: Wednesday, January 2, 2019 17:15
To: user@cassandra.apache.org
Subject: Re: OpenJDK and Windows Service Error

On 1/2/19 9:58 AM, Rick L Johnson wrote:
> “Windows could not start the service on Local Computer. For more
> information review the System event log. If this is a non-Microsoft
> service, contact the service vendor and refer to the server specific
> error code 1”.

What does the System event log say? What else did you try as a result of
those log entries? etc.

Windows is a best-effort, untested, dev-only platform for the project.
You may need to work line by line through startup scripts, debug, then
offer suggested fixes to the project to keep Windows functional. There
are very few users on the platform, so it's really up to those few users
to keep it working.

--
Michael

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: Cassandra 4.0

2018-10-25 Thread Jacques-Henri Berthemet
Are there binary builds available for testing or is it source only?

--
Jacques-Henri Berthemet

-Original Message-
From: Nate McCall  
Sent: Wednesday, October 24, 2018 10:02 PM
To: Cassandra Users 
Subject: Re: Cassandra 4.0

When it's ready :)

In all seriousness, the past two blog posts include some discussion on our 
motivations and current goals with regard to 4.0:
http://cassandra.apache.org/blog/
On Wed, Oct 24, 2018 at 4:49 AM Abdul Patel  wrote:
>
> Hi all,
>
> Any idea when 4.0 is planned to release?

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: how to avoid lightwieght transactions

2018-06-21 Thread Jacques-Henri Berthemet
Hi,

Another way would be to use a compound primary key, with Id as the partition 
key and time as a clustering column of type TimeUUID. Then you’ll always insert 
records, never update; for each “transaction” you’ll keep a row in the 
partition. When you read all the rows for that partition by Id, you’ll process 
all of them to determine the real status. For example, if the final status must 
be “completed” and you have:

Id, TimeUUID, status
1, t0, added
1, t1, added
1, t2, completed
1, t3, added

When reading back you’ll just discard the last row.


If you’re only concerned about the “insert or update” case but the data is 
actually the same, you can always insert. If you insert on an existing record 
it will just overwrite it; if you update a non-existing record it will insert 
the data. In Cassandra there is not much difference between insert and update 
operations.
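A minimal sketch of the append-only model described above (table and column names are illustrative, not from the original thread):

```sql
CREATE TABLE multirow_log (
    id text,
    time timeuuid,
    status text,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC);

-- always insert, never update; now() generates a TimeUUID server-side
INSERT INTO multirow_log (id, time, status) VALUES ('1', now(), 'added');
INSERT INTO multirow_log (id, time, status) VALUES ('1', now(), 'completed');

-- read the whole partition and resolve the real status client-side
SELECT time, status FROM multirow_log WHERE id = '1';
```

Since every write targets a new clustering key, no LWT is needed; the cost moves to the read path, which has to scan the partition's rows.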

Regards,
--
Jacques-Henri Berthemet

From: Rajesh Kishore [mailto:rajesh10si...@gmail.com]
Sent: Thursday, June 21, 2018 7:45 AM
To: user@cassandra.apache.org
Subject: Re: how to avoid lightwieght transactions

Hi,

I think the LWT feature was introduced for exactly your kind of use case: you 
don't want other requests updating the same data at the same time, using the 
Paxos algo (2-phase commit).
So, IMO your use case makes perfect sense for LWT, to avoid concurrent updates.
If your issue is not the concurrent-update one, then IMHO you may want to split 
this into two steps:
- get the transcation_type with QUORUM (or a higher consistency level)
- and conditionally update the row with QUORUM (or a higher consistency level)
But remember, this won't be atomic in nature and won't solve the concurrent 
update issue if you have one.

Regards,
Rajesh



On Wed, Jun 20, 2018 at 2:59 AM, manuj singh <s.manuj...@gmail.com> wrote:
Hi all,
we have a use case where we need to update our rows frequently. In order to do 
so, and so that we don't override updates, we have to resort to lightweight 
transactions.
Since lightweight transactions are expensive (could be 4 times as expensive as 
a normal insert), how do we model around them?

e.g i have a table where


CREATE TABLE multirow (

id text,

time text,

transcation_type text,

status text,

PRIMARY KEY (id, time)

)



So let's say we update the status column multiple times. The first time we 
update, we also have to make sure that the transaction exists; otherwise a 
normal update will insert it, and then the original insert comes in and 
overrides the update.

So in order to fix that we need to use lightweight transactions.
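For reference, the conditional (LWT) versions of those statements against the table above would look something like this (values are placeholders):

```sql
-- create the row only if it does not exist yet
INSERT INTO multirow (id, time, transcation_type, status)
VALUES ('1', 't0', 'payment', 'added')
IF NOT EXISTS;

-- update the status only if the row is already there
UPDATE multirow SET status = 'completed'
WHERE id = '1' AND time = 't0'
IF EXISTS;
```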



Is there another way I can model this so that we can avoid lightweight 
transactions?





Thanks





RE: Does Cassandra supports ACID txn

2018-04-19 Thread Jacques-Henri Berthemet
When using BATCH on multiple tables you’ll need to use a LOGGED batch. When you 
send the request, it is written to the batch log of all (relevant) nodes; once 
this write succeeds the batch is “accepted” and the nodes will try to apply the 
batch operations. If for any reason a statement fails, the node will keep 
retrying forever. In that case you may see a partially applied batch until it’s 
fixed.

Note that you can’t mix BATCH and LWT on different tables/partitions.

You can get more details here:
http://cassandra.apache.org/doc/latest/cql/dml.html#batch
https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
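A minimal sketch of a LOGGED batch spanning two tables (BEGIN BATCH is LOGGED by default; keyspace, table and column names are placeholders):

```sql
BEGIN BATCH
    INSERT INTO ks.table_a (id, val) VALUES (1, 'a');
    UPDATE ks.table_b SET val = 'b' WHERE id = 1;
APPLY BATCH;
```

Either both statements eventually apply or, until the batch log replays, a reader may observe only one of them; there is no rollback.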
--
Jacques-Henri Berthemet

From: Rajesh Kishore [mailto:rajesh10si...@gmail.com]
Sent: Thursday, April 19, 2018 11:13 AM
To: user@cassandra.apache.org
Subject: Re: Does Cassandra supports ACID txn

Thanks for the response. Let me put my question again with an example.
I want to perform an atomic txn, say insert/delete/update, on a set of tables:
TableA
TableB
TableC
When these are performed as batch operations, and let us say something goes 
wrong while doing the operation on TableC, would the system roll back the 
operations done for TableA and TableB?
-Rajesh


On Thu, Apr 19, 2018 at 1:25 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com> wrote:
Cassandra support LWT (Lightweight transactions), you may find this doc 
interesting:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlDataConsistencyTOC.html

In any case, LWT or BATCH, you won’t have external control over the tx; it’s 
either done or not done. In case of timeout you won’t have a way to know if it 
worked or not.
There is no way to roll back a statement/batch; the only way is to send an 
update to modify the partition back to its previous state.

Regards,
--
Jacques-Henri Berthemet

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Thursday, April 19, 2018 9:10 AM
To: user <user@cassandra.apache.org>
Subject: Re: Does Cassandra supports ACID txn

No ACID transaction any soon in Cassandra

On Thu, Apr 19, 2018 at 7:35 AM, Rajesh Kishore <rajesh10si...@gmail.com> wrote:
Hi,
I am a bit confused by reading different articles: does a recent version of 
Cassandra support ACID transactions?
I found the BATCH command, but I'm not sure if it supports rollback; consider 
that the transaction I am going to perform would be on a single partition.
Also, what are the limitations, if any?

Thanks,
Rajesh




RE: Does Cassandra supports ACID txn

2018-04-19 Thread Jacques-Henri Berthemet
Cassandra support LWT (Lightweight transactions), you may find this doc 
interesting:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlDataConsistencyTOC.html

In any case, LWT or BATCH, you won’t have external control over the tx; it’s 
either done or not done. In case of timeout you won’t have a way to know if it 
worked or not.
There is no way to roll back a statement/batch; the only way is to send an 
update to modify the partition back to its previous state.

Regards,
--
Jacques-Henri Berthemet

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Thursday, April 19, 2018 9:10 AM
To: user <user@cassandra.apache.org>
Subject: Re: Does Cassandra supports ACID txn

No ACID transaction any soon in Cassandra

On Thu, Apr 19, 2018 at 7:35 AM, Rajesh Kishore <rajesh10si...@gmail.com> wrote:
Hi,
I am a bit confused by reading different articles: does a recent version of 
Cassandra support ACID transactions?
I found the BATCH command, but I'm not sure if it supports rollback; consider 
that the transaction I am going to perform would be on a single partition.
Also, what are the limitations, if any?

Thanks,
Rajesh



Re: Is it safe to use paxos protocol in LWT from patent perspective ?

2018-04-18 Thread Jacques-Henri Berthemet
Hi Hiroyuki,


That's an interesting question, it looks like Paxos was invented in 1989:

https://en.wikipedia.org/wiki/Paxos_(computer_science)


So "prior art" could invalidate such a patent claim.


--

Jacques-Henri Berthemet


From: Hiroyuki Yamada <mogwa...@gmail.com>
Sent: Wednesday, April 18, 2018 3:50:05 AM
To: user@cassandra.apache.org
Subject: Is it safe to use paxos protocol in LWT from patent perspective ?

Hi all,

I'm wondering if it is safe to use the Paxos protocol in LWT from a patent 
perspective.
I found some Paxos-related patents here:
<https://patents.justia.com/inventor/leslie-lamport>

Does anyone know about this ?

Best regards,
Hiroyuki

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: Mailing list server IPs

2018-04-16 Thread Jacques-Henri Berthemet
Hi Nate,

Thanks for the help, I passed the info to our IT.

Regards,
--
Jacques-Henri Berthemet

From: Nate McCall [mailto:n...@thelastpickle.com]
Sent: Monday, April 16, 2018 12:27 AM
To: Cassandra Users <user@cassandra.apache.org>
Subject: Re: Mailing list server IPs

Hi Jacques,
Thanks for bringing this up. I took a quick look through the INFRA project and 
saw a couple of resolved issues that might help:
https://issues.apache.org/jira/browse/INFRA-6584?jql=project%20%3D%20INFRA%20AND%20text%20~%20%22mail%20server%20whitelist%22

If those don't do it for you, please open a new issue with INFRA.


On Sat, Apr 14, 2018 at 1:19 AM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com> wrote:
I checked with IT and I missed an email during the period when I got the last 
bounce. It’s not a very big deal but I’d like to have it fixed if possible.

Gmail servers are very picky on SMTP traffic and reject a lot of things.

--
Jacques-Henri Berthemet

From: Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
Sent: Friday, April 13, 2018 3:15 PM
To: user@cassandra.apache.org
Subject: Re: Mailing list server IPs

Hi,

I receive similar messages from time to time, and I'm using Gmail ;)  I believe 
I never missed a mail on the ML and that you can safely ignore this message

On 13 April 2018 at 15:06, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com> wrote:
Hi,

I’m getting bounce messages from the ML from time to time, see attached 
example. Our IT told me that they need to whitelist all IPs used by Cassandra 
ML server. Is there a way to get those IPs?

Sorry if it’s not really related to Cassandra itself but I didn’t find anything 
in http://untroubled.org/ezmlm/ezman/ezman5.html commands.

Regards,
--
Jacques-Henri Berthemet


-- Forwarded message --
From: "user-h...@cassandra.apache.org" <user-h...@cassandra.apache.org>
To: Jacques-Henri Berthemet <jacques-henri.berthe...@genesys.com>
Cc:
Bcc:
Date: Fri, 6 Apr 2018 20:47:22 +
Subject: Warning from user@cassandra.apache.org
Hi! This is the ezmlm program. I'm managing the
user@cassandra.apache.org mailing list.


Messages to you from the user mailing list seem to
have been bouncing. I've attached a copy of the first bounce
message I received.

If this message bounces too, I will send you a probe. If the probe bounces,
I will remove your address from the user mailing list,
without further notice.


I've kept a list of which messages from the user mailing list have
bounced from your address.

Copies of these messages may be in the archive.
To retrieve a set of messages 123-145 (a maximum of 100 per request),
send a short message to:
   
<user-get.123_...@cassandra.apache.org>

To receive a subject and author list for the last 100 or so messages,
send a short message to:
   <user-in...@cassandra.apache.org>

Here are the message numbers:

   60535
   60536
   60548

--- Enclosed is a copy of the bounce message I received.

Return-Path: <>
Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 -
Date: 27 Mar 2018 14:22:11 -
From: mailer-dae...@apache.org
To: user-return-605...@cassandra.apache.org
Subject: failure notice




-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org




--
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


RE: Mailing list server IPs

2018-04-13 Thread Jacques-Henri Berthemet
I checked with IT and I missed an email during the period when I got the last 
bounce. It’s not a very big deal but I’d like to have it fixed if possible.

Gmail servers are very picky on SMTP traffic and reject a lot of things.

--
Jacques-Henri Berthemet

From: Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
Sent: Friday, April 13, 2018 3:15 PM
To: user@cassandra.apache.org
Subject: Re: Mailing list server IPs

Hi,

I receive similar messages from time to time, and I'm using Gmail ;)  I believe 
I never missed a mail on the ML and that you can safely ignore this message

On 13 April 2018 at 15:06, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com> wrote:
Hi,

I’m getting bounce messages from the ML from time to time, see attached 
example. Our IT told me that they need to whitelist all IPs used by Cassandra 
ML server. Is there a way to get those IPs?

Sorry if it’s not really related to Cassandra itself but I didn’t find anything 
in http://untroubled.org/ezmlm/ezman/ezman5.html commands.

Regards,
--
Jacques-Henri Berthemet






-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Mailing list server IPs

2018-04-13 Thread Jacques-Henri Berthemet
Hi,

I’m getting bounce messages from the ML from time to time, see attached 
example. Our IT told me that they need to whitelist all IPs used by Cassandra 
ML server. Is there a way to get those IPs?

Sorry if it’s not really related to Cassandra itself but I didn’t find anything 
in http://untroubled.org/ezmlm/ezman/ezman5.html commands.

Regards,
--
Jacques-Henri Berthemet
--- Begin Message ---
Hi! This is the ezmlm program. I'm managing the
user@cassandra.apache.org mailing list.


Messages to you from the user mailing list seem to
have been bouncing. I've attached a copy of the first bounce
message I received.

If this message bounces too, I will send you a probe. If the probe bounces,
I will remove your address from the user mailing list,
without further notice.


I've kept a list of which messages from the user mailing list have
bounced from your address.

Copies of these messages may be in the archive.
To retrieve a set of messages 123-145 (a maximum of 100 per request),
send a short message to:
   <user-get.123_...@cassandra.apache.org>

To receive a subject and author list for the last 100 or so messages,
send a short message to:
   <user-in...@cassandra.apache.org>

Here are the message numbers:

   60535
   60536
   60548

--- Enclosed is a copy of the bounce message I received.

Return-Path: <>
Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 -
Date: 27 Mar 2018 14:22:11 -
From: mailer-dae...@apache.org
To: user-return-605...@cassandra.apache.org
Subject: failure notice


--- End Message ---

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

RE: does c* 3.0 use one ring for all datacenters?

2018-04-11 Thread Jacques-Henri Berthemet
Hi,

Each DC has the whole ring, each DC contains a copy of the same data. When you 
add replication to a new DC, all data is copied to the new DC.

Within a DC, each range of token is 'owned' by a (primary) node (and replicas 
if you have RF > 1). If you add/remove a node in a DC, tokens will be 
rearranged between all nodes within the DC only, the other DCs won't be 
affected.
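Concretely, per-DC replication is declared on the keyspace; adding a DC is typically done by extending the replication map and then streaming the existing data to the new DC's nodes. A hedged sketch (keyspace and DC names are placeholders):

```sql
-- DC1 keeps its token assignments; DC2 gets a full copy of the data
ALTER KEYSPACE my_ks WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
};
```

After the ALTER, running nodetool rebuild (with the source DC as argument) on each new node streams the data over.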

--
Jacques-Henri Berthemet

-Original Message-
From: Jinhua Luo [mailto:luajit...@gmail.com] 
Sent: Wednesday, April 11, 2018 12:35 PM
To: user@cassandra.apache.org
Subject: does c* 3.0 use one ring for all datacenters?

Hi All,

I know it seems a stupid question, but I am really confused by the documents on 
the internet related to this topic, especially since they seem to give 
different answers for C* with and without vnodes.

Let's assume the token range is 1-100 for the whole cluster; how is it 
distributed across the datacenters? Considering that the number of datacenters 
in a cluster is dynamic: if there is only one ring, then the token ranges would 
change on each node when I add a new datacenter to the cluster? Then it would 
involve data migration? It doesn't make sense.

Looking forward to clarification for c* 3.0, thanks!

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org




RE: LWT on data mutated by non-LWT operation is valid ?

2018-03-26 Thread Jacques-Henri Berthemet
If you check the Jira issue I linked, you'll see a recent comment describing a 
potential explanation for my mixed LWT/non-LWT problem. So it looks like there 
can be some edge cases.

I'd say that if the data was inserted a while ago (seconds), there should be no 
problem.

--
Jacques-Henri Berthemet

-Original Message-
From: Hiroyuki Yamada [mailto:mogwa...@gmail.com] 
Sent: Sunday, March 25, 2018 1:10 AM
To: user@cassandra.apache.org
Subject: Re: LWT on data mutated by non-LWT operation is valid ?

Thank you JH.
But, it's still a little bit unclear to me

Let me clarify the question.
What I wanted to know is whether or not linearizability is sustained by doing 
LWT  (Consistency: QUORUM, Serial Consistency: SERIAL) on data previously 
mutated by non-LWT (Consistency: QUORUM).
I think It should be OK if non-LWT surely correctly happened before the LWT, 
but I'm wondering if there is a corner case where it's not OK.

For example, to test LWT (UPDATE) operations, should the initial data be 
inserted by LWT operations, or can it be non-LWT operations?

Thanks,
Hiroyuki

On Sat, Mar 24, 2018 at 8:27 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com> wrote:
> Hi Hiroyuki,
>
>
> For both operations you'll have to provide partition key so "conflict" 
> at DB level can always be resolved.
>
> But if two operations, LWT and non-LWT, are racing against each other, 
> the result is unpredictable; if the non-LWT write is applied after the 
> LWT, the result will be overwritten.
>
>
> It seems mixing LWT and non-LWT can result in strange results, we 
> recently opened a bug on non-working delete after LWT insert:
> https://issues.apache.org/jira/browse/CASSANDRA-14304
>
>
> Regards,
>
> JH
>
> 
> From: Hiroyuki Yamada <mogwa...@gmail.com>
> Sent: Saturday, March 24, 2018 4:38:15 AM
> To: user@cassandra.apache.org
> Subject: LWT on data mutated by non-LWT operation is valid ?
>
> Hi all,
>
> I have some question about LWT.
>
> I am wondering if LWT works only for data mutated by LWT or not.
> In other words, doing LWT on some data mutated by non-LWT operations 
> is still valid ?
> I don't fully understand how system.paxos table works in LWT, but 
> row_key should be empty for a data mutated by non-LWT operation, so 
> conflict resolution seems impossible.
> It works only if a previous non-LWT operation is completely finished ?
>
> Thanks in advance.
>
> Best regards,
> Hiroyuki
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>



Re: LWT on data mutated by non-LWT operation is valid ?

2018-03-24 Thread Jacques-Henri Berthemet
Hi Hiroyuki,


For both operations you'll have to provide the partition key, so a "conflict" at DB 
level can always be resolved.

But if two operations, LWT and non-LWT, are racing against each other, the 
result is unpredictable; if the non-LWT write is applied after the LWT, the 
result will be overwritten.


It seems mixing LWT and non-LWT can result in strange results, we recently 
opened a bug on non-working delete after LWT insert: 
https://issues.apache.org/jira/browse/CASSANDRA-14304


Regards,

JH
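To make the mixing concrete, here is a hedged CQL sketch (the table, column, 
and values are invented for illustration; they are not from the thread):

```sql
-- Hypothetical table: CREATE TABLE users (id int PRIMARY KEY, name text);

-- LWT write: goes through Paxos and is serialized against other LWTs only
UPDATE users SET name = 'alice' WHERE id = 1 IF name = 'bob';

-- Plain (non-LWT) write: bypasses Paxos entirely. If it races with the LWT
-- above, ordinary last-write-wins timestamp resolution decides, and the
-- LWT's outcome may be silently overwritten.
UPDATE users SET name = 'carol' WHERE id = 1;
```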


From: Hiroyuki Yamada 
Sent: Saturday, March 24, 2018 4:38:15 AM
To: user@cassandra.apache.org
Subject: LWT on data mutated by non-LWT operation is valid ?

Hi all,

I have a question about LWT.

I am wondering if LWT works only for data mutated by LWT or not.
In other words, is doing LWT on data previously mutated by non-LWT operations
still valid?
I don't fully understand how the system.paxos table works in LWT,
but row_key should be empty for data mutated by a non-LWT operation,
so conflict resolution seems impossible.
Does it work only if a previous non-LWT operation has completely finished?

Thanks in advance.

Best regards,
Hiroyuki




Re: Using Spark to delete from Transactional Cluster

2018-03-24 Thread Jacques-Henri Berthemet
A row is TTLed once all its columns are TTLed. If you want a row to be TTLed at 
once just set the same TTL on all its columns.
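As a hedged CQL sketch of that (table and values invented): the TTL set in the 
write statement applies to every column the statement writes, so the row 
expires as a unit.

```sql
-- All columns written by this INSERT share the same 24-hour TTL,
-- so the whole row expires together.
INSERT INTO events (id, payload, created)
VALUES (42, 'some data', toTimestamp(now()))
USING TTL 86400;
```

Columns written later by a different statement get their own TTL, which is why 
a row only fully disappears once every one of its columns has expired.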


From: Charulata Sharma (charshar) 
Sent: Friday, March 23, 2018 9:52:28 PM
To: user@cassandra.apache.org
Subject: Re: Using Spark to delete from Transactional Cluster


Yes, agree on “let really old data expire”. However, I could not find a way to 
TTL an entire row. Only columns can be TTLed.



Charu



From: Rahul Singh 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, March 23, 2018 at 1:45 PM
To: "user@cassandra.apache.org" , 
"user@cassandra.apache.org" 
Subject: Re: Using Spark to delete from Transactional Cluster



I think there are better ways to leverage parallel processing than to use it to 
delete data. As I said, it works for one of my projects for the same exact 
reason you stated: business rules.

Deleting data is an old way of thinking. Why not store the data and just use 
the relevant data .. let really old data expire ..

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 23, 2018, 11:38 AM -0700, Charulata Sharma (charshar) 
, wrote:


Hi Rahul,

 Thanks for your answer. Why do you say that deleting from Spark is not 
elegant? This is exactly the feedback I want: basically, why is it not elegant?

I can either delete using delete prepared statements or through Spark. The TTL 
approach doesn't work for us because, first of all, TTL applies at the column 
level, and there are business rules for purge which make the TTL solution not 
very clean in our case.



Thanks,

Charu



From: Rahul Singh 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, March 22, 2018 at 5:08 PM
To: "user@cassandra.apache.org" , 
"user@cassandra.apache.org" 
Subject: Re: Using Spark to delete from Transactional Cluster



Short answer : it works. You can even run “delete” statements from within Spark 
once you know which keys to delete. Not elegant but it works.

It will create a bunch of tombstones and you may need to spread your deletes 
over days. Another thing to consider is instead of deleting setting a TTL which 
will eventually get cleansed.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 22, 2018, 2:19 PM -0500, Charulata Sharma (charshar) 
, wrote:

Hi,

   Wanted to know the community’s experiences and feedback on using Apache 
Spark to delete data from C* transactional cluster.

We have spark installed in our analytical C* cluster and so far we have been 
using Spark only for analytics purposes.



However, now with advanced features of Spark 2.0, I am considering using 
spark-cassandra connector for deletes instead of a series of Delete Prepared 
Statements

So essentially the deletes will happen on the analytical cluster and they will 
be replicated over to transactional cluster by means of our keyspace 
replication strategies.



Are there any risks involved in this ??



Thanks,

Charu




RE: Cassandra DevCenter

2018-03-13 Thread Jacques-Henri Berthemet
Then you need to make the cp yourself as I described.

--
Jacques-Henri Berthemet

-Original Message-
From: phi...@free.fr [mailto:phi...@free.fr] 
Sent: Tuesday, March 13, 2018 11:03 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter

Same result.

- Mail original -
De: "Jacques-Henri Berthemet" <jacques-henri.berthe...@genesys.com>
À: user@cassandra.apache.org
Envoyé: Mardi 13 Mars 2018 10:46:29
Objet: RE: Cassandra DevCenter

And that?
groovy -cp C:\DevCenter\plugins\* Cass1.groovy

DevCenter\plugins contains all needed jars, your problem is just a classpath 
problem. If that does not work, you'll need to manually make the required 
classpath, check jars needed in 
https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core/3.4.0
 "Compile dependencies" and make it yourself.
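If the wildcard doesn't work, here is a rough sketch of building the classpath 
by hand in a Unix-like shell (directory and file names are examples; on 
Windows the separator is ';' instead of ':'):

```shell
# Join every jar in a directory into an explicit classpath string.
CP=""
for jar in ./plugins/*.jar; do
  CP="$CP:$jar"
done
CP="${CP#:}"   # drop the leading separator
echo "$CP"
# then, for example: groovy -cp "$CP" Cass1.groovy
```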
--
Jacques-Henri Berthemet

-Original Message-
From: phi...@free.fr [mailto:phi...@free.fr] 
Sent: Tuesday, March 13, 2018 9:51 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter


I'm now getting this:

Cass1.groovy: 15: unable to resolve class 
com.datastax.driver.core.ProtocolOptions.Compression
 @ line 15, column 1.
   import com.datastax.driver.core.ProtocolOptions.Compression
   ^

Cass1.groovy: 11: unable to resolve class com.datastax.driver.core.Metadata  @ 
line 11, column 1.
   import com.datastax.driver.core.Metadata
   ^

Cass1.groovy: 13: unable to resolve class com.datastax.driver.core.ResultSet
 @ line 13, column 1.
   import com.datastax.driver.core.ResultSet
   ^
Cass1.groovy: 9: unable to resolve class com.datastax.driver.core.Cluster  @ 
line 9, column 1.
   import com.datastax.driver.core.Cluster
   ^

Cass1.groovy: 10: unable to resolve class com.datastax.driver.core.Host  @ line 
10, column 1.
   import com.datastax.driver.core.Host
   ^
\Cass1.groovy: 14: unable to resolve class com.datastax.driver.core.Row  @ line 
14, column 1.
   import com.datastax.driver.core.Row
   ^



- Mail original -
De: "Jacques-Henri Berthemet" <jacques-henri.berthe...@genesys.com>
À: user@cassandra.apache.org
Envoyé: Mardi 13 Mars 2018 09:43:54
Objet: RE: Cassandra DevCenter

Hi,

Try that:
groovy -cp C:\DevCenter\plugins\*.jar Cass1.groovy 


--
Jacques-Henri Berthemet

-Original Message-
From: phi...@free.fr [mailto:phi...@free.fr]
Sent: Tuesday, March 13, 2018 9:40 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter

Good morning,

when I run 

groovy -cp .\cassandra-driver-core-3.4.0.jar;C:\DevCenter\plugins Cass1.groovy 

I get the following error message:


U:\workarea\ProjetsBNP\groo>run_Cass1.bat
"groovy vopts = "
Caught: java.lang.NoClassDefFoundError: 
com/google/common/util/concurrent/AsyncFunction
java.lang.NoClassDefFoundError: com/google/common/util/concurrent/AsyncFunction
at Cass1.retrieveCities(Cass1.groovy:25)
at Cass1.run(Cass1.groovy:67)
Caused by: java.lang.ClassNotFoundException: 
com.google.common.util.concurrent.AsyncFunction
... 2 more


Here's my Groovy code:



import com.datastax.driver.core.Cluster
import com.datastax.driver.core.Host
import com.datastax.driver.core.Metadata
import com.datastax.driver.core.Session
import com.datastax.driver.core.ResultSet
import com.datastax.driver.core.Row
import com.datastax.driver.core.ProtocolOptions.Compression
import java.sql.SQLException

def retrieveCities() {

final def hostsIp = 'xxx,bbb,ccc'
final def hosts = ''
// 
def Cluster cluster = 
Cluster.builder().addContactPoints(hostsIp.split(','))  <== line 25
.withPort(1234)
// com.datastax.driver.core.ProtocolOptions.Compression.LZ4
.withCompression(Compression.LZ4)
.withCredentials( '...' , '...')
.build()




As a reminder, I can't download missing dependencies as I have no access to 
Internet.

As a result, I must rely only on the cassandra-driver-core-3.4.0.jar jar and/or 
the DevCenter jars.

Any help would be much appreciated.


Philippe


- Mail original -
De: "Jacques-Henri Berthemet" <jacques-henri.berthe...@genesys.com>
À: user@cassandra.apache.org
Envoyé: Lundi 12 Mars 2018 09:44:27
Objet: RE: Cassandra DevCenter




Hi, 



There is no DevCenter 2.x, the latest is 1.6. It would help if you provide jar 
names and the exceptions you encounter. Make sure you're not mixing Guava 
versions from other dependencies. DevCenter uses the Datastax driver to connect to 
Cassandra; double-check the versions of the jars you need here: 

https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core 



Put only the jars listed for the driver version you have on your classpath and it 
should work. 




-- 

Jacques-Henri Berthemet 





From: Philippe de Rochambeau [mailto:phi...@free.fr]
Sent: 

RE: Cassandra DevCenter

2018-03-13 Thread Jacques-Henri Berthemet
And that?
groovy -cp C:\DevCenter\plugins\* Cass1.groovy

DevCenter\plugins contains all needed jars, your problem is just a classpath 
problem. If that does not work, you'll need to manually make the required 
classpath, check jars needed in 
https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core/3.4.0
 "Compile dependencies" and make it yourself.
--
Jacques-Henri Berthemet

-Original Message-
From: phi...@free.fr [mailto:phi...@free.fr] 
Sent: Tuesday, March 13, 2018 9:51 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter


I'm now getting this:

Cass1.groovy: 15: unable to resolve class 
com.datastax.driver.core.ProtocolOptions.Compression
 @ line 15, column 1.
   import com.datastax.driver.core.ProtocolOptions.Compression
   ^

Cass1.groovy: 11: unable to resolve class com.datastax.driver.core.Metadata  @ 
line 11, column 1.
   import com.datastax.driver.core.Metadata
   ^

Cass1.groovy: 13: unable to resolve class com.datastax.driver.core.ResultSet
 @ line 13, column 1.
   import com.datastax.driver.core.ResultSet
   ^
Cass1.groovy: 9: unable to resolve class com.datastax.driver.core.Cluster  @ 
line 9, column 1.
   import com.datastax.driver.core.Cluster
   ^

Cass1.groovy: 10: unable to resolve class com.datastax.driver.core.Host  @ line 
10, column 1.
   import com.datastax.driver.core.Host
   ^
\Cass1.groovy: 14: unable to resolve class com.datastax.driver.core.Row  @ line 
14, column 1.
   import com.datastax.driver.core.Row
   ^



- Mail original -
De: "Jacques-Henri Berthemet" <jacques-henri.berthe...@genesys.com>
À: user@cassandra.apache.org
Envoyé: Mardi 13 Mars 2018 09:43:54
Objet: RE: Cassandra DevCenter

Hi,

Try that:
groovy -cp C:\DevCenter\plugins\*.jar Cass1.groovy 


--
Jacques-Henri Berthemet

-Original Message-
From: phi...@free.fr [mailto:phi...@free.fr]
Sent: Tuesday, March 13, 2018 9:40 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter

Good morning,

when I run 

groovy -cp .\cassandra-driver-core-3.4.0.jar;C:\DevCenter\plugins Cass1.groovy 

I get the following error message:


U:\workarea\ProjetsBNP\groo>run_Cass1.bat
"groovy vopts = "
Caught: java.lang.NoClassDefFoundError: 
com/google/common/util/concurrent/AsyncFunction
java.lang.NoClassDefFoundError: com/google/common/util/concurrent/AsyncFunction
at Cass1.retrieveCities(Cass1.groovy:25)
at Cass1.run(Cass1.groovy:67)
Caused by: java.lang.ClassNotFoundException: 
com.google.common.util.concurrent.AsyncFunction
... 2 more


Here's my Groovy code:



import com.datastax.driver.core.Cluster
import com.datastax.driver.core.Host
import com.datastax.driver.core.Metadata
import com.datastax.driver.core.Session
import com.datastax.driver.core.ResultSet
import com.datastax.driver.core.Row
import com.datastax.driver.core.ProtocolOptions.Compression
import java.sql.SQLException

def retrieveCities() {

final def hostsIp = 'xxx,bbb,ccc'
final def hosts = ''
// 
def Cluster cluster = 
Cluster.builder().addContactPoints(hostsIp.split(','))  <== line 25
.withPort(1234)
// com.datastax.driver.core.ProtocolOptions.Compression.LZ4
.withCompression(Compression.LZ4)
.withCredentials( '...' , '...')
.build()




As a reminder, I can't download missing dependencies as I have no access to 
Internet.

As a result, I must rely only on the cassandra-driver-core-3.4.0.jar jar and/or 
the DevCenter jars.

Any help would be much appreciated.


Philippe


----- Mail original -
De: "Jacques-Henri Berthemet" <jacques-henri.berthe...@genesys.com>
À: user@cassandra.apache.org
Envoyé: Lundi 12 Mars 2018 09:44:27
Objet: RE: Cassandra DevCenter




Hi, 



There is no DevCenter 2.x, the latest is 1.6. It would help if you provide jar 
names and the exceptions you encounter. Make sure you're not mixing Guava 
versions from other dependencies. DevCenter uses the Datastax driver to connect to 
Cassandra; double-check the versions of the jars you need here: 

https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core 



Put only the jars listed for the driver version you have on your classpath and it 
should work. 




-- 

Jacques-Henri Berthemet 





From: Philippe de Rochambeau [mailto:phi...@free.fr]
Sent: Saturday, March 10, 2018 6:56 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter 




Hi, 


thank you for replying. 


Unfortunately, the computer DevCenter is running on doesn’t have Internet 
access (for security reasons). As a result, I can’t use the pom.xml. 


Furthermore, I’ve tried running a Groovy program whose classpath included the 
DevCenter (2.x) lib directory, but to no avail as a Google dependency was 
missing (I can’t recall the dependency’s name). 


RE: Cassandra DevCenter

2018-03-13 Thread Jacques-Henri Berthemet
Hi,

Try that:
groovy -cp C:\DevCenter\plugins\*.jar Cass1.groovy 


--
Jacques-Henri Berthemet

-Original Message-
From: phi...@free.fr [mailto:phi...@free.fr] 
Sent: Tuesday, March 13, 2018 9:40 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter

Good morning,

when I run 

groovy -cp .\cassandra-driver-core-3.4.0.jar;C:\DevCenter\plugins Cass1.groovy 

I get the following error message:


U:\workarea\ProjetsBNP\groo>run_Cass1.bat
"groovy vopts = "
Caught: java.lang.NoClassDefFoundError: 
com/google/common/util/concurrent/AsyncFunction
java.lang.NoClassDefFoundError: com/google/common/util/concurrent/AsyncFunction
at Cass1.retrieveCities(Cass1.groovy:25)
at Cass1.run(Cass1.groovy:67)
Caused by: java.lang.ClassNotFoundException: 
com.google.common.util.concurrent.AsyncFunction
... 2 more


Here's my Groovy code:



import com.datastax.driver.core.Cluster
import com.datastax.driver.core.Host
import com.datastax.driver.core.Metadata
import com.datastax.driver.core.Session
import com.datastax.driver.core.ResultSet
import com.datastax.driver.core.Row
import com.datastax.driver.core.ProtocolOptions.Compression
import java.sql.SQLException

def retrieveCities() {

final def hostsIp = 'xxx,bbb,ccc'
final def hosts = ''
// 
def Cluster cluster = 
Cluster.builder().addContactPoints(hostsIp.split(','))  <== line 25
.withPort(1234)
// com.datastax.driver.core.ProtocolOptions.Compression.LZ4
.withCompression(Compression.LZ4)
.withCredentials( '...' , '...')
.build()




As a reminder, I can't download missing dependencies as I have no access to 
Internet.

As a result, I must rely only on the cassandra-driver-core-3.4.0.jar jar and/or 
the DevCenter jars.

Any help would be much appreciated.


Philippe


- Mail original -----
De: "Jacques-Henri Berthemet" <jacques-henri.berthe...@genesys.com>
À: user@cassandra.apache.org
Envoyé: Lundi 12 Mars 2018 09:44:27
Objet: RE: Cassandra DevCenter




Hi, 



There is no DevCenter 2.x, the latest is 1.6. It would help if you provide jar 
names and the exceptions you encounter. Make sure you're not mixing Guava 
versions from other dependencies. DevCenter uses the Datastax driver to connect to 
Cassandra; double-check the versions of the jars you need here: 

https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core 



Put only the jars listed for the driver version you have on your classpath and it 
should work. 




-- 

Jacques-Henri Berthemet 





From: Philippe de Rochambeau [mailto:phi...@free.fr]
Sent: Saturday, March 10, 2018 6:56 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter 




Hi, 


thank you for replying. 


Unfortunately, the computer DevCenter is running on doesn’t have Internet 
access (for security reasons). As a result, I can’t use the pom.xml. 


Furthermore, I’ve tried running a Groovy program whose classpath included the 
DevCenter (2.x) lib directory, but to no avail as a Google dependency was 
missing (I can’t recall the dependency’s name). 


Because DevCenter manages to connect to Cassandra without downloading 
dependencies, there’s bound to be a way to drive the former using Java or 
Groovy. 



Le 10 mars 2018 à 18:34, Goutham reddy < goutham.chiru...@gmail.com > a écrit : 






Get the JARs from the Cassandra lib folder and put them in your build path. Or else 
use a Maven project with a pom.xml to download them directly from the repository. 





Thanks and Regards, 


Goutham Reddy Aenugu. 





On Sat, Mar 10, 2018 at 9:30 AM Philippe de Rochambeau < phi...@free.fr > 
wrote: 



Hello,
has anyone tried running CQL queries from a Java program using the jars 
provided with DevCenter? 
Many thanks. 
Philippe 

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org 

-- 




Regards 

Goutham Reddy




RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
If throughput decreases as you add more load then it’s probably due to disk 
latency, can you test SDDs? Are you using VMWare ESXi?

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 2:15 PM
To: user <user@cassandra.apache.org>
Subject: RE: yet another benchmark bottleneck

I mentioned that I already tested increasing client threads, many stress-client 
instances on one node, and two stress clients on two separate nodes; in all of 
them the sum of throughputs is less than 130K. I've been tuning all aspects of 
the OS and Cassandra (whatever I've seen in the config files!) for two days, 
still no luck!


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 16:38:22 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

What happens if you increase number of client threads?
Can you add another instance of cassandra-stress on another host?

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 12:50 PM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: RE: yet another benchmark bottleneck

no luck even with 320 threads for write


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:44:15 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

It makes more sense now, 130K is not that bad.

According to cassandra.yaml you should be able to increase your number of write 
threads in Cassandra:
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

Jumping directly to 160 would be a bit high with spinning disks, maybe start 
with 64 just to see if it gets better.

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 12:08 PM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: RE: yet another benchmark bottleneck

RF=1
No errors or warnings.
Actually it's 300 Mbit/s and 130K op/s. I missed a 'K' in the first 
mail, but anyway! The point is: more than half of the node's resources (cpu, mem, 
disk, network) are unused and I can't increase write throughput.


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

Any errors/warning in Cassandra logs? What’s your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 11:38 AM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: RE: yet another benchmark bottleneck

1.2 TB 15K
latency reported by stress tool is 7.6 ms. disk latency is 2.6 ms


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:02:29 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

What’s your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 10:48 AM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: yet another benchmark bottleneck

Running two instances of Apache Cassandra on the same server, each having its 
own commit log disk, did not help. The sum of cpu/ram usage for both instances 
is less than half of all available resources, disk usage is less than 20%, and 
network is still less than 300Mb in Rx.


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester 
<onmstes...@zoho.com<mailto:onmstes...@zoho.com>> wrote 

Apache-cassandra-3.11.1
Yes, I'm doing a single-host test


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa 
<jji...@gmail.com<mailto:jji...@gmail.com>> wrote 



Would help to know your version. 130 ops/second sounds like a ridiculously low 
rate. Are you doing a single host test?

On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester 
<onmstes...@zoho.com<mailto:onmstes...@zoho.com>> wrote:


I'm going to benchmark Cassandra's write throughput on a node with f

RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
What happens if you increase number of client threads?
Can you add another instance of cassandra-stress on another host?

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:50 PM
To: user <user@cassandra.apache.org>
Subject: RE: yet another benchmark bottleneck

no luck even with 320 threads for write


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:44:15 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

It makes more sense now, 130K is not that bad.

According to cassandra.yaml you should be able to increase your number of write 
threads in Cassandra:
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

Jumping directly to 160 would be a bit high with spinning disks, maybe start 
with 64 just to see if it gets better.

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 12:08 PM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: RE: yet another benchmark bottleneck

RF=1
No errors or warnings.
Actually it's 300 Mbit/s and 130K op/s. I missed a 'K' in the first 
mail, but anyway! The point is: more than half of the node's resources (cpu, mem, 
disk, network) are unused and I can't increase write throughput.


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

Any errors/warning in Cassandra logs? What’s your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 11:38 AM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: RE: yet another benchmark bottleneck

1.2 TB 15K
latency reported by stress tool is 7.6 ms. disk latency is 2.6 ms


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:02:29 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

What’s your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 10:48 AM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: yet another benchmark bottleneck

Running two instances of Apache Cassandra on the same server, each having its 
own commit log disk, did not help. The sum of cpu/ram usage for both instances 
is less than half of all available resources, disk usage is less than 20%, and 
network is still less than 300Mb in Rx.


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester 
<onmstes...@zoho.com<mailto:onmstes...@zoho.com>> wrote 

Apache-cassandra-3.11.1
Yes, I'm doing a single-host test


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa 
<jji...@gmail.com<mailto:jji...@gmail.com>> wrote 



Would help to know your version. 130 ops/second sounds like a ridiculously low 
rate. Are you doing a single host test?

On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester 
<onmstes...@zoho.com<mailto:onmstes...@zoho.com>> wrote:


I'm going to benchmark Cassandra's write throughput on a node with following 
spec:

  *   CPU: 20 Cores
  *   Memory: 128 GB (32 GB as Cassandra heap)
  *   Disk: 3 separate disks for OS, data and commitlog
  *   Network: 10 Gb (test it with iperf)
  *   Os: Ubuntu 16

Running Cassandra-stress:
cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
X.X.X.X

From two nodes with the same spec as above, I cannot get throughput of more than 130 
Op/s. The clients are using less than 50% of CPU; the Cassandra node uses:

  *   60% of cpu
  *   30% of memory
  *   30-40% util in iostat of commitlog
  *   300 Mb of network bandwidth
I suspect the network, because no matter how many clients I run, Cassandra is 
always using less than 300 Mb. I've done all the tuning mentioned by Datastax.
Increasing wmem_max and rmem_max did not help either.


Sent using Zoho Mail<https://www.zoho.com/mail/>










RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
It makes more sense now, 130K is not that bad.

According to cassandra.yaml you should be able to increase your number of write 
threads in Cassandra:
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

Jumping directly to 160 would be a bit high with spinning disks, maybe start 
with 64 just to see if it gets better.
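As a quick sketch of that rule of thumb from cassandra.yaml (the command 
assumes a Linux-like system):

```shell
# concurrent_writes rule of thumb: 8 * number_of_cores.
# On the 20-core machine in this thread that gives 160, hence the
# suggestion to try 64 first on spinning disks.
cores=$(getconf _NPROCESSORS_ONLN)
echo "8 * $cores cores = $((8 * cores))"
```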

--
Jacques-Henri Berthemet

From: onmstester onmstester [mailto:onmstes...@zoho.com]
Sent: Monday, March 12, 2018 12:08 PM
To: user <user@cassandra.apache.org>
Subject: RE: yet another benchmark bottleneck

RF=1
No errors or warnings.
Actually it's 300 Mbit/s and 130K op/s. I missed a 'K' in the first 
mail, but anyway! The point is: more than half of the node's resources (cpu, mem, 
disk, network) are unused and I can't increase write throughput.


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:25:12 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

Any errors/warning in Cassandra logs? What’s your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 11:38 AM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: RE: yet another benchmark bottleneck

1.2 TB 15K
latency reported by stress tool is 7.6 ms. disk latency is 2.6 ms


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 14:02:29 +0330 Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote 

What’s your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet

From: onmstester onmstester 
[mailto:onmstes...@zoho.com<mailto:onmstes...@zoho.com>]
Sent: Monday, March 12, 2018 10:48 AM
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: yet another benchmark bottleneck

Running two instances of Apache Cassandra on the same server, each having its 
own commit log disk, did not help. The sum of cpu/ram usage for both instances 
is less than half of all available resources, disk usage is less than 20%, and 
network is still less than 300Mb in Rx.


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 09:34:26 +0330 onmstester onmstester 
<onmstes...@zoho.com<mailto:onmstes...@zoho.com>> wrote 

Apache-cassandra-3.11.1
Yes, I'm doing a single-host test


Sent using Zoho Mail<https://www.zoho.com/mail/>


 On Mon, 12 Mar 2018 09:24:04 +0330 Jeff Jirsa 
<jji...@gmail.com<mailto:jji...@gmail.com>> wrote 



Would help to know your version. 130 ops/second sounds like a ridiculously low 
rate. Are you doing a single host test?

On Sun, Mar 11, 2018 at 10:44 PM, onmstester onmstester 
<onmstes...@zoho.com<mailto:onmstes...@zoho.com>> wrote:


I'm going to benchmark Cassandra's write throughput on a node with the following 
spec:

  *   CPU: 20 Cores
  *   Memory: 128 GB (32 GB as Cassandra heap)
  *   Disk: 3 separate disks for OS, data and commitlog
  *   Network: 10 Gb (tested with iperf)
  *   OS: Ubuntu 16

Running Cassandra-stress:
cassandra-stress write n=100 -rate threads=1000 -mode native cql3 -node 
X.X.X.X

From two nodes with the same spec as above, I cannot get throughput above 130 
op/s. The clients are using less than 50% of CPU; the Cassandra node uses:

  *   60% of cpu
  *   30% of memory
  *   30-40% util in iostat of commitlog
  *   300 Mb of network bandwidth
I suspect the network, because no matter how many clients I run, Cassandra always 
uses less than 300 Mb. I've done all the tuning mentioned by DataStax.
Increasing wmem_max and rmem_max did not help either.


Sent using Zoho Mail<https://www.zoho.com/mail/>








RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
Any errors/warning in Cassandra logs? What’s your RF?
Using 300MB/s of network bandwidth for only 130 op/s looks very high.

--
Jacques-Henri Berthemet







RE: yet another benchmark bottleneck

2018-03-12 Thread Jacques-Henri Berthemet
What’s your disk latency? What kind of disk is it?

--
Jacques-Henri Berthemet






RE: Cassandra DevCenter

2018-03-12 Thread Jacques-Henri Berthemet
Hi,

There is no DevCenter 2.x; the latest is 1.6. It would help if you provided the jar 
names and the exceptions you encounter. Make sure you’re not mixing Guava versions 
from other dependencies. DevCenter uses Datastax driver to connect to 
Cassandra, double check the versions of the jars you need here:
https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core

Put only the jars listed for the driver version you have on your classpath and it 
should work.

--
Jacques-Henri Berthemet

From: Philippe de Rochambeau [mailto:phi...@free.fr]
Sent: Saturday, March 10, 2018 6:56 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra DevCenter

Hi,
thank you for replying.
Unfortunately, the computer DevCenter is running on doesn’t have Internet 
access (for security reasons).  As a result, I can’t use the pom.xml.
Furthermore, I’ve tried running a Groovy program whose classpath included the 
DevCenter (2.x) lib directory, but to no avail as a Google dependency was 
missing (I can’t recall the dependency’s name).
Because DevCenter manages to connect to Cassandra without downloading 
dependencies, there’s bound to be a way to drive the former using Java or 
Groovy.

Le 10 mars 2018 à 18:34, Goutham reddy 
<goutham.chiru...@gmail.com<mailto:goutham.chiru...@gmail.com>> a écrit :
Get the JARs from the Cassandra lib folder and put them in your build path, or 
use a Maven project with a pom.xml to download them directly from the repository.

Thanks and Regards,
Goutham Reddy Aenugu.

On Sat, Mar 10, 2018 at 9:30 AM Philippe de Rochambeau 
<phi...@free.fr<mailto:phi...@free.fr>> wrote:
Hello,
has anyone tried running CQL queries from a Java program using the jars 
provided with DevCenter?
Many thanks.
Philippe

-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org<mailto:user-unsubscr...@cassandra.apache.org>
For additional commands, e-mail: 
user-h...@cassandra.apache.org<mailto:user-h...@cassandra.apache.org>
--
Regards
Goutham Reddy


RE: Secondary Indexes C* 3.0

2018-02-23 Thread Jacques-Henri Berthemet
A very interesting and detailed article, thank you DuyHai. I think this should 
be part of general Cassandra documentation.

--
Jacques-Henri Berthemet

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Thursday, February 22, 2018 7:04 PM
To: user <user@cassandra.apache.org>
Subject: Re: Secondary Indexes C* 3.0

Read this: 
http://www.doanduyhai.com/blog/?p=13191<http://www.doanduyhai.com/blog/?p=13191>




On Thu, Feb 22, 2018 at 6:44 PM, Akash Gangil 
<akashg1...@gmail.com<mailto:akashg1...@gmail.com>> wrote:
To provide more context, I was going through this 
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#useWhenIndex__highCardCol<https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#useWhenIndex__highCardCol>

On Thu, Feb 22, 2018 at 9:35 AM, Akash Gangil 
<akashg1...@gmail.com<mailto:akashg1...@gmail.com>> wrote:
Hi,
I was wondering if there are recommendations around the cardinality of 
secondary indexes.

As I understand it, an index on a column with many distinct values will be 
inefficient. Is that because the index only directs me to the specific 
SSTable, which is then searched sequentially for the target records? So a wide 
range of index values could lead to a lot of SSTables to traverse?
What's unclear, though, is the recommended (or benchmarked?) limit: is it 
that the index must have 100 distinct values, or can it have up to 1000 or 5 
distinct values?
thanks!




--
Akash


--
Akash
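
To make the cardinality discussion concrete, here is a short CQL sketch (keyspace, 
table and column names are illustrative, not taken from the thread):

```sql
-- A native secondary index is a hidden local table on each node mapping
-- indexed value -> rows on that node, so an indexed query may have to
-- fan out to many nodes.

-- Low-to-moderate cardinality (e.g. a country code): each indexed value
-- covers many rows, so few index partitions are touched. Usually fine.
CREATE INDEX users_country_idx ON users (country);

-- Very high cardinality (e.g. an email, nearly unique per row): the index
-- degenerates into one entry per row; a denormalized lookup table is
-- usually the better model.
CREATE TABLE users_by_email (
    email   text PRIMARY KEY,
    user_id uuid
);
```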



RE: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Jacques-Henri Berthemet
Hi Kenneth,

As a Cassandra user I value usability, but since it's a database I value 
consistency and performance even more. If you want usability and documentation 
you can use Datastax DSE, after all that's where they add value on top of 
Cassandra. Since Datastax actually pays devs to work on Cassandra internals, it's 
understandable that they kept some parts (usability) for their own product. We 
all notice that when you google for some CQL command you'll always end up on the 
Datastax site; it would be great if that was not the case, but it would take a 
lot of time to change.

Also, as a manager you're not supposed to fight with devs but to allocate 
tasks/time. If you have to choose between enhancing documentation and fixing 
this bad race condition that corrupts data, I hope you'd choose the latter.

As for filing Jiras, if you create one like "I want a UI to setup TLS" it 
would be the kind of Jira nobody would implement, it takes a lot of time, 
touches security and may not be that useful in the end.

Last point on usability for Cassandra, as an end user it's very difficult to 
see the progress on it, but since I'm using Cassandra internals for my custom 
secondary index I can tell you that there was a huge rework between Cassandra 
2.2 and 3.x; PartitionIterators are a very elegant solution and are really 
helpful in my case. Great work guys :)
--
Jacques-Henri Berthemet

-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 21, 2018 11:54 PM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: RE: Cassandra Needs to Grow Up by Version Five!

Hi Akash,

I get the part about outside work, which is why, in replying to Jeff Jirsa, I was 
suggesting the big companies could justify taking it on easily enough and, you 
know, actually pay the people who would be working on it so those people could 
have a life.

The part I don't get is the aversion to usability.  Isn't that what you think 
about when you are coding?  "Am I making this thing I'm building easy to use?"  
If you were programming for me, we would be constantly talking about what we 
are building and how we can make things easier for users.  If I had to fight 
with a developer, architect or engineer about usability all the time, they 
would be gone, and quick.  How do you approach programming if you aren't trying to 
make things easy?

Kenneth Brotman

-Original Message-
From: Akash Gangil [mailto:akashg1...@gmail.com]
Sent: Wednesday, February 21, 2018 2:24 PM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

I would second Jon in the arguments he made. Contributing outside work is 
draining and really requires a lot of commitment. If someone requires features 
around usability etc, just pay for it, period.

On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman < 
kenbrot...@yahoo.com.invalid> wrote:

> Jon,
>
> Very sorry that you don't see the value of the time I'm taking for this.
> I don't have demands; I do have a stern warning and I'm right Jon.  
> Please be very careful not to mischaracterize my words, Jon.
>
> You suggest I put things in JIRAs, then seem to suggest that I'd be 
> lucky if anyone looked at it and did anything. That's what I figured too.
>
> I don't appreciate the hostility.  You will understand more fully in 
> the next post where I'm coming from.  Try to keep the conversation civilized.
> I'm trying or at least so you understand I think what I'm doing is 
> saving your gig and mine.  I really like a lot of people in this group.
>
> I've come to a preliminary assessment on things.  Soon the cloud will 
> clear or I'll be gone.  Don't worry.  I'm a very peaceful person and 
> like you I am driven by real important projects that I feel compelled 
> to work on for the good of others.  I don't have time for people to 
> hand hold a database and I can't get stuck with my projects on the wrong 
> stuff.
>
> Kenneth Brotman
>
>
> -Original Message-
> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon 
> Haddad
> Sent: Wednesday, February 21, 2018 12:44 PM
> To: user@cassandra.apache.org
> Cc: d...@cassandra.apache.org
> Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
> Ken,
>
> Maybe it’s not clear how open source projects work, so let me try to 
> explain.  There’s a bunch of us who either get paid by someone or 
> volunteer on our free time.  The folks that get paid, (yay!) usually 
> take direction on what the priorities are, and work on projects that 
> directly affect our jobs.  That means that someone needs to care 
> enough about the features you want to work on them, if you’re not going to do 
> it yourself.
>
> Now as others have said already, please put your list of demands in 
> JIRA, if someone i

RE: overhead of empty tables

2018-02-16 Thread Jacques-Henri Berthemet
The main overhead is that each table locks 1MB of Java heap, so if you have 
1000 tables it will use 1GB of RAM just for managing the tables, even if they 
are empty.

--
Jacques-Henri Berthemet

From: Alaa Zubaidi (PDF) [mailto:alaa.zuba...@pdf.com]
Sent: Friday, February 16, 2018 1:05 AM
To: user@cassandra.apache.org; Dinesh Joshi <dinesh.jo...@yahoo.com>
Subject: Re: overhead of empty tables

Thanks Dinesh,
We have 36332 files under the data folder

On Thu, Feb 15, 2018 at 3:45 PM, Dinesh Joshi 
<dinesh.jo...@yahoo.com.invalid<mailto:dinesh.jo...@yahoo.com.invalid>> wrote:
Each table in a keyspace is stored as a separate directory in the data 
directory. If you many tables you'll have a lot of files. Some file systems 
have issues dealing with a lot of files in a single directory. Other than that, 
there will likely be some book keeping overhead within the Cassandra process. 
How many tables are we talking about here?

Here's more information about it: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html<https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html>

Dinesh


On Thursday, February 15, 2018, 3:34:49 PM PST, Alaa Zubaidi (PDF) 
<alaa.zuba...@pdf.com<mailto:alaa.zuba...@pdf.com>> wrote:


Is there any overhead if my keyspace contains many empty tables?
Thanks
Alaa

This message may contain confidential and privileged information. If it has 
been sent to you in error, please reply to advise the sender of the error and 
then immediately permanently delete it and all attachments to it from your 
systems. If you are not the intended recipient, do not read, copy, disclose or 
otherwise use this message or any attachments to it. The sender disclaims any 
liability for such unauthorized use. PLEASE NOTE that all incoming e-mails sent 
to PDF e-mail accounts will be archived and may be scanned by us and/or by 
external service providers to detect and prevent threats to our systems, 
investigate illegal or inappropriate behavior, and/or eliminate unsolicited 
promotional e-mails (“spam”). If you have any concerns about this process, 
please contact us at legal.departm...@pdf.com<mailto:legal.departm...@pdf.com>.



--

Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110  USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zuba...@pdf.com<mailto:alaa.zuba...@pdf.com>



RE: LWT broken?

2018-02-13 Thread Jacques-Henri Berthemet
Yes, a non-applied LWT will return the row with the winning result. I agree; in 
theory I'd expect your code to behave correctly.

You could also check release notes of later Cassandra versions for LWT related 
bugs. If your ids are timeUUID you could try to extract the time when the 
inconsistencies happened and check corresponding Cassandra logs to see what 
happened.
--
Jacques-Henri Berthemet

From: Mahdi Ben Hamida [mailto:ma...@signalfx.com]
Sent: Monday, February 12, 2018 8:45 PM
To: user@cassandra.apache.org
Subject: Re: LWT broken?

On 2/12/18 2:04 AM, Jacques-Henri Berthemet wrote:

Mahdi, you don’t need to re-read at CL ONE on line 9. When a LWT statement is 
not applied, the values that prevented the LWT are returned as part of the 
response, I’d expect them to be more consistent than your read. I’m not 100% 
sure it’s the case for 2.0.x but it’s the case for Cassandra 2.2.

Yes. That's an optimization that can be added. I need to check that it works 
properly with the version of cassandra that I'm running. Right now, we have 
line 9 done at a SERIAL consistency and the issue still happens.



And it’s the same for line 1: you should only keep your LWT statement unless 
you have a huge performance benefit in doing so. In Cassandra, doing a read before 
a write is a bad pattern.
I'll be trying this next and seeing if the issue disappears when we change it 
to serial. Although, I still don't understand how this would cause any 
inconsistencies. In the worst case, a non serial read would return no rows for 
the specified primary key which I handle by trying to do an LWT insert. If it's 
returning a result, I assume that result will be the row that the winning 
lightweight transaction has written. I think that assumption may not be correct 
all the time and I would love to understand why that is the case.

--
Mahdi.


AFAIK a LWT statement is always executed as SERIAL, the only choice you have is 
between SERIAL and LOCAL_SERIAL.

Regards,
--
Jacques-Henri Berthemet

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Sunday, February 11, 2018 6:11 PM
To: user <user@cassandra.apache.org><mailto:user@cassandra.apache.org>
Subject: Re: LWT broken?

Mahdi , the issue in your code is here:

else // we lost LWT, fetch the winning value
 9    existing_id = SELECT id FROM hash_id WHERE hash=computed_hash | consistency = ONE

You lost LWT, it means that there is a concurrent LWT that has won the Paxos 
round and has applied the value using QUORUM/SERIAL.

In best case, it means that the won LWT value has been applied to at least 2 
replicas out of 3 (assuming RF=3)
In worst case, the won LWT value has not been applied yet or is pending to be 
applied to any replica

Now, if you immediately read with CL=ONE, you may:

1) Read a stale value on the 3rd replica, which has not yet received the 
correct won LWT value
2) Or worse, read a stale value because the won LWT is being applied while the 
read operation is made

That's the main reason reading with CL=SERIAL is recommended (CL=QUORUM is not 
sufficient enough)

Reading with CL=SERIAL will:

a. like QUORUM, contact strict majority of replicas
b. unlike QUORUM, look for a validated (but not yet applied) previous Paxos round 
value and force-apply it before actually reading the new value




On Sun, Feb 11, 2018 at 5:36 PM, Mahdi Ben Hamida 
<ma...@signalfx.com<mailto:ma...@signalfx.com>> wrote:

Totally understood that it's not worthwhile (or rather, it's incorrect) to mix serial 
and non-serial operations on LWT tables. It would be highly satisfying to my 
engineer's mind if someone could explain why that causes issues in this 
particular situation. The only explanation I have is that a non-serial read may 
cause a read repair, and that could interfere with a concurrent serial 
write, although I still can't explain how that would cause two different 
"insert if not exist" transactions to both succeed.

--

Mahdi.
On 2/9/18 2:40 PM, Jonathan Haddad wrote:
If you want consistent reads you have to use the CL that enforces it. There’s 
no way around it.
On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida 
<ma...@signalfx.com<mailto:ma...@signalfx.com>> wrote:

In this case, we only write using CAS (code guarantees that). We also never 
update, just insert if not exist. Once a hash exists, it never changes (it may 
get deleted later and that'll be a CAS delete as well).

--

Mahdi.
On 2/9/18 1:38 PM, Jeff Jirsa wrote:


On Fri, Feb 9, 2018 at 1:33 PM, Mahdi Ben Hamida 
<ma...@signalfx.com<mailto:ma...@signalfx.com>> wrote:

 Under what circumstances would we be reading inconsistent results ? Is there a 
case where we end up reading a value that actually end up not being written ?




If you ever write the same value with CAS and without CAS (different code paths 
both updating the same value), you're using CAS wrong, and inconsistencies can 
happen.
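
To make the pattern discussed in this thread concrete, here is a minimal cqlsh 
sketch (the hash_id table matches the pseudocode quoted above; values and the 
table definition are illustrative):

```sql
CREATE TABLE IF NOT EXISTS hash_id (
    hash text PRIMARY KEY,
    id   uuid
);

-- The conditional insert always runs a Paxos round (SERIAL or LOCAL_SERIAL).
INSERT INTO hash_id (hash, id) VALUES ('abc123', uuid()) IF NOT EXISTS;
-- When [applied] = false, the response row already carries the winning
-- hash and id, so a follow-up read is usually unnecessary.

-- If you do re-read, do it at SERIAL so that any in-flight Paxos round is
-- completed before the value is returned (in cqlsh):
CONSISTENCY SERIAL;
SELECT id FROM hash_id WHERE hash = 'abc123';
```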








RE: LWT broken?

2018-02-12 Thread Jacques-Henri Berthemet
Mahdi, you don’t need to re-read at CL ONE on line 9. When a LWT statement is 
not applied, the values that prevented the LWT are returned as part of the 
response, I’d expect them to be more consistent than your read. I’m not 100% 
sure it’s the case for 2.0.x but it’s the case for Cassandra 2.2.

And it’s the same for line 1: you should only keep your LWT statement unless 
you have a huge performance benefit in doing so. In Cassandra, doing a read before 
a write is a bad pattern.

AFAIK a LWT statement is always executed as SERIAL, the only choice you have is 
between SERIAL and LOCAL_SERIAL.

Regards,
--
Jacques-Henri Berthemet








RE: ClassNotFoundException when trigger is fired

2017-12-06 Thread Jacques-Henri Berthemet
Hi,

I have a custom secondary index that works well with Cassandra; I put the jar 
file in Cassandra's lib folder before starting Cassandra. Maybe you can try 
doing the same thing?

I don't think that Cassandra's class loader is dynamic, you need to have your 
jars in the classpath before starting Cassandra.

Regards,
--
Jacques-Henri Berthemet

From: tsubasa.nar...@us.fujitsu.com [mailto:tsubasa.nar...@us.fujitsu.com]
Sent: mercredi 6 décembre 2017 19:49
To: user@cassandra.apache.org
Subject: ClassNotFoundException when trigger is fired

Dear All

I use a Cassandra trigger to detect data changes in the DB and usually it works.
But sometimes I get a ClassNotFoundException when the trigger is fired.

Following is what I did:
1. Create a class which implements the ITrigger interface (e.g. class name 
TestTrigger.java).
2. Create a jar file and put it under conf/triggers (e.g. jar file name 
TestTrigger.jar).
3. Start Cassandra.
4. I can find the following log line, so the jar looks like it loaded successfully:
INFO  [OptionalTasks:1] 2017-11-30 03:55:43,541 CustomClassLoader.java:87 - 
Loading new jar /home/tnarita/cassandra/conf/triggers/TestTrigger.jar
5. Log in with cqlsh and create a trigger on the test table.
6. Insert a value into the test table.
7. The trigger is fired.
8. I get a ClassNotFoundException; the log follows.

java.lang.RuntimeException: Exception while executing trigger on table with ID: 
1cb6a5a0-cb00-11e7-a737-49047aea57a8
at 
org.apache.cassandra.triggers.TriggerExecutor.executeInternal(TriggerExecutor.java:241)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.triggers.TriggerExecutor.execute(TriggerExecutor.java:119) 
~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:823)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:431)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:417)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219) 
~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204) 
~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
 [apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
 [apache-cassandra-3.9.jar:3.9]
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 [netty-all-4.0.39.Final.jar:4.0.39.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
 [netty-all-4.0.39.Final.jar:4.0.39.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
 [netty-all-4.0.39.Final.jar:4.0.39.Final]
at 
io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357)
 [netty-all-4.0.39.Final.jar:4.0.39.Final]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_66]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
 [apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
[apache-cassandra-3.9.jar:3.9]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: java.lang.ClassNotFoundException: com.test.TestTrigger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_66]
at 
org.apache.cassandra.triggers.CustomClassLoader.loadClassInternal(CustomClassLoader.java:118)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.triggers.CustomClassLoader.loadClass(CustomClassLoader.java:103)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.triggers.TriggerExecutor.loadTriggerInstance(TriggerExecutor.java:254)
 ~[apache-cassandra-3.9.jar:3.9]
at 
org.apache.cassandra.triggers.TriggerExecutor.executeInternal(TriggerExecutor.java:226)
 ~[apache-cassandra-3.9.jar:3.9]
... 18 common frames omitted


When I get this issue, restarting Cassandra resolves it.
TestTrigger.java outputs a dummy log line in its static initializer. When I don't 
get this issue, I can find the dummy log after step 4.
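
For reference, step 5 above corresponds to CQL like the following (test_trigger 
and test_table are illustrative names; the class name matches the 
ClassNotFoundException in the stack trace):

```sql
-- The USING string must be the fully-qualified class name packaged in
-- conf/triggers/TestTrigger.jar (here com.test.TestTrigger).
CREATE TRIGGER test_trigger ON test_table USING 'com.test.TestTrigger';

-- Drop it before replacing the jar and restarting, if needed:
DROP TRIGGER IF EXISTS test_trigger ON test_table;
```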

RE: Solr Search With Apache Cassandra

2017-11-20 Thread Jacques-Henri Berthemet
Tjake is now working for Datastax and they made DSE out of Solandra.

--
Jacques-Henri Berthemet

From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: lundi 20 novembre 2017 18:11
To: user <user@cassandra.apache.org>
Subject: Re: Solr Search With Apache Cassandra

That’s long since been abandoned (last commit was 5 years ago)



On Nov 20, 2017, at 12:10 PM, Nageswara Rao 
<nageswara.r...@gmail.com<mailto:nageswara.r...@gmail.com>> wrote:

There is a fork with name on this combo called solandra

https://github.com/tjake/Solandra<https://github.com/tjake/Solandra>

Please check.


On 20 Nov 2017 3:13 p.m., "@Nandan@" 
<nandanpriyadarshi...@gmail.com<mailto:nandanpriyadarshi...@gmail.com>> wrote:
Sorry for my mistakes. I meant using Apache Cassandra for storage and Solr as the search facility from DSE.
Thanks for your suggestion; I will check with the DSE folks.

Thanks
Nandan

On Mon, Nov 20, 2017 at 5:36 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
How are Cassandra and Solr related? They are two separate products.

--
Jacques-Henri Berthemet

From: @Nandan@ 
[mailto:nandanpriyadarshi...@gmail.com<mailto:nandanpriyadarshi...@gmail.com>]
Sent: lundi 20 novembre 2017 10:04
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Solr Search With Apache Cassandra

Hi Jacques,

For testing, I configured Apache Cassandra and Solr,
and am using the Solr Admin UI to test the queries.

Thanks

On Mon, Nov 20, 2017 at 4:37 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Hi,

Apache Cassandra does not have Solr search; it's DataStax Enterprise that 
supports that feature, so you should contact DataStax support for such questions.

Regards,
--
Jacques-Henri Berthemet

From: @Nandan@ 
[mailto:nandanpriyadarshi...@gmail.com<mailto:nandanpriyadarshi...@gmail.com>]
Sent: lundi 20 novembre 2017 06:44
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Solr Search With Apache Cassandra

Hi All,
How does Solr search affect the READ operation in Cassandra?
I have a table with 100 columns, with a UUID primary key.
Note: I have 100 columns in a single table because of an advanced search implemented on multiple columns, as in e-commerce.

Now my concerns are:
1) Whenever I READ from the table based on the primary key, such as
select * from table1 where col1 = UUID;
it works perfectly.
2) When I READ from the table using Solr on 1-2 columns, such as
col1:val1 and col2:val2
it also works perfectly.
3) But when I perform a complex search, it takes about 4-5 seconds,
even though currently READ and WRITE operations are not at massive scale.

So please tell me what the cause is and how to resolve it.
Thanks
Nandan Priyadarshi





RE: Solr Search With Apache Cassandra

2017-11-20 Thread Jacques-Henri Berthemet
How are Cassandra and Solr related? They are two separate products.

--
Jacques-Henri Berthemet

From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
Sent: lundi 20 novembre 2017 10:04
To: user <user@cassandra.apache.org>
Subject: Re: Solr Search With Apache Cassandra

Hi Jacques,

For testing, I configured Apache Cassandra and Solr,
and am using the Solr Admin UI to test the queries.

Thanks

On Mon, Nov 20, 2017 at 4:37 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Hi,

Apache Cassandra does not have Solr search; it's DataStax Enterprise that 
supports that feature, so you should contact DataStax support for such questions.

Regards,
--
Jacques-Henri Berthemet

From: @Nandan@ 
[mailto:nandanpriyadarshi...@gmail.com<mailto:nandanpriyadarshi...@gmail.com>]
Sent: lundi 20 novembre 2017 06:44
To: user <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Solr Search With Apache Cassandra

Hi All,
How does Solr search affect the READ operation in Cassandra?
I have a table with 100 columns, with a UUID primary key.
Note: I have 100 columns in a single table because of an advanced search implemented on multiple columns, as in e-commerce.

Now my concerns are:
1) Whenever I READ from the table based on the primary key, such as
select * from table1 where col1 = UUID;
it works perfectly.
2) When I READ from the table using Solr on 1-2 columns, such as
col1:val1 and col2:val2
it also works perfectly.
3) But when I perform a complex search, it takes about 4-5 seconds,
even though currently READ and WRITE operations are not at massive scale.

So please tell me what the cause is and how to resolve it.
Thanks
Nandan Priyadarshi



RE: Solr Search With Apache Cassandra

2017-11-20 Thread Jacques-Henri Berthemet
Hi,

Apache Cassandra does not have Solr search; it's DataStax Enterprise that 
supports that feature, so you should contact DataStax support for such questions.

Regards,
--
Jacques-Henri Berthemet

From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
Sent: lundi 20 novembre 2017 06:44
To: user <user@cassandra.apache.org>
Subject: Solr Search With Apache Cassandra

Hi All,
How does Solr search affect the READ operation in Cassandra?
I have a table with 100 columns, with a UUID primary key.
Note: I have 100 columns in a single table because of an advanced search implemented on multiple columns, as in e-commerce.

Now my concerns are:
1) Whenever I READ from the table based on the primary key, such as
select * from table1 where col1 = UUID;
it works perfectly.
2) When I READ from the table using Solr on 1-2 columns, such as
col1:val1 and col2:val2
it also works perfectly.
3) But when I perform a complex search, it takes about 4-5 seconds,
even though currently READ and WRITE operations are not at massive scale.

So please tell me what the cause is and how to resolve it.
Thanks
Nandan Priyadarshi


RE: Executing a check before replication / manual replication

2017-11-17 Thread Jacques-Henri Berthemet
In the trigger API I mentioned, you'll get the data that is about to be 
inserted; you can decode that data and check that it is compliant with your 
security policy. If you want to kill the node, just call System.exit() or 
CassandraDaemon.stop(). The catch is that if you have RF=4 with 4 nodes, they 
will all receive the same update, and this will kill your whole cluster. If 
instead you throw an exception, you'll prevent the rogue write and your client 
will get an error.
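For reference, wiring a trigger class into a table is done in CQL. A sketch, assuming a hypothetical com.example.AuditTrigger class (implementing org.apache.cassandra.triggers.ITrigger) packaged in a jar dropped into Cassandra's triggers directory:

```sql
-- Attach a (hypothetical) validation trigger to a table; the trigger class
-- can throw an exception to reject writes it considers suspicious.
CREATE TRIGGER audit_check ON ks.tb USING 'com.example.AuditTrigger';

-- Remove it again if needed:
DROP TRIGGER audit_check ON ks.tb;
```

The trigger runs on the coordinator before the mutation is applied, which is why throwing there rejects the write cluster-wide rather than on a single replica.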

As far as I know there is no public interface to plug your code in at the 
replication level, and even if there were one, it would only work when you have 
different DCs: one DC would stay with the rogue data, and the other one would 
shut down.

--
Jacques-Henri Berthemet

From: Abdelkrim Fitouri [mailto:abdou@gmail.com]
Sent: jeudi 16 novembre 2017 22:33
To: user@cassandra.apache.org
Subject: Re: Executing a check before replication / manual replication

OK, please find an example below:
Let's suppose that I have a Cassandra cluster of 4 nodes / one DC / replication 
factor = 4, so in this architecture I have one full copy of the data on each 
node.

Imagine now that one node has been hacked, somehow with full access to a cqlsh 
session. If data is changed on that node, data will be changed on the three 
others, am I right?
Imagine now that I am able to know (on a cryptographic basis) whether a column 
was modified by my API (=> normal way) or not (=> suspicious way), and I want 
to execute this check just before any replication of a keyspace, so that the 
replicas are not all affected; a rollback would not be easy and the integrity 
of the whole system would be compromised. The check would, for example, kill 
the local Cassandra service...
Hope that my question is clearer now.
Many thanks for any help.


2017-11-16 22:01 GMT+01:00 Oliver Ruebenacker 
<cur...@gmail.com<mailto:cur...@gmail.com>>:

 Hello,
  If I understand the OP right, he wants an automated response when one node 
displays suspicious activity.
  I suppose in that case, one would want the node to be removed from the 
cluster or shut down or both.
 Best, Oliver

On Thu, Nov 16, 2017 at 3:40 PM, kurt greaves 
<k...@instaclustr.com<mailto:k...@instaclustr.com>> wrote:
What's the purpose here? If they have access to cqlsh, they have access to 
every node's data, not just the one they are on. An attacker modifying the RF 
would be the least of your worries. If you manage to detect that some node is 
compromised, you should isolate it immediately.


On 16 Nov. 2017 07:33, "Abdelkrim Fitouri" 
<abdou@gmail.com<mailto:abdou@gmail.com>> wrote:
Hi,

I know that Cassandra properly handles data replication between cluster nodes, 
but for security reasons I am wondering how to avoid data replication after a 
server node has been compromised and someone is executing modifications via 
cqlsh.

Is there a possibility in Cassandra to execute a custom check / hook before 
replication?

Is there a possibility to execute a manual replication between nodes?



--

Best Regards.

Abdelkarim FITOURI

System And Security Engineer





--
Oliver Ruebenacker
Senior Software Engineer, Diabetes 
Portal<http://www.type2diabetesgenetics.org/>, Broad 
Institute<http://www.broadinstitute.org/>






RE: SASI and secondary index simultaniously

2017-07-12 Thread Jacques-Henri Berthemet
Hi,

According to SASI source code (3.11.0) it will always have priority over 
regular secondary index:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/SASIIndex.java#L234






public long getEstimatedResultRows()
{
    // this is temporary (until proper QueryPlan is integrated into Cassandra)
    // and allows us to priority SASI indexes if any in the query since they
    // are going to be more efficient, to query and intersect, than built-in indexes.
    return Long.MIN_VALUE;
}


I see that index building progress is reported as a CompactionInfo task, so you 
should be able to monitor progress using ‘nodetool compactionstats’. Last 
point: from the moment the SASI index is created it will be used instead of the 
regular index, so I think you could drop the regular one as soon as the SASI 
index is created; it will make no difference. It also means that you may miss 
results until the SASI index is fully built.

Note that I may be wrong, I’m just reading sources as I’m working on a custom 
index.
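Following that reasoning, the migration could be sketched like this (using the table and index names from the question; treat it as a sketch, since as noted above the priority behavior is inferred from the 3.11 source):

```sql
-- 1. Create the SASI index; once created it takes priority for SELECTs.
CREATE CUSTOM INDEX tb_name_idx_1 ON ks.tb (name)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- 2. Watch the index build finish via 'nodetool compactionstats' on each node.

-- 3. Then drop the old regular index:
DROP INDEX ks.tb_name_idx;
```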

--
Jacques-Henri Berthemet

From: Vlad [mailto:qa23d-...@yahoo.com.INVALID]
Sent: mercredi 12 juillet 2017 08:56
To: User cassandra.apache.org <user@cassandra.apache.org>
Subject: SASI and secondary index simultaniously

Hi,

It's possible to create both a regular secondary index and a SASI index on the 
same column:

CREATE TABLE ks.tb (id int PRIMARY KEY, name text);
CREATE CUSTOM INDEX tb_name_idx_1 ON ks.tb (name) USING 'org.apache.cassandra.index.sasi.SASIIndex';
CREATE INDEX tb_name_idx ON ks.tb (name);

But which one is used for SELECT? Assuming we have a regular index and would 
like to migrate to SASI, can we first create the SASI index, then drop the 
regular one? And how can we check that the index build is completed?

Thanks.




RE: Reg:- Data Modelling For Hierarchy Data

2017-06-09 Thread Jacques-Henri Berthemet
For query 2) you should have a second table; a secondary index is generally not 
recommended. If you're planning to use Cassandra 3.x you should take a look at 
materialized views (MVs):
http://cassandra.apache.org/doc/latest/cql/mvs.html
https://opencredo.com/everything-need-know-cassandra-materialized-views/

I don’t have experience on MVs, I’m stuck on 2.2 for now.

Regards,
--
Jacques-Henri Berthemet

From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
Sent: vendredi 9 juin 2017 10:27
To: Jacques-Henri Berthemet <jacques-henri.berthe...@genesys.com>
Cc: user@cassandra.apache.org
Subject: Re: Reg:- Data Modelling For Hierarchy Data

Hi,
Yes, I am going with a single Users table.
Suppose my query patterns are:
1) Select user by email.
2) Select user by user_type.
The first query pattern is satisfied by the Users table, but for the second I 
either have to go with another table like user_by_type, or I have to create a 
secondary index on user_type so that the client can access only Buyer or 
Seller records.

Please suggest the best way.
Best Regards.
Nandan

On Fri, Jun 9, 2017 at 3:59 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Hi,

According to your model a user can only be of one type, so I'd go with a very 
simple model with a single table:

string email (PK), string user_type, map<string, string> attributes

user_type can be Buyer, Master_Seller or Slave_Seller, and all other columns go 
into the attributes map as long as together they don't exceed 64k; but you 
could create dedicated columns for the attributes that you know will always be 
there.
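Expressed as CQL, that model could be sketched as follows (illustrative only; the column names are assumptions based on the description above):

```sql
CREATE TABLE users (
    email text PRIMARY KEY,       -- unique by design: one registration per email
    user_type text,               -- 'Buyer', 'Master_Seller' or 'Slave_Seller'
    attributes map<text, text>    -- type-specific fields, each value limited to 64k
);

-- Query pattern 1, select user by email:
SELECT * FROM users WHERE email = 'a...@example.com';
```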

--
Jacques-Henri Berthemet

From: @Nandan@ 
[mailto:nandanpriyadarshi...@gmail.com<mailto:nandanpriyadarshi...@gmail.com>]
Sent: vendredi 9 juin 2017 03:14
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Reg:- Data Modelling For Hierarchy Data

Hi,

I am working on a music database where we have multiple kinds of users of our 
portal. Different categories of users share some common attributes but have 
different attributes depending on their registration.
This forms a hierarchy pattern. I am attaching one sample hierarchy pattern of 
the User module, which is part of my current data modeling.

There are a few conditions:
1) The email id should be unique, i.e. if a user registered with one email id, 
then that user can't register again as another user.
2) Some types of users have 20-30 columns in their registration, such as 
company, address, email, first_name, join_date etc.

The query pattern is:
1) select user by email

Please suggest how to do data modeling for this type of hierarchical data.
Should I create a separate table for each type of user, or should I go with a 
single user table?
As we have the unique email id condition, should I go with email id as the 
primary key, or would a user_id UUID be the better choice?



Best regards,
Nandan Priyadarshi



RE: Reg:- Data Modelling For Hierarchy Data

2017-06-09 Thread Jacques-Henri Berthemet
Hi,

According to your model a user can only be of one type, so I'd go with a very 
simple model with a single table:

string email (PK), string user_type, map<string, string> attributes

user_type can be Buyer, Master_Seller or Slave_Seller, and all other columns go 
into the attributes map as long as together they don't exceed 64k; but you 
could create dedicated columns for the attributes that you know will always be 
there.

--
Jacques-Henri Berthemet

From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
Sent: vendredi 9 juin 2017 03:14
To: user@cassandra.apache.org
Subject: Reg:- Data Modelling For Hierarchy Data

Hi,

I am working on a music database where we have multiple kinds of users of our 
portal. Different categories of users share some common attributes but have 
different attributes depending on their registration.
This forms a hierarchy pattern. I am attaching one sample hierarchy pattern of 
the User module, which is part of my current data modeling.

There are a few conditions:
1) The email id should be unique, i.e. if a user registered with one email id, 
then that user can't register again as another user.
2) Some types of users have 20-30 columns in their registration, such as 
company, address, email, first_name, join_date etc.

The query pattern is:
1) select user by email

Please suggest how to do data modeling for this type of hierarchical data.
Should I create a separate table for each type of user, or should I go with a 
single user table?
As we have the unique email id condition, should I go with email id as the 
primary key, or would a user_id UUID be the better choice?



Best regards,
Nandan Priyadarshi


RE: Reg:- CQL SOLR Query Not gives result

2017-05-12 Thread Jacques-Henri Berthemet
While this is indeed a question for DSE support, your problem looks related to 
CJK Lucene indexing; in this context I think your query does not make sense.
(see CJK: https://en.wikipedia.org/wiki/CJK_characters)

If you properly configured your indexing to handle CJK (it looks like you're 
searching for Chinese), using wildcards with CJK does not make sense: 中 can be 
considered a word, not a letter, so partial matches with wildcards don't 
apply. Also, the CJK analyzer indexes bi-grams, so you should search for pairs 
of characters.

--
Jacques-Henri Berthemet

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: vendredi 12 mai 2017 04:21
To: @Nandan@ <nandanpriyadarshi...@gmail.com>; user@cassandra.apache.org
Subject: Re: Reg:- CQL SOLR Query Not gives result

This is a question for datastax support, not the Apache mailing list. Folks 
here are more than happy to help with open source, Apache Cassandra questions, 
if you've got one.
On Thu, May 11, 2017 at 9:06 PM @Nandan@ 
<nandanpriyadarshi...@gmail.com<mailto:nandanpriyadarshi...@gmail.com>> wrote:
Hi ,

In my table I have a few records and implemented Solr for partial search, but 
I am not able to retrieve data.

SELECT * from revall_book_by_title where solr_query = 'language:中';
SELECT * from revall_book_by_title where solr_query = 'language:中*';

None of them are working.
Any suggestions.


RE: NoSE: Automated schema design for Cassandra

2017-05-10 Thread Jacques-Henri Berthemet
Hi,

This is interesting; I'd just advise putting full examples and more 
documentation on how to use it (the articles are a bit too detailed).
Also, you should not mention "column families" but just tables.

Was this used to generate a schema used in production?
Do you think it’s possible to generate test code to validate the workload?

--
Jacques-Henri Berthemet

From: michael.m...@gmail.com [mailto:michael.m...@gmail.com] On Behalf Of 
Michael Mior
Sent: mardi 9 mai 2017 17:30
To: user <user@cassandra.apache.org>
Subject: NoSE: Automated schema design for Cassandra

Hi all,

I wanted to share a tool I've been working on that tries to help automate the 
schema design process for Cassandra. The short description is that you provide 
information on the kind of data you want to store and the queries and updates 
you want to issue, and NoSE will perform a cost-based analysis to suggest an 
optimal schema.

There's lots of room for improvement and many Cassandra features which are not 
currently supported, but hopefully some in the community may still find it 
useful as a starting point.

Link to more details and the source code below:

https://michael.mior.ca/projects/nose/<https://michael.mior.ca/projects/nose/>

If you're interested in trying it out, don't hesitate to reach out and I'm 
happy to help!

Cheers,
--
Michael Mior
mm...@uwaterloo.ca<mailto:mm...@uwaterloo.ca>


RE: scylladb

2017-03-12 Thread Jacques-Henri Berthemet
Will you support custom secondary indexes, triggers and UDF?
I checked the index code but it's just a couple of files with commented-out 
Java code. I'm curious to test ScyllaDB, but our application uses LWT and 
custom secondary indexes; I understand LWT is coming (soon?).

--
Jacques-Henri Berthemet

From: sfesc...@gmail.com [mailto:sfesc...@gmail.com]
Sent: dimanche 12 mars 2017 09:23
To: user@cassandra.apache.org
Subject: Re: scylladb


On Sat, Mar 11, 2017 at 1:52 AM Avi Kivity 
<a...@scylladb.com<mailto:a...@scylladb.com>> wrote:


Lastly, why don't you test Scylla yourself?  It's pretty easy to set up, 
there's nothing to tune.

Avi

 I'll look seriously at Scylla when it is 3.0.12 compatible.


RE: scylladb

2017-03-10 Thread Jacques-Henri Berthemet
Cassandra is not about pure performance; there are many other DBs that are much 
faster than Cassandra. Cassandra's strength is all about scalability: 
performance increases linearly as you add more nodes. During Cassandra Summit 
2014, Apple said they have a 10k-node cluster. The usual limiting factor is 
your disk write speed and latency, and I don't see how C++ changes anything in 
this regard unless you can cache all your data in memory.

I’d be curious to know how ScyllaDB performs with a 100+ nodes cluster with PBs 
of data compared to Cassandra.
--
Jacques-Henri Berthemet

From: Rakesh Kumar [mailto:rakeshkumar...@outlook.com]
Sent: vendredi 10 mars 2017 09:58
To: user@cassandra.apache.org
Subject: Re: scylladb

Cassandra vs Scylla is a valid comparison because the two are compatible: 
Scylla is a drop-in replacement for Cassandra.
Is Aerospike a drop-in replacement for Cassandra? If yes, and only if yes, then 
the comparison with Scylla is valid.


From: Bhuvan Rawal <bhu1ra...@gmail.com<mailto:bhu1ra...@gmail.com>>
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Sent: Friday, March 10, 2017 11:59 AM
Subject: Re: scylladb

Agreed, C++ gives an added advantage in talking to the underlying hardware with 
better efficiency. It sounds good, but can a piece of code written in C++ give 
1000% of the throughput of a Java app? Is a TPC design 10X more performant than 
a SEDA architecture?

And if C/C++ is indeed that fast, how can Aerospike (which is itself written in 
C) claim to be 10X faster than Scylla here 
http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your 
benchmarks with Aerospike's, it would appear that Aerospike is 100X more 
performant than C*; I highly doubt that!)

For a moment let's forget about evaluating 2 different databases: one can 
observe a 10X performance difference between a mistuned Cassandra cluster and 
one that's tuned to its data model, since there are so many tunables in the 
yaml as well as in the table configs.

The idea is: in order to strengthen your claim, you need to provide complete 
system metrics (disk, CPU, network) and the configs used, and show where the 
OPS increase starts to decay. Plain ops-per-second and 99p latency figures are 
a black box.

Regards,
Bhuvan

On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity 
<a...@scylladb.com<mailto:a...@scylladb.com>> wrote:
ScyllaDB engineer here.

C++ is really an enabling technology here. It is directly responsible for a 
small fraction of the gain by executing faster than Java.  But it is indirectly 
responsible for the gain by allowing us direct control over memory and 
threading.  Just as an example, Scylla starts by taking over almost all of the 
machine's memory, and dynamically assigning it to memtables, cache, and working 
memory needed to handle requests in flight.  Memory is statically partitioned 
across cores, allowing us to exploit NUMA fully.  You can't do these things in 
Java.

I would say the major contributors to Scylla performance are:
 - thread-per-core design
 - replacement of the page cache with a row cache
 - careful attention to many small details, each contributing a little, but 
with a large overall impact

While I'm here I can say that performance is not the only goal here, it is 
stable and predictable performance over varying loads and during maintenance 
operations like repair, without any special tuning.  We measure the amount of 
CPU and I/O spent on foreground (user) and background (maintenance) tasks and 
divide them fairly.  This work is not complete but already makes operating 
Scylla a lot simpler.


On 03/10/2017 01:42 AM, Kant Kodali wrote:
I don't think ScyllaDB's performance is because of C++. The design decisions in 
ScyllaDB are indeed different from Cassandra's, such as getting rid of SEDA, 
moving to TPC, and so on.

If someone thinks it is because of C++, then show benchmarks that prove it is 
indeed C++ that gave the 10X performance boost ScyllaDB claims, instead of 
just stating it.


On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III 
<mrbur...@gmail.com<mailto:mrbur...@gmail.com>> wrote:
They spend an enormous amount of time focusing on performance. You can expect 
them to continue on with their optimization and keep crushing it.

P.S., I don't work for ScyllaDB.

On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar 
<rakeshkumar...@outlook.com<mailto:rakeshkumar...@outlook.com>> wrote:
In all of their presentations they keep harping on the fact that ScyllaDB is 
written in C++ and does not carry the overhead of Java. Still, the difference 
looks staggering.
__ __
From: daemeon reiydelle <daeme...@gmail.com<mailto:daeme...@gmail.com>>
Sent: Thursday, March 9, 2017 14:21
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: scylladb

The comparison is fair, and conservative. Did substanti

RE: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-06 Thread Jacques-Henri Berthemet
Hi,

It is very interesting that there is a difference between vanilla Cassandra 
and DSC; I thought DSC was simply the same thing provided with a setup/package.
--
Jacques-Henri Berthemet

From: Eduardo Alonso [mailto:eduardoalo...@stratio.com]
Sent: vendredi 6 mai 2016 07:36
To: user@cassandra.apache.org
Subject: Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC 
unavailable

Hi Siddharth,

I have tested with Apache Cassandra 3.0.3 and cassandra-lucene-index-3.0.3.1 
and it works well, but with dsc-cassandra-3.0.3 it does not delete the Lucene 
files.

Please, can you open an issue at 
https://github.com/stratio/cassandra-lucene-index/issues ?


Referring to your second question: depending on the consistency level used for 
writes, repairs in a DC can be very resource-intensive. I have some questions:

1) How do you run nodetool repair? Do you run it on every machine at the same 
time, or wait for it to finish on one machine before running it on the next? 
Do you use primary range repair (the -pr argument) in nodetool repair?

2) What is the replication factor for that keyspace? Which consistency level 
do you use for writes?

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com<http://www.stratio.com/> // 
@stratiobd<https://twitter.com/StratioBD>

2016-05-06 9:43 GMT+02:00 Siddharth Verma 
<verma.siddha...@snapdeal.com<mailto:verma.siddha...@snapdeal.com>>:
Hi,
I have 2 queries. We are using Cassandra DSC 3.0.3 and Stratio Lucene indexes 
on tables.
1. When a table is truncated, the Lucene index is not cleared; we see that it 
still occupies space on disk.
2. When we run nodetool repair, all nodes are up (nodetool status) but we can't 
connect to any of the nodes in the same DC.
Any help would be appreciated.
Thanks
Siddharth Verma



RE: how expensive is light weight transaction: if not exists

2016-04-27 Thread Jacques-Henri Berthemet
Hi,

You can't batch LWTs if they don't operate on the same partition, so in the 
queries below all the values of "id" must be the same.
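To illustrate the same-partition constraint, a conditional batch is only accepted when every statement targets one partition; a sketch with a hypothetical table where the partition key is shared and a clustering column distinguishes the rows:

```sql
-- All rows share partition key id = 'A', so this conditional batch is valid.
CREATE TABLE ks.items (id text, item text, x text, PRIMARY KEY (id, item));

BEGIN BATCH
    INSERT INTO ks.items (id, item, x) VALUES ('A', 'i1', 'y') IF NOT EXISTS;
    INSERT INTO ks.items (id, item, x) VALUES ('A', 'i2', 'y') IF NOT EXISTS;
APPLY BATCH;
```

Mixing different partition keys in the conditional statements would be rejected by the coordinator, since the whole batch runs as a single Paxos round on that partition.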

--
Jacques-Henri Berthemet

From: y2k...@gmail.com [mailto:y2k...@gmail.com] On Behalf Of Jimmy Lin
Sent: mercredi 27 avril 2016 18:14
To: user@cassandra.apache.org
Subject: how expensive is light weight transaction: if not exists

hi all,
we would like to consider using lightweight transactions like the following:
begin batch:
update table set x=y where id=A if not exists;
update table set x=y where id=B if not exists;
update table set x=y where id=C if not exists;
update table set x=y where id=D if not exists;
apply batch
(using LOCAL_QUORUM)
I know there is a lot going on behind Cassandra's lightweight transactions; 
just how much overhead is there when using "if not exists"?


RE: How many nodes do we require

2016-03-31 Thread Jacques-Henri Berthemet
You're right. I meant data integrity; I understand it's not everybody's 
priority!

--
Jacques-Henri Berthemet

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: jeudi 31 mars 2016 17:48
To: user@cassandra.apache.org
Subject: Re: How many nodes do we require

Losing a write is very different from having a fragile cluster.  A fragile 
cluster implies that whole thing will fall apart, that it breaks easily.  
Writing at CL=ONE gives you a pretty damn stable cluster at the potential risk 
of losing a write that hasn't replicated (but has been ack'ed) which for a lot 
of people is preferable to downtime.  CL=ONE gives you the *most stable* 
cluster you can have.
On Tue, Mar 29, 2016 at 12:57 AM Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Because if you lose a node, you risk losing some data forever if it was not 
yet replicated.

--
Jacques-Henri Berthemet

From: Jonathan Haddad [mailto:j...@jonhaddad.com<mailto:j...@jonhaddad.com>]
Sent: vendredi 25 mars 2016 19:37

To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: How many nodes do we require

Why would using CL-ONE make your cluster fragile? This isn't obvious to me. 
It's the most practical setting for high availability, which very much says 
"not fragile".
On Fri, Mar 25, 2016 at 10:44 AM Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
I found this calculator very convenient:
http://www.ecyrd.com/cassandracalculator/

Regardless of your other DCs you need RF=3 if you write at LOCAL_QUORUM, RF=2 
if you write/read at ONE.

Obviously using ONE as CL makes your cluster very fragile.
--
Jacques-Henri Berthemet


-Original Message-
From: Rakesh Kumar 
[mailto:rakeshkumar46...@gmail.com<mailto:rakeshkumar46...@gmail.com>]
Sent: vendredi 25 mars 2016 18:14
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: How many nodes do we require

On Fri, Mar 25, 2016 at 11:45 AM, Jack Krupansky
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:
> It depends on how much data you have. A single node can store a lot of data,
> but the more data you have the longer a repair or node replacement will
> take. How long can you tolerate for a full repair or node replacement?

At this time, and for the foreseeable future, the size of the data will not be
significant, so we can safely disregard the above as a decision factor.

>
> Generally, RF=3 is both sufficient and recommended.

Are you suggesting SimpleStrategy with RF=3,
or NetworkTopologyStrategy with RF=3?


taken from:

https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

"
Three replicas in each data center: This configuration tolerates
either the failure of a one node per replication group at a strong
consistency level of LOCAL_QUORUM or multiple node failures per data
center using consistency level ONE."

In our case, with only 3 nodes in each DC, wouldn't RF=3 effectively mean ALL?

I will state our requirement clearly:

If we go with six nodes (3 in each DC), we should be able to write even with
the loss of one DC plus the loss of one node in the surviving DC. I am open
to hearing what compromise we have to make on reads while a DC is down. For
us writes are critical, more than reads.

Maybe this is not possible with 6 nodes and requires more. Please advise.


RE: How many nodes do we require

2016-03-29 Thread Jacques-Henri Berthemet
Because if you lose a node, you risk losing some data forever if it was not 
yet replicated.

--
Jacques-Henri Berthemet

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: vendredi 25 mars 2016 19:37
To: user@cassandra.apache.org
Subject: Re: How many nodes do we require

Why would using CL-ONE make your cluster fragile? This isn't obvious to me. 
It's the most practical setting for high availability, which very much says 
"not fragile".
On Fri, Mar 25, 2016 at 10:44 AM Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
I found this calculator very convenient:
http://www.ecyrd.com/cassandracalculator/

Regardless of your other DCs you need RF=3 if you write at LOCAL_QUORUM, RF=2 
if you write/read at ONE.

Obviously using ONE as CL makes your cluster very fragile.
--
Jacques-Henri Berthemet


-Original Message-
From: Rakesh Kumar 
[mailto:rakeshkumar46...@gmail.com<mailto:rakeshkumar46...@gmail.com>]
Sent: vendredi 25 mars 2016 18:14
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: How many nodes do we require

On Fri, Mar 25, 2016 at 11:45 AM, Jack Krupansky
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:
> It depends on how much data you have. A single node can store a lot of data,
> but the more data you have the longer a repair or node replacement will
> take. How long can you tolerate for a full repair or node replacement?

At this time, for a foreseeable future, size of data will not be
significant. So we can safely disregard the above as a decision
factor.

>
> Generally, RF=3 is both sufficient and recommended.

Are you suggesting a SimpleStrategy topology with RF=3
or a NetworkTopologyStrategy with RF=3?


taken from:

https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

"
Three replicas in each data center: This configuration tolerates
either the failure of a one node per replication group at a strong
consistency level of LOCAL_QUORUM or multiple node failures per data
center using consistency level ONE."

In our case, with only 3 nodes in each DC, wouldn't RF=3 effectively mean ALL?

I will state our requirement clearly:

If we are going with six nodes (3 in each DC), we should be able to
write even with a loss of one DC and loss of one node of the surviving
DC. I am open to hearing what compromise we have to do with the reads
during the time a DC is down. For us write is critical, more than
reads.

Maybe this is not possible with 6 nodes and requires more. Please advise.
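Whether six nodes can meet that requirement can be checked mechanically. Below is a standalone sketch (plain Python, no Cassandra driver; the helper names are ours, not an API): with NetworkTopologyStrategy at RF=3 per DC, every row has 6 replicas, LOCAL_QUORUM needs 2 of the 3 local replicas, and a global QUORUM needs 4 of 6.

```python
def quorum(replicas: int) -> int:
    # A quorum is a strict majority of the replicas.
    return replicas // 2 + 1

def write_succeeds(cl: str, live_local: int, live_total: int,
                   rf_local: int = 3, rf_total: int = 6) -> bool:
    """Write availability for a 2-DC cluster, RF=3 per DC (6 replicas total)."""
    if cl == "ONE":
        return live_total >= 1
    if cl == "LOCAL_QUORUM":
        return live_local >= quorum(rf_local)
    if cl == "QUORUM":
        return live_total >= quorum(rf_total)
    raise ValueError(cl)

# Scenario from the question: one DC down (3 nodes) plus one node lost
# in the surviving DC -> 2 live replicas, both in the local DC.
print(write_succeeds("LOCAL_QUORUM", live_local=2, live_total=2))  # True
print(write_succeeds("QUORUM", live_local=2, live_total=2))        # False
```

So six nodes do satisfy the stated requirement for writes at LOCAL_QUORUM from the surviving DC, but not at a global QUORUM; reads during the outage would be limited to what the surviving DC holds.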


RE: How many nodes do we require

2016-03-25 Thread Jacques-Henri Berthemet
I found this calculator very convenient:
http://www.ecyrd.com/cassandracalculator/

Regardless of your other DCs you need RF=3 if you write at LOCAL_QUORUM, RF=2 
if you write/read at ONE.

Obviously using ONE as CL makes your cluster very fragile.
--
Jacques-Henri Berthemet


-Original Message-
From: Rakesh Kumar [mailto:rakeshkumar46...@gmail.com] 
Sent: vendredi 25 mars 2016 18:14
To: user@cassandra.apache.org
Subject: Re: How many nodes do we require

On Fri, Mar 25, 2016 at 11:45 AM, Jack Krupansky
<jack.krupan...@gmail.com> wrote:
> It depends on how much data you have. A single node can store a lot of data,
> but the more data you have the longer a repair or node replacement will
> take. How long can you tolerate for a full repair or node replacement?

At this time, for a foreseeable future, size of data will not be
significant. So we can safely disregard the above as a decision
factor.

>
> Generally, RF=3 is both sufficient and recommended.

Are you suggesting a SimpleStrategy topology with RF=3
or a NetworkTopologyStrategy with RF=3?


taken from:

https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

"
Three replicas in each data center: This configuration tolerates
either the failure of a one node per replication group at a strong
consistency level of LOCAL_QUORUM or multiple node failures per data
center using consistency level ONE."

In our case, with only 3 nodes in each DC, wouldn't RF=3 effectively mean ALL?

I will state our requirement clearly:

If we are going with six nodes (3 in each DC), we should be able to
write even with a loss of one DC and loss of one node of the surviving
DC. I am open to hearing what compromise we have to do with the reads
during the time a DC is down. For us write is critical, more than
reads.

Maybe this is not possible with 6 nodes and requires more. Please advise.



RE: Updating secondary index options

2016-03-04 Thread Jacques-Henri Berthemet
Indeed it’s a custom implementation of PerRowSecondaryIndex. In my case I know 
it’s safe to update the particular setting I want to update, and it won’t rebuild 
the index; it just provides the ability to tune some settings.

Even on regular Cassandra indexes that are based on SSTables, you could want to 
update read_repair_chance, sstable_compression, compaction, caching …

I created the below Jira “Wish”:
CASSANDRA-11306<https://issues.apache.org/jira/browse/CASSANDRA-11306> Add 
support for ALTER INDEX command

For now I’ll have to hack something indeed. I tried to update system tables 
that holds index options but it seems I need to restart Cassandra twice to be 
able to see the changes.

Regards,
--
Jacques-Henri Berthemet

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: vendredi 4 mars 2016 18:40
To: user@cassandra.apache.org
Subject: Re: Updating secondary index options

Is this a secondary indexer of your own design so that you know that changing 
the options will be safe for existing index entries?

It might be worth a Jira.

Otherwise, you may just have to manually go in and hack the information under 
the hood.

-- Jack Krupansky

On Fri, Mar 4, 2016 at 12:14 PM, DuyHai Doan 
<doanduy...@gmail.com<mailto:doanduy...@gmail.com>> wrote:
Unfortunately for you, ALTER INDEX does not exist.

And anyway, even if it existed, altering an index option is likely going to 
require an index rebuild, so you can't cut it anyway.

On Fri, Mar 4, 2016 at 4:59 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
It’s not possible, it’s a PerRowSecondary index, potentially as big as the 
table itself (few TBs) it will take a very long time to drop and re-create.

--
Jacques-Henri Berthemet

From: DuyHai Doan [mailto:doanduy...@gmail.com<mailto:doanduy...@gmail.com>]
Sent: vendredi 4 mars 2016 14:52
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Updating secondary index options

DROP and re-create the index with the new options

On Fri, Mar 4, 2016 at 3:45 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Hi,

I’m using Cassandra 2.2.5 with a custom secondary index. It’s created with the 
below syntax:
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/create_index_r.html
CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass' WITH 
OPTIONS = {'some_setting': 'value'};
I’d like to update those settings, I tried the below command based on ALTER 
TABLE but it does not work:
cqlsh:test> alter index table_idx WITH OPTIONS = {'some_setting': 'value'};
SyntaxException: 

Is there a way to send such updates?

Regards,
Jacques-Henri






RE: Updating secondary index options

2016-03-04 Thread Jacques-Henri Berthemet
It’s not possible, it’s a PerRowSecondary index, potentially as big as the 
table itself (few TBs) it will take a very long time to drop and re-create.

--
Jacques-Henri Berthemet

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: vendredi 4 mars 2016 14:52
To: user@cassandra.apache.org
Subject: Re: Updating secondary index options

DROP and re-create the index with the new options

On Fri, Mar 4, 2016 at 3:45 PM, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Hi,

I’m using Cassandra 2.2.5 with a custom secondary index. It’s created with the 
below syntax:
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/create_index_r.html
CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass' WITH 
OPTIONS = {'some_setting': 'value'};
I’d like to update those settings, I tried the below command based on ALTER 
TABLE but it does not work:
cqlsh:test> alter index table_idx WITH OPTIONS = {'some_setting': 'value'};
SyntaxException: 

Is there a way to send such updates?

Regards,
Jacques-Henri




Updating secondary index options

2016-03-04 Thread Jacques-Henri Berthemet
Hi,

I'm using Cassandra 2.2.5 with a custom secondary index. It's created with the 
below syntax:
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/create_index_r.html
CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass' WITH 
OPTIONS = {'some_setting': 'value'};
I'd like to update those settings, I tried the below command based on ALTER 
TABLE but it does not work:
cqlsh:test> alter index table_idx WITH OPTIONS = {'some_setting': 'value'};
SyntaxException: 

Is there a way to send such updates?

Regards,
Jacques-Henri



RE: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-03 Thread Jacques-Henri Berthemet
You will have the same problem without IF NOT EXISTS; at least I had Cassandra 
2.1 complaining about having tables with the same name but different UUIDs. In 
the end, in our case we have a single application node that is responsible for 
schema upgrades. That’s ok for us, as we don’t expect to upgrade the schema 
that often.

--
Jacques-Henri Berthemet

From: Ken Hancock [mailto:ken.hanc...@schange.com]
Sent: mardi 2 février 2016 17:14
To: user@cassandra.apache.org
Subject: Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 
upgrade

Just to close the loop on this, but am I correct that the IF NOT EXISTS isn't 
the real problem?  Even multiple calls to CREATE TABLE cause the same schema 
mismatch if done concurrently?  Normally, a CREATE TABLE call will return an 
exception that the table already exists.

On Tue, Feb 2, 2016 at 11:06 AM, Jack Krupansky 
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:
And CASSANDRA-10699  seems to be the sub-issue of CASSANDRA-9424 to do that:
https://issues.apache.org/jira/browse/CASSANDRA-10699


-- Jack Krupansky

On Tue, Feb 2, 2016 at 9:59 AM, Sebastian Estevez 
<sebastian.este...@datastax.com<mailto:sebastian.este...@datastax.com>> wrote:

Hi Ken,

Earlier in this thread I posted a link to 
https://issues.apache.org/jira/browse/CASSANDRA-9424

That is the fix for these schema disagreement issues and, as commented there, 
the plan is to use CAS. Until then we have to treat schema changes delicately.

all the best,

Sebastián
On Feb 2, 2016 9:48 AM, "Ken Hancock" 
<ken.hanc...@schange.com<mailto:ken.hanc...@schange.com>> wrote:
So this rings odd to me.  If you can accomplish the same thing by using a CAS 
operation, why not fix CREATE TABLE IF NOT EXISTS so that if you are writing an 
application that creates the table on startup, the application is safe to run on 
multiple nodes and uses CAS to safeguard against multiple concurrent creations?

On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens 
<migh...@gmail.com<mailto:migh...@gmail.com>> wrote:
There's still a race condition there, because two clients could SELECT at the 
same time as each other, then both INSERT.

You'd be better served with a CAS operation, and let Paxos guarantee 
at-most-once execution.

On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes 
<li...@natserv.net<mailto:li...@natserv.net>> wrote:
On 01/22/2016 10:29 PM, Kevin Burton wrote:
I sort of agree.. but we are also considering migrating to hourly tables.. and 
what if the single script doesn't run.

I like having N nodes make changes like this because in my experience that 
central / single box will usually fail at the wrong time :-/



On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:
Instead of using ZK, why not solve your concurrency problem by removing it?  By 
that, I mean simply have 1 process that creates all your tables instead of 
creating a race condition intentionally?

On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton 
<bur...@spinn3r.com<mailto:bur...@spinn3r.com>> wrote:
Not sure if this is a bug or not or kind of a *fuzzy* area.

In 2.0 this worked fine.

We have a bunch of automated scripts that go through and create tables... one 
per day.

at midnight UTC our entire CQL went offline... took down our whole app.  ;-/

The resolution was a full CQL shut down and then a drop table to remove the bad 
tables...

pretty sure the issue was with schema disagreement.

All our CREATE TABLE statements use IF NOT EXISTS, but I think the IF NOT EXISTS 
only checks locally?

My work around is going to be to use zookeeper to create a mutex lock during 
this operation.

Any other things I should avoid?


--
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com<http://Spinn3r.com>
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ 
profile<https://plus.google.com/102718274791889610666/posts>



--
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com<http://Spinn3r.com>
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ 
profile<https://plus.google.com/102718274791889610666/posts>

One way to accomplish both, a single process doing the work and having multiple 
machines be able to do it, is to have a control table.

You can have a table that lists what tables have been created and force 
consistency ALL. In this table you list the names of the tables created. If a 
table name is in there, it doesn't need to be created again.



--
Ken Hancock | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com<mailto:ken.hanc...@schange.com> | 
www.s
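The control-table idea combines naturally with the CAS suggestion earlier in the thread: a lightweight transaction (`INSERT ... IF NOT EXISTS` on the control table) lets exactly one node win the right to run the real CREATE TABLE. The at-most-once behavior can be sketched without a cluster; the class below is a pure-Python stand-in for the LWT (Paxos provides the guarantee in real Cassandra; a lock plays that role here), and the table names are hypothetical.

```python
import threading

class ControlTable:
    """Stand-in for a Cassandra control table written with INSERT ... IF NOT EXISTS."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, table_name: str, owner: str) -> bool:
        # Returns True ([applied] = true) only for the first writer.
        with self._lock:
            if table_name in self._rows:
                return False
            self._rows[table_name] = owner
            return True

def ensure_table(control: ControlTable, table_name: str, node: str) -> str:
    # Only the node that wins the CAS runs the actual CREATE TABLE.
    if control.insert_if_not_exists(table_name, node):
        return f"{node} creates {table_name}"
    return f"{node} skips {table_name} (already claimed)"

control = ControlTable()
print(ensure_table(control, "events_2016_01_23", "node-a"))  # node-a creates ...
print(ensure_table(control, "events_2016_01_23", "node-b"))  # node-b skips ...
```

Unlike a bare CREATE TABLE IF NOT EXISTS, the CAS write is linearized, so two nodes racing at midnight cannot both proceed to create the table.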

RE: Changing schema on multiple nodes while they are isolated

2015-10-05 Thread Jacques-Henri Berthemet
Then maybe Cassandra is not the right tool for that, or you need a different 
data structure. For example, you could keep a single table where what used to be 
your table name is now part of your partition key. That way any “offline” data 
will be merged when the nodes join again. If you have conflicts, they will be 
resolved on the basis of the row timestamp.
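The merge-on-rejoin behavior this relies on is Cassandra's cell-level last-write-wins reconciliation. A simplified sketch (ignoring tombstones and tie-breaking on value) of how two diverged copies of the same row reconcile:

```python
Cell = tuple  # (value, write_timestamp_micros)

def reconcile(a: Cell, b: Cell) -> Cell:
    # The cell with the higher write timestamp wins.
    return a if a[1] >= b[1] else b

def merge_rows(row_a: dict, row_b: dict) -> dict:
    """Merge two copies of the same row, column by column."""
    merged = dict(row_a)
    for col, cell in row_b.items():
        merged[col] = reconcile(merged[col], cell) if col in merged else cell
    return merged

# The same row written on two isolated nodes, then merged on rejoin:
node_a = {"status": ("open", 100), "owner": ("alice", 100)}
node_b = {"status": ("closed", 200)}
print(merge_rows(node_a, node_b))  # status from node_b, owner from node_a
```

Note that the merge happens per column, so a row can end up as a mix of both writes; this is why conflicting writes during a partition resolve silently rather than erroring.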

--
Jacques-Henri Berthemet

From: Stephen Baynes [mailto:stephen.bay...@smoothwall.net]
Sent: lundi 5 octobre 2015 11:00
To: user@cassandra.apache.org
Subject: Re: Changing schema on multiple nodes while they are isolated

> Why don’t you simply let the node join the cluster? It will pull new tables 
> and the data automatically.

Because there is no guarantee that the rest of the cluster is up, or even that 
there is anything more than a cluster of one at this time. This is a 
plug-in-and-go environment where the user does not even know or care about the 
details of Cassandra. It is not a managed datacenter.

On 2 October 2015 at 17:16, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Why don’t you simply let the node join the cluster? It will pull new tables and 
the data automatically.

--
Jacques-Henri Berthemet

From: Stephen Baynes 
[mailto:stephen.bay...@smoothwall.net<mailto:stephen.bay...@smoothwall.net>]
Sent: vendredi 2 octobre 2015 18:08
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Changing schema on multiple nodes while they are isolated

Hi Jacques-Henri

You are right - serious trouble. I managed some more testing and it does not 
repair or share any data. In the logs I see lots of:

WARN  [MessagingService-Incoming-/10.50.16.214<http://10.50.16.214>] 2015-10-02 
16:52:36,810 IncomingTcpConnection.java:100 - UnknownColumnFamilyException 
reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=e6828dd0-691a-11e5-8a27-b1780df21c7c
 at 
org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:163)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
 at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:96)
 ~[apache-cassandra-2.2.1.jar:2.2.1]

and some:

ERROR [AntiEntropyStage:1] 2015-10-02 16:48:16,546 
RepairMessageVerbHandler.java:164 - Got error, removing parent repair session
ERROR [AntiEntropyStage:1] 2015-10-02 16:48:16,548 CassandraDaemon.java:183 - 
Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
 at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:167)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
 at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-2.2.1.jar:2.2.1]


Will need to do some thinking about this. I wonder about shipping a backup of a 
good system keyspace and restoring it on each node before it starts for the first 
time - but will that end up with each node having the same internal id?



On 2 October 2015 at 16:27, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Hi Stephen,

If you manage to create tables on each node while node A and B are separated, 
you’ll get into troubles when they will reconnect again. I had the case 
previously and Cassandra complained that tables with same names but different 
ids were present in the keyspace. I don’t know if there is a way to fix that 
with nodetool but I don’t think that it is a good practice.

To solve this, we have a “schema creator” application node that is responsible 
to change the schema. If this node is down, schema updates are not possible. We 
can make any node ‘creator’, but only one can be enabled at any given time.
--
Jacques-Henri Berthemet

From: Stephen Baynes 
[mailto:stephen.bay...@smoothwall.net<mailto:stephen.bay...@smoothwall.net>]
Sent: vendredi 2 octobre 2015 16:46
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Changing schema on multiple nodes while they are isolated

Is it safe to make schema changes ( e.g. create keyspace and tables ) on 
multiple separate nodes of a cluster while they are out of communication with 
other nodes in the cluster? For example create on node A while node B is down, 
create on node B while A is down, then bring both up together.

We are looking to embed Cassandra invisibly in another product and we have no 
control in what order users may start/stop the nodes up or add/remove them from 
clusters. And Cassandra must come up and be working with at least local access 
regardless. So this means always creating keyspaces and tables so they are 
always present. But this means nodes joining clusters which already have the 
same keyspace and table defined. Will it cause any issues? I have done some 
testing and saw some 

RE: Changing schema on multiple nodes while they are isolated

2015-10-02 Thread Jacques-Henri Berthemet
Hi Stephen,

If you manage to create tables on each node while node A and B are separated, 
you’ll get into troubles when they will reconnect again. I had the case 
previously and Cassandra complained that tables with same names but different 
ids were present in the keyspace. I don’t know if there is a way to fix that 
with nodetool but I don’t think that it is a good practice.

To solve this, we have a “schema creator” application node that is responsible 
to change the schema. If this node is down, schema updates are not possible. We 
can make any node ‘creator’, but only one can be enabled at any given time.
--
Jacques-Henri Berthemet

From: Stephen Baynes [mailto:stephen.bay...@smoothwall.net]
Sent: vendredi 2 octobre 2015 16:46
To: user@cassandra.apache.org
Subject: Changing schema on multiple nodes while they are isolated

Is it safe to make schema changes ( e.g. create keyspace and tables ) on 
multiple separate nodes of a cluster while they are out of communication with 
other nodes in the cluster? For example create on node A while node B is down, 
create on node B while A is down, then bring both up together.

We are looking to embed Cassandra invisibly in another product and we have no 
control in what order users may start/stop the nodes up or add/remove them from 
clusters. And Cassandra must come up and be working with at least local access 
regardless. So this means always creating keyspaces and tables so they are 
always present. But this means nodes joining clusters which already have the 
same keyspace and table defined. Will it cause any issues? I have done some 
testing and saw some issues when I tried nodetool repair to bring 
things into sync. However at the time I was fighting with what I later 
discovered was CASSANDRA-9689 keyspace does not show in describe list, if 
create query times out.<https://issues.apache.org/jira/browse/CASSANDRA-9689> 
and did not know what was what. I will give it another try sometime, but would 
appreciate knowing if this is going to run into trouble before we find it.

We are basically using Cassandra to share fairly transient information. We can 
cope with data loss during environment changes and occasional losses at other 
times. But if the environment is stable then it should all just work, whatever 
the environment is. We use a very high replication factor so all nodes have a 
copy of all the data and will keep working even if they are the only one up.

Thanks

--

Stephen Baynes


RE: Changing schema on multiple nodes while they are isolated

2015-10-02 Thread Jacques-Henri Berthemet
Why don’t you simply let the node join the cluster? It will pull new tables and 
the data automatically.

--
Jacques-Henri Berthemet

From: Stephen Baynes [mailto:stephen.bay...@smoothwall.net]
Sent: vendredi 2 octobre 2015 18:08
To: user@cassandra.apache.org
Subject: Re: Changing schema on multiple nodes while they are isolated

Hi Jacques-Henri

You are right - serious trouble. I managed some more testing and it does not 
repair or share any data. In the logs I see lots of:

WARN  [MessagingService-Incoming-/10.50.16.214<http://10.50.16.214>] 2015-10-02 
16:52:36,810 IncomingTcpConnection.java:100 - UnknownColumnFamilyException 
reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=e6828dd0-691a-11e5-8a27-b1780df21c7c
 at 
org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:163)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
 at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:96)
 ~[apache-cassandra-2.2.1.jar:2.2.1]

and some:

ERROR [AntiEntropyStage:1] 2015-10-02 16:48:16,546 
RepairMessageVerbHandler.java:164 - Got error, removing parent repair session
ERROR [AntiEntropyStage:1] 2015-10-02 16:48:16,548 CassandraDaemon.java:183 - 
Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
 at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:167)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
 at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-2.2.1.jar:2.2.1]


Will need to do some thinking about this. I wonder about shipping a backup of a 
good system keyspace and restoring it on each node before it starts for the first 
time - but will that end up with each node having the same internal id?



On 2 October 2015 at 16:27, Jacques-Henri Berthemet 
<jacques-henri.berthe...@genesys.com<mailto:jacques-henri.berthe...@genesys.com>>
 wrote:
Hi Stephen,

If you manage to create tables on each node while node A and B are separated, 
you’ll get into troubles when they will reconnect again. I had the case 
previously and Cassandra complained that tables with same names but different 
ids were present in the keyspace. I don’t know if there is a way to fix that 
with nodetool but I don’t think that it is a good practice.

To solve this, we have a “schema creator” application node that is responsible 
to change the schema. If this node is down, schema updates are not possible. We 
can make any node ‘creator’, but only one can be enabled at any given time.
--
Jacques-Henri Berthemet

From: Stephen Baynes 
[mailto:stephen.bay...@smoothwall.net<mailto:stephen.bay...@smoothwall.net>]
Sent: vendredi 2 octobre 2015 16:46
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Changing schema on multiple nodes while they are isolated

Is it safe to make schema changes ( e.g. create keyspace and tables ) on 
multiple separate nodes of a cluster while they are out of communication with 
other nodes in the cluster? For example create on node A while node B is down, 
create on node B while A is down, then bring both up together.

We are looking to embed Cassandra invisibly in another product and we have no 
control in what order users may start/stop the nodes up or add/remove them from 
clusters. And Cassandra must come up and be working with at least local access 
regardless. So this means always creating keyspaces and tables so they are 
always present. But this means nodes joining clusters which already have the 
same keyspace and table defined. Will it cause any issues? I have done some 
testing and saw some issues when I tried nodetool repair to bring 
things into sync. However at the time I was fighting with what I later 
discovered was CASSANDRA-9689 keyspace does not show in describe list, if 
create query times out.<https://issues.apache.org/jira/browse/CASSANDRA-9689> 
and did not know what was what. I will give it another try sometime, but would 
appreciate knowing if this is going to run into trouble before we find it.

We are basically using Cassandra to share fairly transient information. We can 
cope with data loss during environment changes and occasional losses at other 
times. But if the environment is stable then it should all just work, whatever 
the environment is. We use a very high replication factor so all nodes have a 
copy of all the data and will keep working even if they are the only one up.

Thanks

--

Stephen Baynes


Thanks
--

Stephen Baynes


RE: Using DTCS, TTL but old SSTables not being removed

2015-09-15 Thread Jacques-Henri Berthemet
Hi,

Any idea when 2.2.2 will be released?
I see there are still 3 issues left to fix:
https://issues.apache.org/jira/browse/CASSANDRA/fixforversion/1219/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel


--
Jacques-Henri Berthemet

-Original Message-
From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com] 
Sent: dimanche 13 septembre 2015 20:34
To: user@cassandra.apache.org
Subject: Re: Using DTCS, TTL but old SSTables not being removed

2.2.1 has a pretty significant bug in compaction: 
https://issues.apache.org/jira/browse/CASSANDRA-10270

That prevents it from compacting files after 60 minutes. It may or may not be 
the cause of the problem you’re seeing, but it seems like it may be possibly 
related, and you can try the workaround in that ticket to see if it helps.





On 9/13/15, 10:54 AM, "Phil Budne" <p...@ultimate.com> wrote:

>Running Cassandra 2.2.1 on 3 nodes (on EC2, from Datastax AMI, then
>upgraded).  Inserting time-series data; All entries with TTL to expire
>3 hours after the "actual_time" of the observation.  Entries arrive
>with varied delay, and often in duplicate. Data is expiring (no longer
>visible from CQL), but old SSTables are not being removed (except on
>restart).
>
>CREATE KEYSPACE thing
>WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}
>AND durable_writes = true;
>
>CREATE TABLE thing.thing_ia (
>id int,
>actual_time timestamp,
>data text,
>PRIMARY KEY (id, actual_time)
>) WITH CLUSTERING ORDER BY (actual_time ASC)
>AND bloom_filter_fp_chance = 0.01
>AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>AND comment = ''
>AND compaction = {'tombstone_threshold': '0.1', 
> 'tombstone_compaction_interval': '600', 'class': 
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
>AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>AND dclocal_read_repair_chance = 0.1
>AND default_time_to_live = 0
>AND gc_grace_seconds = 60
>AND max_index_interval = 2048
>AND memtable_flush_period_in_ms = 0
>AND min_index_interval = 128
>AND read_repair_chance = 0.0
>AND speculative_retry = '99.0PERCENTILE';
>
>All times shown in UTC:
>
>$ python -c 'import time; print int(time.time())'
>1442166347
>
>$ date
>Sun Sep 13 17:46:19 UTC 2015
>
>$ cat ~/mmm.sh
>for x in la-*Data.db; do
>ls -l $x
>~/meta.sh $x >/tmp/mmm/$x
>head < /tmp/mmm/$x
>echo 
>grep Ances /tmp/mmm/$x
>echo ''
>done
>
>$ sh ~/mmm.sh 
>-rw-r--r-- 1 cassandra cassandra 31056032 Sep 12 05:41 la-203-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-203-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442025790163000
>Maximum timestamp: 1442034620451000
>SSTable max local deletion time: 1442045239
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.946418951062831
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: [202]
>
>-rw-r--r-- 1 cassandra cassandra 23647585 Sep 12 06:09 la-204-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-204-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442034620472000
>Maximum timestamp: 1442038188419002
>SSTable max local deletion time: 1442073136
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.9163514458998852
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: []
>
>-rw-r--r-- 1 cassandra cassandra 23456946 Sep 12 07:25 la-205-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-205-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442038188472000
>Maximum timestamp: 1442042703834001
>SSTable max local deletion time: 1442053303
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.9442594560554178
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: []
>
>-rw-r--r-- 1 cassandra cassandra 23331024 Sep 12 08:11 la-206-big-Data.db
>SSTable: /raid0/cassandra/data/thing.thing_ia-.../la-206-big
>Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>Bloom Filter FP chance: 0.01
>Minimum timestamp: 1442042703845000
>Maximum timestamp: 1442045482391000
>SSTable max local deletion time: 1442056194
>Compression ratio: -1.0
>Estimated droppable tombstones: 0.922422134865437
>SSTable Level: 0
>Repaired at: 0
>
>Ancestors: []
>
>-rw-r--r-- 1 cassandra cassandra 23699494 Sep 1
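Given the numbers in the listing above, those SSTables should indeed have been droppable long before the snapshot was taken. The whole-file expiration check that DTCS performs essentially compares an SSTable's max local deletion time against "now minus gc_grace_seconds" (a simplification: the real check also requires that no newer overlapping SSTable shadows the data):

```python
def fully_expired(max_local_deletion_time: int, now: int,
                  gc_grace_seconds: int) -> bool:
    # Every cell's expiration has passed GC grace -> whole file is droppable.
    return max_local_deletion_time + gc_grace_seconds < now

# Values taken from la-203-big-Data.db in the listing above:
now = 1442166347   # from `python -c 'import time; print int(time.time())'`
gc_grace = 60      # gc_grace_seconds from the table definition
print(fully_expired(1442045239, now, gc_grace))  # True -- yet the file remains
```

The fact that the check passes but the files persist until restart is consistent with the CASSANDRA-10270 compaction bug Jeff mentions, which stops 2.2.1 from scheduling the compactions that would drop them.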

RE: TTL question

2015-08-28 Thread Jacques-Henri Berthemet
What if you use an update statement in the second query?

--
Jacques-Henri Berthemet

-Original Message-
From: Tommy Stendahl [mailto:tommy.stend...@ericsson.com] 
Sent: vendredi 28 août 2015 13:34
To: user@cassandra.apache.org
Subject: Re: TTL question

Yes, I understand that, but I think this gives a strange behaviour. 
Having values only on the primary key columns is perfectly valid, so why 
should the row be deleted by the TTL on the non-key column?

/Tommy

On 2015-08-28 13:19, Marcin Pietraszek wrote:
 Please look at the primary key which you've defined. The second mutation has
 exactly the same primary key - it overwrote the row that you previously
 had.

 On Fri, Aug 28, 2015 at 1:14 PM, Tommy Stendahl
 tommy.stend...@ericsson.com wrote:
 Hi,

 I did a small test using TTL but I didn't get the result I expected.

 I did this in cqlsh:

 cqlsh> CREATE TABLE foo.bar ( key int, cluster int, col int, PRIMARY KEY
 (key, cluster)) ;
 cqlsh> INSERT INTO foo.bar (key, cluster ) VALUES ( 1,1 );
 cqlsh> SELECT * FROM foo.bar ;

  key | cluster | col
 -----+---------+------
    1 |       1 | null

 (1 rows)
 cqlsh> INSERT INTO foo.bar (key, cluster, col ) VALUES ( 1,1,1 ) USING TTL
 10;
 cqlsh> SELECT * FROM foo.bar ;

  key | cluster | col
 -----+---------+-----
    1 |       1 |   1

 (1 rows)

 wait for TTL to expire

 cqlsh> SELECT * FROM foo.bar ;

  key | cluster | col
 -----+---------+-----

 (0 rows)


 Is this really correct?
 I expected the result from the last select to be:

  key | cluster | col
 -----+---------+------
    1 |       1 | null

 (1 rows)


 Regards,
 Tommy
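What Tommy observed, and why Jacques-Henri's UPDATE suggestion changes it, comes down to the row marker: an INSERT writes a primary-key liveness marker carrying the statement's TTL, while an UPDATE writes only the named cells. The second INSERT therefore replaced the original non-expiring marker with one that expired after 10 seconds. A rough simulation of the storage semantics (not the real engine; tombstones and write timestamps omitted):

```python
ROW_MARKER = "__row_marker__"

def write(row: dict, cells: dict, now: int, ttl=None, is_insert=True):
    expiry = now + ttl if ttl is not None else None
    if is_insert:
        row[ROW_MARKER] = (None, expiry)  # INSERT also writes a row marker
    for col, val in cells.items():
        row[col] = (val, expiry)

def select(row: dict, now: int):
    live = {c: v for c, (v, exp) in row.items() if exp is None or now < exp}
    if not live:
        return None          # no live cells at all: the row disappears
    live.pop(ROW_MARKER, None)
    return live              # row stays visible even if every column is null

# Tommy's session: the second INSERT overwrites the marker with a TTL'd one.
row = {}
write(row, {}, now=0)                  # INSERT (key, cluster) VALUES (1, 1)
write(row, {"col": 1}, now=5, ttl=10)  # INSERT ... col ... USING TTL 10
print(select(row, now=20))             # None -- the whole row is gone

# With an UPDATE instead, the original non-expiring marker survives:
row2 = {}
write(row2, {}, now=0)
write(row2, {"col": 1}, now=5, ttl=10, is_insert=False)  # UPDATE ... USING TTL 10
print(select(row2, now=20))            # {} -- row visible, col is null again
```

This matches Jacques-Henri's suggestion: using an UPDATE for the second statement leaves the row's liveness untouched, so after the TTL expires the select returns the (1, 1, null) row Tommy expected.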