Re: Cassandra listen port

2016-10-04 Thread Mehdi Bada
Rest assured, it's for testing :)))

But I don't want to use ccm. I want to configure the instances manually to get
a good understanding of the setup.

Thanks, Vladimir

--- 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 




From: "Vladimir Yudovin"  
To: "user"  
Sent: Wednesday, October 5, 2016 6:03:26 AM 
Subject: Re: Re: Cassandra listen port 

I hope it's not for production ))) 

Yes, you need three IPs (real or aliases). There is a tool, Cassandra Cluster
Manager (ccm), for launching several C* instances on the same host.
See also 
https://academy.datastax.com/getting-started-with-ccm-cassandra-cluster-manager 
and 
http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating-local-cassandra-clusters
 


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer. 
Launch your cluster in minutes. 



 On Tue, 04 Oct 2016 15:36:23 -0400 Mehdi Bada 
 wrote  



I want to run a cluster (3 instances) on a single server. Configuration of my 
VM: 
- host-only adapter: static IP 192.168... 
- bridge adapter 

I have created 3 environments (data dir, admin dir, conf file...) for the 3 
instances, but I'm now stuck on the network configuration. 
Can I use one IP address for the 3 instances and just modify the ports? 
If so, which parameters do I have to change? 

Regards 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 




From: "Vladimir Yudovin" < vla...@winguzone.com > 
To: "user" < user@cassandra.apache.org > 
Sent: Tuesday, October 4, 2016 8:53:22 PM 
Subject: Re: Re: Cassandra listen port 

>Use multiple IP addresses instead. 
>Virtual addresses can be possible also? 
>eth0:0, eth0:1 

Why multiple or virtual IP? 

You can use the same IP for both addresses, as they use different TCP ports. 
Sure, it's better to use an internal IP (like 10... or 192.168...) for internode 
connections, but it's not required. 

Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer. 
Launch your cluster in minutes. 



 On Tue, 04 Oct 2016 14:51:59 -0400 Benjamin Roth < benjamin.r...@jaumo.com 
> wrote  


Of course, just add aliases to your interfaces (like eth0:0, eth0:1, ...). 
For example CCM ( https://github.com/pcmanus/ccm ) uses 127.0.0.[1-255] to set 
up multiple CS instances on a single server. 

2016-10-04 20:49 GMT+02:00 Mehdi Bada < mehdi.b...@dbi-services.com > : 

Virtual addresses can be possible also? 

Thanks Benjamin 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com www.dbi-services.com 

⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team 


- Original Message - 
From: Benjamin Roth < benjamin.r...@jaumo.com > 
To: user@cassandra.apache.org 
Sent: Tue, 04 Oct 2016 20:36:49 +0200 (CEST) 
Subject: Re: Cassandra listen port 

As far as I can see, these ports are also used for outgoing connection, so 
a node expects all other peers also to use that port. Therefore the answer 
is no. Use multiple IP addresses instead. 

2016-10-04 20:03 GMT+02:00 Mehdi Bada < mehdi.b...@dbi-services.com >: 

> Thanks Vladimir. 
> It means if I want to run Cassandra on multi instance environment I only 
> have to change the listen address of each instance and the 9000 CQL port?? 
> 
> 
> --- 
> Mehdi Bada | Consultant 
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
> mehdi.b...@dbi-services.com www.dbi-services.com 
> 
> ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team 
> 
> 
> - Original Message - 
> From: Vladimir Yudovin < vla...@winguzone.com > 
> To: user@cassandra.apache.org 
> Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST) 
> Subject: Re: Cassandra listen port 
> 
> Actually the main port is 9042 - for client (CQL) connections and 7000 
> (7001 if SSL enabled) for inter node communications. 
> 
> Best regards, Vladimir Yudovin, 
> Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer. 
> Launch your cluster in minutes. 
> 
> 
> 
> 
> On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote 
> 
> There are several ports for several services. They are all set in 
> cassandra.yaml 
> 
> See here for complete documentation: 
> https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html 
> 
> 
> 
> 2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>: 
> Hi all, 
> 
> 
> 
> What 

Re: Re: Cassandra listen port

2016-10-04 Thread Vladimir Yudovin
I hope it's not for production )))

Yes, you need three IPs (real or aliases). There is a tool, Cassandra Cluster
Manager (ccm), for launching several C* instances on the same host.
See also 
https://academy.datastax.com/getting-started-with-ccm-cassandra-cluster-manager 
and 
http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating-local-cassandra-clusters
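
For the manual (non-ccm) route asked about in this thread, the per-instance
network settings could look roughly like the sketch below. It is only an
illustration in Java that prints the values one would put into each instance's
cassandra.yaml. It assumes loopback addresses 127.0.0.1 to 127.0.0.3 (the range
mentioned for CCM), the default ports 7000 and 9042, and the first instance as
seed. The class and method names are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra or ccm: it only prints the
// cassandra.yaml values each local instance would need.
final class LocalClusterSettingsSketch {

    // Returns the network-related settings for instance number `node` (1-based).
    static Map<String, String> settingsFor(int node) {
        if (node < 1 || node > 255) {
            throw new IllegalArgumentException("node must be in 1..255");
        }
        String address = "127.0.0." + node;             // assumed loopback address
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("listen_address", address);        // internode traffic
        settings.put("rpc_address", address);           // client (CQL) traffic
        settings.put("storage_port", "7000");           // same on every instance
        settings.put("native_transport_port", "9042");  // same on every instance
        settings.put("seeds", "127.0.0.1");             // first instance as seed (assumption)
        return settings;
    }

    public static void main(String[] args) {
        for (int node = 1; node <= 3; node++) {
            System.out.println("# instance " + node);
            settingsFor(node).forEach((key, value) -> System.out.println(key + ": " + value));
        }
    }
}
```

Every instance keeps the same ports but binds to its own address, which is why
separate IPs or aliases are needed. The JMX port (7199 by default, set in
cassandra-env.sh rather than cassandra.yaml) probably has to differ per
instance as well; treat that as an assumption to verify on your version.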


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




 On Tue, 04 Oct 2016 15:36:23 -0400 Mehdi Bada 
mehdi.b...@dbi-services.com wrote  

I want to run a cluster (3 instances) on a single server. Configuration of my 
VM:

- host-only adapter: static IP 192.168... 
- bridge adapter

I have created 3 environments (data dir, admin dir, conf file...) for the 3 
instances, but I'm now stuck on the network configuration.

Can I use one IP address for the 3 instances and just modify the ports? 
If so, which parameters do I have to change?



Regards



Mehdi Bada | Consultant
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
mehdi.b...@dbi-services.com 
www.dbi-services.com









From: "Vladimir Yudovin" vla...@winguzone.com
To: "user" user@cassandra.apache.org
Sent: Tuesday, October 4, 2016 8:53:22 PM
Subject: Re: Re: Cassandra listen port



>Use multiple IP addresses instead. 
>Virtual addresses can be possible also? 
>eth0:0, eth0:1

Why multiple or virtual IP?

You can use the same IP for both addresses, as they use different TCP ports. 
Sure, it's better to use an internal IP (like 10... or 192.168...) for internode 
connections, but it's not required.

Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




 On Tue, 04 Oct 2016 14:51:59 -0400 Benjamin Roth 
benjamin.r...@jaumo.com wrote  

Of course, just add aliases to your interfaces (like eth0:0, eth0:1, ...). For 
example CCM (https://github.com/pcmanus/ccm) uses 127.0.0.[1-255] to set up 
multiple CS instances on a single server.


2016-10-04 20:49 GMT+02:00 Mehdi Bada mehdi.b...@dbi-services.com:
Virtual addresses can be possible also?
 
 Thanks Benjamin
 
 Mehdi Bada | Consultant
 Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
 dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
 mehdi.b...@dbi-services.com www.dbi-services.com
 
 ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team
 
 
 - Original Message -
 From: Benjamin Roth benjamin.r...@jaumo.com
 To: user@cassandra.apache.org
 Sent: Tue, 04 Oct 2016 20:36:49 +0200 (CEST)
 Subject: Re: Cassandra listen port
 
 As far as I can see, these ports are also used for outgoing connection, so
 a node expects all other peers also to use that port. Therefore the answer
 is no. Use multiple IP addresses instead.

 2016-10-04 20:03 GMT+02:00 Mehdi Bada mehdi.b...@dbi-services.com:
 
  Thanks Vladimir.
  It means if I want to run Cassandra on multi instance environment I only
  have to change the listen address of each instance and the 9000 CQL port??
 
 
  ---
  Mehdi Bada | Consultant
  Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
  dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
  mehdi.b...@dbi-services.com www.dbi-services.com
 
  ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team
 
 
  - Original Message -
  From: Vladimir Yudovin vla...@winguzone.com
  To: user@cassandra.apache.org
  Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
  Subject: Re: Cassandra listen port
 
  Actually the main port is 9042 - for client (CQL) connections and 7000
  (7001 if SSL enabled) for inter node communications.
 
  Best regards, Vladimir Yudovin,
  Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
  Launch your cluster in minutes.
 
 
 
 
  On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote
 
  There are several ports for several services. They are all set in
  cassandra.yaml
 
  See here for complete documentation:
  https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
 
 
 
  2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
  Hi all,
 
 
 
  What is the listen port parameter for Apache Cassandra? Does it exist?
 
  In comparison with MongoDB, in mongo it's possible to set the listen port
  in the mongod.conf (configuration file)
 
 
 
  Regards
 
  Mehdi
 
 
 
  Mehdi Bada | Consultant
  Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
  dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
  mehdi.b...@dbi-services.com
  www.dbi-services.com
 
 
 
 
 
 
 
 
 
  ⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team
 
 
 
 
 
 
 
 
 
  --
  Benjamin Roth
  Prokurist
 
  Jaumo GmbH · www.jaumo.com
  Wehrstraße 46 · 73035 Göppingen · Germany
 

Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread Kant Kodali
Hi Siddharth,
That seems like a cool trick. But since I am only looking for updates of an
existing row, how would I tell them apart with this "insert/update (length > 0)"
logic? Would I need to create a hashmap entry for every row and keep track of
length > 0? That would blow up the memory, right?
Thanks,
kant
 





On Tue, Oct 4, 2016 1:46 PM, siddharth verma sidd.verma29.l...@gmail.com
wrote:
Hi,
consider the schema
pk1 text,
ck1 text,
v1 text,
v2 text,
PRIMARY KEY(pk1,ck1)

1. insert into ks.tablename(pk1,ck1,v1,v2) values('PK1','CK1','a','a');
2. delete from ks.tablename where pk1='PK2' and ck1='CK2';
3. insert into ks.tablename(pk1,ck1) values('PK3','CK3');
4. insert into ks.tablename(pk1,ck1,v1) values('PK4','CK4','a');

The 3rd case is an "insert of the form when ONLY primary key values are specified".
If you are sure case 3 will never occur from your application, you can check the
length of "next" (as in the code snippet):
next.length() will be greater than zero in cases 1 and 4
next.length() will be equal to zero in cases 2 and 3

Thus, in spite of 3 being an insert, in the code snippet it might appear to be a
delete.

Rephrasing
"If you are sure that your application will NOT do an insert of the form when
ONLY primary key values are specified, you can check the length of next, to
indicate whether it is an insert/update (where at least one non-primary-key
column value is inserted) or a delete if length is zero."
If you are sure case 3 will never occur, then by checking next.length() you can
decide whether it is an insert/update (length > 0) OR a delete (length == 0).

I would urge you to try the snippet once on your own, to see what kind of data it
produces in next. You could dump the output of next into a column of an audit
table to see that output.

Regards
Siddharth Verma
On Wed, Oct 5, 2016 at 1:23 AM, Kant Kodali   wrote:
Hi Siddharth,
I don't quite follow the assumption "If you are sure that your application will
NOT do an insert of the form when ONLY primary key values are specified, you can
check the length of next, to indicate whether it is an insert/update (where
at least one non-primary-key column value is inserted) or a delete if length is
zero." Could you please provide an example?
Thanks,
kant

 





On Tue, Oct 4, 2016 12:34 PM, siddharth verma sidd.verma29.l...@gmail.com
wrote:
Hi,
I am not sure whether it will help you or not.
Code snippet:
public Collection augment(Partition update)
{
    ...
    StringBuilder next = new StringBuilder();
    SearchIterator searchIterator =
        update.searchIterator(ColumnFilter.all(update.metadata()), false);
    while (searchIterator.hasNext()) {
        next.append(searchIterator.next(Clustering.EMPTY).toString() + "\001");
    }
    ...
    // next carries non primary key column values
}
If you are sure that your application will NOT do an insert of the form when
ONLY primary key values are specified, you can check the length of next, to
indicate whether it is an insert/update (where at least one non-primary-key
column value is inserted) or a delete if length is zero.
The code snippet is to the best of my knowledge; however, kindly try it once at
your end, as this was part of some legacy code, and I am not completely sure
about it.
Here, if the assumption stated above holds true, you could avoid a Cassandra
select for that key.
Thanks
Siddharth Verma

On Wed, Oct 5, 2016 at 12:20 AM, Kant Kodali   wrote:
Thanks a lot, This helps me to make a decision on not to write one for the
performance reasons you pointed out!

 





On Tue, Oct 4, 2016 11:42 AM, Eric Stevens migh...@gmail.com
wrote:
You would have to perform a SELECT on the row in the trigger code in order to
determine if there was underlying data.  Cassandra is in essence an append-only
data store, when an INSERT or UPDATE is executed, it has no idea if there is
already a row underlying it, and for write performance reasons it also doesn't
care.
Note that if you do this, you're going to introduce a giant bottleneck in your
write path and increase the IO cost of writes.  You'll also probably have some
race conditions such that if two writes to the same row happen in quick
succession your trigger might not notice that one of them is writing to the same
row as the other. You might need to resort to CAS operations to overcome that,
along with its associated overhead.  But all that said, it should be possible,
though you'll have to write it for yourself in your trigger code.


On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:
Hi all,
How to write a trigger in Cassandra to detect updates? My requirement is that I
want a trigger to alert me only when there is an update to an existing row and
looks like given the way INSERT and Update works this might be hard to do
because INSERT will just overwrite if there is an existing row and Update
becomes new insert where there is no row that belongs to certain partition key.
is there a way to solve this problem?
Thanks,

kant

Re: nodetool cfhistograms

2016-10-04 Thread Sungju Hong
Sorry, I had a typing error.

nodetool cfhistograms output looks like this:


ks1/cf1 histograms
Offset    SSTables  Write Latency  Read Latency  Row Size  Column Count
1         0         0              0             0         35565129
2         0         0              0             0         57856928
3         0         0              0             0         10785515
...
17436917  0         0              0             8         0
20924300  0         0              0             5         0
25109160  0         0              0             6         0


If "Row Size" or "Column Count" goes over the last offset (25109160), how is it
represented?

For example, what if the column count of some column family is 5,000,000?
Or 50,000,000?
Is it added to the 25109160 bucket?
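
One way to reason about this question: the offsets in the output grow by a
factor of roughly 1.2 (17436917, 20924300, 25109160). The sketch below
reproduces that progression and shows where a given value would land. It is an
illustration, not Cassandra's code. The assumptions are that there are 90
offsets, that each offset is round(previous * 1.2) (bumped by one when rounding
does not move it), and that values above the last offset are counted in one
extra overflow bucket. How nodetool displays that overflow bucket is not
covered here and may depend on the version.

```java
import java.util.Arrays;

// Illustration only; the class and method names are made up for this sketch.
final class HistogramOffsetsSketch {

    // Next bucket offset: previous * 1.2, rounded, and always at least +1.
    static long nextOffset(long last) {
        long next = Math.round(last * 1.2);
        return next == last ? next + 1 : next;
    }

    // Builds `count` offsets starting at 1.
    static long[] offsets(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be positive");
        }
        long[] result = new long[count];
        long last = 1;
        result[0] = last;
        for (int i = 1; i < count; i++) {
            last = nextOffset(last);
            result[i] = last;
        }
        return result;
    }

    // Index of the first offset >= value; offsets.length means "overflow".
    static int bucketIndex(long[] offsets, long value) {
        int index = Arrays.binarySearch(offsets, value);
        return index < 0 ? -index - 1 : index;
    }

    public static void main(String[] args) {
        long[] offsets = offsets(90); // 90 offsets is an assumption
        System.out.println("last offset: " + offsets[offsets.length - 1]);
        for (long value : new long[] {5_000_000L, 50_000_000L}) {
            int index = bucketIndex(offsets, value);
            String where = index < offsets.length
                    ? "bucket with offset " + offsets[index]
                    : "overflow bucket";
            System.out.println(value + " -> " + where);
        }
    }
}
```

Under these assumptions a column count of 5,000,000 still fits in a regular
bucket (the one with offset 5839588), while 50,000,000 exceeds 25109160 and
would be counted in the overflow bucket rather than added to the last row.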

Thanks.
Regards

On Wed, Oct 5, 2016 at 10:47 AM, Sungju Hong  wrote:

> Hello,
>
> nodetool cfhistograms output looks like this:
>
> 
> ks1/cf1 histograms
> Offset    SSTables  Write Latency  Read Latency  Row Size  Column Count
> 1         0         0              0             0         35565129
> 2         0         0              0             0         57856928
> 3         0         0              0             0         10785515
> ...
> 17436917  0         0              0             8         0
> 20924300  0         0              0             5         0
> 25109160  0         0              0             6         0
> 
>
> If "Row Size" or "Column Count" goes over the last offset (25109160), how is
> it represented?
> For example, what if the column count of some column family is 5,000,000?
> Is it added to the 25109160 bucket?
>
> Thanks.
> Regards
> Sungju
>


nodetool cfhistograms

2016-10-04 Thread Sungju Hong
Hello,

nodetool cfhistograms output looks like this:


ks1/cf1 histograms
Offset    SSTables  Write Latency  Read Latency  Row Size  Column Count
1         0         0              0             0         35565129
2         0         0              0             0         57856928
3         0         0              0             0         10785515
...
17436917  0         0              0             8         0
20924300  0         0              0             5         0
25109160  0         0              0             6         0


If "Row Size" or "Column Count" goes over the last offset (25109160), how is it
represented?
For example, what if the column count of some column family is 5,000,000?
Is it added to the 25109160 bucket?

Thanks.
Regards
Sungju


Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread siddharth verma
Hi,
consider the schema
pk1 text,
ck1 text,
v1 text,
v2 text,
PRIMARY KEY(pk1,ck1)

1. insert into ks.tablename(pk1,ck1,v1,v2) values('PK1','CK1','a','a');
2. delete from ks.tablename where pk1='PK2' and ck1='CK2';
3. insert into ks.tablename(pk1,ck1) values('PK3','CK3');
4. insert into ks.tablename(pk1,ck1,v1) values('PK4','CK4','a');

The 3rd case is an "insert of the form when ONLY primary key values are specified".

If you are sure case 3 will never occur from your application, you can
check the length of "next" (as in the code snippet):
next.length() will be greater than zero in cases 1 and 4
next.length() will be equal to zero in cases 2 and 3

Thus, in spite of 3 being an insert, in the code snippet it might appear to
be a delete.


Rephrasing
"If you are sure that your application will NOT do an insert of the form
when ONLY primary key values are specified, you can check the length of
next, to indicate whether it is an insert/update (where at least one
non-primary-key column value is inserted) or a delete if length is zero."
If you are sure case 3 will never occur,
then by checking next.length() you can decide whether it is an
insert/update (length > 0) OR a delete (length == 0).
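
The length check described above can be sketched in isolation like this. It is
a minimal illustration that runs without Cassandra: the class name, the enum
and the sample contents of next are all made up, and the real value of next
depends on what the trigger snippet produces on your version.

```java
// Illustration only: classifies a write from the length of the `next` string
// built in the trigger snippet. All names here are made up for this sketch.
final class TriggerWriteKindSketch {

    enum Kind { INSERT_OR_UPDATE, DELETE_OR_KEY_ONLY_INSERT }

    // Non-empty `next` means at least one non-primary-key value was written.
    static Kind classify(CharSequence next) {
        return next.length() > 0 ? Kind.INSERT_OR_UPDATE : Kind.DELETE_OR_KEY_ONLY_INSERT;
    }

    public static void main(String[] args) {
        // Hypothetical contents of `next` for statements 1 to 4 above.
        String[] nextPerCase = { "a\001a\001", "", "", "a\001" };
        for (int i = 0; i < nextPerCase.length; i++) {
            System.out.println("case " + (i + 1) + ": " + classify(nextPerCase[i]));
        }
    }
}
```

Note that cases 2 and 3 end up in the same category, which is exactly the
ambiguity described above, and that this check cannot tell an insert from an
update of an existing row.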

I would urge you to try the snippet once on your own, to see what kind of
data it produces in *next*. You could dump the output of next into a column
of an audit table to see that output.


Regards
Siddharth Verma

On Wed, Oct 5, 2016 at 1:23 AM, Kant Kodali  wrote:

> Hi Siddharth,
>
> I don't quite follow the assumption "If you are sure that your
> application will NOT do an insert of the form when ONLY primary key values
> are specified, you can check the length of next, to indicate whether it is
> an insert/update(where atleast one non primary key column value is
> inserted) or a delete if length is zero.". Could you please provide an
> example ?
>
> Thanks,
> kant
>
>
>
> On Tue, Oct 4, 2016 12:34 PM, siddharth verma sidd.verma29.l...@gmail.com
> wrote:
>
>> Hi,
>> I am not sure whether it will help you or not.
>> Code snippet :
>> public Collection augment(Partition update)
>> {
>> ...
>> StringBuilder next=new StringBuilder();
>> SearchIterator searchIterator =
>> update.searchIterator(ColumnFilter.all(update.metadata()),false);
>> while(searchIterator.hasNext()){
>> next.append(searchIterator.next(Clustering.EMPTY).
>> toString()+"\001");
>> }
>> ...
>> //next carries non primary key column values
>> }
>>
>> If you are sure that your application will NOT do an insert of the form
>> when ONLY primary key values are specified, you can check the length of
>> next, to indicate whether it is an insert/update(where atleast one non
>> primary key column value is inserted) or a delete if length is zero.
>>
>> The code snippet is to the best of my knowledge, however, kindly try it
>> once at your end, as this was part of some legacy code, and I am not
>> completely sure about it.
>>
>> Here, if the assumption stated above holds true, you could avoid a
>> cassandra select for that key.
>>
>> Thanks
>> Siddharth Verma
>>
>>
>> On Wed, Oct 5, 2016 at 12:20 AM, Kant Kodali  wrote:
>>
>> Thanks a lot, This helps me to make a decision on not to write one for
>> the performance reasons you pointed out!
>>
>>
>>
>> On Tue, Oct 4, 2016 11:42 AM, Eric Stevens migh...@gmail.com wrote:
>>
>> You would have to perform a SELECT on the row in the trigger code in
>> order to determine if there was underlying data.  Cassandra is in essence
>> an append-only data store, when an INSERT or UPDATE is executed, it has no
>> idea if there is already a row underlying it, and for write performance
>> reasons it also doesn't care.
>>
>> Note that if you do this, you're going to introduce a giant bottleneck in
>> your write path and increase the IO cost of writes.  You'll also probably
>> have some race conditions such that if two writes to the same row happen in
>> quick succession your trigger might not notice that one of them is writing
>> to the same row as the other. You might need to resort to CAS operations to
>> overcome that, along with its associated overhead.  But all that said, it
>> should be possible, though you'll have to write it for yourself in your
>> trigger code.
>>
>>
>>
>> On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:
>>
>> Hi all,
>>
>> How to write a trigger in Cassandra to detect updates? My requirement is
>> that I want a trigger to alert me only when there is an update to an
>> existing row and looks like given the way INSERT and Update works this
>> might be hard to do because INSERT will just overwrite if there is an
>> existing row and Update becomes new insert where there is no row that
>> belongs to certain partition key. is there a way to solve this problem?
>>
>> Thanks,
>>
>> kant
>>
>>
>>


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-10-04 Thread Mikhail Krupitskiy
Please see my comments inline.

Thanks,
Mikhail
> On 26 Sep 2016, at 17:07, DuyHai Doan  wrote:
> 
> "In the current implementation ('%' could be a wildcard only at the start/end 
> of a term) I guess it should be ENDS WITH '%escape'." 
> 
> --> Yes in the current impl, it means ENDS WITH '%escape' but we want SASI to 
> understand the %% as an escape for % so the goal is that SASI understands 
> LIKE '%%escape' as EQUALS TO '%escape'. Am I correct ?
I guess that the goal is to define a way to use '%' as a simple char.
LIKE '%escape' - ENDS WITH 'escape'
LIKE '%%escape' - EQUALS TO '%escape'
LIKE '%%escape%' - STARTS WITH '%escape'

LIKE '%%%escape' - undefined in the general case
LIKE '%%%escape' - ENDS WITH '%escape' in the case when we know that a wildcard 
can only be at the start/end.
> 
> "Moreover all terms that contains single ‘%’ somewhere in the middle should 
> cause an exception."
> 
> --> Not necessarily, sometime people may want to search text pattern 
> containing the literal %. Imagine the text "this year the average income has 
> increase by 10%". People may want to search for "10%”.
If someone wants to search for '10%' then they should escape the '%' char, 
e.g. '10%%'.
> 
> 
> 
> "BUT may be it’s better to make escaping more universal to support a future 
> possible case where a wildcard could be placed in the middle of a term too?"
> 
> --> I guess universal escaping for % is the cleaner and better solution. 
> However it may involve some complex regular expression. I'm not sure that 
> input.replaceAll("%%", "%") trick would work for any cases.
As I wrote, I don't think that it's possible to do universal escaping using '%' 
both as an escape char (a char that turns a wildcard char into a semantically 
simple char) and as a wildcard at the same time.
I suggest using '\' as the escape char.
Also, I don't know enough about Cassandra's internals to estimate how universal 
escaping would affect performance.
It really looks like a better solution for Cassandra users.
> 
> And we also need to define when we want to detect operation type 
> (LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, EQUAL) ? 
> 
> Should we detect operation type BEFORE escaping or AFTER escaping ?
As I understand it, escaping will be done by users. 
So at the DB level we get an already escaped string from a request, and it's 
possible to know which symbol is a wildcard and which is just a char.
I guess that Cassandra should parse (unescape?) an incoming string to determine 
the positions of wildcards and simple chars, and then determine the operation 
type.
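
The parse-then-classify idea above could be sketched as follows. This is not
how Cassandra 3.7 or SASI behaves; it only illustrates the proposal from this
thread, with '\' as the escape char and '%' as a wildcard that is allowed only
at the start or end of a term. The class name and the Parsed type are made up;
the operation names are the ones mentioned earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the proposal only; not Cassandra's or SASI's implementation.
final class LikePatternSketch {

    enum Op { EQUAL, LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS }

    static final class Parsed {
        final Op op;
        final String term;
        Parsed(Op op, String term) { this.op = op; this.term = term; }
    }

    static Parsed parse(String pattern) {
        // Tokenize: null marks an unescaped '%', anything else is a literal char.
        List<Character> tokens = new ArrayList<>();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                if (i + 1 >= pattern.length()) {
                    throw new IllegalArgumentException("dangling escape char");
                }
                tokens.add(pattern.charAt(++i)); // escaped char is taken literally
            } else if (c == '%') {
                tokens.add(null);
            } else {
                tokens.add(c);
            }
        }
        // Strip runs of wildcards at both ends.
        int start = 0;
        int end = tokens.size();
        while (start < end && tokens.get(start) == null) start++;
        while (end > start && tokens.get(end - 1) == null) end--;
        boolean leading = start > 0;
        boolean trailing = end < tokens.size();

        StringBuilder term = new StringBuilder();
        for (int i = start; i < end; i++) {
            Character token = tokens.get(i);
            if (token == null) {
                throw new IllegalArgumentException("wildcard in the middle of a term");
            }
            term.append(token.charValue());
        }
        Op op = leading && trailing ? Op.LIKE_CONTAINS
                : leading ? Op.LIKE_SUFFIX
                : trailing ? Op.LIKE_PREFIX
                : Op.EQUAL;
        return new Parsed(op, term.toString());
    }

    public static void main(String[] args) {
        for (String pattern : new String[] {"\\%", "\\%foo%", "%%%escape", "\\%%", "10\\%"}) {
            Parsed parsed = parse(pattern);
            System.out.println(pattern + " -> " + parsed.op + " '" + parsed.term + "'");
        }
    }
}
```

In this sketch the operation type is decided after unescaping, so an escaped
'%' at the start or end of a term never counts as a wildcard, and an unescaped
'%' in the middle is rejected with an exception.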

 
> 
> 
> 
> 
> 
> On Mon, Sep 26, 2016 at 3:54 PM, Mikhail Krupitskiy wrote:
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> In the current implementation ('%' can be a wildcard only at the start/end 
> of a term) I guess it should be ENDS WITH '%escape'.
> Moreover, all terms that contain a single '%' somewhere in the middle should 
> cause an exception.
> BUT maybe it's better to make escaping more universal, to support a possible 
> future case where a wildcard could be placed in the middle of a term too?
> 
> Thanks,
> Mikhail 
>> On 24 Sep 2016, at 21:09, DuyHai Doan wrote:
>> 
>> Reminder, right now, the % character is only interpreted as wildcard IF AND 
>> ONLY IF it is the first/last character of the searched term
>> 
>> 
>> LIKE '%escape' --> ENDS WITH 'escape' 
>> 
>> If we use % to escape %,
>> LIKE '%%escape' -->  EQUALS TO '%escape'
>> 
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>> 
>> 
>> 
>> 
>> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy wrote:
>> Hi, Jim,
>> 
>> What pattern should be used to search for "ends with '%escape'" under your 
>> proposal?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 22 Sep 2016, at 18:51, Jim Ancona wrote:
>>> 
>>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>>> So the first two %'s are translated to a literal, non-wildcard % and the 
>>> third % is a wildcard because it's not doubled.
>>> 
>>> Jim
>>> 
>>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy wrote:
>>> I guess that it should be similar to how it is done in SQL for LIKE 
>>> patterns.
>>> 
>>> You can introduce an escape character, e.g. '\'.
>>> Examples:
>>> '%' - any string
>>> '\%' - equal to the '%' character
>>> '\%foo%' - starts with '%foo'
>>> '%%%escape' - ends with 'escape'
>>> '\%%' - starts with '%'
>>> '\\\%%' - starts with '\%'
>>> 
>>> What do you think?
>>> 
>>> Thanks,
>>> Mikhail
 On 22 Sep 2016, at 16:47, DuyHai Doan 

Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread Kant Kodali
Hi Siddharth,
I don't quite follow the assumption "If you are sure that your application will
NOT do an insert of the form when ONLY primary key values are specified, you can
check the length of next, to indicate whether it is an insert/update (where
at least one non-primary-key column value is inserted) or a delete if length is
zero." Could you please provide an example?
Thanks,
kant
 





On Tue, Oct 4, 2016 12:34 PM, siddharth verma sidd.verma29.l...@gmail.com
wrote:
Hi,
I am not sure whether it will help you or not.
Code snippet:
public Collection augment(Partition update)
{
    ...
    StringBuilder next = new StringBuilder();
    SearchIterator searchIterator =
        update.searchIterator(ColumnFilter.all(update.metadata()), false);
    while (searchIterator.hasNext()) {
        next.append(searchIterator.next(Clustering.EMPTY).toString() + "\001");
    }
    ...
    // next carries non primary key column values
}
If you are sure that your application will NOT do an insert of the form when
ONLY primary key values are specified, you can check the length of next, to
indicate whether it is an insert/update (where at least one non-primary-key
column value is inserted) or a delete if length is zero.
The code snippet is to the best of my knowledge; however, kindly try it once at
your end, as this was part of some legacy code, and I am not completely sure
about it.
Here, if the assumption stated above holds true, you could avoid a Cassandra
select for that key.
Thanks
Siddharth Verma

On Wed, Oct 5, 2016 at 12:20 AM, Kant Kodali   wrote:
Thanks a lot, This helps me to make a decision on not to write one for the
performance reasons you pointed out!

 





On Tue, Oct 4, 2016 11:42 AM, Eric Stevens migh...@gmail.com
wrote:
You would have to perform a SELECT on the row in the trigger code in order to
determine if there was underlying data.  Cassandra is in essence an append-only
data store, when an INSERT or UPDATE is executed, it has no idea if there is
already a row underlying it, and for write performance reasons it also doesn't
care.
Note that if you do this, you're going to introduce a giant bottleneck in your
write path and increase the IO cost of writes.  You'll also probably have some
race conditions such that if two writes to the same row happen in quick
succession your trigger might not notice that one of them is writing to the same
row as the other. You might need to resort to CAS operations to overcome that,
along with its associated overhead.  But all that said, it should be possible,
though you'll have to write it for yourself in your trigger code.


On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:
Hi all,
How to write a trigger in Cassandra to detect updates? My requirement is that I
want a trigger to alert me only when there is an update to an existing row and
looks like given the way INSERT and Update works this might be hard to do
because INSERT will just overwrite if there is an existing row and Update
becomes new insert where there is no row that belongs to certain partition key.
is there a way to solve this problem?
Thanks,

kant

Re: Cassandra listen port

2016-10-04 Thread Mehdi Bada
I want to run a cluster (3 instances) on a single server. Configuration of my 
VM: 
- host-only adapter: static IP 192.168... 
- bridge adapter 

I have created 3 environments (data dir, admin dir, conf file...) for the 3 
instances, but I'm now stuck on the network configuration. 
Can I use one IP address for the 3 instances and just modify the ports? 
If so, which parameters do I have to change? 

Regards 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 




From: "Vladimir Yudovin"  
To: "user"  
Sent: Tuesday, October 4, 2016 8:53:22 PM 
Subject: Re: Re: Cassandra listen port 

>Use multiple IP addresses instead. 
>Virtual addresses can be possible also? 
>eth0:0, eth0:1 

Why multiple or virtual IP? 

You can use the same IP for both addresses, as they use different TCP ports. 
Sure, it's better to use an internal IP (like 10... or 192.168...) for internode 
connections, but it's not required. 

Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer. 
Launch your cluster in minutes. 



 On Tue, 04 Oct 2016 14:51:59 -0400 Benjamin Roth  
wrote  



Of course, just add aliases to your interfaces (like eth0:0, eth0:1, ...). 
For example CCM ( https://github.com/pcmanus/ccm ) uses 127.0.0.[1-255] to set 
up multiple CS instances on a single server. 

2016-10-04 20:49 GMT+02:00 Mehdi Bada < mehdi.b...@dbi-services.com > : 

Virtual addresses can be possible also? 

Thanks Benjamin 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com www.dbi-services.com 

⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team 


- Original Message - 
From: Benjamin Roth < benjamin.r...@jaumo.com > 
To: user@cassandra.apache.org 
Sent: Tue, 04 Oct 2016 20:36:49 +0200 (CEST) 
Subject: Re: Cassandra listen port 

As far as I can see, these ports are also used for outgoing connection, so 
a node expects all other peers also to use that port. Therefore the answer 
is no. Use multiple IP addresses instead. 

2016-10-04 20:03 GMT+02:00 Mehdi Bada < mehdi.b...@dbi-services.com >: 

> Thanks Vladimir. 
> It means if I want to run Cassandra on multi instance environment I only 
> have to change the listen address of each instance and the 9000 CQL port?? 
> 
> 
> --- 
> Mehdi Bada | Consultant 
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
> mehdi.b...@dbi-services.com www.dbi-services.com 
> 
> 
> - Original Message - 
> From: Vladimir Yudovin < vla...@winguzone.com > 
> To: user@cassandra.apache.org 
> Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST) 
> Subject: Re: Cassandra listen port 
> 
> Actually the main port is 9042 - for client (CQL) connections and 7000 
> (7001 if SSL enabled) for inter node communications. 
> 
> Best regards, Vladimir Yudovin, 
> Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer. 
> Launch your cluster in minutes. 
> 
> 
> 
> 
> On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
> 
> There are several ports for several services. They are all set in 
> cassandra.yaml 
> 
> See here for complete documentation: 
> https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
> 
> 
> 
> 2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
> Hi all, 
> 
> 
> 
> What is the listen port parameter for Apache Cassandra? Does it exist? 
> 
> In comparison with MongoDB, in mongo it's possible to set the listen port 
> in the mongod.conf (configuration file) 
> 
> 
> 
> Regards 
> 
> Mehdi 
> 
> 
> 
> Mehdi Bada | Consultant 
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
> mehdi.b...@dbi-services.com 
> www.dbi-services.com 
> 
> -- 
> Benjamin Roth 
> Prokurist 
> 
> Jaumo GmbH · www.jaumo.com 
> Wehrstraße 46 · 73035 Göppingen · Germany 
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1 
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer 
> 


-- 
Benjamin Roth 
Prokurist 

Jaumo GmbH · www.jaumo.com 
Wehrstraße 46 · 73035 Göppingen · Germany 
Phone +49 7161 304880-6 · Fax +49 7161 304880-1 
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer 








Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread siddharth verma
Hi,
I am not sure whether it will help you or not.
Code snippet :
public Collection<Mutation> augment(Partition update)
{
    ...
    StringBuilder next = new StringBuilder();
    SearchIterator<Clustering, Row> searchIterator =
        update.searchIterator(ColumnFilter.all(update.metadata()), false);
    while (searchIterator.hasNext())
    {
        next.append(searchIterator.next(Clustering.EMPTY).toString() + "\001");
    }
    ...
    // next carries non primary key column values
}

If you are sure that your application will NOT do an insert in which ONLY
primary key values are specified, you can check the length of next to tell
whether the write is an insert/update (at least one non primary key column
value is written) or a delete (the length is zero).

The code snippet is to the best of my knowledge, however, kindly try it
once at your end, as this was part of some legacy code, and I am not
completely sure about it.

Here, if the assumption stated above holds true, you could avoid a
cassandra select for that key.

Thanks
Siddharth Verma


On Wed, Oct 5, 2016 at 12:20 AM, Kant Kodali  wrote:

> Thanks a lot, This helps me to make a decision on not to write one for the
> performance reasons you pointed out!
>
>
>
> On Tue, Oct 4, 2016 11:42 AM, Eric Stevens migh...@gmail.com wrote:
>
>> You would have to perform a SELECT on the row in the trigger code in
>> order to determine if there was underlying data.  Cassandra is in essence
>> an append-only data store, when an INSERT or UPDATE is executed, it has no
>> idea if there is already a row underlying it, and for write performance
>> reasons it also doesn't care.
>>
>> Note that if you do this, you're going to introduce a giant bottleneck in
>> your write path and increase the IO cost of writes.  You'll also probably
>> have some race conditions such that if two writes to the same row happen in
>> quick succession your trigger might not notice that one of them is writing
>> to the same row as the other. You might need to resort to CAS operations to
>> overcome that, along with its associated overhead.  But all that said, it
>> should be possible, though you'll have to write it for yourself in your
>> trigger code.
>>
>>
>>
>> On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:
>>
>> Hi all,
>>
>> How to write a trigger in Cassandra to detect updates? My requirement is
>> that I want a trigger to alert me only when there is an update to an
>> existing row and looks like given the way INSERT and Update works this
>> might be hard to do because INSERT will just overwrite if there is an
>> existing row and Update becomes new insert where there is no row that
>> belongs to certain partition key. is there a way to solve this problem?
>>
>> Thanks,
>>
>> kant
>>
>>


Re: Re: Cassandra listen port

2016-10-04 Thread Vladimir Yudovin
> Use multiple IP addresses instead. 
> Virtual addresses can be possible also? 
> eth0:0, eth0:1

Why multiple or virtual IP?

You can use the same IP for both addresses, as they use different TCP ports. 
Sure, it's better to use an internal IP (like 10... or 192.168...) for internode 
connections, but it's not required.

Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




 On Tue, 04 Oct 2016 14:51:59 -0400 Benjamin Roth 
benjamin.r...@jaumo.com wrote  

Of course, just add aliases to your interfaces (like eth0:0, eth0:1, ...). For 
example CCM (https://github.com/pcmanus/ccm) uses 127.0.0.[1-255] to set up 
multiple CS instances on a single server.


2016-10-04 20:49 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
Virtual addresses can be possible also?
 
 Thanks Benjamin
 
 Mehdi Bada | Consultant
 Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 dbi 
services, Rue de la Jeunesse 2, CH-2800 Delémont
 mehdi.b...@dbi-services.com www.dbi-services.com
 
 
 
 - Original Message -
 From: Benjamin Roth benjamin.r...@jaumo.com
 To: user@cassandra.apache.org
 Sent: Tue, 04 Oct 2016 20:36:49 +0200 (CEST)
 Subject: Re: Cassandra listen port
 
 As far as I can see, these ports are also used for outgoing connection, so
 a node expects all other peers also to use that port. Therefore the answer
 is no. Use multiple IP addresses instead.
 
 2016-10-04 20:03 GMT+02:00 Mehdi Bada mehdi.b...@dbi-services.com:
 
  Thanks Vladimir.
  It means if I want to run Cassandra on multi instance environment I only
  have to change the listen address of each instance and the 9000 CQL port??
 
 
  ---
  Mehdi Bada | Consultant
  Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
  dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
  mehdi.b...@dbi-services.com www.dbi-services.com
 
 
 
  - Original Message -
  From: Vladimir Yudovin vla...@winguzone.com
  To: user@cassandra.apache.org
  Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
  Subject: Re: Cassandra listen port
 
  Actually the main port is 9042 - for client (CQL) connections and 7000
  (7001 if SSL enabled) for inter node communications.
 
  Best regards, Vladimir Yudovin,
  Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
  Launch your cluster in minutes.
 
 
 
 
  On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
 
  There are several ports for several services. They are all set in
  cassandra.yaml
 
  See here for complete documentation:
  https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
 
 
 
  2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
  Hi all,
 
 
 
  What is the listen port parameter for Apache Cassandra? Does it exist?
 
  In comparison with MongoDB, in mongo it's possible to set the listen port
  in the mongod.conf (configuration file)
 
 
 
  Regards
 
  Mehdi
 
 
 
  Mehdi Bada | Consultant
  Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
  dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
  mehdi.b...@dbi-services.com
  www.dbi-services.com
 
  --
  Benjamin Roth
  Prokurist
 
  Jaumo GmbH · www.jaumo.com
  Wehrstraße 46 · 73035 Göppingen · Germany
  Phone +49 7161 304880-6 · Fax +49 7161 304880-1
  AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
 
 
 
 
 
 
 
 
 
 
 
 --
 Benjamin Roth
 Prokurist
 
 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 · Fax +49 7161 304880-1
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
 
 






-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

 
 






Re: Cassandra listen port

2016-10-04 Thread Benjamin Roth
Of course, just add aliases to your interfaces (like eth0:0, eth0:1, ...).
For example CCM (https://github.com/pcmanus/ccm) uses 127.0.0.[1-255] to
set up multiple CS instances on a single server.
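
To illustrate the addressing scheme described above (one loopback address per node, in the 127.0.0.[1-255] range that CCM uses), here is a minimal sketch. The function name `loopback_addresses` is made up for this example and is not part of CCM or Cassandra.

```python
import ipaddress


def loopback_addresses(node_count, first="127.0.0.1"):
    """Return one loopback address per node, counting up from `first`."""
    start = ipaddress.IPv4Address(first)
    addresses = [start + i for i in range(node_count)]
    # Stay inside 127.0.0.[1-255], the range quoted for CCM above.
    last_allowed = ipaddress.IPv4Address("127.0.0.255")
    if node_count < 1 or addresses[-1] > last_allowed:
        raise ValueError("node_count must fit in 127.0.0.[1-255]")
    return [str(address) for address in addresses]


if __name__ == "__main__":
    for node, address in enumerate(loopback_addresses(3), start=1):
        print(f"node{node}: {address}")
```

Each address would then go into the listen_address of one instance. On Linux the whole 127.0.0.0/8 block is normally reachable on the loopback interface already; on other systems the extra addresses may first have to be added as aliases.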

2016-10-04 20:49 GMT+02:00 Mehdi Bada :

> Virtual addresses can be possible also?
>
> Thanks Benjamin
>
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com www.dbi-services.com
>
>
> - Original Message -
> From: Benjamin Roth 
> To: user@cassandra.apache.org
> Sent: Tue, 04 Oct 2016 20:36:49 +0200 (CEST)
> Subject: Re: Cassandra listen port
>
> As far as I can see, these ports are also used for outgoing connection, so
> a node expects all other peers also to use that port. Therefore the answer
> is no. Use multiple IP addresses instead.
>
> 2016-10-04 20:03 GMT+02:00 Mehdi Bada :
>
> > Thanks Vladimir.
> > It means if I want to run Cassandra on multi instance environment I only
> > have to change the listen address of each instance and the 9000 CQL port??
> >
> >
> > ---
> > Mehdi Bada | Consultant
> > Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> > dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> > mehdi.b...@dbi-services.com www.dbi-services.com
> >
> >
> >
> > - Original Message -
> > From: Vladimir Yudovin 
> > To: user@cassandra.apache.org
> > Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
> > Subject: Re: Cassandra listen port
> >
> > Actually the main port is 9042 - for client (CQL) connections and 7000
> > (7001 if SSL enabled) for inter node communications.
> >
> > Best regards, Vladimir Yudovin,
> > Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
> > Launch your cluster in minutes.
> >
> >
> >
> >
> > On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
> >
> > There are several ports for several services. They are all set in
> > cassandra.yaml
> >
> > See here for complete documentation:
> > https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
> >
> >
> >
> > 2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
> > Hi all,
> >
> >
> >
> > What is the listen port parameter for Apache Cassandra? Does it exist?
> >
> > In comparison with MongoDB, in mongo it's possible to set the listen port
> > in the mongod.conf (configuration file)
> >
> >
> >
> > Regards
> >
> > Mehdi
> >
> >
> >
> > Mehdi Bada | Consultant
> > Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> > dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> > mehdi.b...@dbi-services.com
> > www.dbi-services.com
> >
> > --
> > Benjamin Roth
> > Prokurist
> >
> > Jaumo GmbH · www.jaumo.com
> > Wehrstraße 46 · 73035 Göppingen · Germany
> > Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> > AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
> >
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread Kant Kodali
Thanks a lot. This helps me decide not to write one, for the
performance reasons you pointed out!
 





On Tue, Oct 4, 2016 11:42 AM, Eric Stevens migh...@gmail.com
wrote:
You would have to perform a SELECT on the row in the trigger code in order to
determine if there was underlying data.  Cassandra is in essence an append-only
data store, when an INSERT or UPDATE is executed, it has no idea if there is
already a row underlying it, and for write performance reasons it also doesn't
care.
Note that if you do this, you're going to introduce a giant bottleneck in your
write path and increase the IO cost of writes.  You'll also probably have some
race conditions such that if two writes to the same row happen in quick
succession your trigger might not notice that one of them is writing to the same
row as the other. You might need to resort to CAS operations to overcome that,
along with its associated overhead.  But all that said, it should be possible,
though you'll have to write it for yourself in your trigger code.


On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:
Hi all,
How to write a trigger in Cassandra to detect updates? My requirement is that I
want a trigger to alert me only when there is an update to an existing row and
looks like given the way INSERT and Update works this might be hard to do
because INSERT will just overwrite if there is an existing row and Update
becomes new insert where there is no row that belongs to certain partition key.
is there a way to solve this problem?
Thanks,

kant

Re: Cassandra listen port

2016-10-04 Thread Mehdi Bada
Are virtual addresses also possible? 

Thanks Benjamin

Mehdi Bada | Consultant
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 dbi 
services, Rue de la Jeunesse 2, CH-2800 Delémont
mehdi.b...@dbi-services.com www.dbi-services.com



- Original Message -
From: Benjamin Roth 
To: user@cassandra.apache.org
Sent: Tue, 04 Oct 2016 20:36:49 +0200 (CEST)
Subject: Re: Cassandra listen port

As far as I can see, these ports are also used for outgoing connection, so
a node expects all other peers also to use that port. Therefore the answer
is no. Use multiple IP addresses instead.

2016-10-04 20:03 GMT+02:00 Mehdi Bada :

> Thanks Vladimir.
> It means if I want to run Cassandra on multi instance environment I only
> have to change the listen address of each instance and the 9000 CQL port??
>
>
> ---
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com www.dbi-services.com
>
>
> - Original Message -
> From: Vladimir Yudovin 
> To: user@cassandra.apache.org
> Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
> Subject: Re: Cassandra listen port
>
> Actually the main port is 9042 - for client (CQL) connections and 7000
> (7001 if SSL enabled) for inter node communications.
>
> Best regards, Vladimir Yudovin,
> Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
> Launch your cluster in minutes.
>
>
>
>
> On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
> There are several ports for several services. They are all set in
> cassandra.yaml
>
> See here for complete documentation:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
>
>
>
> 2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
> Hi all,
>
>
>
> What is the listen port parameter for Apache Cassandra? Does it exist?
>
> In comparison with MongoDB, in mongo it's possible to set the listen port
> in the mongod.conf (configuration file)
>
>
>
> Regards
>
> Mehdi
>
>
>
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer



Re: Re: Cassandra listen port

2016-10-04 Thread Vladimir Yudovin
> 9000 CQL port??
Do you mean 9042?

There are two address/port pairs: one is the listen address (internode 
communication) and the second is for CQL (rpc in YAML terms).
Look at the YAML explanation of listen_address and rpc_address.

The actual configuration depends on how many network cards each node has, whether 
NAT is involved, etc. Can you give more detail about your network environment?
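
As a rough sketch of how these settings fit together, the snippet below builds the address-related cassandra.yaml entries for one node, using the default ports named earlier in this thread (7000, 7001 with SSL, and 9042). The helper names `address_settings` and `render_yaml` and the address 192.168.56.11 are assumptions made up for illustration; the setting names themselves are the ones found in cassandra.yaml.

```python
# Default ports named earlier in this thread: 7000 for internode traffic
# (7001 with SSL) and 9042 for CQL clients.
DEFAULT_PORTS = {
    "storage_port": 7000,
    "ssl_storage_port": 7001,
    "native_transport_port": 9042,
}


def address_settings(listen_address, rpc_address=None):
    """Build the address-related cassandra.yaml entries for one node.

    If rpc_address is omitted, the listen address is reused, which works
    because the two services listen on different TCP ports.
    """
    settings = {
        "listen_address": listen_address,
        "rpc_address": rpc_address or listen_address,
    }
    settings.update(DEFAULT_PORTS)
    return settings


def render_yaml(settings):
    """Render flat key/value settings as simple YAML lines."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


if __name__ == "__main__":
    # 192.168.56.11 is a hypothetical host-only address.
    print(render_yaml(address_settings("192.168.56.11")))
```

The output is only the handful of lines that differ per node; the rest of cassandra.yaml is left alone.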


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




On Tue, 04 Oct 2016 14:03:42 -0400 Mehdi Bada <mehdi.b...@dbi-services.com> wrote:

Thanks Vladimir. 
It means if I want to run Cassandra on multi instance environment I only have 
to change the listen address of each instance and the 9000 CQL port?? 
 
 
--- 
Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 dbi 
services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com www.dbi-services.com 
 
 
 
- Original Message - 
From: Vladimir Yudovin vla...@winguzone.com 
To: user@cassandra.apache.org 
Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST) 
Subject: Re: Cassandra listen port 
 
Actually the main port is 9042 - for client (CQL) connections and 7000 (7001 if 
SSL enabled) for inter node communications. 
 
Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer. 
Launch your cluster in minutes. 
 
 
 
 
On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
 
There are several ports for several services. They are all set in 
cassandra.yaml 
 
See here for complete documentation: 
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
 
 
 
 
2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
Hi all, 
 
 
 
What is the listen port parameter for Apache Cassandra? Does it exist? 
 
In comparison with MongoDB, in mongo it's possible to set the listen port in 
the mongod.conf (configuration file) 
 
 
 
Regards 
 
Mehdi 
 
 
 
Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 
 
-- 
Benjamin Roth 
Prokurist 
 
Jaumo GmbH · www.jaumo.com 
Wehrstraße 46 · 73035 Göppingen · Germany 
Phone +49 7161 304880-6 · Fax +49 7161 304880-1 
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer 
 
 
 
 
 
 
 
 







Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread Eric Stevens
You would have to perform a SELECT on the row in the trigger code in order
to determine if there was underlying data.  Cassandra is in essence an
append-only data store: when an INSERT or UPDATE is executed, it has no
idea if there is already a row underlying it, and for write performance
reasons it also doesn't care.

Note that if you do this, you're going to introduce a giant bottleneck in
your write path and increase the IO cost of writes.  You'll also probably
have some race conditions such that if two writes to the same row happen in
quick succession your trigger might not notice that one of them is writing
to the same row as the other. You might need to resort to CAS operations to
overcome that, along with its associated overhead.  But all that said, it
should be possible, though you'll have to write it for yourself in your
trigger code.
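
The read-before-write idea and its race can be seen in a toy in-memory model. This is only a sketch: `ToyTable` and the helper functions are invented for illustration and are not Cassandra APIs; in CQL the compare-and-set step would roughly correspond to a conditional write such as INSERT ... IF NOT EXISTS.

```python
import threading


class ToyTable:
    """In-memory stand-in for a table; not a Cassandra API."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        # Blind upsert: like INSERT/UPDATE, it does not look at existing data.
        self._rows[key] = value

    def exists(self, key):
        return key in self._rows

    def write_if_absent(self, key, value):
        # Compare-and-set: the existence check and the write are one atomic step.
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True


def classify_with_read(table, key, value):
    """Read first, then write; reports 'update' if a row was already there."""
    existed = table.exists(key)
    table.write(key, value)
    return "update" if existed else "insert"


def interleaved_writers(table, key):
    """Simulate two writers whose reads both happen before either write."""
    saw_a = table.exists(key)
    saw_b = table.exists(key)
    table.write(key, "a")
    table.write(key, "b")
    return saw_a, saw_b


if __name__ == "__main__":
    table = ToyTable()
    print(classify_with_read(table, "k", 1))
    print(classify_with_read(table, "k", 2))
    print(interleaved_writers(ToyTable(), "k"))
```

In the interleaved case neither writer sees the other's row, so both would be reported as plain inserts. The atomic `write_if_absent` variant avoids that, at the cost of serializing the check and the write.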



On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:

> Hi all,
>
> How to write a trigger in Cassandra to detect updates? My requirement is
> that I want a trigger to alert me only when there is an update to an
> existing row and looks like given the way INSERT and Update works this
> might be hard to do because INSERT will just overwrite if there is an
> existing row and Update becomes new insert where there is no row that
> belongs to certain partition key. is there a way to solve this problem?
>
> Thanks,
>
> kant
>


Re: Cassandra listen port

2016-10-04 Thread Benjamin Roth
As far as I can see, these ports are also used for outgoing connections, so
a node expects all other peers also to use that port. Therefore the answer
is no. Use multiple IP addresses instead.
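
The rule above can be written down as a small sanity check for a planned multi-instance setup. This is a sketch under the assumption stated in this message (all peers share one internode port, so each instance needs its own IP); `check_plan` and the dictionary layout are made up for this example, and Cassandra itself performs no such check.

```python
from collections import Counter


def check_plan(instances):
    """Return a list of problems found in a multi-instance address plan.

    Each instance is a dict with 'name', 'listen_address' and 'storage_port'.
    """
    problems = []

    # Peers are contacted on the same internode port, so it must be uniform.
    ports = {inst["storage_port"] for inst in instances}
    if len(ports) > 1:
        problems.append(f"storage_port differs between instances: {sorted(ports)}")

    # With one shared port, every instance needs its own IP address to bind.
    counts = Counter(inst["listen_address"] for inst in instances)
    for address, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"{count} instances share listen_address {address}")

    return problems


if __name__ == "__main__":
    plan = [
        {"name": "node1", "listen_address": "127.0.0.1", "storage_port": 7000},
        {"name": "node2", "listen_address": "127.0.0.2", "storage_port": 7000},
        {"name": "node3", "listen_address": "127.0.0.3", "storage_port": 7000},
    ]
    print(check_plan(plan) or "plan looks consistent")
```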

2016-10-04 20:03 GMT+02:00 Mehdi Bada :

> Thanks Vladimir.
> It means if I want to run Cassandra on multi instance environment I only
> have to change the listen address of each instance and the 9000 CQL port??
>
>
> ---
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com www.dbi-services.com
>
>
> - Original Message -
> From: Vladimir Yudovin 
> To: user@cassandra.apache.org
> Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
> Subject: Re: Cassandra listen port
>
> Actually the main port is 9042 - for client (CQL) connections and 7000
> (7001 if SSL enabled) for inter node communications.
>
> Best regards, Vladimir Yudovin,
> Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
> Launch your cluster in minutes.
>
>
>
>
> On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>
> There are several ports for several services. They are all set in
> cassandra.yaml
>
> See here for complete documentation:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
>
>
>
> 2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
> Hi all,
>
>
>
> What is the listen port parameter for Apache Cassandra? Does it exist?
>
> In comparison with MongoDB, in mongo it's possible to set the listen port
> in the mongod.conf (configuration file)
>
>
>
> Regards
>
> Mehdi
>
>
>
> Mehdi Bada | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread Kant Kodali

Hi all,
How can I write a trigger in Cassandra to detect updates? My requirement is that I
want a trigger to alert me only when there is an update to an existing row. Given
the way INSERT and UPDATE work, this looks hard to do, because an INSERT will
just overwrite an existing row, and an UPDATE becomes a new insert when there is
no row that belongs to the given partition key.
Is there a way to solve this problem?
Thanks,

kant

Re: Cassandra listen port

2016-10-04 Thread Mehdi Bada
Thanks Vladimir. 
Does it mean that if I want to run Cassandra in a multi-instance environment, I only 
have to change the listen address of each instance and the 9000 CQL port??


---
Mehdi Bada | Consultant
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 dbi 
services, Rue de la Jeunesse 2, CH-2800 Delémont
mehdi.b...@dbi-services.com www.dbi-services.com



- Original Message -
From: Vladimir Yudovin 
To: user@cassandra.apache.org
Sent: Tue, 04 Oct 2016 18:18:19 +0200 (CEST)
Subject: Re: Cassandra listen port

Actually the main port is 9042 - for client (CQL) connections and 7000 (7001 if 
SSL enabled) for inter node communications.

Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth <benjamin.r...@jaumo.com> wrote:

There are several ports for several services. They are all set in cassandra.yaml

See here for complete documentation:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html



2016-10-04 16:54 GMT+02:00 Mehdi Bada <mehdi.b...@dbi-services.com>:
Hi all, 



What is the listen port parameter for Apache Cassandra? Does it exist?

In comparison with MongoDB, in mongo it's possible to set the listen port in 
the mongod.conf (configuration file)



Regards 

Mehdi



Mehdi Bada | Consultant
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
mehdi.b...@dbi-services.com 
www.dbi-services.com

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

 
 







Re: Tombstoned error and then OOM

2016-10-04 Thread INDRANIL BASU
The query has a where clause on a column which has a secondary index in the 
column family. E.g. 
select * from test_schema.test_cf where status = 0; 
Here status is an integer column which is indexed. 
 -- IB

  From: kurt Greaves 
 To: user@cassandra.apache.org; INDRANIL BASU  
 Sent: Tuesday, 4 October 2016 10:38 PM
 Subject: Re: Tombstoned error and then OOM
   
This sounds like you're running a query that consumes a lot of memory. Are you 
by chance querying a very large partition or not bounding your query?

I'd also recommend upgrading to 2.1.15; 2.1.0 is very old and has quite a few 
bugs.

On 3 October 2016 at 17:08, INDRANIL BASU  wrote:

Hello All,



I am getting the below error repeatedly in the system log of C* 2.1.0

WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835 SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
After that NullPointer Exception and finally OOM
ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:6287,1,main]
java.lang.NullPointerException: null
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:475) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:463) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061) ~[apache-cassandra-2.1.0.jar:2.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:9712,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:10070,1,main]
java.lang.NullPointerException: null
ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265 CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:10413,1,main]
java.lang.NullPointerException: null
ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425 CassandraDaemon.java:166 - Exception in thread Thread[MemtableFlushWriter:2396,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
    at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[na:1.7.0_80]
    at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
-- IB




   



-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com

   

Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
I started off with 3.0.6 and for my personal use case(s) they had the same
bugs as tick tock.

2016-10-04 19:03 GMT+02:00 Jonathan Haddad :

> I strongly recommend avoiding tick tock. You'll be one of the only people
> putting it in prod and will likely hit a number of weird issues nobody will
> be able to help you with.
> On Tue, Oct 4, 2016 at 12:40 PM Benjamin Roth 
> wrote:
>
>> I have the impression that tick-tock is not the real problem, but that MVs
>> are not really battle-tested yet.
>> Depending on the model, they put much more complexity on a cluster and
>> its behaviour under heavy load. Especially if you are going to create an
>> MV with a different partition key than the base table, this might be a shot
>> in the head.
>> At least I was able to bring my cluster down many times just by throwing
>> a few queries too many at it or by running some big repairs with reaper.
>> Only in the last few days have things seemed to go smoothly, after I
>> struggled for about 2 months with very different kinds of issues.
>>
>> We'll see ... most probably I will stick with the latest version. After
>> all it seems to work ok, I gained a lot of experience in running,
>> troubleshooting and dealing with bugs, and maybe I will be able to
>> contribute a bit to further development.
>>
>> 2016-10-04 18:26 GMT+02:00 Vladimir Yudovin :
>>
>> >Would you consider 3.0.x to be more stable than 3.x?
>> I guess yes, but there are some discussion on this list:
>>
>> (C)* stable version after 3.5
>> 
>> Upgrade from 3.0.6 to 3.7.
>> 
>>
>> It seems to be eternal topic till tick-tock approach stabilizes.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>>
>> *Winguzone Inc  - Hosted Cloud Cassandra
>> on Azure and SoftLayer.Launch your cluster in minutes.*
>>
>>
>>  On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin
>> Roth>* wrote 
>>
>> I use the self-compiled master (3.10, ticktock). I had to fix a severe
>> bug on my own and decided to go with the latest code.
>> Would you consider 3.0.x to be more stable than 3.x?
>>
>> 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin :
>>
>> Hi Benjamin!
>>
>> >we now use CS 3.x and have been advised that 3.x is still not considered
>> really production ready.
>>
>> Did you consider using of 3.0.9? Actually it's 3.0 with almost an year
>> fixes.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>>
>> *Winguzone Inc  - Hosted Cloud Cassandra
>> on Azure and SoftLayer.Launch your cluster in minutes.*
>>
>>
>>  On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
>> >* wrote 
>>
>> Hi!
>>
>> I have a frequently used pattern which seems to be quite costly in CS.
>> The pattern is always the same: I have a unique key and a sorting by a
>> different field.
>>
>> To give an example, here a real life example from our model:
>> CREATE TABLE visits.visits_in (
>> user_id int,
>> user_id_visitor int,
>> created timestamp,
>> PRIMARY KEY (user_id, user_id_visitor)
>> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>>
>> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>> SELECT user_id, created, user_id_visitor
>> FROM visits.visits_in
>> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
>> IS NOT NULL
>> PRIMARY KEY (user_id, created, user_id_visitor)
>> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>>
>> This simply represents people, that visited my profile sorted by date
>> desc but only one entry per visitor.
>> Other examples with the same pattern could be a whats-app-like inbox
>> where the last message of each sender is shown by date desc. There are lots
>> of examples for that pattern.
>>
>> E.g. in redis I'd just use a sorted set, where the key could be like
>> "visits_${user_id}", set key would be user_id_visitor and score
>> the created timestamp.
>> In MySQL I'd create the table with PK on user_id + user_id_visitor and
>> create an index on user_id + created
>> In C* i use an MV.
>>
>> Is this the most efficient approach?
>> I also could have done this without an MV but then the situation in our
>> app would be far more complex.
>> I know that denormalization is a common pattern in C* and I don't
>> hesitate to use it but in this case, it is not as simple as it's not an
>> append-only case but updates have to be handled correctly.
>> If it is the first visit of a user, it's that simple, just 2 inserts in
>> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
>> visit has to be deleted from the 

Re: Tombstoned error and then OOM

2016-10-04 Thread kurt Greaves
This sounds like you're running a query that consumes a lot of memory. Are
you by chance querying a very large partition or not bounding your query?

I'd also recommend upgrading to 2.1.15; 2.1.0 is very old and has quite a
few bugs.
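
To see which tables trigger the tombstone warning and how many tombstoned
cells each read touched, the system log can be scanned with something like
the sketch below. It is only a minimal sketch, assuming the log format matches
the WARN line quoted further down; the regex and the function name
tombstone_warnings are hypothetical helpers, not part of Cassandra.

```python
import re

# Pattern for the tombstone warning as it appears in the quoted 2.1 log line.
# The regex and the function name are illustrative, not part of Cassandra.
TOMBSTONE_RE = re.compile(
    r"Read (?P<live>\d+) live and (?P<dead>\d+) tombstoned cells in (?P<table>\S+)"
)


def tombstone_warnings(log_text):
    """Return (table, live, tombstoned) tuples found in log_text."""
    # Log lines may be wrapped (as in a mail archive), so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [
        (m.group("table"), int(m.group("live")), int(m.group("dead")))
        for m in TOMBSTONE_RE.finditer(flat)
    ]


sample = """WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835
SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in
test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000
columns was requested"""

for table, live, dead in tombstone_warnings(sample):
    print(f"{table}: live={live} tombstoned={dead}")
```

A table that keeps showing up with zero live cells and many tombstoned ones
is a likely candidate for the unbounded or large-partition query mentioned
above.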

On 3 October 2016 at 17:08, INDRANIL BASU  wrote:

> Hello All,
>
>
>
> I am getting the below error repeatedly in the system log of C* 2.1.0
>
> WARN  [SharedPool-Worker-64] 2016-09-27 00:43:35,835
> SliceQueryFilter.java:236 - Read 0 live and 1923 tombstoned cells in
> test_schema.test_cf.test_cf_col1_idx (see tombstone_warn_threshold). 5000
> columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
> localDeletion=2147483647}
>
> After that NullPointer Exception and finally OOM
>
> ERROR [CompactionExecutor:6287] 2016-09-29 22:09:13,546
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 6287,1,main]
> java.lang.NullPointerException: null
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:475)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.service.CacheService$
> KeyCacheSerializer.serialize(CacheService.java:463)
> ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.cache.AutoSavingCache$Writer.
> saveCache(AutoSavingCache.java:225) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at org.apache.cassandra.db.compaction.CompactionManager$
> 11.run(CompactionManager.java:1061) ~[apache-cassandra-2.1.0.jar:2.1.0]
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.FutureTask.run(Unknown Source)
> ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) [na:1.7.0_80]
> ERROR [CompactionExecutor:9712] 2016-10-01 10:09:13,871
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 9712,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10070] 2016-10-01 14:09:14,154
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10070,1,main]
> java.lang.NullPointerException: null
> ERROR [CompactionExecutor:10413] 2016-10-01 18:09:14,265
> CassandraDaemon.java:166 - Exception in thread Thread[CompactionExecutor:
> 10413,1,main]
> java.lang.NullPointerException: null
> ERROR [MemtableFlushWriter:2396] 2016-10-01 20:28:27,425
> CassandraDaemon.java:166 - Exception in thread Thread[MemtableFlushWriter:
> 2396,5,main]
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method) ~[na:1.7.0_80]
> at java.lang.Thread.start(Unknown Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) ~[na:1.7.0_80]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) ~[na:1.7.0_80]
> at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_80]
>
> -- IB
>
>
>
>
>
>


-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Efficient model for a sorting

2016-10-04 Thread Jonathan Haddad
I strongly recommend avoiding tick tock. You'll be one of the only people
putting it in prod and will likely hit a number of weird issues nobody will
be able to help you with.
On Tue, Oct 4, 2016 at 12:40 PM Benjamin Roth 
wrote:

> I have the impression, that not the tick-tock is the real problem but MVs
> are not really battle-tested yet.
> Depending on the model, they put much more complexity on a cluster and
> it's behaviour under heavy load. Especially if you are going to create an
> MV with a different partition key than the base table this might be a shot
> in the head.
> At least I was able to bring my cluster down many times just by throwing a
> few queries too much at it or by running some big repairs with reaper.
> Only since some days, things seem to go smoothly after having struggled
> about 2 months with very different kind of issues.
>
> We'll see ... most probably I will stick with the latest version. After
> all it seems to work ok, I gained a lot of experience in running and
> troubleshooting and to deal with bugs and maybe I am so able to contribute
> a bit to further development.
>
> 2016-10-04 18:26 GMT+02:00 Vladimir Yudovin :
>
> >Would you consider 3.0.x to be more stable than 3.x?
> I guess yes, but there are some discussion on this list:
>
> (C)* stable version after 3.5
> 
> Upgrade from 3.0.6 to 3.7.
> 
>
> It seems to be eternal topic till tick-tock approach stabilizes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc  - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin
> Roth>* wrote 
>
> I use the self-compiled master (3.10, ticktock). I had to fix a severe bug
> on my own and decided to go with the latest code.
> Would you consider 3.0.x to be more stable than 3.x?
>
> 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin :
>
> Hi Benjamin!
>
> >we now use CS 3.x and have been advised that 3.x is still not considered
> really production ready.
>
> Did you consider using of 3.0.9? Actually it's 3.0 with almost an year
> fixes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc  - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
> >* wrote 
>
> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS. The
> pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
> user_id int,
> user_id_visitor int,
> created timestamp,
> PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
> SELECT user_id, created, user_id_visitor
> FROM visits.visits_in
> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
> IS NOT NULL
> PRIMARY KEY (user_id, created, user_id_visitor)
> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date desc
> but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox where
> the last message of each sender is shown by date desc. There are lots of
> examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in our
> app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't hesitate
> to use it but in this case, it is not as simple as it's not an append-only
> case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts in
> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
> visit has to be deleted from the denormalized table before. Otherwise the
> visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess even
> more effort than just using an MV.
> 1. You need kind of app-side locking to deal with race conditions
> 2. Read before write is 

Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
I have the impression that tick-tock is not the real problem, but rather that
MVs are not really battle-tested yet.
Depending on the model, they put much more complexity on a cluster and its
behaviour under heavy load. Especially if you are going to create an MV
with a different partition key than the base table, this might be a shot in
the head.
At least I was able to bring my cluster down many times just by throwing a
few queries too many at it or by running some big repairs with reaper.
Only in the last few days have things seemed to go smoothly, after I struggled
for about 2 months with very different kinds of issues.

We'll see ... most probably I will stick with the latest version. After all
it seems to work OK, I gained a lot of experience in running,
troubleshooting and dealing with bugs, and maybe I will be able to contribute
a bit to further development.

2016-10-04 18:26 GMT+02:00 Vladimir Yudovin :

> >Would you consider 3.0.x to be more stable than 3.x?
> I guess yes, but there are some discussion on this list:
>
> (C)* stable version after 3.5
> 
> Upgrade from 3.0.6 to 3.7.
> 
>
> It seems to be eternal topic till tick-tock approach stabilizes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc  - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 12:19:13 -0400 *Benjamin
> Roth>* wrote 
>
> I use the self-compiled master (3.10, ticktock). I had to fix a severe bug
> on my own and decided to go with the latest code.
> Would you consider 3.0.x to be more stable than 3.x?
>
> 2016-10-04 18:14 GMT+02:00 Vladimir Yudovin :
>
> Hi Benjamin!
>
> >we now use CS 3.x and have been advised that 3.x is still not considered
> really production ready.
>
> Did you consider using of 3.0.9? Actually it's 3.0 with almost an year
> fixes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc  - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
> >* wrote 
>
> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS. The
> pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
> user_id int,
> user_id_visitor int,
> created timestamp,
> PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
> SELECT user_id, created, user_id_visitor
> FROM visits.visits_in
> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
> IS NOT NULL
> PRIMARY KEY (user_id, created, user_id_visitor)
> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date desc
> but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox where
> the last message of each sender is shown by date desc. There are lots of
> examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in our
> app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't hesitate
> to use it but in this case, it is not as simple as it's not an append-only
> case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts in
> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
> visit has to be deleted from the denormalized table before. Otherwise the
> visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess even
> more effort than just using an MV.
> 1. You need kind of app-side locking to deal with race conditions
> 2. Read before write is required to determine if an old record has to be
> deleted
> 3. At least CL_QUORUM is required to make sure that read before write is
> always consistent
> 4. Old record has to be deleted on update
>
> I guess, using an MV here is more efficient as there is less roundtrip
> between C* and the 

Re: when taking backups using snapshot if the sstable gets compacted will nodetool snapshot hung ??

2016-10-04 Thread Vladimir Yudovin
Hi James!
> Hi, we are taking backups using nodetool snapshot, but I occasionally see
> that my script pauses while taking a snapshot of a CF. Is this because the
> sstables got compacted into a different one while the snapshot was being
> taken, so it couldn't find the particular sstable it was snapshotting and
> paused at that CF?
 
Creating a snapshot doesn't involve any heavy disk operation, just the
creation of hard links.
Do you have Java JNA installed and enabled? You should see a line like
"INFO  16:30:42 JNA mlockall successful" in the C* log file.
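
To illustrate why a hard-link snapshot is cheap and is not affected when the
original file later goes away (for example after compaction), here is a
minimal sketch in plain Python. It does not use nodetool or Cassandra at all;
the file name data.db and the snapshots directory are made-up stand-ins.

```python
import os
import tempfile


def snapshot_survives_delete():
    """Hard-link a file, delete the original, and read the data via the link."""
    with tempfile.TemporaryDirectory() as d:
        sstable = os.path.join(d, "data.db")        # stand-in for an sstable file
        snap_dir = os.path.join(d, "snapshots")     # stand-in for a snapshot dir
        os.mkdir(snap_dir)
        with open(sstable, "wb") as f:
            f.write(b"sstable-bytes")
        link = os.path.join(snap_dir, "data.db")
        os.link(sstable, link)                      # new directory entry, no data copied
        same_inode = os.stat(sstable).st_ino == os.stat(link).st_ino
        os.remove(sstable)                          # original name removed, data stays
        with open(link, "rb") as f:
            return same_inode, f.read()


print(snapshot_survives_delete())
```

Both names point at the same file on disk, so removing the original name
leaves the snapshot copy readable.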

Which platform do you use?


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.





Re: Efficient model for a sorting

2016-10-04 Thread Vladimir Yudovin
>Would you consider 3.0.x to be more stable than 3.x?
I guess yes, but there is some discussion on this list:

 (C)* stable version after 3.5
 Upgrade from 3.0.6 to 3.7.


It seems to be an eternal topic until the tick-tock approach stabilizes.


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




 On Tue, 04 Oct 2016 12:19:13 -0400 Benjamin Roth 
benjamin.r...@jaumo.com wrote  

I use the self-compiled master (3.10, ticktock). I had to fix a severe bug on 
my own and decided to go with the latest code.
Would you consider 3.0.x to be more stable than 3.x?


2016-10-04 18:14 GMT+02:00 Vladimir Yudovin vla...@winguzone.com:
Hi Benjamin!

>we now use CS 3.x and have been advised that 3.x is still not considered 
really production ready.

Did you consider using 3.0.9? It's actually 3.0 with almost a year of fixes.


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




 On Tue, 04 Oct 2016 07:27:54 -0400 Benjamin Roth 
benjamin.r...@jaumo.com wrote  

Hi!


I have a frequently used pattern which seems to be quite costly in CS. The 
pattern is always the same: I have a unique key and a sorting by a different 
field.


To give an example, here a real life example from our model:
CREATE TABLE visits.visits_in (
user_id int,
user_id_visitor int,
created timestamp,
PRIMARY KEY (user_id, user_id_visitor)
) WITH CLUSTERING ORDER BY (user_id_visitor ASC)



CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
SELECT user_id, created, user_id_visitor
FROM visits.visits_in
WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor IS 
NOT NULL
PRIMARY KEY (user_id, created, user_id_visitor)
WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)


This simply represents people, that visited my profile sorted by date desc but 
only one entry per visitor.
Other examples with the same pattern could be a whats-app-like inbox where the 
last message of each sender is shown by date desc. There are lots of examples 
for that pattern.




E.g. in redis I'd just use a sorted set, where the key could be like 
"visits_${user_id}", set key would be user_id_visitor and score the created 
timestamp.

In MySQL I'd create the table with PK on user_id + user_id_visitor and create 
an index on user_id + created
In C* i use an MV.


Is this the most efficient approach?
I also could have done this without an MV but then the situation in our app 
would be far more complex.
I know that denormalization is a common pattern in C* and I don't hesitate to 
use it but in this case, it is not as simple as it's not an append-only case 
but updates have to be handled correctly.
If it is the first visit of a user, it's that simple, just 2 inserts in base 
table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd visit has 
to be deleted from the denormalized table before. Otherwise the visit would not 
be unique any more.
Handling this case without an MV requires a lot more effort, I guess even more 
effort than just using an MV. 
1. You need kind of app-side locking to deal with race conditions
2. Read before write is required to determine if an old record has to be deleted
3. At least CL_QUORUM is required to make sure that read before write is always 
consistent
4. Old record has to be deleted on update
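
The bookkeeping in steps 2 and 4 can be sketched with a small in-memory
model. This is only an illustration under simplifying assumptions: the class
VisitsIn and its methods are hypothetical, plain Python containers stand in
for the base table and the hand-maintained sorted table, and the locking and
QUORUM concerns of steps 1 and 3 are not modelled because everything runs in
a single thread.

```python
import bisect


class VisitsIn:
    """In-memory stand-in for the base table plus a hand-maintained sorted table."""

    def __init__(self):
        self.base = {}      # (user_id, user_id_visitor) -> created
        self.sorted = {}    # user_id -> ascending list of (created, user_id_visitor)

    def record_visit(self, user_id, visitor, created):
        rows = self.sorted.setdefault(user_id, [])
        old = self.base.get((user_id, visitor))     # step 2: read before write
        if old is not None:
            rows.remove((old, visitor))             # step 4: delete the old record
        self.base[(user_id, visitor)] = created
        bisect.insort(rows, (created, visitor))     # keep rows ordered by created

    def latest_visitors(self, user_id):
        # Newest first, exactly one entry per visitor.
        return [(v, c) for c, v in reversed(self.sorted.get(user_id, []))]


t = VisitsIn()
t.record_visit(1, 10, 100)
t.record_visit(1, 11, 200)
t.record_visit(1, 10, 300)   # second visit by visitor 10 replaces the first
print(t.latest_visitors(1))
```

Without the read and the delete, the second visit by visitor 10 would leave
two rows for the same visitor in the sorted table, which is exactly the
uniqueness problem described above.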


I guess, using an MV here is more efficient as there is less roundtrip between 
C* and the app to do all that and the MV does not require strong consistency as 
MV updates are always local and are eventual consistent when the base table is. 
So there is also no need for distributed locks.


I ask all this as we now use CS 3.x and have been advised that 3.x is still not 
considered really production ready.


I guess in a perfect world, this wouldn't even require an MV if SASI indexes 
could be created over more than 1 column. E.g. in MySQL this case is nothing 
else than a BTree. AFAIK SASI indices are also BTrees, filtering by Partition 
Key (which should to be done anyway) and sorting by a field would perfectly do 
the trick. But from the docs, this is not possible right now.



Does anyone see a better solution or are all my assumptions correct?



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

 
 













 
 






Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
I use the self-compiled master (3.10, ticktock). I had to fix a severe bug
on my own and decided to go with the latest code.
Would you consider 3.0.x to be more stable than 3.x?

2016-10-04 18:14 GMT+02:00 Vladimir Yudovin :

> Hi Benjamin!
>
> >we now use CS 3.x and have been advised that 3.x is still not considered
> really production ready.
>
> Did you consider using of 3.0.9? Actually it's 3.0 with almost an year
> fixes.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone Inc  - Hosted Cloud Cassandra
> on Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Tue, 04 Oct 2016 07:27:54 -0400 *Benjamin Roth
> >* wrote 
>
> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS. The
> pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
> user_id int,
> user_id_visitor int,
> created timestamp,
> PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
> SELECT user_id, created, user_id_visitor
> FROM visits.visits_in
> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
> IS NOT NULL
> PRIMARY KEY (user_id, created, user_id_visitor)
> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date desc
> but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox where
> the last message of each sender is shown by date desc. There are lots of
> examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in our
> app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't hesitate
> to use it but in this case, it is not as simple as it's not an append-only
> case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts in
> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
> visit has to be deleted from the denormalized table before. Otherwise the
> visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess even
> more effort than just using an MV.
> 1. You need kind of app-side locking to deal with race conditions
> 2. Read before write is required to determine if an old record has to be
> deleted
> 3. At least CL_QUORUM is required to make sure that read before write is
> always consistent
> 4. Old record has to be deleted on update
>
> I guess, using an MV here is more efficient as there is less roundtrip
> between C* and the app to do all that and the MV does not require strong
> consistency as MV updates are always local and are eventual consistent when
> the base table is. So there is also no need for distributed locks.
>
> I ask all this as we now use CS 3.x and have been advised that 3.x is
> still not considered really production ready.
>
> I guess in a perfect world, this wouldn't even require an MV if SASI
> indexes could be created over more than 1 column. E.g. in MySQL this case
> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering
> by Partition Key (which should to be done anyway) and sorting by a field
> would perfectly do the trick. But from the docs, this is not possible right
> now.
>
> Does anyone see a better solution or are all my assumptions correct?
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Cassandra listen port

2016-10-04 Thread Vladimir Yudovin
Actually, the main ports are 9042 for client (CQL) connections and 7000 (7001 
if SSL is enabled) for inter-node communication.
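
For a quick check of which of these ports a node is actually listening on,
a small sketch like the following may help. The dictionary keys are the
cassandra.yaml settings that hold these ports (native_transport_port,
storage_port, ssl_storage_port); the helper port_open and the 127.0.0.1
address are assumptions for illustration only.

```python
import socket

# cassandra.yaml setting -> default port, as named in the message above.
# ssl_storage_port only matters when inter-node SSL is enabled.
DEFAULT_PORTS = {
    "native_transport_port": 9042,   # client (CQL) connections
    "storage_port": 7000,            # inter-node communication
    "ssl_storage_port": 7001,        # inter-node communication over SSL
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; use the address the node is bound to.
    for name, port in DEFAULT_PORTS.items():
        print(name, port, port_open("127.0.0.1", port))
```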

Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




 On Tue, 04 Oct 2016 11:36:04 -0400 Benjamin Roth 
benjamin.r...@jaumo.com wrote  

There are several ports for several services. They are all set in cassandra.yaml

See here for complete documentation:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html



2016-10-04 16:54 GMT+02:00 Mehdi Bada mehdi.b...@dbi-services.com:
Hi all, 



What is the listen port parameter for Apache Cassandra? Does it exist?

In comparison with MongoDB, in mongo it's possible to set the listen port in 
the mongod.conf (configuration file)



Regards 

Mehdi



Mehdi Bada | Consultant
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
mehdi.b...@dbi-services.com 
www.dbi-services.com









⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the team









-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

 
 






Re: Efficient model for a sorting

2016-10-04 Thread Vladimir Yudovin
Hi Benjamin!

>we now use CS 3.x and have been advised that 3.x is still not considered 
really production ready.

Did you consider using 3.0.9? It's actually 3.0 with almost a year of fixes.


Best regards, Vladimir Yudovin, 
Winguzone Inc - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.




 On Tue, 04 Oct 2016 07:27:54 -0400 Benjamin Roth 
benjamin.r...@jaumo.com wrote  

Hi!


I have a frequently used pattern which seems to be quite costly in CS. The 
pattern is always the same: I have a unique key and a sorting by a different 
field.


To give an example, here a real life example from our model:
CREATE TABLE visits.visits_in (
user_id int,
user_id_visitor int,
created timestamp,
PRIMARY KEY (user_id, user_id_visitor)
) WITH CLUSTERING ORDER BY (user_id_visitor ASC)



CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
SELECT user_id, created, user_id_visitor
FROM visits.visits_in
WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor IS 
NOT NULL
PRIMARY KEY (user_id, created, user_id_visitor)
WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)


This simply represents people, that visited my profile sorted by date desc but 
only one entry per visitor.
Other examples with the same pattern could be a whats-app-like inbox where the 
last message of each sender is shown by date desc. There are lots of examples 
for that pattern.




E.g. in redis I'd just use a sorted set, where the key could be like 
"visits_${user_id}", set key would be user_id_visitor and score the created 
timestamp.

In MySQL I'd create the table with PK on user_id + user_id_visitor and create 
an index on user_id + created
In C* i use an MV.


Is this the most efficient approach?
I also could have done this without an MV but then the situation in our app 
would be far more complex.
I know that denormalization is a common pattern in C* and I don't hesitate to 
use it but in this case, it is not as simple as it's not an append-only case 
but updates have to be handled correctly.
If it is the first visit of a user, it's that simple, just 2 inserts in base 
table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd visit has 
to be deleted from the denormalized table before. Otherwise the visit would not 
be unique any more.
Handling this case without an MV requires a lot more effort, I guess even more 
effort than just using an MV. 
1. You need kind of app-side locking to deal with race conditions
2. Read before write is required to determine if an old record has to be deleted
3. At least CL_QUORUM is required to make sure that read before write is always 
consistent
4. Old record has to be deleted on update


I guess, using an MV here is more efficient as there is less roundtrip between 
C* and the app to do all that and the MV does not require strong consistency as 
MV updates are always local and are eventual consistent when the base table is. 
So there is also no need for distributed locks.


I ask all this as we now use CS 3.x and have been advised that 3.x is still not 
considered really production ready.


I guess in a perfect world, this wouldn't even require an MV if SASI indexes 
could be created over more than 1 column. E.g. in MySQL this case is nothing 
else than a BTree. AFAIK SASI indices are also BTrees, filtering by Partition 
Key (which should to be done anyway) and sorting by a field would perfectly do 
the trick. But from the docs, this is not possible right now.



Does anyone see a better solution or are all my assumptions correct?



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

 
 






Re: Cassandra listen port

2016-10-04 Thread Benjamin Roth
There are several ports for several services. They are all set in
cassandra.yaml

See here for complete documentation:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html

2016-10-04 16:54 GMT+02:00 Mehdi Bada :

> Hi all,
>
> What is the listen port parameter for Apache Cassandra? Does it exist?
> In comparison with MongoDB, in mongo it's possible to set the listen port
> in the mongod.conf (configuration file)
>
> Regards
> Mehdi
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
>
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Cassandra listen port

2016-10-04 Thread Mehdi Bada
Hi all, 

What is the listen port parameter for Apache Cassandra? Does it exist? 
In comparison with MongoDB, in mongo it's possible to set the listen port in 
the mongod.conf (configuration file) 

Regards 
Mehdi 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 





Cassandra Ignores path to HeadDumpFile defined by cassandra-env.sh

2016-10-04 Thread Jean Carlo
Hi all,

We recently got an OOM error in Cassandra, and Cassandra wrote the heap
dump to the path defined by debian/init.


However, we had defined CASSANDRA_HEAPDUMP_DIR in the file
/etc/default/cassandra, so Cassandra should have written the dump there.

Checking the Cassandra JVM arguments, I can see that -XX:HeapDumpPath is
set twice, and the last occurrence corresponds to the default value.

I suppose Cassandra takes into account the last -XX:HeapDumpPath and not
the one set in cassandra-env.sh, which isn't the expected behavior.
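
A quick way to check this kind of duplication is to list every occurrence of the flag in the process's argument list. The sketch below does that under the assumption that the JVM applies options left to right, so the last occurrence is the one in effect; the argument list and both paths are made up for illustration.

```python
def effective_flag(jvm_args, flag="-XX:HeapDumpPath"):
    """Return (all values seen, value in effect) for a repeated flag."""
    prefix = flag + "="
    values = [arg[len(prefix):] for arg in jvm_args if arg.startswith(prefix)]
    # Assumption: options are applied in order, so the last one wins.
    return values, (values[-1] if values else None)


# Hypothetical command line; both paths are invented for this example.
args = [
    "-Xmx8G",
    "-XX:+HeapDumpOnOutOfMemoryError",
    "-XX:HeapDumpPath=/data/dumps/cassandra.hprof",
    "-XX:HeapDumpPath=/var/lib/cassandra/java.hprof",
]
seen, winner = effective_flag(args)
print(seen)
print(winner)
```

If the unwanted value appears last, the fix is usually to find which script appends it after cassandra-env.sh has run.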

Has anyone seen the same behavior?


Best regards

Jean Carlo

"The best way to predict the future is to invent it"
Alan Kay


Little question

2016-10-04 Thread Ruben Cardenal
 

Hi, 

We've inherited quite a big Amazon infrastructure from a company we've
purchased. It has an ancient and obsolete implementation of services,
the worst (and most expensive) of them being a 5-node Cassandra cluster
(RF=3). I'm new to Cassandra, and yes, I'm working my way through the
docs.

I was told that Amazon asked them a few months ago to reboot one of
their servers (it had been turned on for so long that Amazon had to make
some changes and needed it rebooted), so they had to add a new node to
the cluster. If you query nodetool as of now, it shows: 

$ nodetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address        DC           Rack   Status  State   Load       Owns    Token
                                                                      141784319550391026443072753096570088105
10.128.50.130  datacenter1  rack1  Up      Normal  263.06 GB  16.67%  0
10.128.50.237  datacenter1  rack1  Up      Normal  253.31 GB  16.67%  28356863910078205288614550619314017621
10.128.60.106  datacenter1  rack1  Up      Normal  262.12 GB  33.33%  85070591730234615865843651857942052863
10.128.70.41   datacenter1  rack1  Up      Normal  264.28 GB  16.67%  113427455640312821154458202477256070484
10.128.60.206  datacenter1  rack1  Up      Normal  65.15 GB   16.67%  141784319550391026443072753096570088105

What puzzles me is the last line. It belongs to the most recently added
node, the new one I mentioned. While it owns the same share of the ring
(16.67%) as three of the other nodes, its Load is about 4 times lower.
What does this mean? Is the difference data that has not been cleaned up
on the older nodes, such as TTL-expired cells or tombstoned data?
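
For reference, the Owns column can be reproduced from the tokens alone, which shows it says nothing about how much data sits on disk. The sketch below assumes the cluster uses RandomPartitioner with a token space of 0 to 2^127 (the token values suggest this, but it is not stated) and that each node owns the range ending at its token.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space

# (address, token) pairs copied from the nodetool ring output above.
NODES = [
    ("10.128.50.130", 0),
    ("10.128.50.237", 28356863910078205288614550619314017621),
    ("10.128.60.106", 85070591730234615865843651857942052863),
    ("10.128.70.41", 113427455640312821154458202477256070484),
    ("10.128.60.206", 141784319550391026443072753096570088105),
]


def ownership(nodes):
    """Map each address to the percentage of the ring ending at its token."""
    ordered = sorted(nodes, key=lambda n: n[1])
    result = {}
    for i, (address, token) in enumerate(ordered):
        previous = ordered[i - 1][1]          # i == 0 wraps to the last node
        width = (token - previous) % RING_SIZE
        result[address] = round(100 * width / RING_SIZE, 2)
    return result


for address, percent in ownership(NODES).items():
    print(f"{address:15s} {percent:6.2f}%")
```

The percentages match the Owns column, so the new node is responsible for the same share of tokens as its peers; Load is the on-disk size reported by each node and can differ independently of that.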

Thanks and excuse me if I'm asking something stupid. 

Rubén. 

 

Re: Repairing without -pr shows unexpected out-of-sync ranges

2016-10-04 Thread Paulo Motta
> is (2) a direct consequence of a repair on the full token range (and thus
> anti-compaction ran only on a subset of the RF nodes)?

Not necessarily, because even with -pr enabled the nodes will be
responsible for different ranges, so they will flush and compact at
different instants. The effect of this on long running repairs is that data
that was marked as repaired in one replica may be compacted in some other
replica, causing it to not be marked as repaired due to CASSANDRA-9143,
which will cause a mismatch in the next repair. This could probably be
alleviated by CASSANDRA-6696.

2016-10-03 12:16 GMT-03:00 Stefano Ortolani :

> I was wondering: is (2) a direct consequence of a repair on the full
> token range (and thus anti-compaction ran only on a subset of the RF
> nodes)?. If I understand correctly, a repair with -pr should fix this,
> at the cost of all nodes performing the anticompaction phase?
>
> Cheers,
> Stefano
>
> On Tue, Sep 27, 2016 at 4:09 PM, Stefano Ortolani 
> wrote:
> > Didn't know about (2), and I actually have a time drift between the
> nodes.
> > Thanks a lot Paulo!
> >
> > Regards,
> > Stefano
> >
> > On Thu, Sep 22, 2016 at 6:36 PM, Paulo Motta 
> > wrote:
> >>
> >> There are a couple of things that could be happening here:
> >> - There will be time differences between when nodes participating repair
> >> flush, so in write-heavy tables there will always be minor differences
> >> during validation, and those could be accentuated by low resolution
> merkle
> >> trees, which will affect mostly larger tables.
> >> - SSTables compacted during incremental repair will not be marked as
> >> repaired, so nodes with different compaction cadences will have
> different
> >> data in their unrepaired set, what will cause mismatches in the
> subsequent
> >> incremental repairs. CASSANDRA-9143 will hopefully fix that limitation.
> >>
> >> 2016-09-22 7:10 GMT-03:00 Stefano Ortolani :
> >>>
> >>> Hi,
> >>>
> >>> I am seeing something weird while running repairs.
> >>> I am testing 3.0.9 so I am running the repairs manually, node after
> node,
> >>> on a cluster with RF=3. I am using a standard repair command
> (incremental,
> >>> parallel, full range), and I just noticed that the third node detected
> some
> >>> ranges out of sync with one of the nodes that just finished repairing.
> >>>
> >>> Since there was no dropped mutation, that sounds weird to me
> considering
> >>> that the repairs are supposed to operate on the whole range.
> >>>
> >>> Any idea why?
> >>> Maybe I am missing something?
> >>>
> >>> Cheers,
> >>> Stefano
> >>>
> >>
> >
>


Re: Efficient model for a sorting

2016-10-04 Thread DuyHai Doan
MV build is also async.

In the end it's MV maintenance cost vs Lucene index maintenance cost. I
don't have clear figures to judge which one is better. Maybe you should
benchmark it yourself. Anyway, I'd be interested in the results.

On Tue, Oct 4, 2016 at 3:05 PM, Dorian Hoxha  wrote:

> On lucene you can query+filter+sort on a single shard, so it should be
> better than MV/sasi. The index building is a little async though.
>
> On Tue, Oct 4, 2016 at 2:29 PM, Benjamin Roth 
> wrote:
>
>> Thanks guys!
>>
>> Good to know that my approach is basically right, but I will check out
>> the Lucene indexes in due time.
>>
>> 2016-10-04 14:22 GMT+02:00 DuyHai Doan :
>>
>>> "What scatter/gather? "
>>>
>>> http://www.slideshare.net/doanduyhai/sasi-cassandra-on-the-full-text-search-ride-voxxed-daybelgrade-2016/23
>>>
>>> "If you partition your data by user_id then you query only 1 shard to
>>> get sorted by time visitors for a user"
>>>
>>> Exactly, but in this case you're using a 2nd index only for sorting,
>>> right? For SASI it's not even possible. Maybe it can work with the
>>> Stratio Lucene impl.
>>>
>>> On Tue, Oct 4, 2016 at 2:15 PM, Dorian Hoxha 
>>> wrote:
>>>
 @DuyHai

 What scatter/gather? If you partition your data by user_id then you
 query only 1 shard to get sorted by time visitors for a user.

 On Tue, Oct 4, 2016 at 2:09 PM, DuyHai Doan 
 wrote:

> MV is right now your best choice for this kind of sorting behavior.
>
> Secondary index (whatever the impl, SASI or Lucene) has a cost of
> scatter-gather if your cluster scales out. With MV you're at least
> guaranteed to hit a single node every time.
>
> On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha 
> wrote:
>
>> Can you use the lucene index
>> https://github.com/Stratio/cassandra-lucene-index ?
>>
>> On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth <
>> benjamin.r...@jaumo.com> wrote:
>>
>>> Hi!
>>>
>>> I have a frequently used pattern which seems to be quite costly in
>>> CS. The pattern is always the same: I have a unique key and a sorting 
>>> by a
>>> different field.
>>>
>>> To give an example, here a real life example from our model:
>>> CREATE TABLE visits.visits_in (
>>> user_id int,
>>> user_id_visitor int,
>>> created timestamp,
>>> PRIMARY KEY (user_id, user_id_visitor)
>>> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>>>
>>> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>>> SELECT user_id, created, user_id_visitor
>>> FROM visits.visits_in
>>> WHERE user_id IS NOT NULL AND created IS NOT NULL AND
>>> user_id_visitor IS NOT NULL
>>> PRIMARY KEY (user_id, created, user_id_visitor)
>>> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>>>
>>> This simply represents people, that visited my profile sorted by
>>> date desc but only one entry per visitor.
>>> Other examples with the same pattern could be a whats-app-like inbox
>>> where the last message of each sender is shown by date desc. There are 
>>> lots
>>> of examples for that pattern.
>>>
>>> E.g. in redis I'd just use a sorted set, where the key could be like
>>> "visits_${user_id}", set key would be user_id_visitor and score
>>> the created timestamp.
>>> In MySQL I'd create the table with PK on user_id + user_id_visitor
>>> and create an index on user_id + created
>>> In C* i use an MV.
>>>
>>> Is this the most efficient approach?
>>> I also could have done this without an MV but then the situation in
>>> our app would be far more complex.
>>> I know that denormalization is a common pattern in C* and I don't
>>> hesitate to use it but in this case, it is not as simple as it's not an
>>> append-only case but updates have to be handled correctly.
>>> If it is the first visit of a user, it's that simple, just 2 inserts
>>> in base table + denormalized table. But on a 2nd or 3rd visit, the 1st 
>>> or
>>> 2nd visit has to be deleted from the denormalized table before. 
>>> Otherwise
>>> the visit would not be unique any more.
>>> Handling this case without an MV requires a lot more effort, I guess
>>> even more effort than just using an MV.
>>> 1. You need kind of app-side locking to deal with race conditions
>>> 2. Read before write is required to determine if an old record has
>>> to be deleted
>>> 3. At least CL_QUORUM is required to make sure that read before
>>> write is always consistent
>>> 4. Old record has to be deleted on update
>>>
>>> I guess, using an MV here is more efficient as there is less
>>> roundtrip between C* and the app to do all that and the MV does not 

Re: Efficient model for a sorting

2016-10-04 Thread Dorian Hoxha
On lucene you can query+filter+sort on a single shard, so it should be
better than MV/sasi. The index building is a little async though.

On Tue, Oct 4, 2016 at 2:29 PM, Benjamin Roth 
wrote:

> Thanks guys!
>
> Good to know that my approach is basically right, but I will check out
> the Lucene indexes in due time.
>
> 2016-10-04 14:22 GMT+02:00 DuyHai Doan :
>
>> "What scatter/gather? "
>>
>> http://www.slideshare.net/doanduyhai/sasi-cassandra-on-the-full-text-search-ride-voxxed-daybelgrade-2016/23
>>
>> "If you partition your data by user_id then you query only 1 shard to
>> get sorted by time visitors for a user"
>>
>> Exactly, but in this case you're using a 2nd index only for sorting,
>> right? For SASI it's not even possible. Maybe it can work with the
>> Stratio Lucene impl.
>>
>> On Tue, Oct 4, 2016 at 2:15 PM, Dorian Hoxha 
>> wrote:
>>
>>> @DuyHai
>>>
>>> What scatter/gather? If you partition your data by user_id then you
>>> query only 1 shard to get sorted by time visitors for a user.
>>>
>>> On Tue, Oct 4, 2016 at 2:09 PM, DuyHai Doan 
>>> wrote:
>>>
 MV is right now your best choice for this kind of sorting behavior.

 Secondary index (whatever the impl, SASI or Lucene) has a cost of
 scatter-gather if your cluster scales out. With MV you're at least
 guaranteed to hit a single node every time.

 On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha 
 wrote:

> Can you use the lucene index
> https://github.com/Stratio/cassandra-lucene-index ?
>
> On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth wrote:
>
>> Hi!
>>
>> I have a frequently used pattern which seems to be quite costly in
>> CS. The pattern is always the same: I have a unique key and a sorting by 
>> a
>> different field.
>>
>> To give an example, here a real life example from our model:
>> CREATE TABLE visits.visits_in (
>> user_id int,
>> user_id_visitor int,
>> created timestamp,
>> PRIMARY KEY (user_id, user_id_visitor)
>> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>>
>> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>> SELECT user_id, created, user_id_visitor
>> FROM visits.visits_in
>> WHERE user_id IS NOT NULL AND created IS NOT NULL AND
>> user_id_visitor IS NOT NULL
>> PRIMARY KEY (user_id, created, user_id_visitor)
>> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>>
>> This simply represents people, that visited my profile sorted by date
>> desc but only one entry per visitor.
>> Other examples with the same pattern could be a whats-app-like inbox
>> where the last message of each sender is shown by date desc. There are 
>> lots
>> of examples for that pattern.
>>
>> E.g. in redis I'd just use a sorted set, where the key could be like
>> "visits_${user_id}", set key would be user_id_visitor and score
>> the created timestamp.
>> In MySQL I'd create the table with PK on user_id + user_id_visitor
>> and create an index on user_id + created
>> In C* i use an MV.
>>
>> Is this the most efficient approach?
>> I also could have done this without an MV but then the situation in
>> our app would be far more complex.
>> I know that denormalization is a common pattern in C* and I don't
>> hesitate to use it but in this case, it is not as simple as it's not an
>> append-only case but updates have to be handled correctly.
>> If it is the first visit of a user, it's that simple, just 2 inserts
>> in base table + denormalized table. But on a 2nd or 3rd visit, the 1st or
>> 2nd visit has to be deleted from the denormalized table before. Otherwise
>> the visit would not be unique any more.
>> Handling this case without an MV requires a lot more effort, I guess
>> even more effort than just using an MV.
>> 1. You need kind of app-side locking to deal with race conditions
>> 2. Read before write is required to determine if an old record has to
>> be deleted
>> 3. At least CL_QUORUM is required to make sure that read before write
>> is always consistent
>> 4. Old record has to be deleted on update
>>
>> I guess, using an MV here is more efficient as there is less
>> roundtrip between C* and the app to do all that and the MV does not 
>> require
>> strong consistency as MV updates are always local and are eventual
>> consistent when the base table is. So there is also no need for 
>> distributed
>> locks.
>>
>> I ask all this as we now use CS 3.x and have been advised that 3.x is
>> still not considered really production ready.
>>
>> I guess in a perfect world, this wouldn't even require an MV if SASI
>> indexes could be 

Re: Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
Thanks guys!

Good to know that my approach is basically right, but I will check out
the Lucene indexes in due time.

2016-10-04 14:22 GMT+02:00 DuyHai Doan :

> "What scatter/gather? "
>
> http://www.slideshare.net/doanduyhai/sasi-cassandra-on-the-full-text-search-ride-voxxed-daybelgrade-2016/23
>
> "If you partition your data by user_id then you query only 1 shard to get
> sorted by time visitors for a user"
>
> Exactly, but in this case you're using a 2nd index only for sorting, right?
> For SASI it's not even possible. Maybe it can work with the Stratio Lucene
> impl.
>
> On Tue, Oct 4, 2016 at 2:15 PM, Dorian Hoxha 
> wrote:
>
>> @DuyHai
>>
>> What scatter/gather? If you partition your data by user_id then you query
>> only 1 shard to get sorted by time visitors for a user.
>>
>> On Tue, Oct 4, 2016 at 2:09 PM, DuyHai Doan  wrote:
>>
>>> MV is right now your best choice for this kind of sorting behavior.
>>>
>>> Secondary index (whatever the impl, SASI or Lucene) has a cost of
>>> scatter-gather if your cluster scales out. With MV you're at least
>>> guaranteed to hit a single node every time.
>>>
>>> On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha 
>>> wrote:
>>>
 Can you use the lucene index
 https://github.com/Stratio/cassandra-lucene-index ?

 On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth 
 wrote:

> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS.
> The pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
> user_id int,
> user_id_visitor int,
> created timestamp,
> PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
> SELECT user_id, created, user_id_visitor
> FROM visits.visits_in
> WHERE user_id IS NOT NULL AND created IS NOT NULL AND
> user_id_visitor IS NOT NULL
> PRIMARY KEY (user_id, created, user_id_visitor)
> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date
> desc but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox
> where the last message of each sender is shown by date desc. There are 
> lots
> of examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in
> our app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't
> hesitate to use it but in this case, it is not as simple as it's not an
> append-only case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts
> in base table + denormalized table. But on a 2nd or 3rd visit, the 1st or
> 2nd visit has to be deleted from the denormalized table before. Otherwise
> the visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess
> even more effort than just using an MV.
> 1. You need kind of app-side locking to deal with race conditions
> 2. Read before write is required to determine if an old record has to
> be deleted
> 3. At least CL_QUORUM is required to make sure that read before write
> is always consistent
> 4. Old record has to be deleted on update
>
> I guess, using an MV here is more efficient as there is less roundtrip
> between C* and the app to do all that and the MV does not require strong
> consistency as MV updates are always local and are eventual consistent 
> when
> the base table is. So there is also no need for distributed locks.
>
> I ask all this as we now use CS 3.x and have been advised that 3.x is
> still not considered really production ready.
>
> I guess in a perfect world, this wouldn't even require an MV if SASI
> indexes could be created over more than 1 column. E.g. in MySQL this case
> is nothing else than a BTree. AFAIK SASI indices are also BTrees, 
> filtering
> by Partition Key (which should to be done anyway) and sorting by a field
> would perfectly do the trick. But from the docs, this is not possible 
> right
> now.
>
> Does anyone see a better 

Re: Efficient model for a sorting

2016-10-04 Thread DuyHai Doan
"What scatter/gather? "

http://www.slideshare.net/doanduyhai/sasi-cassandra-on-the-full-text-search-ride-voxxed-daybelgrade-2016/23

"If you partition your data by user_id then you query only 1 shard to get
sorted by time visitors for a user"

Exactly, but in this case you're using a 2nd index only for sorting, right?
For SASI it's not even possible. Maybe it can work with the Stratio Lucene
impl.

On Tue, Oct 4, 2016 at 2:15 PM, Dorian Hoxha  wrote:

> @DuyHai
>
> What scatter/gather? If you partition your data by user_id then you query
> only 1 shard to get sorted by time visitors for a user.
>
> On Tue, Oct 4, 2016 at 2:09 PM, DuyHai Doan  wrote:
>
>> MV is right now your best choice for this kind of sorting behavior.
>>
>> Secondary index (whatever the impl, SASI or Lucene) has a cost of
>> scatter-gather if your cluster scales out. With MV you're at least
>> guaranteed to hit a single node every time.
>>
>> On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha 
>> wrote:
>>
>>> Can you use the lucene index
>>> https://github.com/Stratio/cassandra-lucene-index ?
>>>
>>> On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth 
>>> wrote:
>>>
 Hi!

 I have a frequently used pattern which seems to be quite costly in CS.
 The pattern is always the same: I have a unique key and a sorting by a
 different field.

 To give an example, here a real life example from our model:
 CREATE TABLE visits.visits_in (
 user_id int,
 user_id_visitor int,
 created timestamp,
 PRIMARY KEY (user_id, user_id_visitor)
 ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)

 CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
 SELECT user_id, created, user_id_visitor
 FROM visits.visits_in
 WHERE user_id IS NOT NULL AND created IS NOT NULL AND
 user_id_visitor IS NOT NULL
 PRIMARY KEY (user_id, created, user_id_visitor)
 WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)

 This simply represents people, that visited my profile sorted by date
 desc but only one entry per visitor.
 Other examples with the same pattern could be a whats-app-like inbox
 where the last message of each sender is shown by date desc. There are lots
 of examples for that pattern.

 E.g. in redis I'd just use a sorted set, where the key could be like
 "visits_${user_id}", set key would be user_id_visitor and score
 the created timestamp.
 In MySQL I'd create the table with PK on user_id + user_id_visitor and
 create an index on user_id + created
 In C* i use an MV.

 Is this the most efficient approach?
 I also could have done this without an MV but then the situation in our
 app would be far more complex.
 I know that denormalization is a common pattern in C* and I don't
 hesitate to use it but in this case, it is not as simple as it's not an
 append-only case but updates have to be handled correctly.
 If it is the first visit of a user, it's that simple, just 2 inserts in
 base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
 visit has to be deleted from the denormalized table before. Otherwise the
 visit would not be unique any more.
 Handling this case without an MV requires a lot more effort, I guess
 even more effort than just using an MV.
 1. You need kind of app-side locking to deal with race conditions
 2. Read before write is required to determine if an old record has to
 be deleted
 3. At least CL_QUORUM is required to make sure that read before write
 is always consistent
 4. Old record has to be deleted on update

 I guess, using an MV here is more efficient as there is less roundtrip
 between C* and the app to do all that and the MV does not require strong
 consistency as MV updates are always local and are eventual consistent when
 the base table is. So there is also no need for distributed locks.

 I ask all this as we now use CS 3.x and have been advised that 3.x is
 still not considered really production ready.

 I guess in a perfect world, this wouldn't even require an MV if SASI
 indexes could be created over more than 1 column. E.g. in MySQL this case
 is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering
 by Partition Key (which should to be done anyway) and sorting by a field
 would perfectly do the trick. But from the docs, this is not possible right
 now.

 Does anyone see a better solution or are all my assumptions correct?

 --
 Benjamin Roth
 Prokurist

 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 · Fax +49 7161 304880-1
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

>>>
>>>
>>
>


Re: Efficient model for a sorting

2016-10-04 Thread Dorian Hoxha
@DuyHai

What scatter/gather? If you partition your data by user_id then you query
only 1 shard to get sorted by time visitors for a user.

On Tue, Oct 4, 2016 at 2:09 PM, DuyHai Doan  wrote:

> MV is right now your best choice for this kind of sorting behavior.
>
> Secondary index (whatever the impl, SASI or Lucene) has a cost of
> scatter-gather if your cluster scales out. With MV you're at least
> guaranteed to hit a single node every time.
>
> On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha 
> wrote:
>
>> Can you use the lucene index
>> https://github.com/Stratio/cassandra-lucene-index ?
>>
>> On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth 
>> wrote:
>>
>>> Hi!
>>>
>>> I have a frequently used pattern which seems to be quite costly in CS.
>>> The pattern is always the same: I have a unique key and a sorting by a
>>> different field.
>>>
>>> To give an example, here a real life example from our model:
>>> CREATE TABLE visits.visits_in (
>>> user_id int,
>>> user_id_visitor int,
>>> created timestamp,
>>> PRIMARY KEY (user_id, user_id_visitor)
>>> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>>>
>>> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>>> SELECT user_id, created, user_id_visitor
>>> FROM visits.visits_in
>>> WHERE user_id IS NOT NULL AND created IS NOT NULL AND
>>> user_id_visitor IS NOT NULL
>>> PRIMARY KEY (user_id, created, user_id_visitor)
>>> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>>>
>>> This simply represents people, that visited my profile sorted by date
>>> desc but only one entry per visitor.
>>> Other examples with the same pattern could be a whats-app-like inbox
>>> where the last message of each sender is shown by date desc. There are lots
>>> of examples for that pattern.
>>>
>>> E.g. in redis I'd just use a sorted set, where the key could be like
>>> "visits_${user_id}", set key would be user_id_visitor and score
>>> the created timestamp.
>>> In MySQL I'd create the table with PK on user_id + user_id_visitor and
>>> create an index on user_id + created
>>> In C* i use an MV.
>>>
>>> Is this the most efficient approach?
>>> I also could have done this without an MV but then the situation in our
>>> app would be far more complex.
>>> I know that denormalization is a common pattern in C* and I don't
>>> hesitate to use it but in this case, it is not as simple as it's not an
>>> append-only case but updates have to be handled correctly.
>>> If it is the first visit of a user, it's that simple, just 2 inserts in
>>> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
>>> visit has to be deleted from the denormalized table before. Otherwise the
>>> visit would not be unique any more.
>>> Handling this case without an MV requires a lot more effort, I guess
>>> even more effort than just using an MV.
>>> 1. You need kind of app-side locking to deal with race conditions
>>> 2. Read before write is required to determine if an old record has to be
>>> deleted
>>> 3. At least CL_QUORUM is required to make sure that read before write is
>>> always consistent
>>> 4. Old record has to be deleted on update
>>>
>>> I guess, using an MV here is more efficient as there is less roundtrip
>>> between C* and the app to do all that and the MV does not require strong
>>> consistency as MV updates are always local and are eventual consistent when
>>> the base table is. So there is also no need for distributed locks.
>>>
>>> I ask all this as we now use CS 3.x and have been advised that 3.x is
>>> still not considered really production ready.
>>>
>>> I guess in a perfect world, this wouldn't even require an MV if SASI
>>> indexes could be created over more than 1 column. E.g. in MySQL this case
>>> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering
>>> by Partition Key (which should to be done anyway) and sorting by a field
>>> would perfectly do the trick. But from the docs, this is not possible right
>>> now.
>>>
>>> Does anyone see a better solution or are all my assumptions correct?
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>


Re: Efficient model for a sorting

2016-10-04 Thread DuyHai Doan
MV is right now your best choice for this kind of sorting behavior.

Secondary index (whatever the impl, SASI or Lucene) has a cost of
scatter-gather if your cluster scales out. With MV you're at least
guaranteed to hit a single node every time.

On Tue, Oct 4, 2016 at 1:56 PM, Dorian Hoxha  wrote:

> Can you use the lucene index
> https://github.com/Stratio/cassandra-lucene-index ?
>
> On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth 
> wrote:
>
>> Hi!
>>
>> I have a frequently used pattern which seems to be quite costly in CS.
>> The pattern is always the same: I have a unique key and a sorting by a
>> different field.
>>
>> To give an example, here a real life example from our model:
>> CREATE TABLE visits.visits_in (
>> user_id int,
>> user_id_visitor int,
>> created timestamp,
>> PRIMARY KEY (user_id, user_id_visitor)
>> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>>
>> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
>> SELECT user_id, created, user_id_visitor
>> FROM visits.visits_in
>> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
>> IS NOT NULL
>> PRIMARY KEY (user_id, created, user_id_visitor)
>> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>>
>> This simply represents people, that visited my profile sorted by date
>> desc but only one entry per visitor.
>> Other examples with the same pattern could be a whats-app-like inbox
>> where the last message of each sender is shown by date desc. There are lots
>> of examples for that pattern.
>>
>> E.g. in redis I'd just use a sorted set, where the key could be like
>> "visits_${user_id}", set key would be user_id_visitor and score
>> the created timestamp.
>> In MySQL I'd create the table with PK on user_id + user_id_visitor and
>> create an index on user_id + created
>> In C* i use an MV.
>>
>> Is this the most efficient approach?
>> I also could have done this without an MV but then the situation in our
>> app would be far more complex.
>> I know that denormalization is a common pattern in C* and I don't
>> hesitate to use it but in this case, it is not as simple as it's not an
>> append-only case but updates have to be handled correctly.
>> If it is the first visit of a user, it's that simple, just 2 inserts in
>> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
>> visit has to be deleted from the denormalized table before. Otherwise the
>> visit would not be unique any more.
>> Handling this case without an MV requires a lot more effort, I guess even
>> more effort than just using an MV.
>> 1. You need kind of app-side locking to deal with race conditions
>> 2. Read before write is required to determine if an old record has to be
>> deleted
>> 3. At least CL_QUORUM is required to make sure that read before write is
>> always consistent
>> 4. Old record has to be deleted on update
>>
>> I guess, using an MV here is more efficient as there is less roundtrip
>> between C* and the app to do all that and the MV does not require strong
>> consistency as MV updates are always local and are eventual consistent when
>> the base table is. So there is also no need for distributed locks.
>>
>> I ask all this as we now use CS 3.x and have been advised that 3.x is
>> still not considered really production ready.
>>
>> I guess in a perfect world, this wouldn't even require an MV if SASI
>> indexes could be created over more than 1 column. E.g. in MySQL this case
>> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering
>> by Partition Key (which should to be done anyway) and sorting by a field
>> would perfectly do the trick. But from the docs, this is not possible right
>> now.
>>
>> Does anyone see a better solution or are all my assumptions correct?
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


Re: Efficient model for a sorting

2016-10-04 Thread Dorian Hoxha
Can you use the lucene index
https://github.com/Stratio/cassandra-lucene-index ?

On Tue, Oct 4, 2016 at 1:27 PM, Benjamin Roth 
wrote:

> Hi!
>
> I have a frequently used pattern which seems to be quite costly in CS. The
> pattern is always the same: I have a unique key and a sorting by a
> different field.
>
> To give an example, here a real life example from our model:
> CREATE TABLE visits.visits_in (
> user_id int,
> user_id_visitor int,
> created timestamp,
> PRIMARY KEY (user_id, user_id_visitor)
> ) WITH CLUSTERING ORDER BY (user_id_visitor ASC)
>
> CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
> SELECT user_id, created, user_id_visitor
> FROM visits.visits_in
> WHERE user_id IS NOT NULL AND created IS NOT NULL AND user_id_visitor
> IS NOT NULL
> PRIMARY KEY (user_id, created, user_id_visitor)
> WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC)
>
> This simply represents people, that visited my profile sorted by date desc
> but only one entry per visitor.
> Other examples with the same pattern could be a whats-app-like inbox where
> the last message of each sender is shown by date desc. There are lots of
> examples for that pattern.
>
> E.g. in redis I'd just use a sorted set, where the key could be like
> "visits_${user_id}", set key would be user_id_visitor and score
> the created timestamp.
> In MySQL I'd create the table with PK on user_id + user_id_visitor and
> create an index on user_id + created
> In C* i use an MV.
>
> Is this the most efficient approach?
> I also could have done this without an MV but then the situation in our
> app would be far more complex.
> I know that denormalization is a common pattern in C* and I don't hesitate
> to use it but in this case, it is not as simple as it's not an append-only
> case but updates have to be handled correctly.
> If it is the first visit of a user, it's that simple, just 2 inserts in
> base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
> visit has to be deleted from the denormalized table before. Otherwise the
> visit would not be unique any more.
> Handling this case without an MV requires a lot more effort, I guess even
> more effort than just using an MV.
> 1. You need kind of app-side locking to deal with race conditions
> 2. Read before write is required to determine if an old record has to be
> deleted
> 3. At least CL_QUORUM is required to make sure that read before write is
> always consistent
> 4. Old record has to be deleted on update
>
> I guess, using an MV here is more efficient as there is less roundtrip
> between C* and the app to do all that and the MV does not require strong
> consistency as MV updates are always local and are eventual consistent when
> the base table is. So there is also no need for distributed locks.
>
> I ask all this as we now use CS 3.x and have been advised that 3.x is
> still not considered really production ready.
>
> I guess in a perfect world, this wouldn't even require an MV if SASI
> indexes could be created over more than 1 column. E.g. in MySQL this case
> is nothing else than a BTree. AFAIK SASI indices are also BTrees, filtering
> by Partition Key (which should to be done anyway) and sorting by a field
> would perfectly do the trick. But from the docs, this is not possible right
> now.
>
> Does anyone see a better solution or are all my assumptions correct?
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Efficient model for a sorting

2016-10-04 Thread Benjamin Roth
Hi!

I have a frequently used pattern which seems to be quite costly in CS. The
pattern is always the same: I have a unique key and a sorting by a
different field.

To give an example, here is a real-life one from our model:
CREATE TABLE visits.visits_in (
    user_id int,
    user_id_visitor int,
    created timestamp,
    PRIMARY KEY (user_id, user_id_visitor)
) WITH CLUSTERING ORDER BY (user_id_visitor ASC);

CREATE MATERIALIZED VIEW visits.visits_in_sorted_mv AS
    SELECT user_id, created, user_id_visitor
    FROM visits.visits_in
    WHERE user_id IS NOT NULL AND created IS NOT NULL
        AND user_id_visitor IS NOT NULL
    PRIMARY KEY (user_id, created, user_id_visitor)
    WITH CLUSTERING ORDER BY (created DESC, user_id_visitor DESC);

This simply represents the people who visited my profile, sorted by date
descending, with only one entry per visitor.
Other examples of the same pattern include a WhatsApp-like inbox where
the last message of each sender is shown by date descending. There are lots of
examples of that pattern.

E.g. in Redis I'd just use a sorted set, where the key could be something like
"visits_${user_id}", the member would be user_id_visitor and the score
the created timestamp.
In MySQL I'd create the table with a PK on user_id + user_id_visitor and
create an index on user_id + created.
In C* I use an MV.
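
To make the sorted-set idea concrete, here is a minimal in-memory sketch in
Python. It only illustrates the semantics (one entry per visitor, ordered by
timestamp) and is not Redis or Cassandra code; the class name LatestVisits and
its methods are made up for this example.

```python
class LatestVisits:
    """In-memory stand-in for one sorted set per profile owner (hypothetical)."""

    def __init__(self):
        # user_id -> {user_id_visitor: created}; one entry per visitor.
        self._scores = {}

    def record(self, user_id, user_id_visitor, created):
        # Writing an existing member simply replaces its score.
        self._scores.setdefault(user_id, {})[user_id_visitor] = created

    def newest_first(self, user_id, limit=10):
        # Order members by score (the created timestamp), highest first.
        members = self._scores.get(user_id, {})
        ranked = sorted(members.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:limit]


visits = LatestVisits()
visits.record(1, 20, 100)
visits.record(1, 30, 200)
visits.record(1, 20, 300)  # visitor 20 returns; the old entry is replaced
print(visits.newest_first(1))  # [(20, 300), (30, 200)]
```

The point of the sketch is that uniqueness (per visitor) and ordering (per
timestamp) live in one structure, which is what the MV provides on top of the
base table.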

Is this the most efficient approach?
I could also have done this without an MV, but then the situation in our app
would be far more complex.
I know that denormalization is a common pattern in C* and I don't hesitate
to use it, but this case is not that simple: it is not append-only, so
updates have to be handled correctly.
On the first visit of a user it's simple, just 2 inserts in
base table + denormalized table. But on a 2nd or 3rd visit, the 1st or 2nd
visit has to be deleted from the denormalized table first. Otherwise the
visit would not be unique any more.
Handling this case without an MV requires a lot more effort, I guess even
more effort than just using an MV.
1. You need some kind of app-side locking to deal with race conditions
2. A read before write is required to determine whether an old record has to be
deleted
3. At least CL_QUORUM is required to make sure that the read before write is
always consistent
4. The old record has to be deleted on update
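
The manual steps listed above could look roughly like this. It is a minimal
sketch in Python that uses in-memory structures in place of the two tables;
the VisitStore class and its method names are hypothetical, and a
single-process lock only stands in for whatever app-side locking would really
be needed. Step 3 (the consistency level) has no equivalent in an in-memory
sketch and is left out.

```python
import threading


class VisitStore:
    """In-memory stand-in for a base table plus a hand-maintained sorted table."""

    def __init__(self):
        self._lock = threading.Lock()  # step 1: app-side locking
        self.base = {}                 # (user_id, user_id_visitor) -> created
        self.by_created = set()        # (user_id, created, user_id_visitor)

    def record_visit(self, user_id, user_id_visitor, created):
        with self._lock:
            # Step 2: read before write to find a previous visit.
            previous = self.base.get((user_id, user_id_visitor))
            if previous is not None:
                # Step 4: delete the stale record from the denormalized table.
                self.by_created.discard((user_id, previous, user_id_visitor))
            self.base[(user_id, user_id_visitor)] = created
            self.by_created.add((user_id, created, user_id_visitor))

    def newest_first(self, user_id):
        rows = [row for row in self.by_created if row[0] == user_id]
        # Same ordering as the view: created DESC, user_id_visitor DESC.
        rows.sort(key=lambda row: (row[1], row[2]), reverse=True)
        return [(created, visitor) for _, created, visitor in rows]


store = VisitStore()
store.record_visit(1, 20, 100)
store.record_visit(1, 30, 200)
store.record_visit(1, 20, 300)  # second visit: the record at 100 is removed
print(store.newest_first(1))    # [(300, 20), (200, 30)]
```

Every write needs a read plus up to two mutations on the second table, which
is the extra round-tripping that the MV avoids.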

I guess using an MV here is more efficient, as there are fewer round trips
between C* and the app to do all that, and the MV does not require strong
consistency, as MV updates are always local and are eventually consistent when
the base table is. So there is also no need for distributed locks.

I ask all this as we now use CS 3.x and have been advised that 3.x is still
not considered really production ready.

I guess in a perfect world this wouldn't even require an MV if SASI
indexes could be created over more than 1 column. E.g. in MySQL this case
is nothing more than a BTree. AFAIK SASI indices are also BTrees, so filtering
by partition key (which should be done anyway) and sorting by a field
would do the trick perfectly. But from the docs, this is not possible right
now.

Does anyone see a better solution or are all my assumptions correct?

-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Cassandra data model right definition

2016-10-04 Thread Mehdi Bada
Hi all, 

Just to refocus the debate (because I'm at the origin of this very
interesting exchange).
I think that, for a good understanding of the data model of any DBMS, we
(technical experts) have to decompose the data objects of the model and understand
precisely how the data is stored and what kinds of mechanisms are used.
In this respect, I think Russell has described the situation very well, and we can
say that the Apache Cassandra data model can be defined as a Partitioned Row Store.

Many thanks for all your feedback and contributions

Best Regards 
Mehdi Bada 

--- 
Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 




From: "Edward Capriolo"  
To: "user"  
Sent: Monday, October 3, 2016 4:53:16 PM 
Subject: Re: Cassandra data model right definition 

My original point can be summed up as: 

Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include "like"
and "close relative".

For the specifics: 

Any relational db could (and I'm sure one does!) allow for sparse fields as 
well. MySQL can be backed by rocksdb now, does that make it not a row store? 

Let's draw some lines: a relational database is clearly defined.

https://en.wikipedia.org/wiki/Edgar_F._Codd 


Codd's theorem, a result proven in his seminal work on the relational model,
equates the expressive power of relational algebra and relational calculus
(both of which, lacking recursion, are strictly less powerful than first-order
logic). [citation needed]

As the relational model started to become fashionable in the early 1980s, Codd 
fought a sometimes bitter campaign to prevent the term being misused by 
database vendors who had merely added a relational veneer to older technology. 
As part of this campaign, he published his 12 rules to define what constituted 
a relational database. This made his position in IBM increasingly difficult, so 
he left to form his own consulting company with Chris Date and others. 

Cassandra is not a relational database. 



I have attempted to illustrate that a "row store" is defined as well. I do
not believe Cassandra is a "row store".

"Just because it uses log structured storage, sparse fields, and semi-flexible
collections doesn't disqualify it from calling it a "row store""

What is the definition of "row store"? Is it a logical construct or a physical
one?

Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and
present it as rows and columns. It seems to pass the litmus test being 
presented. 

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage 







On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad < j...@jonhaddad.com > wrote: 


Sorry Ed, but you're really stretching here. A table in Cassandra is structured 
by a schema with the data for each row stored together in each data file. Just 
because it uses log structured storage, sparse fields, and semi-flexible 
collections doesn't disqualify it from calling it a "row store" 

Postgres added flexible storage through hstore, I don't hear anyone arguing 
that it needs to be renamed. 

Any relational db could (and I'm sure one does!) allow for sparse fields as 
well. MySQL can be backed by rocksdb now, does that make it not a row store? 

You're arguing that everything is wrong but you're not proposing an 
alternative, which is not productive. 
On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo < edlinuxg...@gmail.com > wrote: 


Also, every piece of technical information that describes a row store

http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf 
https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems 

depicts it like this:
001:10,Smith,Joe,4;
002:12,Jones,Mary,5;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000; 


They never depict a scenario where the data looks like this on disk:

001:10,Smith 
001:10,4; 
Which is much closer to how Cassandra stores its data.
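
The difference between the two depictions can be sketched in a few lines of
Python. This only mirrors the illustration above and is not Cassandra's actual
on-disk format; the function names row_layout and cell_layout are made up for
the example.

```python
def row_layout(rows):
    """One entry per row: the row id, then every field stored together."""
    return [f"{rid}:{','.join(fields)};" for rid, fields in rows]


def cell_layout(rows):
    """One entry per non-key column: row id and key are repeated for each cell."""
    out = []
    for rid, fields in rows:
        key, *columns = fields
        out.extend(f"{rid}:{key},{value}" for value in columns)
    return out


rows = [("001", ["10", "Smith", "Joe", "4"])]
print(row_layout(rows))   # ['001:10,Smith,Joe,4;']
print(cell_layout(rows))  # ['001:10,Smith', '001:10,Joe', '001:10,4']
```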



On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith < bened...@apache.org > 
wrote: 


Absolutely. A "partitioned row store" is exactly what I would call it. As it 
happens, our README thinks the same, which is fantastic. 

I thought I'd take a look at the rest of our cohort, and didn't get far before 
disappointment. HBase literally calls itself a "column-oriented store" - which
is so totally wrong it's simultaneously hilarious and tragic. 

I guess we can't blame the wider internet for misunderstanding/misnaming us 
poor "wide column stores" if even one of the major examples doesn't know what 
it, itself, is! 




On 30 September 2016 at 21:47, Jonathan Haddad < j...@jonhaddad.com > wrote: 

+1000 to what Benedict says. I usually call it a "partitioned row store" which 
usually needs some extra explanation but is more accurate than "column family" 

Re: cassandra dump file path

2016-10-04 Thread Jean Carlo
Yes, we did it.

So if the parameter in cassandra-env.sh is used only when we have an OOM, what
is the definition of
*-XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof*
in /etc/init.d/cassandra for?
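
The behaviour seen in the log can be reproduced with a small Python sketch. It
assumes that when an option is repeated on the command line, the JVM uses the
last occurrence, which matches what was observed in this thread; the helper
name effective_option is hypothetical.

```python
def effective_option(jvm_args, prefix):
    """Return the value of the last argument starting with prefix, or None."""
    value = None
    for arg in jvm_args:
        if arg.startswith(prefix):
            # A later occurrence overrides an earlier one (assumed behaviour).
            value = arg[len(prefix):]
    return value


# Shortened version of the JVM arguments from the log quoted below.
args = [
    "-XX:+HeapDumpOnOutOfMemoryError",
    "-XX:HeapDumpPath=/cassandra/dumps/cassandra-1475461287-pid34435.hprof",
    "-Xss256k",
    "-XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof",
]
print(effective_option(args, "-XX:HeapDumpPath="))
# /var/lib/cassandra/java_1475461286.hprof
```

Under that assumption, whichever script appends its -XX:HeapDumpPath last
decides where the dump goes, regardless of what cassandra-env.sh sets.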


Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Tue, Oct 4, 2016 at 2:58 AM, Yabin Meng  wrote:

> Have you restarted Cassandra after making changes in cassandra-env.sh?
>
> Yabin
>
> On Mon, Oct 3, 2016 at 7:44 AM, Jean Carlo 
> wrote:
>
>> OK I got the response to one of my questions. In the script
>> /etc/init.d/cassandra we set the path for the heap dump by default in the
>> cassandra_home.
>>
>> Now the thing I don't understand is, why do the dumps are located by the
>> file set by /etc/init.d/cassandra and not by the  conf file
>> cassandra-env.sh?
>>
>> Anyone any idea?
>>
>>
>> Saludos
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>> On Mon, Oct 3, 2016 at 12:00 PM, Jean Carlo 
>> wrote:
>>
>>>
>>> Hi
>>>
>>> I see in the log of my node cassandra that the parameter
>>> -XX:HeapDumpPath is charged two times.
>>>
>>> INFO  [main] 2016-10-03 04:21:29,941 CassandraDaemon.java:205 - JVM
>>> Arguments: [-ea, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar,
>>> -XX:+CMSClassUnloadingEnabled, -XX:+UseThreadPriorities,
>>> -XX:ThreadPriorityPolicy=42, -Xms6G, -Xmx6G, -Xmn600M, 
>>> *-XX:+HeapDumpOnOutOfMemoryError,
>>> -XX:HeapDumpPath=/cassandra/dumps/cassandra-1475461287-pid34435.hprof*,
>>> -Xss256k, -XX:StringTableSize=103, -XX:+UseParNewGC,
>>> -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled,
>>> -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1,
>>> -XX:CMSInitiatingOccupancyFraction=30, -XX:+UseCMSInitiatingOccupancyOnly,
>>> -XX:+UseTLAB, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler,
>>> -XX:CMSWaitDuration=1, -XX:+CMSParallelInitialMarkEnabled,
>>> -XX:+CMSEdenChunksRecordAlways, -XX:CMSWaitDuration=1,
>>> -XX:+UseCondCardMark, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps,
>>> -XX:+PrintGCApplicationStoppedTime, 
>>> -Xloggc:/var/opt/hosting/log/cassandra/gc.log,
>>> -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=20,
>>> -XX:GCLogFileSize=20M, -Djava.net.preferIPv4Stack=true,
>>> -Dcom.sun.management.jmxremote.port=7199, 
>>> -Dcom.sun.management.jmxremote.rmi.port=7199,
>>> -Dcom.sun.management.jmxremote.ssl=false, 
>>> -Dcom.sun.management.jmxremote.authenticate=false,
>>> -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password,
>>> -Djava.io.tmpdir=/var/opt/hosting/db/cassandra/tmp,
>>> -javaagent:/usr/share/cassandra/lib/jolokia-jvm-1.0.6-agent.jar=port=8778,host=0.0.0.0,
>>> -Dcassandra.auth_bcrypt_gensalt_log2_rounds=4,
>>> -Dlogback.configurationFile=logback.xml, 
>>> -Dcassandra.logdir=/var/log/cassandra,
>>> -Dcassandra.storagedir=, 
>>> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid,
>>> *-XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof*,
>>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1475461286.log]
>>>
>>> This option is defined in cassandra-env.sh
>>>
>>> if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
>>> JVM_OPTS="$JVM_OPTS 
>>> -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date
>>> +%s`-pid$$.hprof"
>>> fi
>>>  and we defined before the value of CASSANDRA_HEAPDUMP_DIR before to
>>>
>>>
>>> */cassandra/dumps/*
>>> It is seems that cassandra does not care about the conf in
>>> cassandra-env.sh and he only takes in account the last set for HeapDumpPath
>>>
>>> */var/lib/cassandra/java_1475461286.hprof*
>>> This causes problems when we have to dump the heap because cassandra
>>> uses the disk not suitable to do it.
>>>
>>> Is  *XX:HeapDumpPath *set in another place/file that I dont know?
>>>
>>> Thxs
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>
>>
>