Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Ok, so I've found the source of the issue; it's pretty well hidden because it is NOT in the SASI source code directly. Here is the method where C* determines what kind of LIKE expression you're using (LIKE_PREFIX, LIKE_CONTAINS or LIKE_MATCHES)

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Currently SASI only understands % at the beginning (suffix search) or at the end (prefix search) of an expression. Any expression containing % in the middle, like %w%a%, will not be interpreted by SASI as a wildcard: %w%a% will translate into "give me all results containing 'w%a'".
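
A rough CQL illustration of the three forms SASI does understand (the users table and index below are hypothetical, for illustration only; CONTAINS mode makes all three query forms legal):

    CREATE TABLE users (id uuid PRIMARY KEY, name text);

    CREATE CUSTOM INDEX users_name_idx ON users (name)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'CONTAINS'};

    SELECT * FROM users WHERE name LIKE 'John%';   -- prefix search: names starting with John
    SELECT * FROM users WHERE name LIKE '%John';   -- suffix search: names ending with John
    SELECT * FROM users WHERE name LIKE '%John%';  -- contains search: names containing John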

Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Is there a lot of overhead in having a large number of columns in a table? Not unbounded, but say, would 2000 be a problem (I think that's the maximum I'll need)? Thank you

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread Mikhail Krupitskiy
Thank you for the investigation. I will wait for a fix and news. It's probably not a directly related question, but what do you think about CASSANDRA-12573? Let me know if it's better to create a separate thread for it. Thanks, Mikhail

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
There is no real limit in terms of the number of columns in a table; I would say that the impact of having a lot of columns is the amount of metadata C* needs to keep in memory for encoding/decoding each row. Now, if you have a table with 1000+ columns, the problem is probably your data model...

Is it ok to restart DECOMMISSION

2016-09-15 Thread laxmikanth sadula
I started decommissioning a node in our Cassandra cluster, but it's taking too long (more than 12 hrs), so I would like to restart it (stop/kill the node & run 'nodetool decommission' again). Will killing the node / stopping the decommission and then restarting the decommission cause any issues to the cluster?

Re: Read timeouts on primary key queries

2016-09-15 Thread Joseph Tech
I added the error logs and see that the timeouts are in a range between 2 and 7 s. Samples below:
Query error after 5354 ms: [4 bound values]
Query error after 6658 ms: [4 bound values]
Query error after 4596 ms: [4 bound values]
Query error after 2068 ms: [4 bound values]

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"The data model is too dynamic" --> then create a table to cope with dynamic data types. Example CREATE TABLE dynamic_data( object_id uuid, property_name text, property_type text, bool_value boolean, long_value bigint, decimal_value double, text_value text,

Re: Maximum number of columns in a table

2016-09-15 Thread sfesc...@gmail.com
Another possible alternative is to use a single map column.

Re: Is it ok to restart DECOMMISSION

2016-09-15 Thread Kaide Mu
As far as I know, restarting a decommission shouldn't cause any problems for your cluster, but please note that decommission is not resumable in your Cassandra version (resumable support will be introduced in 3.10), thus by restarting it you will restart the whole process.

Re: Is it ok to restart DECOMMISSION

2016-09-15 Thread sai krishnam raju potturi
Hi Laxmi, what's the size of data per node? If the data is really huge, then let the decommission process continue. Otherwise, stop the Cassandra process on the decommissioning node and, from another node in the datacenter, do a "nodetool removenode host-id". This might speed up the decommissioning.
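
Spelled out, that sequence would look roughly like this (the service command and the host ID are placeholders; details vary per installation):

    # On the decommissioning node: stop the Cassandra process
    sudo service cassandra stop

    # From another node in the datacenter: find the dead node's host ID...
    nodetool status

    # ...then remove it, and optionally check progress
    nodetool removenode <host-id>
    nodetool removenode status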

Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Since I will only have 1 table with that many columns, the other tables will be "normal" tables with max 30 columns, and the memory of 2K columns won't be that big, I'm gonna guess I'll be fine. The data model is too dynamic; the alternative would be to create a table for each user, which will

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"Another possible alternative is to use a single map column" --> how do you manage the different types then ? Because maps in Cassandra are strongly typed Unless you set the type of map value to blob, in this case you might as well store all the object as a single blob column On Thu, Sep 15,

Re: CASSANDRA-5376: CQL IN clause on last key not working when schema includes set,list or map

2016-09-15 Thread Samba
Any update on this issue? The quoted JIRA issue (CASSANDRA-5376) is resolved as fixed in 1.2.4, but it is still not possible (even in 3.7) to use the IN operator in queries that fetch collection columns. Was the fix only to report a better error message that this is not possible, or was it fixed then
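
For context, the query shape being discussed looks something like this (hypothetical schema, for illustration only):

    CREATE TABLE events (
        pk   int,
        ck   int,
        tags set<text>,
        PRIMARY KEY (pk, ck)
    );

    -- IN on the last clustering key while selecting a collection column:
    SELECT pk, ck, tags FROM events WHERE pk = 1 AND ck IN (1, 2, 3);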

Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
@DuyHai Yes, that's another case, the "entity" model used in RDBMSs. But I need rows together to work with them (indexing etc.). @sfespace The map is needed when you have a dynamic schema. I don't have a dynamic schema (I may have one, and will use the map if I do). I just have thousands of schemas.

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"But I need rows together to work with them (indexing etc)" What do you mean rows together ? You mean that you want to fetch a single row instead of 1 row per property right ? In this case, the map might be the solution: CREATE TABLE generic_with_maps( object_id uuid boolean_map map

Re: Maximum number of columns in a table

2016-09-15 Thread sfesc...@gmail.com
I agree a single blob would also work (I do that in some cases). The reason for the map is if you need more flexible updating. I think your solution of a map per data type works well.

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Ok, coming back to the issue with %w%a%: this will be interpreted first by the CQL parser as LIKE_CONTAINS with searched term = w%a, and then things get complicated. 1) If you're using NonTokenizingAnalyzer or NoOpAnalyzer, everything is fine; the % in 'w%a' is interpreted as a simple literal and
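
For reference, the analyzer in question is picked when the SASI index is created; a sketch (table and column names are hypothetical, the option values are the stock SASI analyzer classes):

    CREATE CUSTOM INDEX ON users (name)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {
        'mode': 'CONTAINS',
        'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
        'case_sensitive': 'false'
    };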

Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
Yes, that makes more sense. But the problem is that I can't use secondary indexing ("where int25=5"), while with normal columns I can.

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
"But the problem is I can't use secondary indexing "where int25=5", while with normal columns I can." You have many objectives that contradict themselves in term of impl. Right now you're unlucky, SASI does not support indexing collections yet (it may come in future, when ? ¯\_(ツ)_/¯ ) If

Re: CASSANDRA-5376: CQL IN clause on last key not working when schema includes set,list or map

2016-09-15 Thread Tyler Hobbs
That ticket was just to improve the error message. From the comments on the ticket: "Unfortunately, handling collections is slightly harder than what CASSANDRA-5230 aimed for, because we can't do a name query. So this will have to wait for

Re: Is it ok to restart DECOMMISSION

2016-09-15 Thread Mark Rose
I've done that several times. Kill the process, restart it, let it sync, decommission. You'll need enough space on the receiving nodes for the full set of data, on top of the other data that was already sent earlier, plus room to clean up/compact it. Before you kill it, check system.log to see if it

Re: race condition for quorum consistency

2016-09-15 Thread Alexander Dejanovski
Indeed, I wasn't very accurate in my first answer, which was misleading. Apache Cassandra guarantees that if all queries are run at least at quorum, a client that writes successfully (as in, the cluster acknowledged the write) and then reads its previous write will see the correct value, unless
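
The overlap argument in numbers (a sketch, assuming RF = 3 and both operations at QUORUM):

    RF = 3
    QUORUM = floor(RF/2) + 1 = 2
    replicas written (2) + replicas read (2) = 4 > RF (3)
    => the read set and the write set share at least one replica,
       so a successful QUORUM write is visible to a QUORUM read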

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
Hi, The ‘old-fashioned’ secondary indexes do support indexing collection values: https://docs.datastax.com/en/cql/3.1/cql/ddl/ddlIndexColl.html Br, Hannu
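
From that page, the index variants for a map look roughly like this (hypothetical users table; note the default index on a map targets its values):

    CREATE TABLE users (id uuid PRIMARY KEY, tags map<text, text>);

    CREATE INDEX ON users (tags);        -- indexes the map's values
    SELECT * FROM users WHERE tags CONTAINS 'blue';

    CREATE INDEX ON users (KEYS(tags));  -- indexes the map's keys
    SELECT * FROM users WHERE tags CONTAINS KEY 'color';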

Re: Maximum number of columns in a table

2016-09-15 Thread Hannu Kröger
I do agree on that.

Re: Maximum number of columns in a table

2016-09-15 Thread Dorian Hoxha
@DuyHai I know they don't support that. I need key+value mapping, not just "values" or just "keys". I'll use the Lucene index.
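
Worth noting for completeness: since Cassandra 2.2 the native secondary index also has an ENTRIES variant that covers exactly the key+value case (same performance caveats as above; this reuses the hypothetical users/tags map sketched earlier):

    CREATE INDEX ON users (ENTRIES(tags));
    SELECT * FROM users WHERE tags['color'] = 'blue';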

Re: Maximum number of columns in a table

2016-09-15 Thread DuyHai Doan
I'd advise anyone against using the old native secondary index ... You'll get poor performance (that's the main reason why some people developed SASI).

Re: Streaming Process: How can we speed it up?

2016-09-15 Thread Vasileios Vlachos
Hello, and thanks for your responses. OK, so increasing stream_throughput_outbound_megabits_per_sec makes no difference. Any ideas why streaming is limited to only two of the three nodes available? As an alternative to slow streaming I tried this:
- install C* on a new node, stop the service
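
For reference, that streaming cap can also be inspected and changed at runtime, without editing cassandra.yaml (values in megabits/s; 0 disables throttling):

    nodetool getstreamthroughput
    nodetool setstreamthroughput 400
    nodetool setstreamthroughput 0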

Re: Streaming Process: How can we speed it up?

2016-09-15 Thread Ben Slater
We’ve successfully used the rsync method you outline quite a few times in situations where we’ve had clusters that take forever to add new nodes (mainly due to secondary indexes) and need to do a quick replacement for one reason or another. As you mention, the main disadvantage we ran into is

Re: Streaming Process: How can we speed it up?

2016-09-15 Thread Vasileios Vlachos
Thanks for sharing your experience, Ben