Re: Does Cassandra support ACID txn

2018-04-19 Thread Sylvain Lebresne
On Thu, Apr 19, 2018 at 11:13 AM Rajesh Kishore 
wrote:

> Thanks for the response. Let me put my question again wrt an example
>
> I want to perform an atomic txn, say insert/delete/update, on a set of tables
> TableA
> TableB
> TableC
>
> When these are performed as batch operations and, let us say, something goes
> wrong while doing the operation on TableC,
> would the system roll back the operations done for TableA and TableB?
>

No, batches have no rollback whatsoever.

Batches[1] do use a distributed log, however, which makes it so that the
operation on TableC will be retried internally until it succeeds. So the
guarantee you do get is that if some operations of a batch are applied, then
all of them will *eventually* get applied. There is no isolation, however,
and you can very well observe a state where you can read the operations on
TableA and TableB without seeing the ones on TableC because they haven't
been retried yet.
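
For illustration, a minimal sketch of such a logged batch (the tables,
columns, and values are made up for the example):

  BEGIN BATCH
    -- Either none of these mutations is applied, or all of them eventually
    -- are; a reader may still transiently see the first two without the
    -- third.
    INSERT INTO ks.tablea (id, val) VALUES (1, 'a');
    INSERT INTO ks.tableb (id, val) VALUES (1, 'b');
    INSERT INTO ks.tablec (id, val) VALUES (1, 'c');
  APPLY BATCH;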


> -Rajesh
>
>
>
> On Thu, Apr 19, 2018 at 1:25 PM, Jacques-Henri Berthemet <
> jacques-henri.berthe...@genesys.com> wrote:
>
>> Cassandra supports LWT (lightweight transactions); you may find this doc
>> interesting:
>>
>>
>> https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlDataConsistencyTOC.html
>>
>>
>>
>> In any case, LWT or BATCH, you won't have external control over the tx;
>> it's either done or not done. In case of a timeout you won't have a way to
>> know whether it worked or not.
>>
>> There is no way to rollback a statement/batch, the only way is to send an
>> update to modify the partition to its previous state.
>>
>>
>>
>> Regards,
>>
>> *--*
>>
>> *Jacques-Henri Berthemet*
>>
>>
>>
>> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
>> *Sent:* Thursday, April 19, 2018 9:10 AM
>> *To:* user 
>> *Subject:* Re: Does Cassandra support ACID txn
>>
>>
>>
>> No ACID transactions anytime soon in Cassandra
>>
>>
>>
>> On Thu, Apr 19, 2018 at 7:35 AM, Rajesh Kishore 
>> wrote:
>>
>> Hi,
>>
>> I am a bit confused by reading different articles; does a recent version
>> of Cassandra support ACID transactions?
>>
>> I found BATCH command , but not sure if it supports rollback, consider
>> that transaction I am going to perform would be on single partition.
>>
>> Also, what are the limitations if any?
>>
>>
>>
>> Thanks,
>>
>> Rajesh
>>
>>
>>
>
>


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Sylvain Lebresne
>
> I have to disagree with people here and point out that just creating
> JIRAs and (trying to) have discussions about these issues will not lead to
> change in any reasonable timeframe, because everyone who could do the work
> has an endless list of bigger fish to fry. I strongly encourage you to get
> involved and write some code, or pay someone to do it, because to put it
> bluntly, it's *very* unlikely your JIRAs will get actioned unless you
> contribute significantly to them yourself.
>

Though I don't truly disagree with the overall point that getting into code
is the surest way to get something you care about to see progress, I'd love
for this to not be understood as "we don't care about your idea unless you
bring code". There have been tons of JIRA tickets in the past suggesting
improvements where some contributor said "you know what, that's a good
idea" and implemented it. I've certainly seen it happen numerous times and,
trust me, I did it a lot as well (and sure, it happens disproportionately
more for small improvements than for let's-rewrite-the-whole-database ones,
for obvious reasons hopefully).

So if you have a relatively concrete idea for an improvement, I'd say,
please, share it. Don't get me wrong though: please do your homework first
and take a few minutes googling/JIRA-searching to see if it hasn't been
discussed before; don't assume your time is more valuable than that of other
contributors. It's rude to assume so (I'd say in general, but even more so
because it's free-as-in-beer software).

That said, and to paraphrase what others have said, one should always come
to this with a few understandings:
- For all that people may like your idea and have the time to help it get
in, there is no guarantee here. And yes, more often than not, contributors
already have a list of things they want to fix and only a finite amount of
time for contributions, so the bar for your idea to make it onto some other
contributor's "list" is probably high. And remember that behavioral science
strongly suggests that you thinking your ideas are obviously the most
important ones likely involves a fair amount of bias. That's why
contributing the code yourself, if possible, definitely helps a lot.
- A distributed database is not exactly simple software. In particular,
Cassandra makes the choice to be fully distributed, which is a clear
trade-off: it gives it very interesting properties (scalability, fault
tolerance, ...) almost for free, but it makes some things quite a bit more
challenging. My point being, some things may look like easy problems to
solve on the surface, but are in fact more complex than they appear (which
in turn means solving them takes much more time than it seems, and we get
back to contribution time/effort not being infinite). So it's imo a good
idea to seek first to understand why things are a certain way rather than
assume that contributors don't care.
- Cassandra is not perfect, no software is, but don't assume contributors
are not aware of the weaknesses. We are, for the most part. So if those
weaknesses are still there, it's generally (there are of course exceptions)
due to some combination of 1) a lack of time, 2) the difficulties of
solving those weaknesses (without creating new, worse ones) and 3) some
actually well-thought-out trade-off (we accept that weakness as the price
for other strengths). As such, if you come simply pointing out deficiencies,
you may feel like you are pointing out things nobody knows, but chances are,
you aren't. You're probably just reminding contributors how frustrating it
is that they don't have time to solve everything. Pointing out deficiencies
is OK, but unless you take the time to offer some constructive steps to
improve as well, it's often useless, to be honest.

--
Sylvain


Proposal for deprecating/removing the read_repair_chance/dclocal_read_repair_chance table options

2017-09-29 Thread Sylvain Lebresne
We are considering deprecating and then ultimately removing the two
table options 'read_repair_chance' and 'dclocal_read_repair_chance'.
The rationale and many more details are on CASSANDRA-13910
(https://issues.apache.org/jira/browse/CASSANDRA-13910), so I won't
repeat it here.
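
For reference, a hedged sketch of what opting out ahead of the removal could
look like (hypothetical keyspace/table names; 0 disables the probabilistic
read repair these options control):

  ALTER TABLE ks.mytable
    WITH read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.0;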

The goal of this email is to raise awareness of this intention for
those that don't follow JIRA closely. In particular, if those options
are really important to you, we'd love to hear more about why that is.
In any case, if you have feedback, either answer this email or comment
on the JIRA ticket.

--
Sylvain




Re: WriteTimeoutException with LWT after few milliseconds

2017-04-19 Thread Sylvain Lebresne
This is https://issues.apache.org/jira/browse/CASSANDRA-9328, and I'd rather
not repeat myself here, so I'll let you read the details. Let's maybe not
open another JIRA ticket though, since we have this one.

On Wed, Apr 19, 2017 at 4:29 PM, benjamin roth  wrote:

> Thanks, Jeff!
>
> As soon as I have some spare time I will try to reproduce and open a Jira
> for it.
>
> 2017-04-19 16:27 GMT+02:00 Jeff Jirsa :
>
>>
>>
>> On 2017-04-13 05:13 (-0700), benjamin roth  wrote:
>> > I found out that if the WTEs occur, there was already another process
>> > inserting the same primary key because I found duplicates in some places
>> > that perfectly match the WTE logs.
>> >
>> > Does anybody know why this throws a WTE instead of returning
>> > '[applied]' = false?
>> > This is quite confusing!
>> >
>>
>> Certainly seems wrong. May want to open a JIRA, especially if it's
>> reproducible. Should mention what version and client you're using.
>>
>>
>


Re: Count(*) is not working

2017-02-20 Thread Sylvain Lebresne
I guess I misspoke, sorry. It is true that count(), like any other query, is
still governed by the read timeout, and any count that has to process a lot
of data will take a long time and will require a high timeout to not time
out (true of every aggregation query, as it happens).

I guess I responded too quickly to "if you want a reliable count" because
for a while count() was actually OOMing for large partitions/queries, which
is not true anymore, and my brain made a connection that wasn't there since
that's not what Kurt said.

So sorry for the noise.

On Mon, Feb 20, 2017 at 2:47 PM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> +1 I also encountered timeouts many many times (using DS DevCenter).
> Roughly this occurred when count(*) > 1.000.000
>
> 2017-02-20 14:42 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
>
>> Seems worth it to file a bug since some here are under the impression it
>> almost always works and others are under the impression it almost never
>> works.
>>
>> On Friday, February 17, 2017, kurt greaves <k...@instaclustr.com> wrote:
>>
>>> really... well that's good to know. it still almost never works though.
>>> i guess every time I've seen it it must have timed out due to tombstones.
>>>
>>> On 17 Feb. 2017 22:06, "Sylvain Lebresne" <sylv...@datastax.com> wrote:
>>>
>>> On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves <k...@instaclustr.com>
>>> wrote:
>>>
>>>> if you want a reliable count, you should use spark. performing a count
>>>> (*) will inevitably fail unless you make your server read timeouts and
>>>> tombstone fail thresholds ridiculous
>>>>
>>>
>>> That's just not true. count(*) is paged internally, so while it is not
>>> particularly fast, it shouldn't require bumping either the read timeout or
>>> the tombstone fail threshold in any way to work.
>>>
>>> In that case, it seems the partition does have many tombstones (more
>>> than live rows) and so the tombstone threshold is doing its job of warning
>>> about it.
>>>
>>>
>>>>
>>>> On 17 Feb. 2017 04:34, "Jan" <j...@dafuer.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> could you post the output of nodetool cfstats for the table?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Jan
>>>>>
>>>>> Am 16.02.2017 um 17:00 schrieb Selvam Raman:
>>>>>
>>>>> I am not getting the count as a result. Instead I keep on getting a
>>>>> number of results like those below.
>>>>>
>>>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>>>> LIMIT 100 (see tombstone_warn_threshold)
>>>>>
>>>>> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten <j...@dafuer.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> did you get a result finally?
>>>>>>
>>>>>> Those messages are simply warnings telling you that C* had to read
>>>>>> many tombstones while processing your query - rows that are deleted but
>>>>>> not yet garbage collected/compacted. This warning gives you some
>>>>>> explanation why things might be much slower than expected: per 100 live
>>>>>> rows counted, C* had to read about 15 times as many rows that were
>>>>>> already deleted.
>>>>>>
>>>>>> Apart from that, count(*) is almost always slow - and there is a
>>>>>> default limit of 10,000 rows in a result.
>>>>>>
>>>>>> Do you really need the actual live count? To get an idea you can
>>>>>> always look at nodetool cfstats (but those numbers also contain deleted
>>>>>> rows).
>>>>>>
>>>>>>
>>>>>> Am 16.02.2017 um 13:18 schrieb Selvam Raman:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to know the total records count in table.
>>>>>>
>>>>>> I fired the below query:
>>>>>>select count(*) from tablename;
>>>>>>
>>>>>> and i have got the below output
>>>>>>
>>>>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>>>>> keysace.table WHERE token(id) > 
>>>>>> token(test:ODP0144-0883E-022R-002/047-052)
>>>>>> LIMIT 100 (see tombstone_warn_threshold)
>>>>>>
>>>>>> Read 100 live rows and 1435 tombstone cells for query SELECT * FROM
>>>>>> keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see
>>>>>> tombstone_warn_threshold)
>>>>>>
>>>>>> Read 96 live rows and 1385 tombstone cells for query SELECT * FROM
>>>>>> keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see
>>>>>> tombstone_warn_threshold).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can you please help me to get the total count of the table.
>>>>>>
>>>>>> --
>>>>>> Selvam Raman
>>>>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Selvam Raman
>>>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Count(*) is not working

2017-02-17 Thread Sylvain Lebresne
On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves  wrote:

> if you want a reliable count, you should use spark. performing a count (*)
> will inevitably fail unless you make your server read timeouts and
> tombstone fail thresholds ridiculous
>

That's just not true. count(*) is paged internally, so while it is not
particularly fast, it shouldn't require bumping either the read timeout or
the tombstone fail threshold in any way to work.

In that case, it seems the partition does have many tombstones (more than
live rows) and so the tombstone threshold is doing its job of warning about
it.
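
For illustration, a minimal sketch of what "paged internally" means
(hypothetical keyspace/table): the coordinator walks the matching rows page
by page and returns a single aggregated row, so, as said above, no timeout
or threshold needs bumping for this to work:

  SELECT count(*) FROM ks.events WHERE k = 'some-partition';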


>
> On 17 Feb. 2017 04:34, "Jan"  wrote:
>
>> Hi,
>>
>> could you post the output of nodetool cfstats for the table?
>>
>> Cheers,
>>
>> Jan
>>
>> Am 16.02.2017 um 17:00 schrieb Selvam Raman:
>>
>> I am not getting the count as a result. Instead I keep on getting a
>> number of results like those below.
>>
>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>> LIMIT 100 (see tombstone_warn_threshold)
>>
>> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten  wrote:
>>
>>> Hi,
>>>
>>> did you get a result finally?
>>>
>>> Those messages are simply warnings telling you that C* had to read many
>>> tombstones while processing your query - rows that are deleted but not yet
>>> garbage collected/compacted. This warning gives you some explanation why
>>> things might be much slower than expected: per 100 live rows counted, C*
>>> had to read about 15 times as many rows that were already deleted.
>>>
>>> Apart from that, count(*) is almost always slow - and there is a default
>>> limit of 10,000 rows in a result.
>>>
>>> Do you really need the actual live count? To get an idea you can always
>>> look at nodetool cfstats (but those numbers also contain deleted rows).
>>>
>>>
>>> Am 16.02.2017 um 13:18 schrieb Selvam Raman:
>>>
>>> Hi,
>>>
>>> I want to know the total records count in table.
>>>
>>> I fired the below query:
>>>select count(*) from tablename;
>>>
>>> and i have got the below output
>>>
>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>> LIMIT 100 (see tombstone_warn_threshold)
>>>
>>> Read 100 live rows and 1435 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see
>>> tombstone_warn_threshold)
>>>
>>> Read 96 live rows and 1385 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see
>>> tombstone_warn_threshold).
>>>
>>>
>>>
>>>
>>> Can you please help me to get the total count of the table.
>>>
>>> --
>>> Selvam Raman
>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>
>>>
>>
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>>
>>


Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Sylvain Lebresne
On Thu, Feb 9, 2017 at 10:52 AM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> Ok got it.
>
> But it's interesting that this is supported:
> DELETE/SELECT FROM ks.cf WHERE (pk1) IN ((1), (2), (3));
>
> This is technically mostly the same (Token awareness,
> coordination/routing, read performance, ...), right?
>

It is. That's what I meant by "there is something to be said for the
consistency of the CQL language in general". In other words, don't look for
an external logical reason for this being unsupported; it's unsupported
simply due to how the CQL code evolved. But as I said, we didn't fix that
inconsistency because we're all busy and it's not really that important in
practice. The project of course welcomes any contributions though :)


>
> 2017-02-09 10:43 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:
>
>> This is a statement on multiple partitions and there is really no
>> optimization the code internally does on that. In fact, I strongly advise
>> you to not use a batch but rather simply do a for loop client side and send
>> the statements individually. That way, your driver will be able to use
>> proper token-awareness for each request (while if you send a batch, one
>> coordinator will be picked and will have to forward most statements,
>> doing more network hops at the end of the day). The only case where using a
>> batch is indeed legit is if you care about all the statements being atomic,
>> but in that case it's a logged batch you want.
>>
>> That's btw more or less why we never bothered implementing that: it's
>> totally doable technically, but it's not really such a good idea
>> performance-wise in practice most of the time, and you can easily work
>> around it with a batch if you need atomicity.
>>
>> Which is not saying it will never be or shouldn't be supported, btw;
>> there is something to be said for the consistency of the CQL language in
>> general. But it's why no one took the time to do it so far.
>>
>> On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>> wrote:
>>
>>> Yes, that's the workaround - I'll try that.
>>>
>>> Would you agree it would be better for internal optimizations to process
>>> this within a single statement?
>>>
>>> 2017-02-09 10:32 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>>
>>>> Yep, that makes it clear. I think an unlogged batch of prepared
>>>> statements with one statement per PK tuple would be roughly equivalent? And
>>>> probably no more complex to generate in the client?
>>>>
>>>> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth <benjamin.r...@jaumo.com>
>>>> wrote:
>>>>
>>>>> Maybe that makes it clear:
>>>>>
>>>>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2),
>>>>> (1, 3), (2, 3), (3, 4));
>>>>>
>>>>> If you want to delete or select a bunch of records identified by their
>>>>> multi-partitionkey tuples.
>>>>>
>>>>> 2017-02-09 10:18 GMT+01:00 Ben Slater <ben.sla...@instaclustr.com>:
>>>>>
>>>>> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
>>>>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>>>>
>>>>> Cheers
>>>>> Ben
>>>>>
>>>>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth <benjamin.r...@jaumo.com>
>>>>> wrote:
>>>>>
>>>>> Hi Guys,
>>>>>
>>>>> CQL says this is not allowed:
>>>>>
>>>>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>>>>
>>>>> 1. Is there a reason for it? There shouldn't be a performance penalty,
>>>>> it is a PK lookup, the same thing works with a single pk column
>>>>> 2. Is there a known workaround for it?
>>>>>
>>>>> It would be a big help for daily business; IMHO it's a waste of
>>>>> resources to run multiple queries just to fetch a bunch of records
>>>>> by a PK.
>>>>>
>>>>> Thanks in advance for any reply
>>>>>
>>>>> --
>>>>> Benjamin Roth
>>>>> Prokurist
>>>>>
>>>>> Jaumo GmbH · www.jaumo.com
>>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>>>

Re: DELETE/SELECT with multi-column PK and IN

2017-02-09 Thread Sylvain Lebresne
This is a statement on multiple partitions and there is really no
optimization the code internally does on that. In fact, I strongly advise
you to not use a batch but rather simply do a for loop client side and send
the statements individually. That way, your driver will be able to use
proper token-awareness for each request (while if you send a batch, one
coordinator will be picked and will have to forward most statements,
doing more network hops at the end of the day). The only case where using a
batch is indeed legit is if you care about all the statements being atomic,
but in that case it's a logged batch you want.

That's btw more or less why we never bothered implementing that: it's
totally doable technically, but it's not really such a good idea
performance-wise in practice most of the time, and you can easily work
around it with a batch if you need atomicity.
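
As a sketch of that workaround for the DELETE from this thread: one
single-partition statement per PK tuple, wrapped in a logged batch only if
you need the atomicity (the key values are made up):

  BEGIN BATCH
    -- Each statement hits exactly one partition, so each is itself atomic;
    -- the batchlog only adds the all-or-nothing guarantee across them.
    DELETE FROM ks.cf WHERE partitionkey1 = 1 AND partitionkey2 = 2;
    DELETE FROM ks.cf WHERE partitionkey1 = 1 AND partitionkey2 = 3;
  APPLY BATCH;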

Which is not saying it will never be or shouldn't be supported, btw; there
is something to be said for the consistency of the CQL language in general.
But it's why no one took the time to do it so far.

On Thu, Feb 9, 2017 at 10:36 AM, Benjamin Roth 
wrote:

> Yes, that's the workaround - I'll try that.
>
> Would you agree it would be better for internal optimizations to process
> this within a single statement?
>
> 2017-02-09 10:32 GMT+01:00 Ben Slater :
>
>> Yep, that makes it clear. I think an unlogged batch of prepared
>> statements with one statement per PK tuple would be roughly equivalent? And
>> probably no more complex to generate in the client?
>>
>> On Thu, 9 Feb 2017 at 20:22 Benjamin Roth 
>> wrote:
>>
>>> Maybe that makes it clear:
>>>
>>> DELETE FROM ks.cf WHERE (partitionkey1, partitionkey2) IN ((1, 2), (1,
>>> 3), (2, 3), (3, 4));
>>>
>>> If you want to delete or select a bunch of records identified by their
>>> multi-partitionkey tuples.
>>>
>>> 2017-02-09 10:18 GMT+01:00 Ben Slater :
>>>
>>> Are you looking this to be equivalent to (PK1=1 AND PK2=2) or are you
>>> looking for (PK1 IN (1,2) AND PK2 IN (1,2)) or something else?
>>>
>>> Cheers
>>> Ben
>>>
>>> On Thu, 9 Feb 2017 at 20:09 Benjamin Roth 
>>> wrote:
>>>
>>> Hi Guys,
>>>
>>> CQL says this is not allowed:
>>>
>>> DELETE FROM ks.cf WHERE (pk1, pk2) IN ((1, 2));
>>>
>>> 1. Is there a reason for it? There shouldn't be a performance penalty,
>>> it is a PK lookup, the same thing works with a single pk column
>>> 2. Is there a known workaround for it?
>>>
>>> It would be a big help for daily business; IMHO it's a waste of
>>> resources to run multiple queries just to fetch a bunch of records
>>> by a PK.
>>>
>>> Thanks in advance for any reply
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>> --
>>> 
>>> Ben Slater
>>> Chief Product Officer
>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>> +61 437 929 798
>>>
>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>> --
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Benefit of LOCAL_SERIAL consistency

2016-12-08 Thread Sylvain Lebresne
On Fri, Dec 9, 2016 at 1:35 AM, Edward Capriolo 
wrote:
>
> I copied the wrong issue:
>
> The core issue was this:
> https://issues.apache.org/jira/browse/CASSANDRA-6123
>

Well, my previous remark applies equally well to this ticket, so let me just
copy-paste:
"That ticket has nothing to do with LWT. In fact, LWT is the one mechanism
in Cassandra where this ticket has no impact whatsoever, because the whole
point of the mechanism is to ensure timestamps are assigned in a
collision-free manner."


> Which I believe was one of the key "call me maybe" Created issues.
>

The "call me maybe" blog post was not at exclusively about LWT and its
linearizability, so 6123 may have been an issue created following that
post, but it's
unrelated to LWT and again, don't affect it.


>
> 6123 references this: https://issues.apache.org/jira/browse/CASSANDRA-8892
>
>
> Which duplicates:
>
> https://issues.apache.org/jira/browse/CASSANDRA-6123
>
> So it is unclear to me what was resolved.
>

Again, it's unrelated to LWT, but I don't think there is anything unclear
here: 6123 is not resolved, as indicated by the JIRA "resolution". Maybe
what you found unclear is that CASSANDRA-8892 has the status "resolved",
but that's just JIRA ugliness here: marking a ticket "duplicate" implies
"resolving" it as far as JIRA is concerned, so when you see "duplicate",
you should basically ignore that ticket and only look at the ticket it
duplicates.


Re: Benefit of LOCAL_SERIAL consistency

2016-12-08 Thread Sylvain Lebresne
> The reason you don't want to use SERIAL in multi-DC clusters

I'm not a fan of blanket statements like that. There is a high cost to
SERIAL consistency in multi-DC setups, but if you *need* global
linearizability, then you have no choice, and the latency may be acceptable
for your use case. Take the example of using LWT to ensure no 2 users create
accounts with the same name in your system: it's something you don't want to
screw up, but it's also something for which a high-ish latency is probably
acceptable. I don't think users would get super pissed off because
registering a new account on some service takes 500ms.

So yes it's costly, as is most anything that willingly depends on cross-DC
latency, but I don't think that means it's never ever useful.
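
For that account example, a hedged sketch with a hypothetical table (SERIAL
CONSISTENCY is the cqlsh directive; drivers expose an equivalent setting):

  SERIAL CONSISTENCY SERIAL;
  -- The IF NOT EXISTS check is linearizable across all datacenters, at the
  -- price of cross-DC round-trips for the Paxos phases.
  INSERT INTO accounts (name, owner_id) VALUES ('alice', 42) IF NOT EXISTS;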

> So, I am not sure what a good use case for LOCAL_SERIAL is.

Well, a good use case is when you're OK with operations within a datacenter
being linearizable, but can accept that 2 operations in different
datacenters are not. Imagine a service that pins a given user to a DC on
login for various reasons; that service might be fine using LOCAL_SERIAL for
operations confined to a given user session since it knows they are DC-local.
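
And a sketch of that pinned-session case (again with made-up names):

  SERIAL CONSISTENCY LOCAL_SERIAL;
  -- The Paxos round only involves replicas of the local DC, so the
  -- condition is linearizable within that DC only.
  UPDATE sessions SET state = 'active' WHERE user_id = 42 IF state = 'idle';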

So I think both SERIAL and LOCAL_SERIAL have their uses, though we
absolutely agree they are not meant to be used together. And it's certainly
worth trying to design your system in a way that makes sure LOCAL_SERIAL is
enough for you, if you can, since SERIAL is pretty costly. But that doesn't
mean there aren't cases where you care more about global linearizability
than latency: engineering is all about trade-offs.

> I am not sure what the state of this is anymore but I was under the
> impression the linearizability of LWT was in question. I never heard it
> specifically addressed.

That's a pretty vague statement to make; let's not get into FUD. You "might"
be thinking of a fairly old blog post by Aphyr that tested LWT in their very
early days, and there were bugs indeed, but those were fixed a long time
ago. Since then, his tests and much more were performed
(http://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen)
and no problem with linearizability that I know of has been found. Don't get
me wrong, any code can have subtle bugs and not finding problems doesn't
guarantee there isn't one, but if someone has demonstrated legit problems
with the linearizability of LWT, it's unknown to me and I'm watching this
pretty carefully.

I'll note, to be complete, that I'm not pretending the LWT implementation is
perfect, it's not (it's slow for one), and using them correctly can be more
challenging than it may sound at first (mostly because you need to handle
query timeouts properly and that's not always simple, sometimes requiring a
more complex data model than you'd want), but those are not breaks of
linearizability.

> https://issues.apache.org/jira/browse/CASSANDRA-6106

That ticket has nothing to do with LWT. In fact, LWT is the one mechanism in
Cassandra where this ticket has no impact whatsoever, because the whole
point of the mechanism is to ensure timestamps are assigned in a
collision-free manner.


On Thu, Dec 8, 2016 at 8:32 AM, Hiroyuki Yamada  wrote:

> Hi DuyHai,
>
> Thank you for the comments.
> Yes, that's exactly what I mean.
> (Your comment is very helpful to support my opinion.)
>
> As you said, SERIAL with multi-DCs incurs a latency increase,
> but it's a trade-off between latency and high availability because one
> DC can go down from a disaster.
> I don't think there is any way to achieve global linearizability
> without a latency increase, right?
>
> > Edward
> Thank you for the ticket.
> I'll read it through.
>
> Thanks,
> Hiro
>
> On Thu, Dec 8, 2016 at 12:01 AM, Edward Capriolo 
> wrote:
> >
> >
> > On Wed, Dec 7, 2016 at 8:25 AM, DuyHai Doan 
> wrote:
> >>
> >> The reason you don't want to use SERIAL in multi-DC clusters is the
> >> prohibitive cost of lightweight transactions (in terms of latency),
> >> especially if your data centers are separated by continents. A ping from
> >> London to New York takes 52ms just by speed of light in optic cable.
> >> Since a lightweight transaction involves 4 network round-trips, it means
> >> at least 200ms just for raw network transfer, not even taking into
> >> account the cost of processing the operation
> >>
> >> You're right to raise a warning about mixing LOCAL_SERIAL with SERIAL.
> >> LOCAL_SERIAL guarantees you linearizability inside a DC, SERIAL
> >> guarantees you linearizability across multiple DCs.
> >>
> >> If I have 3 DCs with RF = 3 each (total 9 replicas) and I did an INSERT
> >> IF NOT EXISTS with LOCAL_SERIAL in DC1, then it's possible that a
> >> subsequent INSERT IF NOT EXISTS on the same record succeeds when using
> >> SERIAL because SERIAL on 9 replicas = at least 5 replicas. Those 5
> >> replicas which respond can come from DC2 and DC3 and thus did not apply
> >> yet 

Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Sylvain Lebresne
On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo 
wrote:

>
> I am not sure you saw my reply on the thread, but I believe everyone's
> needs can be met. I will copy that here:
>

I saw it, but the real problem that was raised initially was not that of
UDFs and of allowing both behaviors. It's a matter of people being confused
by the behavior of a non-UDF function, now(), and suggesting it should be
changed.

The Hive idea is interesting I guess, and we can switch to discussing that,
but it's a different problem really and I'm not fond of derailing threads.
I will just note though that if we're not talking about a confusion issue
but rather about how to get a timeuuid to be fixed within a statement, then
there is a much, much more trivial solution: generate it client side. The
`now()` function is a small convenience, but there is nothing you cannot do
without it client side, and that actually basically stands for almost any
use of (non-aggregate) functions in Cassandra currently.


>
>
> "Food for thought: Hive's UDFs introduced an annotation
> @UDFType(deterministic = false)
>
> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map
> -and-reduce-side-in-hive/
>
> The effect is the query planner can see when such a UDF is in use and
> determine the value once at the start of a very long query."
>
> Essentially Hive had a similar if not identical problem: during a
> long-running distributed process like map/reduce, some users wanted the
> semantics of:
>
> 1) Each call should have a new timestamp
>
> While other users wanted the semantics of:
>
> 2) Each call should generate the same timestamp
>
> The solution implemented was to add an annotation to udf such that the
> query planner would pick up the annotation and act accordingly.
>
> (Here is a related issue https://issues.apache.org/jira/browse/HIVE-1986
>
> As a result you can essentially implement two UDFS
>
> @UDFType(deterministic = false)
> public class UDFNow
>
> and for the other people
>
> @UDFType(deterministic = true)
> public class UDFNowOnce extends UDFNow
>
> Both user cases are met in a sensible way.
>


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Sylvain Lebresne
One can of course always open a JIRA, but I'm going to strongly disagree
with a change here (outside of a documentation one, that is).

The now() function is a timeuuid generator, and it thus generates a unique
timeuuid on every call, as specified by the timeuuid spec. I'll note that
the documentation lists it under "Timeuuid functions", and has sentences
like "the value returned by now() is guaranteed to be unique", so while I'm
sure the documentation can be further clarified, I think it's pretty clear
it's not the now() of SQL, and getting unique values on every call shouldn't
be *that* surprising.

Also, now() was primarily meant for use on timeuuid clustering columns for a
time-series like table, something like:
  CREATE TABLE ts (
    k int,
    t timeuuid,
    v text,
    PRIMARY KEY (k, t)
  )
and if you use it multiple times in a batch, this would look something like:
  BEGIN BATCH
    INSERT INTO ts (k, t, v) VALUES (0, now(), 'foo');
    INSERT INTO ts (k, t, v) VALUES (0, now(), 'bar');
  APPLY BATCH
and you definitely want that to insert 2 "events", not just one.

This is also why changing the behavior of this method *would* be a breaking
change.

Another reason this works the way it does is that functions in CQL are just
that, functions. Each execution is unique, and they have no notion of being
executed in the same statement/batch/whatever. I actually think this is
sensible, assuming one stops being obsessed with what other databases that
aren't Apache Cassandra do.

I will note that Ben seems to suggest keeping the return of now() unique
across calls while keeping the time component equal, thus varying the rest
of the uuid bytes. However:
 - I'm starting to wonder what this would buy us. Why would someone be super
   confused by the time changing across calls (in a single statement/batch),
   but be totally not confused by the actual full return not being equal?
   And how is that actually useful: you're getting different results anyway
   and you're letting the server pick the timestamp in the first place, so
   you're probably not caring about millisecond precision of that timestamp
   in the first place.
 - This would basically be a violation of the timeuuid spec.
 - This would be a big pain in the code and make now() a special case among
   functions. I'm unconvinced special cases make things easier in general.

So I'm all for improving the documentation if this confuses users due to
expectations (mistakenly) carried over from prior experiences, and please
feel free to open a JIRA for that. I'm a lot less in agreement that there
is something wrong with the way the function behaves in principle.

> I can see why this issue has been largely ignored and hasn't had a chance
> for the behaviour to be formally defined

Don't make too many assumptions. The behavior is perfectly well defined:
now() is a "normal" function and is evaluated whenever it's called,
according to the timeuuid spec (or as close to it as we can make it).

On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth 
wrote:

> Great comment. +1
>
> Am 01.12.2016 06:29 schrieb "Ben Bromhead" :
>
>> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
>> statement (and possible extend to batch statements).
>>
>> The values of now() should be the same if you assume that now() works like
>> it does in relational databases such as Postgres or MySQL; however, at the
>> moment it instead works like sysdate() in MySQL. Given that CQL is supposed
>> to be SQL-like, I think the assumption around the behaviour of now() was a
>> fair one to make.
>>
>> I definitely agree that raising a jira ticket would be a great place to
>> discuss what the behaviour of now() should be for Cassandra. Personally I
>> would be in favour of seeing the deterministic component (the actual time
>> part) being the same across multiple calls in the one statement or multiple
>> statements in a batch.
>>
>> Cassandra documentation does not make any claims as to how now() works
>> within a single statement, and reading the code shows the intent is to
>> work like sysdate() from MySQL rather than now(). One of the identified
>> dangers of making cql similar to sql is that, while yes it aids adoption,
>> users will find that SQL like things don't behave as expected. Of course as
>> a user, one shouldn't have to read the source code to determine correct
>> behaviour.
>>
>> Given that a timeuuid is made up of deterministic and (pseudo)
>> non-deterministic components I can see why this issue has been largely
>> ignored and hasn't had a chance for the behaviour to be formally defined
>> (you would expect now to return the same time in the one statement despite
>> multiple calls, but you wouldn't expect the same behaviour for say a call
>> to rand()).
>>
>>
>>
>>
>>
>>
>>
>> On Wed, 30 Nov 2016 at 19:54 Cody Yancey  wrote:
>>
>>> This is not a bug, and in fact changing it would be a serious bug.
>>>
>>> 

Re: Issue with Unexpected exception

2016-11-03 Thread Sylvain Lebresne
From the trace, "Connection reset by peer" simply means the client
disconnected, which isn't necessarily a problem/abnormal per se (and if it
is, it sounds more like a client issue than anything else). That said, I'm
not sure why 3.0.8 logs this at INFO now, as that's not really a problem, so
if you can reproduce on 3.0.9 too, feel free to open a JIRA ticket. In any
case, it's kind of harmless unless you have other symptoms of something not
working.

On Thu, Nov 3, 2016 at 4:59 PM, Oleg Krayushkin 
wrote:

> Hi, about a month ago I already asked about my problem here (with subject
> "Error while read after upgrade from 2.2.7 to 3.0.8") and also on
> Stack Overflow. Unfortunately, I still didn't find a solution.
>
> It's "Unexpected exception" -- maybe it's a good idea to file an issue
> for it? ...or is it my mistake somewhere?
>
> Thanks
> --
>
> Oleg Krayushkin
>


Re: Question about end of life support for Apache Cassandra 2.1 and 2.2

2016-09-01 Thread Sylvain Lebresne
"unsupported" means there won't be any more release, so no more patches
whatsoever and you're on your own.

On Thu, Sep 1, 2016 at 12:09 AM, Anmol Sharma 
wrote:

> According to the download page,
> Apache Cassandra 2.1 is supported with critical fixes only till Nov 2016,
> and Apache Cassandra 2.2 is supported till Nov 2016.
>
> I wanted to know what is the policy for such "unsupported" versions,
> especially related to kernel vulnerabilities / security threats from
> dependent libraries that are discovered after a project has reached the
> "unsupported" stage?
>
> Will the upstream versions of Apache Cassandra 2.1 and 2.2 still receive
> security updates / patches or is it entirely up to the end users to fix
> these?
>
> Thanks,
> Anmol
>


Re: testing retry policy

2016-09-01 Thread Sylvain Lebresne
On Wed, Aug 31, 2016 at 6:56 PM, Jimmy Lin  wrote:

> hi all,
> I have some customized retry policies that I want to test.
> In my single-node local cluster, is there any way to simulate the
> read/write timeout and/or unavailable exception?
> I tried to kill the Cassandra process, but it doesn't result in an
> unavailable exception but a no-host-available exception, and so doesn't
> go through the retry policy logic
>

This is really more a driver question, and you should ideally use the
mailing list of whatever driver you have written your policy for for such
questions. But read/write timeout or unavailable exceptions just cannot
happen on a single-node cluster since those are errors sent by the server,
and if you kill your only server, there is no one to send you the error. So
you'll need at least 2 nodes to test those.


[ANNOUNCEMENT] Website update

2016-07-29 Thread Sylvain Lebresne
Wanted to let everyone know that if you go to the Cassandra website
(cassandra.apache.org), you'll notice that there have been some changes.
Outside of a facelift, the main change is a much improved documentation
section (http://cassandra.apache.org/doc/). As indicated, that documentation
is a work in progress and still has a few missing sections. That
documentation is maintained in-tree, and contributions (through JIRA, as
any other contribution) are more than welcome.

Best,
On behalf of the Apache Cassandra developers.


Re: Wireshark and CQL

2016-06-06 Thread Sylvain Lebresne
Good stuff, thanks for sharing.

On Sun, Jun 5, 2016 at 12:45 PM, Benoît Canet 
wrote:

>
> Hi List,
>
> I am from ScyllaDB and took some time to iterate on the
> Wireshark CQL dissector that was previously written by
> Aaron Ten Clay.
>
> The result is that Wireshark upstream now has a fully working
> CQL v3 dissector merged in the following commit:
> https://github.com/wireshark/wireshark/commit/69a258514762405ce06ea5f65b7e8671743b65a1
>
> It should be available in the daily builds.
>
> We may evolve it in the future.
>
> Best regards
>
> Benoît Canet
> ScyllaDB
>


Re: Proper use of COUNT

2016-04-19 Thread Sylvain Lebresne
> Except for relatively small or narrow queries, it seems to have a
> propensity for timing out.

For recent enough versions of C*, it shouldn't, since it pages internally
(it will be slow, and always will be, but it shouldn't time out if some
decent page size is used, which should be the default). I suspect reports
of it timing out either come from old versions (or from unreasonable page
size values, but that sounds less likely since I'd assume users would
easily find and fix their error in that case).

But if the query is timing out unreasonably for large partitions in recent
versions, then it's a bug and a JIRA can be opened with reproduction steps.

--
Sylvain



Re: Clustering key and secondary index behavior changed between 2.0.11 and 3.3.0

2016-04-05 Thread Sylvain Lebresne
I'm surprised this would have fallen through the cracks, but that certainly
looks like a regression (a bug). If you can reproduce on 3.0.4 (just to make
sure we haven't fixed it recently), then please open a ticket in
https://issues.apache.org/jira/browse/CASSANDRA/ with your repro steps.

On Tue, Apr 5, 2016 at 10:47 AM, julien muller 
wrote:

> Hello,
>
> I noticed the following change in behavior while migrating from 2.0.11:
> Elements of the clustering key seem to not be secondary-indexable anymore.
> Could anyone give me an insight into this issue? Or point me to documents
> related to this change?
>
> There is a sample table with some data:
> CREATE TABLE table1 (
> name text,
> class int,
> inter text,
> power int,
> PRIMARY KEY (name, class, inter)
> ) WITH CLUSTERING ORDER BY (class DESC, inter ASC);
> INSERT INTO table1 (name, class, inter, power) VALUES ('R1',1, 'int1',13);
> INSERT INTO table1 (name, class, inter, power) VALUES ('R1',2, 'int1',18);
> INSERT INTO table1 (name, class, inter, power) VALUES ('R1',3, 'int1',37);
> INSERT INTO table1 (name, class, inter, power) VALUES ('R1',4, 'int1',49);
>
> In version 2.0.11, I used to have a secondary index on inter, that allowed
> me to make fast queries on this table:
> CREATE INDEX table1_inter ON table1 (inter);
> SELECT * FROM table1 where name='R1' AND class>0 AND class<4 AND
> inter='int1' ALLOW FILTERING;
>
> While testing on 3.3.0, I get the following message:
> *Clustering column "inter" cannot be restricted (preceding column "class"
> is restricted by a non-EQ relation)*
> It seems to only be considered as a key, and the index and ALLOW FILTERING
> are not taken into account anymore (as they were in 2.0.11).
>
> Further tests confused me, as I found an ugly workaround: if I duplicate
> the column inter as a regular column, I can simply query it with the
> secondary index and no ALLOW FILTERING. It looks like the behavior I would
> anticipate, and I do not understand why it does not work on inter, only
> because it is a clustering key.
>
> Any insight highly appreciated!
>
> Julien
>


Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Sylvain Lebresne
On Fri, Mar 11, 2016 at 5:09 PM, Kim Liu <k...@edgewaternetworks.com> wrote:

> Just for sake of clarification, then, what is the use-case for having UDFs
> in an UPDATE?
>

Honestly, it's merely there for convenience when you use things like cqlsh
for instance.


>
> If they cannot read data from the data store, then all of the parameters
> to the UDF must be supplied by the client, correct?
>

Correct (for UPDATE at least).


>
> If the client has all the parameters, the client could perform the
> equivalent of the UDF on the client side, first, then send the results to
> the server, instead of pushing the computation work onto the server.
>

Absolutely.


>  So I am curious as to what one is supposed to use a UDF in an UPDATE for.
>

Again, mainly a convenience.

The main end goal for UDFs is use in SELECT. It's already potentially
slightly useful to save server->client bandwidth, as you could imagine
doing:
  SELECT compute_md5(image) FROM images WHERE ...;
(assuming of course that trading CPU server side for bandwidth is a good
idea)

Though their most useful use will probably be in the WHERE clause, for
things like:
  SELECT * FROM foo WHERE sqrt(x) = 3;
and we also plan to have functional indexes to go with that.
However, I'll note right away that those last use cases are not yet
supported, but they will be eventually. Adding UDFs was more of a first
incremental step, and their most interesting use case is arguably not yet
supported. But as far as UPDATE is concerned, we'll probably never support
them there, as I said, since that would require a read-before-write (except
possibly for LWT, which does that read-before-write (at great cost) anyway;
we'll see).


>
>
>
> Long-winded explanation of the use-case I was poking at using UPDATE UDFs
> for below for the morbidly curious.
>
>
>
>
> That morbidly curious, huh?
>
> The scenario is, roughly, that the application receives a set of data
> which is broken up over, say, four messages (A,B,C,D).  However, the
> messages can arrive in any order, possibly with duplicates, and the data
> set is not complete until the all four messages are received.  There are
> multiple message receivers in order to scale to the volume of messages
> coming in, so each of the four messages per data set could arrive at any
> receiver (in any chronological pattern), and each receiving station would
> then insert the partial data into Cassandra.
>
> I looked at the Cassandra SET implementation, thinking that I could just
> add ‘A’, ‘B’, ‘C’, ‘D’ (or 1,2,3,4) to a set with a secondary index.  Then
> periodically search for where the set had all elements to spot which rows
> had a complete data set ready for processing.  However, there does not
> appear to be an equality check for SETs.  (Adding elements to a set is
> another place where UPDATE appears to allow for the “x = x 
> ” pattern which added to my confusion about using a UDF in the
> UPDATE.)
>
> So instead of using sets, the idea was to have a UDF perform a bit-wise OR
> operation.  Roughly:
>   CREATE FUNCTION bitwise_or( a int, b int ) CALLED ON NULL INPUT RETURNS
> int LANGUAGE java AS 'return Integer.valueOf((a == null ? 0 : a)|(b == null
> ? 0 : b));';
>
> Then as each message segment came in, I had intended, roughly:
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,2),
> data2=… ;
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,1),
> data1=… ;
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,8),
> data4=… ;
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,4),
> data3=… ;
>
> Then, with a secondary index on ‘messageComplete’, periodically scrape out
> all rows where messageComplete was equal to 15.  (At most, sixteen unique
> values in the secondary index.)  (And use a TTL to expire messages that did
> not eventually complete, etc.  Boilerplate infrastructure, etc.)
>
> This was based upon my incorrect assumption about UPDATE UDFs, since this
> looked like an optimal way to avoid having all the clients perform
> read-updates patterns and worrying about the clients stepping on each
> others data, as well as handling cases where duplicate messages were
> received by different receivers.  So it’s starting to look like I might
> need to use something else to perform the correlation between messages.
>
> —Kim
>
> From: Sylvain Lebresne <sylv...@datastax.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Friday, March 11, 2016 at 00:35
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Using User Defined Functions in UPDATE queries
>
> UDF are usable in UPDATE statement as actually trying them show

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Sylvain Lebresne
UDFs are usable in UPDATE statements, as actually trying them shows; it's
just the documented grammar that needs fixing.

But as far as doing something like:
  UPDATE test_table SET data=max_int(data,5) WHERE idx='abc';
this is indeed *not* supported and likely never will be. One big pillar of
C*'s design is that normal writes like this don't do a read-before-write,
both for performance and because of consistency constraints, so we can't
have updates depend on the previous value in any way.
I'll note that maybe that makes UDFs useless for you, and if so, I'm sorry,
but you just can't use UDFs in C* for that; you'd have to do a manual
read-before-write client side to achieve this.

For the sake of avoiding confusion, I will note that we do allow:
  UPDATE test_table SET c = c + 1 WHERE idx='abc';
if c is a counter, but that's a very special case. Counters have a
completely separate path and implementation, and do have a read-before-write
(and are slower than normal updates as a result).


On Thu, Mar 10, 2016 at 11:11 PM, Kim Liu 
wrote:

> It does sound like the use of UDFs in UPDATE is in an ambiguous state at
> the moment, then. The documented grammar says they can't be used, but the
> documentation examples say they can, and the server will execute them, but
> it can't execute them in a useful way (i.e. no row-supplied data).
>
> So essentially not usable at the moment, regardless of intent.
>
> Thanks,
> —Kim
>
> From: DuyHai Doan 
> Reply-To: "user@cassandra.apache.org" 
> Date: Thursday, March 10, 2016 at 14:03
> To: "user@cassandra.apache.org" 
> Subject: Re: Using User Defined Functions in UPDATE queries
>
> Surely an error because the grammar definition for UPDATE does not mention
> any function call:
> [SNIPPED]
>
> Unless the grammar in the doc is itself not up-to-date
>
>>
>>>


Re: Duplicated key with an IN statement

2016-02-04 Thread Sylvain Lebresne
That behavior has been changed in 2.2 and upwards. If you don't like it,
upgrade. In the meantime, it's probably not hard to avoid passing duplicate
keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE 
wrote:

> Hello,
>
>
>
> When running that kind of query with TRACING ON; I noticed the coordinator
> is also performing the same query multiple times
>
> Because the elements in the IN statement can involve many nodes, it makes
> sense to map/reduce the query, but running the same subquery multiple times
> should not happen. What if the result set changes? Let's imagine that
> query: SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while
> this query runs, the data for 123 changes?
>
>
>
> key | value
> -+---
> 123 |   456
> 123 |   456
> 123 |   456
> 123 |   789 <-- Change here
> 123 |   789
>
>
>
>
>
> There's also something very important: when your table defines a tuple as
> being unique for a specific key, it is a real problem to have a result set
> containing the same key multiple times when it should be unique. This is
> why this does not happen in any SQL implementation
>
>
> I think this is a bug
>
>
>
> Edouard COLE
>
>
>
>
>
> *De :* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Envoyé :* Thursday, February 04, 2016 11:55 AM
> *À :* Edouard COLE
> *Cc :* user@cassandra.apache.org
> *Objet :* Re: Duplicated key with an IN statement
>
>
>
> Hi,
>
>
>
> This is interesting.
>
>
>
> It seems rational that if you are looking at 2 keys and both exist (which
> is the case) it returns you 2 rows. Yet, I just checked this kind of
> command on MySQL and it gives a one-line result. So here CQL differs from
> SQL (at least MySQL). I know we are trying to fit as much as possible with
> SQL to avoid losing people, so we might want to change this.
>
> Not sure if this behavior is intentional / known. Not even sure someone
> ever tried to do this kind of query actually :).
>
>
>
> Does anyone know about that ? Should we raise a ticket ?
>
>
>
> -
>
> Alain Rodriguez
>
> France
>
>
>
> The Last Pickle
>
> http://www.thelastpickle.com
>
>
>
>
>
>
>
> 2016-02-04 8:36 GMT+00:00 Edouard COLE :
>
> Hello,
>
> I just discovered this, and I think this is weird:
>
> ed@debian:~$ cqlsh 192.168.10.8
> Connected to _CLUSTER_ at 192.168.10.8:9160.
> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
> 19.39.0]
> Use HELP for help.
> cqlsh> USE ks-test ;
> cqlsh:ks-test> CREATE TABLE t (
> ... key int,
> ... value int,
> ... PRIMARY KEY (key)
> ... );
> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
> cqlsh:ks-test> SELECT * FROM t ;
>
>  key | value
> -+---
>  123 |   456
>
> (1 rows)
>
> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>
>  key | value
> -+---
>  123 |   456
>  123 |   456 <- WTF?
>
> (2 rows)
>
> Adding the same key multiple times to an IN statement makes the query
> return the tuple multiple times
>
> This looks weird to me, can anyone give me some feedback on such a
> behavior?
>
> Edouard COLE
>
>
>


Re: How are timestamps selected for LWTs?

2016-02-02 Thread Sylvain Lebresne
On Tue, Feb 2, 2016 at 10:46 AM, Nicholas Wilson <
nicholas.wil...@realvnc.com> wrote:

> Hi,
>
> In the Cassandra docs I've read, it's not described how the timestamp is
> determined for LWTs. It's not possible to specify a timestamp with "USING
> TIMESTAMP ...", and my best guess is that in the "read" phase of the LWT
> (between propose and commit) the timestamp is selected based on the
> timestamps of the cells read. However, after reading through the source
> code (mainly StorageProxy::cas) I can't find any hint of that.
>

It's not exactly how it works, but it yields a somewhat equivalent result.
Internally, LWTs use a so-called "ballot", which is a timeuuid, and the
underlying algorithm basically guarantees that the order of commit of
operations is the order of their ballots. And the timestamp used for the
cells of a given operation is the timestamp part of that timeuuid ballot,
thus guaranteeing that this timestamp respects the order in which
operations are committed.

This is why you can't provide the timestamp client side: that timestamp is
picked server side and the value picked depends on when the operation is
committed.
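
One way to observe this, as a hedged cqlsh sketch over a hypothetical table
t(k int PRIMARY KEY, v text):

  INSERT INTO t (k, v) VALUES (0, 'x') IF NOT EXISTS;
  -- The timestamp reported below is the time component of the Paxos ballot,
  -- picked server side; USING TIMESTAMP is rejected for conditional updates.
  SELECT writetime(v) FROM t WHERE k = 0;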



>
> I'm worried about the following problem:
>
> Node A writes (using a LWT): UPDATE table SET val = 123, version = 2 WHERE
> key = 'foo' IF version = 1
> Node B writes (using a LWT): UPDATE table SET val = 234, version = 3 WHERE
> key = 'foo' IF version = 2
>
> If the first write is completed before the second, then both updates will
> be applied, but if Node B's clock is behind Node A's clock, then the second
> update would be effectively discarded if client-generated timestamps are
> used. It wouldn't take a big clock discrepancy, the HW clocks could in fact
> be perfectly in sync, but if the kernel ticks System.currentTimeMillis() at
> 15ms intervals it's quite possible for the two nodes to be 30ms out from
> each other.
>
> So, after the update query has "succeeded", do you need to do a read to
> find out whether it was actually applied? That would be surprising, since I
> can't find mention of it anywhere in the docs. You'd actually have to do a
> QUORUM read after every LWT update, just to find out whether your client
> chose the timestamp sensibly.
>
> The ideal thing would be if Cassandra chose the timestamp for the write,
> using the timestamp of the cells read during Paxos, to guarantee that
> writes are applied if the query condition holds, rather than leaving the
> potential for the query to succeed but do nothing if the cell already has a
> higher timestamp.
>
> If I've misunderstood, please do correct me!
>
> Thanks,
> Nicholas
>
> ---
> Nicholas Wilson
> Software developer
> RealVNC


Re: Java Driver Question

2016-02-02 Thread Sylvain Lebresne
As a side note, if your email subject is "Java Driver Question", then this
almost surely belongs to the java driver mailing list. Please try to respect
other subscribers by using the most appropriate mailing list when possible.

On Tue, Feb 2, 2016 at 5:01 PM, Richard L. Burton III 
wrote:

> Very nice - Thanks Jack. I was looking at the docs and Contact Points but
> didn't see this. I'll use DNS records to manage the main contact points and
> update the DNS when those servers change.
>
> We should catch up again soon. Last time was a few years ago at the bar
> with Jake.
>
> On Tue, Feb 2, 2016 at 10:58 AM, Jack Krupansky 
> wrote:
>
>> No need to restart. As per the doc for Node Discovery:
>> "The driver discovers the nodes that constitute a cluster by querying
>> the contact points used in building the cluster object. After this it is up
>> to the cluster's load balancing policy to keep track of node events (that
>> is add, down, remove, or up) by its implementation of the
>> Host.StateListener interface."
>>
>> See:
>>
>> http://docs.datastax.com/en/developer/java-driver/3.0/common/drivers/reference/nodeDiscovery_r.html
>>
>> That said, your client would need to be modified/reconfigured and
>> restarted if the contact points changed enough that none were accessible.
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, Feb 2, 2016 at 10:47 AM, Richard L. Burton III <
>> mrbur...@gmail.com> wrote:
>>
>>> In the case of adding more nodes to the cluster, would my application
>>> have to be restarted to detect the new nodes (as opposed to a node acting
>>> like a coordinator).
>>>
>>> e.g., Having the Java code connect using 3 known contact points and when
>>> a 4th and 5th node are added, the driver will become aware of these nodes
>>> without havng to be restarted?
>>>
>>> --
>>> -Richard L. Burton III
>>> @rburton
>>>
>>
>>
>
>
> --
> -Richard L. Burton III
> @rburton
>


Important notice for upgrades from 2.2.X to 3.Y

2016-01-07 Thread Sylvain Lebresne
The native protocol is the name we give to the protocol used between CQL
drivers and the server. That protocol is versioned and a new version,
version
4, was introduced in Cassandra 2.2.0. We recently uncovered a compatibility
bug
in that 4th version (https://issues.apache.org/jira/browse/CASSANDRA-10880)
that made said version of the protocol not fully compatible between 2.2.X
and
3.Y. As a consequence, you _must_ ensure that your clients use the protocol
version 3 if you plan an upgrade from any 2.2.X version to any 3.Y version.
Ensuring that might require forcing the protocol version in the client
driver used. For instance, in the DataStax Java driver, you can do so by calling
`.withProtocolVersion(ProtocolVersion.V3)` on your `Cluster.Builder` object.
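
A minimal sketch of that call (contact point illustrative):

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.ProtocolVersion;

  Cluster cluster = Cluster.builder()
      .addContactPoint("127.0.0.1")
      .withProtocolVersion(ProtocolVersion.V3) // pin v3 until all nodes run 3.Y
      .build();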

The bug in question affects the automatic paging of result sets that the
protocol provides: the first page of results is always sent correctly, but
requesting the next pages might result in a failure. This means that in
theory you can disregard this problem if you know that you are not using
said paging, but we still strongly encourage sticking to the protocol v3 for
the upgrade as the downsides are very minor (see below) and not worth taking
the risk.

The changes in the protocol v4 are relatively minor, and so forcing the use
of v3 has only relatively minor downsides, namely:
- the use of the recently added CQL types `date`, `time`, `tinyint` and
  `smallint` involves sending slightly bigger metadata in v3 than in v4. The
  resulting performance difference is unlikely to be noticeable.
- schema changes related to User Defined Functions are notified to clients as
  a "keyspace change" in v3 which, being imprecise, might require a client
  driver to request more schema metadata to update its own copy of said
  metadata. This is again a very minor inefficiency.
- the protocol v4 has a feature that allows the server to send warnings to
  the clients. This is as of yet little used by the server, and in the cases
  where it is, the warning is also logged server side.

Note that using the protocol v4 is fine once you have finished with your
upgrade and all your nodes are on 3.Y. The problem is only with clusters
mixing 2.2.X and 3.Y nodes.

--
The Cassandra dev team


Re: [Marketing Mail] can't make any permissions change in 2.2.4

2015-12-18 Thread Sylvain Lebresne
On Fri, Dec 18, 2015 at 8:55 AM, Reynald Bourtembourg <
reynald.bourtembo...@esrf.fr> wrote:

> This does not seem to be explained in the Cassandra 2.2 Upgrading section
> of the NEWS.txt file:
>
>
> 
> https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-2.2.4
>
>
This is indeed an oversight. Would you mind opening a JIRA ticket so we
don't forget to add it now?

--
Sylvain


Re: [Marketing Mail] Re: [Marketing Mail] can't make any permissions change in 2.2.4

2015-12-18 Thread Sylvain Lebresne
On Fri, Dec 18, 2015 at 3:04 PM, Kai Wang <dep...@gmail.com> wrote:

> Reynald,
>
> Thanks for link. That explains it.
>
> Sylvain,
>
> What exactly are the "legacy tables" I am supposed to drop? Before I drop
> them, is there any way I can confirm the old schema has been converted to
> the new one successfully?
>

I didn't work on those changes so I'm actually not sure of the exact
answer. But I see you commented on the ticket so we'll make sure to include
that information in the NEWS file (and maybe to get the blog post edited).


>
> Thanks.
>
>
> On Fri, Dec 18, 2015 at 5:05 AM, Reynald Bourtembourg <
> reynald.bourtembo...@esrf.fr> wrote:
>
>> Done:
>> https://issues.apache.org/jira/browse/CASSANDRA-10904
>>
>>
>>
>> On 18/12/2015 10:51, Sylvain Lebresne wrote:
>>
>> On Fri, Dec 18, 2015 at 8:55 AM, Reynald Bourtembourg <
>> reynald.bourtembo...@esrf.fr> wrote:
>>
>>> This does not seem to be explained in the Cassandra 2.2 Upgrading
>>> section of the NEWS.txt file:
>>>
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-2.2.4
>>>
>>>
>> This is indeed an oversight. Would you mind opening a JIRA ticket so we
>> don't forget to add it now?
>>
>> --
>> Sylvain
>>
>>
>>
>


Data loss bug in upgrade to 3.0

2015-12-14 Thread Sylvain Lebresne
We have discovered a critical bug for upgrades in the 3.0.0, 3.0.1 and 3.1
releases. This bug, https://issues.apache.org/jira/browse/CASSANDRA-10822,
only affects upgrades from a 2.x version (to any of the currently released
3.x versions) and might cause data loss. As such, we strongly advise waiting
on the fix for this issue to be released before upgrading to 3.x.

We are currently working on releasing new versions with a fix for that
issue. Those will be versions 3.0.2 and 3.1.1 and should be released before
the end of the week (as soon as we have properly tested them).

Let me say again that new 3.x clusters are not affected.

--
The Cassandra Team


Re: lightweight transactions with potential problem?

2015-08-26 Thread Sylvain Lebresne
 By the way, I do not understand why lightweight transactions in
 Cassandra have a commit/acknowledgment round-trip?

 For me, I think we can commit the value within the propose/accept phase. Do
 you agree? If you don't, can you explain why we need commit/acknowledgment?


No, the value cannot be committed during the propose/accept phase, because
nothing should be committed *before* the coordinator of the round has
received a QUORUM of accepts. In other words, you have to wait on the
result of the propose/accept to know if you can commit or not. This is not
at all specific to Cassandra btw: you'll find this in most Paxos
presentations, generally named the learning phase.

 --
Sylvain


Re: lightweight transactions with potential problem?

2015-08-26 Thread Sylvain Lebresne
On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos protocol
 sometimes called learning or decide phase, BUT this phase does not have
 acknowledgment round, just learning or decide message from the proposer to
 learners. So why we need acknowledgment round with commit phase in
 lightweight transactions?


It's not _needed_ as far as Paxos is concerned. But it's useful in the
context of Cassandra. The commit phase is about actually persisting to
replicas the update decided by the Paxos algorithm and thus making that
update visible to non-paxos reads. Being able to apply normal consistencies
to this phase is thus useful, since it allows users to get visibility
guarantees even for non-paxos reads if they so wish, and that's exactly
what we do and why we optionally wait on acknowledgments (and I say
optionally because how many acks we wait on depends on the user provided
consistency level, and if that's CL.ANY then the whole Paxos operation
actually returns without waiting on any of those acks).


Re: lightweight transactions with potential problem?

2015-08-26 Thread Sylvain Lebresne
Yes

On Wed, Aug 26, 2015 at 1:05 PM, ibrahim El-sanosi ibrahimsaba...@gmail.com
 wrote:

 OK. I see what the purpose of the acknowledgment round is here. So the
 acknowledgment is optional, depending on the CL setting, as we normally do in
 Cassandra.
 So we can say that acknowledgment is not really related to Paxos phase, it
 depends on CL in Cassandra?

 Ibrahim

 On Wed, Aug 26, 2015 at 11:50 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Wed, Aug 26, 2015 at 12:19 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Yes, Sylvain, your answer makes more sense. The phase is in Paxos
 protocol sometimes called learning or decide phase, BUT this phase does not
 have acknowledgment round, just learning or decide message from the
 proposer to learners. So why we need acknowledgment round with commit phase
 in lightweight transactions?


 It's not _needed_ as far as Paxos is concerned. But it's useful in the
 context of Cassandra. The commit phase is about actually persisting to
 replica the update decided by the Paxos algorithm and thus making that
 update visible to non paxos reads. Being able to apply normal consistencies
 to this phase is thus useful, since it allows user to get visibility
 guarantees even for non-paxos reads if they so wish, and that's exactly
 what we do and why we optionally wait on acknowledgments (and I say
 optionally because how many acks we wait on depends on the user provided
 consistency level and if that's CL.ANY then the whole Paxos operation
 actually return without waiting on any of those acks).






Re: lightweight transactions with potential problem?

2015-08-25 Thread Sylvain Lebresne


 So you meant that the older ballot will not only be rejected in round-trip 1
 (prepare/promise), it can also be rejected in propose/accept round-trip 2. Is
 that correct?


Yes.



 You Said : Or more precisely, you got step 8 wrong: when a replica
 PROMISE, the promise is not that they won't promise a ballot older than
 2,it's that they won't accept a ballot older than 2

 Why step 8 wrong? I think replicas can accept any highest ballot, so
 ballot 2 is the highest in step 8? what do you think?
  Do you also mean replica can promise older ballot.


I shouldn't have said "wrong". What I meant is that your description of
what a PROMISE meant was incomplete. It's true that in practice replicas
won't promise older ballots, but it's not the important property in this
case, the important property is that they also promise to not accept any
older ballot.



 I wish you could make it more clear.

 Thank you a lot Sylvain

 Ibrahim


 On Tue, Aug 25, 2015 at 1:40 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 That scenario cannot happen. More specifically, your step 12 cannot
 happen if step 8 has happened. Or more precisely, you got step 8 wrong: when
 a replica PROMISEs, the promise is not that they won't promise a ballot older
 than 2, it's that they won't accept a ballot older than 2. Therefore, after
 step 8, the accept from N1 will be rejected in step 12 and the insert from N1
 will be rejected (that is, N1 will restart the whole algorithm with a new
 ballot).


 On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sanosi 
 ibrahimsaba...@gmail.com wrote:

 Hi folks,


 Cassandra provides linearizable consistency (CAS, Compare-and-Set) by
 using Paxos with 4 round-trips, as follows:

 1.  Prepare/promise

 2.  Read/result

 3.  Propose/accept

 4.  Commit/acknowledgment

 Assume we have an application for registering new accounts; I want to
 make sure I only allow exactly one user to claim a given account. For
 example, we do not allow two users having the same username.

 Assuming we have a cluster consisting of 5 nodes N1, N2, N3, N4, and N5. We
 have two concurrent clients C1 and C2. We have replication factor 3 and the
 partitioner has determined the primary and the replicas nodes of the INSERT
 example are N3, N4, and N5.


 The scenario happens in following order:

 1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1
 is a username, not registered before)

 2.  N1 sends PREPARE message with ballot 1 (highest ballot have
 seen) to N3, N4 and N5. Note that this prepare for C1 and V1.

 3.  N3, N4 and N5 send a PROMISE message to N1, to not promise any
 with older than ballot 1.

 4.N1  sends READ message to N3, N4 and N5 to read V1.

 5.N3, N4 and N5 send RESULT message to N1, informing that V1 not
 exist which results in N1 will go forward to next round.

 6.  Now C2 connects to coordinator N2 and sends INSERT  V1.

 7.  N2 sends PREPARE message with ballot 2 (highest ballot after
 re-prepare because first time, N2 does not know about ballot 1, then
 eventual it solves and have ballot 2) to N3, N4 and N5. Note that this
 prepare for C2 and V1.

 8.  N3, N4 and N5 send a PROMISE message to N2, to not promise any
 with older than ballot 2.

 9.  N2  sends READ message to N3, N4 and N5 to read V1.

 10.   N3, N4 and N5 send RESULT message to N2, informing that V1 not
 exist which results in N2 will go forward to next round.

 11.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

 12.  N3, N4 and N5 send ACCEPT message to N1.

 13.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

 14.  N3, N4 and N5 send ACCEPT message to N2.

 15.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

 16.   N3, N4 and N5 send ACK message to N1.

 17.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

 18.  N3, N4 and N5 send ACK message to N2.


 As a result, both V1 from client C1 and V1 from client C2 have been written
 to replicas N3, N4, and N5, which I think does not achieve the goal of
 linearizable consistency and CAS.



 Is it true that such a scenario could occur?



 I look forward to hearing from you.


 Regards,






Re: lightweight transactions with potential problem?

2015-08-25 Thread Sylvain Lebresne
That scenario cannot happen. More specifically, your step 12 cannot happen
if step 8 has happened. Or more precisely, you got step 8 wrong: when a
replica PROMISEs, the promise is not that they won't promise a ballot older
than 2, it's that they won't accept a ballot older than 2. Therefore, after
step 8, the accept from N1 will be rejected in step 12 and the insert from
N1 will be rejected (that is, N1 will restart the whole algorithm with a new
ballot).
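
To make this concrete, the unique-username use case is what conditional
inserts are for; a minimal sketch (schema and values illustrative):

  CREATE TABLE users (username text PRIMARY KEY, owner text);
  INSERT INTO users (username, owner) VALUES ('v1', 'c1') IF NOT EXISTS;
  -- a concurrent identical INSERT gets [applied] = False back
  -- instead of silently overwriting the first write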


On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sanosi ibrahimsaba...@gmail.com
 wrote:

 Hi folks,


 Cassandra provides linearizable consistency (CAS, Compare-and-Set) by
 using Paxos with 4 round-trips, as follows:

 1.  Prepare/promise

 2.  Read/result

 3.  Propose/accept

 4.  Commit/acknowledgment

 Assume we have an application for registering new accounts; I want to make
 sure I only allow exactly one user to claim a given account. For example,
 we do not allow two users having the same username.

 Assuming we have a cluster consisting of 5 nodes N1, N2, N3, N4, and N5. We
 have two concurrent clients C1 and C2. We have replication factor 3 and the
 partitioner has determined the primary and the replicas nodes of the INSERT
 example are N3, N4, and N5.


 The scenario happens in following order:

 1.  C1 connects to coordinator N1 and sends INSERT  V1 (assume V1 is
 a username, not registered before)

 2.  N1 sends PREPARE message with ballot 1 (highest ballot have seen)
 to N3, N4 and N5. Note that this prepare for C1 and V1.

 3.  N3, N4 and N5 send a PROMISE message to N1, to not promise any
 with older than ballot 1.

 4.N1  sends READ message to N3, N4 and N5 to read V1.

 5.N3, N4 and N5 send RESULT message to N1, informing that V1 not
 exist which results in N1 will go forward to next round.

 6.  Now C2 connects to coordinator N2 and sends INSERT  V1.

 7.  N2 sends PREPARE message with ballot 2 (highest ballot after
 re-prepare because first time, N2 does not know about ballot 1, then
 eventual it solves and have ballot 2) to N3, N4 and N5. Note that this
 prepare for C2 and V1.

 8.  N3, N4 and N5 send a PROMISE message to N2, to not promise any
 with older than ballot 2.

 9.  N2  sends READ message to N3, N4 and N5 to read V1.

 10.   N3, N4 and N5 send RESULT message to N2, informing that V1 not
 exist which results in N2 will go forward to next round.

 11.   Now N1 send PROPOSE message to  N3, N4 and N5 (ballot 1, V1).

 12.  N3, N4 and N5 send ACCEPT message to N1.

 13.  N2 send PROPOSE message to  N3, N4 and N5 (ballot 2, V1).

 14.  N3, N4 and N5 send ACCEPT message to N2.

 15.  N1 send COMMIT message to  N3, N4 and N5 (ballot 1).

 16.   N3, N4 and N5 send ACK message to N1.

 17.   N2 send COMMIT message to  N3, N4 and N5 (ballot 2).

 18.  N3, N4 and N5 send ACK message to N2.


 As a result, both V1 from client C1 and V1 from client C2 have been written
 to replicas N3, N4, and N5, which I think does not achieve the goal of
 linearizable consistency and CAS.



 Is it true that such a scenario could occur?



 I look forward to hearing from you.


 Regards,



Re: JSON Cassandra 2.2 - insert syntax

2015-06-02 Thread Sylvain Lebresne
Well, your column is not called "address", it's called "addresses". It's
your type that is called "address".
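
For reference, a corrected version of the statement (assuming the schema
from your message) would be:

  INSERT INTO users JSON '{"id": 123, "name": "jbellis",
    "addresses": {"home": {"street": "123 Cassandra Dr", "city": "Austin",
                           "zip_code": 78747, "phones": ["2101234567"]}}}';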

On Tue, Jun 2, 2015 at 4:39 AM, Michel Blase mblas...@gmail.com wrote:

 Zach,

 this is embarrassing.you were right, I was running 2.1

 shame on me! but now I'm getting the error:

 InvalidRequest: code=2200 [Invalid query] message="JSON values map
 contains unrecognized column: address"
 any idea? This is the sequence of commands that I'm running:

 CREATE KEYSPACE json WITH REPLICATION = { 'class' :'SimpleStrategy',
 'replication_factor' : 1 };

 USE json;

 CREATE TYPE address (street text, city text, zip_code int, phones set<text>);

 CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
 frozen<address>>);

 INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home":
 {"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones":
 ["2101234567"]}}}';


 Consider that I'm running a just downloaded C2.2 instance (I'm on a mac)

 Thanks and sorry for the waste of time before!






 On Mon, Jun 1, 2015 at 7:10 PM, Zach Kurey zach.ku...@datastax.com
 wrote:

 Hi Michel,

 My only other guess is that you actually are running Cassandra 2.1, since
 that's the exact error I get if I try to execute a JSON statement against a
 version earlier than 2.2.



 On Mon, Jun 1, 2015 at 6:13 PM, Michel Blase mblas...@gmail.com wrote:

 Thanks Zach,

 tried that but I get the same error:

 SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query]
 message="line 1:24 mismatched input '{"id": 123,"name":
 "jbellis","address": {"home": {"street": "123 Cassandra Dr","city":
 "Austin","zip_code": 78747,"phones": ["2101234567"]}}}' expecting ')' (INSERT
 INTO users JSON  ['{"id": 123,"name": "jbellis","address": {"home":
 {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones":
 ["2101234567"]}}]}';)">

 On Mon, Jun 1, 2015 at 6:12 PM, Zach Kurey zach.ku...@datastax.com
 wrote:

 Looks like you have your use of single vs. double quotes inverted.
 What you want is:

 INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {
 "home": {"street": "123 Cassandra Dr", "city": "Austin", "zip_code":
 78747, "phones": ["2101234567"]}}}';

 HTH

 On Mon, Jun 1, 2015 at 6:03 PM, Michel Blase mblas...@gmail.com
 wrote:

 Hi all,

 I'm trying to test the new JSON functionalities in C* 2.2.

 I'm using this example:

 https://issues.apache.org/jira/browse/CASSANDRA-7970

 I believe there is a typo in the CREATE TABLE statement that requires
 frozen:

 CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
 frozen<address>>);

 but my real problem is in the insert syntax. I've found the CQL-2.2
 documentation and my best guess is this:

 INSERT INTO users JSON {'id': 123,'name': 'jbellis','address':
 {'home': {'street': '123 Cassandra Dr','city': 'Austin','zip_code':
 78747,'phones': [2101234567]}}};

 but I get the error:

 SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query]
 message="line 1:23 mismatched input '{'id': 123,'name':
 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city':
 'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')'
 (INSERT
 INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}]};)">


 Any idea?


 Thanks,

 Michael









Re: Batch isolation within a single partition

2015-05-19 Thread Sylvain Lebresne
On Tue, May 19, 2015 at 9:42 AM, DuyHai Doan doanduy...@gmail.com wrote:

 If RF > 1, the consistency level at QUORUM cannot guarantee strict
 isolation (for normal mutation or batch). If you look at this slide:
 http://www.slideshare.net/doanduyhai/cassandra-introduction-apache-con-2014-budapest/25,
 you can see that the mutation is sent by the coordinator, in parallel, to
 all replicas.

  Now it is very possible that due to network latency, the mutation is
 applied on the first replica and is applied with some delay (which can be
 at the order of microseconds) on other replicas.

  Theoretically, one client can read updated value on first replica and old
 value on the other replicas, even at QUORUM.


Unfortunately different people will tend to have different definitions of
isolation and I don't seem to have the same definition as you but still,
I don't understand what you're talking about. Of course replicas might not
get a mutation at the same time, and yes, a read at QUORUM may thus not see
the most up to date value from all replicas. But the coordinator resolves
all responses together and returns only the most recent one, so that doesn't
matter to the client and I don't see how that has anything to do with
isolation from the client perspective.

My response to the original question is that if by isolation you mean "can
a reader observe a write only partially applied", then for single partition
writes, Cassandra does offer isolation. One caveat however is that if 2
writes conflict, they are resolved using their timestamps, and if the
timestamps are the same, resolution is based on values, which is not
necessarily intuitive and may make it sound like the writes were not
applied in isolation (even though technically they are); see
https://issues.apache.org/jira/browse/CASSANDRA-6123 for details on that
latter problem. I'll note that my definition of isolation does not mean you
can't read stale data, and you can indeed if you use weak consistency
levels.
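
A minimal sketch of that same-timestamp caveat (table and values
illustrative):

  CREATE TABLE t (k int PRIMARY KEY, v text);
  INSERT INTO t (k, v) VALUES (0, 'a') USING TIMESTAMP 1000;
  INSERT INTO t (k, v) VALUES (0, 'b') USING TIMESTAMP 1000;
  -- on a timestamp tie the greater value wins, so v reads back as 'b'
  -- whatever order the two inserts were actually applied in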

If you mean something else by isolation, then I think agreeing first on the
definition would be wise.

--
Sylvain


Re: Data stax object mapping and lightweight transactions

2015-04-07 Thread Sylvain Lebresne
This is very much a Java Driver question so please try to use the java
driver mailing list (
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user)
for this type of question instead of the general Cassandra mailing list in
the future.

That said, to answer your question, no, the save operation will not use a
lightweight transaction. If you want to use LWT with the driver mapping,
you should look at the accessors (
http://docs.datastax.com/en/developer/java-driver/2.1/common/drivers/reference/accessorAnnotatedInterfaces.html
).

On Tue, Apr 7, 2015 at 1:46 AM, Sha Liu lius...@hotmail.com wrote:

 Hi,

 Does the latest Data Stax Java driver (2.1.5) support lightweight
 transactions using object mapping? For example, if I set the write
 consistency level of the mapped class to SERIAL through annotation, then
 does the “save” operation use lightweight transaction instead of a normal
 write?

 Thanks,
 Sha Liu


Re: sstable writer and creating bytebuffers

2015-03-31 Thread Sylvain Lebresne
On Tue, Mar 31, 2015 at 7:42 AM, Peer, Oded oded.p...@rsa.com wrote:

  Thanks Sylvain.

 Is there any way to create a composite key with only one column in
 Cassandra when creating a table, or should creating a CompositeType
 instance with a single column be prohibited?


It's hard to answer without knowing what you are trying to achieve.
Provided I don't misunderstand what you are asking, then yes, it's
technically possible, but it's hard to say how wise that is unless I know
more about your constraints/the reasons you're considering that. Let's say
that in general, if you have only a single column, then there aren't too
many reasons to use a CompositeType.

--
Sylvain





 *From:* Sylvain Lebresne [mailto:sylv...@datastax.com]
 *Sent:* Monday, March 30, 2015 1:57 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: sstable writer and creating bytebuffers



 No, it's not a bug. In a composite, every element starts with a 2-byte short
 indicating the length of the element, plus an extra trailing byte that is
 used for sorting purposes. A little bit more details can be found in the
 CompositeType class javadoc if you're interested. It's not the most compact
 format there is but changing it would break backward compatibility anyway.



 On Mon, Mar 30, 2015 at 12:38 PM, Peer, Oded oded.p...@rsa.com wrote:

 I am writing code to bulk load data into Cassandra using
 SSTableSimpleUnsortedWriter

 I changed my partition key from a composite key (long, int) to a single
 column key (long).

 For creating the composite key I used a CompositeType, and I kept using it
 after changing the key to a single column.

 My code didn’t work until I changed the way I create the ByteBuffer not to
 use CompositeType.



 The following code prints ‘false’.

 Do you consider this a bug?



   long val = 123L;

   ByteBuffer direct = bytes( val );

   ByteBuffer composite = CompositeType.getInstance(
 LongType.instance ).builder().add( bytes( val ) ).build();

   System.out.println( direct.equals( composite ) );







Re: sstable writer and creating bytebuffers

2015-03-30 Thread Sylvain Lebresne
No, it's not a bug. In a composite, every element starts with a 2-byte short
indicating the length of the element, plus an extra trailing byte that is
used for sorting purposes. A little bit more details can be found in the
CompositeType class javadoc if you're interested. It's not the most compact
format there is but changing it would break backward compatibility anyway.
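
A minimal sketch of the resulting size difference, using Cassandra's own
marshal classes (values illustrative):

  import java.nio.ByteBuffer;
  import org.apache.cassandra.db.marshal.CompositeType;
  import org.apache.cassandra.db.marshal.LongType;

  ByteBuffer direct = LongType.instance.decompose(123L);   // 8 bytes
  ByteBuffer composite = CompositeType.getInstance(LongType.instance)
          .builder().add(direct).build();  // 2 (length) + 8 + 1 (eoc) = 11 bytes
  System.out.println(direct.remaining() + " vs " + composite.remaining());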

On Mon, Mar 30, 2015 at 12:38 PM, Peer, Oded oded.p...@rsa.com wrote:

  I am writing code to bulk load data into Cassandra using
 SSTableSimpleUnsortedWriter

 I changed my partition key from a composite key (long, int) to a single
 column key (long).

 For creating the composite key I used a CompositeType, and I kept using it
 after changing the key to a single column.

 My code didn’t work until I changed the way I create the ByteBuffer not to
 use CompositeType.



 The following code prints ‘false’.

 Do you consider this a bug?



   long val = 123L;

   ByteBuffer direct = bytes( val );

   ByteBuffer composite = CompositeType.getInstance(
 LongType.instance ).builder().add( bytes( val ) ).build();

   System.out.println( direct.equals( composite ) );





Re: Re: Dynamic Columns

2015-01-26 Thread Sylvain Lebresne
 Where we differ is that I feel the coverage for existing thrift use cases
isn't
 100%. That may be right or wrong, but it is my impression.

Here's my problem: either CQL covers all existing thrift use cases or it
does not (in which case the non supported use case should be pointed out).
It's a technical question, not one that is a matter of impression or feeling.
I'm fine with you saying that, in your personal opinion, some use cases feel
more natural/more direct in Thrift: you're entitled to your opinions. But
when your initial emails on this thread start with "the thing is, CQL only
handles some types of dynamic column use cases", or say things like "I hope
CQL continues to improve so that it supports 100% of the existing use cases",
then I'm sorry but it doesn't sound like you're just expressing some personal
preference. And since, I'm claiming, those statements are false, I can't
force you but I would really appreciate that you refrain from propagating
such falsehoods (unless of course you can actually substantiate them with
actual facts) because it's confusing, especially for new users.

--
Sylvain


Re: BytesType and UTF8Type

2015-01-26 Thread Sylvain Lebresne
On Fri, Jan 23, 2015 at 5:28 PM, Ken Hancock ken.hanc...@schange.com
wrote:


 I have some thrift column families that were created with BytesType.  All
 the data written to the keys/columns/values were simple string.

 In cassandra-cli, I can correct these to UTF8Type (I believe both
 UTF8Type and BytesType are serialized similarly?) but I can't convert these
 blobs to text via CQL.

 (1) if my strings are all ascii, is it safe to correct the validation
 classes?


It is, provided your initial assumption ("my strings are all ascii") is
actually true.


 (2) is the cql restriction merely a safety mechanism that cassandra-cli is
 lacking?


It is.

--
Sylvain





Re: Re: Dynamic Columns

2015-01-21 Thread Sylvain Lebresne
On Wed, Jan 21, 2015 at 4:44 PM, Peter Lin wool...@gmail.com wrote:

 I don't remember other people's examples in detail due to my shitty
 memory, so I'd rather not misquote.


Fair enough, but maybe you shouldn't use people's examples you don't
remember as arguments then. Those examples might be wrong or outdated and
that kind of stuff creates confusion for everyone.



 In my case, I mix static and dynamic columns in a single column family
 with primitives and objects. The objects are temporal object graphs with a
 known type. Doing this type of stuff is basically transparent for me, since
 I'm using thrift and our data modeler generates helper classes. Our tooling
 seamlessly convert the bytes back to the target object. We have a few
 standard static columns related to temporal metadata. At any time, dynamic
 columns can be added and they can be primitives or objects.


I don't see anything in that that cannot be done with CQL. You can mix
static and dynamic columns in CQL thanks to static columns. More precisely,
you can do what you're describing with a table looking a bit like this:
  CREATE TABLE t (
    key blob,
    my_static_column_1 int static,
    my_static_column_2 float static,
    my_static_column_3 blob static,
    ...,
    dynamic_column_name blob,
    dynamic_column_value blob,
    PRIMARY KEY (key, dynamic_column_name)
  );

And your helper classes will serialize your objects as they probably do
today (if you use a custom comparator, you can do that too). And let it be
clear that I'm not pretending that doing it this way is tremendously
simpler than thrift. But I'm saying that 1) it's possible and 2) while it's
not meaningfully simpler than thrift, it's not really harder either (and
in fact, it's actually less verbose with CQL than with raw thrift).
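
For illustration, a minimal sketch of using such a table (values
illustrative):

  -- static columns are shared by the whole partition...
  INSERT INTO t (key, my_static_column_1) VALUES (0x01, 42);
  -- ...while each dynamic column is its own clustering row
  INSERT INTO t (key, dynamic_column_name, dynamic_column_value)
         VALUES (0x01, textAsBlob('age'), intAsBlob(30));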



 For the record, doing this kind of stuff in a relational database sucks
 horribly.


I don't know what that has to do with CQL to be honest. If you're doing
relational with CQL you're doing it wrong. And please note that I'm not
saying CQL is the perfect API for modeling temporal data. But I don't get
how thrift, which is a very crude API, is a much better API at that than CQL
(or, again, how it allows you to do things you can't with CQL).

--
Sylvain


Re: Re: Dynamic Columns

2015-01-21 Thread Sylvain Lebresne
On Wed, Jan 21, 2015 at 6:19 PM, Peter Lin wool...@gmail.com wrote:

 the dynamic column can't be part of the primary key. The temporal entity
 key can be the default UUID or the user can choose the field in their
 object. Within our framework, we have concept of temporal links between one
 or more temporal entities. Polluting the primary key with the dynamic column
 wouldn't work.


Not totally sure I understand. Are you talking about the underlying storage
space used? If you are, we can discuss it (it's not too hard to remedy it
in CQL, I was mainly trying to illustrate my point, not pretending this
was a drop-in solution for your use case) but it's more of a performance
discussion, and I think we've somewhat quit the realm of "there's things
CQL3 doesn't support".


 Please excuse the confusing RDB comparison. My point is that Cassandra's
 dynamic column feature is the unique feature that makes it better than
 traditional RDB or newSql like VoltDB for building temporal databases. With
 databases that require static schema + alter table for managing schema
 evolution, it makes it harder and results in down time.


Here again you seem to imply that CQL doesn't support dynamic columns, or
has a somewhat inferior support, but that's just not true.


 One of the challenges of data management over time is evolving the data
 model and making queries simple. If the record is 5 years old, it probably
 has a difference schema than a record inserted this week. With temporal
 databases, every update is an insert, so it's a little bit more complex
 than just use a blob. There's a whole level of complication with temporal
 data and CQL3 custom types isn't clear to me. I've read the CQL3
 documentation on the custom types several times and it is rather poor. It
 gives me the impression there's still work needed to get custom types in
 good shape.


I'm sorry but that's a bit of hand waving. Custom types (and by that I mean
user-provided AbstractType implementations) work in CQL *exactly* like in
thrift: they are not in a better or worse shape than in thrift. And while
the documentation on CQL3 is indeed poor on this part, so is the thrift
documentation on the same subject (besides, I don't think your whole
point is about saying that documentation could be improved). Again, what
you can do in thrift, you can do in CQL.


 I consistently recommend new users learn and understand both Thrift and
 CQL.


I understand that you do this with the best of intentions and don't take it
the wrong way but it is my opinion that you are counterproductive by doing
so, and this for 2 reasons:
1) you don't only recommend users to learn both APIs, you justify that
advice by affirming that there is a whole family of important use cases
that thrift supports and CQL does not. Except that I contend that this
affirmation is technically incorrect, and so far I haven't seen many
examples proving me wrong.
2) there is a wealth of evidence that trying to learn both thrift and CQL
confuses the hell out of new users. Which is btw not surprising, both APIs
present the same concepts in seemingly different ways (even though they
are the same concepts) and even have conflicting vocabulary, so it's
obviously confusing when you try to learn those concepts in the first
place. Trying to learn CQL when you know thrift well is fine, and why not
learn thrift once you know and understand CQL well, but learning both is
imo bad advice. It could maybe (maybe) be justified if what you say about
a whole family of use cases not being doable with CQL were true, but
it's not.

--
Sylvain






Re: Re: Dynamic Columns

2015-01-21 Thread Sylvain Lebresne
On Wed, Jan 21, 2015 at 3:46 AM, Peter Lin wool...@gmail.com wrote:


  I don't understand why people [...] pretend it supports 100% of the use
 cases.


Have you considered the possibility that it's actually true and you're just
wrong by lack of knowledge?

--
Sylvain


Re: row cache hit is costlier for partiton with large rows

2015-01-21 Thread Sylvain Lebresne
The row cache saves partition data off-heap, which means that every cache
hit requires copying/deserializing the cached partition into the heap, and
the more rows per partition you cache, the longer it will take. Which is why
it's currently not a good idea to cache too many rows per partition (unless
you know what you're doing).
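
For what it's worth, Cassandra 2.1 lets you bound what gets cached; a
minimal sketch (value illustrative):

  ALTER TABLE table2_row_cache
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};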

On Wed, Jan 21, 2015 at 1:15 PM, nitin padalia padalia.ni...@gmail.com
wrote:

 Hi,

 With two different families, when I do a read, a row cache hit is almost
 15x costlier with larger partitions (10,000 rows per partition), in
 comparison to a partition with only 100 rows.

 The difference between the two column families is that one has 100 rows per
 partition, the other 10,000 rows per partition. The schema for the two tables is:
 CREATE TABLE table1_row_cache (
   user_id uuid,
   dept_id uuid,
   location_id text,
   locationmap_id uuid,
   PRIMARY KEY ((user_id, location_id), dept_id)
 )

 CREATE TABLE table2_row_cache (
   user_id uuid,
   dept_id uuid,
   location_id text,
   locationmap_id uuid,
   PRIMARY KEY ((user_id, dept_id), location_id)
 )

 Here is the tracing:

 Row cache Hit with Column Family table1_row_cache, 100 rows per partition:
  Preparing statement [SharedPool-Worker-2] | 2015-01-20
 14:35:47.54 | x.x.x.x |   1023
   Row cache hit [SharedPool-Worker-5] | 2015-01-20
 14:35:47.542000 | x.x.x.x |   2426

 Row cache Hit with CF table2_row_cache, 10,000 rows per partition:
 Preparing statement [SharedPool-Worker-1] | 2015-01-20 16:02:51.696000
 | x.x.x.x |490
  Row cache hit [SharedPool-Worker-2] | 2015-01-20
 16:02:51.711000 | x.x.x.x |  15146


 If in both cases the data is in memory, why is it not the same? Can someone
 point out what's wrong here?

 Nitin Padalia



Re: Re: Dynamic Columns

2015-01-21 Thread Sylvain Lebresne
 I've chatted with several long time users of Cassandra and there's things
 CQL3 doesn't support.


Would you care to elaborate then? Maybe a simple example of something (or
multiple things since you used plural) in thrift that cannot be supported
in CQL?
And please note that I'm *not* saying that all existing thrift tables can be
seamlessly used from CQL: there are indeed a few cases for which that's not
the case. But that does not mean those cases cannot easily be done in CQL
from scratch.


Re: Not enough replica available” when consistency is ONE?

2015-01-19 Thread Sylvain Lebresne
On Mon, Jan 19, 2015 at 2:29 AM, Kevin Burton bur...@spinn3r.com wrote:

 So ConsistencyLevel.ONE and if not exists are essentially mutually
 incompatible and shouldn’t the driver throw an exception if the user
 requests this configuration?


The subtlety is that this consistency level (CL.ONE in your case) is
actually used by conditional operations (aka CAS operations, i.e. any
insert/update with an 'IF'). For those operations there are in fact 2
consistency levels taken into account: the serial consistency level, which
your driver should allow you to set and for which there is really only a
choice between SERIAL and LOCAL_SERIAL, and the usual consistency level, the
one you've set to ONE. There are 2 phases (I'm simplifying) to CAS
operations and each CL applies to one of them: the first phase is the serial
phase and the so-called serial consistency applies to it. For that phase you
basically need a quorum of nodes (or a local quorum if you use LOCAL_SERIAL)
and it's because this phase couldn't be achieved that you got your
exception. After this phase, the write has been decided (nodes have agreed
on committing it if you will) but it may not be visible to non-serial reads
just yet because the actual write may not have been committed to all nodes.
It's to this 2nd committing phase that the "normal" consistency level
applies. Concretely, it means that if you do a CAS write with a normal CL of
ONE, and you then do a read with CL.ONE, you're not guaranteed to see your
write right away. But if you use QUORUM, then a QUORUM read guarantees you
to see that write.

Anyway, my point being that it wouldn't make sense for the driver to throw
an exception since there is nothing wrong in practice. But it is true that
users need to understand how conditional updates differ from normal updates
to avoid surprises.
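
A minimal sketch with the DataStax Java driver setting both levels
explicitly ('session' and the statement are illustrative):

  Statement stmt = new SimpleStatement(
      "INSERT INTO users (name) VALUES ('kevin') IF NOT EXISTS");
  stmt.setSerialConsistencyLevel(ConsistencyLevel.SERIAL); // paxos/serial phase
  stmt.setConsistencyLevel(ConsistencyLevel.QUORUM);       // commit phase
  session.execute(stmt);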

--
Sylvain


Re: Token function in CQL for composite partition key

2015-01-07 Thread Sylvain Lebresne
On Wed, Jan 7, 2015 at 10:18 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 I have a column family as below:

 (Wide row design)
 CREATE TABLE clicks (hour text,adId int,itemId int,time timeuuid,PRIMARY
 KEY((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);

 Now to query for a given Ad Id and specific 3 hours say 2015-01-07 11 to
 2015-01-07 14, how do I use the token function in the CQL.


From that description, it doesn't appear to me that you need the token
function. Just do 3 queries, one for each hour, each query being something
along the lines of
  SELECT * FROM clicks WHERE adId=... AND hour='2015-01-07 11' AND ...
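
Concretely, something like this (adId value illustrative):

  SELECT * FROM clicks WHERE adId = 42 AND hour = '2015-01-07 11';
  SELECT * FROM clicks WHERE adId = 42 AND hour = '2015-01-07 12';
  SELECT * FROM clicks WHERE adId = 42 AND hour = '2015-01-07 13';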

For completeness' sake, I should note that you could do that with a single
query by using an IN on the hour column, but it's actually not a better
solution (provided you submit the 3 queries in an asynchronous fashion at
least) in that case because of reasons explained here:
https://medium.com/@foundev/cassandra-query-patterns-not-using-the-in-query-e8d23f9b17c7
.

--
Sylvain




Re: is primary key( foo, bar) the same as primary key ( foo ) with a ‘set' of bars?

2015-01-04 Thread Sylvain Lebresne
On Sun, Jan 4, 2015 at 12:48 AM, Sylvain Wallez sylv...@apache.org wrote:

  Indeed this makes sense for map keys and set values, but AFAIU from the
 docs this also applies to map and list _values_: "The maximum size of an
 item in a collection is 64K"


Somehow it appears that from Jack's quote you've only read what was in the
parenthesis. The part that was not in parenthesis was:
  "Collections values are currently limited to 64K because the serialized
form used uses shorts to encode the elements length"

That's a limitation of the binary protocol if you will, not an internal
storage one.

I'll add that this protocol limitation has in fact already been lifted in the
v3 of the protocol (so C* 2.1) but documentation may not be entirely up to
date yet.





 http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_collections_c.html

 Or are collection values also represented as keys?

 Sylvain

 Le 03/01/2015 20:50, Jack Krupansky a écrit :

 See: https://issues.apache.org/jira/browse/CASSANDRA-5355

  Collections values are currently limited to 64K because the serialized
 form used uses shorts to encode the elements length (and for sets elements
 and key map, because they are part of the internal column name that is
 itself limited to 64K).

  -- Jack Krupansky

 On Sat, Jan 3, 2015 at 2:31 PM, Sylvain Wallez sylv...@apache.org wrote:

  From what I understand from the docs, the 64k limit applies to both the
 number of items in a collection and the size of its elements?

 Why is there a constraint on value size in collections, when other types
 such as blob or text can be larger?

 Thanks,
 Sylvain

 Le 01/01/2015 20:04, DuyHai Doan a écrit :

 Storage-engine wise, they are almost equivalent, though there are some
 minor differences:

  1) with the Set structure, you cannot store more than 64kb worth of data
 2) collections and maps are loaded entirely by Cassandra for each query,
 whereas with clustering columns you can select a slice of columns



 On Thu, Jan 1, 2015 at 7:46 PM, Kevin Burton bur...@spinn3r.com wrote:

 I think the two tables are the same.  Correct?

  create table foo (

  source text,
 target text,
 primary key( source, target )
 )


  vs

   create table foo (

  source text,
  target set<text>,
 primary key( source )
 )

  … meaning that the first one, under the covers is represented the same
 as the second.  As a slice.

  Am I correct?

  --
   Founder/CEO Spinn3r.com
  Location: San Francisco, CA
  blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
  http://spinn3r.com




   --
 Sylvain Wallez - http://bluxte.net




 --
 Sylvain Wallez - http://bluxte.net




Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Sylvain Lebresne
On Mon, Nov 17, 2014 at 10:52 PM, Kevin Burton bur...@spinn3r.com wrote:

 There’s still a lot of weirdness in CQL.

 For example, you can do an INSERT with an UPDATE... which I’m generally
 fine with.  Kind of makes sense.

 However, with INSERT you can do IF NOT EXISTS.

 … but you can’t do the same thing on UPDATE.

 So I foolishly wrote all my code assuming that INSERT/UPDATE were
 orthogonal, but now they’re not.

 you can still do IF on UPDATE though… but it’s not possible to do IF
 mycolumn IS NULL

 .. so is there a way to mimic IF NOT EXISTS on UPDATE or is this just a
 bug?


There is no way to mimic IF NOT EXISTS on UPDATE and it's not a bug. INSERT
and UPDATE are not totally orthogonal
in CQL and you should use INSERT for actual insertion and UPDATE for
updates (granted, the database will not reject
your query if you break this rule but it's nonetheless the way it's intended
to be used).
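
In other words, a minimal sketch (table and values illustrative):

  CREATE TABLE t (k int PRIMARY KEY, mycolumn int);
  INSERT INTO t (k, mycolumn) VALUES (0, 1) IF NOT EXISTS; -- creation
  UPDATE t SET mycolumn = 2 WHERE k = 0 IF mycolumn = 1;   -- conditional update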

--
Sylvain





 --

 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




Re: Empty cqlsh cells vs. null

2014-10-27 Thread Sylvain Lebresne
On Mon, Oct 27, 2014 at 11:05 AM, Jens Rantil jens.ran...@tink.se wrote:

  Tyler,

 I see. That explains it. Any chance you might know how the Datastax Java
 driver behaves for this (odd) case?


The Row.getInt() method will do as for nulls and return 0 (though of
course, the Row.isNull() method will return false). If you want to
explicitly check if it's an empty value, you'll have to use
getBytesUnsafe(). Long story short, unless you like suffering for no
reason, don't insert empty values for types for which it doesn't make sense.
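
A minimal sketch of that check ('session' and the column names are
illustrative):

  Row row = session.execute("SELECT v FROM t WHERE k = 0").one();
  ByteBuffer raw = row.getBytesUnsafe("v");
  if (raw == null) {
      // no cell, or a null value
  } else if (raw.remaining() == 0) {
      // an "empty" value: row.getInt("v") would also return 0 here
  } else {
      int v = row.getInt("v"); // an actual int
  }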

--
Sylvain




 Cheers,
 Jens

 ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se
 Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

 On Friday, Oct 24, 2014 at 6:24 pm, Tyler Hobbs ty...@datastax.com,
 wrote:


 On Fri, Oct 24, 2014 at 6:38 AM, Jens Rantil jens.ran...@tink.se wrote:


 Just to clarify, I am seeing three types of output for an int field.
 It’s either:
  * Empty output. Nothing. Nil. Also ‘’.
  * An integer written in green. Regexp: [0-9]+
  * Explicitly ‘null’ written in red letters.


  Some types (including ints) accept an empty string/ByteBuffer as a
 valid value.  This is distinct from null, or no cell being present.  This
 behavior is primarily a legacy from the Thrift days.

 --
 Tyler Hobbs
 DataStax http://datastax.com/




[RELEASE] Apache Cassandra 2.1.1 released

2014-10-24 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problem.

Enjoy!

[1]: http://goo.gl/ytYBFb (CHANGES.txt)
[2]: http://goo.gl/cQW3RF (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.0.11 released

2014-10-24 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.11.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problem.

Enjoy!

[1]: http://goo.gl/pMBdRa (CHANGES.txt)
[2]: http://goo.gl/ZYN0Ji (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Not-Equals (!=) in Where Clause

2014-10-01 Thread Sylvain Lebresne
Right, my bad, thanks Tyler for the correction.

On Tue, Sep 30, 2014 at 5:44 PM, Tyler Hobbs ty...@datastax.com wrote:

 I think Sylvain may not have had his coffee yet.  You can't use IF's in
 SELECT statements, but you can in INSERT/UPDATE/DELETE:

 UPDATE foo SET a = 0 WHERE k = 0 IF b != 0;

 On Tue, Sep 30, 2014 at 2:36 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:



 Is != supported as part of the where clause in Cassandra?


 It's not.

 Or is it the grammar for some other purpose?


 It's supported in 'IF' conditions. You can do something like:
   SELECT * FROM foo WHERE k = 0 IF v != 3;

 --
 Sylvain




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: Not-Equals (!=) in Where Clause

2014-09-30 Thread Sylvain Lebresne

 Is != supported as part of the where clause in Cassandra?


It's not.

Or is it the grammar for some other purpose?


It's supported in 'IF' conditions. You can do something like:
  SELECT * FROM foo WHERE k = 0 IF v != 3;

--
Sylvain


Re: Saving file content to ByteBuffer and to column does not retrieve the same size of data

2014-09-30 Thread Sylvain Lebresne
On Tue, Sep 30, 2014 at 2:25 AM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 22, 2014 at 3:50 AM, Carlos Scheidecker nando@gmail.com
 wrote:

 I can successfully read a file to a ByteBuffer and then write to a
 Cassandra blob column. However, when I retrieve the value of the column,
 the size of the ByteBuffer retrieved is bigger than the original ByteBuffer
 where the file was read from. Writing to the disk, corrupts the image.


 Probably don't write binary blobs like images into a database, use a
 distributed filesystem?


I've very successfully stored lots of small images into Cassandra so I have
to disagree with that far too quick conclusion. Cassandra always reads blobs
in their entirety, so it's definitively not very good with very large
blobs, but there are many cases where images are known to be pretty small (I
was personally storing thumbnails) and in those cases, it is my experience
that Cassandra is a very viable solution.


 But I agree that this behavior sounds like a bug, I would probably file it
 as a JIRA on http://issues.apache.org and then tell the list the URL of
 the JIRA you filed.


I actually doubt it is a bug, and it's almost certainly not a Cassandra bug
(so please, do *not* open a JIRA on http://issues.apache.org). I suspect a
bad use of the ByteBuffer API (which is definitively a very confusing API,
but that's what Java gives us). Typically, in your snippet of code above,
the line:
byte[] data = new byte[buffer.limit()];
is incorrect. 'buffer.limit()' is not the number of valid bytes in the
buffer, you should use 'buffer.remaining()' for that. You should also be
careful about messing with 'arrayOffset': a line like
buf.position(buf.arrayOffset());
(also from one of your snippets above) is almost surely wrong.
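
A corrected sketch of the read-back, assuming 'buffer' is the ByteBuffer
returned for the blob column:

  byte[] data = new byte[buffer.remaining()];
  buffer.duplicate().get(data); // duplicate() leaves the original position untouched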

--
Sylvain


[RELEASE] Apache Cassandra 2.1.0

2014-09-11 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the final version
of Apache Cassandra 2.1.0.

Cassandra 2.1.0 brings a number of new features and improvements including
(but
not limited to):
 - Improved support of Windows.
 - A new incremental repair option[4, 5]
 - A better row cache that can cache only the head of partitions[6]
 - Off-heap memtables[7]
 - Numerous performance improvements[8, 9]
 - CQL improvements and additions: User-defined types, tuple types, 2ndary
   indexing of collections, ...[10]
 - An improved stress tool[11]

Please refer to the release notes[1] and changelog[2] for details.

Both source and binary distributions of Cassandra 2.1.0 can be downloaded
at:

 http://cassandra.apache.org/download/

As usual, a debian package is available from the project APT repository[3]
(you will need to use the 21x series).

The Cassandra team

[1]: http://goo.gl/k4eM39 (CHANGES.txt)
[2]: http://goo.gl/npCsro (NEWS.txt)
[3]: http://wiki.apache.org/cassandra/DebianPackaging
[4]: http://goo.gl/MjohJp
[5]: http://goo.gl/f8jSme
[6]: http://goo.gl/6TJPH6
[7]: http://goo.gl/YT7znJ
[8]: http://goo.gl/Rg3tdA
[9]: http://goo.gl/JfDBGW
[10]: http://goo.gl/kQl7GW
[11]: http://goo.gl/OTNqiQ


[RELEASE CANDIDATE] Apache Cassandra 2.1.0-rc7 released

2014-09-03 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the seventh release candidate for
the
future Apache Cassandra version 2.1.0.

Please note that this is not yet the final 2.1.0 release and as such, it
should
not be considered for production use. We'd appreciate testing and let us
know
if you encounter any problem[3,4]. Please make sure to have a look at the
change log[1] and release notes[2].

Apache Cassandra 2.1.0-rc7[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Enjoy!

[1]: http://goo.gl/7sCbQN (CHANGES.txt)
[2]: http://goo.gl/bwXyjm (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-rc7


Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-09-01 Thread Sylvain Lebresne
On Sun, Aug 31, 2014 at 2:59 AM, Todd Nine toddn...@apache.org wrote:

 Hi all,
   I'm working on transferring our thrift DAOs over to CQL.  It's going
 well, except for 2 cases that both use multi get.  The use case is very
 simple.  It is a narrow row, by design, with only a few columns.  When I
 perform a multiget, I need to get up to 1k rows at a time.  I do not want
 to turn these into a wide row using scopeId and scopeType as the row key.


 On the physical level, my Column Family needs something similar to the
 following format.


 scopeId, scopeType, nodeId, nodeType :{ timestamp: 0x00 }


 I've defined by table with the following CQL.


 CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
 ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
 timestamp bigint,
 PRIMARY KEY ((scopeId , scopeType, nodeId, nodeType))
 )WITH caching = 'all'


 This works well for inserts deletes and single reads.  I always know the
 scopeId, scopeType, nodeId, and nodeType, so I want to return the timestamp
 columns.  I thought I could use the IN operation and specify the pairs of
 nodeId and nodeTypes I have as input, however this doesn't work.

 Can anyone give me a suggestion on how to perform a multiget when I have
 several values for the nodeId and the nodeType?  This read occurs on every
 read of edges so making 1k trips is not going to work from a performance
 perspective.

 Below is the query I've tried.

 SELECT timestamp FROM  Graph_Marked_Nodes WHERE scopeId = ? AND scopeType
 = ? AND nodeId IN (uuid1, uuid2, uuid3) AND nodeType IN ('foo','bar')


This is not supported by CQL currently. We do support the equivalent of
multiget in CQL through IN, but it's slightly limited in the case of
compound partition keys in that you can currently only use an IN on the last
column of such a compound partition key (here, that's nodeType). There is no
good reason for that limitation outside of historical ones and I've opened
CASSANDRA-7855 to fix it.

That being said, I would argue that it's hardly a big deal since using
multiget has always been slightly frowned upon. Multiget doesn't do much
optimization: the only thing it does is parallelize the queries on the
coordinator, which is something you can do as easily client side. And
doing it client side has a few advantages: you will get the result for each
partition as soon as its query completes, which can allow you to process
things sooner. Also, a multiget is more likely to time out than the
individual queries it is split into (and having only one of the sub-queries
time out means you don't get any result at all). Lastly, while doing the
parallelization client side will use a tiny bit more network traffic between
the client and coordinator, you will save intra-cluster traffic provided you
have a token-aware client (because each query will be properly routed).
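
To illustrate the client-side split (a sketch; uuid1 and uuid2 stand in for
real values): each (nodeId, nodeType) combination from the rejected IN query
becomes its own fully-specified, single-partition query, and the queries are
executed concurrently through the driver's asynchronous API:

  SELECT timestamp FROM Graph_Marked_Nodes
  WHERE scopeId = ? AND scopeType = ? AND nodeId = uuid1 AND nodeType = 'foo';

  SELECT timestamp FROM Graph_Marked_Nodes
  WHERE scopeId = ? AND scopeType = ? AND nodeId = uuid2 AND nodeType = 'bar';

  -- ... one such statement per combination

Each of these targets a single partition, so a token-aware client can route
it straight to a replica.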

--
Sylvain



 I've found this issue, which looks like it's a solution to my problem.

 https://issues.apache.org/jira/browse/CASSANDRA-6875

 However, I'm not able to get the syntax in the issue description to work
 either.  Any input would be appreciated!

 Cassandra: 2.0.10
 Datastax Driver: 2.1.0

 Thanks,
 Todd







Re: CQL performance inserting multiple cluster keys under same partition key

2014-08-27 Thread Sylvain Lebresne
On Tue, Aug 26, 2014 at 6:50 PM, Jaydeep Chovatia 
chovatia.jayd...@gmail.com wrote:

 Hi,

 I have question on inserting multiple cluster keys under same partition
 key.

 Ex:

 CREATE TABLE Employee (
   deptId int,
   empId int,
   name   varchar,
   address varchar,
   salary int,
   PRIMARY KEY(deptId, empId)
 );

 BEGIN UNLOGGED BATCH
   INSERT INTO Employee (deptId, empId, name, address, salary) VALUES (1,
 10, 'testNameA', 'testAddressA', 2);
   INSERT INTO Employee (deptId, empId, name, address, salary) VALUES (1,
 20, 'testNameB', 'testAddressB', 3);
 APPLY BATCH;

 Here we are inserting two cluster keys (10 and 20) under same partition
 key (1).
 Q1) Is this batch transaction atomic and isolated? If yes then is there
 any performance overhead with this syntax?


As long as the updates are under the same partition key (and I insist, only
in that condition), logged (the one without the UNLOGGED keyword) and
unlogged batches behave *exactly* the same way. So yes, in that case the
batch is atomic and isolated (though on the isolation, you may want to be
aware that while technically isolated, the usual timestamp rules still
apply and so you might not get the behavior you think if 2 batches have the
same timestamp: see CASSANDRA-6123
https://issues.apache.org/jira/browse/CASSANDRA-6123). There is also no
performance overhead (assuming you meant over logged batches).
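
For instance, if you want to rule out the identical-timestamp ambiguity
entirely, you can provide the timestamp yourself (a sketch; the value below
is an arbitrary microsecond timestamp):

  BEGIN UNLOGGED BATCH USING TIMESTAMP 1409140800000000
    INSERT INTO Employee (deptId, empId, name, address, salary)
    VALUES (1, 10, 'testNameA', 'testAddressA', 2);
    INSERT INTO Employee (deptId, empId, name, address, salary)
    VALUES (1, 20, 'testNameB', 'testAddressB', 3);
  APPLY BATCH;

All statements in the batch then share that explicitly chosen timestamp.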

Q2) Can this CQL syntax be considered equivalent to Thrift
 batch_mutate?


It is equivalent, both (the CQL syntax and Thrift batch_mutate) resolve
to the same operation internally.

--
Sylvain


[RELEASE] Apache Cassandra 2.0.10 released

2014-08-25 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.10.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you were to
encounter any problem.

Enjoy!

[1]: http://goo.gl/FNcwyk (CHANGES.txt)
[2]: http://goo.gl/NLVXwb (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE CANDIDATE] Apache Cassandra 2.1.0-rc6 released

2014-08-19 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the sixth release candidate for
the
future Apache Cassandra version 2.1.0.

Please note that this is not yet the final 2.1.0 release and as such, it
should not be considered for production use. We'd appreciate testing and let
us know if you encounter any problem[3,4]. Please make sure to have a look
at the change log[1] and release notes[2].

Apache Cassandra 2.1.0-rc6[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Enjoy!

[1]: http://goo.gl/MyqArD (CHANGES.txt)
[2]: http://goo.gl/7vS47U (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-rc6


Re: Best way to format a ResultSet / Row ?

2014-08-19 Thread Sylvain Lebresne
This kind of question belongs to the java driver mailing list, not the
Cassandra one; please try to use the proper mailing list in the future.

On Tue, Aug 19, 2014 at 10:11 AM, Fabrice Larcher fabrice.larc...@level5.fr
 wrote:


 But this is probably not very useful, since you only get prints of bytes.
 You can then test the type of the column (variable 'def') in order to call
 the best-suited method of 'row',


You don't have to test the type, you can just use the deserialize method of
the column type. So in Fabrice's example,
  Object val = def.getType().deserialize(row.getBytesUnsafe(def.getName()));
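
To format a whole row generically, the same call can be applied to every
column (a sketch against the 2.0-era Java driver; 'row' is a
com.datastax.driver.core.Row):

  for (ColumnDefinitions.Definition def : row.getColumnDefinitions()) {
      // getBytesUnsafe returns the raw serialized bytes of the column;
      // deserialize turns them into the proper Java object for its type
      Object val = def.getType().deserialize(row.getBytesUnsafe(def.getName()));
      System.out.println(def.getName() + " = " + val);
  }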

--
Sylvain


Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Sylvain Lebresne
That sounds like a bug (the trace does look fishy). I'm not sure you've
indicated the Cassandra version you use so the first thing might be to
check that this hasn't been fixed in a recent version, but if you are using
a recent release (say 2.0.9), then please do open a JIRA ticket with your
reproduction steps.


On Wed, Aug 13, 2014 at 4:25 AM, Ian Rose ianr...@fullstory.com wrote:

 Hi -

 I am currently running a single Cassandra node on my local dev machine.
  Here is my (test) schema (which is meaningless, I created it just to
 demonstrate the issue I am running into):

 CREATE TABLE foo (
   foo_name ascii,
   foo_shard bigint,
   int_val bigint,
   PRIMARY KEY ((foo_name, foo_shard))
 ) WITH read_repair_chance=0.1;

 CREATE INDEX ON foo (int_val);
 CREATE INDEX ON foo (foo_name);

 I have inserted just a single row into this table:
 insert into foo(foo_name, foo_shard, int_val) values('dave', 27, 100);

 This query works fine:
 select * from foo where foo_name='dave';

 But when I run this query, I get an RPC timeout:
 select * from foo where foo_name='dave' and int_val > 0 allow filtering;

 With tracing enabled, here is the trace output:
 http://pastebin.com/raw.php?i=6XMEVUcQ

 (In short, everything looks fine to my untrained eye until 10s elapsed, at
 which time the following event is logged: Timed out; received 0 of 1
 responses for range 257 of 257)

 Can anyone help interpret this error?

 Many thanks!
 Ian




Re: C* 2.1-rc2 gets unstable after a 'DROP KEYSPACE' command ?

2014-08-07 Thread Sylvain Lebresne
It would be nice if you could try with 2.1.0-rc5 (there have been quite a
few bug fixes since rc2). If you can still reproduce that NPE there, please
do open a jira ticket with the reproduction steps.


On Thu, Aug 7, 2014 at 11:29 AM, Fabrice Larcher fabrice.larc...@level5.fr
wrote:

 Hello,

 After a 'DROP TABLE' command that returns errors={}, last_host=127.0.0.1
 (like most DROP commands do) from CQLSH with C* 2.1.0-rc2, I stopped C*.
 And I can not start one node. It says :
 ERROR 09:18:34 Exception encountered during startup
 java.lang.NullPointerException: null
 at org.apache.cassandra.db.Directories.<init>(Directories.java:191)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:553)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:245)
 [apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:455)
 [apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:544)
 [apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 java.lang.NullPointerException
 at org.apache.cassandra.db.Directories.<init>(Directories.java:191)
 at
 org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:553)
 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:245)
 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:455)
 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:544)
 Exception encountered during startup: null

 I do not now if it can help.


 Fabrice LARCHER


 2014-07-18 7:23 GMT+02:00 Fabrice Larcher fabrice.larc...@level5.fr:

 Hello,

 I still experience a similar issue after a 'DROP KEYSPACE' command with
 C* 2.1-rc3. Connection to the node may fail after a 'DROP'.

 But I did not see this issue with 2.1-rc1 (it seems to be a regression
 introduced with 2.1-rc2).

 Fabrice LARCHER


 2014-07-17 9:19 GMT+02:00 Benedict Elliott Smith 
 belliottsm...@datastax.com:

 Also https://issues.apache.org/jira/browse/CASSANDRA-7437 and
 https://issues.apache.org/jira/browse/CASSANDRA-7465 for rc3, although
 the CounterCacheKey assertion looks like an independent (though
 comparatively benign) bug I will file a ticket for.

 Can you try this against rc3 to see if the problem persists? You may see
 the last exception, but it shouldn't affect the stability of the cluster.
 If either of the other exceptions persist, please file a ticket.


 On Thu, Jul 17, 2014 at 1:41 AM, Tyler Hobbs ty...@datastax.com wrote:

 This looks like https://issues.apache.org/jira/browse/CASSANDRA-6959,
 but that was fixed for 2.1.0-rc1.

 Is there any chance you can put together a script to reproduce the
 issue?


 On Thu, Jul 10, 2014 at 8:51 AM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 It seems that the memtable tries to flush itself to an SSTable of a
 non-existing keyspace. I don't know why it happens, but probably running
 nodetool flush before the drop should prevent this issue.

 Pavel


 On Thu, Jul 10, 2014 at 4:09 AM, Fabrice Larcher 
 fabrice.larc...@level5.fr wrote:

 ​Hello,

 I am using the 'development' version 2.1-rc2.

 With one node (=localhost), I get timeouts trying to connect to C*
 after running a 'DROP KEYSPACE' command. I have following error messages 
 in
 system.log :

 INFO  [SharedPool-Worker-3] 2014-07-09 16:29:36,578
 MigrationManager.java:319 - Drop Keyspace 'test_main'
 (...)
 ERROR [MemtableFlushWriter:6] 2014-07-09 16:29:37,178
 CassandraDaemon.java:166 - Exception in thread
 Thread[MemtableFlushWriter:6,5,main]
 java.lang.RuntimeException: Last written key
 DecoratedKey(91e7f660-076f-11e4-a36d-28d2444c0b1b,
 52446dde90244ca49789b41671e4ca7c) >= current key
 DecoratedKey(91e7f660-076f-11e4-a36d-28d2444c0b1b,
 52446dde90244ca49789b41671e4ca7c) writing into
 ./../data/data/test_main/user-911d5360076f11e4812d3d4ba97474ac/test_main-user.user_account-tmp-ka-1-Data.db
 at
 org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:172)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:215)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:351)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:314)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 ~[apache-cassandra-2.1.0-rc2.jar:2.1.0-rc2]
 

Re: Issue with ALLOW FILTERING

2014-08-06 Thread Sylvain Lebresne
On Wed, Aug 6, 2014 at 9:41 AM, Jens Rantil jens.ran...@tink.se wrote


 I'm struggling to see any reason for it not being supported.


The time to implement it, plus a bunch of internal implementation reasons
that make it not as trivial to support as you seem to suggest it is (of
course, this is open source, you are welcome to have a look if that's a
particular itch you want to scratch; there is even a JIRA ticket:
https://issues.apache.org/jira/browse/CASSANDRA-6377).



 Or is it considered implementation specific under what circumstances ALLOW
 FILTERING can be used?


Currently, it kind of is. ALLOW FILTERING allows executing some queries
that couldn't be executed otherwise, but not everything. Again, the things
that are not supported are unsupported mainly for implementation reasons,
nothing more, and that may/will change in the future. That said, I'm not
saying the documentation cannot be improved (though I'm not sure having the
doc say "this doesn't work" would be a lot more helpful than trying it and
having the implementation say "this doesn't work").
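
As a small illustration of what ALLOW FILTERING currently enables (a sketch;
the table and data are made up):

  CREATE TABLE users (id uuid PRIMARY KEY, name text, age int);
  CREATE INDEX ON users (name);

  -- rejected: the index lookup on name would have to be filtered on age
  SELECT * FROM users WHERE name = 'bob' AND age > 30;

  -- accepted: ALLOW FILTERING explicitly opts in to that extra filtering
  SELECT * FROM users WHERE name = 'bob' AND age > 30 ALLOW FILTERING;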

--
Sylvain



 Thanks,
 Jens


 On Tue, Aug 5, 2014 at 8:11 PM, Sávio S. Teles de Oliveira 
 savio.te...@cuia.com.br wrote:

 You need to create an index on attribute *c.*


 2014-08-05 9:24 GMT-03:00 Jens Rantil jens.ran...@tink.se:

 Hi,

 I'm having an issue with ALLOW FILTERING with Cassandra 2.0.8. See a
 minimal example here:
 https://gist.github.com/JensRantil/ec43622c26acb56e5bc9

 I expect the second-to-last query to fail, but the last query to return a
 single row. In particular I expect the last SELECT to first select using
 the clustering primary id and then do filtering.

 I've been reading
 https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt ALLOW
 FILTERING and can't wrap my head around why this won't work.

 Could anyone clarify this for me?

 Thanks,
 Jens




 --
 Atenciosamente,
 Sávio S. Teles de Oliveira
 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles
  Mestrando em Ciências da Computação - UFG
 Arquiteto de Software
 CUIA Internet Brasil





Re: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Sylvain Lebresne
Changing the strategy options, and in particular the replication factor,
does not perform any data replication by itself. You need to run a repair
to ensure data is replicated following the new replication.
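
Concretely, the sequence is something like this (a sketch; 'MyKeyspace' is a
placeholder):

  [default@unknown] update keyspace MyKeyspace with strategy_options = {dc1:3, dc2:3};

followed, on each node, by:

  $ nodetool repair MyKeyspace

The schema change itself only updates metadata; the repair is what actually
streams the data onto the newly responsible replicas.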


On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote:

 Thanks, yes. I can use the ‘show keyspaces’ command to check and see that
 the strategy has changed.



 But what I want to know is if the ‘update keyspace with strategy_options
 …’ command is

 a ‘sync’ operation or an ‘async’ operation.







 *From:* Rahul Menon [mailto:ra...@apigee.com]
 *Sent:* 2014年8月5日 16:38
 *To:* user
 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Try the show keyspaces command and look for Options under each keyspace.



 Thanks

 Rahul



 On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote:

 Hi, All,



 I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from
 cassandra-cli to update the strategy options of some keyspace

 in a multi-DC environment.



 When the command returns successfully, does it mean that the strategy
 options have been updated successfully, or do I need to wait

 some time for the change to be propagated  to all DCs?



 Thanks



 Boying







Re: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Sylvain Lebresne
On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.com wrote:

 What I want to know is: is the *strategy* changed after the ‘update
 keyspace with strategy_options …’ command returns successfully?


Like all schema changes, not necessarily on all nodes. You will have to
check for schema agreement between nodes.
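
A quick way to check (run against any node):

  $ nodetool describecluster

The output lists the schema versions seen in the cluster; the change has
fully propagated once all live nodes report a single, identical schema
version.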



 Not the *data* change.



 e.g. say I run the command ‘update keyspace with strategy_options {dc1:3,
 dc2:3}’; when this command returns,

 are the *strategy* options already changed? Or do I need to wait some time
 for the strategy to be changed?





 *From:* Sylvain Lebresne [mailto:sylv...@datastax.com]
 *Sent:* 2014年8月5日 16:59
 *To:* user@cassandra.apache.org

 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Changing the strategy options, and in particular the replication factor,
 does not perform any data replication by itself. You need to run a repair
 to ensure data is replicated following the new replication.



 On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote:

 Thanks, yes. I can use the ‘show keyspaces’ command to check and see that
 the strategy has changed.



 But what I want to know is if the ‘update keyspace with strategy_options
 …’ command is

 a ‘sync’ operation or an ‘async’ operation.







 *From:* Rahul Menon [mailto:ra...@apigee.com]
 *Sent:* 2014年8月5日 16:38
 *To:* user
 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Try the show keyspaces command and look for Options under each keyspace.



 Thanks

 Rahul



 On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote:

 Hi, All,



 I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from
 cassandra-cli to update the strategy options of some keyspace

 in a multi-DC environment.



 When the command returns successfully, does it mean that the strategy
 options have been updated successfully, or do I need to wait

 some time for the change to be propagated  to all DCs?



 Thanks



 Boying









[RELEASE CANDIDATE] Apache Cassandra 2.1.0-rc5 released

2014-08-04 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the fifth release
candidate for the future Apache Cassandra 2.1.0.

If all goes well this candidate will become the final 2.1.0, but until then
it should not be considered for production use. We'd appreciate testing and
let us know if you encounter any problem[3,4]. Please make sure to have a
look at the change log[1] and release notes[2].

Apache Cassandra 2.1.0-rc5[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/oKtgbo (CHANGES.txt)
[2]: http://goo.gl/hGfRtm (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-rc5


[RELEASE CANDIDATE] Apache Cassandra 2.1.0-rc3 released

2014-07-10 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the third release
candidate for the future Apache Cassandra 2.1.0.

Let us first stress that this is not yet the final release of 2.1.0 and as
such is *not* ready for production use. We however encourage as much testing
of this release candidate as possible and please report any problem
you may encounter[3,4]. Also make sure to have a look at the change log[1]
and release notes[2] to see where Cassandra 2.1 differs from the previous
series.

Apache Cassandra 2.1.0-rc3[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/mYfL0J (CHANGES.txt)
[2]: http://goo.gl/kiBZNS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-rc3


[RELEASE] Apache Cassandra 1.2.18 released

2014-07-03 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.18.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As
always, please pay attention to the release notes[2] and let us know[3] if
you were to encounter any problem.

Enjoy!

[1]: http://goo.gl/XWfGPo (CHANGES.txt)
[2]: http://goo.gl/PFr5TO (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 1.2.17 released

2014-07-02 Thread Sylvain Lebresne
TL;DR: unless you plan on compiling the source for this 1.2.17 release with
java 6, or want to use the new CloudStack snitch with java 6, you can
ignore this. Otherwise, read on.

The source for this 1.2.17 release won't compile with java 6 due to a
regression from CASSANDRA-7147
https://issues.apache.org/jira/browse/CASSANDRA-7147. This is a bug:
the 1.2 branch is still meant to be compatible with java 6 and we'll
release a 1.2.18 with a fix for this shortly. Please note that this doesn't
affect the binary artifacts published: those are still compiled for java 6
and there is no particular problem running them with java 6. The one
exception is the new CloudStack snitch introduced by CASSANDRA-7147,
which will not work (it will throw a ClassNotFoundException) with java 6
until 1.2.18 is released.

--
Sylvain


On Mon, Jun 30, 2014 at 10:56 AM, Sylvain Lebresne sylv...@datastax.com
wrote:

 The Cassandra team is pleased to announce the release of Apache Cassandra
 version 1.2.17.

 Cassandra is a highly scalable second-generation distributed database,
 bringing together Dynamo's fully distributed design and Bigtable's
 ColumnFamily-based data model. You can read more here:

  http://cassandra.apache.org/

 Downloads of source and binary distributions are listed in our download
 section:

  http://cassandra.apache.org/download/

 This version is a maintenance/bug fix release[1] on the 1.2 series. As
 always, please pay attention to the release notes[2] and let us know[3] if
 you were to encounter any problem.

 Enjoy!

 [1]: http://goo.gl/Me7v64 (CHANGES.txt)
 [2]: http://goo.gl/CWCIul (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA




[RELEASE] Apache Cassandra 1.2.17 released

2014-06-30 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.17.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As
always, please pay attention to the release notes[2] and let us know[3] if
you were to encounter any problem.

Enjoy!

[1]: http://goo.gl/Me7v64 (CHANGES.txt)
[2]: http://goo.gl/CWCIul (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.0.9 released

2014-06-30 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.9.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you were to
encounter any problem.

Enjoy!

[1]: http://goo.gl/ZTYXY8 (CHANGES.txt)
[2]: http://goo.gl/67aOI1 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Does the default LIMIT applies to automatic paging?

2014-06-26 Thread Sylvain Lebresne
On Wed, Jun 25, 2014 at 7:48 PM, ziju feng pkdog...@gmail.com wrote:


 The reason I mentioned the 10000 rows LIMIT is not only because it is the
 default LIMIT in cqlsh, but also because I found it in the CQL documentation
 http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html,
 specifically the "Specifying rows returned using LIMIT" section. Perhaps
 the document needs some updates to clarify a bit about what applies to the
 drivers and what applies to cqlsh?


Yes, it needs clarification. I'll ask people to rectify. Thanks for
pointing it out.

--
Sylvain




 On Wed, Jun 25, 2014 at 12:21 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Tue, Jun 24, 2014 at 1:03 AM, ziju feng pkdog...@gmail.com wrote:


 I was wondering if the default 10000 rows LIMIT applies to automatic
 pagination in C* 2.0 (I'm using the Datastax driver).


 There is no 10000 rows LIMIT in CQL. cqlsh does apply a default LIMIT
 if you don't provide one, for convenience's sake, but it's a cqlsh thing.
 Therefore, there is no default limit with the java driver (neither with nor
 without automatic pagination).

 --
 Sylvain





Re: repair takes 10x more time in one DC compared to the other

2014-06-26 Thread Sylvain Lebresne
On Thu, Jun 26, 2014 at 4:06 AM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:


 [...] since you may want to repair nodes sequentially in the local DC
 (-local) without re-repairing ranges of neighbor nodes (-pr).


Nobody disagrees that this would be nice to have, we're just saying that
this currently doesn't work and so we disallow it for now so people like you
don't get bitten. If you have a patch to fix it ready, please do feel free
to contribute.

--
Sylvain




 On Wed, Jun 25, 2014 at 1:48 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 I see. Well, you shouldn't use both -local and -pr together; they
 don't make sense together, which is the reason why their combination will
 be rejected in 2.0.9 (you can check
 https://issues.apache.org/jira/browse/CASSANDRA-7317 for details).
 Basically, the result of using both is that lots of stuff doesn't get
 repaired.


 On Wed, Jun 25, 2014 at 6:11 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 Thanks for the explanation, but I got slightly confused:

 From my understanding, you just described the behavior of the
 -pr/--partitioner-range option ("Repair only the first range returned by
 the partitioner for the node."), so I would understand that repairs of the
 same CFs in different DCs with only the -pr option could take different
 times.

 However, according to the description of the -local/--in-local-dc option,
 it only repairs against nodes in the same data center, but you said that
 "the range will be repaired for all replicas in all data-centers", even
 with the -local option; or did you confuse it with the -pr option?
 -local option, or did you confuse it with -pr option?

 In any case, I'm using both -local and -pr options, what is the
 expected behavior in that case?

 Cheers,



 On Wed, Jun 25, 2014 at 12:46 PM, Sylvain Lebresne sylv...@datastax.com
  wrote:

 TL;DR, this is not unexpected and this is perfectly fine.

 For every node, 'repair --local' will repair the primary (where "primary"
 means the first range on the ring picked by the consistent hashing for this
 node given its token, nothing more) range of the node in the ring. And that
 range will be repaired on all replicas in all data-centers. When you assign
 tokens to multiple DCs, it's actually pretty common to offset the tokens of
 one DC slightly compared to the other one. This will result in the primary
 ranges being always small in one DC but not the other. But please note that
 this is perfectly ok: it does not imply any imbalance in data-centers. It
 also doesn't really mean that the nodes of one DC actually do a lot more
 work than the other ones: all nodes most likely contribute roughly the same
 amount of work to the repair. It only means that the nodes of one DC
 coordinate more repair work than those of the other DC, which is not really
 a big deal since coordinating a repair is cheap.

 --
 Sylvain


 On Wed, Jun 25, 2014 at 4:43 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 Hello,

 I'm running repair on a large CF with the --local flag in 2
 different DCs. In one of the DCs the operation takes about 1 hour per 
 node,
 while in the other it takes 10 hours per node.

 I would expect the times to differ, but not so much. The writes on
 that CF all come from the DC where it takes 10 hours per node, could this
 be the cause why it takes so long on this DC?

 Additional info: C* 1.2.16, both DCs have the same replication factor.

 Cheers,

 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200





 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200





 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200



[RELEASE CANDIDATE] Apache Cassandra 2.1.0-rc2 released

2014-06-26 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the second release
candidate for the future Apache Cassandra 2.1.0.

Let us first stress that this is not yet the final release of 2.1.0 and as
such is *not* ready for production use. We however encourage as much testing
of this release candidate as possible and please report any problem
you may encounter[3,4]. Also make sure to have a look at the change log[1]
and release notes[2] to see where Cassandra 2.1 differs from the previous
series.

Apache Cassandra 2.1.0-rc2[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/l7HnCV (CHANGES.txt)
[2]: http://goo.gl/AsB89A (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-rc2


Re: Does the default LIMIT applies to automatic paging?

2014-06-25 Thread Sylvain Lebresne
On Tue, Jun 24, 2014 at 1:03 AM, ziju feng pkdog...@gmail.com wrote:


 I was wondering if the default 10000 rows LIMIT applies to automatic
 pagination in C* 2.0 (I'm using the Datastax driver).


There is no 10000 rows LIMIT in CQL. cqlsh does apply a default LIMIT if
you don't provide one, for convenience's sake, but it's a cqlsh thing.
Therefore, there is no default limit with the java driver (neither with nor
without automatic pagination).
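
For completeness, the only related knob in the java driver is the fetch
size, which controls the page size, not a limit (a sketch; assumes an open
Session named 'session' and an existing table 'mytable'):

  Statement stmt = new SimpleStatement("SELECT * FROM mytable");
  stmt.setFetchSize(500);               // rows per page, not a LIMIT
  ResultSet rs = session.execute(stmt);
  for (Row r : rs) {                    // pages are fetched transparently
      System.out.println(r);
  }

All rows are still returned eventually; they are just fetched 500 at a time.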

--
Sylvain


Re: Storing values of mixed types in a list

2014-06-25 Thread Sylvain Lebresne
On Wed, Jun 25, 2014 at 8:49 AM, Tuukka Mustonen tuukka.musto...@gmail.com
wrote:

 Unfortunately, I need to query per list items. That's why I'm running
 Cassandra 2.1rc1 (offers secondary indexes for collections).


Using a list of blobs does not in any way prevent you from doing that.
Types are constraints on what values C* will accept and using blob is
simply asking C* to not reject any value. Doing so does not in any way
limit the kind of queries you can do.

The small downside of using blobs is that you'll have to
serialize/deserialize your value manually client-side, but that's not a
huge deal either. That said, if you really only have 3 types of values to
store and if you don't particularly care about the order of items in the
collection (i.e. if you said you want a list but could really do with a
set), then storing 3 different sets can be a viable solution too (as in,
there is no strong downside to doing it as far as C* is concerned and it
may be simpler to deal with client side (or not, it depends a bit on what
your client side code does exactly)).
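
A sketch of the blob-based variant (table and names made up; each element is
serialized client-side):

  CREATE TABLE docs (
    id uuid PRIMARY KEY,
    items list<blob>
  );

  -- 2.1 collection index, so per-item exact-match queries still work:
  CREATE INDEX ON docs (items);

  SELECT * FROM docs WHERE items CONTAINS 0x04d2;

Exact matches work against the serialized form; order-based comparisons
(greater/less than) on the elements do not.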



 As I understood it, also Cassandra supports dynamic schemas, but only
 through Thrift protocol.


"Dynamic schemas" is a terribly imprecise term that means different things
to different people, but in general that statement is incorrect: you can do
the same things with CQL and with Thrift.


 Also, I don't think it changes the fact that collections need to be
 strongly-typed in Cassandra, no matter what protocol is used?


Well, yes, since you do have to provide a type for the elements in the
collection, but as said previously that does not in any way prevent you from
having collections of anything, since you can use the blob type.

--
Sylvain



 Tuukka



 On Tue, Jun 24, 2014 at 9:41 PM, DuyHai Doan doanduy...@gmail.com wrote:

 "Jeremy, with blob field (ByteBuffer), I can query exact matches (just
 encode the value in query), but greater/less than queries would not work.
 Any sort of serialization kills native ways to query data" -- Not
 necessarily. You still use normal types (uuid, string, timestamp, ...) for
 clustering columns and use them for querying. For the cells where you store
 values, use the blob type.




 On Tue, Jun 24, 2014 at 8:21 PM, Tuukka Mustonen 
 tuukka.musto...@gmail.com wrote:

 What if I need to query by list items?

 1. Jeremy, with blob field (ByteBuffer), I can query exact matches (just
 encode the value in query), but greater/less than queries would not work.
 Any sort of serialization kills native ways to query data
 2. Even with user defined types, I would need to define separate fields
 for each value. Running queries would be cumbersome (something like WHERE
 items CONTAINS {'text_value': 'foobar'} or WHERE items CONTAINS
 {'int_value': 3}). Pavel, did you mean like this?

 I'm running 2.1rc1 with python driver 2.0.2.

 Tuukka


 On Tue, Jun 24, 2014 at 4:39 PM, Pavel Kogan pavel.ko...@cortica.com
 wrote:

 1) You can use a list of strings which are serialized JSONs, or use a
 ByteBuffer with your own serialization as Jeremy suggested.
 2) Use Cassandra 2.1 (not officially released yet) where there is the new
 feature of user-defined types.

 Pavel




 On Tue, Jun 24, 2014 at 9:18 AM, Jeremy Jongsma jer...@barchart.com
 wrote:

 Use a ByteBuffer value type with your own serialization (we use
 protobuf for complex value structures)
  On Jun 24, 2014 5:30 AM, Tuukka Mustonen tuukka.musto...@gmail.com
 wrote:

 Hello,

 I need to store a list of mixed types in Cassandra. The list may
 contain numbers, strings and booleans. So I would need something like
 list<?>.

 Is this possible in Cassandra and if not, what workaround would you
 suggest for storing a list of mixed type items? I sketched a few (using a
 list per type, using list of user types in Cassandra 2.1, etc.), but I 
 get
 a bad feeling about each.

 Couldn't find an exact answer to this through searches...
 Regards,
 Tuukka

 P.S. I first asked this at SO before realizing the traffic there is
 very low:
 http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra








Re: repair takes 10x more time in one DC compared to the other

2014-06-25 Thread Sylvain Lebresne
TL;DR, this is not unexpected and this is perfectly fine.

For every node, 'repair --local' will repair the primary (where "primary"
means the first range on the ring picked by the consistent hashing for
this node given its token, nothing more) range of the node in the ring.
And that range will be repaired on all replicas in all data-centers. When
you assign tokens to multiple DCs, it's actually pretty common to offset the
tokens of one DC slightly compared to the other one. This will result in
the primary ranges being always small in one DC but not the other. But
please note that this is perfectly ok: it does not imply any imbalance in
data-centers. It also doesn't really mean that the nodes of one DC actually
do a lot more work than the other ones: all nodes most likely contribute
roughly the same amount of work to the repair. It only means that the nodes
of one DC coordinate more repair work than those of the other DC, which
is not really a big deal since coordinating a repair is cheap.
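
As a tiny worked example (made-up tokens on a toy 0..99 ring, two nodes per
DC): if DC1 owns tokens 0 and 50 while DC2 owns tokens 1 and 51, the primary
ranges of the DC2 nodes are (0, 1] and (50, 51], each a single token wide,
while the DC1 nodes' primary ranges (1, 50] and (51, 0] cover nearly the
whole ring. Both DCs still hold a full replica of the data; only the
coordination of the repair work is split that unevenly.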

--
Sylvain


On Wed, Jun 25, 2014 at 4:43 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

 Hello,

 I'm running repair on a large CF with the --local flag in 2 different
 DCs. In one of the DCs the operation takes about 1 hour per node, while in
 the other it takes 10 hours per node.

 I would expect the times to differ, but not so much. The writes on that CF
 all come from the DC where it takes 10 hours per node, could this be the
 cause why it takes so long on this DC?

 Additional info: C* 1.2.16, both DCs have the same replication factor.

 Cheers,

 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200



Re: repair takes 10x more time in one DC compared to the other

2014-06-25 Thread Sylvain Lebresne
I see. Well, you shouldn't use both -local and -pr together; they don't
make sense together, which is the reason why their combination will be
rejected in 2.0.9 (you can check
https://issues.apache.org/jira/browse/CASSANDRA-7317 for details).
Basically, the result of using both is that lots of stuff doesn't get
repaired.


On Wed, Jun 25, 2014 at 6:11 PM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

 Thanks for the explanation, but I got slightly confused:

 From my understanding, you just described the behavior of the
 -pr/--partitioner-range option ("Repair only the first range returned by
 the partitioner for the node."), so I would understand that repairs of the
 same CFs in different DCs with only the -pr option could take different
 times.

 However, according to the description of the -local/--in-local-dc option,
 it only repairs against nodes in the same data center, but you said that
 "the range will be repaired for all replicas in all data-centers", even
 with the -local option; or did you confuse it with the -pr option?
 -local option, or did you confuse it with -pr option?

 In any case, I'm using both -local and -pr options, what is the
 expected behavior in that case?

 Cheers,



 On Wed, Jun 25, 2014 at 12:46 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 TL;DR, this is not unexpected and this is perfectly fine.

 For every node, 'repair --local' will repair the primary (where "primary"
 means the first range on the ring picked by the consistent hashing for this
 node given its token, nothing more) range of the node in the ring. And that
 range will be repaired on all replicas in all data-centers. When you assign
 tokens to multiple DCs, it's actually pretty common to offset the tokens of
 one DC slightly compared to the other one. This will result in the primary
 ranges being always small in one DC but not the other. But please note that
 this is perfectly ok: it does not imply any imbalance in data-centers. It
 also doesn't really mean that the nodes of one DC actually do a lot more
 work than the other ones: all nodes most likely contribute roughly the same
 amount of work to the repair. It only means that the nodes of one DC
 coordinate more repair work than those of the other DC, which is not really
 a big deal since coordinating a repair is cheap.

 --
 Sylvain


 On Wed, Jun 25, 2014 at 4:43 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 Hello,

 I'm running repair on a large CF with the --local flag in 2 different
 DCs. In one of the DCs the operation takes about 1 hour per node, while in
 the other it takes 10 hours per node.

 I would expect the times to differ, but not so much. The writes on that
 CF all come from the DC where it takes 10 hours per node, could this be the
 cause why it takes so long on this DC?

 Additional info: C* 1.2.16, both DCs have the same replication factor.

 Cheers,

 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200





 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200



Re: Use Cassnadra thrift API with collection type

2014-06-23 Thread Sylvain Lebresne
On Mon, Jun 23, 2014 at 6:19 PM, James Campbell 
ja...@breachintelligence.com wrote:

 Huiliang,


 Since there hasn't been another reply yet, I'll throw out an idea that
 worked for us as part of a test, though it does not seem exactly like a
 preferred way since it crosses code-bases. We built the value using a
 straight Java type, then used the Datastax v2 driver's DataType class
 serializer.


  Concretely, it would look like the following (adapting your code):

 Column column = new Column();
 column.name=columnSerializer.toByteBuffer(colname); // the
 column name of the map type, it works with other kinds of data type

 ​column.value = DataType.map(DataType.ascii,
 DataType.decimal).serialize(yourMapGoesHere);
 column.timestamp = new Date().getTime();

 ...


This is exactly equivalent to what Huiliang posted and will thus not work
any better.

Collections are internally not stored as one thrift column per collection.
Each element of the collection is a separate thrift column, and the exact
encoding depends on the collection. The fact is, updating CQL collections
from thrift is technically possible, but it is not recommended in any way. I
strongly advise you to stick to CQL if you want to use CQL collections.
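
As an illustration of that internal layout (a sketch, not an API): given a
CQL table with partition key k and a column m of type map<text, int>, the
CQL row (k=1, m={'a': 1, 'b': 2}) is stored under partition 1 as two
separate internal cells whose composite names embed the map key:

  ('m', 'a') -> 1
  ('m', 'b') -> 2

which is why a single thrift Column holding a whole serialized map does not
match what CQL expects to read back.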

 --
Sylvain



  --
 *From:* Huiliang Zhang zhl...@gmail.com
 *Sent:* Friday, June 20, 2014 10:10 PM
 *To:* user@cassandra.apache.org
 *Subject:* Use Cassnadra thrift API with collection type

 Hi,

 I have a problem when inserting data of the map type into a cassandra table.
 I tried all kinds of MapSerializer to serialize the Map data and did not
 succeed.

  My code is like this:
 Column column = new Column();
 column.name=columnSerializer.toByteBuffer(colname); // the
 column name of the map type, it works with other kinds of data type
 column.value =
 MapSerializer.getInstance(AsciiSerializer.instance,
 DecimalSerializer.instance).serialize(someMapData);
 column.timestamp = new Date().getTime();

 Mutation mutation = new Mutation();
 mutation.column_or_supercolumn = new ColumnOrSuperColumn();
 mutation.column_or_supercolumn.column = column;
 mutationList.add(mutation);

  The data was input into the cassandra DB however it cannot be retrieved
 by CQL3 with the following error:
 ERROR 14:32:48,192 Exception in thread Thread[Thrift:4,5,main]
 java.lang.AssertionError
 at
 org.apache.cassandra.cql3.statements.ColumnGroupMap.getCollection(ColumnGroupMap.java:88)
 at
 org.apache.cassandra.cql3.statements.SelectStatement.getCollectionValue(SelectStatement.java:1185)
 at
 org.apache.cassandra.cql3.statements.SelectStatement.handleGroup(SelectStatement.java:1169)
 at
 org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1076)
 ...

  So the question is how to write map data into cassandra by thrift API.
 Appreciated for any help.

  Thanks,
  Huiliang






Re: Exception with java driver

2014-06-19 Thread Sylvain Lebresne
Please don't post on two mailing lists at once, it makes it impossible for
people that are not subscribed to both mailing lists to follow the thread
(and is bad form in general). If unsure which one is the most appropriate,
it's fine, pick your best guess (in this case it's clearly a java driver
question).

--
Sylvain


On Thu, Jun 19, 2014 at 5:22 AM, Shaheen Afroz shaheenn.af...@gmail.com
wrote:

 +Cassandra DL

 We have Cassandra nodes in three datacenters - dc1, dc2 and dc3 and the
 cluster name is DataCluster. In the same way, our application code is also
 in same three datacenters. Our application code is accessing cassandra.

 Now I want to make sure if application call is coming from `dc1` then it
 should go to cassandra `dc1` always. Same with `dc2` and `dc3`.

 So I decided to use DCAwareRoundRobinPolicy of datastax java driver.
 Cassandra version we have is DSE 4.0 and datastax java driver version we
 are using is 2.0.2.

 But somehow with the below code it always gives me an exception,
 NoHostAvailableException -

 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (no host was tried)
 But in the same code, if I comment out the below line and run it again, it
 works fine without any problem. That is pretty strange. What could be wrong
 with DCAwareRoundRobinPolicy or the Cassandra setup?

 .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("dc1"))
  Below is my code -

 public static Cluster cluster;
  public static Session session;
 public static Builder builder;

 public static void main(String[] args) {

  try {
 builder = Cluster.builder();
 builder.addContactPoint("some1_dc1_machine");
 builder.addContactPoint("some2_dc1_machine");
 builder.addContactPoint("some1_dc2_machine");
 builder.addContactPoint("some2_dc2_machine");
 builder.addContactPoint("some1_dc3_machine");
 builder.addContactPoint("some2_dc3_machine");
  PoolingOptions opts = new PoolingOptions();
 opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
 opts.getCoreConnectionsPerHost(HostDistance.LOCAL));

 SocketOptions socketOpts = new SocketOptions();
  socketOpts.setReceiveBufferSize(1048576);
 socketOpts.setSendBufferSize(1048576);
  socketOpts.setTcpNoDelay(false);

  cluster = builder
 .withSocketOptions(socketOpts)
  .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
 .withPoolingOptions(opts)
  .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
 .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("dc1"))
 .withCredentials(username, password)
  .build();


 session = cluster.connect("testingkeyspace");
  Metadata metadata = cluster.getMetadata();
 System.out.println(String.format("Connected to cluster '%s' on %s.",
 metadata.getClusterName(), metadata.getAllHosts()));

  } catch (NoHostAvailableException e) {
 System.out.println("NoHostAvailableException");
  e.printStackTrace();
 System.out.println(e.getErrors());
  } catch (Exception e) {
 System.out.println("Exception");
  e.printStackTrace();
 }
  }

 To unsubscribe from this group and stop receiving emails from it, send an
 email to java-driver-user+unsubscr...@lists.datastax.com.



Re: Questions about timestamp set at writetime

2014-06-17 Thread Sylvain Lebresne

 1) Who is responsible for this micro-second timestamp? The coordinator
 which receives the insert request, or each replica which actually persists
 the data?


The coordinator.



 2) In the case of a batch insert (CQL3 batch, not batch mutation Thrift
 API), if no user-defined timestamp is set, neither on the batch statement
 nor on each individual statement, will C* generate the SAME timestamp for
 each individual statement in the batch, or will there be distinct
 timestamps?


All the sub-statements will have the same timestamp.
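
You can observe this with writetime() (a sketch; 't' is a made-up table):

  CREATE TABLE t (k int PRIMARY KEY, v1 text, v2 text);

  BEGIN BATCH
    INSERT INTO t (k, v1) VALUES (1, 'a');
    UPDATE t SET v2 = 'b' WHERE k = 1;
  APPLY BATCH;

  SELECT writetime(v1), writetime(v2) FROM t WHERE k = 1;

Both writetime values come back identical, since the coordinator picked a
single timestamp for the whole batch.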

--
Sylvain


[RELEASE CANDIDATE] Apache Cassandra 2.1.0-rc1 released

2014-06-02 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the first release
candidate for the future Apache Cassandra 2.1.0.

Let us first stress that this is not yet the final release of 2.1.0 and as
such is *not* ready for production use. We however encourage as much testing
of this release candidate as possible and please report any
problem you may encounter[3,4]. Also make sure to have a look at the change
log[1] and release notes[2] to see where Cassandra 2.1 differs from the
previous series.

Apache Cassandra 2.1.0-rc1[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/sjdeEG (CHANGES.txt)
[2]: http://goo.gl/k6aW8H (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-rc1


[RELEASE] Apache Cassandra 2.0.8 released

2014-05-29 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.8.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you were to
encounter any problem.

Enjoy!

[1]: http://goo.gl/Z2XJWN (CHANGES.txt)
[2]: http://goo.gl/JYEB2D (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[BETA RELEASE] Apache Cassandra 2.1.0-beta2 released

2014-05-05 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the 2nd beta for
the future Apache Cassandra 2.1.0.

Let us first stress that this is beta software and as such is *not* ready for
production use.

The goal of this release is to give a preview of what will become Cassandra
2.1 and to get wider testing in preparation for the final release. This
beta is known to contain bugs but all help in testing this beta would be
greatly appreciated and will help make 2.1 a solid release. Please report
any problem you may encounter[3,4] with this release and have a look at the
change log[1] and release notes[2] to see where Cassandra 2.1 differs from
the previous series.

Apache Cassandra 2.1.0-beta2[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 21x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/iDbwjb (CHANGES.txt)
[2]: http://goo.gl/rGJJmK (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.1.0-beta2


[RELEASE] Apache Cassandra 2.0.7 released

2014-04-18 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.7.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you were to
encounter any problem.

Enjoy!

[1]: http://goo.gl/L7zU1H (CHANGES.txt)
[2]: http://goo.gl/eMy1jp (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Dead node appearing in datastax driver

2014-04-01 Thread Sylvain Lebresne
On Tue, Apr 1, 2014 at 12:50 PM, Apoorva Gaurav
apoorva.gau...@myntra.com wrote:

 Hello All,

 We had a 4 node cassandra 2.0.4 cluster  ( lets call them host1, host2,
 host3 and host4), out of which we've removed one node (host4) using
 nodetool removenode command. Now using nodetool status or nodetool ring we
 no longer see host4. It's also not appearing in Datastax opscenter. But its
 intermittently appearing in Metadata.getAllHosts() while connecting using
 datastax driver 1.0.4.

 Couple of questions :-
 -How is it appearing.


Not sure. Can you try querying the peers system table on each of your nodes
(with cqlsh: SELECT * FROM system.peers) and see if the host4 is still
mentioned somewhere?


 -Can this have impact on read / write performance of client.


No. If the host doesn't exist, the driver might try to reconnect to it at
times, but since it won't be able to, it won't try to use it for reads and
writes. That does mean you might have a reconnection task running with some
regularity, but 1) it's not on the write/read path of queries and 2)
provided you've left the default reconnection policy, this will happen once
every 10 minutes and will be pretty cheap, so it will consume a
completely negligible amount of resources. That doesn't mean I'm not
interested in tracking down why that happens in the first place though.

--
Sylvain




 Code which we are using to connect is

  public void connect() {

 PoolingOptions poolingOptions = new PoolingOptions();

 cluster = Cluster.builder()

 .addContactPoints(inetAddresses.toArray(new String[]{}))

 .withLoadBalancingPolicy(new RoundRobinPolicy())

 .withPoolingOptions(poolingOptions)

 .withPort(port)

 .withCredentials(username, password)

 .build();

 Metadata metadata = cluster.getMetadata();

 System.out.printf("Connected to cluster: %s\n",
 metadata.getClusterName());

 for (Host host : metadata.getAllHosts()) {

 System.out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
 host.getDatacenter(), host.getAddress(), host.getRack());

 }

 }



 --
 Thanks  Regards,
 Apoorva



Re: Dead node appearing in datastax driver

2014-04-01 Thread Sylvain Lebresne
What does "Did that" mean? Does it mean "I upgraded to 2.0.6", or does
it mean "I manually removed entries from System.peers"? If the latter,
I'd need more info on what you did exactly, what your peers table looked
like before and how it looks now: there is no reason deleting the
peers entries for hosts that are not part of the cluster anymore would have
anything to do with write latency (but if, say, you've removed wrong entries,
that might have made the driver think some live host had been removed, and
if the driver has fewer nodes to use to dispatch queries, that might impact
latency I suppose -- at least that's the only related thing I can think of).

--
Sylvain


On Tue, Apr 1, 2014 at 2:44 PM, Apoorva Gaurav apoorva.gau...@myntra.com
wrote:

 Did that and I actually see a significant reduction in write latency.


 On Tue, Apr 1, 2014 at 5:35 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Tue, Apr 1, 2014 at 1:49 PM, Apoorva Gaurav apoorva.gau...@myntra.com
  wrote:

 Hello Sylvian,

 Queried system.peers on three live nodes and host4 is appearing on two
 of these.


 That's why the driver thinks they are still there. You're most probably
 running into https://issues.apache.org/jira/browse/CASSANDRA-6053 since
 you are on C* 2.0.4. As said, this is relatively harmless, but you should
 think about upgrading to 2.0.6 to fix it in the future (you could manually
 remove the bad entries in System.peers in the meantime if you want; they
 are really just leftovers that shouldn't be there).

 --
 Sylvain



 On Tue, Apr 1, 2014 at 5:06 PM, Sylvain Lebresne 
 sylv...@datastax.com wrote:

 On Tue, Apr 1, 2014 at 12:50 PM, Apoorva Gaurav 
 apoorva.gau...@myntra.com wrote:

 Hello All,

 We had a 4 node cassandra 2.0.4 cluster  ( lets call them host1,
 host2, host3 and host4), out of which we've removed one node (host4) using
 nodetool removenode command. Now using nodetool status or nodetool ring we
 no longer see host4. It's also not appearing in Datastax opscenter. But 
 its
 intermittently appearing in Metadata.getAllHosts() while connecting using
 datastax driver 1.0.4.

 Couple of questions :-
 -How is it appearing.


 Not sure. Can you try querying the peers system table on each of your
 nodes (with cqlsh: SELECT * FROM system.peers) and see if the host4 is
 still mentioned somewhere?


 -Can this have impact on read / write performance of client.


 No. If the host doesn't exist, the driver might try to reconnect to it
 at times, but since it won't be able to, it won't try to use it for reads
 and writes. That does mean you might have a reconnection task running with
 some regularity, but 1) it's not on the write/read path of queries and 2)
 provided you've left the default reconnection policy, this will happen once
 every 10 minutes and will be pretty cheap, so it will consume a
 completely negligible amount of resources. That doesn't mean I'm not
 interested in tracking down why that happens in the first place though.

 --
 Sylvain




 Code which we are using to connect is

  public void connect() {

 PoolingOptions poolingOptions = new PoolingOptions();

 cluster = Cluster.builder()

 .addContactPoints(inetAddresses.toArray(new String[]{}))

 .withLoadBalancingPolicy(new RoundRobinPolicy())

 .withPoolingOptions(poolingOptions)

 .withPort(port)

 .withCredentials(username, password)

 .build();

 Metadata metadata = cluster.getMetadata();

 System.out.printf(Connected to cluster: %s\n,
 metadata.getClusterName());

 for (Host host : metadata.getAllHosts()) {

 System.out.printf(Datacenter: %s; Host: %s; Rack: %s\n,
 host.getDatacenter(), host.getAddress(), host.getRack());

 }

 }



 --
 Thanks & Regards,
 Apoorva





 --
 Thanks & Regards,
 Apoorva





 --
 Thanks & Regards,
 Apoorva



[RELEASE] Apache Cassandra 1.2.16 released

2014-03-31 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.16.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As
always, please pay attention to the release notes[2] and let us know[3] if
you were to encounter any problems.

Enjoy!

[1]: http://goo.gl/E5Q9Cq (CHANGES.txt)
[2]: http://goo.gl/bQJhms (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Thrift - CQL

2014-03-26 Thread Sylvain Lebresne

  - Is there any way to do insert/update at all on a good old wide cf
  using CQL? Based on what we read back out, we have tried:

  INSERT INTO cf_name (key, column1, value) VALUES ('key1',
  'columnName1', 'columnValue2')

  But we ended up with "Unknown identifier column1".


What does cqlsh give you if you try to do 'DESC cf_name'?
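
For reference, a pure dynamic thrift column family (one with no column
metadata) is typically exposed to CQL3 with generated names, roughly like
this sketch (the exact types depend on the configured comparator and
validators):

  CREATE TABLE cf_name (
    key text,
    column1 text,
    value text,
    PRIMARY KEY (key, column1)
  ) WITH COMPACT STORAGE;

An "Unknown identifier column1" error suggests the table was exposed under
different column names, which is what the DESC output would show.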





  - About read - one of our cfs is defined with a secondary index. So
  the schema looks something like:



  create column family cf_with_index
    with column_type = 'Standard'
    and comparator = 'UTF8Type'
    and default_validation_class = 'UTF8Type'
    and key_validation_class = 'UTF8Type'
    and column_metadata = [
      {column_name : 'indexed_column',
       validation_class : UTF8Type,
       index_name : 'column_idx',
       index_type : 0}];



  When reading from the cli, we see all columns/data as expected:

  -------------------
  RowKey: rowkey1
  => (name=c1, value=v1, timestamp=xxx, ttl=604800)
  => (name=c2, value=v2, timestamp=xxx, ttl=604800)
  => (name=c3, value=v3, timestamp=xxx, ttl=604800)
  => (name=indexed_column, value=value1, timestamp=xxx, ttl=604800)
  -------------------



  However, when we query via CQL, we only get the indexed column:

  SELECT * FROM cf_with_index WHERE key = 'rowkey1';

   key     | indexed_column
  ---------+----------------
   rowkey1 | value1



 Any way to get the rest?


You would have to declare the other columns (c1, c2 and c3) in the metadata
(you don't have to index them though).
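
For illustration, a hedged cassandra-cli sketch of what that declaration
could look like for the example above (re-stating the existing index
entry, and assuming the values really are UTF8):

  update column family cf_with_index
    with column_metadata = [
      {column_name : 'indexed_column',
       validation_class : UTF8Type,
       index_name : 'column_idx',
       index_type : 0},
      {column_name : 'c1', validation_class : UTF8Type},
      {column_name : 'c2', validation_class : UTF8Type},
      {column_name : 'c3', validation_class : UTF8Type}];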





  - Obtaining TTL and writetime on these wide rows - we tried:

 SELECT key, column1, value, writetime(value), ttl(value) FROM cf LIMIT 1;

 It works, but it's a bit clumsy. Is there a better way?


No, it's the CQL way (not that I particularly agree with the "clumsy"
qualification, but I suppose we all have different opinions on what is
clumsy and what is not).
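
One small cleanup, assuming a Cassandra version recent enough to support
column aliases in SELECT (2.0+): naming the function outputs (wt and
remaining_ttl are arbitrary aliases) makes the result set a little easier
to consume:

  SELECT key, column1, value,
         writetime(value) AS wt,
         ttl(value) AS remaining_ttl
  FROM cf LIMIT 1;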




  - We can live with thrift. Is there any way / plan to let us execute
  thrift calls with the DataStax driver?


No (and it's not like it's a minor change to allow that, the DataStax Java
driver uses the native protocol which is CQL only by nature).

--
Sylvain


Re: Serial Consistency and Thrift API

2014-03-15 Thread Sylvain Lebresne
On Fri, Mar 14, 2014 at 7:59 PM, Panagiotis Garefalakis
panga...@gmail.comwrote:


 Hello all,

 I am running some tests in my cluster and I wanted to try some of the new
 features of Cassandra like lightweight transactions and Serial Writes.
 Surprisingly I found out that Serial writes are not supported by the
 Thrift API.


They *are* supported by thrift. First, let me remark that lightweight
transactions and Serial Writes are two names for the same thing. In
thrift, it's supported through the cas() method.
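
For comparison, the CQL side of the same feature is the IF clause on
writes -- a sketch against a hypothetical users table:

  -- only applied if the row does not already exist
  INSERT INTO users (id, email) VALUES (42, 'a@example.com') IF NOT EXISTS;

  -- only applied if the current value matches the condition
  UPDATE users SET email = 'b@example.com' WHERE id = 42
  IF email = 'a@example.com';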

--
Sylvain



 Is there any patch available, or is the only way to use them through CQL?

 Thanks in advance,
 Panagiotis



Re: CQL Select Map using an IN relationship

2014-03-13 Thread Sylvain Lebresne
On Thu, Mar 13, 2014 at 12:12 PM, David Savage davemssav...@gmail.comwrote:

 Hi there,

 I'm experimenting using cassandra and have run across an error message
 which I need a little more information on.

 The use case I'm experimenting with is a series of document updates
 (documents being an arbitrary map of key/value pairs); I would like to
 find the latest document updates after a specified time period. I don't
 want to store many copies of the documents (one per update) as the
 updates are often only to single keys in the map, so that would involve
 a lot of duplicated data.

 The solution I've found that seems to fit best in terms of performance is
 to have two tables.

 One that has an event log of timeuuid -> docid and a second that stores
 the documents themselves by docid -> map<string, string>. I then run
 two queries, one to select ids that have changed after a certain time:

 SELECT id FROM eventlog WHERE timestamp >= minTimeuuid($minimumTime)

 and then a second to select the actual documents themselves

 SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7...)

 However this then explodes on query with the error message:

 "Cannot restrict PRIMARY KEY part id by IN relation as a collection is
 selected by the query"

 Detective work led me to these lines in
 org.apache.cassandra.cql3.statements.SelectStatement:

  // We only support IN for the last name and for compact storage so far
  // TODO: #3885 allows us to extend to non compact as well, but that
  // remains to be done
  if (i != stmt.columnRestrictions.length - 1)
      throw new InvalidRequestException(String.format(
          "PRIMARY KEY part %s cannot be restricted by IN relation", cname));
  else if (stmt.selectACollection())
      throw new InvalidRequestException(String.format(
          "Cannot restrict PRIMARY KEY part %s by IN relation as a collection is selected by the query", cname));

 It seems like #3885 will allow support for the first IF block above, but I
 don't think it will allow the second, am I correct?


Right, #3885 is about the first one. Tbh, the 2nd limitation is kind of
historical and unless I'm forgetting something, we should be able to lift
that pretty easily. If you don't mind opening a JIRA ticket, I'll have a
look at removing said limitation.
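
In the meantime, one workaround is to skip the IN entirely and issue one
single-key SELECT per document id, running them concurrently client side
-- a sketch:

  SELECT id, data FROM documents WHERE id = 0;
  SELECT id, data FROM documents WHERE id = 1;
  -- ...one query per id, executed in parallel by the client rather
  -- than combined into a single statement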

--
Sylvain




 Any pointers on how I can work around this would be greatly appreciated.

 Kind regards,

 Dave



[RELEASE] Apache Cassandra 2.0.6 released

2014-03-10 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.6.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you were to
encounter any problems.

Enjoy!

[1]: http://goo.gl/UXgyZh (CHANGES.txt)
[2]: http://goo.gl/VxSAiN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Question regarding java DowngradingConsistencyRetryPolicy

2014-03-05 Thread Sylvain Lebresne
Let me first note that the DataStax Java driver has a dedicated mailing
list:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user;
it would be better to use that list for driver-specific questions in the
future.

But to answer your question, a SIMPLE write is any write (INSERT, UPDATE,
DELETE) that is not in a batch. Concretely, if you do:
  session.execute("INSERT INTO ...");
it's a SIMPLE write.
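
To make the mapping concrete, a sketch of how statements line up with the
WriteType values the retry policy sees (table and values are hypothetical):

  -- WriteType.SIMPLE: a standalone write
  INSERT INTO users (id, name) VALUES (1, 'a');

  -- WriteType.BATCH (a timeout while writing the batch log itself
  -- shows up as BATCH_LOG):
  BEGIN BATCH
    INSERT INTO users (id, name) VALUES (1, 'a');
    INSERT INTO users (id, name) VALUES (2, 'b');
  APPLY BATCH;

  -- WriteType.UNLOGGED_BATCH:
  BEGIN UNLOGGED BATCH
    INSERT INTO users (id, name) VALUES (3, 'c');
    INSERT INTO users (id, name) VALUES (4, 'd');
  APPLY BATCH;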

--
Sylvain


On Tue, Mar 4, 2014 at 7:21 PM, HAITHEM JARRAYA a-hjarr...@expedia.comwrote:

 Hi All,

 I might be missing something and I would like some clarification on this.
 We are using the java driver with the Downgrading Retry policy, and we see
 in our logs that only the reads are retried.

 In the code and the docs, it says that the write method will retry at most
 once, when the WriteType is UNLOGGED_BATCH or BATCH_LOG.
 My question is: when is a write considered SIMPLE?

 Thanks,

 Haithem

  /**
   * Defines whether to retry and at which consistency level on a write
   * timeout.
   * <p>
   * This method triggers a maximum of one retry. If {@code writeType ==
   * WriteType.BATCH_LOG}, the write is retried with the initial
   * consistency level. If {@code writeType == WriteType.UNLOGGED_BATCH}
   * and at least one replica acknowledged, the write is retried with a
   * lower consistency level (with unlogged batch, a write timeout can
   * <b>always</b> mean that part of the batch haven't been persisted at
   * all, even if {@code receivedAcks > 0}). For other {@code writeType},
   * if we know the write has been persisted on at least one replica, we
   * ignore the exception. Otherwise, an exception is thrown.
   *
   * @param statement the original query that timed out.
   * @param cl the original consistency level of the write that timed out.
   * @param writeType the type of the write that timed out.
   * @param requiredAcks the number of acknowledgments that were required
   *        to achieve the requested consistency level.
   * @param receivedAcks the number of acknowledgments that had been
   *        received by the time the timeout exception was raised.
   * @param nbRetry the number of retries already performed for this
   *        operation.
   * @return a RetryDecision as defined above.
   */
  @Override
  public RetryDecision onWriteTimeout(Statement statement,
          ConsistencyLevel cl, WriteType writeType, int requiredAcks,
          int receivedAcks, int nbRetry) {
      if (nbRetry != 0)
          return RetryDecision.rethrow();

      switch (writeType) {
          case SIMPLE:
          case BATCH:
              // Since we provide atomicity there is no point in retrying
              return RetryDecision.ignore();
          case UNLOGGED_BATCH:
              // Since only part of the batch could have been persisted,
              // retry with whatever consistency should allow to persist all
              return maxLikelyToWorkCL(receivedAcks);
          case BATCH_LOG:
              return RetryDecision.retry(cl);
      }
      // We want to rethrow on COUNTER and CAS, because in those cases
      // we don't know and don't want to guess
      return RetryDecision.rethrow();
  }





Re: Invalid compacted_at timestamp entries in Cassandra 2.0.5

2014-03-03 Thread Sylvain Lebresne
You're probably running into
https://issues.apache.org/jira/browse/CASSANDRA-6784. This will be fixed in
2.0.6.


On Mon, Mar 3, 2014 at 3:43 PM, Phil Luckhurst 
phil.luckhu...@powerassure.com wrote:

 Running 'nodetool compactionHistory' seems to be showing strange timestamp
 values for the 'compacted_at' column. e.g.
 id                                    keyspace_name  columnfamily_name  compacted_at      bytes_in  bytes_out  rows_merged
 cb035320-9f11-11e3-82e3-e37a59d03017  system         sstable_activity   1212036306964769  74352     19197      {1:19, 4:427}

 And running a CQL query on the system.compaction_history table shows dates
 well in the future.

  id                                   | bytes_in | bytes_out | columnfamily_name       | compacted_at              | keyspace_name | rows_merged
 --------------------------------------+----------+-----------+-------------------------+---------------------------+---------------+--------------
  bda494f0-9db8-11e3-bb85-ed7074988754 |      647 |       320 | compactions_in_progress | 17391-03-07 12:26:35+0000 | system        | {1: 3, 2: 1}
  dc87cd00-a269-11e3-a1a8-ed7074988754 |      410 |       159 | compactions_in_progress | 33738-09-07 17:03:21+0100 | system        | {1: 2, 2: 1}

 Is this a known issue or something wrong on our system?

 Thanks
 Phil







Re: Update multiple rows in a CQL lightweight transaction

2014-02-27 Thread Sylvain Lebresne
On Thu, Feb 27, 2014 at 12:53 AM, Clint Kelly clint.ke...@gmail.com wrote:

 Thanks for your help everyone.

 Sylvain, as I understand it, the scenario I described above is not
 resolved by CASSANDRA-6561, correct?


Well, no, my point is that it kind of is resolved. At least if we're still
talking about:
  If there is a row with (x,y,t,z) = (a,1,2,10), then update/insert a row
with (x,y,t,z) = (a,3,4,5) and update/insert a row with (x,y,t,z) =
(a,4,5,6).
and assuming in your example schema that the primary key is (x, y), then
you can do:
  BEGIN BATCH
    UPDATE foo SET t = 2 WHERE x = 'a' AND y = 1 IF t = 2 AND z = 10;
    INSERT INTO foo (x, y, t, z) VALUES ('a', 3, 4, 5);
    INSERT INTO foo (x, y, t, z) VALUES ('a', 4, 5, 6)
  APPLY BATCH

This does what you described above. Now, it is true that it will also
overwrite the column 't' of the first row with its current value (you do
need, syntax-wise, to set at least one value in the first UPDATE), which
is not part of what you described, but since this is done atomically,
overwriting a column by its own value probably doesn't matter. And it
almost surely doesn't matter performance-wise since it all ends up in one
internal mutation, so updating one more column won't make a measurable
difference, especially since you're using a lightweight transaction and
those are not the most performance-oriented operations in the first place.

--
Sylvain



 (This scenario may not matter to most folks, which is totally fine, I just
 want to make sure that I understand.)

 Should I instead look into using the Thrift API to address this?

 Best regards,
 Clint



 On Tue, Feb 25, 2014 at 11:30 PM, Sylvain Lebresne 
 sylv...@datastax.comwrote:

 Sorry to interfere again here, but CASSANDRA-5633 will not be picked up
 because pretty much everything it was set to fix is fixed by
 CASSANDRA-6561; this is *not* a syntax problem anymore.


 On Wed, Feb 26, 2014 at 3:18 AM, Tupshin Harper tups...@tupshin.comwrote:

 Unfortunately there is no option to vote for a resolved ticket, but if
 you can propose a better syntax that people agree on, you could probably
 get some fresh traction on it.

 -Tupshin
  On Feb 25, 2014 7:20 PM, Clint Kelly clint.ke...@gmail.com wrote:

 Hi Tupshin,

 Thanks for your help!  Unfortunately in my case, I will need to do a
 compare and set in which the compare is against a value in a dynamic 
 column.

 In general, I need to be able to do the following:

- Check whether a given value exists in a dynamic column
- If so, perform some number of insertions / deletions for dynamic
  columns in the same row (i.e., with the same partition key as the
  dynamic column used for the compare)

 I think you are correct that I need
 https://issues.apache.org/jira/browse/CASSANDRA-5633 to be
 implemented.  Is there any way to vote for that to get picked up again?  :)

 Best regards,
 Clint





 On Mon, Feb 24, 2014 at 2:32 PM, Tupshin Harper tups...@tupshin.comwrote:

 Hi Clint,

 That does appear to be an omission in CQL3. It would be possible to
 simulate it by doing
 BEGIN BATCH
 UPDATE foo SET z = 10 WHERE x = 'a' AND y = 1 IF t = 2 AND z = 10;
 UPDATE foo SET t = 5,z=6 where x = 'a' AND y = 4
 APPLY BATCH;

 However, this does a redundant write to the first row if the condition
 holds, and I certainly wouldn't recommend doing that routinely.

 Alternatively, depending on your needs, you might be able to use a
 static column (coming with 2.0.6) as your conditional flag, as that column
 is shared by all rows in the partition.

 -Tupshin



 On Mon, Feb 24, 2014 at 3:57 PM, Clint Kelly clint.ke...@gmail.comwrote:

 Hi Tupshin,

 Thanks for your help; I appreciate it.

 Could I do something like the following?

 Given the same table you started with:

 x | y | t | z
 ---+---+---+----
  a | 1 | 2 | 10
  a | 2 | 2 | 20

 I'd like to write a compare-and-set that does something like:

 If there is a row with (x,y,t,z) = (a,1,2,10), then update/insert a
 row with (x,y,t,z) = (a,3,4,5) and update/insert a row with (x,y,t,z)
 = (a,4,5,6).


 I don't see how I could do this with what you outlined above---just
 curious.  It seems like what I describe above under the hood would be
 a compare-and-(batch)-set on a single wide row, so it maybe is
 possible with the Thrift API (I have to check).

 Thanks again!

 Best regards,
 Clint

 On Sat, Feb 22, 2014 at 11:38 AM, Tupshin Harper tups...@tupshin.com
 wrote:
  #5633 was actually closed because the static columns feature
  (https://issues.apache.org/jira/browse/CASSANDRA-6561) which has been
  checked in to the 2.0 branch but is not yet part of a release (it will
  be in 2.0.6).
 
  That feature will let you update multiple rows within a single
  partition by doing a CAS write based on a static column shared by all
  rows within the partition.
 
  Example extracted from the ticket:
  CREATE TABLE foo (
  x text,
  y bigint,
  t bigint static,
  z bigint,
  PRIMARY KEY (x, y

Re: Combine multiple SELECT statements into one RPC?

2014-02-27 Thread Sylvain Lebresne
On Thu, Feb 27, 2014 at 1:00 AM, Clint Kelly clint.ke...@gmail.com wrote:

 Hi all,

 Is there any way to use the DataStax Java driver to combine multiple
 SELECT statements into a single RPC?  I assume not (I could not find
 anything about this in the documentation), but I just wanted to check.


The short answer is no.

The slightly longer answer is that the DataStax Java driver uses the
so-called native protocol, and that protocol does not allow multiple
SELECTs in a single protocol message (the protocol is not really
RPC-based strictly speaking, so I'll assume you meant one client-server
message here), and it follows that the driver can't either. But I'll note
that the reason the protocol doesn't have such a thing is that it's
generally a better idea to parallelize your SELECTs client side, though
since you haven't provided much context for your question I'd rather not
go into too much detail here since that might be off-topic.
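
For illustration, a sketch of that client-side parallelization with the
Java driver (assuming an open Session; the table and ids are
hypothetical):

  import com.datastax.driver.core.ResultSet;
  import com.datastax.driver.core.ResultSetFuture;
  import java.util.ArrayList;
  import java.util.List;

  // Fire all the SELECTs without waiting, then collect the results;
  // the driver pipelines the requests over its connections.
  List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
  for (long id : new long[] {0, 1, 2, 3}) {
      futures.add(session.executeAsync(
              "SELECT id, data FROM documents WHERE id = " + id));
  }
  for (ResultSetFuture future : futures) {
      ResultSet rs = future.getUninterruptibly();
      // process rs here
  }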

--
Sylvain


Re: CQL decimal encoding

2014-02-25 Thread Sylvain Lebresne
On Mon, Feb 24, 2014 at 8:50 PM, Theo Hultberg t...@iconara.net wrote:

 I don't know if it's by design or if it's by oversight that the data types
 aren't part of the binary protocol specification.


The honest answer is, no-one took the time to write that down properly and
include it in the spec. My small excuse for initially skipping it in the
spec is that the CQL data type encodings are really not different from what
we have had in thrift since forever, so there are already many drivers in
a lot of languages out there that have encoding/decoding functions that
can be looked at. But if someone finds some time to gather all those
encodings and provides a patch for the spec with them, that would
definitely be much appreciated. Cassandra is open-source software:
everyone can contribute, and that does not exclude documentation and
specifications.

Regarding the decimal, it does use 4 bytes for the scale and the rest for
the bytes of the unscaled value (i.e. a variable-length integer) as Peter
mentioned, the actual value being the unscaled value * 10^-scale
(internally C* really just uses the following BigDecimal ctor:
http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html#BigDecimal(java.math.BigInteger,
int)).
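
A quick worked example of that layout (my own illustration, not from the
spec; the two fields are contiguous on the wire, shown here with a space
between them): the decimal 12.34 has scale = 2 and unscaled value = 1234
(0x04D2), so it encodes as

  \x00\x00\x00\x02 \x04\xD2
  \______________/ \______/
    scale (int32)   unscaled value (varint)

and decodes back as 1234 * 10^-2 = 12.34.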

--
Sylvain



 I had to reverse engineer how to encode and decode all of them for the
 Ruby driver. There were definitely a few bugs in the first few versions
 that could have been avoided if there was a specification available.

 T#


 On Mon, Feb 24, 2014 at 8:43 PM, Paul LeoNerd Evans 
 leon...@leonerd.org.uk wrote:

 On Mon, 24 Feb 2014 19:14:48 +
 Ben Hood 0x6e6...@gmail.com wrote:

  So I have a question about the encoding of 0: \x00\x00\x00\x00\x00.

 The first four octets are the decimal shift (0), and the remaining ones
 (one in this case) encode a varint - 0 in this case. So it's

   0 * 10**0

 literally zero.

 Technically the decimal shift matters not for zero - any four bytes
 could be given as the shift, ending in \x00, but 0 is the simplest.

 --
 Paul LeoNerd Evans

 leon...@leonerd.org.uk
 ICQ# 4135350   |  Registered Linux# 179460
 http://www.leonerd.org.uk/





Re: Update multiple rows in a CQL lightweight transaction

2014-02-25 Thread Sylvain Lebresne
On Mon, Feb 24, 2014 at 11:32 PM, Tupshin Harper tups...@tupshin.comwrote:

 Hi Clint,

 That does appear to be an omission in CQL3. It would be possible to
 simulate it by doing
 BEGIN BATCH
 UPDATE foo SET z = 10 WHERE x = 'a' AND y = 1 IF t = 2 AND z = 10;
 UPDATE foo SET t = 5,z=6 where x = 'a' AND y = 4
 APPLY BATCH;

 However, this does a redundant write to the first row if the condition
 holds, and I certainly wouldn't recommend doing that routinely.


I'm not sure what would be the big deal of redundantly writing the first
row to its existing value really. Performance-wise, it's going to be
rather negligible since if you use a condition you are, by extension,
updating rows in the same partition, and so all of it ends up in the same
internal Mutation and there is very, very little cost to adding one more
column to that mutation. The only minor downside is that it would bump the
writeTime() value for that column, but I can't imagine all that many
applications for which that would be a big deal (though I suppose it could
be possible to support UPDATE with an empty SET clause so that you can put
a condition on a row without updating it).

--
Sylvain




 Alternatively, depending on your needs, you might be able to use a static
 column (coming with 2.0.6) as your conditional flag, as that column is
 shared by all rows in the partition.

 -Tupshin



 On Mon, Feb 24, 2014 at 3:57 PM, Clint Kelly clint.ke...@gmail.comwrote:

 Hi Tupshin,

 Thanks for your help; I appreciate it.

 Could I do something like the following?

 Given the same table you started with:

 x | y | t | z
 ---+---+---+----
  a | 1 | 2 | 10
  a | 2 | 2 | 20

 I'd like to write a compare-and-set that does something like:

 If there is a row with (x,y,t,z) = (a,1,2,10), then update/insert a
 row with (x,y,t,z) = (a,3,4,5) and update/insert a row with (x,y,t,z)
 = (a,4,5,6).


 I don't see how I could do this with what you outlined above---just
 curious.  It seems like what I describe above under the hood would be
 a compare-and-(batch)-set on a single wide row, so it maybe is
 possible with the Thrift API (I have to check).

 Thanks again!

 Best regards,
 Clint

 On Sat, Feb 22, 2014 at 11:38 AM, Tupshin Harper tups...@tupshin.com
 wrote:
  #5633 was actually closed because the static columns feature
  (https://issues.apache.org/jira/browse/CASSANDRA-6561) which has been
  checked in to the 2.0 branch but is not yet part of a release (it will
  be in 2.0.6).
 
  That feature will let you update multiple rows within a single
  partition by doing a CAS write based on a static column shared by all
  rows within the partition.
 
  Example extracted from the ticket:
  CREATE TABLE foo (
  x text,
  y bigint,
  t bigint static,
  z bigint,
  PRIMARY KEY (x, y) );
 
  insert into foo (x,y,t, z) values ('a', 1, 1, 10);
  insert into foo (x,y,t, z) values ('a', 2, 2, 20);
 
  select * from foo;
 
  x | y | t | z
  ---+---+---+----
   a | 1 | 2 | 10
   a | 2 | 2 | 20
  (Note that both values of t are 2 because it is static)
 
 
   begin batch update foo set z = 1 where x = 'a' and y = 1; update foo
   set z = 2 where x = 'a' and y = 2 if t = 4; apply batch;
 
   [applied] | x | y    | t
  -----------+---+------+---
       False | a | null | 2
 
  (Both updates failed to apply because there was an unmet conditional on
  one of them)
 
  select * from foo;
 
   x | y | t | z
  ---+---+---+----
   a | 1 | 2 | 10
   a | 2 | 2 | 20
 
 
  begin batch update foo set z = 1 where x = 'a' and y = 1; update foo
  set z = 2 where x = 'a' and y = 2 if t = 2; apply batch;
 
   [applied]
  -----------
        True
 
  (both updates succeeded because the check on t succeeded)
 
  select * from foo;
  x | y | t | z
  ---+---+---+---
   a | 1 | 2 | 1
   a | 2 | 2 | 2
 
  Hope this helps.
 
  -Tupshin
 
 
 
  On Fri, Feb 21, 2014 at 6:05 PM, DuyHai Doan doanduy...@gmail.com
 wrote:
 
  Hello Clint
 
   The Resolution status of the JIRA is set to "Later"; probably the
  implementation is not done yet. The JIRA was opened to discuss the impl
  strategy, but nothing has been coded so far I guess.
 
 
 
  On Sat, Feb 22, 2014 at 12:02 AM, Clint Kelly clint.ke...@gmail.com
  wrote:
 
  Folks,
 
  Does anyone know how I can modify multiple rows at once in a
  lightweight transaction in CQL3?
 
  I saw the following ticket:
 
  https://issues.apache.org/jira/browse/CASSANDRA-5633
 
  but it was not obvious to me from the comments how (or whether) this
  got resolved.  I also couldn't find anything in the DataStax
  documentation about how to perform these operations.
 
  I'm in particular interested in how to perform a compare-and-set
  operation that modifies multiple rows (with the same partition key)
  using the DataStax Java driver.
 
  Thanks!
 
  Best regards,
  Clint
 
 
 




