Re: design principle to manage roll back

2020-07-14 Thread Hiroyuki Yamada
As one of the options, you can use a (logged) batch for kind-of-atomic mutations.
I said "kind of" because it is not really atomic when the mutations span
multiple partitions.
More specifically, the mutations are applied to all the nodes eventually, so
intermediate states can be observed and there is no rollback.
https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useBatch.html
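
For illustration, a logged batch looks like the following in CQL (the keyspace,
table, and column names are hypothetical ones for the recipe example below, not
from your actual schema):

BEGIN BATCH
  INSERT INTO myapp.recipes (recipe_id, name) VALUES (1, 'curry');
  INSERT INTO myapp.recipes_by_user (user_id, recipe_id) VALUES (123, 1);
APPLY BATCH;

If any part of the batch is applied, the rest will eventually be applied as well
(via the batchlog), but readers may still observe only some of the rows in the
meantime when they live in different partitions.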

You can do a similar thing on the application side by making write
operations idempotent and
retrying them until they all succeed.

As a last resort, you can use Scalar DB.
https://github.com/scalar-labs/scalardb
When you do operations through Scalar DB to Cassandra,
you can achieve ACID transactions on Cassandra.
If there is a failure during a transaction, it will be properly
rolled back or rolled forward based on the transaction state.

Hope it helps.

Thanks,
Hiro

On Tue, Jul 14, 2020 at 4:55 PM Manu Chadha  wrote:
>
> Thanks. Actually none of my data is big data. I just thought not to use a
> traditional RDBMS for my project. Features like replication, fast reads and
> writes, always ON, and scalability appealed to me. I am also happy with
> eventual consistency.
>
>
>
> To be honest, I feel there has to be a way because if Cassandra promotes data 
> duplication by creating a table for each query then there should be a way to 
> keep duplicate copies consistent.
>
>
>
>
>
> Sent from Mail for Windows 10
>
>
>
> From: onmstester onmstester
> Sent: 14 July 2020 08:04
> To: user
> Subject: Re: design principle to manage roll back
>
>
>
> Hi,
>
>
>
> I think that Cassandra alone is not suitable for your use case. You can use a
> mix of Distributed/NoSQL (to store single records of whatever makes your
> input the big data) & Relational/Single Database (for the transactional, non-big
> data part)
>
>
>
> Sent using Zoho Mail
>
>
>
>
>
>  On Tue, 14 Jul 2020 10:47:33 +0430 Manu Chadha  
> wrote 
>
>
>
>
>
> Hi
>
>
>
> What are the design approaches I can follow to ensure that data is consistent 
> from an application perspective (not from individual tables perspective). I 
> am thinking of issues which arise due to unavailability of rollback or 
> executing atomic transactions in Cassandra. Is Cassandra not suitable for my 
> project?
>
>
>
> Cassandra recommends creating a new table for each query. This results in
> data duplication (which doesn’t bother me). Take the following scenario: an
> application which allows users to create, share and manage food recipes. Each
> of the functions below adds records to a separate database.
>
>
>
> for {savedRecipe <- saveInRecipeRepository(...)
>
>recipeTagRepository <- saveRecipeTag(...)
>partitionInfoOfRecipes <- savePartitionOfTheTag(...)
>updatedUserProfile <- updateInUserProfile(...)
>recipesByUser <- saveRecipesCreatedByUser(...)
>supportedRecipes <- updateSupportedRecipesInformation(tag)}
>
>
>
> If, say, updateInUserProfile fails, then I'll have to manage rollback in the
> application itself as Cassandra doesn’t do it. My concern is that the
> rollback process could itself fail, due to network issues, say.
>
>
>
> Is there a recommended way or a design principle I can follow to keep data 
> consistent?
>
>
>
> Thanks
>
> Manu
>
>
>
> Sent from Mail for Windows 10
>
>
>
>
>
>
>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra is not showing a node up hours after restart

2019-12-11 Thread Hiroyuki Yamada
Hello Paul,

The behavior looks similar to what we experienced and reported.
https://issues.apache.org/jira/browse/CASSANDRA-15138

In our testing, "service cassandra stop" makes a cluster sometimes in
a wrong state.
How about doing kill -9 ?
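
For example, something like the following (just a sketch; the user name and
process pattern depend on your environment):

sudo kill -9 $(pgrep -u cassandra -f CassandraDaemon)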

Thanks,
Hiro

On Sun, Dec 8, 2019 at 7:47 PM Hossein Ghiyasi Mehr
 wrote:
>
> Which version of Cassandra did you install? deb or tar?
> If it's deb, its script should be used for start/stop.
> If it's tar, kill pid of cassandra to stop and use bin/cassandra to start.
>
> Stopping doesn't need any other actions: drain, disable gossip, etc.
>
> Where do you use Cassandra?
> ---
> VafaTech : A Total Solution for Data Gathering & Analysis
> ---
>
>
> On Fri, Dec 6, 2019 at 11:20 PM Paul Mena  wrote:
>>
>> As we are still without a functional Cassandra cluster in our development 
>> environment, I thought I’d try restarting the same node (one of 4 in the 
>> cluster) with the following command:
>>
>>
>>
>> ip=$(cat /etc/hostname); nodetool disablethrift && nodetool disablebinary && 
>> sleep 5 && nodetool disablegossip && nodetool drain && sleep 10 && sudo 
>> service cassandra restart && until echo "SELECT * FROM system.peers LIMIT 
>> 1;" | cqlsh $ip > /dev/null 2>&1; do echo "Node $ip is still DOWN"; sleep 
>> 10; done && echo "Node $ip is now UP"
>>
>>
>>
>> The above command returned “Node is now UP” after about 40 seconds, 
>> confirmed on “node001” via “nodetool status”:
>>
>>
>>
>> user@node001=> nodetool status
>>
>> Datacenter: datacenter1
>>
>> ===
>>
>> Status=Up/Down
>>
>> |/ State=Normal/Leaving/Joining/Moving
>>
>> --  Address  Load   Tokens  OwnsHost ID  
>>  Rack
>>
>> UN  192.168.187.121  539.43 GB  256 ?   
>> c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>>
>> UN  192.168.187.122  633.92 GB  256 ?   
>> bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>>
>> UN  192.168.187.123  576.31 GB  256 ?   
>> 273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>>
>> UN  192.168.187.124  628.5 GB   256 ?   
>> b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>>
>>
>> As was the case before, running “nodetool status” on any of the other nodes 
>> shows that “node001” is still down:
>>
>>
>>
>> user@node002=> nodetool status
>>
>> Datacenter: datacenter1
>>
>> ===
>>
>> Status=Up/Down
>>
>> |/ State=Normal/Leaving/Joining/Moving
>>
>> --  Address  Load   Tokens  OwnsHost ID  
>>  Rack
>>
>> DN  192.168.187.121  538.94 GB  256 ?   
>> c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>>
>> UN  192.168.187.122  634.04 GB  256 ?   
>> bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>>
>> UN  192.168.187.123  576.42 GB  256 ?   
>> 273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>>
>> UN  192.168.187.124  628.56 GB  256 ?   
>> b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>>
>>
>> Is it inadvisable to continue with the rolling restart?
>>
>>
>>
>> Paul Mena
>>
>> Senior Application Administrator
>>
>> WHOI - Information Services
>>
>> 508-289-3539
>>
>>
>>
>> From: Shalom Sagges 
>> Sent: Tuesday, November 26, 2019 12:59 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Cassandra is not showing a node up hours after restart
>>
>>
>>
>> Hi Paul,
>>
>>
>>
>> From the gossipinfo output, it looks like the node's IP address and 
>> rpc_address are different.
>>
>> /192.168.187.121 vs RPC_ADDRESS:192.168.185.121
>>
>> You can also see that there's a schema disagreement between nodes, e.g. 
>> schema_id on node001 is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801 and on node002 
>> it is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801.
>>
>> You can run nodetool describecluster to see it as well.
>>
>> So I suggest changing the rpc_address to the IP address of the node or setting
>> it to 0.0.0.0, and that should resolve the issue.
>>
>>
>>
>> Hope this helps!
>>
>>
>>
>>
>>
>> On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen  
>> wrote:
>>
>> Hello ,
>>
>>
>>
>> Check and compare the following parameters:
>>
>>
>>
>> 1. Java version should ideally match across all nodes in the cluster
>>
>> 2. Check if port 7000 is open between the nodes. Use telnet or nc commands
>>
>> 3. You should see some clues in the system logs as to why the gossip is failing.
>>
>>
>>
>> Do confirm on the above things.
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Tue, 26 Nov, 2019, 2:50 AM Paul Mena,  wrote:
>>
>> NTP was restarted on the Cassandra nodes, but unfortunately I’m still 
>> getting the same result: the restarted node does not appear to be rejoining 
>> the cluster.
>>
>>
>>
>> Here’s another data point: “nodetool gossipinfo”, when run from the 
>> restarted node (“node001”) shows a status of “normal”:
>>
>>
>>
>> user@node001=> nodetool -u gossipinfo
>>
>> /192.168.187.121
>>
>>   generation:1574364410
>>
>>   heartbeat:209150
>>
>>   NET_VERSION:8
>>
>>   RACK:rack1
>>

Released a simple and integrated backup tool for Apache Cassandra

2019-09-05 Thread Hiroyuki Yamada
Hi all,

We are pleased to announce the release of a new backup tool for
Cassandra called Cassy.
https://github.com/scalar-labs/cassy/

It is licensed under Apache 2.0 License so please give it a try.

Best regards,
Hiroyuki Yamada

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: [EXTERNAL] Re: loading big amount of data to Cassandra

2019-08-06 Thread Hiroyuki Yamada
cassandra-loader is also useful because you don't need to create sstables.
https://github.com/brianmhess/cassandra-loader
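
A typical invocation looks roughly like the following (the file path, host, and
schema are placeholders; please check the project's README for the exact options
of the version you use):

cassandra-loader -f /path/to/data.csv -host 10.0.0.1 -schema "myks.mytable(col1, col2, col3)"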

Hiro

On Tue, Aug 6, 2019 at 12:15 AM Durity, Sean R
 wrote:
>
> DataStax has a very fast bulk load tool - dsebulk. Not sure if it is 
> available for open source or not. In my experience so far, I am very 
> impressed with it.
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
> -Original Message-
> From: p...@xvalheru.org 
> Sent: Saturday, August 3, 2019 6:06 AM
> To: user@cassandra.apache.org
> Cc: Dimo Velev 
> Subject: [EXTERNAL] Re: loading big amount of data to Cassandra
>
> Thanks to all,
>
> I'll try the SSTables.
>
> Thanks
>
> Pat
>
> On 2019-08-03 09:54, Dimo Velev wrote:
> > Check out the CQLSSTableWriter java class -
> > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
> > You use it to generate sstables - you need to write a small program
> > for that. You can then stream them over the network using the
> > sstableloader (either use the utility or use the underlying classes to
> > embed it in your program).
> >
> > On 3. Aug 2019, at 07:17, Ayub M  wrote:
> >
> >> Dimo, how do you generate sstables? Do you mean load data locally on
> >> a cassandra node and use sstableloader?
> >>
> >> On Fri, Aug 2, 2019, 5:48 PM Dimo Velev 
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Batches will actually slow down the process because they mean a
> >>> different thing in C* - as you read they are just grouping changes
> >>> together that you want executed atomically.
> >>>
> >>> Cassandra does not really have indices so that is different than a
> >>> relational DB. However, after writing stuff to Cassandra it
> >>> generates many smallish partitions of the data. These are then
> >>> joined in the background together to improve read performance.
> >>>
> >>> You have two options from my experience:
> >>>
> >>> Option 1: use normal CQL api in async mode. This will create a
> >>> high CPU load on your cluster. Depending on whether that is fine
> >>> for you that might be the easiest solution.
> >>>
> >>> Option 2: generate sstables locally and use the sstableloader to
> >>> upload them into the cluster. The streaming does not generate high
> >>> cpu load so it is a viable option for clusters with other
> >>> operational load.
> >>>
> >>> Option 2 scales with the number of cores of the machine generating
> >>> the sstables. If you can split your data you can generate sstables
> >>> on multiple machines. In contrast, option 1 scales with your
> >>> cluster. If you have a large cluster that is idling, it would be
> >>> better to use option 1.
> >>>
> >>> With both options I was able to write at about 50-100K rows / sec
> >>> on my laptop and local Cassandra. The speed heavily depends on the
> >>> size of your rows.
> >>>
> >>> Back to your question — I guess option2 is similar to what you
> >>> are used to from tools like sqlloader for relational DBMSes
> >>>
> >>> I had a requirement of loading a few hundred million rows per day into an
> >>> operational cluster so I went with option 2 to offload the cpu
> >>> load to reduce impact on the reading side during the loads.
> >>>
> >>> Cheers,
> >>> Dimo
> >>>
> >>> Sent from my iPad
> >>>
>  On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:
> 
>  Hi,
> 
>  I need to upload to Cassandra about 7 billion records. What
> >>> is the best setup of Cassandra for this task? Will usage of batches
> >>> speed up the upload (I've read somewhere that batch in Cassandra
> >>> is dedicated to atomicity, not to speeding up communication)? How
> >>> does Cassandra work internally with regard to indexing? In SQL databases,
> >>> when uploading such an amount of data, it is suggested to turn off
> >>> indexing and then turn it back on. Is something similar possible in
> >>> Cassandra?
> 
>  Thanks for all suggestions.
> 
>  Pat
> 
>  
>  Freehosting PIPNI - 
>  http://www.pipni.cz/
> 
> 
> 
> >>>
> >>
> > -
>  To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>  For additional commands, e-mail: user-h...@cassandra.apache.org
> 
> >>>
> >>>
> >>
> > -
> >>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: user-h...@cassandra.apache.org
> >
> > 

Re: Cassandra LWT Writes inconsistent

2019-07-09 Thread Hiroyuki Yamada
Do you also set SERIAL CONSISTENCY properly ?
https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlshSerialConsistency.html
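
In cqlsh it would look like the following (this just mirrors the INSERT from
your mail; the keyspace name and values are made up):

CONSISTENCY LOCAL_QUORUM
SERIAL CONSISTENCY LOCAL_SERIAL
INSERT INTO myks.table1 (code, id, subpart) VALUES ('A', 1, 104) IF NOT EXISTS;

With the Java driver, the equivalent is to set both the consistency level and
the serial consistency level on the statement itself.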

Hiro

On Tue, Jul 9, 2019 at 2:25 PM Jeff Jirsa  wrote:
>
> If applied is false it’s not the first write, the value already exists. 
> You’ve likely got a concurrency issue app side or you don’t understand the 
> concurrent queries you’re issuing to the db
>
> On Jul 8, 2019, at 10:08 PM, raman gugnani  wrote:
>
> Hi Team,
>
> Can anyone help on the same.
>
> The Cassandra driver says the write is not done, but eventually the write has been done.
>
> On Mon, 8 Jul 2019 at 12:31, Upasana Sharma <028upasana...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> I am using an LWT Insert transaction similar to:
>>
>> INSERT INTO table1 (code, id, subpart) VALUES (:code, :id, :subpart) IF NOT 
>> EXISTS
>>
>> With
>> readConsistency="LOCAL_SERIAL"
>> writeConsistency="LOCAL_QUORUM"
>>
>> Cassandra Driver: 3.6.0
>> Cassandra Version: Cassandra 3.11.2
>>
>>
>> The problem is that I am getting [applied] false on the first write to 
>> cassandra.
>>
>> I have set retry policy as writeTimes = 0, so no retries are attempted.
>>
>> My application logs for reference:
>>
>> c-7967981443032352 - [INFO ] 2019-06-18T19:46:16.276Z [pool-15-thread-5] 
>> CreateService - SubPartition 104
>> c-7967981443032352 - [INFO ] 2019-06-18T19:46:16.805Z [pool-15-thread-5] 
>> Repository - Row[false, A, 1, 104]
>> c-7967981443032352 - [INFO ] 2019-06-18T19:46:16.805Z [pool-15-thread-5] 
>> CreateService - SubPartition 104 CodeNumber 75191 DuplicateCodeGenerated A
>>
>> This is causing my count of writes to tables to differ from required 10, 
>> to 11, writing extra codes.
>>
>> Please guide here.
>>
>> --
>> Regards,
>> Upasana Sharma
>
>
>
> --
> Raman Gugnani
>
> 8588892293
> Principal Engineer
> ixigo.com

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Necessary consistency level for LWT writes

2019-05-23 Thread Hiroyuki Yamada
Hi Craig,

Now I probably understand what the python doc is saying.

As long as `serial_consistency_level` is set to SERIAL for the paxos phase
and `consistency_level` is set to SERIAL for the later read,
conflicts in the paxos table can be properly detected, so the
`consistency_level` for the commit phase can be anything (it can be ANY, as
the doc says).
An unfinished record write (commit) will be repaired by that read, if there is one.
But if `consistency_level` is set to something else (like ALL) for the later
read, the read won't be able to detect conflicts in the paxos table, so it
does not work as expected.
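
As a sketch in cqlsh terms (the keyspace and table names here are hypothetical),
the later read has to be issued like this so that it can also complete an
unfinished LWT write it finds:

CONSISTENCY SERIAL
SELECT * FROM myks.mytable WHERE id = 1;

whereas a plain QUORUM or ALL read of the same row does not go through the
paxos read path.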

I'm not sure if it answers your question, but makes sense ?  > Craig.

Is this understanding correct ? > C* professionals.

Thanks,
Hiro

On Fri, May 24, 2019 at 10:49 AM Craig Pastro  wrote:
>
> Dear Hiro,
>
> Thank you for your response!
>
> Hmm, my understanding is slightly different I think. Please let me try to 
> explain one situation and let me know what you think.
>
> 1. Do a LWT write with serial_consistency = SERIAL (default) and consistency 
> = ONE.
> 2. LWT starts its Paxos phase and has communicated with a quorum of nodes
> 3. At this point a read of that data is initiated with consistency = SERIAL.
>
> Now, here is where I am confused. What I think happens is that a SERIAL read 
> will read from a quorum of nodes and detect that the Paxos phase is underway 
> and... maybe wait until it is over before responding with the latest data? 
> The Paxos phase happens between a quorum so basically even though the 
> consistency level is ONE (or indeed ANY as the Python docs state), doing a 
> read with SERIAL implies that the write actually took place at a consistency 
> level equivalent to QUORUM.
>
> Here also what I think is that a read initiated when the Paxos phase is 
> underway with a consistency level of QUORUM or ALL will not detect that a 
> Paxos phase is underway and return the old current data.
>
> Is this correct?
>
> Thank you for any help!
>
> Best wishes,
> Craig
>
>
>
>
>
>
> On Fri, May 24, 2019 at 9:58 AM Hiroyuki Yamada  wrote:
>>
>> Hi Craig,
>>
>> I'm not 100 % sure about some corner cases,
>> but I'm sure that LWT should be used with the following consistency
>> levels usually.
>>
>> LWT write:
>> serial_consistency_level: SERIAL
>> consistency_level: QUORUM
>>
>> LWT read:
>> consistency_level: SERIAL
>> (It's a bit weird and misleading as a design that you can set SERIAL
>> for consistency_level in a read, whereas you can't for a write.)
>>
>> BTW, I doubt the python doc is correct, especially in the following part:
>> "But if the regular consistency_level of that write is ANY, then only
>> a read with a consistency_level of SERIAL is guaranteed to see it
>> (even a read with consistency ALL is not guaranteed to be enough)."
>> Is it really true?
>> It doesn't really make sense to me, because a SERIAL read mostly returns
>> after seeing a quorum of replicas,
>> and a write with ANY returns after writing to as few as one replica, so they
>> don't overlap in that case.
>> It would be great if anyone could clarify this.
>>
>> Thanks,
>> Hiro
>>
>>
>> On Thu, May 23, 2019 at 3:53 PM Craig Pastro  wrote:
>> >
>> > Hello!
>> >
>> > I am trying to understand the consistency level (not serial consistency) 
>> > required for LWTs. Basically what I am trying to understand is that if a 
>> > consistency level of ONE is enough for a LWT write operation if I do my 
>> > read with a consistency level of SERIAL?
>> >
>> > It would seem so based on what is written for the datastax python driver:
>> >
>> > http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level
>> >
>> > However, that is the only place that I can find this information so I am a 
>> > little hesitant to believe it 100%.
>> >
>> > By the way, I did find basically the same question 
>> > (https://www.mail-archive.com/user@cassandra.apache.org/msg45453.html) but 
>> > I am unsure if the answer there really answers my question.
>> >
>> > Thank you in advance for any help!
>> >
>> > Best regards,
>> > Craig
>> >
>> >
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Necessary consistency level for LWT writes

2019-05-23 Thread Hiroyuki Yamada
Hi Craig,

I'm not 100% sure about some corner cases,
but I'm sure that LWT should usually be used with the following consistency
levels.

LWT write:
serial_consistency_level: SERIAL
consistency_level: QUORUM

LWT read:
consistency_level: SERIAL
(It's a bit weird and misleading as a design that you can set SERIAL
for consistency_level in a read, whereas you can't for a write.)

BTW, I doubt the python doc is correct, especially in the following part:
"But if the regular consistency_level of that write is ANY, then only
a read with a consistency_level of SERIAL is guaranteed to see it
(even a read with consistency ALL is not guaranteed to be enough)."
Is it really true?
It doesn't really make sense to me, because a SERIAL read mostly returns
after seeing a quorum of replicas,
and a write with ANY returns after writing to as few as one replica, so they
don't overlap in that case.
It would be great if anyone could clarify this.

Thanks,
Hiro


On Thu, May 23, 2019 at 3:53 PM Craig Pastro  wrote:
>
> Hello!
>
> I am trying to understand the consistency level (not serial consistency) 
> required for LWTs. Basically what I am trying to understand is that if a 
> consistency level of ONE is enough for a LWT write operation if I do my read 
> with a consistency level of SERIAL?
>
> It would seem so based on what is written for the datastax python driver:
>
> http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level
>
> However, that is the only place that I can find this information so I am a 
> little hesitant to believe it 100%.
>
> By the way, I did find basically the same question 
> (https://www.mail-archive.com/user@cassandra.apache.org/msg45453.html) but I 
> am unsure if the answer there really answers my question.
>
> Thank you in advance for any help!
>
> Best regards,
> Craig
>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-05-22 Thread Hiroyuki Yamada
Hi,

FYI: I created a bug ticket since I think the behavior is just not right.
https://issues.apache.org/jira/browse/CASSANDRA-15138

Thanks,
Hiro

On Mon, May 13, 2019 at 10:58 AM Hiroyuki Yamada  wrote:

> Hi,
>
> Should I post a bug ?
> It doesn't seem to be an expected behavior,
> so I think it should be at least documented somewhere.
>
> Thanks,
> Hiro
>
>
> On Fri, Apr 26, 2019 at 3:17 PM Hiroyuki Yamada 
> wrote:
>
>> Hello,
>>
>> Thank you for some feedbacks.
>>
>> >Ben
>> Thank you.
>> I've tested with lower concurrency in my side, the issue still occurs.
>> We are using 3 x T3.xlarge instances for C* and small and separate
>> instance for the client program.
>> But if we tried with 1 host with 3 C* nodes, the issue didn't occur.
>>
>> > Alok
>> We also thought so and tested with hints disabled, but it doesn't make
>> any difference. (the issue still occurs)
>>
>> Thanks,
>> Hiro
>>
>>
>>
>>
>> On Fri, Apr 26, 2019 at 8:19 AM Alok Dwivedi <
>> alok.dwiv...@instaclustr.com> wrote:
>>
>>> Could it be related to hinted hand offs being stored in Node1 and then
>>> attempted to be replayed in Node2 when it comes back causing more load as
>>> new mutations are also being applied from cassandra-stress at same time?
>>>
>>> Alok Dwivedi
>>> Senior Consultant
>>> https://www.instaclustr.com/
>>>
>>>
>>>
>>>
>>> On 26 Apr 2019, at 09:04, Ben Slater  wrote:
>>>
>>> In the absence of anyone else having any bright ideas - it still sounds
>>> to me like the kind of scenario that can occur in a heavily overloaded
>>> cluster. I would try again with a lower load.
>>>
>>> What size machines are you using for stress client and the nodes? Are
>>> they all on separate machines?
>>>
>>> Cheers
>>> Ben
>>>
>>> ---
>>>
>>>
>>> *Ben Slater**Chief Product Officer*
>>>
>>>
>>> Read our latest technical blog posts here
>>> <https://www.instaclustr.com/blog/>.
>>>
>>> This email has been sent on behalf of Instaclustr Pty. Limited
>>> (Australia) and Instaclustr Inc (USA).
>>>
>>> This email and any attachments may contain confidential and legally
>>> privileged information.  If you are not the intended recipient, do not copy
>>> or disclose its content, but please reply to this email immediately and
>>> highlight the error to the sender and then immediately delete the message.
>>>
>>>
>>> On Thu, 25 Apr 2019 at 17:26, Hiroyuki Yamada 
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Sorry again.
>>>> We found yet another weird thing in this.
>>>> If we stop nodes with systemctl or just kill (TERM), it causes the
>>>> problem,
>>>> but if we kill -9, it doesn't cause the problem.
>>>>
>>>> Thanks,
>>>> Hiro
>>>>
>>>> On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada 
>>>> wrote:
>>>>
>>>>> Sorry, I didn't write the version and the configurations.
>>>>> I've tested with C* 3.11.4, and
>>>>> the configurations are mostly set to default except for the
>>>>> replication factor and listen_address for proper networking.
>>>>>
>>>>> Thanks,
>>>>> Hiro
>>>>>
>>>>> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada 
>>>>> wrote:
>>>>>
>>>>>> Hello Ben,
>>>>>>
>>>>>> Thank you for the quick reply.
>>>>>> I haven't tried that case, but it does't recover even if I stopped
>>>>>> the stress.
>>>>>>
>>>>>> Thanks,
>>>>>> Hiro
>>>>>>
>>>>>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater <
>>>>>> ben.sla...@instaclustr.com> wrote:
>>>>>>
>>>>>>> Is it possible that stress is overloading node 1 so it’s not
>>>>>>> recovering state properly when node 2 comes up? Have you tried running 
>>>>>>> with
>>

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-05-12 Thread Hiroyuki Yamada
Hi,

Should I post a bug ?
It doesn't seem to be an expected behavior,
so I think it should be at least documented somewhere.

Thanks,
Hiro


On Fri, Apr 26, 2019 at 3:17 PM Hiroyuki Yamada  wrote:

> Hello,
>
> Thank you for some feedbacks.
>
> >Ben
> Thank you.
> I've tested with lower concurrency in my side, the issue still occurs.
> We are using 3 x T3.xlarge instances for C* and small and separate
> instance for the client program.
> But if we tried with 1 host with 3 C* nodes, the issue didn't occur.
>
> > Alok
> We also thought so and tested with hints disabled, but it doesn't make any
> difference. (the issue still occurs)
>
> Thanks,
> Hiro
>
>
>
>
> On Fri, Apr 26, 2019 at 8:19 AM Alok Dwivedi 
> wrote:
>
>> Could it be related to hinted hand offs being stored in Node1 and then
>> attempted to be replayed in Node2 when it comes back causing more load as
>> new mutations are also being applied from cassandra-stress at same time?
>>
>> Alok Dwivedi
>> Senior Consultant
>> https://www.instaclustr.com/
>>
>>
>>
>>
>> On 26 Apr 2019, at 09:04, Ben Slater  wrote:
>>
>> In the absence of anyone else having any bright ideas - it still sounds
>> to me like the kind of scenario that can occur in a heavily overloaded
>> cluster. I would try again with a lower load.
>>
>> What size machines are you using for stress client and the nodes? Are
>> they all on separate machines?
>>
>> Cheers
>> Ben
>>
>> ---
>>
>>
>> *Ben Slater**Chief Product Officer*
>>
>>
>> Read our latest technical blog posts here
>> <https://www.instaclustr.com/blog/>.
>>
>> This email has been sent on behalf of Instaclustr Pty. Limited
>> (Australia) and Instaclustr Inc (USA).
>>
>> This email and any attachments may contain confidential and legally
>> privileged information.  If you are not the intended recipient, do not copy
>> or disclose its content, but please reply to this email immediately and
>> highlight the error to the sender and then immediately delete the message.
>>
>>
>> On Thu, 25 Apr 2019 at 17:26, Hiroyuki Yamada  wrote:
>>
>>> Hello,
>>>
>>> Sorry again.
>>> We found yet another weird thing in this.
>>> If we stop nodes with systemctl or just kill (TERM), it causes the
>>> problem,
>>> but if we kill -9, it doesn't cause the problem.
>>>
>>> Thanks,
>>> Hiro
>>>
>>> On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada 
>>> wrote:
>>>
>>>> Sorry, I didn't write the version and the configurations.
>>>> I've tested with C* 3.11.4, and
>>>> the configurations are mostly set to default except for the replication
>>>> factor and listen_address for proper networking.
>>>>
>>>> Thanks,
>>>> Hiro
>>>>
>>>> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada 
>>>> wrote:
>>>>
>>>>> Hello Ben,
>>>>>
>>>>> Thank you for the quick reply.
>>>>> I haven't tried that case, but it does't recover even if I stopped the
>>>>> stress.
>>>>>
>>>>> Thanks,
>>>>> Hiro
>>>>>
>>>>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater 
>>>>> wrote:
>>>>>
>>>>>> Is it possible that stress is overloading node 1 so it’s not
>>>>>> recovering state properly when node 2 comes up? Have you tried running 
>>>>>> with
>>>>>> a lower load (say 2 or 3 threads)?
>>>>>>
>>>>>> Cheers
>>>>>> Ben
>>>>>>
>>>>>> ---
>>>>>>
>>>>>>
>>>>>> *Ben Slater*
>>>>>> *Chief Product Officer*
>>>>>>
>>>>>>
>>>>>> <https://www.facebook.com/instaclustr>
>>>>>> <https://twitter.com/instaclustr>
>>>>>> <https://www.linkedin.com/company/instaclustr>
>>>>>>
>>>>>> Read our latest technical blog posts here
>>>>>> <https://www.instaclustr.com/blog/>.
>>>>>>
>>>>>> This email h

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-26 Thread Hiroyuki Yamada
Hello,

Thank you for some feedbacks.

>Ben
Thank you.
I've tested with lower concurrency on my side, and the issue still occurs.
We are using 3 x T3.xlarge instances for C* and a small, separate instance
for the client program.
But when we tried with 1 host running 3 C* nodes, the issue didn't occur.

> Alok
We also thought so and tested with hints disabled, but it doesn't make any
difference. (the issue still occurs)

Thanks,
Hiro




On Fri, Apr 26, 2019 at 8:19 AM Alok Dwivedi 
wrote:

> Could it be related to hinted hand offs being stored in Node1 and then
> attempted to be replayed in Node2 when it comes back causing more load as
> new mutations are also being applied from cassandra-stress at same time?
>
> Alok Dwivedi
> Senior Consultant
> https://www.instaclustr.com/
>
>
>
>
> On 26 Apr 2019, at 09:04, Ben Slater  wrote:
>
> In the absence of anyone else having any bright ideas - it still sounds to
> me like the kind of scenario that can occur in a heavily overloaded
> cluster. I would try again with a lower load.
>
> What size machines are you using for stress client and the nodes? Are they
> all on separate machines?
>
> Cheers
> Ben
>
> ---
>
>
> *Ben Slater**Chief Product Officer*
>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
> On Thu, 25 Apr 2019 at 17:26, Hiroyuki Yamada  wrote:
>
>> Hello,
>>
>> Sorry again.
>> We found yet another weird thing in this.
>> If we stop nodes with systemctl or just kill (TERM), it causes the
>> problem,
>> but if we kill -9, it doesn't cause the problem.
>>
>> Thanks,
>> Hiro
>>
>> On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada 
>> wrote:
>>
>>> Sorry, I didn't write the version and the configurations.
>>> I've tested with C* 3.11.4, and
>>> the configurations are mostly set to default except for the replication
>>> factor and listen_address for proper networking.
>>>
>>> Thanks,
>>> Hiro
>>>
>>> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada 
>>> wrote:
>>>
>>>> Hello Ben,
>>>>
>>>> Thank you for the quick reply.
>>>> I haven't tried that case, but it does't recover even if I stopped the
>>>> stress.
>>>>
>>>> Thanks,
>>>> Hiro
>>>>
>>>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater 
>>>> wrote:
>>>>
>>>>> Is it possible that stress is overloading node 1 so it’s not
>>>>> recovering state properly when node 2 comes up? Have you tried running 
>>>>> with
>>>>> a lower load (say 2 or 3 threads)?
>>>>>
>>>>> Cheers
>>>>> Ben
>>>>>
>>>>> ---
>>>>>
>>>>>
>>>>> *Ben Slater*
>>>>> *Chief Product Officer*
>>>>>
>>>>>
>>>>>
>>>>> Read our latest technical blog posts here
>>>>> <https://www.instaclustr.com/blog/>.
>>>>>
>>>>> This email has been sent on behalf of Instaclustr Pty. Limited
>>>>> (Australia) and Instaclustr Inc (USA).
>>>>>
>>>>> This email and any attachments may contain confidential and legally
>>>>> privileged information.  If you are not the intended recipient, do not 
>>>>> copy
>>>>> or disclose its content, but please reply to this email immediately and
>>>>> highlight the error to the sender and then immediately delete the message.
>>>>>
>>>>>
>>>>> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada 
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-25 Thread Hiroyuki Yamada
Hello,

Sorry again.
We found yet another weird thing about this.
If we stop nodes with systemctl or a plain kill (TERM), it causes the problem,
but if we use kill -9, it doesn't cause the problem.

Thanks,
Hiro

On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada  wrote:

> Sorry, I didn't write the version and the configurations.
> I've tested with C* 3.11.4, and
> the configurations are mostly set to default except for the replication
> factor and listen_address for proper networking.
>
> Thanks,
> Hiro
>
> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada 
> wrote:
>
>> Hello Ben,
>>
>> Thank you for the quick reply.
>> I haven't tried that case, but it does't recover even if I stopped the
>> stress.
>>
>> Thanks,
>> Hiro
>>
>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater 
>> wrote:
>>
>>> Is it possible that stress is overloading node 1 so it’s not recovering
>>> state properly when node 2 comes up? Have you tried running with a lower
>>> load (say 2 or 3 threads)?
>>>
>>> Cheers
>>> Ben
>>>
>>> ---
>>>
>>>
>>> *Ben Slater*
>>> *Chief Product Officer*
>>>
>>>
>>>
>>> Read our latest technical blog posts here
>>> <https://www.instaclustr.com/blog/>.
>>>
>>> This email has been sent on behalf of Instaclustr Pty. Limited
>>> (Australia) and Instaclustr Inc (USA).
>>>
>>> This email and any attachments may contain confidential and legally
>>> privileged information.  If you are not the intended recipient, do not copy
>>> or disclose its content, but please reply to this email immediately and
>>> highlight the error to the sender and then immediately delete the message.
>>>
>>>
>>> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada 
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I faced a weird issue when recovering a cluster after two nodes are
>>>> stopped.
>>>> It is easily reproduce-able and looks like a bug or an issue to fix,
>>>> so let me write down the steps to reproduce.
>>>>
>>>> === STEPS TO REPRODUCE ===
>>>> * Create a 3-node cluster with RF=3
>>>>- node1(seed), node2, node3
>>>> * Start requests to the cluster with cassandra-stress (it continues
>>>> until the end)
>>>>- what we did: cassandra-stress mixed cl=QUORUM duration=10m
>>>> -errors ignore -node node1,node2,node3 -rate threads\>=16
>>>> threads\<=256
>>>> * Stop node3 normally (with systemctl stop)
>>>>- the system is still available because the quorum of nodes is
>>>> still available
>>>> * Stop node2 normally (with systemctl stop)
>>>>- the system is NOT available after it's stopped.
>>>>- the client gets `UnavailableException: Not enough replicas
>>>> available for query at consistency QUORUM`
>>>>- the client gets errors right away (so few ms)
>>>>- so far it's all expected
>>>> * Wait for 1 mins
>>>> * Bring up node2
>>>>- The issue happens here.
>>>>- the client gets ReadTimeoutException` or WriteTimeoutException
>>>> depending on if the request is read or write even after the node2 is
>>>> up
>>>>- the client gets errors after about 5000ms or 2000ms, which are
>>>> request timeout for write and read request
>>>>- what node1 reports with `nodetool status` and what node2 reports
>>>> are not consistent. (node2 thinks node1 is down)
>>>>- It takes very long time to recover from its state
>>>> === STEPS TO REPRODUCE ===
>>>>
>>>> Is it supposed to happen ?
>>>> If we don't start cassandra-stress, it's all fine.
>>>>
>>>> Some workarounds we found to recover the state are the followings:
>>>> * Restarting node1 and it recovers its state right after it's restarted
>>>> * Setting lower value in dynamic_snitch_reset_interval_in_ms (to 6
>>>> or something)
>>>>
>>>> I don't think either of them is a really good solution.
>>>> Can anyone explain what is going on and what is the best way to make
>>>> it not happen or recover ?
>>>>
>>>> Thanks,
>>>> Hiro
>>>>
>>>> -
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>
>>>>


Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Hiroyuki Yamada
Sorry, I didn't write the version and the configurations.
I've tested with C* 3.11.4, and
the configurations are mostly set to default except for the replication
factor and listen_address for proper networking.

Thanks,
Hiro

On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada  wrote:

> Hello Ben,
>
> Thank you for the quick reply.
> I haven't tried that case, but it does't recover even if I stopped the
> stress.
>
> Thanks,
> Hiro
>
> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater 
> wrote:
>
>> Is it possible that stress is overloading node 1 so it’s not recovering
>> state properly when node 2 comes up? Have you tried running with a lower
>> load (say 2 or 3 threads)?
>>
>> Cheers
>> Ben
>>
>> ---
>>
>>
>> *Ben Slater*
>> *Chief Product Officer*
>>
>>
>>
>> Read our latest technical blog posts here
>> <https://www.instaclustr.com/blog/>.
>>
>> This email has been sent on behalf of Instaclustr Pty. Limited
>> (Australia) and Instaclustr Inc (USA).
>>
>> This email and any attachments may contain confidential and legally
>> privileged information.  If you are not the intended recipient, do not copy
>> or disclose its content, but please reply to this email immediately and
>> highlight the error to the sender and then immediately delete the message.
>>
>>
>> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada  wrote:
>>
>>> Hello,
>>>
>>> I faced a weird issue when recovering a cluster after two nodes are
>>> stopped.
>>> It is easily reproduce-able and looks like a bug or an issue to fix,
>>> so let me write down the steps to reproduce.
>>>
>>> === STEPS TO REPRODUCE ===
>>> * Create a 3-node cluster with RF=3
>>>- node1(seed), node2, node3
>>> * Start requests to the cluster with cassandra-stress (it continues
>>> until the end)
>>>- what we did: cassandra-stress mixed cl=QUORUM duration=10m
>>> -errors ignore -node node1,node2,node3 -rate threads\>=16
>>> threads\<=256
>>> * Stop node3 normally (with systemctl stop)
>>>- the system is still available because the quorum of nodes is
>>> still available
>>> * Stop node2 normally (with systemctl stop)
>>>- the system is NOT available after it's stopped.
>>>- the client gets `UnavailableException: Not enough replicas
>>> available for query at consistency QUORUM`
>>>- the client gets errors right away (so few ms)
>>>- so far it's all expected
>>> * Wait for 1 mins
>>> * Bring up node2
>>>- The issue happens here.
>>>- the client gets ReadTimeoutException` or WriteTimeoutException
>>> depending on if the request is read or write even after the node2 is
>>> up
>>>- the client gets errors after about 5000ms or 2000ms, which are
>>> request timeout for write and read request
>>>- what node1 reports with `nodetool status` and what node2 reports
>>> are not consistent. (node2 thinks node1 is down)
>>>- It takes very long time to recover from its state
>>> === STEPS TO REPRODUCE ===
>>>
>>> Is it supposed to happen ?
>>> If we don't start cassandra-stress, it's all fine.
>>>
>>> Some workarounds we found to recover the state are the followings:
>>> * Restarting node1 and it recovers its state right after it's restarted
>>> * Setting lower value in dynamic_snitch_reset_interval_in_ms (to 6
>>> or something)
>>>
>>> I don't think either of them is a really good solution.
>>> Can anyone explain what is going on and what is the best way to make
>>> it not happen or recover ?
>>>
>>> Thanks,
>>> Hiro
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>


Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Hiroyuki Yamada
Hello Ben,

Thank you for the quick reply.
I haven't tried that case, but it doesn't recover even if I stop the
stress.

Thanks,
Hiro

On Wed, Apr 24, 2019 at 3:36 PM Ben Slater 
wrote:

> Is it possible that stress is overloading node 1 so it’s not recovering
> state properly when node 2 comes up? Have you tried running with a lower
> load (say 2 or 3 threads)?
>
> Cheers
> Ben
>
> ---
>
>
> *Ben Slater*
> *Chief Product Officer*
>
>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada  wrote:
>
>> Hello,
>>
>> I faced a weird issue when recovering a cluster after two nodes are
>> stopped.
>> It is easily reproduce-able and looks like a bug or an issue to fix,
>> so let me write down the steps to reproduce.
>>
>> === STEPS TO REPRODUCE ===
>> * Create a 3-node cluster with RF=3
>>- node1(seed), node2, node3
>> * Start requests to the cluster with cassandra-stress (it continues
>> until the end)
>>- what we did: cassandra-stress mixed cl=QUORUM duration=10m
>> -errors ignore -node node1,node2,node3 -rate threads\>=16
>> threads\<=256
>> * Stop node3 normally (with systemctl stop)
>>- the system is still available because the quorum of nodes is
>> still available
>> * Stop node2 normally (with systemctl stop)
>>- the system is NOT available after it's stopped.
>>- the client gets `UnavailableException: Not enough replicas
>> available for query at consistency QUORUM`
>>- the client gets errors right away (so few ms)
>>- so far it's all expected
>> * Wait for 1 mins
>> * Bring up node2
>>- The issue happens here.
>>- the client gets ReadTimeoutException` or WriteTimeoutException
>> depending on if the request is read or write even after the node2 is
>> up
>>- the client gets errors after about 5000ms or 2000ms, which are
>> request timeout for write and read request
>>- what node1 reports with `nodetool status` and what node2 reports
>> are not consistent. (node2 thinks node1 is down)
>>- It takes very long time to recover from its state
>> === STEPS TO REPRODUCE ===
>>
>> Is it supposed to happen ?
>> If we don't start cassandra-stress, it's all fine.
>>
>> Some workarounds we found to recover the state are the followings:
>> * Restarting node1 and it recovers its state right after it's restarted
>> * Setting lower value in dynamic_snitch_reset_interval_in_ms (to 6
>> or something)
>>
>> I don't think either of them is a really good solution.
>> Can anyone explain what is going on and what is the best way to make
>> it not happen or recover ?
>>
>> Thanks,
>> Hiro
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>


A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Hiroyuki Yamada
Hello,

I faced a weird issue when recovering a cluster after two nodes are stopped.
It is easily reproducible and looks like a bug or an issue to fix,
so let me write down the steps to reproduce.

=== STEPS TO REPRODUCE ===
* Create a 3-node cluster with RF=3
   - node1(seed), node2, node3
* Start requests to the cluster with cassandra-stress (it continues
until the end)
   - what we did: cassandra-stress mixed cl=QUORUM duration=10m
-errors ignore -node node1,node2,node3 -rate threads\>=16
threads\<=256
* Stop node3 normally (with systemctl stop)
   - the system is still available because the quorum of nodes is
still available
* Stop node2 normally (with systemctl stop)
   - the system is NOT available after it's stopped.
   - the client gets `UnavailableException: Not enough replicas
available for query at consistency QUORUM`
   - the client gets errors right away (so few ms)
   - so far it's all expected
* Wait for 1 mins
* Bring up node2
   - The issue happens here.
   - the client gets `ReadTimeoutException` or `WriteTimeoutException`
depending on if the request is read or write even after the node2 is
up
   - the client gets errors after about 5000ms or 2000ms, which are
request timeout for write and read request
   - what node1 reports with `nodetool status` and what node2 reports
are not consistent. (node2 thinks node1 is down)
   - It takes very long time to recover from its state
=== STEPS TO REPRODUCE ===

Is it supposed to happen ?
If we don't start cassandra-stress, it's all fine.

Some workarounds we found to recover the state are the followings:
* Restarting node1 and it recovers its state right after it's restarted
* Setting lower value in dynamic_snitch_reset_interval_in_ms (to 6
or something)

I don't think either of them is a really good solution.
Can anyone explain what is going on and what is the best way to make
it not happen or recover ?

Thanks,
Hiro

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Apache Cassandra transactions commit and rollback

2018-12-07 Thread Hiroyuki Yamada
Hi Ramya,

Scalar DB is one of the options.
https://github.com/scalar-labs/scalardb

But first of all, please re-think your design and whether you really need it.
For example, if eventual consistency between multiple rows is acceptable and
writes are idempotent, then you should simply go with C* writes with retries.
Using transactions is basically the last option.
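
As a rough illustration of what "idempotent" means here (the tables below are
hypothetical, not from your schema): the first statement can be retried safely
because re-applying it yields the same final state, while the second one (a
counter update) cannot, since every retry adds to the counter again.

UPDATE myks.accounts SET status = 'active' WHERE id = 1;
UPDATE myks.page_views SET views = views + 1 WHERE page = 'home';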

Thanks,
Hiro

On Wed, Nov 28, 2018 at 10:27 PM Ramya K  wrote:
>
> Hi All,
>
>   I'm exploring Cassandra for our project and would like to know the best 
> practices for handling transactions in real time. Also suggest if any drivers 
> or tools are available for this.
>
>   I've read about Apache Kundera transaction layer over Cassandra, is there 
> bottlenecks with this.
>
>   Please suggest your views on this.
>
> Regards,
> Ramya.

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Released an ACID-compliant transaction library on top of Cassandra

2018-11-13 Thread Hiroyuki Yamada
Hi all,

I am happy to release it under Apache 2 license now.
https://github.com/scalar-labs/scalardb

It passes not only Jepsen tests but also our own destructive testing.
For jepsen tests, please check the following.
https://github.com/scalar-labs/scalardb/tree/master/jepsen/scalardb

Also, as Yuji mentioned the other day, we fixed/updated the Jepsen
tests for C* to make them work properly with the latest C* version and
follow the new style.
https://github.com/scalar-labs/jepsen/tree/cassandra

In addition to that, we fixed/updated cassaforte, which is used in the Jepsen
tests for C*, to make it work with the latest Java driver, since
cassaforte is not really maintained anymore.
https://github.com/scalar-labs/cassaforte/tree/driver-3.0-for-jepsen

We are pleased to be able to contribute to the community through the above updates.
Please send us any feedback or questions.

Thanks,
Hiro


On Wed, Oct 17, 2018 at 8:52 AM Hiroyuki Yamada  wrote:
>
> Hi all,
>
> Thank you for the comments and feedbacks.
>
> As Jonathan pointed out, it relies on LWT and uses the protocol
> proposed in the paper.
> Please read the design document for more detail.
> https://github.com/scalar-labs/scalardb/blob/master/docs/design.md
>
> Regarding the licensing, we are thinking of releasing it with Apache 2
> if lots of developers are interested in it.
>
> Best regards,
> Hiroyuki
> On Wed, Oct 17, 2018 at 3:13 AM Jonathan Ellis  wrote:
> >
> > Which was followed up by 
> > https://www.researchgate.net/profile/Akon_Dey/publication/282156834_Scalable_Distributed_Transactions_across_Heterogeneous_Stores/links/56058b9608ae5e8e3f32b98d.pdf
> >
> > On Tue, Oct 16, 2018 at 1:02 PM Jonathan Ellis  wrote:
> >>
> >> It looks like it's based on this: 
> >> http://www.vldb.org/pvldb/vol6/p1434-dey.pdf
> >>
> >> On Tue, Oct 16, 2018 at 11:37 AM Ariel Weisberg  wrote:
> >>>
> >>> Hi,
> >>>
> >>> Yes this does sound great. Does this rely on Cassandra's internal SERIAL 
> >>> consistency and CAS functionality or is that implemented at a higher 
> >>> level?
> >>>
> >>> Regards,
> >>> Ariel
> >>>
> >>> On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote:
> >>> > This is great!
> >>> >
> >>> > --
> >>> > Jeff Jirsa
> >>> >
> >>> >
> >>> > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada  
> >>> > > wrote:
> >>> > >
> >>> > > Hi all,
> >>> > >
> >>> > > # Sorry, I accidentally emailed the following to dev@, so re-sending 
> >>> > > to here.
> >>> > >
> >>> > > We have been working on ACID-compliant transaction library on top of
> >>> > > Cassandra called Scalar DB,
> >>> > > and are pleased to announce the release of v.1.0 RC version in open 
> >>> > > source.
> >>> > >
> >>> > > https://github.com/scalar-labs/scalardb/
> >>> > >
> >>> > > Scalar DB is a library that provides a distributed storage abstraction
> >>> > > and client-coordinated distributed transaction on the storage,
> >>> > > and makes non-ACID distributed database/storage ACID-compliant.
> >>> > > And Cassandra is the first supported database implementation.
> >>> > >
> >>> > > It's been internally tested intensively and is jepsen-passed.
> >>> > > (see jepsen directory for more detail)
> >>> > > If you are looking for ACID transaction capability on top of 
> >>> > > cassandra,
> >>> > > Please take a look and give us a feedback or contribution.
> >>> > >
> >>> > > Best regards,
> >>> > > Hiroyuki Yamada
> >>> > >
> >>> > > -
> >>> > > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >>> > > For additional commands, e-mail: user-h...@cassandra.apache.org
> >>> > >
> >>> >
> >>> > -
> >>> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >>> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >>> >
> >>>
> >>> -
> >>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: user-h...@cassandra.apache.org
> >>>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> co-founder, http://www.datastax.com
> >> @spyced
> >
> >
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Released an ACID-compliant transaction library on top of Cassandra

2018-10-16 Thread Hiroyuki Yamada
Hi all,

Thank you for the comments and feedback.

As Jonathan pointed out, it relies on LWT and uses the protocol
proposed in the paper.
Please read the design document for more detail.
https://github.com/scalar-labs/scalardb/blob/master/docs/design.md

Regarding the licensing, we are thinking of releasing it with Apache 2
if lots of developers are interested in it.

Best regards,
Hiroyuki
On Wed, Oct 17, 2018 at 3:13 AM Jonathan Ellis  wrote:
>
> Which was followed up by 
> https://www.researchgate.net/profile/Akon_Dey/publication/282156834_Scalable_Distributed_Transactions_across_Heterogeneous_Stores/links/56058b9608ae5e8e3f32b98d.pdf
>
> On Tue, Oct 16, 2018 at 1:02 PM Jonathan Ellis  wrote:
>>
>> It looks like it's based on this: 
>> http://www.vldb.org/pvldb/vol6/p1434-dey.pdf
>>
>> On Tue, Oct 16, 2018 at 11:37 AM Ariel Weisberg  wrote:
>>>
>>> Hi,
>>>
>>> Yes this does sound great. Does this rely on Cassandra's internal SERIAL 
>>> consistency and CAS functionality or is that implemented at a higher level?
>>>
>>> Regards,
>>> Ariel
>>>
>>> On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote:
>>> > This is great!
>>> >
>>> > --
>>> > Jeff Jirsa
>>> >
>>> >
>>> > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada  wrote:
>>> > >
>>> > > Hi all,
>>> > >
>>> > > # Sorry, I accidentally emailed the following to dev@, so re-sending to 
>>> > > here.
>>> > >
>>> > > We have been working on ACID-compliant transaction library on top of
>>> > > Cassandra called Scalar DB,
>>> > > and are pleased to announce the release of v.1.0 RC version in open 
>>> > > source.
>>> > >
>>> > > https://github.com/scalar-labs/scalardb/
>>> > >
>>> > > Scalar DB is a library that provides a distributed storage abstraction
>>> > > and client-coordinated distributed transaction on the storage,
>>> > > and makes non-ACID distributed database/storage ACID-compliant.
>>> > > And Cassandra is the first supported database implementation.
>>> > >
>>> > > It's been internally tested intensively and is jepsen-passed.
>>> > > (see jepsen directory for more detail)
>>> > > If you are looking for ACID transaction capability on top of cassandra,
>>> > > Please take a look and give us a feedback or contribution.
>>> > >
>>> > > Best regards,
>>> > > Hiroyuki Yamada
>>> > >
>>> > > -
>>> > > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> > > For additional commands, e-mail: user-h...@cassandra.apache.org
>>> > >
>>> >
>>> > -
>>> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> > For additional commands, e-mail: user-h...@cassandra.apache.org
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Released an ACID-compliant transaction library on top of Cassandra

2018-10-16 Thread Hiroyuki Yamada
Hi all,

# Sorry, I accidentally emailed the following to dev@, so I am re-sending it here.

We have been working on an ACID-compliant transaction library on top of
Cassandra called Scalar DB,
and are pleased to announce the open-source release of the v1.0 RC version.

https://github.com/scalar-labs/scalardb/

Scalar DB is a library that provides a distributed storage abstraction
and client-coordinated distributed transactions on top of the storage,
and makes a non-ACID distributed database/storage ACID-compliant.
Cassandra is the first supported database implementation.

It's been tested intensively internally and has passed Jepsen tests
(see the jepsen directory for more detail).
If you are looking for ACID transaction capability on top of Cassandra,
please take a look and give us feedback or a contribution.

Best regards,
Hiroyuki Yamada

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Is it safe to use paxos protocol in LWT from patent perspective ?

2018-04-17 Thread Hiroyuki Yamada
Hi all,

I'm wondering if it is safe to use the Paxos protocol in LWT from a patent
perspective.
I found some Paxos-related patents here.


Does anyone know about this ?

Best regards,
Hiroyuki

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: LWT on data mutated by non-LWT operation is valid ?

2018-03-26 Thread Hiroyuki Yamada
Thank you, JH.
I took a look and I think I understand it now.

To summarize my understanding:
if all the operations are LWT, there is no issue even if clocks are
drifted, because the ordering is ballot-based.
But if some of the operations are non-LWT and clocks are drifted,
that might cause issues
(for example, a previous insert with a future timestamp overwriting newer data, or something like that).
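(For concreteness, the mixed pattern I am talking about looks like the
following. It is only a minimal sketch, assuming the DataStax Java driver 3.x,
an already-connected Session named "session", and an illustrative table
test.tbl(id int PRIMARY KEY, val int).)

=== example ===
// imports: com.datastax.driver.core.*
// non-LWT write at QUORUM: no Paxos round, ordering decided by write timestamp
Statement plain = new SimpleStatement("UPDATE test.tbl SET val = 0 WHERE id = 1");
plain.setConsistencyLevel(ConsistencyLevel.QUORUM);
session.execute(plain);

// LWT write on the same row: Paxos round at SERIAL, commit at QUORUM
Statement cas = new SimpleStatement("UPDATE test.tbl SET val = 1 WHERE id = 1 IF val = 0");
cas.setSerialConsistencyLevel(ConsistencyLevel.SERIAL);
cas.setConsistencyLevel(ConsistencyLevel.QUORUM);
boolean applied = session.execute(cas).one().getBool("[applied]");
=== example ===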

Anyways, thank you.

Thanks,
Hiro

On Mon, Mar 26, 2018 at 5:42 PM, Jacques-Henri Berthemet
<jacques-henri.berthe...@genesys.com> wrote:
> If you check the Jira issue I linked you'll see a recent comment describing a 
> potential explanation for my mixed LWT/non-LWT problem. So, it looks like 
> there can be some edge cases.
>
> I'd say that if data was inserted a while ago (seconds) there should be no 
> problems.
>
> --
> Jacques-Henri Berthemet
>
> -Original Message-
> From: Hiroyuki Yamada [mailto:mogwa...@gmail.com]
> Sent: Sunday, March 25, 2018 1:10 AM
> To: user@cassandra.apache.org
> Subject: Re: LWT on data mutated by non-LWT operation is valid ?
>
> Thank you JH.
> But, it's still a little bit unclear to me
>
> Let me clarify the question.
> What I wanted to know is whether or not linearizability is sustained by doing 
> LWT  (Consistency: QUORUM, Serial Consistency: SERIAL) on data previously 
> mutated by non-LWT (Consistency: QUORUM).
> I think It should be OK if non-LWT surely correctly happened before the LWT, 
> but I'm wondering if there is a corner case where it's not OK.
>
> For example, to test LWT (Update) operations, initial data should be inserted 
> by LWT operations ? or it can be non-LWT operations ?
>
> Thanks,
> Hiroyuki
>
> On Sat, Mar 24, 2018 at 8:27 PM, Jacques-Henri Berthemet 
> <jacques-henri.berthe...@genesys.com> wrote:
>> Hi Hiroyuki,
>>
>>
>> For both operations you'll have to provide partition key so "conflict"
>> at DB level can always be resolved.
>>
>> But if two operations, LWT and non-LWT, are racing against each others
>> the result is unpredictable, if non-LWT is applied after LWT the
>> result will be overwritten.
>>
>>
>> It seems mixing LWT and non-LWT can result in strange results, we
>> recently opened a bug on non-working delete after LWT insert:
>> https://issues.apache.org/jira/browse/CASSANDRA-14304
>>
>>
>> Regards,
>>
>> JH
>>
>> 
>> From: Hiroyuki Yamada <mogwa...@gmail.com>
>> Sent: Saturday, March 24, 2018 4:38:15 AM
>> To: user@cassandra.apache.org
>> Subject: LWT on data mutated by non-LWT operation is valid ?
>>
>> Hi all,
>>
>> I have some question about LWT.
>>
>> I am wondering if LWT works only for data mutated by LWT or not.
>> In other words, doing LWT on some data mutated by non-LWT operations
>> is still valid ?
>> I don't fully understand how system.paxos table works in LWT, but
>> row_key should be empty for a data mutated by non-LWT operation, so
>> conflict resolution seems impossible.
>> It works only if a previous non-LWT operation is completely finished ?
>>
>> Thanks in advance.
>>
>> Best regards,
>> Hiroyuki
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: LWT on data mutated by non-LWT operation is valid ?

2018-03-24 Thread Hiroyuki Yamada
Thank you JH.
But it's still a little bit unclear to me.

Let me clarify the question.
What I wanted to know is whether or not linearizability is sustained
by doing LWT (Consistency: QUORUM, Serial Consistency: SERIAL) on
data previously mutated by non-LWT (Consistency: QUORUM).
I think it should be OK if the non-LWT write definitely completed before the LWT,
but I'm wondering if there is a corner case where it's not.

For example, to test LWT (UPDATE) operations,
should the initial data be inserted by LWT operations, or can it be
inserted by non-LWT operations?

Thanks,
Hiroyuki

On Sat, Mar 24, 2018 at 8:27 PM, Jacques-Henri Berthemet
<jacques-henri.berthe...@genesys.com> wrote:
> Hi Hiroyuki,
>
>
> For both operations you'll have to provide partition key so "conflict" at DB
> level can always be resolved.
>
> But if two operations, LWT and non-LWT, are racing against each others the
> result is unpredictable, if non-LWT is applied after LWT the result will be
> overwritten.
>
>
> It seems mixing LWT and non-LWT can result in strange results, we recently
> opened a bug on non-working delete after LWT insert:
> https://issues.apache.org/jira/browse/CASSANDRA-14304
>
>
> Regards,
>
> JH
>
> 
> From: Hiroyuki Yamada <mogwa...@gmail.com>
> Sent: Saturday, March 24, 2018 4:38:15 AM
> To: user@cassandra.apache.org
> Subject: LWT on data mutated by non-LWT operation is valid ?
>
> Hi all,
>
> I have some question about LWT.
>
> I am wondering if LWT works only for data mutated by LWT or not.
> In other words, doing LWT on some data mutated by non-LWT operations
> is still valid ?
> I don't fully understand how system.paxos table works in LWT,
> but row_key should be empty for a data mutated by non-LWT operation,
> so conflict resolution seems impossible.
> It works only if a previous non-LWT operation is completely finished ?
>
> Thanks in advance.
>
> Best regards,
> Hiroyuki
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



LWT on data mutated by non-LWT operation is valid ?

2018-03-23 Thread Hiroyuki Yamada
Hi all,

I have some question about LWT.

I am wondering whether LWT works only for data that was mutated by LWT.
In other words, is doing LWT on data mutated by non-LWT operations
still valid?
I don't fully understand how the system.paxos table works in LWT,
but row_key should be empty for data mutated by a non-LWT operation,
so conflict resolution seems impossible.
Does it work only if the previous non-LWT operation has completely finished?

Thanks in advance.

Best regards,
Hiroyuki

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Authentication with Java driver

2017-02-07 Thread Hiroyuki Yamada
Hi,

The API seems a bit odd to me because credentials would usually be
set per session, but they are actually set on the cluster.

So, if there are 1000 clients, does the application have to create
1000 cluster instances with this API?
1000 clients seems normal if there are many nodes (say 20) and each
node has some concurrency (say 50),
but 1000 cluster instances seems like too many.
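(To illustrate, with the current API the only pattern I can see is one Cluster
per set of credentials, roughly like the sketch below. This is only an
illustration assuming the DataStax Java driver; the contact point is a
placeholder.)

== example ==
// imports: com.datastax.driver.core.Cluster, java.util.Map,
//          java.util.concurrent.ConcurrentHashMap
private static final Map<String, Cluster> clusters = new ConcurrentHashMap<>();

static Cluster clusterFor(String user, String password) {
    // one Cluster (and its sessions) per set of credentials
    return clusters.computeIfAbsent(user, u ->
        Cluster.builder()
               .addContactPoint("127.0.0.1")   // placeholder contact point
               .withCredentials(u, password)
               .build());
}
== example ==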

Is this the expected way to do it, or
is there any way to authenticate per session?

Thanks,
Hiro

On Tue, Feb 7, 2017 at 11:38 AM, Yuji Ito  wrote:
> Hi all,
>
> I want to know how to authenticate Cassandra users for multiple instances
> with Java driver.
> For instance, each thread creates a instance to access Cassandra with
> authentication.
>
> As the implementation example, only the first constructor builds a cluster
> and a session.
> Other constructors use them.
> This example is implemented according to the datastax document: "Basically
> you will want to share the same cluster and session instances across your
> application".
> http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
>
> However, other constructors don't authenticate the user and the password.
> That's because they don't need to build a cluster and a session.
>
> So, should I create a cluster and a session per instance for the
> authentication?
> If yes, can I create a lot of instances(clusters and sessions) to access C*
> concurrently?
>
> == example ==
> public class A {
>   private static Cluster cluster = null;
>   private static Map<String, Session> sessions = new ConcurrentHashMap<>();
>   private Session session;
>
>   public A (String keyspace, String user, String password) {
>     if (cluster == null) {
>       Cluster.Builder builder = Cluster.builder();
>       ...
>       builder = builder.withCredentials(user, password);
>       cluster = builder.build();
>     }
>     session = sessions.get(keyspace);
>     if (session == null) {
>       session = cluster.connect(keyspace);
>       sessions.put(keyspace, session);
>     }
>     ...
>   }
>   ...
>   public ResultSet update(...) {
>   ...
>   public ResultSet get(...) {
>   ...
> }
> == example ==
>
> Thanks,
> Yuji


Re: Benefit of LOCAL_SERIAL consistency

2016-12-07 Thread Hiroyuki Yamada
Hi DuyHai,

Thank you for the comments.
Yes, that's exactly what I mean.
(Your comment is very helpful in supporting my opinion.)

As you said, SERIAL with multiple DCs incurs a latency increase,
but it's a trade-off between latency and high availability, because one
DC can go down in a disaster.
I don't think there is any way to achieve global linearizability
without a latency increase, right?
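(Just to spell out where the choice is made per statement, here is a sketch
assuming the DataStax Java driver 3.x, an already-connected session, and an
illustrative table test.tbl:)

=== example ===
// imports: com.datastax.driver.core.*
Statement cas = new SimpleStatement("INSERT INTO test.tbl (id, val) VALUES (1, 0) IF NOT EXISTS");
// SERIAL       -> Paxos quorum over replicas in all DCs (global linearizability,
//                 paying the cross-DC round trips)
// LOCAL_SERIAL -> Paxos quorum over the local DC's replicas only (DC-local linearizability)
cas.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);
cas.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);   // commit consistency
session.execute(cas);
=== example ===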

> Edward
Thank you for the ticket.
I'll read it through.

Thanks,
Hiro

On Thu, Dec 8, 2016 at 12:01 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>
> On Wed, Dec 7, 2016 at 8:25 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>> The reason you don't want to use SERIAL in multi-DC clusters is the
>> prohibitive cost of lightweight transaction (in term of latency), especially
>> if your data centers are separated by continents. A ping from London to New
>> York takes 52ms just by speed of light in optic cable. Since LightWeight
>> Transaction involves 4 network round-trips, it means at least 200ms just for
>> raw network transfer, not even taking into account the cost of processing
>> the operation
>>
>> You're right to raise a warning about mixing LOCAL_SERIAL with SERIAL.
>> LOCAL_SERIAL guarantees you linearizability inside a DC, SERIAL guarantees
>> you linearizability across multiple DC.
>>
>> If I have 3 DCs with RF = 3 each (total 9 replicas) and I did an INSERT IF
>> NOT EXISTS with LOCAL_SERIAL in DC1, then it's possible that a subsequent
>> INSERT IF NOT EXISTS on the same record succeeds when using SERIAL because
>> SERIAL on 9 replicas = at least 5 replicas. Those 5 replicas which respond
>> can come from DC2 and DC3 and thus did not apply yet the previous INSERT...
>>
>> On Wed, Dec 7, 2016 at 2:14 PM, Hiroyuki Yamada <mogwa...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I have been using lightweight transactions for several months now and
>>> wondering what is the benefit of having LOCAL_SERIAL serial consistency
>>> level.
>>>
>>> With SERIAL, it achieves global linearlizability,
>>> but with LOCAL_SERIAL, it only achieves DC-local linearlizability,
>>> which is missing point of linearlizability, I think.
>>>
>>> So, for example,
>>> once when SERIAL is used,
>>> we can't use LOCAL_SERIAL to achieve local linearlizability
>>> since data in local DC might not be updated yet to meet quorum.
>>> And vice versa,
>>> once when LOCAL_SERIAL is used,
>>> we can't use SERIAL to achieve global linearlizability
>>> since data is not globally updated yet to meet quorum .
>>>
>>> So, it would be great if we can use LOCAL_SERIAL if possible and
>>> use SERIAL only if local DC is down or unavailable,
>>> but based on the example above, I think it is not possible, is it ?
>>> So, I am not sure about what is the good use case for LOCAL_SERIAL.
>>>
>>> The only case that I can think of is having a cluster in one DC for
>>> online transactions and
>>> having another cluster in another DC for analytics purpose.
>>> In this case, I think there is no big point of using SERIAL since data
>>> for analytics sometimes doesn't have to be very correct/fresh and
>>> data can be asynchronously replicated to analytics node. (so using
>>> LOCAL_SERIAL for one DC makes sense.)
>>>
>>> Could anyone give me some thoughts about it ?
>>>
>>> Thanks,
>>> Hiro
>>
>>
>
> You're right to raise a warning about mixing LOCAL_SERIAL with SERIAL.
> LOCAL_SERIAL guarantees you linearizability inside a DC, SERIAL guarantees
> you linearizability across multiple DC.
>
> I am not sure what of the state of this is anymore but I was under the
> impression the linearizability of lwt was in question. I never head it
> specifically addressed.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6106
>
> Its hard to follow 6106 because most of the tasks are closed 'fix later'  or
> closed 'not a problem' .


Benefit of LOCAL_SERIAL consistency

2016-12-07 Thread Hiroyuki Yamada
Hi,

I have been using lightweight transactions for several months now, and I am
wondering what the benefit of the LOCAL_SERIAL serial consistency level is.

With SERIAL, it achieves global linearizability,
but with LOCAL_SERIAL, it only achieves DC-local linearizability,
which I think misses the point of linearizability.

So, for example,
once SERIAL has been used,
we can't use LOCAL_SERIAL to achieve local linearizability,
since the data in the local DC might not yet be updated on enough replicas to form a quorum.
And vice versa:
once LOCAL_SERIAL has been used,
we can't use SERIAL to achieve global linearizability,
since the data is not yet updated globally on enough replicas to form a quorum.

So, it would be great if we could use LOCAL_SERIAL when possible and
use SERIAL only if the local DC is down or unavailable,
but based on the example above, I think that is not possible, is it?
So, I am not sure what a good use case for LOCAL_SERIAL is.

The only case that I can think of is having a cluster in one DC for
online transactions and
another cluster in another DC for analytics purposes.
In that case, I think there is not much point in using SERIAL, since the data
for analytics does not always have to be completely fresh/correct and
can be asynchronously replicated to the analytics nodes (so using
LOCAL_SERIAL for one DC makes sense).

Could anyone give me some thoughts about this?

Thanks,
Hiro


Re: Does recovery continue after truncating a table?

2016-11-26 Thread Hiroyuki Yamada
Hi Yuji and Ben,

I tried out this revised script and the same issue occurred for me, too.
I think it's definitely a bug that should be solved ASAP.

>Ben
What do you mean by "an undocumented limitation"?

Thanks,
Hiro

On Sat, Nov 26, 2016 at 3:13 PM, Ben Slater  wrote:
> Nice detective work! Seems to me that it's at best an undocumented limitation
> and potentially could be viewed as a bug - maybe log another JIRA?
>
> One note - there is a nodetool truncatehints command that could be used to
> clear out the hints
> (http://cassandra.apache.org/doc/latest/tools/nodetool/truncatehints.html?highlight=truncate)
> . However, it seems to clear all hints on particular endpoint, not just for
> a specific table.
>
> Cheers
> Ben
>
> On Fri, 25 Nov 2016 at 17:42 Yuji Ito  wrote:
>>
>> Hi all,
>>
>> I revised the script to reproduce the issue.
>> I think the issue happens more frequently than before.
>> Killing another node is added to the previous script.
>>
>>  [script] 
>> #!/bin/sh
>>
>> node1_ip=
>> node2_ip=
>> node3_ip=
>> node2_user=
>> node3_user=
>> rows=1
>>
>> echo "consistency quorum;" > init_data.cql
>> for key in $(seq 0 $(expr $rows - 1))
>> do
>> echo "insert into testdb.testtbl (key, val) values($key, ) IF NOT
>> EXISTS;" >> init_data.cql
>> done
>>
>> while true
>> do
>> echo "truncate the table"
>> cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
>> if [ $? -ne 0 ]; then
>> echo "truncating failed"
>> continue
>> else
>> break
>> fi
>> done
>>
>> echo "kill C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon |
>> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>
>> echo "insert $rows rows"
>> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>>
>> echo "restart C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>>
>> while true
>> do
>> echo "truncate the table again"
>> cqlsh $node1_ip -e "truncate table testdb.testtbl"
>> if [ $? -ne 0 ]; then
>> echo "truncating failed"
>> continue
>> else
>> echo "truncation succeeded!"
>> break
>> fi
>> done
>>
>> echo "kill C* process on node2"
>> pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep CassandraDaemon |
>> awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
>> count(*) from testdb.testtbl;"
>> sleep 10
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select
>> count(*) from testdb.testtbl;"
>>
>> echo "restart C* process on node2"
>> pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra start"
>>
>>
>> Thanks,
>> yuji
>>
>>
>> On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito  wrote:
>>>
>>> I investigated source code and logs of killed node.
>>> I guess that unexpected writes are executed when truncation is being
>>> executed.
>>>
>>> Some writes were executed after flush (the first flush) in truncation and
>>> these writes could be read.
>>> These writes were requested as MUTATION by another node for hinted
>>> handoff.
>>> Their data was stored to a new memtable and flushed (the second flush) to
>>> a new SSTable before snapshot in truncation.
>>> So, the truncation discarded only old SSTables, not the new SSTable.
>>> That's because ReplayPosition which was used for discarding SSTable was
>>> that of the first flush.
>>>
>>> I copied some parts of log as below.
>>> "##" line is my comment.
>>> The point is that the ReplayPosition is moved forward by the second
>>> flush.
>>> It means some writes are executed after the first flush.
>>>
>>> == log ==
>>> ## started truncation
>>> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:04,612
>>> ColumnFamilyStore.java:2790 - truncating testtbl
>>> ## the first flush started before truncation
>>> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:04,612
>>> ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 591360 (0%)
>>> on-heap, 0 (0%) off-heap
>>> INFO  [MemtableFlushWriter:1] 2016-11-17 08:36:04,613 Memtable.java:352 -
>>> Writing Memtable-testtbl@1863835308(42.625KiB serialized bytes, 2816 ops,
>>> 0%/0% of on/off-heap limit)
>>> ...
>>> DEBUG [MemtableFlushWriter:1] 2016-11-17 08:36:04,973 Memtable.java:386 -
>>> Completed flushing
>>> /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-1-big-Data.db
>>> (17.651KiB) for commitlog position ReplayPosition(segmentId=1479371760395,
>>> position=315867)
>>> ## this ReplayPosition was used for discarding SSTables
>>> ...
>>> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,022 CommitLog.java:298 -
>>> discard completed log segments for ReplayPosition(segmentId=1479371760395,
>>> position=315867), table 562848f0-a556-11e6-8b14-51065d58fdfb
>>> ## end of the first flush
>>> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028
>>> 

Re: How does the "batch" commit log sync works

2016-10-30 Thread Hiroyuki Yamada
Hello Benedict and Edward,

Thank you very much for the comments.
I think the batch parameter is useful when doing transactional
processing on C*, where we need atomicity and higher durability.

Anyway, I think it is not working as expected, at least in the latest
2.1 and 2.2 releases.
So, I created a ticket in JIRA.
https://issues.apache.org/jira/browse/CASSANDRA-12864

I hope it will be fixed soon.

Thanks,
Hiro

On Fri, Oct 28, 2016 at 6:00 PM, Benedict Elliott Smith
<bened...@apache.org> wrote:
> That is the maximum length of time that queries may be batched together for,
> not the minimum. If there is a break in the flow of queries for the commit
> log, it will commit those outstanding immediately.  It will anyway commit in
> clusters of commit log file size (default 32Mb).
>
> I know the documentation used to disagree with itself in a few places, and
> with actual behaviour, but I thought that had been fixed.  I suggest you
> file a ticket if you find a mention that does not match this description.
>
> Really the batch period is a near useless parameter.  If it were to be
> honoured as a minimum, performance would decline due to the threading model
> in Cassandra (and it will be years before this and memory management improve
> enough to support that behaviour).
>
> Conversely honouring it as a maximum is only possible for very small values,
> just by nature of queueing theory.
>
> I believe I proposed removing the parameter entirely some time ago, though
> it is lost in the mists of time.
>
> Anyway, many people do indeed use this commitlog mode successfully, although
> it is by far less common than periodic mode.  This behaviour does not mean
> your data is in anyway unsafe.
>
>
> On Friday, 28 October 2016, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>> I mentioned during my Cassandra.yaml presentation at the summit that I
>> never saw anyone use these settings. Things off by default are typically not
>> highly not covered well by tests. It sounds like it is not working. Quick
>> suggestion: go back in time maybe to a version like 1.2.X or 0.7 and see if
>> it behaves like the yaml suggests it should.
>>
>> On Thu, Oct 27, 2016 at 11:48 PM, Hiroyuki Yamada <mogwa...@gmail.com>
>> wrote:
>>>
>>> Hello Satoshi and the community,
>>>
>>> I am also using commitlog_sync for durability, but I have never
>>> modified commitlog_sync_batch_window_in_ms parameter yet,
>>> so I wondered if it is working or not.
>>>
>>> As Satoshi said, I also changed commitlog_sync_batch_window_in_ms (to
>>> 1) and restarted C* and
>>> issued some INSERT command.
>>> But, it actually returned immediately right after issuing.
>>>
>>> So, it seems like the parameter is not working correctly.
>>> Are we missing something ?
>>>
>>> Thanks,
>>> Hiro
>>>
>>> On Thu, Oct 27, 2016 at 5:58 PM, Satoshi Hikida <sahik...@gmail.com>
>>> wrote:
>>> > Hi, all.
>>> >
>>> > I have a question about "batch" commit log sync behavior with C*
>>> > version
>>> > 2.2.8.
>>> >
>>> > Here's what I have done:
>>> >
>>> > * set commitlog_sync to the "batch" mode as follows:
>>> >
>>> >> commitlog_sync: batch
>>> >> commitlog_sync_batch_window_in_ms: 10000
>>> >
>>> > * ran a script which inserts the data to a table
>>> > * prepared a disk dedicated to store the commit logs
>>> >
>>> > According to the DataStax document, I expected that fsync is done once
>>> > in a
>>> > batch window (one fsync per 10sec in this case) and writes issued
>>> > within
>>> > this batch window are blocked until fsync is completed.
>>> >
>>> > In my experiment, however, it seems that the write requests returned
>>> > almost
>>> > immediately (within 300~400 ms).
>>> >
>>> > Am I misunderstanding something? If so, can someone give me any advices
>>> > as
>>> > to the reason why C* behaves like this?
>>> >
>>> >
>>> > I referred to this document:
>>> >
>>> > https://docs.datastax.com/en/cassandra/2.2/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__PerformanceTuningProps
>>> >
>>> > Regards,
>>> > Satoshi
>>> >
>>
>>
>


Re: How does the "batch" commit log sync works

2016-10-27 Thread Hiroyuki Yamada
Hello Satoshi and the community,

I am also using the batch commitlog_sync mode for durability, but I had never
modified the commitlog_sync_batch_window_in_ms parameter,
so I wondered whether it is working or not.

As Satoshi did, I also changed commitlog_sync_batch_window_in_ms (to
10000), restarted C*, and
issued some INSERT commands.
But they actually returned immediately, right after being issued.
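(Concretely, the check is just timing a single write, as in the sketch below;
it assumes the DataStax Java driver, an already-connected session, and an
illustrative table test.tbl. With batch mode I would expect execute() to block
until the commit log is fsynced, but it returns right away.)

=== example ===
// imports: com.datastax.driver.core.*
long start = System.nanoTime();
session.execute("INSERT INTO test.tbl (id, val) VALUES (1, 0)");
long elapsedMs = (System.nanoTime() - start) / 1_000_000;
System.out.println("INSERT took " + elapsedMs + " ms");   // returns almost immediately
=== example ===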

So, it seems like the parameter is not working correctly.
Are we missing something ?

Thanks,
Hiro

On Thu, Oct 27, 2016 at 5:58 PM, Satoshi Hikida  wrote:
> Hi, all.
>
> I have a question about "batch" commit log sync behavior with C* version
> 2.2.8.
>
> Here's what I have done:
>
> * set commitlog_sync to the "batch" mode as follows:
>
>> commitlog_sync: batch
>> commitlog_sync_batch_window_in_ms: 10000
>
> * ran a script which inserts the data to a table
> * prepared a disk dedicated to store the commit logs
>
> According to the DataStax document, I expected that fsync is done once in a
> batch window (one fsync per 10sec in this case) and writes issued within
> this batch window are blocked until fsync is completed.
>
> In my experiment, however, it seems that the write requests returned almost
> immediately (within 300~400 ms).
>
> Am I misunderstanding something? If so, can someone give me any advices as
> to the reason why C* behaves like this?
>
>
> I referred to this document:
> https://docs.datastax.com/en/cassandra/2.2/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__PerformanceTuningProps
>
> Regards,
> Satoshi
>


Re: Read operation can read uncommitted data?

2016-07-03 Thread Hiroyuki Yamada
Hi,

I'm also wondering whether a failed read/results phase (phase 2 in C* Paxos)
is recovered by some other reads or not.
It seems easier to just return "failed" to clients if phase 2
fails, because no proposal has actually been initiated yet.
Does anyone know about this?
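(For context, by "reads" here I mean serial reads like the sketch below, which
assumes the DataStax Java driver, an already-connected session, and an
illustrative table test.tbl. As discussed below in this thread, a read at CL
SERIAL will complete any in-progress Paxos write it observes.)

=== example ===
// imports: com.datastax.driver.core.*
Statement read = new SimpleStatement("SELECT val FROM test.tbl WHERE id = 1");
read.setConsistencyLevel(ConsistencyLevel.SERIAL);   // serial read goes through a Paxos round
Row row = session.execute(read).one();
=== example ===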

Thanks,
Hiroyuki


On Wed, Jun 29, 2016 at 11:45 AM, Yuji Ito  wrote:
> Tyler
>
> Thank you for your reply.
> I have 2 questions.
>
> Will the reads resume the CAS operation?
>
> Do the reads repair in-progress paxos writes in read/results phase too?
> According to the following article,
> when an UnavailableException or a WriteTimeoutException occur (propose or
> commit phase fail),
> reads will force Cassandra to commit the data.
> http://www.datastax.com/dev/blog/cassandra-error-handling-done-right
>
> Regards,
> Yuji Ito
>
>
> On Wed, Jun 29, 2016 at 7:03 AM, Tyler Hobbs  wrote:
>>
>> Reads at CL.SERIAL will complete any in-progress paxos writes, so the
>> behavior you're seeing is expected.
>>
>> On Mon, Jun 27, 2016 at 1:55 AM, Yuji Ito  wrote:
>>>
>>> Hi,
>>>
>>> I'm testing Cassandra CAS operation.
>>>
>>> Can a read operation read uncommitted data which is being updated by CAS
>>> in the following case?
>>>
>>> I use Cassandra 2.2.6.
>>> There are 3 nodes (A, B and C) in a cluster.
>>> Replication factor of keyspace is 3.
>>> CAS operation on node A starts to update row X (updating the value in row
>>> from 0 to 1).
>>>
>>> 1. prepare/promise phase succeeds on node A
>>> 2. node C is down
>>> 3. read/results phase in node A sends read requests to node B and C and
>>> waits for read responses from them.
>>> 4. (unrelated) read operation (CL: SERIAL) reads the same row X and gets
>>> the value "1" in the row!!
>>> 5. read/results phase fails by ReadTimeoutException caused by failure of
>>> node C
>>>
>>> Thanks,
>>> Yuji Ito
>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax
>
>


About the data structure of partition index

2016-05-17 Thread Hiroyuki Yamada
Hi,

I am wondering how many primary keys are stored in one partition index.

As the following documents say,




I understand that each partition index has a list of primary keys and
the start position in the compression offset map,
so I assume the logical data structure of a partition index would be
like the following:

| [pkey1 - pkeyN] | offset into the compression offset map |
(indexed by the first column so it can be looked up by a partition key)

I am wondering whether that understanding is correct and
how many primary keys are stored in the first column.

If it is not correct, could anyone give me the correct logical data structure?

Thanks,
Hiro


Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-11 Thread Hiroyuki Yamada
Thank you all to respond and discuss my question.

I agree with you all basically,
but, in Cassandra's case, I think it is a matter of how much data we store
relative to how much memory we have.

Following Jack's (and DataStax's) suggestion,
I also tried a 4GB RAM machine (t2.medium) with 1 billion records (about 100GB
in size) and the default configuration except for LeveledCompactionStrategy,
but after the insertion from the application program completed, compaction
probably kept working in the background,
and again Cassandra was later killed by the OOM killer.

The insertion from the application side is finished, so the issue probably
comes from compaction happening in the background.
Is there any recommended compaction configuration to keep Cassandra
stable with a large dataset (more than 100GB) in a rather low-memory (4GB)
environment?

I think the same thing would happen if I tried the experiment with 8GB of memory
and a larger data set (maybe more than 2 billion records).
(If that is not correct, please explain why.)


Best regards,
Hiro

On Fri, Mar 11, 2016 at 4:19 AM, Robert Coli  wrote:

> On Thu, Mar 10, 2016 at 3:27 AM, Alain RODRIGUEZ 
> wrote:
>
>> So, like Jack, I globally really not recommend it unless you know what
>> you are doing and don't care about facing those issues.
>>
>
> Certainly a spectrum of views here, but everyone (including OP) seems to
> agree with the above. :D
>
> =Rob
>
>


How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-04 Thread Hiroyuki Yamada
Hi,

I'm working on some POCs for Cassandra in a single-node, 2GB RAM environment,
and
some issues came up, so let me ask about them here.

I have tried to insert about 200 million records (about 11GB in size) into
the node,
and the insertion from the application program seems to have completed,
but something (probably compaction?) kept happening after the insertion, and
later Cassandra itself was killed by the OOM killer.

I've tried to tune the configuration, including the heap size, compaction
memory settings, and bloom filter settings,
to make C* work nicely in this low-memory environment,
but in every case it hasn't worked so far (which means I still get an OOM
eventually).

I know it is not really recommended to run C* in such a low-memory environment,
but I am wondering what I can do (which configurations to change) to make it
a little more stable in such an environment.
(I understand the following configuration is very tight and not really
recommended, but I just want to make it work for now.)

Could anyone give me some help?


Hardware and software :
- EC2 instance (t2.small: 1vCPU, 2GB RAM)
- Cassandra 2.2.5
- JDK 8 (8u73)

Cassandra configurations (what I changed from the default):
- leveledCompactionStrategy
- custom configuration settings of cassandra-env.sh
- MAX_HEAP_SIZE: 640MB
- HEAP_NEWSIZE: 128MB
- custom configuration settings of cassandra.yaml
- commitlog_segment_size_in_mb: 4
- commitlog_total_space_in_mb: 512
- sstable_preemptive_open_interval_in_mb: 16
- file_cache_size_in_mb: 40
- memtable_heap_space_in_mb: 40
- key_cache_size_in_mb: 0
- bloom filter is disabled


=== debug.log around when Cassandra was killed by OOM killer ===
DEBUG [NonPeriodicTasks:1] 2016-03-04 00:36:02,378
FileCacheService.java:177 - Invalidating cache for
/var/lib/cassandra/data/test/user-adc91d20e15011e586c53fd5b957bea8/tmplink-la-15626-big-Data.db
DEBUG [NonPeriodicTasks:1] 2016-03-04 00:36:09,903
FileCacheService.java:177 - Invalidating cache for
/var/lib/cassandra/data/test/user-adc91d20e15011e586c53fd5b957bea8/tmplink-la-15622-big-Data.db
DEBUG [NonPeriodicTasks:1] 2016-03-04 00:36:14,360
FileCacheService.java:177 - Invalidating cache for
/var/lib/cassandra/data/test/user-adc91d20e15011e586c53fd5b957bea8/tmplink-la-15626-big-Data.db
DEBUG [NonPeriodicTasks:1] 2016-03-04 00:36:20,004
FileCacheService.java:177 - Invalidating cache for
/var/lib/cassandra/data/test/user-adc91d20e15011e586c53fd5b957bea8/tmplink-la-15622-big-Data.db
==

=== /var/log/message ===
Mar  4 00:36:22 ip-10-0-0-11 kernel: Killed process 8919 (java)
total-vm:32407840kB, anon-rss:1535020kB, file-rss:123096kB
==


Best regards,
Hiro


Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-14 Thread Hiroyuki Yamada
Thanks DuyHai!
That's clear and helpful.
(And I realized that we need to call setSerialConsistencyLevel for the serial
consistency level and setConsistencyLevel for the regular one.)
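(In code, the combination that works for me looks like the sketch below,
assuming the DataStax Java driver, an already-connected session, and an
illustrative table test.tbl:)

=== example ===
// imports: com.datastax.driver.core.*
Statement stmt = new SimpleStatement("INSERT INTO test.tbl (id, val) VALUES (1, 0) IF NOT EXISTS");
stmt.setSerialConsistencyLevel(ConsistencyLevel.SERIAL);   // consistency of the Paxos round
stmt.setConsistencyLevel(ConsistencyLevel.QUORUM);         // consistency of the commit
// passing SERIAL to setConsistencyLevel() instead is what triggers the
// "SERIAL is not supported as conditional update commit consistency" error
session.execute(stmt);
=== example ===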

Thanks,
Hiro

On Tue, Jan 12, 2016 at 9:34 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> There are 2 levels of consistency levels you can define on your query when
> using Lightweight Transaction:
>
> - one for the Paxos round: SERIAL or LOCAL_SERIAL (which indeed
> corresponds to QUORUM/LOCAL_QUORUM but named differently so people do not
> get confused)
>
> - one for the consistency of the mutation itself. In this case you can use
> any CL except SERIAL/LOCAL_SERIAL
>
> Setting the consistency level for Paxos is useful in the context of multi
> data centers only. SERIAL => require a majority wrt RF in all DCs.
> LOCAL_SERIAL => majority wrt RF in local DC only
>
> Hope that helps
>
>
>
> On Thu, Jan 7, 2016 at 10:44 AM, Hiroyuki Yamada <mogwa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I've been doing some POCs of lightweight transactions and
>> I come up with some questions, so please let me ask them to you here.
>>
>> So the question is:
>> what consistency level should I set when using IF NOT EXIST or UPDATE IF
>> statements ?
>>
>> I used the statements with ONE and QUORUM first, then it seems fine.
>> But, when I set SERIAL, it gave me the following error.
>>
>> === error message ===
>> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
>> SERIAL is not supported as conditional update commit consistency. Use ANY
>> if you mean "make sure it is accepted but I don't care how many replicas
>> commit it for non-SERIAL reads"
>> === error message ===
>>
>>
>> So, I'm wondering what's SERIAL for when writing (and reading) and
>> what the differences are in setting ONE, QUORUM and ANY when using IF NOT
>> EXIST or UPDATE IF statements.
>>
>> Could you give me some advises ?
>>
>> Thanks,
>> Hiro
>>
>>
>>
>>
>>
>


Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-11 Thread Hiroyuki Yamada
Can anyone answer my questions?
I think the current DataStax documents, including the Python driver's, don't
precisely describe how we should set consistency with lightweight
transactions.

Regards,
Hiro

On Fri, Jan 8, 2016 at 11:48 AM, Hiroyuki Yamada <mogwa...@gmail.com> wrote:

> Thanks Tyler.
>
> I've read the python document and it's a bit more clear than before,
> but i'm still confused at what combinations make lightweight transaction
> operations work correctly.
>
> So, let me clarify the conditions where lightweight transactions work.
>
> QUORUM conditional write -> QUORUM read => OK (meets linearizability)
> ANY conditional write -> SERIAL read =>  OK (meets linearizability)
> ONE conditional write -> SERIAL read => OK ?
> SERIAL conditional write -> ??? read => ERROR for some reasons (why?)
>
> One question is that my understanding about the top 2 conditions are
> correct ?
> And the other question is "ONE conditional write - SERIAL read" is ok ?
> Also, why SERIAL conditional write fails
> even though SERIAL conditional write with (for example) ANY read
> afterwards seems logically OK ?
>
> The following document says that it seems like we can specify SERIAL in
> writes,
> so, when should I use SERIAL in writes except conditional writes (, which
> fails) ?
> <
> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
> >
>
>
> Thanks,
> Hiro
>
>
>
> On Fri, Jan 8, 2016 at 2:44 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>
>> The python driver docs explain this pretty well, I think:
>> http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level
>>
>> On Thu, Jan 7, 2016 at 3:44 AM, Hiroyuki Yamada <mogwa...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been doing some POCs of lightweight transactions and
>>> I come up with some questions, so please let me ask them to you here.
>>>
>>> So the question is:
>>> what consistency level should I set when using IF NOT EXIST or UPDATE IF
>>> statements ?
>>>
>>> I used the statements with ONE and QUORUM first, then it seems fine.
>>> But, when I set SERIAL, it gave me the following error.
>>>
>>> === error message ===
>>> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
>>> SERIAL is not supported as conditional update commit consistency. Use ANY
>>> if you mean "make sure it is accepted but I don't care how many replicas
>>> commit it for non-SERIAL reads"
>>> === error message ===
>>>
>>>
>>> So, I'm wondering what's SERIAL for when writing (and reading) and
>>> what the differences are in setting ONE, QUORUM and ANY when using IF
>>> NOT EXIST or UPDATE IF statements.
>>>
>>> Could you give me some advises ?
>>>
>>> Thanks,
>>> Hiro
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>


Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-07 Thread Hiroyuki Yamada
Thanks Tyler.

I've read the Python driver document and it's a bit clearer than before,
but I'm still confused about which combinations make lightweight transaction
operations work correctly.

So, let me clarify the conditions where lightweight transactions work.

QUORUM conditional write -> QUORUM read => OK (meets linearizability)
ANY conditional write -> SERIAL read =>  OK (meets linearizability)
ONE conditional write -> SERIAL read => OK ?
SERIAL conditional write -> ??? read => ERROR for some reasons (why?)

One question is whether my understanding of the top two conditions is
correct.
The other question is whether "ONE conditional write - SERIAL read" is OK.
Also, why does a SERIAL conditional write fail,
even though a SERIAL conditional write with (for example) an ANY read afterwards
seems logically OK?

The following document says that we can specify SERIAL for
writes,
so when should I use SERIAL for writes other than conditional writes (which
fail)?
<
https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>


Thanks,
Hiro



On Fri, Jan 8, 2016 at 2:44 AM, Tyler Hobbs <ty...@datastax.com> wrote:

> The python driver docs explain this pretty well, I think:
> http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level
>
> On Thu, Jan 7, 2016 at 3:44 AM, Hiroyuki Yamada <mogwa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I've been doing some POCs of lightweight transactions and
>> I come up with some questions, so please let me ask them to you here.
>>
>> So the question is:
>> what consistency level should I set when using IF NOT EXIST or UPDATE IF
>> statements ?
>>
>> I used the statements with ONE and QUORUM first, then it seems fine.
>> But, when I set SERIAL, it gave me the following error.
>>
>> === error message ===
>> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
>> SERIAL is not supported as conditional update commit consistency. Use ANY
>> if you mean "make sure it is accepted but I don't care how many replicas
>> commit it for non-SERIAL reads"
>> === error message ===
>>
>>
>> So, I'm wondering what's SERIAL for when writing (and reading) and
>> what the differences are in setting ONE, QUORUM and ANY when using IF NOT
>> EXIST or UPDATE IF statements.
>>
>> Could you give me some advises ?
>>
>> Thanks,
>> Hiro
>>
>>
>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>