Re: Practical limitations of too many columns/cells ?

2015-08-25 Thread Kevin Burton
No problem. Is there a JIRA ticket already for this?

On Mon, Aug 24, 2015 at 6:06 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 Can you post your findings to JIRA as well?  Would be good to see some
 real numbers from production.

 The refactor of the storage engine (CASSANDRA-8099) may completely change
 this, but it's good to have it on the radar.


 On Sun, Aug 23, 2015 at 10:31 PM Kevin Burton bur...@spinn3r.com wrote:

 Agreed.  We’re going to run a benchmark.  Just realized we grew to 144
 columns.  Fun.  Kind of disappointing that Cassandra is so slow in this
 regard.  Kind of defeats the whole point of flexible schema if actually
 using that feature is slow as hell.

 On Sun, Aug 23, 2015 at 4:54 PM, Jeff Jirsa jeff.ji...@crowdstrike.com
 wrote:

 The key is to benchmark it with your real data. Modern cassandra-stress
 lets you get very close to your actual read/write behavior, and the real
 differentiator will depend on your use case (how often do you write the
 whole row vs updating just one column/field). My gist shows a ton of
 different examples, but they’re not scientific, and at this point they’re
 old versions (and performance varies version to version).

 - Jeff

 From: burtonator2...@gmail.com on behalf of Kevin Burton
 Reply-To: user@cassandra.apache.org
 Date: Sunday, August 23, 2015 at 2:58 PM
 To: user@cassandra.apache.org
 Subject: Re: Practical limitations of too many columns/cells ?

 Ah.. yes.  Great benchmarks. If I’m interpreting them correctly it was
 ~15x slower for 22 columns vs 2 columns?

 Guess we have to refactor again :-P

 Not the end of the world of course.

 On Sun, Aug 23, 2015 at 1:53 PM, Jeff Jirsa jeff.ji...@crowdstrike.com
 wrote:

 A few months back, a user in #cassandra on freenode mentioned that when
 they transitioned from thrift to cql, their overall performance decreased
 significantly. They had 66 columns per table, so I ran some benchmarks with
 various versions of Cassandra and thrift/cql combinations.

 It shouldn’t really surprise you that more columns = more work = slower
 operations. It’s not necessarily the size of the writes, but the amount of
 work that needs to be done with the extra cells (2 large columns totaling
 2k performs better than 66 small columns totaling 0.66k even though it’s
 three times as much raw data being written to disk)

 https://gist.github.com/jeffjirsa/6e481b132334dfb6d42c

 2.0.13, 2 tokens per node, 66 columns, 10 bytes per column, thrift (660
 bytes per row):
   cassandra-stress --operation INSERT --num-keys 100 --columns 66 --column-size=10 --replication-factor 2 --nodesfile=nodes
   Averages from the middle 80% of values: interval_op_rate : 10720

 2.0.13, 2 tokens per node, 20 columns, 10 bytes per column, thrift (200
 bytes per row):
   cassandra-stress --operation INSERT --num-keys 100 --columns 20 --column-size=10 --replication-factor 2 --nodesfile=nodes
   Averages from the middle 80% of values: interval_op_rate : 28667

 2.0.13, 2 tokens per node, 2 large columns, thrift (2048 bytes per row):
   cassandra-stress --operation INSERT --num-keys 100 --columns 2 --column-size=1024 --replication-factor 2 --nodesfile=nodes
   Averages from the middle 80% of values: interval_op_rate : 23489
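 To make the trade-off in those three runs concrete, the short Python sketch
 below derives cells written per second and payload bytes written per second
 from the quoted interval_op_rate figures. It only multiplies the published
 numbers (columns x column size x rows/s) and ignores per-cell overhead such
 as column names and timestamps, so treat it as back-of-the-envelope
 arithmetic, not a new measurement.

```python
from typing import Tuple

# Figures copied from the three cassandra-stress runs above
# (Cassandra 2.0.13, thrift, 2 tokens per node, RF 2).
RUNS = {
    "66 x 10-byte columns": (66, 10, 10720),
    "20 x 10-byte columns": (20, 10, 28667),
    "2 x 1024-byte columns": (2, 1024, 23489),
}


def derived_rates(columns: int, column_size: int, ops_per_sec: int) -> Tuple[int, int]:
    """Return (cells written per second, payload bytes written per second)."""
    cells_per_sec = columns * ops_per_sec
    payload_bytes_per_sec = columns * column_size * ops_per_sec
    return cells_per_sec, payload_bytes_per_sec


if __name__ == "__main__":
    for name, (columns, size, ops) in RUNS.items():
        cells, payload = derived_rates(columns, size, ops)
        print(f"{name}: {ops:,} rows/s, {cells:,} cells/s, {payload / 1e6:.1f} MB/s payload")
```

 For the 66-column run this works out to 707,520 cells/s but only about
 7.1 MB/s of payload; the 2-column run writes 46,978 cells/s and about
 48.1 MB/s, roughly 6.8 times the payload at about 2.2 times the row rate.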

 From: burtonator2...@gmail.com on behalf of Kevin Burton
 Reply-To: user@cassandra.apache.org
 Date: Sunday, August 23, 2015 at 1:02 PM
 To: user@cassandra.apache.org
 Subject: Practical limitations of too many columns/cells ?

 Is there any advantage to using, say, 40 columns per row vs using 2
 columns (one for the pk and the other for data) and then shoving the data
 into a BLOB as a JSON object?

 To date, we’ve been just adding new columns. I profiled Cassandra and
 about 50% of the CPU time is spent doing compactions. Seeing that
 Cassandra is CPU-bottlenecked, maybe this is a way I can optimize it.

 Any thoughts?
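 The trade-off in this question can be sketched in a few lines of Python.
 This is a hypothetical model (the 40 field names and values are made up):
 in the wide-row layout each field is its own cell and an update rewrites
 one cell, while in the pk + JSON BLOB layout the row has a single data cell
 but any update re-serializes and rewrites the whole object. It counts only
 payload bytes and ignores Cassandra's per-cell metadata, which is the
 overhead the benchmarks above point at.

```python
import json
from typing import Dict

# Hypothetical record: 40 small text fields, 10 characters each.
RECORD: Dict[str, str] = {f"field_{i:02d}": f"value-{i:04d}" for i in range(40)}


def wide_row_cells(record: Dict[str, str]) -> int:
    """Wide-row layout: one cell per field (the partition key is not counted)."""
    return len(record)


def wide_row_update_bytes(new_value: str) -> int:
    """Wide-row layout: updating one field writes only that field's cell."""
    return len(new_value.encode("utf-8"))


def blob_update_bytes(record: Dict[str, str], field: str, new_value: str) -> int:
    """pk + JSON BLOB layout: updating one field rewrites the whole serialized blob."""
    updated = dict(record)
    updated[field] = new_value
    return len(json.dumps(updated, separators=(",", ":")).encode("utf-8"))


if __name__ == "__main__":
    print("cells per row (wide):", wide_row_cells(RECORD), "vs 1 (blob)")
    print("bytes for a 1-field update (wide):", wide_row_update_bytes("value-9999"))
    print("bytes for a 1-field update (blob):",
          blob_update_bytes(RECORD, "field_07", "value-9999"))
```

 Which layout wins depends on the write pattern: if whole rows are always
 written at once, the blob trades 40 cells for one; if single fields are
 updated often, each small update becomes a rewrite of the full object,
 which typically also means reading the current object first.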

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts














Re: How can I specify the file_data_directories for a keyspace

2015-08-25 Thread Jeff Jirsa
At this point, it is only/automatically managed by cassandra, but if you’re 
clever with mount points you can probably work around the limitation.



From:  Ahmed Eljami
Reply-To:  user@cassandra.apache.org
Date:  Tuesday, August 25, 2015 at 2:09 AM
To:  user@cassandra.apache.org
Subject:  How can I specify the file_data_directories for a keyspace

When I define several data_file_directories in cassandra.yaml, is it possible
to specify the location of keyspaces and tables? Or is it only and
automatically managed by Cassandra?

Thx.

-- 
Ahmed ELJAMI





Re: lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
What an excellent explanation! Thank you very much.

By the way, I do not understand why lightweight transactions in
Cassandra have a commit/acknowledgment round-trip.

I think we could commit the value within the propose/accept phase. Do you
agree? If not, can you explain why we need commit/acknowledgment?



Regards,



ibrahim


Re: lightweight transactions with potential problem?

2015-08-25 Thread Sylvain Lebresne


 So you mean that an older ballot will not only be rejected in round-trip 1
 (prepare/promise), but can also be rejected in the propose/accept
 round-trip. Is that correct?


Yes.



 You said: "Or more precisely, you got step 8 wrong: when a replica
 PROMISEs, the promise is not that they won't promise a ballot older than
 2, it's that they won't accept a ballot older than 2."

 Why is step 8 wrong? I think replicas can accept whichever ballot is
 highest, and ballot 2 is the highest in step 8, isn't it? What do you think?
 Do you also mean a replica can promise an older ballot?


I shouldn't have said wrong. What I meant is that your description of
what a PROMISE means was incomplete. It's true that in practice replicas
won't promise older ballots, but that's not the important property in this
case; the important property is that they also promise not to accept any
older ballot.



 Could you make it clearer?

 Thank you very much, Sylvain

 Ibrahim




Re: lightweight transactions with potential problem?

2015-08-25 Thread DuyHai Doan
The rationale for the last commit/ack phase is to persist the chosen value
(here, the mutation) in durable storage (here, Cassandra itself) and then
reset the Paxos state so that another round of Paxos can start.

More explanation in this blog post:
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

For a detailed explanation of different Paxos phases, look at those slides:
http://www.slideshare.net/doanduyhai/distributed-algorithms-for-big-data-geecon/53
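A toy Python model of the commit/ack round may help. It is a simplified
sketch, not Cassandra's implementation: `Replica`, `commit` and
`commit_round` are hypothetical names. On COMMIT a replica applies the
chosen mutation to its regular data and clears its in-progress Paxos
proposal so the next round starts clean; the ACK is what lets the
coordinator know that enough replicas (as required by the consistency level
used for the commit) have applied the value before it reports success.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Toy replica state for one partition (hypothetical, heavily simplified)."""
    data: Dict[str, str] = field(default_factory=dict)    # regular table contents
    in_progress: Optional[Tuple[int, str, str]] = None    # accepted (ballot, key, value)
    up: bool = True                                        # whether COMMIT reaches it

    def commit(self, ballot: int, key: str, value: str) -> bool:
        """Apply the chosen value, then clear the matching in-progress Paxos state."""
        if not self.up:
            return False                  # an unreachable replica sends no ACK
        self.data[key] = value            # the mutation becomes visible to normal reads
        if self.in_progress is not None and self.in_progress[0] <= ballot:
            self.in_progress = None       # reset so another Paxos round can start
        return True                       # ACK


def commit_round(replicas: List[Replica], ballot: int, key: str, value: str,
                 required_acks: int) -> bool:
    """Send COMMIT to every replica; succeed only if enough ACKs come back."""
    acks = sum(1 for r in replicas if r.commit(ballot, key, value))
    return acks >= required_acks
```

In this model, without the ACK step the coordinator could not tell a commit
that reached a quorum from one that reached no replica at all.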





Re: PrepareStatement BUG

2015-08-25 Thread joseph gao
Hi, does anybody know how to resolve this problem?

2015-08-23 1:35 GMT+08:00 joseph gao gaojf.bok...@gmail.com:


 I'm using Cassandra 2.1.7 and DataStax Java driver 2.1.6.
 Here is the problem:

 I use a PreparedStatement for a query like: SELECT * FROM somespace.sometable
 WHERE id = ?
 and I cache the PreparedStatement in my JVM.
 When the table metadata changes (for example, a column is added)
 and I use the cached PreparedStatement, the data and the metadata (column
 definitions) don't match.
 So I re-prepare the query using session.prepare(sql) again, but I see this
 code in the async-prepare callback:

 stmt = cluster.manager.addPrepare(stmt);   (in SessionManager.java)

 This returns the previous PreparedStatement.
 So the driver neither re-prepares automatically nor allows the user to re-prepare!
 Is this a bug, or am I using it wrong?
 --
 --
 Joseph Gao
 PhoneNum:15210513582
 QQ: 409343351






Re: Written data is lost and no exception thrown back to the client

2015-08-25 Thread Jean Tremblay
I have the same problem.

When I bulk load my data, I have a problem with the DataStax Cassandra driver:

<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.4</version> <!-- Driver 2.1.6 and 2.1.7.1 give problems: some data is lost. -->
</dependency>

With version 2.1.6 and also with version 2.1.7.1, I have lost records with no
error message whatsoever. With version 2.1.4, I have no missing records.

I use CL.ONE to write my records, with RF 3.






On 21 Aug 2015, at 13:06, Robert Wille
rwi...@fold3.com wrote:

But it shouldn’t matter. I have missing data, and no errors, which shouldn’t be 
possible except with CL=ANY.

FWIW, I’m working on some sample code so I can post a Jira.

Robert

On Aug 21, 2015, at 5:04 AM, Robert Wille
rwi...@fold3.com wrote:

RF=1 with QUORUM consistency. I know QUORUM is weird with RF=1, but it should
be the same as ONE. It’s QUORUM instead of ONE because production has RF=3, and
I was running this against my test cluster with RF=1.
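The remark that QUORUM behaves like ONE at RF=1 follows from the quorum
formula, floor(RF/2) + 1. The Python helper below is a simplified,
single-datacenter sketch (the function name is hypothetical, and it covers
only ANY, ONE, TWO, THREE, QUORUM and ALL) of how many replica
acknowledgements a coordinator waits for on a write.

```python
def replica_acks_required(consistency_level: str, replication_factor: int) -> int:
    """Replica ACKs a write coordinator waits for (single-DC sketch)."""
    cl = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if cl == "ANY":
        return 0  # a hint stored by the coordinator is enough; no replica ACK needed
    if cl in fixed:
        needed = fixed[cl]
    elif cl == "QUORUM":
        needed = replication_factor // 2 + 1
    elif cl == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"unsupported consistency level in this sketch: {consistency_level}")
    if needed > replication_factor:
        raise ValueError(f"{cl} cannot be satisfied with RF={replication_factor}")
    return needed
```

So with RF=1, QUORUM and ONE both wait for the single replica; with RF=3,
ONE waits for one of the three replicas and QUORUM waits for two.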

On Aug 20, 2015, at 7:28 PM, Jason
jkushm...@rocketfuelinc.com wrote:

What consistency level were the writes?

From: Robert Wille
Sent: 8/20/2015 18:25
To: user@cassandra.apache.org
Subject: Written data is lost and no exception thrown back to the client

I wrote a data migration application which I was testing, and I pushed it too 
hard and the FlushWriter thread pool blocked, and I ended up with dropped 
mutation messages. I compared the source data against what is in my cluster, 
and as expected I have missing records. The strange thing is that my 
application didn’t error out. I’ve been doing some forensics, and there’s a lot 
about this that makes no sense and makes me feel very uneasy.

I use a lot of asynchronous queries, and I thought it was possible that I had 
bad error handling, so I checked for errors in other, independent ways.

I have a retry policy that on the first failure logs the error and then 
requests a retry. On the second failure it logs the error and then rethrows. A 
few retryable errors appeared in my logs, but no fatal errors. In theory, I 
should have a fatal error in my logs for any error that gets reported back to 
the client.

I wrap my Session object, and all queries go through this wrapper. This wrapper 
logs all query errors. Synchronous queries are wrapped in a try/catch which 
logs and rethrows. Asynchronous queries use a FutureCallback to log any 
onFailure invocations.
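The error-handling scheme described above (retry once, then rethrow; log
every failed synchronous and asynchronous query) can be sketched in Python
as follows. This is a language-neutral illustration under assumptions, not
the DataStax Java driver's RetryPolicy or FutureCallback API: `execute`
stands for a hypothetical callable that runs one query attempt, and
asynchronous queries are modelled with `concurrent.futures.Future`. A
wrapper like this can only log errors that actually reach the client.

```python
import concurrent.futures
import logging
from typing import Any, Callable

log = logging.getLogger("session_wrapper")  # hypothetical logger name


def execute_with_one_retry(execute: Callable[[str], Any], query: str) -> Any:
    """Run a query; on the first failure log and retry, on the second log and re-raise."""
    try:
        return execute(query)
    except Exception as first_error:
        log.warning("query %r failed, retrying once: %s", query, first_error)
    try:
        return execute(query)
    except Exception as second_error:
        log.error("query %r failed again, giving up: %s", query, second_error)
        raise


def log_async_failure(future: concurrent.futures.Future) -> None:
    """Done-callback for asynchronous queries: log cancellations and failures."""
    if future.cancelled():
        log.error("async query was cancelled")
        return
    error = future.exception()
    if error is not None:
        log.error("async query failed: %s", error)
```

For asynchronous queries, the callback is registered with
`future.add_done_callback(log_async_failure)`.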

My logs indicate that no errors whatsoever were reported back to me. I do not 
understand how I can get dropped mutation messages and not know about it. I am 
running 2.0.16 with datastax Java driver 2.0.8. Three node cluster with RF=1. 
If someone could help me understand how this can occur, I would greatly 
appreciate it. A database that errors out is one thing. A database that errors 
out and makes you think everything was fine is quite another.

Thanks

Robert






'no such object in table'

2015-08-25 Thread Jason Lewis
I'm trying to run nodetool from one node, connecting to another.  I
can successfully connect to the majority of nodes in my ring, but two
nodes throw the following error.

nodetool: Failed to connect to 'IP:7199' NoSuchObjectException: 'no
such object in table'.

Any idea why this is happening?  Misconfiguration?

jas


How can I specify the file_data_directories for a keyspace

2015-08-25 Thread Ahmed Eljami
When I define several data_file_directories in cassandra.yaml, is it
possible to specify the location of keyspaces and tables? Or is it *only* and
*automatically* managed by Cassandra?

Thx.

-- 
Ahmed ELJAMI


Re: abnormal log after remove a node

2015-08-25 Thread Alain RODRIGUEZ
Hi, I am facing the same issue on 2.0.16.

Did you solve this? How?

I plan to try a rolling restart and see if the gossip state recovers from this.

C*heers,

Alain

2015-06-19 11:40 GMT+02:00 曹志富 cao.zh...@gmail.com:

 I have a C* 2.1.5 cluster with 24 nodes. A few days ago, I removed a node
 from this cluster using nodetool decommission.

 But today I found some log lines like this:

 INFO  [GossipStage:1] 2015-06-19 17:38:05,616 Gossiper.java:968 -
 InetAddress /172.19.105.41 is now DOWN
 INFO  [GossipStage:1] 2015-06-19 17:38:05,617 StorageService.java:1885 -
 Removing tokens [-1014432261309809702, -1055322450438958612,
 -1120728727235087395, -1191392141261832305, -1203676771883970142,
 -1215563040745505837, -1215648909329054362, -1269531760567530381,
 -1278047879489577908, -1313427877031136549, -1342822572958042617,
 -1350792764922315814, -1383390744017639599, -139000372807970456,
 -140827955201469664, -1631551789771606023, -1633789813430312609,
 -1795528665156349205, -1836619444785023397, -1879127294549041822,
 -1962337787208890426, -2022309807234530256, -2033402140526360327,
 -2089413865145942100, -210961549458416802, -2148530352195763113,
 -2184481573787758786, -610790268720205, -2340762266634834427,
 -2513416003567685694, -2520971378752190013, -2596695976621541808,
 -2620636796023437199, -2640378596436678113, -2679143017361311011,
 -2721176590519112233, -2749213392354746126, -279267896827516626,
 -2872377759991294853, -2904711688111888325, -290489381926812623,
 -3000574339499272616, -301428600802598523, -3019280155316984595,
 -3024451041907074275, -3056898917375012425, -3161300347260716852,
 -3166392383659271772, -3327634380871627036, -3530685865340274372,
 -3563112657791369745, -366930313427781469, -3729582520450700795,
 -3901838244986519991, -4065326606010524312, -4174346928341550117,
 -4184239233207315432, -4204369933734181327, -4206479093137814808,
 -421410317165821100, -4311166118017934135, -4407123461118340117,
 -4466364858622123151, -4466939645485100087, -448955147512581975,
 -4587780638857304626, -4649897584350376674, -4674234125365755024,
 -4833801201210885896, -4857586579802212277, -4868896650650107463,
 -4980063310159547694, -4983471821416248610, -4992846054037653676,
 -5026994389965137674, -5143025003536791810, -5198414516309928594,
 -5245363745777287346, -5346838390293957674, -5374413419545696184,
 -5427881744040857637, -5453876964430787287, -5491923669475601173,
 -552197341385992126, -5523011502670737422, -5537121117160410549,
 -5557015938925208697, -5572489682738121748, -5745899409803353484,
 -5771239101488682535, -5893479791287484099, -5976673041480754044,
 -6014643892406938367, -6086002438656595783, -6129360679394503700,
 -6224240257573911174, -6290393495130499466, -6378712056928268929,
 -6430306056990093461, -6800188263839065013, -6912720411187525051,
 -7160327814305587432, -7175004328733776324, -7272070430660252577,
 -7307945744786025148, -742448651973108101, -7539255117639002578,
 -765746071699797894, -7846698077070579798, -7870621904906244395,
 -7900841391761900719, -7918145426423910061, -7936795453892692473,
 -8070255024778921411, -8086888710627677669, -8124855925323654631,
 -8175270408138820500, -8271197636596881168, -8336685710406477123,
 -8466220397076441627, -8534337908154758270, -8550484400487603561,
 -862246738021989870, -8727219287242892185, -8895705475282612927,
 -8921801772904834063, -9057266752652143883, -9059183540698454288,
 -9067986437682229598, -9148183367896132028, -962208188860606543,
 1085944772581921830, 1189775396643491793, 1253728955879686947,
 1389982523380382228, 1429632314664544045, 143610053770130548,
 150118120072602242, 1575692041584712198, 1624575905722628764,
 1789476212785155173, 1995296121962835019, 2041217364870030239,
 2120277336231792146, 2124445736743406711, 2154979704292433983,
 2340726755918680765, 23481654796845972, 2362026808435224407,
 2366144489007464626, 2381492708106933027, 2398868971489617398,
 2427315953339163528, 2433999003913998534, 2633074510238705620,
 266659839023809792, 2677817641360639089, 2719725410894526151,
 2751925111749406683, 2815703589803785617, 3041515796379693113,
 3044903149214270978, 3094954503756703989, 3243933267690865263,
 3246086646486800371, 3327006897333869434, 3393657685587750192,
 3395065499228709345, 3426126123948029459, 3500469615600510698,
 3644011364716880512, 3693249207133187620, 3776164494954636918,
 387806767978035, 3872151295451662867, 3937077827707223414,
 4041082935346014761, 4060208918173638435, 4086747843759164940,
 4165638694482690057, 4203996339238989224, 4220155275330961826,
 4366784953339236686, 4390116924352514616, 4391225331964772681,
 4392419346255765958, 4448400054980766409, 4463335839328115373,
 4547306976104362915, 4588174843388248100, 4843858067983993745,
 4912719175808770608, 499628843707992459, 5004392861473086088,
 5021047773702107258, 510226752691159107, 5109551630357971118,
 5157669927051121583, 5162769417619961824, 5238710860488961530,
 5245958115092331518,

Re: Incremental, Sequential repair?

2015-08-25 Thread Robert Coli
On Tue, Aug 25, 2015 at 2:44 PM, Bryan Cheng br...@blockcypher.com wrote:

 [2015-08-25 21:36:43,433] It is not possible to mix sequential repair and
 incremental repairs.

 Is this a limitation around a specific configuration? Or is it generally
 true that incremental and sequential repairs are not compatible?


There's a migration process to incremental repairs.

http://www.datastax.com/dev/blog/more-efficient-repairs

etc.

=Rob


Re: 'no such object in table'

2015-08-25 Thread Michael Shuler

On 08/25/2015 02:19 PM, Jason Lewis wrote:

I'm trying to run nodetool from one node, connecting to another.  I
can successfully connect to the majority of nodes in my ring, but two
nodes throw the following error.

nodetool: Failed to connect to 'IP:7199' NoSuchObjectException: 'no
such object in table'.

Any idea why this is happening?  Misconfiguration?


Possibly. Check those nodes to see whether 7199 is listening only on
localhost or on some private IP your client node cannot reach (failed to
connect). The default is to listen only on localhost, as seen on my machine:


$ netstat -ln | grep 7199
tcp        0      0 127.0.0.1:7199          0.0.0.0:*               LISTEN

JMX configuration is set in conf/cassandra-env.sh. Please configure
JMX security as documented in that file and/or firewall JMX. Check the JMX
security configs on all your nodes! :)
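A quick way to test the first hypothesis from the client node is a plain TCP
connection to the JMX port. The Python sketch below (the helper name is
hypothetical) only tests TCP reachability of host:7199. A successful connect
does not prove nodetool will work, since JMX over RMI can still fail
afterwards (for example, if the node advertises an address the client cannot
reach), but a refused or timed-out connect points at the listen address or a
firewall.

```python
import socket


def jmx_port_reachable(host: str, port: int = 7199, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


if __name__ == "__main__":
    import sys

    for node in sys.argv[1:]:  # e.g. the IPs of the two nodes that fail
        print(node, "reachable" if jmx_port_reachable(node) else "NOT reachable")
```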


--
Kind regards,
Michael



Commit/acknowledgment phase in CAS?

2015-08-25 Thread ibrahim El-sanosi
Hi folks,


To achieve linearizable consistency in Cassandra, four round-trips must be
performed:

1.   Prepare/promise

2.   Read/result

3.   Propose/accept

4.   Commit/acknowledgment



In the last phase of the Paxos protocol (in the white paper), there is only a
decide phase, with no commit/acknowledgment. DECIDE means telling the learners
to apply the accepted value.

If the commit/acknowledgment phase in CAS has a similar purpose to DECIDE,
then why do we have an acknowledgment round?


In fact, I want to know the purpose of the commit/acknowledgment phase in
linearizable consistency in Cassandra. I have read
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0,
but it does not explain the whole picture.



I look forward to hearing from you

Ibrahim


Re: Incremental, Sequential repair?

2015-08-25 Thread Bryan Cheng
Thanks Robert! To clarify, you're referring to the process using
sstablerepairedset to mark sstables as repaired after a full repair with
autocompaction off? We're in the process of doing that throughout our
cluster now.




Re: Incremental, Sequential repair?

2015-08-25 Thread Robert Coli
On Tue, Aug 25, 2015 at 4:05 PM, Bryan Cheng br...@blockcypher.com wrote:

 Thanks Robert! To clarify, you're referring to the process using
 sstablerepairedset to mark sstables as repaired after a full repair with
 autocompaction off? We're in the process of doing that throughout our
 cluster now.


Yep.

As an aside, incremental repair currently doesn't handle some (edge) cases
that non-incremental repair does. FWIW, which is not too much!

https://issues.apache.org/jira/browse/CASSANDRA-5791
and
https://issues.apache.org/jira/browse/CASSANDRA-9947

=Rob


Incremental, Sequential repair?

2015-08-25 Thread Bryan Cheng
Hey all,

Got a question about incremental repairs; a quick Google search turned up
nothing conclusive.

In the docs, in a few places, sequential, incremental repairs are mentioned.

From
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html
(indirectly):

 You can combine repair options, such as parallel and incremental repair.

From http://www.datastax.com/dev/blog/more-efficient-repairs:

 Incremental repairs can be opted into via the -inc option to nodetool
repair. This is compatible with both sequential and parallel (-par) repair

However, when I try to run an incremental, sequential repair (nodetool
repair -inc), I get:

[2015-08-25 21:36:43,433] It is not possible to mix sequential repair and
incremental repairs.

Is this a limitation around a specific configuration? Or is it generally
true that incremental and sequential repairs are not compatible?

The cluster is a mixed 2.1.8/2.1.7, replication is NetworkTopology, with
LeveledCompaction (if it's relevant).

Thanks in advance!


lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
Hi folks,


Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by
using four Paxos round-trips, as follows:*

1.  Prepare/promise

2.  Read/result

3.  Propose/accept

4.  Commit/acknowledgment

Assume we have an application for registering new accounts, and I want to
make sure that exactly one user can claim a given account. For example, we
do not allow two users to have the same username.

Assume we have a cluster consisting of 5 nodes: N1, N2, N3, N4, and N5. We
have two concurrent clients, C1 and C2. We have replication factor 3, and the
partitioner has determined that the primary and replica nodes for the INSERT
example are N3, N4, and N5.


The scenario happens in the following order:

1.  C1 connects to coordinator N1 and sends INSERT V1 (assume V1 is a
username that has not been registered before).

2.  N1 sends a PREPARE message with ballot 1 (the highest ballot it has
seen) to N3, N4 and N5. Note that this prepare is for C1 and V1.

3.  N3, N4 and N5 send a PROMISE message to N1, promising not to promise
any ballot older than ballot 1.

4.  N1 sends a READ message to N3, N4 and N5 to read V1.

5.  N3, N4 and N5 send a RESULT message to N1, reporting that V1 does not
exist, so N1 goes forward to the next round.

6.  Now C2 connects to coordinator N2 and sends INSERT V1.

7.  N2 sends a PREPARE message with ballot 2 to N3, N4 and N5 (ballot 2 is
the highest ballot after re-preparing: the first time, N2 does not know
about ballot 1, but it eventually resolves this and uses ballot 2). Note
that this prepare is for C2 and V1.

8.  N3, N4 and N5 send a PROMISE message to N2, promising not to promise
any ballot older than ballot 2.

9.  N2 sends a READ message to N3, N4 and N5 to read V1.

10. N3, N4 and N5 send a RESULT message to N2, reporting that V1 does not
exist, so N2 goes forward to the next round.

11. Now N1 sends a PROPOSE message to N3, N4 and N5 (ballot 1, V1).

12. N3, N4 and N5 send an ACCEPT message to N1.

13. N2 sends a PROPOSE message to N3, N4 and N5 (ballot 2, V1).

14. N3, N4 and N5 send an ACCEPT message to N2.

15. N1 sends a COMMIT message to N3, N4 and N5 (ballot 1).

16. N3, N4 and N5 send an ACK message to N1.

17. N2 sends a COMMIT message to N3, N4 and N5 (ballot 2).

18. N3, N4 and N5 send an ACK message to N2.


As a result, both V1 from client C1 and V1 from client C2 have been written
to replicas N3, N4, and N5, which I think does not achieve the goal of
*linearizable consistency and CAS.*



*Is that true, and could such a scenario occur?*



I look forward to hearing from you.


Regards,


Re: lightweight transactions with potential problem?

2015-08-25 Thread Sylvain Lebresne
That scenario cannot happen. More specifically, your step 12 cannot happen
if step 8 has happened. Or more precisely, you got step 8 wrong: when a
replica PROMISEs, the promise is not that they won't promise a ballot older
than 2, it's that they won't accept a ballot older than 2. Therefore, after
step 8, the accept from N1 will be rejected in step 12, and the insert from
N1 will be rejected (that is, N1 will restart the whole algorithm with a new
ballot).
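This point can be checked with a toy model of a Paxos acceptor in Python.
It is a simplified sketch under stated assumptions, not Cassandra's code:
ballots are plain integers (Cassandra uses timeuuid ballots), there is a
single value per partition, and the read/result and commit rounds are
omitted. The key line is in `propose`: having promised ballot 2, a replica
refuses to accept anything older.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """One replica's Paxos state for a single partition (simplified)."""
    promised: int = 0                            # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # last accepted (ballot, value)

    def prepare(self, ballot: int) -> bool:
        # Promise only ballots newer than anything already promised.
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def propose(self, ballot: int, value: str) -> bool:
        # The promise also forbids accepting any ballot older than the promised one.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def has_quorum(votes: List[bool]) -> bool:
    return sum(votes) >= len(votes) // 2 + 1


replicas = [Acceptor() for _ in range(3)]  # N3, N4, N5

n1_promised = has_quorum([r.prepare(1) for r in replicas])               # steps 2-3
n2_promised = has_quorum([r.prepare(2) for r in replicas])               # steps 7-8
n1_accepted = has_quorum([r.propose(1, "V1 for C1") for r in replicas])  # steps 11-12
n2_accepted = has_quorum([r.propose(2, "V1 for C2") for r in replicas])  # steps 13-14
print(n1_promised, n2_promised, n1_accepted, n2_accepted)  # True True False True
```

N1's proposal in step 11 gets no ACCEPT from any replica, so, per the reply
above, N1 restarts with a new ballot instead of overwriting C2's claim.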





Re: lightweight transactions with potential problem?

2015-08-25 Thread ibrahim El-sanosi
OK, I see.

So you mean that an older ballot will not only be rejected in round-trip 1
(prepare/promise), but can also be rejected in the propose/accept round-trip.
Is that correct?

You said: "Or more precisely, you got step 8 wrong: when a replica PROMISEs,
the promise is not that they won't promise a ballot older than 2, it's
that they won't accept a ballot older than 2."

Why is step 8 wrong? I think replicas can accept whichever ballot is highest,
and ballot 2 is the highest in step 8, isn't it? What do you think?
Do you also mean a replica can promise an older ballot?

Could you make it clearer?

Thank you very much, Sylvain

Ibrahim

