Re: client time stamp - force to be continuously increasing?

2016-05-12 Thread Jen Smith
Thank you - I think this driver solution may address a portion of the problem.
Since the solution lives in the driver, is it correct to assume that although
it could potentially fix the issue within a single (client) session, it could
not fix it for a pool of clients, where client A sent the first update and
client B sent the second one (because a driver session doesn't share memory/data
between clients)? Is this correct? If so, I think this doesn't provide the full
HA client solution.
But I think it does help confirm that a 'reasonable' approach to solving the
overall problem is software enforcement of a 'rigorously increasing timestamp',
with the understood impact of drifting our timestamps out into the future (when
a conflict is identified).  It also sounds, from that JIRA ticket, like the
continually updated increment can be milliseconds, not seconds (note we are not
using batch statements, which I believe have/had a timestamp granularity bug in
the past).


  From: Alexandre Dutra 
 To: Jen Smith ; "user@cassandra.apache.org" 
 
 Sent: Thursday, May 12, 2016 11:28 AM
 Subject: Re: client time stamp - force to be continuously increasing?
   
Hi,
Among the ideas worth exploring, please note that the DataStax Java driver for 
Cassandra now includes a modified version of its monotonic timestamp generators 
that will indeed strive to provide rigorously increasing timestamps, even in 
the event of system clock skew (in which case they would keep drifting into
the future). Such generators obviously do not pretend to provide the same
monotonicity guarantees as a vector clock, but have at least the advantage of 
being fairly easy to set up. See JAVA-727[1] for details.
Hope that helps,
Alexandre
[1] https://datastax-oss.atlassian.net/browse/JAVA-727
On Thu, May 12, 2016 at 7:35 PM Jen Smith  wrote:

to clarify - the currentRecordTs would be saved on a field on the record being 
persisted
From: Jen Smith 
 To: "user@cassandra.apache.org"  
 Sent: Thursday, May 12, 2016 10:32 AM
 Subject: client time stamp - force to be continuously increasing?
  
I'd like to get feedback/opinions on a possible workaround for a timestamp +
data consistency edge case issue.
Context for this question:
When using client timestamp (default timestamp), on C* that supports it (v3 
protocol), on occasion a record update is lost when executing updates in rapid 
succession (less than a second between updates).  This is because C* by design 
(Last Write Wins) discards record updates with 'older' timestamp (from client), 
and server clocks (whether using client timestamp or c* node system timestamp) 
can move backwards, which results in data loss (eventual consistency is not 
reached).
For anyone needing more background, this blog has much of the detail
https://aphyr.com/posts/299-the-trouble-with-timestamps , summarized as:
"Cassandra uses the JVM’s System.getCurrentTimeMillis for its time source,
which is backed by gettimeofday. Pretty much every Cassandra client out there
does something similar. That means that the timestamps for writes made in a
session are derived either from a single Cassandra server clock, or a single
app server clock. These clocks can flow backwards, for a number of reasons:
- Hardware wonkiness can push clocks days or centuries into the future or past.
- Virtualization can wreak havoc on kernel timekeeping.
- Misconfigured nodes may not have NTP enabled, or may not be able to reach
upstream sources.
- Upstream NTP servers can lie.
- When the problem is identified and fixed, NTP corrects large time
differentials by jumping the clock discontinuously to the correct time.
- Even when perfectly synchronized, POSIX time itself is not monotonic.
...
If the system clock goes backwards for any reason, Cassandra’s session
consistency guarantees no longer hold."
This blog goes on to suggest a monotonic clock (ZooKeeper as a possibility, but
slow), or better NTP syncing (which leaves gaps).
My question is whether this can be addressed in software by (using? abusing?) the
client-provided timestamp field and forcing it to be continuously increasing,
and what unexpected issues may arise from doing so?
Specifically, my idea is to set a timestamp on the record when it is created
(from the system time of the client doing the create).  Then on subsequent
updates, always set the default client timestamp to the result of:
currentRecordTs = Math.max(currentRecordTs + standardDelta,
System.currentTimeMillis());
(where standardDelta is probably 1 second)
Essentially this keeps a wall-clock guard on the record itself, to prevent
backwards timestamping / lost data, and ensures C* applies these updates in the
proper order and does not discard any for being 'out of sequence' (i.e.,
persisted 'after' a newer-timestamped record was already persisted).
One (acceptable) drawback is that this will result in a slightly inaccurate
'timestamp' being set, when currentRecordTs + standardDelta >
System.currentTimeMillis(), and that this could skew more incorrectly over
time.

Re: client time stamp - force to be continuously increasing?

2016-05-12 Thread Alexandre Dutra
Hi,

Among the ideas worth exploring, please note that the DataStax Java driver
for Cassandra now includes a modified version of its monotonic timestamp
generators that will indeed strive to provide rigorously increasing
timestamps, even in the event of system clock skew (in which case they
would keep drifting into the future). Such generators obviously do not
pretend to provide the same monotonicity guarantees as a vector clock, but
have at least the advantage of being fairly easy to set up. See JAVA-727[1]
for details.

Hope that helps,

Alexandre

[1] https://datastax-oss.atlassian.net/browse/JAVA-727
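
A minimal sketch of enabling such a generator with the 3.x Java driver (untested;
the contact point is made up and the generator class name is taken from that
ticket):

import com.datastax.driver.core.AtomicMonotonicTimestampGenerator;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class MonotonicTimestampExample {
    public static void main(String[] args) {
        // The generator emits strictly increasing microsecond timestamps per process,
        // drifting ahead of the wall clock if the system clock moves backwards.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
                .build();
        Session session = cluster.connect();
        session.execute("SELECT release_version FROM system.local");
        cluster.close();
    }
}

Note that each Cluster instance keeps its own counter, so as discussed in this
thread it guarantees monotonicity only within a single client process.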

On Thu, May 12, 2016 at 7:35 PM Jen Smith  wrote:

> to clarify - the currentRecordTs would be saved on a field on the record
> being persisted
>
> --
> *From:* Jen Smith 
> *To:* "user@cassandra.apache.org" 
> *Sent:* Thursday, May 12, 2016 10:32 AM
> *Subject:* client time stamp - force to be continuously increasing?
>
> I'd like to get feedback/opinions on a possible workaround for a
> timestamp + data consistency edge case issue.
>
> Context for this question:
>
> When using client timestamp (default timestamp), on C* that supports it
> (v3 protocol), on occasion a record update is lost when executing updates
> in rapid succession (less than a second between updates).  This is because
> C* by design (Last Write Wins) discards record updates with 'older'
> timestamp (from client), and server clocks (whether using client timestamp
> or c* node system timestamp) can move backwards, which results in data loss
> (eventual consistency is not reached).
>
> For anyone needing more background, this blog has much of the detail
> https://aphyr.com/posts/299-the-trouble-with-timestamps , summarized as:
> "Cassandra uses the JVM’s System.getCurrentTimeMillis for its time source,
> which is backed by gettimeofday. Pretty much every Cassandra client out
> there does something similar. That means that the timestamps for writes
> made in a session are derived either from a single Cassandra server clock,
> or a single app server clock. These clocks can flow backwards, for a number
> of reasons:
> - Hardware wonkiness can push clocks days or centuries into the future or
> past.
> - Virtualization can wreak havoc on kernel timekeeping.
> - Misconfigured nodes may not have NTP enabled, or may not be able to
> reach upstream sources.
> - Upstream NTP servers can lie.
> - When the problem is identified and fixed, NTP corrects large time
> differentials by jumping the clock discontinuously to the correct time.
> - Even when perfectly synchronized, POSIX time itself is not monotonic.
> ...
>
> If the system clock goes backwards for any reason, Cassandra’s session
> consistency guarantees no longer hold."
>
> This blog goes on to suggest a monotonic clock (ZooKeeper as a
> possibility, but slow), or better NTP syncing (which leaves gaps).
>
> My question is whether this can be addressed in software by (using? abusing?)
> the client-provided timestamp field and forcing it to be continuously
> increasing, and what unexpected issues may arise from doing so?
>
> Specifically, my idea is to set a timestamp on the record when it is
> created (from the system time of the client doing the create).  Then on
> subsequent updates, always set the default client timestamp to the result
> of:
>
> currentRecordTs = Math.max(currentRecordTs + standardDelta,
> System.currentTimeMillis());
>
> (where standardDelta is probably 1 second)
>
> Essentially this keeps a wall-clock guard on the record itself, to
> prevent backwards timestamping / lost data, and ensures C* applies these
> updates in the proper order and does not discard any for being 'out of
> sequence' (i.e., persisted 'after' a newer-timestamped record was already
> persisted).
>
> One (acceptable) drawback is that this will result in a slightly inaccurate
> 'timestamp' being set, when currentRecordTs + standardDelta >
> System.currentTimeMillis(), and that this could skew more incorrectly over
> time.
>
> Would you please advise me of any other problems, downstream effects,
> pitfalls or data consistency issues this approach might cause?   For
> example will C* object if the 'quasi' timestamp gets 'too far' in the
> future?
>
> More info - The system in question has LOCAL_QUORUM read/write
> consistency; and one client (c* session) is usually only updating a record
> at a time. (although concurrent updates from multiple clients are allowed-
> LWW is expected for that scenario, and some ambiguity here is ok).
>
> I apologize if this is a duplicate post to the list from me - I first sent
> this question when I was not subscribed to the list yet, so I am not sure
> if it has been duplicated or not.
>
> thank you kindly for the advice,
> J. Smith
>
>
>
> --
Alexandre Dutra
Driver & Tools Engineer @ DataStax


Re: client time stamp - force to be continuously increasing?

2016-05-12 Thread Jen Smith
to clarify - the currentRecordTs would be saved on a field on the record being 
persisted
  From: Jen Smith 
 To: "user@cassandra.apache.org"  
 Sent: Thursday, May 12, 2016 10:32 AM
 Subject: client time stamp - force to be continuously increasing?
   
I'd like to get feedback/opinions on a possible workaround for a timestamp +
data consistency edge case issue.
Context for this question:
When using client timestamp (default timestamp), on C* that supports it (v3 
protocol), on occasion a record update is lost when executing updates in rapid 
succession (less than a second between updates).  This is because C* by design 
(Last Write Wins) discards record updates with 'older' timestamp (from client), 
and server clocks (whether using client timestamp or c* node system timestamp) 
can move backwards, which results in data loss (eventual consistency is not 
reached).
For anyone needing more background, this blog has much of the detail
https://aphyr.com/posts/299-the-trouble-with-timestamps , summarized as:
"Cassandra uses the JVM’s System.getCurrentTimeMillis for its time source,
which is backed by gettimeofday. Pretty much every Cassandra client out there
does something similar. That means that the timestamps for writes made in a
session are derived either from a single Cassandra server clock, or a single
app server clock. These clocks can flow backwards, for a number of reasons:
- Hardware wonkiness can push clocks days or centuries into the future or past.
- Virtualization can wreak havoc on kernel timekeeping.
- Misconfigured nodes may not have NTP enabled, or may not be able to reach
upstream sources.
- Upstream NTP servers can lie.
- When the problem is identified and fixed, NTP corrects large time
differentials by jumping the clock discontinuously to the correct time.
- Even when perfectly synchronized, POSIX time itself is not monotonic.
...
If the system clock goes backwards for any reason, Cassandra’s session
consistency guarantees no longer hold."
This blog goes on to suggest a monotonic clock (ZooKeeper as a possibility, but
slow), or better NTP syncing (which leaves gaps).
My question is whether this can be addressed in software by (using? abusing?) the
client-provided timestamp field and forcing it to be continuously increasing,
and what unexpected issues may arise from doing so?
Specifically, my idea is to set a timestamp on the record when it is created
(from the system time of the client doing the create).  Then on subsequent
updates, always set the default client timestamp to the result of:
currentRecordTs = Math.max(currentRecordTs + standardDelta,
System.currentTimeMillis());
(where standardDelta is probably 1 second)
Essentially this keeps a wall-clock guard on the record itself, to prevent
backwards timestamping / lost data, and ensures C* applies these updates in the
proper order and does not discard any for being 'out of sequence' (i.e.,
persisted 'after' a newer-timestamped record was already persisted).
One (acceptable) drawback is that this will result in a slightly inaccurate
'timestamp' being set, when currentRecordTs + standardDelta >
System.currentTimeMillis(), and that this could skew more incorrectly over
time.
Would you please advise me of any other problems, downstream effects, pitfalls 
or data consistency issues this approach might cause?   For example will C* 
object if the 'quasi' timestamp gets 'too far' in the future?
More info - The system in question has LOCAL_QUORUM read/write consistency; and 
one client (c* session) is usually only updating a record at a time. (although 
concurrent updates from multiple clients are allowed- LWW is expected for that 
scenario, and some ambiguity here is ok).
I apologize if this is a duplicate post to the list from me - I first sent this
question when I was not subscribed to the list yet, so I am not sure if it has
been duplicated or not.
Thank you kindly for the advice,
J. Smith


  

client time stamp - force to be continuously increasing?

2016-05-12 Thread Jen Smith
I'd like to get feedback/opinions on a possible workaround for a timestamp +
data consistency edge case issue.
Context for this question:
When using client timestamp (default timestamp), on C* that supports it (v3 
protocol), on occasion a record update is lost when executing updates in rapid 
succession (less than a second between updates).  This is because C* by design 
(Last Write Wins) discards record updates with 'older' timestamp (from client), 
and server clocks (whether using client timestamp or c* node system timestamp) 
can move backwards, which results in data loss (eventual consistency is not 
reached).
For anyone needing more background, this blog has much of the detail
https://aphyr.com/posts/299-the-trouble-with-timestamps , summarized as:
"Cassandra uses the JVM’s System.getCurrentTimeMillis for its time source,
which is backed by gettimeofday. Pretty much every Cassandra client out there
does something similar. That means that the timestamps for writes made in a
session are derived either from a single Cassandra server clock, or a single
app server clock. These clocks can flow backwards, for a number of reasons:
- Hardware wonkiness can push clocks days or centuries into the future or past.
- Virtualization can wreak havoc on kernel timekeeping.
- Misconfigured nodes may not have NTP enabled, or may not be able to reach
upstream sources.
- Upstream NTP servers can lie.
- When the problem is identified and fixed, NTP corrects large time
differentials by jumping the clock discontinuously to the correct time.
- Even when perfectly synchronized, POSIX time itself is not monotonic.
...
If the system clock goes backwards for any reason, Cassandra’s session
consistency guarantees no longer hold."
This blog goes on to suggest a monotonic clock (ZooKeeper as a possibility, but
slow), or better NTP syncing (which leaves gaps).
My question is whether this can be addressed in software by (using? abusing?) the
client-provided timestamp field and forcing it to be continuously increasing,
and what unexpected issues may arise from doing so?
Specifically, my idea is to set a timestamp on the record when it is created
(from the system time of the client doing the create).  Then on subsequent
updates, always set the default client timestamp to the result of:
currentRecordTs = Math.max(currentRecordTs + standardDelta,
System.currentTimeMillis());
(where standardDelta is probably 1 second)
Essentially this keeps a wall-clock guard on the record itself, to prevent
backwards timestamping / lost data, and ensures C* applies these updates in the
proper order and does not discard any for being 'out of sequence' (i.e.,
persisted 'after' a newer-timestamped record was already persisted).
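To make the idea concrete, a rough sketch of the update path (untested; the table
and column names are made up, and note that Cassandra write timestamps are
microseconds since the epoch, so the millisecond value is converted before being
used as the write timestamp):

import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class RecordTimestampGuard {

    private static final long STANDARD_DELTA_MS = 1000L; // "probably 1 second"

    // currentRecordTs is the millisecond timestamp previously stored on the record.
    static long nextTimestamp(long currentRecordTs) {
        return Math.max(currentRecordTs + STANDARD_DELTA_MS, System.currentTimeMillis());
    }

    // Hypothetical table; 'ts' is the field on the record that holds currentRecordTs.
    static void update(Session session, String id, String value, long currentRecordTs) {
        long nextTs = nextTimestamp(currentRecordTs);
        SimpleStatement stmt = new SimpleStatement(
                "UPDATE my_keyspace.my_table SET value = ?, ts = ? WHERE id = ?",
                value, nextTs, id);
        // Cassandra write timestamps are microseconds since the epoch.
        stmt.setDefaultTimestamp(nextTs * 1000L);
        session.execute(stmt);
    }
}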
One (acceptable) drawback is that this will result in a slightly inaccurate
'timestamp' being set, when currentRecordTs + standardDelta >
System.currentTimeMillis(), and that this could skew more incorrectly over
time.
Would you please advise me of any other problems, downstream effects, pitfalls 
or data consistency issues this approach might cause?   For example will C* 
object if the 'quasi' timestamp gets 'too far' in the future?
More info - The system in question has LOCAL_QUORUM read/write consistency; and 
one client (c* session) is usually only updating a record at a time. (although 
concurrent updates from multiple clients are allowed- LWW is expected for that 
scenario, and some ambiguity here is ok).
I apologize if this is a duplicate post to the list from me - I first sent this
question when I was not subscribed to the list yet, so I am not sure if it has
been duplicated or not.
Thank you kindly for the advice,
J. Smith


Schema format for Cassandra

2016-05-12 Thread Yawei Li
Hi there,

Is there any detailed document about the internal storage format for
Cassandra schema? My guess is that Cassandra is using an internal format.
If that's true, I am wondering if we've considered using the Thrift or Avro
format, which might ease the integration between Cassandra and Hadoop.

Thanks
Yawei


Re: IF EXISTS checks on all nodes?

2016-05-12 Thread Siddharth Verma
Hi, I missed out on some info
node 1,2,3 are in DC1
node 4,5,6 are in DC2
and RF is 3
so all data is on all nodes


@Carlos : There was only one query. And yes all nodes have same data for
col5 only
node 6 has
P1,100,A,val1,w1
P1,100,B,val2,w2
P1,200,C,val3,w_x
P1,200,D,val4,w4

node 1,2,3,4,5 have
P1,100,A,val1,w1
P1,100,B,val2,w2
P1,200,C,null,w_x

So, when "consistency all" in cqlsh
1.IF EXISTS is checked ON EVERY NODE before applying, and if it is true on
all, ONLY then it is applied
OR
2. IF EXISTS was true on one, so applied on all.


Re: Low cardinality secondary index behaviour

2016-05-12 Thread Tyler Hobbs
On Tue, May 10, 2016 at 6:41 AM, Atul Saroha 
wrote:

> I have a concern over using a secondary index on a field with low cardinality.
> Let's say I have a few billion rows and each row can be classified into 1000
> categories. Let's say we have a 50 node cluster.
>
> Now we want to fetch data for a single category using the secondary index on
> category. And the query is paginated too, with the fetch size property set to
> say 5000.
>
> Since a query on a secondary index works as a scatter-gather approach through
> the coordinator node, would it lead to out of memory on the coordinator, or to
> too many timeout errors?
>

Paging will prevent the coordinator from using excessive memory.  With the
type of data that you described, timeouts shouldn't be a huge problem because
it will only take a few token ranges (assuming you're using vnodes) to get
enough matching rows to hit the page size.


>
> How does pagination (token-level data fetch) behave in the scatter-gather
> approach?
>

Secondary index queries fetch token ranges in sequential order [1],
starting with the minimum token.  When you fetch a new page, it resumes
from the last token (and primary key) that it returned in the previous page.

[1] As an optimization, multiple token ranges will be fetched in parallel
based on estimates of how many token ranges it will take to fill the page.
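
A minimal paging sketch with the Java driver (untested; table and column names are
hypothetical, and the fetch size matches the 5000 mentioned above):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class IndexPagingExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        SimpleStatement stmt = new SimpleStatement(
                "SELECT id, category FROM my_keyspace.items WHERE category = ?", "cat42");
        stmt.setFetchSize(5000); // request pages of 5000 rows

        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {
            // Iterating past the current page transparently fetches the next one,
            // resuming from the last token/primary key returned, as described above.
            System.out.println(row.getString("id"));
        }
        cluster.close();
    }
}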


>
> Secondly, what if we create an inverted table with the partition key as the
> category? Then this will lead to lots of data on a single node. It might
> lead to a hot-shard issue and performance issues fetching data from a single
> node, as a single partition would have millions of rows.
>
> How should we tackle such a low cardinality index in Cassandra?


The data distribution that you described sounds like a reasonable fit for
secondary indexes.  However, I would also take into account how frequently
you run this query and how fast you need it to be.  Even ignoring the
scatter-gather aspects of a secondary index query, they are still expensive
because they fetch many non-contiguous rows from an SSTable.  If you need
to run this query very frequently, that may add too much load to your
cluster, and some sort of inverted table approach may be more appropriate.

-- 
Tyler Hobbs
DataStax 


Blocking read repair giving consistent data but not repairing existing data

2016-05-12 Thread Bhuvan Rawal
Hi,

We are using DSC 3.0.3 on a total of *6 nodes*, *2 DCs, 3 nodes each, RF=3*,
so every node has complete data. Now we are facing a situation on a table
with 1 partition key, 2 clustering columns and 4 normal columns.

Out of the 6 nodes, 5 have a single value, the partition key and the 2 clustering
keys for the row, but the 3 other normal values are null.

When doing consistency level all query we get complete view of the row and
in the tracing output it says that inconsistency found in digest and read
repair is sent out to the nodes.
<*Exact error in tracing : Digest mismatch*:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key
DecoratedKey(-9222324572206777856, 53444c393233363439393233)
(c421efa89ea3435c153617a34c08f396 vs 51a7e02f9e5e93520f56541ed6730558>

But on doing another read with reduced consistency the output received is
not repaired.

We are speculating that the node which has the complete view of the row
was down for more than 3 hrs (the default hint window) when the delete happened
on that row.  We had not enabled read repair within the gc grace period, so
possibly the 3 deleted cells have come alive on that node, but in that case:

1. If the consistency level ALL query gives the complete row view, then why
isn't it reflected on the other nodes?
2. *read_repair_chance* is 0.0 but *dclocal_read_repair_chance* = 0.1 on
the table (default configurations), and I tried the query with LOCAL_ONE on
all the servers to fulfill that probability (35-40 times on every server).

Therefore, why might both blocking and non-blocking read
repair not be working? Could a full read repair fix it? Is this possibly a bug in
DSC 3.0.3 which could be fixed in a later version?

Any assistance on this will be welcome, as this appears to be a one-off
scenario. I can provide the complete cqlsh tracing log for the consistency
LOCAL_ONE read query and the consistency ALL query if required.

C*eers,
Bhuvan


Re: IF EXISTS checks on all nodes?

2016-05-12 Thread Carlos Rolo
Hello,

As far as I know, lightweight transactions only apply to a single
partition, so in your case the update will only execute on the nodes responsible
for that partition. And as a consequence, those nodes will all be in the
same state when the transaction ends (if it applies).

Please refer to this blog post for more information about lightweight
transactions:
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
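
As a side note, the client can also check whether the condition held, since a
conditional write returns an [applied] flag. A minimal sketch with the Java
driver, reusing the statement from your mail (untested):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class LwtAppliedCheck {
    // Returns true only if the IF EXISTS condition held and the update was applied.
    static boolean conditionalUpdate(Session session) {
        ResultSet rs = session.execute(
                "UPDATE mykeyspace.my_table_1 SET col5 = 'w_x' " +
                "WHERE col1 = 'P1' AND col2 = 200 AND col3 = 'C' IF EXISTS");
        // wasApplied() reflects the [applied] column the coordinator returns
        // after the Paxos round for this single partition.
        return rs.wasApplied();
    }
}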



Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Thu, May 12, 2016 at 12:17 PM, Siddharth Verma <
verma.siddha...@snapdeal.com> wrote:

> Hi,
> If I have inconsistent data on nodes
> Scenario :
> I have 2 DCs each with 3 nodes
> and I have inconsistent data on them
>
> node 1,2,3,4,5 have
> P1,100,A,val1,w1
> P1,100,B,val2,w2
>
> node 6 has
> P1,100,A,val1,w1
> P1,100,B,val2,w2
> P1,200,C,val3,w3
> P1,200,D,val4,w4
>
> col1, col2, col3,col4,col5 in table
> Primary key (col1, col2, col3)
>
> Now I execute the query from CQLSH
> update mykeyspace.my_table_1 set col5 = 'w_x' where col1='P1' and col2=200
> and col3='C' IF EXISTS;
>
> Is it possible that
> node 1,2,3,4,5 will get the entry
> P1,200,C,null,w_x
>
> I.e. is IF EXISTS checked per node, or only once and then executed on all?
>
> Thanks
> Siddharth Verma
>


IF EXISTS checks on all nodes?

2016-05-12 Thread Siddharth Verma
Hi,
If I have inconsistent data on nodes
Scenario :
I have 2 DCs each with 3 nodes
and I have inconsistent data on them

node 1,2,3,4,5 have
P1,100,A,val1,w1
P1,100,B,val2,w2

node 6 has
P1,100,A,val1,w1
P1,100,B,val2,w2
P1,200,C,val3,w3
P1,200,D,val4,w4

col1, col2, col3,col4,col5 in table
Primary key (col1, col2, col3)

Now I execute the query from CQLSH
update mykeyspace.my_table_1 set col5 = 'w_x' where col1='P1' and col2=200
and col3='C' IF EXISTS;

Is it possible that
node 1,2,3,4,5 will get the entry
P1,200,C,null,w_x

I.e. is IF EXISTS checked per node, or only once and then executed on all?

Thanks
Siddharth Verma


RE: Cassandra 2.0.x OOM during startup - schema version inconsistency after reboot

2016-05-12 Thread Michael Fong
Hi Alain,

Thanks for your reply.

We understood that there is a chance that this would be left unresolved, since 
we are really way behind the official Cassandra releases.

Here is what we have further found about the OOM issue, which seems to be related
to the number of gossip messages accumulated on a live node waiting to connect to
the rebooted node. Once that node is rebooted, all the gossip messages flood in,
each one triggering StorageService.onAlive() and scheduling a schema pull on
demand. In our case, the schema version is sometimes different after reboot. When
that happens, a schema-exchange storm begins.

Also, thanks for your tip sharing the SOP on stopping a node; here is what
we have for our stop procedure:
Disable thrift
Disable binary
Wait 10s
Disable gossip
Drain
Kill

Any thoughts on how this could be further improved?

Thanks!

Sincerely,

Michael Fong


From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Wednesday, May 11, 2016 10:01 PM
To: user@cassandra.apache.org
Cc: d...@cassandra.apache.org
Subject: Re: Cassandra 2.0.x OOM during startup - schema version inconsistency
after reboot

Hi Michaels :-),

My guess is this ticket will be closed with a "Won't Fix" resolution.

Cassandra 2.0 is no longer supported and I have seen tickets being rejected 
like CASSANDRA-10510.

Would you like to upgrade to the latest 2.1 release and see if you still have the issue?

About your issue, do you stop your node using a command like the following one?

nodetool disablethrift && nodetool disablebinary && sleep 5 && nodetool 
disablegossip && sleep 10 && nodetool drain && sleep 10 && sudo service 
cassandra stop

or even flushing:

nodetool disablethrift && nodetool disablebinary && sleep 5 && nodetool 
disablegossip && sleep 10 && nodetool flush && nodetool drain && sleep 10 && 
sudo service cassandra stop

Are commitlogs empty when you start cassandra?

C*heers,

---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-11 5:35 GMT+02:00 Michael Fong:
Hi,

Thanks for your recommendation.
I also opened a ticket to keep track of this:
https://issues.apache.org/jira/browse/CASSANDRA-11748
Hope this brings it to someone's attention. Thanks.

Sincerely,

Michael Fong

-Original Message-
From: Michael Kjellman 
[mailto:mkjell...@internalcircle.com]
Sent: Monday, May 09, 2016 11:57 AM
To: d...@cassandra.apache.org
Cc: user@cassandra.apache.org
Subject: Re: Cassandra 2.0.x OOM during startup - schema version inconsistency
after reboot

I'd recommend you create a JIRA! That way you can get some traction on the 
issue. Obviously an OOM is never correct, even if your process is wrong in some 
way!

Best,
kjellman

Sent from my iPhone

> On May 8, 2016, at 8:48 PM, Michael Fong wrote:
>
> Hi, all,
>
>
> Haven't heard any responses so far, and this issue has troubled us for quite
> some time. Here is another update:
>
> We have noticed several times that the schema version may change after
> migration and reboot:
>
> Here is the scenario:
>
> 1.   Two node cluster (1 & 2).
>
> 2.   There are some schema changes, i.e. creating a few new column families.
> The cluster will wait until both nodes have the schema version in sync (describe
> cluster) before moving on.
>
> 3.   Right before node2 is rebooted, the schema version is consistent;
> however, after node2 reboots and starts servicing, the MigrationManager
> gossips a different schema version.
>
> 4.   Afterwards, both nodes start exchanging schema messages
> indefinitely until one of the nodes dies.
>
> We currently suspect the change of schema is due to replaying an old entry in
> the commit log. We wish to continue digging further, but need experts' help on this.
>
> I don't know if anyone has seen this before, or if there is anything wrong
> with our migration flow, though.
>
> Thanks in advance.
>
> Best regards,
>
>
> Michael Fong
>
> From: Michael Fong 
> [mailto:michael.f...@ruckuswireless.com]
> Sent: Thursday, April 21, 2016 6:41 PM
> To: user@cassandra.apache.org; 
> d...@cassandra.apache.org
> Subject: RE: Cassandra 2.0.x OOM during bootstrap
>
> Hi, all,
>
> Here is some more information on what happened before the OOM on the rebooted
> node in a 2-node test cluster:
>
>
> 1.   It seems the schema version has changed on the rebooted node after 
> reboot, i.e.
> Before reboot,
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326
> MigrationManager.java (line 328) Gossiping my schema version
>