Re: Repair of 5GB data vs. disk throughput does not make sense

2018-04-26 Thread horschi
Hi Thomas,

I don't think I have seen compaction ever being faster.

For me, tables with small values usually run at around 5 MB/s with a single
compaction. With larger blobs (a few KB per blob) I have seen 16 MB/s. Both
with "nodetool setcompactionthroughput 0".

I don't think it's disk related either. I think parsing the data simply
saturates the CPU, or perhaps the issue is GC related? But I have never dug
into it; I just observed low IO-wait percentages in top.

regards,
Christian




On Thu, Apr 26, 2018 at 7:39 PM, Jonathan Haddad  wrote:

> I can't say for sure, because I haven't measured it, but I've seen a
> combination of readahead + large chunk size with compression cause serious
> issues with read amplification, although I'm not sure if or how it would
> apply here.  Likely depends on the size of your partitions and the
> fragmentation of the sstables, although at only 5GB I'm really surprised to
> hear 32GB read in, that seems a bit absurd.
>
> Definitely something to dig deeper into.
>
> On Thu, Apr 26, 2018 at 5:02 AM Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
>> Hello,
>>
>>
>>
>> yet another question/issue with repair.
>>
>>
>>
>> Cassandra 2.1.18, 3 nodes, RF=3, vnode=256, data volume ~ 5G per node
>> only. A repair (nodetool repair -par) issued on a single node at this data
>> volume takes around 36min with an AVG of ~ 15MByte/s disk throughput
>> (read+write) for the entire time-frame, thus processing ~ 32GByte from a
>> disk perspective so ~ 6 times of the real data volume reported by nodetool
>> status. Does this make any sense? This is with 4 compaction threads and
>> compaction throughput = 64. Similar results doing this test a few times,
>> where most/all inconsistent data should be already sorted out by previous
>> runs.
>>
>>
>>
>> I know there is e.g. reaper, but the above is a simple use case: a single
>> failed node recovering beyond the 3h hinted-handoff window. How should this
>> finish in a timely manner for > 500G on a recovering node?
>>
>>
>>
>> I have to admit this is with NFS as storage. I know NFS might not be the
>> best idea, but with the above test at ~ 5GB data volume we see an IOPS rate
>> of ~ 700 at a disk latency of ~ 15ms, so I wouldn’t treat it as that bad.
>> This is all running Cassandra on-premise (at the customer, so not hosted by
>> us), so while we can make recommendations storage-wise (of course preferring
>> local disks), it may and will happen that NFS is in use.
>>
>>
>>
>> Why we are using -par in combination with NFS is a different story and
>> related to this issue: https://issues.apache.org/jira/browse/CASSANDRA-8743.
>> Without switching from sequential to parallel repair, we basically kill
>> Cassandra.
>>
>>
>>
>> Throughput-wise, I also don’t think it is related to NFS, cause we see
>> similar repair throughput values with AWS EBS (gp2, SSD based) running
>> regular repairs on small-sized CFs.
>>
>>
>>
>> Thanks for any input.
>>
>> Thomas
>> The contents of this e-mail are intended for the named addressee only. It
>> contains information that may be confidential. Unless you are the named
>> addressee or an authorized designee, you may not copy or use it, or
>> disclose it to anyone else. If you received it in error please notify us
>> immediately and then destroy it. Dynatrace Austria GmbH (registration
>> number FN 91482h) is a company registered in Linz whose registered office
>> is at 4040 Linz, Austria, Freistädterstraße 313
>>
>


Re: Phantom growth resulting automatically node shutdown

2018-04-19 Thread horschi
Did you check the number of files in your data folder before & after the
restart?

I have seen cases where cassandra would keep creating sstables, which
disappeared on restart.

regards,
Christian


On Thu, Apr 19, 2018 at 12:18 PM, Fernando Neves 
wrote:

> I am facing one issue with our Cassandra cluster.
>
> Details: Cassandra 3.0.14, 12 nodes, 7.4TB (JBOD) disk size in each node,
> ~3.5TB used physical data in each node, ~42TB for the whole cluster, and
> default compaction setup. This size stays roughly the same because some
> tables are dropped after the retention period.
>
> Issue: Nodetool status is not showing the correct used size in the output.
> The used size keeps increasing without limit until the node shuts down
> automatically or until our sequential scheduled restart (workaround, 3 times
> a week). After the restart, nodetool shows the correct used space, but only
> for a few days.
> Did anybody have a similar problem? Is it a bug?
>
> Stackoverflow: https://stackoverflow.com/questions/49668692/cassandra-nodetool-status-is-not-showing-correct-used-space
>
>


Re: Driver consistency issue

2018-02-27 Thread horschi
Hi Abhishek & everyone else,

might it be related to https://issues.apache.org/jira/browse/CASSANDRA-7868
?

regards,
Christian



On Tue, Feb 27, 2018 at 12:46 PM, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> Hi,
>
> Not always. I am getting this exception randomly. (One observation: I mostly
> get this exception when I add a new node to the cluster.)
>
> On Tue, Feb 27, 2018 at 4:29 PM, Nicolas Guyomar <
> nicolas.guyo...@gmail.com> wrote:
>
>> Hi,
>>
>> Adding the java-driver ML for further question, because this does look
>> like a bug
>>
>> Are you able to reproduce it in a clean environment using the same C*
>> version and driver version?
>>
>>
>> On 27 February 2018 at 10:05, Abhishek Kumar Maheshwari <
>> abhishek.maheshw...@timesinternet.in> wrote:
>>
>>> Hi Alex,
>>>
>>> I have only one DC (named DC1) and only one keyspace, so I don't think
>>> either of those scenarios is possible. (Yes, in my case QUORUM is
>>> equivalent to ALL.)
>>>
>>> cqlsh> SELECT * FROM system_schema.keyspaces  where
>>> keyspace_name='adlog' ;
>>>
>>>  keyspace_name | durable_writes | replication
>>> ---++---
>>> 
>>>  adlog |   True | {'DC1': '2', 'class':
>>> 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
>>>
>>>
>>> On Tue, Feb 27, 2018 at 2:27 PM, Oleksandr Shulgin <
>>> oleksandr.shul...@zalando.de> wrote:
>>>
 On Tue, Feb 27, 2018 at 9:45 AM, Abhishek Kumar Maheshwari <
 abhishek.maheshw...@timesinternet.in> wrote:

>
> I have a keyspace in Cassandra (Cassandra version 3.0.9, 12 servers in
> total) with the below definition:
>
> {'DC1': '2', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
>
> Some time i am getting below exception
>
> [snip]

> Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException:
> Cassandra timeout during write query at consistency QUORUM (3 replica were
> required but only 2 acknowledged the write)
> at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:100)
> at com.datastax.driver.core.Responses$Error.asException(Responses.java:134)
> at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:525)
> at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1077)
>
> Why is it waiting for an acknowledgement from a 3rd server when the
> replication factor is 2?
>

 I see two possibilities:

 1) The data in this keyspace is replicated to another DC, so there is
 also 'DC2': '2', for example, but you didn't show it.  In this case QUORUM
 requires more than 2 nodes.
 2) The write was targeting a table in a different keyspace than you
 think.

 In any case QUORUM (or LOCAL_QUORUM) with RF=2 is equivalent to ALL.
 Not sure why you would use it in the first place.

 For consistency levels involving quorum you want to go with RF=3 in a
 single DC.  For multi DC you should think if you want QUORUM or EACH_QUORUM
 for your writes and figure out the RFs from that.
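
 For reference, the quorum arithmetic behind that (integer division):

 quorum = floor(RF / 2) + 1
 RF = 2  ->  quorum = 2  (every replica must acknowledge, i.e. effectively ALL)
 RF = 3  ->  quorum = 2  (one replica may be down)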

 Cheers,
 --
 Alex


>>>
>>>
>>> --
>>>
>>> *Thanks & Regards,*
>>> *Abhishek Kumar Maheshwari*
>>> *+91- 805591 <+91%208%2005591> (Mobile)*
>>>
>>> Times Internet Ltd. | A Times of India Group Company
>>>
>>> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>>>
>>> *P** Please do not print this email unless it is absolutely necessary.
>>> Spread environmental awareness.*
>>>
>>
>>
>
>
> --
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 <+91%208%2005591> (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-24 Thread horschi
Oh yes it is, like Counters :-)


On Sat, Dec 24, 2016 at 4:02 AM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> Anecdotally, CAS works differently than the typical Cassandra workload. If
> you run a stress instance against 3 nodes on one host, you find that you
> typically run into CPU issues, but if you are doing a CAS workload you see
> things timing out before you hit 100% CPU. It is a strange beast.
>
> On Fri, Dec 23, 2016 at 7:28 AM, horschi <hors...@gmail.com> wrote:
>
>> Update: I replaced all quorum reads on that table with serial reads, and
>> these errors became less frequent. Somehow quorum reads on CAS values cause
>> most of these WTEs.
>>
>> Also I found two tickets on that topic:
>> https://issues.apache.org/jira/browse/CASSANDRA-9328
>> https://issues.apache.org/jira/browse/CASSANDRA-8672
>>
>> On Thu, Dec 15, 2016 at 3:14 PM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I would like to warm up this old thread. I did some debugging and found
>>> out that the timeouts are coming from StorageProxy.proposePaxos()
>>> - callback.isFullyRefused() returns false and therefore triggers a
>>> WriteTimeout.
>>>
>>> Looking at my ccm cluster logs, I can see that two replica nodes return
>>> different results in their ProposeVerbHandler. In my opinion the
>>> coordinator should not throw an Exception in such a case, but instead retry
>>> the operation.
>>>
>>> What do the CAS/Paxos experts on this list say to this? Feel free to
>>> instruct me to do further tests/code changes. I'd be glad to help.
>>>
>>> Log:
>>>
>>> node1/logs/system.log:WARN  [SharedPool-Worker-5] 2016-12-15
>>> 14:48:36,896 PaxosState.java:124 - Rejecting proposal for
>>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
>>> columns=[[] | [value]]
>>> node1/logs/system.log-Row: id=@ | value=) because
>>> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4,
>>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>>> --
>>> node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15
>>> 14:48:36,980 StorageProxy.java:506 - proposePaxos:
>>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
>>> columns=[[] | [value]]
>>> node1/logs/system.log-Row: id=@ | value=)//1//0
>>> --
>>> node2/logs/system.log:WARN  [SharedPool-Worker-7] 2016-12-15
>>> 14:48:36,969 PaxosState.java:117 - Accepting proposal:
>>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
>>> columns=[[] | [value]]
>>> node2/logs/system.log-Row: id=@ | value=)
>>> --
>>> node3/logs/system.log:WARN  [SharedPool-Worker-2] 2016-12-15
>>> 14:48:36,897 PaxosState.java:124 - Rejecting proposal for
>>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
>>> columns=[[] | [value]]
>>> node3/logs/system.log-Row: id=@ | value=) because
>>> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4,
>>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>>>
>>>
>>> kind regards,
>>> Christian
>>>
>>>
>>> On Fri, Apr 15, 2016 at 8:27 PM, Denise Rogers <datag...@aol.com> wrote:
>>>
>>>> My thinking was that due to the size of the data that there maybe I/O
>>>> issues. But it sounds more like you're competing for locks and hit a
>>>> deadlock issue.
>>>>
>>>> Regards,
>>>> Denise
>>>> Cell - (860)989-3431 <(860)%20989-3431>
>>>>
>>>> Sent from mi iPhone
>>>>
>>>> On Apr 15, 2016, at 9:00 AM, horschi <hors...@gmail.com> wrote:
>>>>
>>>> Hi Denise,
>>>>
>>>> in my case its a small blob I am writing (should be around 100 bytes):
>>>>
>>>>  CREATE TABLE "Lock" (
>>>>  lockname varchar,
>>>>  id varchar,
>>>>  value blob,
>>>>  PRIMARY KEY (lockname, id)
>>>>  ) WITH COMPACT STORAGE
>>>>  AND COMPRESSION = { 'sstable_compression' :
>>>> 'SnappyCompressor', 'chunk_length_kb' : '8' };
>>>>
>>>> You ask because large values are known to cause issues? Anything
>>>> special you have in mind?
>>>>
>>>> kind regards,
>>>> Christian
>>>>
>>>>
>>>>
>>>>
>&

Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-23 Thread horschi
Update: I replaced all quorum reads on that table with serial reads, and these
errors became less frequent. Somehow quorum reads on CAS values cause most of
these WTEs.
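
For reference, switching such a read to a serial (Paxos) read with the Datastax
Java driver is only a consistency-level change; a minimal sketch, with table
and column names taken from the lock example quoted below:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class SerialReadExample {
    // Reads the current lock value through Paxos instead of a plain QUORUM read.
    public static Row readLock(Session session, String lockname, String id) {
        SimpleStatement stmt = new SimpleStatement(
                "SELECT value FROM \"Lock\" WHERE lockname = ? AND id = ?", lockname, id);
        stmt.setConsistencyLevel(ConsistencyLevel.SERIAL);
        ResultSet rs = session.execute(stmt);
        return rs.one(); // null if the lock row does not exist
    }
}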

Also I found two tickets on that topic:
https://issues.apache.org/jira/browse/CASSANDRA-9328
https://issues.apache.org/jira/browse/CASSANDRA-8672

On Thu, Dec 15, 2016 at 3:14 PM, horschi <hors...@gmail.com> wrote:

> Hi,
>
> I would like to warm up this old thread. I did some debugging and found
> out that the timeouts are coming from StorageProxy.proposePaxos()
> - callback.isFullyRefused() returns false and therefore triggers a
> WriteTimeout.
>
> Looking at my ccm cluster logs, I can see that two replica nodes return
> different results in their ProposeVerbHandler. In my opinion the
> coordinator should not throw an Exception in such a case, but instead retry
> the operation.
>
> What do the CAS/Paxos experts on this list say to this? Feel free to
> instruct me to do further tests/code changes. I'd be glad to help.
>
> Log:
>
> node1/logs/system.log:WARN  [SharedPool-Worker-5] 2016-12-15 14:48:36,896
> PaxosState.java:124 - Rejecting proposal for 
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node1/logs/system.log-Row: id=@ | value=) because
> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
> key=locktest_ 1 columns=[[] | [value]]
> --
> node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15 14:48:36,980
> StorageProxy.java:506 - proposePaxos: 
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node1/logs/system.log-Row: id=@ | value=)//1//0
> --
> node2/logs/system.log:WARN  [SharedPool-Worker-7] 2016-12-15 14:48:36,969
> PaxosState.java:117 - Accepting proposal: 
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node2/logs/system.log-Row: id=@ | value=)
> --
> node3/logs/system.log:WARN  [SharedPool-Worker-2] 2016-12-15 14:48:36,897
> PaxosState.java:124 - Rejecting proposal for 
> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
> node3/logs/system.log-Row: id=@ | value=) because
> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
> key=locktest_ 1 columns=[[] | [value]]
>
>
> kind regards,
> Christian
>
>
> On Fri, Apr 15, 2016 at 8:27 PM, Denise Rogers <datag...@aol.com> wrote:
>
>> My thinking was that due to the size of the data that there maybe I/O
>> issues. But it sounds more like you're competing for locks and hit a
>> deadlock issue.
>>
>> Regards,
>> Denise
>> Cell - (860)989-3431 <(860)%20989-3431>
>>
>> Sent from mi iPhone
>>
>> On Apr 15, 2016, at 9:00 AM, horschi <hors...@gmail.com> wrote:
>>
>> Hi Denise,
>>
>> in my case its a small blob I am writing (should be around 100 bytes):
>>
>>  CREATE TABLE "Lock" (
>>  lockname varchar,
>>  id varchar,
>>  value blob,
>>  PRIMARY KEY (lockname, id)
>>  ) WITH COMPACT STORAGE
>>  AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor',
>> 'chunk_length_kb' : '8' };
>>
>> You ask because large values are known to cause issues? Anything special
>> you have in mind?
>>
>> kind regards,
>> Christian
>>
>>
>>
>>
>> On Fri, Apr 15, 2016 at 2:42 PM, Denise Rogers <datag...@aol.com> wrote:
>>
>>> Also, what type of data were you reading/writing?
>>>
>>> Regards,
>>> Denise
>>>
>>> Sent from mi iPad
>>>
>>> On Apr 15, 2016, at 8:29 AM, horschi <hors...@gmail.com> wrote:
>>>
>>> Hi Jan,
>>>
>>> were you able to resolve your Problem?
>>>
>>> We are trying the same and also see a lot of WriteTimeouts:
>>> WriteTimeoutException: Cassandra timeout during write query at
>>> consistency SERIAL (2 replica were required but only 1 acknowledged the
>>> write)
>>>
>>> How many clients were competing for a lock in your case? In our case its
>>> only two :-(
>>>
>>> cheers,
>>> Christian
>>>
>>>
>>> On Tue, Sep 24, 2013 at 12:18 AM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>>> On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen <
>>>> jan.algermis...@nordsc.com> wrote:
>>>>
>>>>> I am experimenting with C* 2.0 ( and today's java-driver 2.0 snapshot)
>>>>> for implementing distributed locks.
>>>>>
>>>>
>>>> [ and I'm experiencing the problem described in the subject ... ]
>>>>
>>>>
>>>>> Any idea how to approach this problem?
>>>>>
>>>>
>>>> 1) Upgrade to 2.0.1 release.
>>>> 2) Try to reproduce symptoms.
>>>> 3) If able to, file a JIRA at https://issues.apache.org/jira
>>>> /secure/Dashboard.jspa including repro steps
>>>> 4) Reply to this thread with the JIRA ticket URL
>>>>
>>>> =Rob
>>>>
>>>>
>>>>
>>>
>>>
>>
>


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-15 Thread horschi
Hi,

I would like to warm up this old thread. I did some debugging and found out
that the timeouts are coming from StorageProxy.proposePaxos()
- callback.isFullyRefused() returns false and therefore triggers a
WriteTimeout.

Looking at my ccm cluster logs, I can see that two replica nodes return
different results in their ProposeVerbHandler. In my opinion the
coordinator should not throw an Exception in such a case, but instead retry
the operation.

What do the CAS/Paxos experts on this list say to this? Feel free to
instruct me to do further tests/code changes. I'd be glad to help.

Log:

node1/logs/system.log:WARN  [SharedPool-Worker-5] 2016-12-15 14:48:36,896
PaxosState.java:124 - Rejecting proposal for
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node1/logs/system.log-Row: id=@ | value=) because inProgress
is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
key=locktest_ 1 columns=[[] | [value]]
--
node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15 14:48:36,980
StorageProxy.java:506 - proposePaxos:
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node1/logs/system.log-Row: id=@ | value=)//1//0
--
node2/logs/system.log:WARN  [SharedPool-Worker-7] 2016-12-15 14:48:36,969
PaxosState.java:117 - Accepting proposal:
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node2/logs/system.log-Row: id=@ | value=)
--
node3/logs/system.log:WARN  [SharedPool-Worker-2] 2016-12-15 14:48:36,897
PaxosState.java:124 - Rejecting proposal for
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node3/logs/system.log-Row: id=@ | value=) because inProgress
is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
key=locktest_ 1 columns=[[] | [value]]


kind regards,
Christian


On Fri, Apr 15, 2016 at 8:27 PM, Denise Rogers <datag...@aol.com> wrote:

> My thinking was that due to the size of the data that there maybe I/O
> issues. But it sounds more like you're competing for locks and hit a
> deadlock issue.
>
> Regards,
> Denise
> Cell - (860)989-3431 <(860)%20989-3431>
>
> Sent from mi iPhone
>
> On Apr 15, 2016, at 9:00 AM, horschi <hors...@gmail.com> wrote:
>
> Hi Denise,
>
> in my case its a small blob I am writing (should be around 100 bytes):
>
>  CREATE TABLE "Lock" (
>  lockname varchar,
>  id varchar,
>  value blob,
>  PRIMARY KEY (lockname, id)
>  ) WITH COMPACT STORAGE
>  AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor',
> 'chunk_length_kb' : '8' };
>
> You ask because large values are known to cause issues? Anything special
> you have in mind?
>
> kind regards,
> Christian
>
>
>
>
> On Fri, Apr 15, 2016 at 2:42 PM, Denise Rogers <datag...@aol.com> wrote:
>
>> Also, what type of data were you reading/writing?
>>
>> Regards,
>> Denise
>>
>> Sent from mi iPad
>>
>> On Apr 15, 2016, at 8:29 AM, horschi <hors...@gmail.com> wrote:
>>
>> Hi Jan,
>>
>> were you able to resolve your Problem?
>>
>> We are trying the same and also see a lot of WriteTimeouts:
>> WriteTimeoutException: Cassandra timeout during write query at
>> consistency SERIAL (2 replica were required but only 1 acknowledged the
>> write)
>>
>> How many clients were competing for a lock in your case? In our case its
>> only two :-(
>>
>> cheers,
>> Christian
>>
>>
>> On Tue, Sep 24, 2013 at 12:18 AM, Robert Coli <rc...@eventbrite.com>
>> wrote:
>>
>>> On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen <
>>> jan.algermis...@nordsc.com> wrote:
>>>
>>>> I am experimenting with C* 2.0 ( and today's java-driver 2.0 snapshot)
>>>> for implementing distributed locks.
>>>>
>>>
>>> [ and I'm experiencing the problem described in the subject ... ]
>>>
>>>
>>>> Any idea how to approach this problem?
>>>>
>>>
>>> 1) Upgrade to 2.0.1 release.
>>> 2) Try to reproduce symptoms.
>>> 3) If able to, file a JIRA at https://issues.apache.org/
>>> jira/secure/Dashboard.jspa including repro steps
>>> 4) Reply to this thread with the JIRA ticket URL
>>>
>>> =Rob
>>>
>>>
>>>
>>
>>
>


Re: Speeding up schema generation during tests

2016-10-23 Thread horschi
You have to manually do "nodetool flush && nodetool flush system" before
shutdown, otherwise Cassandra might break. With that it is working nicely.
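
For embedded/in-process test setups, the flag can also be set programmatically
before the daemon starts; a minimal sketch, assuming cassandra-all is on the
test classpath and with the yaml path as a placeholder:

import org.apache.cassandra.service.EmbeddedCassandraService;

public class TestCassandraStarter {
    public static void main(String[] args) throws Exception {
        // Skip the blocking flush of system/schema tables; only safe for throwaway test data.
        System.setProperty("cassandra.unsafesystem", "true");
        // Placeholder: point this at a cassandra.yaml prepared for tests.
        System.setProperty("cassandra.config", "file:///path/to/test-cassandra.yaml");

        EmbeddedCassandraService cassandra = new EmbeddedCassandraService();
        cassandra.start();
    }
}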

On Sun, Oct 23, 2016 at 3:40 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> I'm using https://github.com/jsevellec/cassandra-unit and haven't come
> across any race issues or problems. Cassandra-unit takes care of creating
> the schema before it runs the tests.
>
> On Sun, Oct 23, 2016 at 6:17 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Ok I have added -Dcassandra.unsafesystem=true and my tests are broken.
>>
>> The reason is that I create some schemas before executing tests.
>>
>> When unsafesystem is enabled, Cassandra does not block for the schema flush,
>> so you may run into race conditions where the tests start using the created
>> schema before it has been fully flushed to disk:
>>
>> See C* source code here:
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L278-L282
>>
>> static void flush()
>> {
>>     if (!DatabaseDescriptor.isUnsafeSystem())
>>         ALL.forEach(table -> FBUtilities.waitOnFuture(getSchemaCFS(table).forceFlush()));
>> }
>>
>> I don't know how it worked out for you but it didn't for me...
>>
>> On Wed, Oct 19, 2016 at 9:45 AM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> Ohh didn't know such system property exist, nice idea!
>>>
>>> On Wed, Oct 19, 2016 at 9:40 AM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Have you tried starting Cassandra with -Dcassandra.unsafesystem=true ?
>>>>
>>>>
>>>> On Wed, Oct 19, 2016 at 9:31 AM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> As I said, when I bootstrap the server and create some keyspace,
>>>>> sometimes the schema is not fully initialized and when the test code tried
>>>>> to insert data, it fails.
>>>>>
>>>>> I did not have time to dig into the source code to find the root
>>>>> cause, maybe it's something really stupid and simple to fix. If you want 
>>>>> to
>>>>> investigate and try out my CassandraDaemon server, I'd be happy to get
>>>>> feedbacks
>>>>>
>>>>> On Wed, Oct 19, 2016 at 9:22 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks. I've disabled durable writes but this is still pretty slow
>>>>>> (about 10 seconds).
>>>>>>
>>>>>> What issues did you run into with your impl?
>>>>>>
>>>>>> On Wed, Oct 19, 2016 at 12:15 PM, DuyHai Doan <doanduy...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> There is a lot of pre-flight checks when starting the cassandra
>>>>>>> server and they took time.
>>>>>>>
>>>>>>> For integration testing, I have developped a modified
>>>>>>> CassandraDeamon here that remove pretty most of those checks:
>>>>>>>
>>>>>>> https://github.com/doanduyhai/Achilles/blob/master/achilles-embedded/src/main/java/info/archinnov/achilles/embedded/AchillesCassandraDaemon.java
>>>>>>>
>>>>>>> The problem is that I felt into weird scenarios where creating a
>>>>>>> keyspace wasn't created in timely manner so I just stop using this impl 
>>>>>>> for
>>>>>>> the moment, just look at it and do whatever you want.
>>>>>>>
>>>>>>> Another idea for testing is to disable durable write to speed up
>>>>>>> mutation (CREATE KEYSPACE ... WITH durable_write=false)
>>>>>>>
>>>>>>> On Wed, Oct 19, 2016 at 3:24 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Is there a way to speed up the creation of keyspace + tables during
>>>>>>>> integration tests? I am using an RF of 1, with SimpleStrategy, but it 
>>>>>>>> still
>>>>>>>> takes upto 10-15 seconds.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Speeding up schema generation during tests

2016-10-19 Thread horschi
Have you tried starting Cassandra with -Dcassandra.unsafesystem=true ?


On Wed, Oct 19, 2016 at 9:31 AM, DuyHai Doan  wrote:

> As I said, when I bootstrap the server and create some keyspace, sometimes
> the schema is not fully initialized and when the test code tries to insert
> data, it fails.
>
> I did not have time to dig into the source code to find the root cause,
> maybe it's something really stupid and simple to fix. If you want to
> investigate and try out my CassandraDaemon server, I'd be happy to get
> feedbacks
>
> On Wed, Oct 19, 2016 at 9:22 AM, Ali Akhtar  wrote:
>
>> Thanks. I've disabled durable writes but this is still pretty slow (about
>> 10 seconds).
>>
>> What issues did you run into with your impl?
>>
>> On Wed, Oct 19, 2016 at 12:15 PM, DuyHai Doan 
>> wrote:
>>
>>> There are a lot of pre-flight checks when starting the Cassandra server,
>>> and they take time.
>>>
>>> For integration testing, I have developed a modified CassandraDaemon here
>>> that removes most of those checks:
>>>
>>> https://github.com/doanduyhai/Achilles/blob/master/achilles-embedded/src/main/java/info/archinnov/achilles/embedded/AchillesCassandraDaemon.java
>>>
>>> The problem is that I ran into weird scenarios where a keyspace wasn't
>>> created in a timely manner, so I stopped using this impl for the moment;
>>> just look at it and do whatever you want.
>>>
>>> Another idea for testing is to disable durable writes to speed up
>>> mutations (CREATE KEYSPACE ... WITH durable_writes=false)
>>>
>>> On Wed, Oct 19, 2016 at 3:24 AM, Ali Akhtar 
>>> wrote:
>>>
 Is there a way to speed up the creation of keyspace + tables during
 integration tests? I am using an RF of 1 with SimpleStrategy, but it still
 takes up to 10-15 seconds.

>>>
>>>
>>
>


Re: Java Driver - Specifying parameters for an IN() query?

2016-10-11 Thread horschi
Hi Ali,

do you perhaps want 'Select * from my_table WHERE pk = ? And ck IN ?' ?
(Without the parentheses around the question mark)
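
A minimal sketch of binding a list to that single IN marker with the Datastax
Java driver (keyspace, table and column names are only illustrative here):

import java.util.Arrays;
import java.util.List;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class InQueryExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("mykeyspace");

        // "IN ?" (no parentheses) takes a single bind value, which is a list.
        PreparedStatement ps = session.prepare(
                "SELECT * FROM my_table WHERE pk = ? AND ck IN ?");
        List<String> cks = Arrays.asList("ck1", "ck2", "ck3");

        for (Row row : session.execute(ps.bind("some-pk", cks))) {
            System.out.println(row);
        }
        cluster.close();
    }
}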

regards,
Ch

On Tue, Oct 11, 2016 at 3:14 PM, Ali Akhtar  wrote:

> If I wanted to create an accessor, and have a method which does a query
> like this:
>
> 'Select * from my_table WHERE pk = ? And ck IN (?)'
>
> And there were multiple options that could go inside the IN() query, how
> can I specify that? Will it e.g, let me pass in an array as the 2nd
> variable?
>


Re: Stale value appears after consecutive TRUNCATE

2016-08-25 Thread horschi
(running C* 2.2.7)

On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:

> Hi Yuji,
>
> I tried your script a couple of times. I did not experience any stale
> values. (On my Linux laptop)
>
> regards,
> Ch
>
> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Hi,
>>
>> I can reproduce the problem with the following script.
>> I got rows which should be truncated.
>> If truncating is executed only once, the problem doesn't occur.
>>
>> The test for multi nodes (replication_factor:3, kill & restart C*
>> processes in all nodes) can also reproduce it.
>>
>> test script:
>> 
>>
>> ip=xxx.xxx.xxx.xxx
>>
>> echo "0. prepare a table"
>> cqlsh $ip -e "drop keyspace testdb;"
>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class':
>> 'SimpleStrategy', 'replication_factor': '1'};"
>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>
>> echo "1. insert rows"
>> for key in $(seq 1 10)
>> do
>> cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key,
>> 1000) IF NOT EXISTS;" >> /dev/null 2>&1
>> done
>>
>> echo "2. truncate the table twice"
>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>
>> echo "3. kill C* process"
>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}'
>> | xargs sudo kill -9
>>
>> echo "4. restart C* process"
>> sudo /etc/init.d/cassandra start
>> sleep 20
>>
>> echo "5. check the table"
>> cqlsh $ip -e "select * from testdb.testtbl;"
>>
>> 
>>
>> test result:
>> 
>>
>> 0. prepare a table
>> 1. insert rows
>> 2. truncate the table twice
>> Consistency level set to ALL.
>> Consistency level set to ALL.
>> 3. kill C* process
>> 4. restart C* process
>> Starting Cassandra: OK
>> 5. check the table
>>
>>  key | val
>> -+--
>>5 | 1000
>>   10 | 1000
>>1 | 1000
>>8 | 1000
>>2 | 1000
>>4 | 1000
>>7 | 1000
>>6 | 1000
>>9 | 1000
>>3 | 1000
>>
>> (10 rows)
>>
>> 
>>
>>
>> Thanks Christian,
>>
>> I tried with durable_writes=False.
>> It failed. I guessed this failure was caused by another problem.
>> I use SimpleStrategy.
>> A keyspace using the SimpleStrategy isn't permitted to use
>> durable_writes=False.
>>
>>
>> Regards,
>> Yuji
>>
>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi Yuji,
>>>
>>> ok, perhaps you are seeing a different issue than I do.
>>>
>>> Have you tried with durable_writes=False? If the issue is caused by the
>>> commitlog, then it should work if you disable durable_writes.
>>>
>>> Cheers,
>>> Christian
>>>
>>>
>>>
>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>
>>>> Thanks Christian
>>>>
>>>> can you reproduce the behaviour with a single node?
>>>>
>>>> I tried my test with a single node. But I can't.
>>>>
>>>> This behaviour is seems to be CQL only, or at least has gotten worse
>>>>> with CQL. I did not experience this with Thrift.
>>>>
>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>
>>>> I think that my problem can happen when truncating even succeeds.
>>>> That's because I check all records after truncating.
>>>>
>>>> I checked the source code.
>>>> ReplayPosition.segment and position become -1 and 0
>>>> (ReplayPosition.NONE) in dscardSSTables() at truncating a table when there
>>>> is no SSTable.
>>>> I guess that ReplayPosition.segment shouldn't be -1 at truncating a
>>>> table in this case.
>>>> replayMutation() can request unexpected replay mutations because of
>>>> this segment's value.
>>>>
>>>> Is there anyone familiar with truncate and replay?
>>>>
>>>> Regards,
>>>> Yuji
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 6:36 PM, 

Re: Stale value appears after consecutive TRUNCATE

2016-08-25 Thread horschi
Hi Yuji,

I tried your script a couple of times. I did not experience any stale
values. (On my Linux laptop)

regards,
Ch

On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:

> Hi,
>
> I can reproduce the problem with the following script.
> I got rows which should be truncated.
> If truncating is executed only once, the problem doesn't occur.
>
> The test for multi nodes (replication_factor:3, kill & restart C*
> processes in all nodes) can also reproduce it.
>
> test script:
> 
>
> ip=xxx.xxx.xxx.xxx
>
> echo "0. prepare a table"
> cqlsh $ip -e "drop keyspace testdb;"
> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class':
> 'SimpleStrategy', 'replication_factor': '1'};"
> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>
> echo "1. insert rows"
> for key in $(seq 1 10)
> do
> cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000)
> IF NOT EXISTS;" >> /dev/null 2>&1
> done
>
> echo "2. truncate the table twice"
> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>
> echo "3. kill C* process"
> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}'
> | xargs sudo kill -9
>
> echo "4. restart C* process"
> sudo /etc/init.d/cassandra start
> sleep 20
>
> echo "5. check the table"
> cqlsh $ip -e "select * from testdb.testtbl;"
>
> 
>
> test result:
> 
>
> 0. prepare a table
> 1. insert rows
> 2. truncate the table twice
> Consistency level set to ALL.
> Consistency level set to ALL.
> 3. kill C* process
> 4. restart C* process
> Starting Cassandra: OK
> 5. check the table
>
>  key | val
> -+--
>5 | 1000
>   10 | 1000
>1 | 1000
>8 | 1000
>2 | 1000
>4 | 1000
>7 | 1000
>6 | 1000
>9 | 1000
>3 | 1000
>
> (10 rows)
>
> 
>
>
> Thanks Christian,
>
> I tried with durable_writes=False.
> It failed. I guessed this failure was caused by another problem.
> I use SimpleStrategy.
> A keyspace using the SimpleStrategy isn't permitted to use
> durable_writes=False.
>
>
> Regards,
> Yuji
>
> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>
>> Hi Yuji,
>>
>> ok, perhaps you are seeing a different issue than I do.
>>
>> Have you tried with durable_writes=False? If the issue is caused by the
>> commitlog, then it should work if you disable durable_writes.
>>
>> Cheers,
>> Christian
>>
>>
>>
>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Thanks Christian
>>>
>>> can you reproduce the behaviour with a single node?
>>>
>>> I tried my test with a single node. But I can't.
>>>
>>> This behaviour is seems to be CQL only, or at least has gotten worse
>>>> with CQL. I did not experience this with Thrift.
>>>
>>> I truncate tables with CQL. I've never tried with Thrift.
>>>
>>> I think that my problem can happen when truncating even succeeds.
>>> That's because I check all records after truncating.
>>>
>>> I checked the source code.
>>> ReplayPosition.segment and position become -1 and 0
>>> (ReplayPosition.NONE) in dscardSSTables() at truncating a table when there
>>> is no SSTable.
>>> I guess that ReplayPosition.segment shouldn't be -1 at truncating a
>>> table in this case.
>>> replayMutation() can request unexpected replay mutations because of this
>>> segment's value.
>>>
>>> Is there anyone familiar with truncate and replay?
>>>
>>> Regards,
>>> Yuji
>>>
>>>
>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Hi Yuji,
>>>>
>>>> can you reproduce the behaviour with a single node?
>>>>
>>>> The reason I ask is because I probably have the same issue with my
>>>> automated tests (which run truncate between every test), which run on my
>>>> local laptop.
>>>>
>>>> Maybe around 5 tests randomly fail out of my 1800. I can see that the
>>>> failed tests sometimes show data from other tests, which I think must be
>>>> because of a failed truncate. This behaviour is seems to be CQL only, or at
>

Re: Stale value appears after consecutive TRUNCATE

2016-08-10 Thread horschi
Hi Yuji,

ok, perhaps you are seeing a different issue than I do.

Have you tried with durable_writes=False? If the issue is caused by the
commitlog, then it should work if you disable durable_writes.

Cheers,
Christian



On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:

> Thanks Christian
>
> can you reproduce the behaviour with a single node?
>
> I tried my test with a single node. But I can't.
>
> This behaviour is seems to be CQL only, or at least has gotten worse with
>> CQL. I did not experience this with Thrift.
>
> I truncate tables with CQL. I've never tried with Thrift.
>
> I think that my problem can happen when truncating even succeeds.
> That's because I check all records after truncating.
>
> I checked the source code.
> ReplayPosition.segment and position become -1 and 0 (ReplayPosition.NONE)
> in dscardSSTables() at truncating a table when there is no SSTable.
> I guess that ReplayPosition.segment shouldn't be -1 at truncating a table
> in this case.
> replayMutation() can request unexpected replay mutations because of this
> segment's value.
>
> Is there anyone familiar with truncate and replay?
>
> Regards,
> Yuji
>
>
> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>
>> Hi Yuji,
>>
>> can you reproduce the behaviour with a single node?
>>
>> The reason I ask is because I probably have the same issue with my
>> automated tests (which run truncate between every test), which run on my
>> local laptop.
>>
>> Maybe around 5 tests randomly fail out of my 1800. I can see that the
>> failed tests sometimes show data from other tests, which I think must be
>> because of a failed truncate. This behaviour is seems to be CQL only, or at
>> least has gotten worse with CQL. I did not experience this with Thrift.
>>
>> regards,
>> Christian
>>
>>
>>
>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Hi all,
>>>
>>> I have a question about clearing table and commit log replay.
>>> After some tables were truncated consecutively, I got some stale values.
>>> This problem doesn't occur when I clear keyspaces with DROP (and CREATE).
>>>
>>> I'm testing the following test with node failure.
>>> Some stale values appear at checking phase.
>>>
>>> Test iteration:
>>> 1. initialize tables as below
>>> 2. request a lot of read/write concurrently
>>> 3. check all records
>>> 4. repeat from the beginning
>>>
>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>> Each node kills cassandra process at random intervals and restarts it
>>> immediately.
>>>
>>> My initialization:
>>> 1. clear tables with TRUNCATE
>>> 2. INSERT initial records
>>> 3. check if all values are correct
>>>
>>> If any phase fails (because of node failure), the initialization starts
>>> all over again.
>>> So, tables are sometimes truncated consecutively.
>>> Though the check in the initialization is OK, stale data appears when I
>>> execute "SELECT * FROM mykeyspace.mytable;" after a lot of requests are
>>> completed.
>>>
>>> The problem is likely to occur when the ReplayPosition's value in
>>> "truncated_at" is initialized as below after an empty table is truncated.
>>>
>>> Column Family ID: truncated_at
>>> ----: 0x
>>> 0156597cd4c7
>>> (this value was acquired just after phase 1 in my initialization)
>>>
>>> I guess some unexpected replays occur.
>>> Does anyone know the behavior?
>>>
>>> Thanks,
>>> Yuji
>>>
>>
>>
>


Re: Stale value appears after consecutive TRUNCATE

2016-08-08 Thread horschi
Hi Yuji,

can you reproduce the behaviour with a single node?

The reason I ask is because I probably have the same issue with my
automated tests (which run truncate between every test), which run on my
local laptop.

Maybe around 5 tests randomly fail out of my 1800. I can see that the
failed tests sometimes show data from other tests, which I think must be
because of a failed truncate. This behaviour seems to be CQL-only, or at
least has gotten worse with CQL. I did not experience this with Thrift.

regards,
Christian



On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito  wrote:

> Hi all,
>
> I have a question about clearing table and commit log replay.
> After some tables were truncated consecutively, I got some stale values.
> This problem doesn't occur when I clear keyspaces with DROP (and CREATE).
>
> I'm testing the following test with node failure.
> Some stale values appear at checking phase.
>
> Test iteration:
> 1. initialize tables as below
> 2. request a lot of read/write concurrently
> 3. check all records
> 4. repeat from the beginning
>
> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
> Each node kills cassandra process at random intervals and restarts it
> immediately.
>
> My initialization:
> 1. clear tables with TRUNCATE
> 2. INSERT initial records
> 3. check if all values are correct
>
> If any phase fails (because of node failure), the initialization starts
> all over again.
> So, tables are sometimes truncated consecutively.
> Though the check in the initialization is OK, stale data appears when I
> execute "SELECT * FROM mykeyspace.mytable;" after a lot of requests are
> completed.
>
> The problem is likely to occur when the ReplayPosition's value in
> "truncated_at" is initialized as below after an empty table is truncated.
>
> Column Family ID: truncated_at
> ----: 0x
> 0156597cd4c7
> (this value was acquired just after phase 1 in my initialization)
>
> I guess some unexpected replays occur.
> Does anyone know the behavior?
>
> Thanks,
> Yuji
>


Re: [RELEASE] Apache Cassandra 3.0.8 released

2016-07-08 Thread horschi
2.2.7 also works. Thanks!

On Fri, Jul 8, 2016 at 9:15 AM, Julien Anguenot <jul...@anguenot.org> wrote:

> All good now. Thanks.
>
>J.
>
> --
> Julien Anguenot (@anguenot)
>
> On Jul 8, 2016, at 4:10 AM, Jake Luciani <j...@apache.org> wrote:
>
> Sorry, I totally missed that.  Uploading now.
>
> On Thu, Jul 7, 2016 at 4:51 AM, horschi <hors...@gmail.com> wrote:
>
>> Same for 2.2.7.
>>
>> On Thu, Jul 7, 2016 at 10:49 AM, Julien Anguenot <jul...@anguenot.org>
>> wrote:
>>
>>> Hey,
>>>
>>> The Debian packages do not seem to have been published. Normal?
>>>
>>> Thank you.
>>>
>>>J.
>>>
>>> On Jul 6, 2016, at 4:20 PM, Jake Luciani <j...@apache.org> wrote:
>>>
>>> The Cassandra team is pleased to announce the release of Apache Cassandra
>>> version 3.0.8.
>>>
>>> Apache Cassandra is a fully distributed database. It is the right choice
>>> when you need scalability and high availability without compromising
>>> performance.
>>>
>>>  http://cassandra.apache.org/
>>>
>>> Downloads of source and binary distributions are listed in our download
>>> section:
>>>
>>>  http://cassandra.apache.org/download/
>>>
>>> This version is a bug fix release[1] on the 3.0 series. As always,
>>> please pay
>>> attention to the release notes[2] and Let us know[3] if you were to
>>> encounter
>>> any problem.
>>>
>>> Enjoy!
>>>
>>> [1]: http://goo.gl/DQpe4d (CHANGES.txt)
>>> [2]: http://goo.gl/UISX1K (NEWS.txt)
>>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>>
>>>
>>>
>>
>
>


Re: [RELEASE] Apache Cassandra 3.0.8 released

2016-07-07 Thread horschi
Same for 2.2.7.

On Thu, Jul 7, 2016 at 10:49 AM, Julien Anguenot 
wrote:

> Hey,
>
> The Debian packages do not seem to have been published. Normal?
>
> Thank you.
>
>J.
>
> On Jul 6, 2016, at 4:20 PM, Jake Luciani  wrote:
>
> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 3.0.8.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 3.0 series. As always, please
> pay
> attention to the release notes[2] and Let us know[3] if you were to
> encounter
> any problem.
>
> Enjoy!
>
> [1]: http://goo.gl/DQpe4d (CHANGES.txt)
> [2]: http://goo.gl/UISX1K (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>
>


Re: C* 2.2.7 ?

2016-06-29 Thread horschi
Awesome! There is a lot of good stuff in 2.2.7 :-)

On Wed, Jun 29, 2016 at 5:37 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> 2.2.7 just got tentatively tagged yesterday.  So, there should be a vote
> on releasing it shortly.
>
> On Wed, Jun 29, 2016 at 8:24 AM, Dominik Keil <dominik.k...@movilizer.com>
> wrote:
>
>> +1
>>
>> There are some bug fixes we might be (or definitely are) affected by, and
>> the change log has become quite large already. Mind voting on 2.2.7 soon?
>>
>>
>> Am 21.06.2016 um 15:31 schrieb horschi:
>>
>> Hi,
>>
>> are there any plans to release 2.2.7 any time soon?
>>
>> kind regards,
>> Christian
>>
>>
>> --
>> *Dominik Keil*
>> Phone: + 49 (0) 621 150 207 31
>> Mobile: + 49 (0) 151 626 602 14
>>
>> Movilizer GmbH
>> Konrad-Zuse-Ring 30
>> 68163 Mannheim
>> Germany
>>
>> movilizer.com
>>
>> [image: Visit company website] <http://movilizer.com/>
>> *Reinvent Your Mobile Enterprise*
>>
>> *-Movilizer is moving*
>> After June 27th 2016 Movilizer's new headquarter will be
>>
>>
>>
>>
>> *EASTSITE VIIIKonrad-Zuse-Ring 3068163 Mannheim*
>>
>> <http://movilizer.com/training>
>>
>> *Be the first to know:*
>> Twitter <https://twitter.com/Movilizer> | LinkedIn
>> <https://www.linkedin.com/company/movilizer-gmbh> | Facebook
>> <https://www.facebook.com/Movilizer> | stack overflow
>> <http://stackoverflow.com/questions/tagged/movilizer>
>>
>> Company's registered office: Mannheim HRB: 700323 / Country Court:
>> Mannheim Managing Directors: Alberto Zamora, Jörg Bernauer, Oliver Lesche
>> Please inform us immediately if this e-mail and/or any attachment was
>> transmitted incompletely or was not intelligible.
>>
>> This e-mail and any attachment is for authorized use by the intended
>> recipient(s) only. It may contain proprietary material, confidential
>> information and/or be subject to legal privilege. It should not be
>> copied, disclosed to, retained or used by any other party. If you are not
>> an intended recipient then please promptly delete this e-mail and any
>> attachment and all copies and inform the sender.
>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


C* 2.2.7 ?

2016-06-21 Thread horschi
Hi,

are there any plans to release 2.2.7 any time soon?

kind regards,
Christian


Re: CAS operation does not return value on failure

2016-05-09 Thread horschi
Update: It was actually the driver update (from 2.1.9 to 3.0.1) that solved
the issue. I reverted my C* server back to 2.2 and my test is still ok.

On Mon, May 9, 2016 at 1:28 PM, horschi <hors...@gmail.com> wrote:

> I just retried with Cassandra 3.0.5 and it performs much better. Not a
> single of these illegal results.
>
> I guess my recommendation for anyone using CAS is: Upgrade to >= 3.x :-)
>
> On Wed, May 4, 2016 at 5:46 PM, horschi <hors...@gmail.com> wrote:
>
>> Hi,
>>
>> I am doing some testing on CAS operations and I am frequently having the
>> issue that my resultset says wasApplied()==false, but it does not contain
>> any value.
>>
>>
>> This behaviour of course leads to the following Exception when I try to
>> read it:
>>
>> Caused by: java.lang.IllegalArgumentException: value is not a column
>> defined in this metadata
>> at
>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>> at
>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>> at
>> com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:68)
>> at
>> com.datastax.driver.core.AbstractGettableData.getBytes(AbstractGettableData.java:131)
>>
>>
>>
>> My questions now are:
>>
>> Is it to be expected that a failing CAS operation sometimes does this?
>>
>> if yes: Shouldn't there a possibility on the driver side to handle this
>> in a better was, e.g. add a "hasColumn()" method or something to the
>> ResultSet?
>>
>> if no: Is that perhaps a symptom to a greater issue in cassandra?
>>
>>
>> kind regards,
>> Christian
>>
>> PS: I also appreciate general feedback on the entire C* CAS topic :-)
>>
>>
>>
>


Re: CAS operation does not return value on failure

2016-05-09 Thread horschi
I just retried with Cassandra 3.0.5 and it performs much better. Not a
single of these illegal results.

I guess my recommendation for anyone using CAS is: Upgrade to >= 3.x :-)

On Wed, May 4, 2016 at 5:46 PM, horschi <hors...@gmail.com> wrote:

> Hi,
>
> I am doing some testing on CAS operations and I am frequently having the
> issue that my resultset says wasApplied()==false, but it does not contain
> any value.
>
>
> This behaviour of course leads to the following Exception when I try to
> read it:
>
> Caused by: java.lang.IllegalArgumentException: value is not a column
> defined in this metadata
> at
> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
> at
> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
> at
> com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:68)
> at
> com.datastax.driver.core.AbstractGettableData.getBytes(AbstractGettableData.java:131)
>
>
>
> My questions now are:
>
> Is it to be expected that a failing CAS operation sometimes does this?
>
> if yes: Shouldn't there a possibility on the driver side to handle this in
> a better was, e.g. add a "hasColumn()" method or something to the ResultSet?
>
> if no: Is that perhaps a symptom to a greater issue in cassandra?
>
>
> kind regards,
> Christian
>
> PS: I also appreciate general feedback on the entire C* CAS topic :-)
>
>
>


Re: CAS operation does not return value on failure

2016-05-09 Thread horschi
Hi Jack,

sorry to keep you busy :-)

There definitely is a column named "value" in the table. And most of the
time this codepath works fine, even when my CAS update fails. But in very
rare cases I get a ResultSet that contains applied=false but does not
contain any value column.


I just ran my test again and found a empty ResultSet for the following
query:

delete from "Lock" where lockname=:lockname and id=:id if value=:value

--> ResultSet contains only [applied]=false, but no lockname, id or value.


Am I correct in my assumption that this should not be?

kind regards,
Christian



On Fri, May 6, 2016 at 1:20 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> "value" in that message is the name of a column that is expect to be in
> your table schema - the message is simply complaining that you have no
> column named "value" in that table.
> The error concerns the table schema, not any actual data in either the
> statement or the table.
>
> "metadata" is simply referring to your table schema.
>
> Does your table schema have a "value" column?
> Does your prepared statement refer to a "value" column, or are you
> supplying that name when executing the prepared statement?
>
> The "datastax.driver.core" in the exception trace class names indicates
> that the error is detected in the Java driver, not Cassandra.
>
>
>
> -- Jack Krupansky
>
> On Thu, May 5, 2016 at 6:45 PM, horschi <hors...@gmail.com> wrote:
>
>> Hi Jack,
>>
>> I thought that it is Cassandra that fills the value on CAS failures. So
>> the question if it is to be expected to have wasApplied()==false and not
>> have any value in the ResultSet should belong here.
>>
>> So my question for this mailing list would be:
>>
>> Is it correct behaviour that C* returns wasApplied()==false but not any
>> value? My expectation was that there always is a value in such a case.
>>
>> kind regards,
>> Christian
>>
>>
>> On Wed, May 4, 2016 at 6:00 PM, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> Probably better to ask this on the Java driver user list.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Wed, May 4, 2016 at 11:46 AM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am doing some testing on CAS operations and I am frequently having
>>>> the issue that my resultset says wasApplied()==false, but it does not
>>>> contain any value.
>>>>
>>>>
>>>> This behaviour of course leads to the following Exception when I try to
>>>> read it:
>>>>
>>>> Caused by: java.lang.IllegalArgumentException: value is not a column
>>>> defined in this metadata
>>>> at
>>>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>>>> at
>>>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>>>> at
>>>> com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:68)
>>>> at
>>>> com.datastax.driver.core.AbstractGettableData.getBytes(AbstractGettableData.java:131)
>>>>
>>>>
>>>>
>>>> My questions now are:
>>>>
>>>> Is it to be expected that a failing CAS operation sometimes does this?
>>>>
>>>> if yes: Shouldn't there a possibility on the driver side to handle this
>>>> in a better was, e.g. add a "hasColumn()" method or something to the
>>>> ResultSet?
>>>>
>>>> if no: Is that perhaps a symptom to a greater issue in cassandra?
>>>>
>>>>
>>>> kind regards,
>>>> Christian
>>>>
>>>> PS: I also appreciate general feedback on the entire C* CAS topic :-)
>>>>
>>>>
>>>>
>>>
>>
>


Re: CAS operation does not return value on failure

2016-05-05 Thread horschi
Hi Jack,

I thought that it is Cassandra that fills in the value on CAS failures, so the
question whether it is to be expected to have wasApplied()==false and no value
in the ResultSet belongs here.

So my question for this mailing list would be:

Is it correct behaviour that C* returns wasApplied()==false but not any
value? My expectation was that there always is a value in such a case.

kind regards,
Christian


On Wed, May 4, 2016 at 6:00 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Probably better to ask this on the Java driver user list.
>
>
> -- Jack Krupansky
>
> On Wed, May 4, 2016 at 11:46 AM, horschi <hors...@gmail.com> wrote:
>
>> Hi,
>>
>> I am doing some testing on CAS operations and I am frequently having the
>> issue that my resultset says wasApplied()==false, but it does not contain
>> any value.
>>
>>
>> This behaviour of course leads to the following Exception when I try to
>> read it:
>>
>> Caused by: java.lang.IllegalArgumentException: value is not a column
>> defined in this metadata
>> at
>> com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
>> at
>> com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
>> at
>> com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:68)
>> at
>> com.datastax.driver.core.AbstractGettableData.getBytes(AbstractGettableData.java:131)
>>
>>
>>
>> My questions now are:
>>
>> Is it to be expected that a failing CAS operation sometimes does this?
>>
>> if yes: Shouldn't there a possibility on the driver side to handle this
>> in a better was, e.g. add a "hasColumn()" method or something to the
>> ResultSet?
>>
>> if no: Is that perhaps a symptom to a greater issue in cassandra?
>>
>>
>> kind regards,
>> Christian
>>
>> PS: I also appreciate general feedback on the entire C* CAS topic :-)
>>
>>
>>
>


CAS operation does not return value on failure

2016-05-04 Thread horschi
Hi,

I am doing some testing on CAS operations and I am frequently having the
issue that my resultset says wasApplied()==false, but it does not contain
any value.


This behaviour of course leads to the following Exception when I try to
read it:

Caused by: java.lang.IllegalArgumentException: value is not a column
defined in this metadata
at
com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
at
com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
at
com.datastax.driver.core.ArrayBackedRow.getIndexOf(ArrayBackedRow.java:68)
at
com.datastax.driver.core.AbstractGettableData.getBytes(AbstractGettableData.java:131)



My questions now are:

Is it to be expected that a failing CAS operation sometimes does this?

if yes: Shouldn't there be a possibility on the driver side to handle this in
a better way, e.g. add a "hasColumn()" method or something to the ResultSet?

if no: Is that perhaps a symptom to a greater issue in cassandra?


kind regards,
Christian

PS: I also appreciate general feedback on the entire C* CAS topic :-)
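
In the meantime, a defensive guard on the client side looks roughly like the
following sketch (it only works around the missing column rather than
explaining it):

import java.nio.ByteBuffer;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

public class CasResultHelper {
    // Returns the previous "value" column of a failed CAS, or null if the
    // coordinator did not include it in the result (the case described above).
    public static ByteBuffer previousValue(ResultSet rs) {
        if (rs.wasApplied()) {
            return null; // applied: no previous value is returned
        }
        Row row = rs.one();
        if (row == null || !row.getColumnDefinitions().contains("value")) {
            return null; // defensive: "value" is missing from the result metadata
        }
        return row.getBytes("value");
    }
}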


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread horschi
Hi Jan,

were you able to resolve your problem?

We are trying the same and also see a lot of WriteTimeouts:
WriteTimeoutException: Cassandra timeout during write query at consistency
SERIAL (2 replica were required but only 1 acknowledged the write)

How many clients were competing for a lock in your case? In our case it's
only two :-(

cheers,
Christian


On Tue, Sep 24, 2013 at 12:18 AM, Robert Coli  wrote:

> On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen <
> jan.algermis...@nordsc.com> wrote:
>
>> I am experimenting with C* 2.0 ( and today's java-driver 2.0 snapshot)
>> for implementing distributed locks.
>>
>
> [ and I'm experiencing the problem described in the subject ... ]
>
>
>> Any idea how to approach this problem?
>>
>
> 1) Upgrade to 2.0.1 release.
> 2) Try to reproduce symptoms.
> 3) If able to, file a JIRA at
> https://issues.apache.org/jira/secure/Dashboard.jspa including repro steps
> 4) Reply to this thread with the JIRA ticket URL
>
> =Rob
>
>
>


Re: Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Ok, I just realized the parameter should not be called ":limit" :-)

Also I upgraded my Java Driver from 2.1.6 to 2.1.9.

Both TTL and limit work fine now. Sorry again for the confusion.

cheers,
Christian


On Tue, Mar 8, 2016 at 3:19 PM, horschi <hors...@gmail.com> wrote:

> Oh, I just realized I made a mistake with the TTL query:
>
> The TTL has to be specified before the set. Like this:
> update mytable using ttl :timetolive set data=:data where ts=:ts and
> randkey=:randkey
>
> And this of course works nicely. Sorry for the confusion.
>
>
> Nevertheless, I don't think this is the issue with my "select ... limit"
> querys. But I will verify this and also try the workaround.
>
>
>
> On Tue, Mar 8, 2016 at 3:08 PM, horschi <hors...@gmail.com> wrote:
>
>> Hi Nick,
>>
>> I will try your workaround. Thanks a lot.
>>
>> I was not expecting the Java-Driver to have a bug, because in the Jira
>> Ticket (JAVA-54) it says "not a problem". So i assumed there is nothing to
>> do to support it :-)
>>
>> kind regards,
>> Christian
>>
>> On Tue, Mar 8, 2016 at 2:56 PM, Nicholas Wilson <
>> nicholas.wil...@realvnc.com> wrote:
>>
>>> Hi Christian,
>>>
>>>
>>> I ran into this problem last month; after some chasing I thought it was
>>> possibly a bug in the Datastax driver, which I'm also using. The CQL
>>> protocol itself supports dynamic TTLs fine.
>>>
>>>
>>> One workaround that seems to work is to use an unnamed bind marker for
>>> the TTL ('?') and then set it using the "[ttl]" reserved name as the bind
>>> marker name ('setLong("[ttl]", myTtl)'), which will set the correct field
>>> in the bound statement.
>>>
>>>
>>> Best,
>>>
>>> Nick​
>>>
>>>
>>> --
>>> *From:* horschi <hors...@gmail.com>
>>> *Sent:* 08 March 2016 13:52
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Dynamic TTLs / limits still not working in 2.2 ?
>>>
>>> Hi,
>>>
>>> according to CASSANDRA-4450
>>> <https://issues.apache.org/jira/browse/CASSANDRA-4450> it should be
>>> fixed, but I still can't use dynamic TTLs or limits in my CQL queries.
>>>
>>> Query:
>>> update mytable set data=:data where ts=:ts and randkey=:randkey using
>>> ttl :timetolive
>>>
>>> Exception:
>>> Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
>>> missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
>>> at
>>> com.datastax.driver.core.Responses$Error.asException(Responses.java:100)
>>>
>>> I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still
>>> see this, even though the Jira ticket states fixVersion 2.0.
>>>
>>> Has anyone used this successfully? Am I doing something wrong or is
>>> there still a bug?
>>>
>>> kind regards,
>>> Christian
>>>
>>>
>>> Tickets:
>>> https://datastax-oss.atlassian.net/browse/JAVA-54
>>> https://issues.apache.org/jira/browse/CASSANDRA-4450
>>>
>>>
>>>
>>
>


Re: Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Oh, I just realized I made a mistake with the TTL query:

The TTL has to be specified before the set. Like this:
update mytable using ttl :timetolive set data=:data where ts=:ts and
randkey=:randkey

And this of course works nicely. Sorry for the confusion.


Nevertheless, I don't think this is the issue with my "select ... limit"
queries. But I will verify this and also try the workaround.



On Tue, Mar 8, 2016 at 3:08 PM, horschi <hors...@gmail.com> wrote:

> Hi Nick,
>
> I will try your workaround. Thanks a lot.
>
> I was not expecting the Java-Driver to have a bug, because in the Jira
> Ticket (JAVA-54) it says "not a problem". So i assumed there is nothing to
> do to support it :-)
>
> kind regards,
> Christian
>
> On Tue, Mar 8, 2016 at 2:56 PM, Nicholas Wilson <
> nicholas.wil...@realvnc.com> wrote:
>
>> Hi Christian,
>>
>>
>> I ran into this problem last month; after some chasing I thought it was
>> possibly a bug in the Datastax driver, which I'm also using. The CQL
>> protocol itself supports dynamic TTLs fine.
>>
>>
>> One workaround that seems to work is to use an unnamed bind marker for
>> the TTL ('?') and then set it using the "[ttl]" reserved name as the bind
>> marker name ('setLong("[ttl]", myTtl)'), which will set the correct field
>> in the bound statement.
>>
>>
>> Best,
>>
>> Nick​
>>
>>
>> --
>> *From:* horschi <hors...@gmail.com>
>> *Sent:* 08 March 2016 13:52
>> *To:* user@cassandra.apache.org
>> *Subject:* Dynamic TTLs / limits still not working in 2.2 ?
>>
>> Hi,
>>
>> according to CASSANDRA-4450
>> <https://issues.apache.org/jira/browse/CASSANDRA-4450> it should be
>> fixed, but I still can't use dynamic TTLs or limits in my CQL queries.
>>
>> Query:
>> update mytable set data=:data where ts=:ts and randkey=:randkey using ttl
>> :timetolive
>>
>> Exception:
>> Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
>> missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
>> at
>> com.datastax.driver.core.Responses$Error.asException(Responses.java:100)
>>
>> I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still
>> see this, even though the Jira ticket states fixVersion 2.0.
>>
>> Has anyone used this successfully? Am I doing something wrong or is there
>> still a bug?
>>
>> kind regards,
>> Christian
>>
>>
>> Tickets:
>> https://datastax-oss.atlassian.net/browse/JAVA-54
>> https://issues.apache.org/jira/browse/CASSANDRA-4450
>>
>>
>>
>


Re: Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Hi Nick,

I will try your workaround. Thanks a lot.

I was not expecting the Java-Driver to have a bug, because in the Jira
Ticket (JAVA-54) it says "not a problem". So I assumed there is nothing to
do to support it :-)

kind regards,
Christian

On Tue, Mar 8, 2016 at 2:56 PM, Nicholas Wilson <nicholas.wil...@realvnc.com
> wrote:

> Hi Christian,
>
>
> I ran into this problem last month; after some chasing I thought it was
> possibly a bug in the Datastax driver, which I'm also using. The CQL
> protocol itself supports dynamic TTLs fine.
>
>
> One workaround that seems to work is to use an unnamed bind marker for the
> TTL ('?') and then set it using the "[ttl]" reserved name as the bind
> marker name ('setLong("[ttl]", myTtl)'), which will set the correct field
> in the bound statement.
>
>
> Best,
>
> Nick​
>
>
> --
> *From:* horschi <hors...@gmail.com>
> *Sent:* 08 March 2016 13:52
> *To:* user@cassandra.apache.org
> *Subject:* Dynamic TTLs / limits still not working in 2.2 ?
>
> Hi,
>
> according to CASSANDRA-4450
> <https://issues.apache.org/jira/browse/CASSANDRA-4450> it should be
> fixed, but I still can't use dynamic TTLs or limits in my CQL queries.
>
> Query:
> update mytable set data=:data where ts=:ts and randkey=:randkey using ttl
> :timetolive
>
> Exception:
> Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
> missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
> at com.datastax.driver.core.Responses$Error.asException(Responses.java:100)
>
> I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still
> see this, even though the Jira ticket states fixVersion 2.0.
>
> Has anyone used this successfully? Am I doing something wrong or is there
> still a bug?
>
> kind regards,
> Christian
>
>
> Tickets:
> https://datastax-oss.atlassian.net/browse/JAVA-54
> https://issues.apache.org/jira/browse/CASSANDRA-4450
>
>
>


Dynamic TTLs / limits still not working in 2.2 ?

2016-03-08 Thread horschi
Hi,

according to CASSANDRA-4450
 it should be fixed,
but I still can't use dynamic TTLs or limits in my CQL queries.

Query:
update mytable set data=:data where ts=:ts and randkey=:randkey using ttl
:timetolive

Exception:
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:138
missing EOF at 'using' (...:ts and randkey=:randkey [using] ttl...)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:100)

I am using Cassandra 2.2 (using Datastax java driver 2.1.9) and I still see
this, even though the Jira ticket states fixVersion 2.0.

Has anyone used this successfully? Am I doing something wrong or is there
still a bug?

kind regards,
Christian


Tickets:
https://datastax-oss.atlassian.net/browse/JAVA-54
https://issues.apache.org/jira/browse/CASSANDRA-4450


Low compactionthroughput blocks reads?

2016-02-26 Thread horschi
Hi,

I just had a weird behaviour on one of our Cassandra nodes, which I would
like to share:

Short version:
My pending reads went up from ~0 to the hundreds when I reduced the
compactionthroughput from 16 to 2.


Long version:

One of our more powerful nodes had a few pending reads, while the other
ones didn't. So far nothing special.

Strangely neither CPU, nor IO Wait, nor disk-ops/s, nor C*-heap was
particularly high. So I was wondering.

That machine had two compactions and a validation (incremental) running, so
I set the compactionthroughput to 2. To my surprise, the pending reads went
up into the hundreds within 5-10 seconds. Setting the compactionthroughput
back to 16 brought the pending reads back to 0 (or at least close to zero).

I kept the compactionthroughput on 2 for less than a minute. So the issue
is not compactions falling behind.

I was able to reproduce this behaviour 5-10 times. The pending reads went
up every time I *de*creased the compactionthroughput. While the
compactionthroughput was at 16, I never observed even a two-digit pending
read count.

Unfortunately the machine does not show this behaviour any more. Also it
was only a single machine.



Our setup:
C* 2.2.5 with 256 vnodes + 9 nodes + incremental repair + 6GB heap


My question:
Did someone else ever observe such a behaviour?

Is it perhaps possible that the read-path shares a lock with
repair/compaction that waits on ThrottledReader while holding that lock?


kind regards,
Christian


Re: 3k sstables during a repair incremental !!

2016-02-10 Thread horschi
Hi Jean,

which Cassandra version do you use?

Incremental repair got much better in 2.2 (for us at least).

kind regards,
Christian

On Wed, Feb 10, 2016 at 2:33 PM, Jean Carlo  wrote:
> Hello guys!
>
> I am testing the repair inc in my Cassandra cluster. I am doing my test over
> these tables
>
> CREATE TABLE pns_nonreg_bench.cf3 (
> s text,
> sp int,
> d text,
> dp int,
> m map<text, text>,
> t timestamp,
> PRIMARY KEY (s, sp, d, dp)
> ) WITH CLUSTERING ORDER BY (sp ASC, d ASC, dp ASC)
>
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
>
> CREATE TABLE pns_nonreg_bench.cf1 (
> ise text PRIMARY KEY,
> int_col int,
> text_col text,
> ts_col timestamp,
> uuid_col uuid
> ) WITH bloom_filter_fp_chance = 0.01
>  AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.SnappyCompressor'}
>
> table cf1
> Space used (live): 665.7 MB
> table cf2
> Space used (live): 697.03 MB
>
> It happens that when I do repair -inc -par on these tables, cf2 hits a peak
> of 3k sstables. When the repair finishes, it takes 30 min or more to finish
> all the compactions and return to 6 sstables.
>
> I am a little concerned about whether this will happen in production. Is it
> normal?
>
> Saludos
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay


Re: 3k sstables during a repair incremental !!

2016-02-10 Thread horschi
Hi Jean,

we had the same issue, but on SizeTieredCompaction. During repair the
number of SSTables and pending compactions were exploding.

It not only affected latencies, at some point Cassandra ran out of heap.

After the upgrade to 2.2 things got much better.

regards,
Christian


On Wed, Feb 10, 2016 at 2:46 PM, Jean Carlo <jean.jeancar...@gmail.com> wrote:
> Hi Horschi !!!
>
> I have 2.1.12. But I think it is something related to the leveled compaction
> strategy. It is impressive that we went from 6 sstables to 3k sstables.
> I think this will affect the latency in production because of the number of
> compactions going on.
>
>
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Wed, Feb 10, 2016 at 2:37 PM, horschi <hors...@gmail.com> wrote:
>>
>> Hi Jean,
>>
>> which Cassandra version do you use?
>>
>> Incremental repair got much better in 2.2 (for us at least).
>>
>> kind regards,
>> Christian
>>
>> On Wed, Feb 10, 2016 at 2:33 PM, Jean Carlo <jean.jeancar...@gmail.com>
>> wrote:
>> > Hello guys!
>> >
>> > I am testing the repair inc in my custer cassandra. I am doing my test
>> > over
>> > these tables
>> >
>> > CREATE TABLE pns_nonreg_bench.cf3 (
>> > s text,
>> > sp int,
>> > d text,
>> > dp int,
>> > m map<text, text>,
>> > t timestamp,
>> > PRIMARY KEY (s, sp, d, dp)
>> > ) WITH CLUSTERING ORDER BY (sp ASC, d ASC, dp ASC)
>> >
>> > AND compaction = {'class':
>> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> > AND compression = {'sstable_compression':
>> > 'org.apache.cassandra.io.compress.SnappyCompressor'}
>> >
>> > CREATE TABLE pns_nonreg_bench.cf1 (
>> > ise text PRIMARY KEY,
>> > int_col int,
>> > text_col text,
>> > ts_col timestamp,
>> > uuid_col uuid
>> > ) WITH bloom_filter_fp_chance = 0.01
>> >  AND compaction = {'class':
>> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> > AND compression = {'sstable_compression':
>> > 'org.apache.cassandra.io.compress.SnappyCompressor'}
>> >
>> > table cf1
>> > Space used (live): 665.7 MB
>> > table cf2
>> > Space used (live): 697.03 MB
>> >
>> > It happens that when I do repair -inc -par on theses tables, cf2 got a
>> > pick
>> > of 3k sstables. When the repair finish, it takes 30 min or more to
>> > finish
>> > all the compactations and return to 6 sstable.
>> >
>> > I am a little concern about if this will happen on production. is it
>> > normal?
>> >
>> > Saludos
>> >
>> > Jean Carlo
>> >
>> > "The best way to predict the future is to invent it" Alan Kay
>
>


Re: 3k sstables during a repair incremental !!

2016-02-10 Thread horschi
btw: I am not saying incremental Repair in 2.1 is broken, but ... ;-)

On Wed, Feb 10, 2016 at 2:59 PM, horschi <hors...@gmail.com> wrote:
> Hi Jean,
>
> we had the same issue, but on SizeTieredCompaction. During repair the
> number of SSTables and pending compactions were exploding.
>
> It not only affected latencies, at some point Cassandra ran out of heap.
>
> After the upgrade to 2.2 things got much better.
>
> regards,
> Christian
>
>
> On Wed, Feb 10, 2016 at 2:46 PM, Jean Carlo <jean.jeancar...@gmail.com> wrote:
>> Hi Horschi !!!
>>
>> I have the 2.1.12. But I think it is something related to Level compaction
>> strategy. It is impressive that we passed from 6 sstables to 3k sstable.
>> I think this will affect the latency on production because the number of
>> compactions going on
>>
>>
>>
>> Best regards
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>> On Wed, Feb 10, 2016 at 2:37 PM, horschi <hors...@gmail.com> wrote:
>>>
>>> Hi Jean,
>>>
>>> which Cassandra version do you use?
>>>
>>> Incremental repair got much better in 2.2 (for us at least).
>>>
>>> kind regards,
>>> Christian
>>>
>>> On Wed, Feb 10, 2016 at 2:33 PM, Jean Carlo <jean.jeancar...@gmail.com>
>>> wrote:
>>> > Hello guys!
>>> >
>>> > I am testing the repair inc in my custer cassandra. I am doing my test
>>> > over
>>> > these tables
>>> >
>>> > CREATE TABLE pns_nonreg_bench.cf3 (
>>> > s text,
>>> > sp int,
>>> > d text,
>>> > dp int,
>>> > m map<text, text>,
>>> > t timestamp,
>>> > PRIMARY KEY (s, sp, d, dp)
>>> > ) WITH CLUSTERING ORDER BY (sp ASC, d ASC, dp ASC)
>>> >
>>> > AND compaction = {'class':
>>> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>> > AND compression = {'sstable_compression':
>>> > 'org.apache.cassandra.io.compress.SnappyCompressor'}
>>> >
>>> > CREATE TABLE pns_nonreg_bench.cf1 (
>>> > ise text PRIMARY KEY,
>>> > int_col int,
>>> > text_col text,
>>> > ts_col timestamp,
>>> > uuid_col uuid
>>> > ) WITH bloom_filter_fp_chance = 0.01
>>> >  AND compaction = {'class':
>>> > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>> > AND compression = {'sstable_compression':
>>> > 'org.apache.cassandra.io.compress.SnappyCompressor'}
>>> >
>>> > table cf1
>>> > Space used (live): 665.7 MB
>>> > table cf2
>>> > Space used (live): 697.03 MB
>>> >
>>> > It happens that when I do repair -inc -par on theses tables, cf2 got a
>>> > pick
>>> > of 3k sstables. When the repair finish, it takes 30 min or more to
>>> > finish
>>> > all the compactations and return to 6 sstable.
>>> >
>>> > I am a little concern about if this will happen on production. is it
>>> > normal?
>>> >
>>> > Saludos
>>> >
>>> > Jean Carlo
>>> >
>>> > "The best way to predict the future is to invent it" Alan Kay
>>
>>


Re: memory usage problem of Metadata.tokenMap.tokenToHost

2015-09-22 Thread horschi
Hi Joseph,

I think 2000 keyspaces might be just too much. Fewer keyspaces (and CFs)
will probably work much better.

kind regards,
Christian


On Tue, Sep 22, 2015 at 9:29 AM, joseph gao  wrote:

> Hi, anybody could help me?
>
> 2015-09-21 0:47 GMT+08:00 joseph gao :
>
>> PS: that's the code in the java driver, in Metadata.TokenMap.build:
>>
>> for (KeyspaceMetadata keyspace : keyspaces)
>> {
>>     ReplicationStrategy strategy = keyspace.replicationStrategy();
>>     Map<Token, Set<Host>> ksTokens = (strategy == null)
>>         ? makeNonReplicatedMap(tokenToPrimary)
>>         : strategy.computeTokenToReplicaMap(tokenToPrimary, ring);
>>
>>     tokenToHosts.put(keyspace.getName(), ksTokens);
>> }
>>
>> tokenToPrimary is always the same, ring is always the same, and if the
>> strategy is the same, strategy.computeTokenToReplicaMap returns the 'same'
>> map but as a different object (because every call returns a new HashMap).
>>
>> 2015-09-21 0:22 GMT+08:00 joseph gao :
>>
>>> cassandra: 2.1.7
>>> java driver: datastax java driver 2.1.6
>>>
>>> Here is the problem:
>>> My application uses 2000+ keyspaces and dynamically creates keyspaces and
>>> tables. In the java client, Metadata.tokenMap.tokenToHost then uses about
>>> 1 GB of memory, which causes a lot of full GCs.
>>> As far as I can see, the key of tokenToHost is the keyspace, and the value
>>> is a token-to-replica-nodes map.
>>>
>>> While trying to solve this problem, I noticed that all keyspaces have the
>>> same token-to-replica-nodes map. The replication strategy of all keyspaces
>>> is SimpleStrategy with replication factor 3.
>>>
>>> So, when keyspaces use the same strategy, could the tokenToHost values
>>> share a single map? That would greatly reduce the memory usage.
>>>
>>>  thanks a lot
>>>
>>> --
>>> --
>>> Joseph Gao
>>> PhoneNum:15210513582
>>> QQ: 409343351
>>>
>>
>>
>>
>> --
>> --
>> Joseph Gao
>> PhoneNum:15210513582
>> QQ: 409343351
>>
>
>
>
> --
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
>


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-10 Thread horschi
Hi Samuel,

thanks a lot for the jira link. Another reason to upgrade to 2.1 :-)

regards,
Christian



On Thu, Sep 10, 2015 at 1:28 PM, Samuel CARRIERE <samuel.carri...@urssaf.fr>
wrote:

> Hi Christian,
> The problem you mention (violation of consistency) is a true one. If I have
> understood correctly, it is resolved in cassandra 2.1 (see CASSANDRA-2434).
> Regards,
> Samuel
>
>
> horschi <hors...@gmail.com> a écrit sur 10/09/2015 12:41:41 :
>
> > De : horschi <hors...@gmail.com>
> > A : user@cassandra.apache.org,
> > Date : 10/09/2015 12:42
> > Objet : Re: Is it possible to bootstrap the 1st node of a new DC?
> >
> > Hi Rob,
> >
> > regarding 1-3:
> > Thank you for the step-by-step explanation :-) My mistake was to use
> > join_ring=false during the inital start already. It now works for me
> > as its supposed to. Nevertheless it does not what I want, as it does
> > not take writes during the time of repair/rebuild: Running an 8 hour
> > repair will lead to 8 hours of data missing.
> >
> > regarding 1-6:
> > This is what we did. And it works of course. Our issue was just that
> > we had some global-QUORUMS hidden somewhere, which the operator was
> > not aware of. Therefore it would have been nice if the ops guy could
> > prevent these reads by himself.
> >
> >
> > Another issue I think the current bootstrapping process has: Doesn't
> > it practically reduce the RF for old data by one? (With old data I
> > mean any data that was written before the bootstrap).
> >
> > Let me give an example:
> >
> > Lets assume I have a cluster of Node 1,2 and 3 with RF=3. And lets
> > assume a single write on node 2 got lost. So this particular write
> > is only available on node 1 and 3.
> >
> > Now I add node 4, which takes the range in such a way that node 1
> > will not own that previously written key any more. Also assume that
> > the new node loads its data from node 2.
> >
> > This means we have a cluster where the previously mentioned write is
> > only on node 3. (Node 1 is not responsible for the key any more and
> > node 4 loaded its data from the wrong node)
> >
> > Any quorum-read that hit node 2 & 4 will not return the column. So
> > this means we effectively lowered the CL/RF.
> >
> > Therefore what I would like to be able to do is:
> > - Add new node 4, but leave it in a joining state. (This means it
> > gets all the writes but does not serve reads.)
> > - Do "nodetool rebuild"
> > - New node should not serve reads yet. And node 1 should not yet
> > give up its ranges to node 4.
> > - Do "nodetool repair", to ensure consistency.
> > - Finish bootstrap. Now node1 should not be responsible for the
> > range and node4 should become eligible for reads.
> >
> > regards,
> > Christian
> >
> > On Tue, Sep 8, 2015 at 11:51 PM, Robert Coli <rc...@eventbrite.com>
> wrote:
> > On Tue, Sep 8, 2015 at 2:39 PM, horschi <hors...@gmail.com> wrote:
> > I tried to set up a new node with join_ring=false once. In my test
> > that node did not pick a token in the ring. I assume running repair
> > or rebuild would not do anything in that case: No tokens = no data.
> > But I must admit: I have not tried running rebuild.
> >
> > I admit I haven't been following this thread closely, perhaps I have
> > missed what exactly it is you're trying to do.
> >
> > It's possible you'd need to :
> >
> > 1) join the node with auto_bootstrap=false
> > 2) immediately stop it
> > 3) re-start it with join_ring=false
> >
> > To actually use repair or rebuild in this way.
> >
> > However, if your goal is to create a new data-center and rebuild a
> > node there without any risk of reading from that node while creating
> > the new data center, you can just :
> >
> > 1) create nodes in new data-center, with RF=0 for that DC
> > 2) change RF in that DC
> > 3) run rebuild on new data-center nodes
> > 4) while doing so, don't talk to new data-center coordinators from your
> client
> > 5) and also use LOCAL_ONE/LOCAL_QUORUM to avoid cross-data-center
> > reads from your client
> > 6) modulo the handful of current bugs which make 5) currently imperfect
> >
> > What problem are you encountering with this procedure? If it's this ...
> >
> > I've learned from experience that the node immediately joins the
> > cluster, and starts accepting reads (from other DCs) for the range it
> owns.
> >
> > This seems to be the incorrect assumption at the heart of the
> > confusion. You "should" be able to prevent this behavior entirely
> > via correct use of ConsistencyLevel and client configuration.
> >
> > In an ideal world, I'd write a detailed blog post explaining this...
> > :/ in my copious spare time...
> >
> > =Rob
> >
>


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-10 Thread horschi
Hi Rob,

regarding 1-3:
Thank you for the step-by-step explanation :-) My mistake was to use
join_ring=false during the initial start already. It now works for me as it's
supposed to. Nevertheless it does not do what I want, as it does not take
writes during the time of repair/rebuild: running an 8-hour repair will
lead to 8 hours of missing data.


regarding 1-6:
This is what we did. And it works of course. Our issue was just that we had
some global-QUORUMS hidden somewhere, which the operator was not aware of.
Therefore it would have been nice if the ops guy could prevent these reads
by himself.




Another issue I think the current bootstrapping process has: Doesn't it
practically reduce the RF for old data by one? (With old data I mean any
data that was written before the bootstrap).

Let me give an example:

Let's assume I have a cluster of nodes 1, 2 and 3 with RF=3. And let's assume a
single write on node 2 got lost. So this particular write is only available
on node 1 and 3.

Now I add node 4, which takes the range in such a way that node 1 will not
own that previously written key any more. Also assume that the new node
loads its data from node 2.

This means we have a cluster where the previously mentioned write is only
on node 3. (Node 1 is not responsible for the key any more and node 4
loaded its data from the wrong node)

Any quorum read that hits nodes 2 & 4 will not return the column. So this
means we effectively lowered the CL/RF.


Therefore what I would like to be able to do is:
- Add new node 4, but leave it in a joining state. (This means it gets all
the writes but does not serve reads.)
- Do "nodetool rebuild"
- New node should not serve reads yet. And node 1 should not yet give up
its ranges to node 4.
- Do "nodetool repair", to ensure consistency.
- Finish bootstrap. Now node1 should not be responsible for the range and
node4 should become eligible for reads.


regards,
Christian




On Tue, Sep 8, 2015 at 11:51 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Sep 8, 2015 at 2:39 PM, horschi <hors...@gmail.com> wrote:
>
>> I tried to set up a new node with join_ring=false once. In my test that
>> node did not pick a token in the ring. I assume running repair or rebuild
>> would not do anything in that case: No tokens = no data. But I must admit:
>> I have not tried running rebuild.
>>
>
> I admit I haven't been following this thread closely, perhaps I have
> missed what exactly it is you're trying to do.
>
> It's possible you'd need to :
>
> 1) join the node with auto_bootstrap=false
> 2) immediately stop it
> 3) re-start it with join_ring=false
>
> To actually use repair or rebuild in this way.
>
> However, if your goal is to create a new data-center and rebuild a node
> there without any risk of reading from that node while creating the new
> data center, you can just :
>
> 1) create nodes in new data-center, with RF=0 for that DC
> 2) change RF in that DC
> 3) run rebuild on new data-center nodes
> 4) while doing so, don't talk to new data-center coordinators from your
> client
> 5) and also use LOCAL_ONE/LOCAL_QUORUM to avoid cross-data-center reads
> from your client
> 6) modulo the handful of current bugs which make 5) currently imperfect
>
> What problem are you encountering with this procedure? If it's this ...
>
> I've learned from experience that the node immediately joins the cluster,
>> and starts accepting reads (from other DCs) for the range it owns.
>
>
> This seems to be the incorrect assumption at the heart of the confusion.
> You "should" be able to prevent this behavior entirely via correct use of
> ConsistencyLevel and client configuration.
>
> In an ideal world, I'd write a detailed blog post explaining this... :/ in
> my copious spare time...
>
> =Rob
>
>
>


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread horschi
Hi Tom,

"The idea of join_ring=false is that other nodes are not aware of the new
node, and therefore never send requests to it. The new node can then be
repaired"
Nicely explained, but I still see the issue that this node would not
receive writes during that time. So after the repair the node would still
miss data.
Again, what is needed is either some joining-state or write-survey that
allows disabling reads, but still accepts writes.



"To set up a new DC, I was hoping that you could also rebuild (instead of a
repair) a new node while join_ring=false, but that seems not to work."
Correct. The node does not get any tokens with join_ring=false. And again,
your node won't receive any writes while you are rebuilding. Therefore you
will have outdated data at the point when you are done rebuilding.


kind regards,
Christian





On Tue, Sep 8, 2015 at 10:00 AM, Tom van den Berge <
tom.vandenbe...@gmail.com> wrote:

> "one drawback: the node joins the cluster as soon as the bootstrapping
>> begins."
>> I am not sure I understand this correctly. It will get tokens, but not
>> load data if you combine it with autobootstrap=false.
>>
> Joining the cluster means that all other nodes become aware of the new
> node, and therefore it might receive reads. And yes, it will not have any
> data, because auto_bootstrap=false.
>
>
>
>> How I see it: You should be able to start all the new nodes in the new DC
>> with autobootstrap=false and survey-mode=true. Then you should have a new
>> DC with nodes that have tokens but no data. Then you can start rebuild on
>> all new nodes. During this process, the new nodes should get writes, but
>> not serve reads.
>>
> Maybe you're right.
>
>
>>
>> "It turns out that join_ring=false in this scenario does not solve this
>> problem"
>> I also don't see how joing_ring would help here. (Actually I have no clue
>> where you would ever need that option)
>>
> The idea of join_ring=false is that other nodes are not aware of the new
> node, and therefore never send requests to it. The new node can then be
> repaired (see https://issues.apache.org/jira/browse/CASSANDRA-6961). To
> set up a new DC, I was hoping that you could also rebuild (instead of a
> repair) a new node while join_ring=false, but that seems not to work.
>
>>
>>
>> "Currently I'm trying to auto_bootstrap my new DC. The good thing is that
>> it doesn't accept reads from other DCs."
>> The joining-state actually works perfectly. The joining state is a state
>> where node take writes, but not serve ready. It would be really cool if you
>> could boot a node into the joining state. Actually, write_survey should
>> basically be the same.
>>
> It would be great if you could select the DC from where it's bootstrapped,
> similar to nodetool rebuild. I'm currently bootstrapping a node in
> San-Jose. It decides to stream all data from another DC in Amsterdam, while
> we also have another DC in San-Jose, right next to it. Streaming data
> across the Atlantic takes a lot more time :(
>
>
>
>>
>> kind regards,
>> Christian
>>
>> PS: I would love to see the results, if you perform any tests on the
>> write-survey. Please share it here on the mailing list :-)
>>
>>
>>
>> On Mon, Sep 7, 2015 at 11:10 PM, Tom van den Berge <
>> tom.vandenbe...@gmail.com> wrote:
>>
>>> Hi Christian,
>>>
>>> No, I never tried survey mode. I didn't know it until now, but form the
>>> info I was able to find it looks like it is meant for a different purpose.
>>> Maybe it can be used to bootstrap a new DC, though.
>>>
>>> On the other hand, the auto_bootstrap=false + rebuild scenario seems to
>>> be designed to do exactly what I need, except that it has one drawback: the
>>> node joins the cluster as soon as the bootstrapping begins.
>>>
>>> It turns out that join_ring=false in this scenario does not solve this
>>> problem, since nodetool rebuild does not do anything if C* is started with
>>> this option.
>>>
>>> A workaround could be to ensure that only LOCAL_* CL is used by all
>>> clients, but even then I'm seeing failed queries, because they're
>>> mysteriously routed to the new DC every now and then.
>>>
>>> Currently I'm trying to auto_bootstrap my new DC. The good thing is that
>>> it doesn't accept reads from other DCs. The bad thing is that a) I can't
>>> choose where it streams its data from, and b) the two nodes I've been
>>> trying to bootstrap crashed when they were almost finished...
>>>

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread horschi
Hi Robert,

I tried to set up a new node with join_ring=false once. In my test that
node did not pick a token in the ring. I assume running repair or rebuild
would not do anything in that case: No tokens = no data. But I must admit:
I have not tried running rebuild.

Is a new node with join_ring=false supposed to pick tokens? From driftx's
comment in CASSANDRA-6961 I take it that it should not.

Tom: What does "nodetool status" say after you started the new node
with join_ring=false?
In my test I got a node that was not in the ring at all.

kind regards,
Christian



On Tue, Sep 8, 2015 at 9:05 PM, Robert Coli <rc...@eventbrite.com> wrote:

>
>
> On Tue, Sep 8, 2015 at 1:39 AM, horschi <hors...@gmail.com> wrote:
>
>> "The idea of join_ring=false is that other nodes are not aware of the
>> new node, and therefore never send requests to it. The new node can then be
>> repaired"
>> Nicely explained, but I still see the issue that this node would not
>> receive writes during that time. So after the repair the node would still
>> miss data.
>> Again, what is needed is either some joining-state or write-survey that
>> allows disabling reads, but still accepts writes.
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6961
> "
> We can *almost* set join_ring to false, then repair, and then join the
> ring to narrow the window (actually, you can do this and everything
> succeeds because the node doesn't know it's a member yet, which is probably
> a bit of a bug.) If instead we modified this to put the node in hibernate,
> like replace_address does, it could work almost like replace, except you
> could run a repair (manually) while in the hibernate state, and then flip
> to normal when it's done.
> "
>
> Since 2.0.7, you should be able to use join_ring=false + repair to do the
> operation this thread discusses.
>
> Has anyone here tried and found it wanting? If so, in what way?
>
> For the record, I find various statements in this thread confusing and
> likely to be wrong :
>
> " And again, your node won't receive any writes while you are rebuilding.
>>  "
>
>
> If your RF has been increased in the new DC, sure you will, you'll get the
> writes you're supposed to get because of your RF? The challenge with
> rebuild is premature reads from the new DC, not losing writes?
>
> Running nodetool rebuild on a node that was started with join_ring=false
>> does not work, unfortunately. The nodetool command returns immediately,
>> after a message appears in the log that the streaming of data has started.
>> After that, nothing happens.
>
>
> Per driftx, the author of CASSANDRA-6961, this sounds like a bug. If you
> can repro, please file a JIRA and let the list know the URL.
>
> =Rob
>
>
>


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-07 Thread horschi
Hi Tom,

"one drawback: the node joins the cluster as soon as the bootstrapping
begins."
I am not sure I understand this correctly. It will get tokens, but not load
data if you combine it with auto_bootstrap=false.

How I see it: You should be able to start all the new nodes in the new DC
with auto_bootstrap=false and write_survey=true. Then you should have a new
DC with nodes that have tokens but no data. Then you can start rebuild on
all new nodes. During this process, the new nodes should get writes, but
not serve reads.

Disclaimer: I have not tested the combination of the two!



"It turns out that join_ring=false in this scenario does not solve this
problem"
I also don't see how join_ring would help here. (Actually I have no clue
where you would ever need that option.)


"A workaround could be to ensure that only LOCAL_* CL is used by all
clients, but even then I'm seeing failed queries, because they're
mysteriously routed to the new DC every now and then."
Yes, it works fine if you don't make any mistakes. Keep in mind you also have
to make sure your driver does not connect to the other DC. But I agree
with you: it's a workaround for this scenario. To me this does not feel
correct.



"Currently I'm trying to auto_bootstrap my new DC. The good thing is that
it doesn't accept reads from other DCs."
The joining state actually works perfectly. The joining state is a state
where nodes take writes but do not serve reads. It would be really cool if you
could boot a node into the joining state. Actually, write_survey should
basically be the same.

kind regards,
Christian

PS: I would love to see the results, if you perform any tests on the
write-survey. Please share it here on the mailing list :-)



On Mon, Sep 7, 2015 at 11:10 PM, Tom van den Berge <
tom.vandenbe...@gmail.com> wrote:

> Hi Christian,
>
> No, I never tried survey mode. I didn't know it until now, but form the
> info I was able to find it looks like it is meant for a different purpose.
> Maybe it can be used to bootstrap a new DC, though.
>
> On the other hand, the auto_bootstrap=false + rebuild scenario seems to be
> designed to do exactly what I need, except that it has one drawback: the
> node joins the cluster as soon as the bootstrapping begins.
>
> It turns out that join_ring=false in this scenario does not solve this
> problem, since nodetool rebuild does not do anything if C* is started with
> this option.
>
> A workaround could be to ensure that only LOCAL_* CL is used by all
> clients, but even then I'm seeing failed queries, because they're
> mysteriously routed to the new DC every now and then.
>
> Currently I'm trying to auto_bootstrap my new DC. The good thing is that
> it doesn't accept reads from other DCs. The bad thing is that a) I can't
> choose where it streams its data from, and b) the two nodes I've been
> trying to bootstrap crashed when they were almost finished...
>
>
>
> On Mon, Sep 7, 2015 at 10:22 PM, horschi <hors...@gmail.com> wrote:
>
>> Hi Tom,
>>
>> this sounds very much like my thread: "auto_bootstrap=false broken?"
>>
>> Did you try booting the new node with survey-mode? I wanted to try this,
>> but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
>> versions). Imho survey mode is what you (and me too) want: start a node,
>> accepting writes, but not serving reads. I have not tested it yet, but I
>> think it should work.
>>
>> Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.
>>
>> kind regards,
>> Christian
>>
>> On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <t...@drillster.com>
>> wrote:
>>
>>> Running nodetool rebuild on a node that was started with join_ring=false
>>> does not work, unfortunately. The nodetool command returns immediately,
>>> after a message appears in the log that the streaming of data has started.
>>> After that, nothing happens.
>>>
>>> Tom
>>>
>>>
>>> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <t...@drillster.com>
>>>> wrote:
>>>>
>>>>> Wouldn't it be far more efficient if a node that is rebuilding itself
>>>>> is responsible for not accepting reads until the rebuild is complete? E.g.
>>>>> by marking it as "Joining", similar to a node that is being bootstrapped?
>>>>>
>>>>
>>>> Yes, and Cassandra 2.0.7 and above contain this long desired
>>>> functionality.
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>>>
>>>> I presume that one can also run a rebuild in this state, though I
>>>> haven't tried. Driftx gives it an 80% chance... try it and see and let us
>>>> know? :D
>>>>
>>>> =Rob
>>>>
>>>>
>>>
>>>
>>>
>>
>


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-07 Thread horschi
Hi Tom,

this sounds very much like my thread: "auto_bootstrap=false broken?"

Did you try booting the new node with survey-mode? I wanted to try this,
but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
versions). Imho survey mode is what you (and me too) want: start a node,
accepting writes, but not serving reads. I have not tested it yet, but I
think it should work.

Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.

kind regards,
Christian

On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge 
wrote:

> Running nodetool rebuild on a node that was started with join_ring=false
> does not work, unfortunately. The nodetool command returns immediately,
> after a message appears in the log that the streaming of data has started.
> After that, nothing happens.
>
> Tom
>
>
> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli  wrote:
>
>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge 
>> wrote:
>>
>>> Wouldn't it be far more efficient if a node that is rebuilding itself is
>>> responsible for not accepting reads until the rebuild is complete? E.g. by
>>> marking it as "Joining", similar to a node that is being bootstrapped?
>>>
>>
>> Yes, and Cassandra 2.0.7 and above contain this long desired
>> functionality.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>
>> I presume that one can also run a rebuild in this state, though I haven't
>> tried. Driftx gives it an 80% chance... try it and see and let us know? :D
>>
>> =Rob
>>
>>
>
>
>


Re: auto_bootstrap=false broken?

2015-08-07 Thread horschi
Hi Jeff,


You’re trying to force your view onto an established ecosystem.

It is not my intent to force anyone to do anything. I apologize if my title
was too provocative. I just wanted to clickbait ;-)


It’s not “wrong only because its currently bootstrapping”, it’s not
 bootstrapping at all, you told it not to bootstrap.

Let me correct myself. It should be: it's wrong because it isn't
bootstrapped. But that does not change what I am proposing: it still
should not serve reads.


‘auto_bootstrap’ is the knob that tells cassandra whether or not you want
 to stream data from other replicas when you join the ring. Period. That’s
 all it does. If you set it to false, you’re telling cassandra it already
 has the data. The switch implies nothing else. There is no option to “join
 the ring but don’t serve reads until I tell you it’s ready”, and changing
 auto-bootstrap to be that is unlikely to ever happen.

I know that it does only that. But I would have made a different design
decision (to not serve reads in such a state).



 Don’t want to serve reads? Disable thrift and native proto, start the with
 auto-bootstrap set to whatever you want but thrift and native proto
 disabled, then enable thrift and native proto again to enable reads from
 clients when ready. Until then, make sure you’re using a consistency level
 appropriate for your requirements.

Of course it can be worked around. I just think it's error-prone to do that
manually. That is why I was proposing a change.



 You’re mis-using a knob that doesn’t do what you think it does, and is
 unlikely to ever be changed to do what you think it should.

I wanted to change the definition of what auto_bootstrap=false is. I don't
know if that makes it better or worse ;-)


I hope I did not consume too much of your time. Thanks for all the
responses. I will experiment a bit with write_survey and see if it already
does what I need.

kind regards,
Christian


Re: auto_bootstrap=false broken?

2015-08-07 Thread horschi
Hi Cyril,

thanks for backing me up. I'm under siege from all sides here ;-)


That's something we're trying to do too. However, disabling client
 connections (closing the thrift and native ports) does not prevent other nodes
 (acting as coordinators) from sending requests to it ... Honestly, we'd like to
 restart a node that still needs to catch up on hinted handoff (HH) and have it
 serve reads only when that's done. And to be more precise, we know when it's
 done and don't need it to work by itself (automatically).

If you use auto_bootstrap=false in the same DC, then I think you are
screwed. Afaik auto_bootstrap=false can only work in a new DC, where you
can control reads via LOCAL_* consistency levels.


kind regards,
Christian


Re: auto_bootstrap=false broken?

2015-08-06 Thread horschi
Hi Rob,



 Your asking the wrong nodes for data in the rebuild-a-new-DC case does not
 indicate a problem with the auto_bootstrap false + rebuild paradigm.


The node is wrong only because it's currently bootstrapping. So imho
Cassandra should not serve any reads in such a case.




 What makes you think you should be using it in any case other than
 rebuild-new-DC or I restored my node's data with tablesnap?


These two cases are my example use cases. I don't know if there are any
other cases; perhaps there are.




 As the bug I pasted you indicates, if you want to repair a node before
 having it join, use join_ring=false, repair it, and then join.

 https://issues.apache.org/jira/browse/CASSANDRA-6961

 In what way does this functionality not meet your needs?


I see two reasons why join_ring=false does not help with my issue or I
misunderstand:

- When I set up a new node and I start it with join_ring=false, then it
does not get any tokens, right? (I tried it, and that was what I got)  How
can I run nodetool repair when the node doesn't have any tokens?

- A node that is not joining the ring will not receive any writes, right?
So if I run repair in an unjoined state for X hours, then I will miss X
hours' worth of data afterwards.


The only solution I see seems to be write_survey. I will do some tests with
it, once 2.0.17 is out. I will post my results :-)

kind regards,
Christian


Re: auto_bootstrap=false broken?

2015-08-05 Thread horschi
Hi Rob,

let me try to give examples why auto_bootstrap=false is dangerous:

Just yesterday I had the issue that we wanted to set up a new DC:
unfortunately we had one application that used CL.ONE (because it's only
querying static data and is read-heavy). That application stopped working
after we brought up the new DC, because it was querying against the new
nodes. We are now changing it to LOCAL_ONE, then it should be OK. But
nevertheless: I think it would have been cleaner if the new node had not
served reads in the first place. Instead, the operations people have to
worry about the applications using the correct CL.


Another, more general, issue with auto_bootstrap=false: when adding a new
node to an existing cluster, you are basically lowering your CL by one.
RF=3 with quorum will read from two nodes. One might be the newly added
node, which has no data. Then you are relying on a single node
to be 100% consistent.


So what I am trying to say is: every time you use auto_bootstrap=false, you
are entering a dangerous path. And I think this could be fixed if
auto_bootstrap=false would leave the node in a write-only state. Then the
operator could still decide to override it with nodetool.


Disclaimer: I am using C* 2.0.

kind regards,
Christian



On Tue, Aug 4, 2015 at 10:02 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Aug 4, 2015 at 11:40 AM, horschi hors...@gmail.com wrote:

 unless you specify auto_bootstrap=false :)


 ... so why are you doing that?

 Two experts are confused as to what you're trying to do; why do you think
 you need to do it?

 =Rob




auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi everyone,

I'll just ask my question as provocative as possible ;-)

Isn't auto_bootstrap=false broken the way it is currently implemented?

What currently happens:
New node starts with auto_bootstrap=false and it starts serving reads
immediately without having any data.

Would the following be more correct:
- New node should stay in a joining state
- Operator loads data (e.g. using nodetool rebuild or putting in backed-up
files or whatever)
- Operator has to manually switch from joining into normal state using
nodetool (only then it will start serving reads)

Wouldn't this behaviour be more consistent?

kind regards,
Christian


Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Paulo,

thanks for your feedback, but I think this is not what I am looking for.

Starting with join_ring does not take any tokens in the ring. And the
nodetool join afterwards will again do token-selection and data loading
in one step.

I would like to separate these steps:
1. assign tokens
2. have the node in a joining state, so that I can copy in data
3. mark the node as ready


I just saw that perhaps write_survey could be misused for that.

Did anyone ever use write_survey for such a partial bootstrapping?
Do I have to worry about data-loss when using multiple write_survey nodes
in one cluster?

kind regards,
Christian



On Tue, Aug 4, 2015 at 2:24 PM, Paulo Motta pauloricard...@gmail.com
wrote:

 Hello Christian,

 You may use the start-up parameter -Dcassandra.join_ring=false if you
 don't want the node to join the ring on startup. More about this parameter
 here:
 http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCUtility_t.html

 You can later join the ring via nodetool join command:
 http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsJoin.html

 auto_bootstrap=false is typically used to bootstrap new datacenters or
 clusters, or nodes with data already on it before starting the process.

 Cheers,

 Paulo

 2015-08-04 8:50 GMT-03:00 horschi hors...@gmail.com:

 Hi everyone,

 I'll just ask my question as provocative as possible ;-)

 Isnt't auto_bootstrap=false broken the way it is currently implemented?

 What currently happens:
 New node starts with auto_bootstrap=false and it starts serving reads
 immediately without having any data.

 Would the following be more correct:
 - New node should stay in a joining state
 - Operator loads data (e.g. using nodetool rebuild or putting in
 backupped files or whatever)
 - Operator has to manually switch from joining into normal state using
 nodetool (only then it will start serving reads)

 Wouldn't this behaviour more consistent?

 kind regards,
 Christian





Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Robert,

sorry for the confusion. Perhaps write_survey is not my solution
(unfortunately I can't get it to work, so I don't really know). I just
thought that it *could* be my solution.


What I actually want:
I want to be able to start a new node without it starting to serve reads
prematurely. I want Cassandra to wait for me to confirm everything is OK
before it serves reads.



Possible solutions so far:

A) When starting a new node with auto_bootstrap=false, then I get a node
that has no data, but serves reads. In my opinion it would be cleaner if it
would stay in a joining state where it only receives writes.

B) Disabling join_ring on my new node does nothing. The new node will not
have a token. I can't see it in nodetool status. Therefore I assume it
will not receive any writes.

C) write_survey unfortunately does not seem to work for me: my new node,
which I start in survey mode, gets writes from other nodes and shows as
joining in the ring, which is good! But it does not get a schema, so it
throws exceptions when receiving these writes. I assume it's just a bug in
2.0.




Disclaimer: I am using C* 2.0, with which I can't get the desired behaviour
(or at least I don't know how).

kind regards,
Christian




On Tue, Aug 4, 2015 at 7:12 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Aug 4, 2015 at 6:19 AM, horschi hors...@gmail.com wrote:

 I would like to separate these steps:
 1. assign tokens
 2. have the node in a joining state, so that I can copy in data
 3. mark the node as ready



 Did anyone ever use write_survey for such a partial bootstrapping?


 What you're asking doesn't make sense to me.

 What does partial bootstrap mean? Where are you getting the data from?
 How are you copying in data and why do you need the node to be in a
 joining state to do that?

 https://issues.apache.org/jira/browse/CASSANDRA-6961

 Explains a method by which you can repair a partially joined node. In what
 way does this differ from what you want?

 =Rob




Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Aeljami,

thanks for the ticket. I'll keep an eye on it.

I can't get the survey to work at all on 2.0 (I am not getting any schema
on the survey node). So I guess the survey is not going to be a solution
for now.

kind regards,
Christian


On Tue, Aug 4, 2015 at 3:29 PM, aeljami@orange.com wrote:

 I had problems with write_survey.

 I opened a bug :  https://issues.apache.org/jira/browse/CASSANDRA-9934



 *De :* horschi [mailto:hors...@gmail.com]
 *Envoyé :* mardi 4 août 2015 15:20
 *À :* user@cassandra.apache.org
 *Objet :* Re: auto_bootstrap=false broken?



 Hi Paulo,



 thanks for your feedback, but I think this is not what I am looking for.



 Starting with join_ring does not take any tokens in the ring. And the
 nodetool join afterwards will again do token-selection and data loading
 in one step.



 I would like to separate these steps:

 1. assign tokens

 2. have the node in a joining state, so that I can copy in data

 3. mark the node as ready





 I just saw that perhaps write_survey could be misused for that.



 Did anyone ever use write_survey for such a partial bootstrapping?

 Do I have to worry about data-loss when using multiple write_survey nodes
 in one cluster?



 kind regards,

 Christian







 On Tue, Aug 4, 2015 at 2:24 PM, Paulo Motta pauloricard...@gmail.com
 wrote:

 Hello Christian,

 You may use the start-up parameter -Dcassandra.join_ring=false if you
 don't want the node to join the ring on startup. More about this parameter
 here:
 http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCUtility_t.html

 You can later join the ring via nodetool join command:
 http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsJoin.html

 auto_bootstrap=false is typically used to bootstrap new datacenters or
 clusters, or nodes with data already on it before starting the process.

 Cheers,

 Paulo



 2015-08-04 8:50 GMT-03:00 horschi hors...@gmail.com:

 Hi everyone,



 I'll just ask my question as provocative as possible ;-)



 Isnt't auto_bootstrap=false broken the way it is currently implemented?



 What currently happens:

 New node starts with auto_bootstrap=false and it starts serving reads
 immediately without having any data.



 Would the following be more correct:

 - New node should stay in a joining state

 - Operator loads data (e.g. using nodetool rebuild or putting in backupped
 files or whatever)

 - Operator has to manually switch from joining into normal state using
 nodetool (only then it will start serving reads)



 Wouldn't this behaviour more consistent?



 kind regards,

 Christian





 _

 Ce message et ses pieces jointes peuvent contenir des informations 
 confidentielles ou privilegiees et ne doivent donc
 pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu 
 ce message par erreur, veuillez le signaler
 a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
 electroniques etant susceptibles d'alteration,
 Orange decline toute responsabilite si ce message a ete altere, deforme ou 
 falsifie. Merci.

 This message and its attachments may contain confidential or privileged 
 information that may be protected by law;
 they should not be distributed, used or copied without authorisation.
 If you have received this email in error, please notify the sender and delete 
 this message and its attachments.
 As emails may be altered, Orange is not liable for messages that have been 
 modified, changed or falsified.
 Thank you.




Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Jonathan,

unless you specify auto_bootstrap=false :)

kind regards,
Christian

On Tue, Aug 4, 2015 at 7:54 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 You're trying to solve a problem that doesn't exist.  Cassandra only
 starts serving reads when it's ready.

 On Tue, Aug 4, 2015 at 10:51 AM horschi hors...@gmail.com wrote:

 Hi Robert,

 sorry for the confusion. Perhaps write_survey is not my solution
 (unfortunetaly I cant get it to work, so I dont really know). I just
 thought that it *could* be my solution.


 What I actually want:
 I want to be able to start a new node, without it starting to serve reads
 prematurely. I want cassandra to wait for me to confirm everything is ok,
 now serve reads.



 Possible solutions so far:

 A) When starting a new node with auto_bootstrap=false, then I get a node
 that has no data, but serves reads. In my opinion it would be cleaner if it
 would stay in a joining state where it only receives writes.

 B) Disabling join_ring on my new node does nothing. The new node will not
 have a token. I cant see it in nodetool status. Therefore I assume it
 will not receive any writes.

 C) write_survey unfortunately does not seem to work for me: My new node,
 which I start with survey-mode, gets writes from other nodes and shows as
 joining in the ring. Which is good! But it does not get a schema, so it
 throws exceptions when receiving these writes. I assume it's just a bug in
 2.0.




 Disclaimer: I am using C* 2.0, with which I can't get the desired
 behaviour (or at least I don't know how).

 kind regards,
 Christian




 On Tue, Aug 4, 2015 at 7:12 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Aug 4, 2015 at 6:19 AM, horschi hors...@gmail.com wrote:

 I would like to separate these steps:
 1. assign tokens
 2. have the node in a joining state, so that I can copy in data
 3. mark the node as ready



 Did anyone ever use write_survey for such a partial bootstrapping?


 What you're asking doesn't make sense to me.

 What does partial bootstrap mean? Where are you getting the data from?
 How are you copying in data and why do you need the node to be in a
 joining state to do that?

 https://issues.apache.org/jira/browse/CASSANDRA-6961

 Explains a method by which you can repair a partially joined node. In
 what way does this differ from what you want?

 =Rob





Re: Truncate really slow

2015-07-01 Thread horschi
Hi,

you have to enable -Dcassandra.unsafesystem=true in cassandra-env.sh. Also
disable durable writes for your CFs.

This should speed things up and should reduce IOWait dramatically.
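For reference, a rough sketch of those two knobs (test clusters only; "myks" is
a placeholder keyspace, and depending on the CQL version you may have to
restate the replication clause in the ALTER):

# cassandra-env.sh: skip the syncing that makes flush/truncate slow on test boxes
JVM_OPTS="$JVM_OPTS -Dcassandra.unsafesystem=true"

# durable_writes is a keyspace-level switch in CQL
echo "ALTER KEYSPACE myks WITH durable_writes = false;" | cqlsh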

kind regards,
Christian

On Wed, Jul 1, 2015 at 11:52 PM, Robert Wille rwi...@fold3.com wrote:

 I have two test clusters, both 2.0.15. One has a single node and one has
 three nodes. Truncate on the three node cluster is really slow, but is
 quite fast on the single-node cluster. My test cases truncate tables before
 each test, and  95% of the time in my test cases is spent truncating
 tables on the 3-node cluster. Auto-snapshotting is off.

 I know there’s some coordination that has to occur when a truncate
 happens, but it seems really excessive. Almost one second to truncate each
 table with an otherwise idle cluster.

 Any thoughts?

 Thanks in advance

 Robert




Re: How to minimize Cassandra memory usage for test environment?

2015-06-09 Thread horschi
Hi Eax,

are you truncating/dropping tables between tests? Are your issues perhaps
related to that?

If you are, you should disable auto_snapshot and enable
-Dcassandra.unsafesystem=true to make it run smoother.

kind regards,
Christian

On Tue, Jun 9, 2015 at 11:25 AM, Jason Wee peich...@gmail.com wrote:

 for a start, maybe you can see the setting use by raspberry pi project,
 for instance
 http://ac31004.blogspot.com/2012/05/apache-cassandra-on-raspberry-pi.html

 you can look at these two files, to tune down the settings for test
 environment.
 cassandra-env.sh
 cassandra.yaml
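 For illustration, a rough sketch of the kind of low-memory overrides meant
 here (the values are only a starting point, not recommendations):

 # cassandra-env.sh: pin the heap small for a test VM
 MAX_HEAP_SIZE="500M"
 HEAP_NEWSIZE="100M"

 # cassandra.yaml: trade throughput for memory
 #   key_cache_size_in_mb: 0
 #   concurrent_reads: 2
 #   concurrent_writes: 2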

 hth

 jason

 On Tue, Jun 9, 2015 at 3:59 PM, Eax Melanhovich m...@eax.me wrote:

 Hello.

 We are running integration tests, using real Cassandra (not a mock)
 under Vagrant. MAX_HEAP_SIZE is set to 500M. As I discovered, lower
 value causes 'out of memory' after some time.

 Could memory usage be decreased somehow? Developers don't usually have
 a lot of free RAM and performance obviously is not an issue in this
 case.

 --
 Best regards,
 Eax Melanhovich
 http://eax.me/





Re: Query returning tombstones

2015-05-03 Thread horschi
Hi Jens,

thanks a lot for the link! Your ticket seems very similar to my request.

kind regards,
Christian


On Sat, May 2, 2015 at 2:25 PM, Jens Rantil jens.ran...@tink.se wrote:

 Hi Christian,

 I just know Sylvain explicitly stated he wasn't a fan of exposing
 tombstones here:
 https://issues.apache.org/jira/browse/CASSANDRA-8574?focusedCommentId=14292063&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14292063

 Cheers,
 Jens

 On Wed, Apr 29, 2015 at 12:43 PM, horschi hors...@gmail.com wrote:

 Hi,

 did anybody ever raise a feature request for selecting tombstones in
 CQL/thrift?

 It would be nice if I could use CQLSH to see where my tombstones are
 coming from. This would be much more convenient than using sstable2json.

 Maybe someone can point me to an existing jira-ticket, but I also
 appreciate any other feedback :-)

 regards,
 Christian




 --
 Jens Rantil
 Backend engineer
 Tink AB

 Email: jens.ran...@tink.se
 Phone: +46 708 84 18 32
 Web: www.tink.se

 Facebook https://www.facebook.com/#!/tink.se Linkedin
 http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary
  Twitter https://twitter.com/tink



Query returning tombstones

2015-04-29 Thread horschi
Hi,

did anybody ever raise a feature request for selecting tombstones in
CQL/thrift?

It would be nice if I could use CQLSH to see where my tombstones are coming
from. This would be much more convenient than using sstable2json.
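In the meantime, a rough sketch of the sstable2json detour (paths and names are
placeholders; if I remember the output format correctly, deleted cells carry a
"d" flag and expiring ones an "e" flag):

# dump one sstable and grep for tombstoned cells
sstable2json /var/lib/cassandra/data/MyKS/MyCF/MyKS-MyCF-ic-1-Data.db > /tmp/dump.json
grep '"d"' /tmp/dump.json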

Maybe someone can point me to an existing jira-ticket, but I also
appreciate any other feedback :-)

regards,
Christian


Re: How do you run integration tests for your cassandra code?

2014-10-13 Thread horschi
Hi Kevin,

I run my tests against my locally running Cassandra instance. I am not
using any framework, but simply truncate all my tables after/before each
test. With which I am quite happy.

You have to enable the unsafeSystem property, disable durable writes on the
CFs and disable auto-snapshot in the yaml for it to be fast.

kind regards,
Christian


On Mon, Oct 13, 2014 at 9:50 PM, Kevin Burton bur...@spinn3r.com wrote:

 Curious to see if any of you have an elegant solution here.

 Right now I’m using cassandra unit;

 https://github.com/jsevellec/cassandra-unit

 for my integration tests.

 The biggest problem is that it doesn’t support shutdown.  so I can’t stop
 or cleanup after cassandra between tests.

 I have other Java daemons that have the same problem.  For example,
 ActiveMQ doesn’t clean up after itself.

 I was *thinking* of using docker or vagrant to startup a daemon in a
 container, then shut it down between tests.

 But this seems difficult to setup and configure … as well as being not
 amazingly portable.

 Another solution is to use a test suite, and a setUp/tearDown that drops
 all tables created by a test.   This way you’re still on the same cassandra
 instance, but the tables are removed for each pass.

 Anyone have an elegant solution to this?

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




Re: MemtablePostFlusher and FlushWriter

2014-07-17 Thread horschi
Hi Ahmed,

for that you should increase the flush queue size setting in your
cassandra.yaml
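Assuming the setting meant here is the memtable flush queue, that would look
roughly like this in cassandra.yaml (the values are illustrative only):

# cassandra.yaml
memtable_flush_queue_size: 8    # default is 4
memtable_flush_writers: 2       # optional; default depends on the number of data directories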

kind regards,
Christian



On Thu, Jul 17, 2014 at 10:54 AM, Kais Ahmed k...@neteck-fr.com wrote:

 Thanks christian,

 I'll check on my side.

 Have you an idea about FlushWriter 'All time blocked'

 Thanks,


 2014-07-16 16:23 GMT+02:00 horschi hors...@gmail.com:

 Hi Ahmed,

 this exception is caused by you creating rows with a key-length of more
 than 64kb. Your key is 394920 bytes long it seems.

 Keys and column-names are limited to 64kb. Only values may be larger.

 I cannot say for sure if this is the cause of your high
 MemtablePostFlusher pending count, but I would say it is possible.

 kind regards,
 Christian

 PS: I still use good old thrift lingo.






 On Wed, Jul 16, 2014 at 3:14 PM, Kais Ahmed k...@neteck-fr.com wrote:

 Hi chris, christan,

 Thanks for reply, i'm not using DSE.

 I have in the log files, this error that appear two times.

 ERROR [FlushWriter:3456] 2014-07-01 18:25:33,607 CassandraDaemon.java
 (line 196) Exception in thread Thread[FlushWriter:3456,5,main]
 java.lang.AssertionError: 394920
 at
 org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:342)
 at
 org.apache.cassandra.db.ColumnIndex$Builder.maybeWriteRowHeader(ColumnIndex.java:201)
 at
 org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:188)
 at
 org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:133)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:202)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:187)
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:365)
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:318)
 at
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)


 It's the same error than this link
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201305.mbox/%3cbay169-w52699dd7a1c0007783f8d8a8...@phx.gbl%3E
 ,
 with the same configuration 2 nodes RF 2 with SimpleStrategy.

 Hope this help.

 Thanks,



 2014-07-16 1:49 GMT+02:00 Chris Lohfink clohf...@blackbirdit.com:

 The MemtablePostFlusher is also used for flushing non-cf backed (solr)
 indexes.  Are you using DSE and solr by chance?

 Chris

 On Jul 15, 2014, at 5:01 PM, horschi hors...@gmail.com wrote:

 I have seen this behavour when Commitlog files got deleted (or
 permissions were set to read only).

 MemtablePostFlusher is the stage that marks the Commitlog as flushed.
 When they fail it usually means there is something wrong with the commitlog
 files.

 Check your logfiles for any commitlog related errors.

 regards,
 Christian


 On Tue, Jul 15, 2014 at 7:03 PM, Kais Ahmed k...@neteck-fr.com wrote:

 Hi all,

 I have a small cluster (2 nodes RF 2)  running with C* 2.0.6 on I2
 Extra Large (AWS) with SSD disk,
 the nodetool tpstats shows many MemtablePostFlusher pending and
 FlushWriter All time blocked.

 The two nodes have the default configuration. All CF use size-tiered
 compaction strategy.

 There are 10 times more reads than writes (1300 reads/s and 150
 writes/s).


 ubuntu@node1:~$ nodetool tpstats
 Pool Name             Active   Pending   Completed   Blocked   All time blocked
 MemtablePostFlusher        1      1158      159590         0                  0
 FlushWriter                0         0       11568         0               1031

 ubuntu@node1:~$ nodetool compactionstats
 pending tasks: 90
 Active compaction remaining time :n/a


 ubuntu@node2:~$ nodetool tpstats
 Pool Name             Active   Pending   Completed   Blocked   All time blocked
 MemtablePostFlusher        1      1020       50987         0                  0
 FlushWriter                0         0        6672         0                948


 ubuntu@node2:~$ nodetool compactionstats
 pending tasks: 89
 Active compaction remaining time :n/a

 I think there is something wrong, thank you for your help.









Re: MemtablePostFlusher and FlushWriter

2014-07-16 Thread horschi
Hi Ahmed,

this exception is caused by you creating rows with a key-length of more
than 64kb. Your key is 394920 bytes long it seems.

Keys and column-names are limited to 64kb. Only values may be larger.

I cannot say for sure if this is the cause of your high MemtablePostFlusher
pending count, but I would say it is possible.

kind regards,
Christian

PS: I still use good old thrift lingo.






On Wed, Jul 16, 2014 at 3:14 PM, Kais Ahmed k...@neteck-fr.com wrote:

 Hi chris, christan,

 Thanks for reply, i'm not using DSE.

 I have in the log files, this error that appear two times.

 ERROR [FlushWriter:3456] 2014-07-01 18:25:33,607 CassandraDaemon.java
 (line 196) Exception in thread Thread[FlushWriter:3456,5,main]
 java.lang.AssertionError: 394920
 at
 org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:342)
 at
 org.apache.cassandra.db.ColumnIndex$Builder.maybeWriteRowHeader(ColumnIndex.java:201)
 at
 org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:188)
 at
 org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:133)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:202)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:187)
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:365)
 at
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:318)
 at
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)


 It's the same error than this link
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201305.mbox/%3cbay169-w52699dd7a1c0007783f8d8a8...@phx.gbl%3E
 ,
 with the same configuration 2 nodes RF 2 with SimpleStrategy.

 Hope this help.

 Thanks,



 2014-07-16 1:49 GMT+02:00 Chris Lohfink clohf...@blackbirdit.com:

 The MemtablePostFlusher is also used for flushing non-cf backed (solr)
 indexes.  Are you using DSE and solr by chance?

 Chris

 On Jul 15, 2014, at 5:01 PM, horschi hors...@gmail.com wrote:

 I have seen this behavour when Commitlog files got deleted (or
 permissions were set to read only).

 MemtablePostFlusher is the stage that marks the Commitlog as flushed.
 When they fail it usually means there is something wrong with the commitlog
 files.

 Check your logfiles for any commitlog related errors.

 regards,
 Christian


 On Tue, Jul 15, 2014 at 7:03 PM, Kais Ahmed k...@neteck-fr.com wrote:

 Hi all,

 I have a small cluster (2 nodes RF 2)  running with C* 2.0.6 on I2 Extra
 Large (AWS) with SSD disk,
 the nodetool tpstats shows many MemtablePostFlusher pending and
 FlushWriter All time blocked.

 The two nodes have the default configuration. All CF use size-tiered
 compaction strategy.

 There are 10 times more reads than writes (1300 reads/s and 150
 writes/s).


 ubuntu@node1:~$ nodetool tpstats
 Pool Name             Active   Pending   Completed   Blocked   All time blocked
 MemtablePostFlusher        1      1158      159590         0                  0
 FlushWriter                0         0       11568         0               1031

 ubuntu@node1:~$ nodetool compactionstats
 pending tasks: 90
 Active compaction remaining time :n/a


 ubuntu@node2:~$ nodetool tpstats
 Pool Name             Active   Pending   Completed   Blocked   All time blocked
 MemtablePostFlusher        1      1020       50987         0                  0
 FlushWriter                0         0        6672         0                948


 ubuntu@node2:~$ nodetool compactionstats
 pending tasks: 89
 Active compaction remaining time :n/a

 I think there is something wrong, thank you for your help.







Re: MemtablePostFlusher and FlushWriter

2014-07-15 Thread horschi
I have seen this behavour when Commitlog files got deleted (or permissions
were set to read only).

MemtablePostFlusher is the stage that marks the Commitlog as flushed. When
they fail it usually means there is something wrong with the commitlog
files.

Check your logfiles for any commitlog related errors.

regards,
Christian


On Tue, Jul 15, 2014 at 7:03 PM, Kais Ahmed k...@neteck-fr.com wrote:

 Hi all,

 I have a small cluster (2 nodes RF 2)  running with C* 2.0.6 on I2 Extra
 Large (AWS) with SSD disk,
 the nodetool tpstats shows many MemtablePostFlusher pending and
 FlushWriter All time blocked.

 The two nodes have the default configuration. All CF use size-tiered
 compaction strategy.

 There are 10 times more reads than writes (1300 reads/s and 150 writes/s).


 ubuntu@node1:~$ nodetool tpstats
 Pool Name             Active   Pending   Completed   Blocked   All time blocked
 MemtablePostFlusher        1      1158      159590         0                  0
 FlushWriter                0         0       11568         0               1031

 ubuntu@node1:~$ nodetool compactionstats
 pending tasks: 90
 Active compaction remaining time :n/a


 ubuntu@node2:~$ nodetool tpstats
 Pool Name             Active   Pending   Completed   Blocked   All time blocked
 MemtablePostFlusher        1      1020       50987         0                  0
 FlushWriter                0         0        6672         0                948


 ubuntu@node2:~$ nodetool compactionstats
 pending tasks: 89
 Active compaction remaining time :n/a

 I think there is something wrong, thank you for your help.




Re: Cassandra 2.0.8 MemoryMeter goes crazy

2014-06-16 Thread horschi
Hi again,

before people start replying here: I just reported a Jira ticket:
https://issues.apache.org/jira/browse/CASSANDRA-7401

I think Memtable.maybeUpdateLiveRatio() needs some love.

kind regards,
Christian



On Sat, Jun 14, 2014 at 10:02 PM, horschi hors...@gmail.com wrote:

 Hi everyone,

 this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
 All 3 nodes were upgraded. SStables are upgraded.

 Unfortunately we are now experiencing that Cassandra starts to hang every
 10 hours or so.

 We can see the MemoryMeter being very active, every time it is hanging.
 Both in tpstats and in the system.log:

  INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481)
 CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0
 (just-counted was 64.0).  calculation took 0ms for 0 cells

 This line is logged hundreds of times per second (!) when Cassandra is
 down. CPU is 100% busy.

 Interestingly this is only logged for this particular Columnfamily. This
 CF is used as a queue, which only contains a few entries (datafiles are
 about 4kb, only ~100 keys, usually 1-2 active, 98-99 tombstones).

 Table: ResponsePortal
 SSTable count: 1
 Space used (live), bytes: 4863
 Space used (total), bytes: 4863
 SSTable Compression Ratio: 0.9545454545454546
 Number of keys (estimate): 128
 Memtable cell count: 0
 Memtable data size, bytes: 0
 Memtable switch count: 1
 Local read count: 0
 Local read latency: 0.000 ms
 Local write count: 5
 Local write latency: 0.000 ms
 Pending tasks: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used, bytes: 176
 Compacted partition minimum bytes: 43
 Compacted partition maximum bytes: 50
 Compacted partition mean bytes: 50
 Average live cells per slice (last five minutes): 0.0
 Average tombstones per slice (last five minutes): 0.0


 Table: ResponsePortal
 SSTable count: 1
 Space used (live), bytes: 4765
 Space used (total), bytes: 5777
 SSTable Compression Ratio: 0.75
 Number of keys (estimate): 128
 Memtable cell count: 0
 Memtable data size, bytes: 0
 Memtable switch count: 12
 Local read count: 0
 Local read latency: 0.000 ms
 Local write count: 1096
 Local write latency: 0.000 ms
 Pending tasks: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used, bytes: 16
 Compacted partition minimum bytes: 43
 Compacted partition maximum bytes: 50
 Compacted partition mean bytes: 50
 Average live cells per slice (last five minutes): 0.0
 Average tombstones per slice (last five minutes): 0.0


 Has anyone ever seen this or has an idea what could be wrong? It seems
 that 2.0 cannot handle this column family as well as 1.2 could.

 Any hints on what could be wrong are greatly appreciated :-)

 Cheers,
 Christian



Re: Cassandra 2.0.8 MemoryMeter goes crazy

2014-06-16 Thread horschi
Hi Robert,

sorry, I am using our own internal terminology :-)

The entire cluster was upgraded. All 3 nodes of that cluster are on 2.0.8
now.

About the issue:
To me it looks like there is something wrong in the Memtable class. Some
very special edge case on CFs that are updated rarely. I can't say if it is
new to 2.0 or if it already existed in 1.2.

About running mixed versions:
I thought running mixed versions is ok. Running repair with mixed versions
is not though. Right?

kind regards,
Christian



On Mon, Jun 16, 2014 at 7:50 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jun 14, 2014 at 1:02 PM, horschi hors...@gmail.com wrote:

 this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
 All 3 nodes were upgraded. SStables are upgraded.


 One of your *clusters* or one of your *systems*?

 Running with split major versions is not supported.

 =Rob



Cassandra 2.0.8 MemoryMeter goes crazy

2014-06-14 Thread horschi
Hi everyone,

this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
All 3 nodes were upgraded. SStables are upgraded.

Unfortunately we are now experiencing that Cassandra starts to hang every
10 hours or so.

We can see the MemoryMeter being very active, every time it is hanging.
Both in tpstats and in the system.log:

 INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481)
CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0
(just-counted was 64.0).  calculation took 0ms for 0 cells

This line is logged hundreds of times per second (!) when Cassandra is
down. CPU is 100% busy.

Interestingly this is only logged for this particular Columnfamily. This CF
is used as a queue, which only contains a few entries (datafiles are about
4kb, only ~100 keys, usually 1-2 active, 98-99 tombstones).

Table: ResponsePortal
SSTable count: 1
Space used (live), bytes: 4863
Space used (total), bytes: 4863
SSTable Compression Ratio: 0.9545454545454546
Number of keys (estimate): 128
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 1
Local read count: 0
Local read latency: 0.000 ms
Local write count: 5
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used, bytes: 176
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 50
Compacted partition mean bytes: 50
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0


Table: ResponsePortal
SSTable count: 1
Space used (live), bytes: 4765
Space used (total), bytes: 5777
SSTable Compression Ratio: 0.75
Number of keys (estimate): 128
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 12
Local read count: 0
Local read latency: 0.000 ms
Local write count: 1096
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used, bytes: 16
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 50
Compacted partition mean bytes: 50
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0


Has anyone ever seen this or has an idea what could be wrong? It seems that
2.0 cannot handle this column family as well as 1.2 could.

Any hints on what could be wrong are greatly appreciated :-)

Cheers,
Christian


Does NetworkTopologyStrategy in Cassandra 2.0 work?

2014-04-22 Thread horschi
Hi,

is it possible that NetworkTopologyStrategy does not work with Cassandra
2.0 any more?

I just updated my Dev Cluster to 2.0.7 and got UnavailableExceptions for
CQL/Thrift queries on my already existing column families, even though all
(two) nodes were up. Changing to SimpleStrategy fixed the issue.

Also I cannot switch back to NetworkTopologyStrategy:

[default@unknown] update keyspace MYKS with placement_strategy =
'NetworkTopologyStrategy';
Error constructing replication strategy class

[default@unknown] update keyspace MYKS with placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy';
Error constructing replication strategy class


This does not seem to be something I encountered with 1.2 before. Can
anyone tell me which one is broken here, Cassandra or myself? :-)

cheers,
Christian


Re: Does NetworkTopologyStrategy in Cassandra 2.0 work?

2014-04-22 Thread horschi
Ok, it seems 2.0 now is simply stricter about datacenter names. I simply
had to change the datacenter name to match the name in nodetool ring:

update keyspace MYKS with placement_strategy = 'NetworkTopologyStrategy'
and strategy_options = {datacenter1 : 2};

So the schema was wrong, but 1.2 did not care about it.
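A quick way to double-check the name before touching the schema (a sketch; the
grep just picks out the header line, and the exact output varies by version):

# the datacenter name reported by the snitch must match strategy_options exactly
nodetool status | grep Datacenter
# Datacenter: datacenter1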

cheers,
Christian


On Tue, Apr 22, 2014 at 1:51 PM, horschi hors...@gmail.com wrote:

 Hi,

 is it possible that NetworkTopologyStrategy does not work with Cassandra
 2.0 any more?

 I just updated my Dev Cluster to 2.0.7 and got UnavailableExceptions for
 CQL/Thrift queries on my already existing column families, even though all
 (two) nodes were up. Changing to SimpleStrategy fixed the issue.

 Also I cannot switch back to NetworkTopologyStrategy:

 [default@unknown] update keyspace MYKS with placement_strategy =
 'NetworkTopologyStrategy';
 Error constructing replication strategy class

 [default@unknown] update keyspace MYKS with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy';
 Error constructing replication strategy class


 This does not seem to be something I encountered with 1.2 before. Can
 anyone tell me which one is broken here, Cassandra or myself? :-)

 cheers,
 Christian



Re: Expired column showing up

2014-02-17 Thread horschi
Hi Mahesh,

the problem is that every column is only tombstoned for as long as the
original column was valid.

So if the last update was only valid for 1 sec, then the tombstone will
also be valid for 1 second! If the previous value was valid for a longer time,
then this old value might reappear.

Maybe you can explain why you are doing this?

kind regards,
Christian



On Mon, Feb 17, 2014 at 6:18 PM, mahesh rajamani
rajamani.mah...@gmail.comwrote:

 Christain,

 Yes. Is it a problem?  Can you explain what happens in this scenario?

 Thanks
 Mahesh


 On Fri, Feb 14, 2014 at 3:07 PM, horschi hors...@gmail.com wrote:

 Hi Mahesh,

 is it possible you are creating columns with a long TTL, then update
 these columns with a smaller TTL?

 kind regards,
 Christian


 On Fri, Feb 14, 2014 at 3:45 PM, mahesh rajamani 
 rajamani.mah...@gmail.com wrote:

 Hi,

 I am using Cassandra 2.0.2 version. On a wide row (approx. 1
 columns), I expire a few columns by setting TTL as 1 second. At times these
 columns show up during slice query.

 When I have this issue, running count and get commands for that row
 using Cassandra cli it gives different column counts.

 But once I run flush and compact, the issue goes off and expired columns
 don't show up.

 Can someone provide some help on this issue.

 --
 Regards,
 Mahesh Rajamani





 --
 Regards,
 Mahesh Rajamani



Re: Expired column showing up

2014-02-14 Thread horschi
Hi Mahesh,

is it possible you are creating columns with a long TTL, then update these
columns with a smaller TTL?

kind regards,
Christian


On Fri, Feb 14, 2014 at 3:45 PM, mahesh rajamani
rajamani.mah...@gmail.comwrote:

 Hi,

 I am using Cassandra 2.0.2 version. On a wide row (approx. 1 columns),
 I expire a few columns by setting TTL as 1 second. At times these columns show
 up during slice query.

 When I have this issue, running count and get commands for that row using
 Cassandra cli it gives different column counts.

 But once I run flush and compact, the issue goes off and expired columns
 don't show up.

 Can someone provide some help on this issue.

 --
 Regards,
 Mahesh Rajamani



Re: Possible optimization: avoid creating tombstones for TTLed columns if updates to TTLs are disallowed

2014-01-28 Thread horschi
Hi Donald,

I am the one who reported the ticket you mentioned, so I kind of feel like I
should answer this :-)


 I presume the point is that GCable tombstones can still do work
 (preventing spurious writing from nodes that were down) but only until the
 data is flushed to disk.

I am not sure I understand this correctly. Could you rephrase that sentence?



 If the effective TTL exceeds gc_grace_seconds then the tombstone will be
 deleted anyway.

It's not even written (since CASSANDRA-4917). There is no delete on the
tombstone in that case.



  It occurred to me that if you never update the TTL of a column, then
 there should be no need for tombstones at all:  any replicas will have the
 same TTL.  So there'd be no risk of missed deletes.  You wouldn't even need
 GCable tombstones

I think so too. There should be no need for a tombstone at all if the
following condition are given:
- column was not deleted manually, but timed out by itself
- column was not updated in the last gc_grace days

If I am not mistaken, the second point would even be necessary for
CASSANDRA-4917 to be able to handle changing TTLs correctly: I think the
current implementation might break if a column gets updated with a smaller
TTL, or to be more precise when (old.creationdate + old.ttl) >
(new.creationdate + new.ttl) && new.ttl > gc_grace.


Imho, for any further tombstone-optimization to work, compaction would have
to be smarter:
 I think it should be able to track max(old.creationdate + old.ttl ,
new.creationdate + new.ttl) when merging columns. I have no idea if that is
possible though.



 So, if - and it's a big if - a table disallowed updates to TTL, then you
 could really optimize deletion of TTLed columns: you could do away with
 tombstones entirely.   If a table allows updates to TTL then it's possible
 a different node will have the row without the TTL and the tombstone would
 be needed.

I am not sure I understand this. My thrift understanding of cassandra is
that you cannot update the TTL, you can just update an entire column. Also
each column has its own TTL. There is no TTL on the row.


cheers,
Christian


Re: Cassandra unit testing becoming nearly impossible: suggesting alternative.

2013-12-25 Thread horschi
Hi Ed,

my opinion on unit testing with C* is: Use the real database, not any
embedded crap :-)

All you need are fast truncates, by which I mean:
JVM_OPTS=$JVM_OPTS -Dcassandra.unsafesystem=true
and
auto_snapshot: false

This setup works really nice for me (C* 1.1 and 1.2, have not tested 2.0
yet).

Imho this setup is better for multiple reasons:
- No extra classpath issues
- Faster: Running JUnits and C* in one JVM would require a really large
heap (for me at least).
- Faster: No Cassandra startup every time I run my tests.

The only downside is that developers must change the properties in their
configs.

cheers,
Christian



On Tue, Dec 24, 2013 at 9:31 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 I am not sure there how many people have been around developing Cassandra
 for as long as I have, but the state of all the client libraries and the
 cassandra server is WORD_I_DONT_WANT_TO_SAY.

 Here is an example of something I am seeing:
 ERROR 14:59:45,845 Exception in thread Thread[Thrift:5,5,main]
 java.lang.AbstractMethodError:
 org.apache.thrift.ProcessFunction.isOneway()Z
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:51)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 DEBUG 14:59:51,654 retryPolicy for schema_triggers is 0.99

 In short: If you are new to cassandra and only using the newest client I
 am sure everything is peachy for you.

 For people that have been using Cassandra for a while it is harder to
 jump ship when something better comes along. You need sometimes to
 support both hector and astyanax, it happens.

 For a while I have been using hector. Even not to use hector as an API,
 but the one nice thing I got from hector was a simple EmbeddedServer that
 would clean up after itself. Hector seems badly broken at the moment. I
 have no idea how the current versions track with anything out there in the
 cassandra world.

 For a while I played with https://github.com/Netflix/astyanax, which has
 it's own version and schemes and dependent libraries. (astyanax has some
 packaging error that forces me into maven3)

 Enter cassandra 2.0 which forces you into java 0.7. Besides that it has
 it's own kit of things it seems to want.

 I am guessing since hectors embedded server does not work, and I should go
 to https://github.com/jsevellec/cassandra-unit not sure...really...how
 anyone does this anymore. I am sure I could dive into the source code and
 figure this out, but I would just rather have a stable piece of code that
 brings up the embedded server that just works and continues working.

 I can not seem to get this working right either. (since it includes hector
 I see from the pom)

 Between thrift, cassandra,client x, it is almost impossible to build a
 sane classpath, and that is not even counting the fact that people have
 their own classpath issues (with guava mismatches etc).

 I think the only sane thing to do is start shipping cassandra-embedded
 like this:

 https://github.com/kstyrc/embedded-redis

 In other words package embedded-cassandra as a binary. Don't force the
 client/application developer to bring cassandra on the classpath and fight
 with mismatches in thrift/guava etc. That or provide a completely shaded
 cassandra server for embedded testing. As it stands now trying to support a
 setup that uses more than one client or works with multiple versions of
 cassandra is major pita.  (aka library x compiled against 1.2.0 library y
 compiled against 2.0.3)

 Does anyone have any thoughts on this, or tried something similar?

 Edward




Offline migration: Random-Murmur

2013-12-23 Thread horschi
Hi list,

has anyone ever tried to migrate a cluster from Random to Murmur?

We would like to do so, to have a more standardized setup. I wrote a small
(yet untested) utility, which should be able to read SSTable files from
disk and write them into a cassandra cluster using Hector. This migration
would be offline of course and would only work for smaller clusters.

Any thoughts on the topic?

kind regards,
Christian

PS: The reason for doing so are not performance. It is to simplify
operational stuff for the years to come. :-)
import java.io.File;
import java.io.FilenameFilter;
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

import me.prettyprint.cassandra.serializers.ByteBufferSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ThriftCfDef;
import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

import org.apache.cassandra.config.CFMetaData;
import org.apache.cassandra.config.Config;
import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.config.KSMetaData;
import org.apache.cassandra.config.Schema;
import org.apache.cassandra.db.Column;
import org.apache.cassandra.db.CounterColumn;
import org.apache.cassandra.db.DeletedColumn;
import org.apache.cassandra.db.ExpiringColumn;
import org.apache.cassandra.db.OnDiskAtom;
import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
import org.apache.cassandra.db.filter.QueryFilter;
import org.apache.cassandra.db.filter.QueryPath;
import org.apache.cassandra.db.filter.SliceQueryFilter;
import org.apache.cassandra.dht.IPartitioner;
import org.apache.cassandra.io.sstable.Descriptor;
import org.apache.cassandra.io.sstable.SSTableMetadata;
import org.apache.cassandra.io.sstable.SSTableReader;
import org.apache.cassandra.io.sstable.SSTableScanner;

/**
 * 
 * @author Christian Spriegel
 *
 */
public class BulkMigrator
{
private static Cluster cluster;
private static String  keyspaceName = null;
private static Keyspacekeyspace;

public static void main(String[] args) throws Exception
{
//Config.setClientMode(true);
Config.setLoadYaml(false);
if(args.length != 1)
{
System.out.println("java -jar BulkMigrator.jar <directory>");
return;
}

String path = args[0];
File dir = new File(path);
if(!dir.exists() ||  !dir.isDirectory() )
{
System.out.println("Path is not a directory: "+path);
return;
}

cluster = HFactory.getOrCreateCluster("TestCluster", new CassandraHostConfigurator("localhost:9160"));
FilenameFilter filter = new FilenameFilter()
{
@Override
public boolean accept(File dir, String name)
{
if(!name.endsWith("-Data.db"))
return false; // ignore non data files
if(name.substring(0,name.length()-8) .contains("."))
return false; // ignore secondary indexes
return true;
}
};
for(File f : dir.listFiles(filter))
{
System.out.println("Found file "+f+" ... ");
Descriptor desc = Descriptor.fromFilename(dir, f.getName()).left;
initCF(desc.ksname);
System.out.println("Loaded descriptor "+desc+" ...");
SSTableMetadata sstableMetadata = SSTableMetadata.serializer.deserialize(desc).left;
DatabaseDescriptor.setPartitioner((IPartitioner)(Class.forName(sstableMetadata.partitioner).newInstance()));
System.out.println("Using partitioner "+DatabaseDescriptor.getPartitionerName());
SSTableReader ssreader = SSTableReader.open(desc);
int numkeys = (int)(ssreader.estimatedKeys());
System.out.println("Opened reader "+ssreader+" with "+numkeys+" keys");

SliceQueryFilter atomfilter = new SliceQueryFilter(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, Integer.MAX_VALUE);
String cfname = desc.cfname;
SSTableScanner rrar = ssreader.getScanner(new QueryFilter(null, new QueryPath(cfname), atomfilter));

Mutator<ByteBuffer> mutator = HFactory.createMutator(keyspace, ByteBufferSerializer.get());
int keyi = 0;
long bufsize = 0L;
while(rrar.hasNext())
{
OnDiskAtomIterator odai = rrar.next();
ByteBuffer rowkey = 

Re: Offline migration: Random-Murmur

2013-12-23 Thread horschi
Interesting you even dare to do a live migration :-)

Do you do all Murmur-writes with the timestamp from the Random-data? So
that all migrated data is written with timestamps from the past.



On Mon, Dec 23, 2013 at 3:59 PM, Rahul Menon ra...@apigee.com wrote:

 Christian,

 I have been planning to migrate my cluster from random to murmur3 in a
 similar manner. I intend to use pycassa to read and then write to the newer
 cluster. My only concern would be ensuring the consistency of already
 migrated data as the cluster ( with random ) would be constantly serving
 the production traffic. I was able to do this on a non prod cluster, but
 production is a different game.

 I would also like to hear more about this, especially if someone was able
 to successfully do this.

 Thanks
 Rahul


 On Mon, Dec 23, 2013 at 6:45 PM, horschi hors...@gmail.com wrote:

 Hi list,

 has anyone ever tried to migrate a cluster from Random to Murmur?

 We would like to do so, to have a more standardized setup. I wrote a
 small (yet untested) utility, which should be able to read SSTable files
 from disk and write them into a cassandra cluster using Hector. This
 migration would be offline of course and would only work for smaller
 clusters.

 Any thoughts on the topic?

 kind regards,
 Christian

 PS: The reason for doing so are not performance. It is to simplify
 operational stuff for the years to come. :-)





Re: Murmur Long.MIN_VALUE token allowed?

2013-12-10 Thread horschi
Hi Aaron,

thanks for your response. But that is exactly what scares me:
RandomPartitioner.MIN is -1, which is not a valid token :-)

And my feeling gets worse when I look at Murmur3Partitioner.normalize().
This one explicitly excludes Long.MIN_VALUE by changing it to
Long.MAX_VALUE.

I think I'll just avoid it in the future. Better safe than sorry...

cheers,
Christian


On Tue, Dec 10, 2013 at 8:24 AM, Aaron Morton aa...@thelastpickle.comwrote:

 AFAIK any value that is a valid output from murmor3 is a valid token.

 The Murmur3Partitioner set’s min and max to long min and max…

 public static final LongToken MINIMUM = new LongToken(Long.MIN_VALUE);
 public static final long MAXIMUM = Long.MAX_VALUE;

 Cheers

 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder  Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 5/12/2013, at 12:38 am, horschi hors...@gmail.com wrote:

 Hi,

 I just realized that I can move a node to Long.MIN_VALUE:

 127.0.0.1  rack1   Up Normal  1011.58 KB  100.00%
 -9223372036854775808

 Is that really a valid token for Murmur3Partitioner ?

 I thought that Long.MIN_VALUE (like -1 for Random) is not a regular token.
 Shouldn't it only be used for token-range-scans?

 kind regards,
 Christian





Murmur Long.MIN_VALUE token allowed?

2013-12-04 Thread horschi
Hi,

I just realized that I can move a node to Long.MIN_VALUE:

127.0.0.1  rack1   Up Normal  1011.58 KB  100.00%
-9223372036854775808

Is that really a valid token for Murmur3Partitioner ?

I thought that Long.MIN_VALUE (like -1 for Random) is not a regular token.
Shouldn't it only be used for token-range-scans?

kind regards,
Christian


Re: TTL and gc_grace_Seconds

2013-09-18 Thread horschi
Hi Christopher,

in 2.0 gc_grace should be capped by TTL anyway: see CASSANDRA-4917
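For older versions, lowering gc_grace by hand on a TTL-only table would look
roughly like this (a sketch; "myks.events" and the one-day value are placeholders):

echo "ALTER TABLE myks.events WITH gc_grace_seconds = 86400;" | cqlsh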

cheers,
Christian



On Wed, Sep 18, 2013 at 4:29 PM, Christopher Wirt chris.w...@struq.comwrote:

 I have a column family that contains time series events, all columns have a 24
 hour TTL and gc_grace_seconds is currently 20 days. There is a TimeUUID in
 part of the key.


 It takes 15 days to repair the entire ring.


 Consistency is not my main worry. Speed is. We currently write to this CF
 at LOCAL_QUORUM and read at ONE.


 Is there any reason to have gc_grace_seconds higher than the TTL? Feels
 like I’m just wasting resources all over given my consistency and speed
 requirements. 


 Second question.

 Does anyone vary their read repair ratio throughout the day.. i.e. at peak
 turn off read repairs, turn to 0.7 for the grave yard shift.


 Cheers,

 Chris



Re: How often to run `nodetool repair`

2013-08-01 Thread horschi
 TTL is effectively DELETE; you need to run a repair once every
 gc_grace_seconds. If you don't, data might un-delete itself.


The undelete part is not true. btw: With CASSANDRA-4917 TTLed columns will
not even create a tombstone (assuming ttl > gc_grace).

The rest of your mail I agree with :-)


Re: TTL, Tombstones, and gc_grace

2013-07-25 Thread horschi
Hi Michael,

yes, you should never lose a delete, because there are no real deletes. No
matter what version you are using.

btw: There is actually a ticket that builds an optimization on top of that
assumption: CASSANDRA-4917. Basically, if TTLgc_grace then do not create
tombstones for expiring-columns. This works because disappear anyway if TTL
is over.

cheers,
Christian


On Thu, Jul 25, 2013 at 3:24 PM, Michael Theroux mthero...@yahoo.comwrote:

 Hello,

 Quick question on Cassandra, TTLs, tombstones, and GC grace.  If we have a
 column family whose only mechanism of deleting columns is utilizing TTLs,
 is repair really necessary to make tombstones consistent, and therefore
 would it be safe to set the gc grace period of the column family to a very
 low value?

 I ask because of this blog post based on Cassandra .7:
 http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns.

 The first time the expired column is compacted, it is transformed into a
 tombstone. This transformation frees some disk space: the size of the value
 of the expired column. From that moment on, the column is a normal
 tombstone and follows the tombstone rules: it will be totally removed by
 compaction (including minor ones in most cases since Cassandra 0.6.6) after
 GCGraceSeconds.

 Since tombstones are not written using a replicated write, but instead
 written during compaction, theoretically, it shouldn't be possible to lose
 a tombstone?  Or is this blog post inaccurate for later versions of
 cassandra?  We are using cassandra 1.1.11.

 Thanks,
 -Mike





Re: About column family

2013-07-25 Thread horschi
With 1.2.7 you can use -Dcassandra.unsafesystem. That will speed up cf
creation. So you will get in even more trouble even faster!



On Tue, Jul 23, 2013 at 12:23 PM, bjbylh bjb...@me.com wrote:

 Hi all:
 i have two questions to ask:
 1. how many column families can be created in a cluster? is there a limit to
 the number of them?
 2. it spends 2-5 seconds to create a new cf while the cluster contains
 about 1 cfs (if the cluster is empty, it spends about 0.5s). is it
 normal? how to improve the efficiency of creating cf?
 btw C* is 1.2.4.
 thanks a lot.

 Sent from Samsung Mobile



C* 1.2.5 AssertionError in ColumnSerializer:40

2013-07-01 Thread horschi
Hi,

using C* 1.2.5 I just found a weird AssertionError in our logfiles:

...
 INFO [OptionalTasks:1] 2013-07-01 09:15:43,608 MeteredFlusher.java (line
58) flushing high-traffic column family CFS(Keyspace='Monitoring',
ColumnFamily='cfDateOrderedMessages') (estimated 5242880 bytes)
 INFO [OptionalTasks:1] 2013-07-01 09:15:43,609 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-cfDateOrderedMessages@2147245119(4616888/5242880
serialized/live bytes, 23714 ops)
 INFO [FlushWriter:9] 2013-07-01 09:15:43,610 Memtable.java (line 461)
Writing Memtable-cfDateOrderedMessages@2147245119(4616888/5242880
serialized/live bytes, 23714 ops)
ERROR [FlushWriter:9] 2013-07-01 09:15:44,145 CassandraDaemon.java (line
192) Exception in thread Thread[FlushWriter:9,5,main]
java.lang.AssertionError
at
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:40)
at
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:30)
at
org.apache.cassandra.db.OnDiskAtom$Serializer.serializeForSSTable(OnDiskAtom.java:62)
at
org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:181)
at
org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:133)
at
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:185)
at
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:489)
at
org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:448)
at
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


I looked into the code and it seems to be coming from the following code:

public void serialize(IColumn column, DataOutput dos) throws IOException
{
assert column.name().remaining() > 0; // crash
ByteBufferUtil.writeWithShortLength(column.name(), dos);
try
{...


Does anybody have an idea why this is happening? The machine has some
issues with its disks, but flush shouldn't be affected by bad disks, right?
I can rule out that this memtable was filled by a bad commitlog.

Thanks,
Christian


Re: Cassandra optimizations for multi-core machines

2013-06-05 Thread horschi
Hi,

Cassandra is heavily multithreaded. If the load demands it will make use of
your 8 cores.

I don't know the startup code, but I would assume it would be parallelized
if necessary/possible. Afaik there were optimizations already made to
reduce the startup time. Therefore I would assume any optimizations left
would not be so easy.

cheers,
Christian Spriegel



On Wed, Jun 5, 2013 at 11:04 PM, srmore comom...@gmail.com wrote:

 Hello All,
 We are thinking of going with Cassandra on a 8 core machine, are there any
 optimizations that can help us here ?

 I have seen that during startup stage Cassandra uses only one core, is
 there a way we can speed up the startup process ?

 Thanks !



Re: Compacted data returns with repair?

2013-06-04 Thread horschi
Hi,

this sounds like the following issue:

https://issues.apache.org/jira/browse/CASSANDRA-4905

cheers,
Ch

On Tue, Jun 4, 2013 at 5:50 PM, André Cruz andre.c...@co.sapo.pt wrote:

 Hello.

 I deleted a lot of data from one of my CFs, waited the gc_grace_period,
 and as the compactions were deleting the data slowly, ran a major
 compaction on that CF. It reduced the size to what I expected. I did not
 run a major compaction on the other 2 nodes (RF = 3) before repairs took
 place and now the CF is again jumping in size on the node I ran the major
 compaction. Is this expected? Does the repair get the data back from the
 other nodes even though it should be gone?

 I should add I'm using Cassandra 1.1.5.

 Thanks,
 André


Re: Cassandra 1.1.11 does not always show filename of corrupted files

2013-05-31 Thread horschi
Ok, looking at the code I can see that 1.2 fixes the issue:
try
{
validBufferBytes =
metadata.compressor().uncompress(compressed.array(), 0, chunk.length,
buffer, 0);
}
catch (IOException e)
{
throw new CorruptBlockException(getPath(), chunk);
}

So thats nice :-)

But does nobody else find the old behaviour annoying? Nobody ever wanted to
identify the broken files?

cheers,
Christian

On Thu, May 30, 2013 at 7:11 PM, horschi hors...@gmail.com wrote:

 Hi,

 we had some hard-disk issues this week, which caused some datafiles to get
 corrupt, which was reported by the compaction. My approach to fix this was
 to delete the corrupted files and run repair. That sounded easy at first,
 but unfortunately C* 1.1.11 sometimes does not show which datafile is
 causing the exception.

 How do you handle such cases? Do you delete the entire CF or do you look
 up the compaction-started message and delete the files being involved?

 In my opinion the Stacktrace should always show the filename of the file
 which could not be read. Does anybody know if there were already changes to
 the logging since 1.1.11? 
 CASSANDRA-2261 (https://issues.apache.org/jira/browse/CASSANDRA-2261) does not
 seem to have fixed the exception-handling part. Were there perhaps
 changes in 1.2 with the new disk-failure handling?

 cheers,
 Christian

 PS: Here are some examples I found in my logs:

 *Bad behaviour:*
 ERROR [ValidationExecutor:1] 2013-05-29 13:26:09,121
 AbstractCassandraDaemon.java (line 132) Exception in thread
 Thread[ValidationExecutor:1,1,main]
 java.io.IOError: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
 at
 org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
 at
 org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:99)
 at
 org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
 at
 org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
 at
 org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
 at
 org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
 at
 org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at
 com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:726)
 at
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:69)
 at
 org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:457)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
 at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
 at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
 at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
 at
 org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94)
 at
 org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:90)
 at
 org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:71)
 at
 org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
 at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
 at
 org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
 at
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
 at
 org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
 at
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:114)
 at
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
 at
 org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns

Cassandra 1.1.11 does not always show filename of corrupted files

2013-05-30 Thread horschi
Hi,

we had some hard-disk issues this week, which caused some datafiles to get
corrupt, which was reported by the compaction. My approach to fix this was
to delete the corrupted files and run repair. That sounded easy at first,
but unfortunately C* 1.1.11 sometimes does not show which datafile is
causing the exception.
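For the record, that procedure looks roughly like this (a sketch only; the
paths, file names and package-style service commands are placeholders):

# note the corrupted -Data.db reported in the log, then, with the node stopped,
# move that sstable's component files out of the way and repair afterwards
sudo service cassandra stop
mkdir -p /var/tmp/corrupt-sstables
mv /var/lib/cassandra/data/MyKS/MyCF/MyKS-MyCF-ic-1234-* /var/tmp/corrupt-sstables/
sudo service cassandra start
nodetool repair MyKS MyCF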

How do you handle such cases? Do you delete the entire CF or do you look up
the compaction-started message and delete the files being involved?

In my opinion the Stacktrace should always show the filename of the file
which could not be read. Does anybody know if there were already changes to
the logging since 1.1.11?
CASSANDRA-2261https://issues.apache.org/jira/browse/CASSANDRA-2261does
not seem to have fixed the Exceptionhandling part. Were there perhaps
changes in 1.2 with the new disk-failure handling?

cheers,
Christian

PS: Here are some examples I found in my logs:

*Bad behaviour:*
ERROR [ValidationExecutor:1] 2013-05-29 13:26:09,121
AbstractCassandraDaemon.java (line 132) Exception in thread
Thread[ValidationExecutor:1,1,main]
java.io.IOError: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
at
org.apache.cassandra.db.compaction.PrecompactedRow.init(PrecompactedRow.java:99)
at
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
at
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
at
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:726)
at
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:69)
at
org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:457)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
at
org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94)
at
org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:90)
at
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:71)
at
org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
at
org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
at
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
at
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:114)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
at
org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
at
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
... 19 more

*Also bad behaviour:*
ERROR [CompactionExecutor:1] 2013-05-29 13:12:58,896
AbstractCassandraDaemon.java (line 132) Exception in thread
Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.IOException: java.util.zip.DataFormatException:
incomplete dynamic bit lengths tree
at

Re: (unofficial) Community Poll for Production Operators : Repair

2013-05-15 Thread horschi
Hi Alain,

have you had a look at the following tickets?

CASSANDRA-4905 - Repair should exclude gcable tombstones from merkle-tree
computation
CASSANDRA-4932 - Agree on a gcbefore/expirebefore value for all replica
during validation compaction
CASSANDRA-4917 - Optimize tombstone creation for ExpiringColumns
CASSANDRA-5398 - Remove localTimestamp from merkle-tree calculation (for
tombstones)

Imho these should reduce the over-repair to some degree, especially when
using TTLs. Some of them are already fixed in 1.2. The rest will (hopefully)
follow :-)

cheers,
Christian


On Wed, May 15, 2013 at 10:27 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Rob, I was wondering something. Are you a committer working on improving
 repair or something similar?

 Anyway, if a committer (or any other expert) could give us some feedback on
 our comments (are we doing well or not, are the things we observe normal or
 unexplained, what is going to be improved about repair in the future...)

 I am always interested in hearing about how things work and whether I am
 doing well or not.

 Alain




Re: MySQL Cluster performing faster than Cassandra cluster on single table

2013-04-16 Thread horschi
Hi Hannah,

mysql-cluster is an in-memory database.

In-memory is fast. But I don't think you will ever be able to store hundreds
of gigabytes of data on a node, which is something you can do with Cassandra.

If your dataset is small, then maybe NDB is the better choice for you. I
myself will not even touch it with a stick any more, I hate it with a
passion. But this might depend on the use-case :-)

regards,
Christian

On Tue, Apr 16, 2013 at 12:56 PM, jrdn hannah j...@jrdnhannah.co.uk wrote:

 Hi,

 I was wondering if anybody here had any insight into this.

 I was running some tests on cassandra and mysql performance, with a two
 node and three node cassandra cluster, and a five node mysql cluster (mgmt,
 2 x api, 2 x data).

 On the cassandra 2 node cluster vs mysql cluster, I was getting a couple
 of strange results. For example, on updating a single table in MySQL, with
 the equivalent super column in Cassandra, I was getting results of 0.231 ms
 for MySQL and 1.248ms for Cassandra to perform the update 1000 times.

 Could anybody help explain why this is the case?

 Thanks,
 Hannah


Re: MySQL Cluster performing faster than Cassandra cluster on single table

2013-04-16 Thread horschi
Ah, I see, that makes sense. Have you got a source for the storing of
 hundreds of gigabytes? And does Cassandra not store anything in memory?

It stores bloom filters and index-samples in memory. But they are much
smaller than the actual data and they can be configured.
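
Just as a rough back-of-the-envelope sketch (my own arithmetic, not C*
internals) to show why the bloom filter stays far smaller than the data it
covers:

// Rough estimate only; 100M keys and a 1% false-positive chance are
// assumptions, not your numbers.
public class BloomFilterSizeEstimate
{
    public static void main(String[] args)
    {
        long keys = 100L * 1000 * 1000;    // row keys on the node
        double fpChance = 0.01;            // false-positive chance (configurable per CF)
        // standard bloom filter formula: bits per key = -ln(p) / (ln 2)^2
        double bitsPerKey = -Math.log(fpChance) / (Math.log(2) * Math.log(2));
        double megabytes = keys * bitsPerKey / 8 / (1024 * 1024);
        System.out.printf("~%.1f bits/key -> ~%.0f MB for %d keys%n",
                          bitsPerKey, megabytes, keys);
    }
}

So even 100 million keys cost on the order of 100 MB of filter, compared to
hundreds of gigabytes of data on disk.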



 Yeah, my dataset is small at the moment - perhaps I should have chosen
 something larger for the work I'm doing (University dissertation), however,
 it is far too late to change now!

On paper mysql-cluster looks great. But in daily use it's not as nice as
Cassandra (where you have machines dying, networks splitting, etc.).

cheers,
Christian


Re: Repair does not fix inconsistency

2013-04-04 Thread horschi
Hi Michal,

Let's say the tombstone on one of the nodes (X) is gcable and was not
 compacted (purged) so far. After it was created we re-created this row, but
 due to some problems it was written only to the second node (Y), so we have
 live data on node Y which is newer than the gcable tombstone on replica
 node X. Some time ago we did NOT repair our cluster for a while (well, a
 pretty long while), so it's possible that such a situation happened.

 My concern is: will AntiEntropy ignore this tombstone only, or basically
 everything related to the row key that this tombstone was created for?

It will only ignore the tombstone (which should have been repaired in a
previous repair anyway, assuming you run repairs within gc_grace). Any newer
columns (overwriting the tombstone) would still be alive and would not be
ignored.

The only way for CASSANDRA-4905 to make any difference is if you do not run
repair within gc_grace. With the patch it would not repair these old
tombstones any more. But in that case you should simply increase gc_grace
and not undo the patch :-)



When I query (cqlsh) some rows by key (CL is default = ONE) I _always_ get
a correct result. However, when I query by indexed column, it returns
nothing.

This looks to me more like a secondary index issue. If you say the access
via row key is always correct, then the repair works fine. I think there
might be something wrong with your secondary index then.


Cheers,
Christian


Re: Repair does not fix inconsistency

2013-04-04 Thread horschi
Hi,

This was my first thought too, but if you take a look at the logs I
 attached to previous e-mail, you'll notice that query by key
 (no-index.log) retrieves data from BOTH replicas, while the by indexed
 column one (index.log) talks only to one of them (too bad it's the one
 that contains tombstone only - 1:7). In the first case it is possible to
 resolve the conflict and return the proper result, while in the second
 case it's impossible because tombstone is the only thing that is returned
 for this key.

Sorry, I did not look into the logs. That's the first time I'm seeing the
trace btw. :-)

Does CQL not allow CL=ONE queries? Why does it ask two nodes for the key,
when you say that you are using CL=default=1? I'm a bit confused here (I'm
a thrift user).

But thinking about your theory some more: I think CASSANDRA-4905 might make
reappearing columns more common (only if you do not run repair within
gc_grace of course). Before CASSANDRA-4905 the tombstones would be repaired
even after gc_grace, so it was a bit more forgiving. It was never
guaranteed that the inconsistency would be repaired though.

I think you should have increased gc-grace or run repair within the 10 days.


The repair bit makes sense now in my head, unlike the CQL CL :-)

cheers,
Christian


Re: Repair does not fix inconsistency

2013-04-04 Thread horschi
 Well... Strange. We have such a problem with 6 users, but there's only ONE
 tombstone (created 8 days ago, so it's not gcable yet) in all the SSTables
 on the 2:1 node - checked using sstable2json.
 Moreover, this tombstone DOES NOT belong to the row key I'm using for
 tests, because this user was NOT ever removed / replaced.
 So now I have no bloody idea how C* can see a tombstone for this key when
 running a query. Maybe it's an index problem then?

Yes, maybe there are two issues here: repair not running and maybe really
some index-thing.

Maybe you can try CL=ONE with cassandra-cli, so that we can see how it
behaves without the index.


it tells me about Key cache hit for sstable for SSTables 23 & 24, but I
 don't have such SSTables for the Users CF. However, I have SSTables like
 this for the index.

I think the index SSTables and the data SSTables are compacted separately,
so their numbers can differ even though they are flushed together. (Anybody
feel free to correct me on this.)


Re: Repair does not fix inconsistency

2013-04-04 Thread horschi
Repair is fine - all the data seems to be in the SSTables. I've checked it:
 while the index tells me that I have 1 tombstone and 0 live cells for a key,
 I can _see_, thanks to sstable2json, that I have 3 live cells (assuming a
 cell is an entry in an SSTable) and 0 tombstones. After being confused for
 most of the day, I'm now almost sure it's a problem with index (re)building.

I'm glad to hear that. I feared my ticket might be responsible for your
data loss. I could not live with the guilt ;-) Seriously: I'm glad we can
rule out the repair change.



 The same: for a key-based query it returns the correct result no matter if
 I use CL=ONE or TWO (or stronger). When querying by indexed column it works
 for CL=TWO or more, but returns no data for CL=ONE.

Yes, if it works with CL=ONE, then it must be the index. Check the mailing
list, I think someone else posted something similar the other day.


cheers,
Christian


Re: repair, compaction, and tombstone rows

2012-11-05 Thread horschi
 - ... ExpiringColumn not create any tombstones? Imo this could be safely
  done if the column's TTL is >= gcgrace.

 Yes, if the TTL >= gcgrace this would be safe and I'm pretty sure we
  used to have a ticket for that (can't find it back with a quick search,
  but JIRA search sucks and I didn't bother long). But basically we
  decided to not do it for now for 2 reasons:...

The only ticket I found that was anything similar is CASSANDRA-4565. I have
my doubts that you meant that one :-)

I don't know what your approach was back then, but maybe it could be solved
quite easily: when creating tombstones for ExpiringColumns, we could use the
ExpiringColumn.timestamp to set the DeletedColumn.localDeletionTime. So
instead of using the deletion time of the ExpiringColumn, we use the
creation time.

In the ExpiringColumn class this would look like this:

public static Column create(ByteBuffer name, ByteBuffer value, long timestamp,
                            int timeToLive, int localExpirationTime, int expireBefore,
                            IColumnSerializer.Flag flag)
{
    if (localExpirationTime >= expireBefore || flag == IColumnSerializer.Flag.PRESERVE_SIZE)
        return new ExpiringColumn(name, value, timestamp, timeToLive, localExpirationTime);

    // the column is now expired, we can safely return a simple tombstone;
    // changed: use the creation timestamp of the ExpiringColumn as its
    // localDeletionTime (the cast is needed, localDeletionTime is an int)
    return new DeletedColumn(name, (int) (timestamp / 1000), timestamp);
    // return new DeletedColumn(name, localExpirationTime, timestamp); // old code
}

Imo this makes tombstones of DeletedColumns live only as long as they need
to be:
In case you specify ExpiringColumn.TTL > 10 days, then the created
DeletedColumn would have a timestamp that's 10 days in the past, which makes
it obsolete for gc right away. With TTL = 5 days the tombstone stays for 5
days, enough for either the ExpiringColumn or the tombstone to be repaired.
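
To spell out the arithmetic (a quick sketch, assuming the creation time ends
up in seconds and the default gc_grace of 10 days):

// Rough check of the reasoning above, not real C* code.
public class TombstoneLifetimeCheck
{
    public static void main(String[] args)
    {
        long nowSec = System.currentTimeMillis() / 1000;
        long gcGraceSec = 10L * 24 * 3600;     // default gc_grace_seconds
        long ttlSec = 12L * 24 * 3600;         // a column written with TTL > gc_grace
        long creationSec = nowSec - ttlSec;    // written ttlSec ago, so it has just expired
        long gcBefore = nowSec - gcGraceSec;   // tombstones "deleted" before this are purgeable
        // with the creation time used as localDeletionTime, the tombstone is
        // already older than gcBefore, i.e. purgeable right away
        System.out.println("gcable right away: " + (creationSec < gcBefore));
    }
}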





  - ... ExpiringColumn not add local timestamp to digest?

 As I said in a previous thread, I don't see what the problem is here.
 The timestamp is not local to the node, it is assigned once and for
 all by the coordinator at insert time. I can agree that it's not
 really useful per se to the digest, but I don't think it matters in
 any case.

Oh sorry, you're right, I mixed something up there. It's DeletedColumn that
has the local timestamp (as its value). It takes a localDeletionTime (which
is supplied by RowMutation.delete) and uses that as the value for the
DeletedColumn. This value is used by Column to update the digest.



Sorry for not letting this go, but I think there is some low-hanging fruit
here.

cheers,
Christian


Re: repair, compaction, and tombstone rows

2012-11-03 Thread horschi
Sure, I created CASSANDRA-4905. I understand that these tombstones will
still be streamed though. That's fine with me.

Do you mind if I ask where you stand on making...
- ... ExpiringColumn not create any tombstones? Imo this could be safely
done if the column's TTL is >= gcgrace. That way it is ensured that repair
ran and any previously un-TTLed columns were overwritten.
- ... ExpiringColumn not add the local timestamp to the digest?

Cheers,
Christian


On Sat, Nov 3, 2012 at 8:37 PM, Sylvain Lebresne sylv...@datastax.com wrote:

 On Fri, Nov 2, 2012 at 10:46 AM, horschi hors...@gmail.com wrote:
  might I ask why repair cannot simply ignore anything that is older than
  gc-grace? (like Aaron proposed)

 Well, actually the merkle tree computation could probably ignore
 gcable tombstones without much problem, which might not be such a bad
 idea and would probably solve much of your problem. However, when we
 stream the ranges that need to be repaired, we stream sub-parts of the
 sstables without deserializing them, so we can't exclude the gcable
 tombstones in that phase (that is, it's a technical reason, but
 deserializing the data would be inefficient). Meaning that we can't
 guarantee that you won't ever stream gcable tombstones.

 But excluding gcable tombstones from the merkle-tree computation is a
 good idea. Would you mind opening a JIRA ticket?

 --
 Sylvain



Re: repair, compaction, and tombstone rows

2012-11-02 Thread horschi
Hi Sylvain,

might I ask why repair cannot simply ignore anything that is older than
gc_grace (like Aaron proposed)? I agree that repair should not process any
tombstones or anything like that. But in my mind it sounds reasonable to
make repair ignore timed-out data. Because the timestamp is created on the
client, there is no reason to repair these, right?

We are using TTLs quite heavily and I was noticing that every repair
increases the load of all nodes by 1-2 GB, where each node has about
20-30 GB of data. I don't know if this increases with the data volume. The
data is mostly time-series data.
I even noticed an increase when running two repairs directly after each
other. So even when the data was just repaired, there is still data being
transferred. I assume this is due to some columns timing out within that
timeframe and the entire row being repaired.

regards,
Christian

On Thu, Nov 1, 2012 at 9:43 AM, Sylvain Lebresne sylv...@datastax.com wrote:

  Is this a feature or a bug?

 Neither really. Repair doesn't do any gcable tombstone collection and
 it would be really hard to change that (besides, it's not its job). So
 if, when you run repair, there are sstables with tombstones that could
 be collected but are not yet, then yes, they will be streamed. Now the
 theory is that compaction will run often enough that gcable tombstones
 will be collected in a reasonably timely fashion, so you will never
 have lots of such tombstones in general (making the fact that repair
 streams them largely irrelevant). That being said, in practice, I don't
 doubt that there are a few scenarios like yours where this can still
 lead to doing too much useless work.

 I believe the main problem is that size-tiered compaction has a
 tendency to not compact the largest sstables very often. Meaning that
 you could have large sstables with mostly gcable tombstones sitting
 around. In the upcoming Cassandra 1.2,
 https://issues.apache.org/jira/browse/CASSANDRA-3442 will fix that.
 Until then, if you are not afraid of a little bit of scripting, one
 option could be, before running a repair, to run a small script that
 checks the creation time of your sstables. If an sstable is old
 enough (for some value of that, depending on the TTL you use
 on all your columns), you may want to force a compaction (using the
 JMX call forceUserDefinedCompaction()) of that sstable. The goal being
 to get rid of a maximum of outdated tombstones before running the
 repair (you could also alternatively run a major compaction prior to
 the repair, but major compactions have a lot of nasty effects so I
 wouldn't recommend that a priori).

 --
 Sylvain
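
For what it's worth, here is a rough sketch of the kind of script Sylvain
describes, done via JMX from Java. The MBean name should be
org.apache.cassandra.db:type=CompactionManager, but the exact
forceUserDefinedCompaction signature differs between versions, so treat the
two-argument form (and the paths/names) below as assumptions:

import java.io.File;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: force a user-defined compaction of sstables older than some age,
// before kicking off repair. Keyspace, data directory and age are examples.
public class CompactOldSSTables
{
    public static void main(String[] args) throws Exception
    {
        long maxAgeMillis = 10L * 24 * 3600 * 1000;  // "old enough" = older than 10 days here
        File dataDir = new File("/var/lib/cassandra/data/MyKeyspace");

        StringBuilder oldFiles = new StringBuilder();
        File[] files = dataDir.listFiles();
        if (files == null)
            return;
        for (File f : files)
        {
            // sstables are immutable, so last-modified is effectively the creation time
            if (f.getName().endsWith("-Data.db")
                && System.currentTimeMillis() - f.lastModified() > maxAgeMillis)
            {
                if (oldFiles.length() > 0)
                    oldFiles.append(",");
                oldFiles.append(f.getName());
            }
        }
        if (oldFiles.length() == 0)
            return;

        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName compactionManager = new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // signature assumed: (keyspace, comma-separated data files); check your version's MBean
            mbs.invoke(compactionManager,
                       "forceUserDefinedCompaction",
                       new Object[]{ "MyKeyspace", oldFiles.toString() },
                       new String[]{ "java.lang.String", "java.lang.String" });
        }
        finally
        {
            connector.close();
        }
    }
}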



Re: repair, compaction, and tombstone rows

2012-11-02 Thread horschi
 IIRC, tombstone timestamps are written by the server, at compaction
 time. Therefore if you have RF=X, you have X different timestamps
 relative to GCGraceSeconds. I believe there was another thread about
 two weeks ago in which Sylvain detailed the problems with what you are
 proposing, when someone else asked approximately the same question.

Oh yes, I forgot about the thread. I assume you are talking about:
http://grokbase.com/t/cassandra/user/12ab6pbs5n/unnecessary-tombstones-transmission-during-repair-process

I think these are multiple issues that correlate with each other:

1) Repair uses the local timestamp of DeletedColumns for the Merkle tree
calculation. This is what the other thread was about.
Alexey claims that this was fixed by some other commit:
https://issues.apache.org/jira/secure/attachment/12544204/CASSANDRA-4561-CS.patch
But honestly, I don't see how this solves it. I understand how Alexey's
patch a few messages before would solve it (by overriding the updateDigest
method in DeletedColumn).

2) ExpiringColumns should not be used for the Merkle tree calculation if
they are timed out.
I checked LazilyCompactedRow and saw that it does not exclude any timed-out
columns. It loops over all columns and calls updateDigest on them, without
any condition. Imho ExpiringColumn.updateDigest() should check its own
isMarkedForDelete() first before doing any digest changes (we cannot simply
call isMarkedForDelete from LazilyCompactedRow because we don't want this
for DeletedColumns). A rough sketch of what I mean follows after point 3.

3) Cassandra should not create tombstones for expiring columns.
I am not 100% sure, but it looks to me like Cassandra creates tombstones
for expired ExpiringColumns. This makes me wonder if we could delete
expired columns directly. The digest for an ExpiringColumn and a
DeletedColumn can never match, due to the different implementations, so
there will always be a repair if compactions are not in sync across nodes.
Imho it should be valid to delete ExpiringColumns directly, because the TTL
is given by the client and should pass on all nodes at the same time.

All together, these should reduce over-repair.
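
A rough sketch of what I mean for point 2 (just the idea, not a tested
patch; this would go into ExpiringColumn):

// sketch only: skip digest updates for ExpiringColumns whose TTL has passed,
// so replicas whose compactions ran at different times still build the same
// merkle tree; DeletedColumn keeps its current behaviour.
@Override
public void updateDigest(MessageDigest digest)
{
    if (isMarkedForDelete())     // TTL has passed
        return;                  // contribute nothing to the merkle tree
    super.updateDigest(digest);  // placeholder for the existing digest logic
}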


Merkle trees are an optimization, what they trade for this
 optimization is over-repair.

 (FWIW, I agree that, if possible, this particular case of over-repair
 would be nice to eliminate.)

Of course, rather over-repair than corrupt something.


Re: Cassandra vs Couchbase benchmarks

2012-10-01 Thread horschi
Hi Andy,

things I find odd:

- Replica count = 1 for mongo and couchdb. How is that a realistic benchmark?
I always want at least 2 replicas for my data. Maybe that's just me.
- On the Mongo config slide they said they disabled journaling. Why would you
disable all the safety mechanisms that you would want in a production
environment? Maybe they should have added /dev/null to their benchmark ;-)
- I don't see the replica count for Cassandra in the slides. The CL is not
specified either. Imho the important stuff is missing from the Cassandra
configuration.
- In the goals section it said more data than RAM. But they only have
12 GB of data per node, with 15 GB of RAM per node!

I am very interested in a recent Cassandra benchmark, but I find this one
very disappointing.

cheers,
Christian


On Mon, Oct 1, 2012 at 5:05 PM, Andy Cobley
acob...@computing.dundee.ac.uk wrote:

 There are some interesting results in the benchmarks below:

 http://www.slideshare.net/renatko/couchbase-performance-benchmarking

 Without starting a flame war etc, I'm interested if these results should
 be considered Fair and Balanced or if the methodology is flawed in some
 way ? (for instance is the use of Amazon EC2 sensible for Cassandra
 deployment) ?

 Andy



 The University of Dundee is a Scottish Registered Charity, No. SC015096.





Re: are counters stable enough for production?

2012-09-18 Thread horschi
The repair taking the highest of two inconsistent values might cause
getting higher values?


If a counter counts backwards (and therefore has negative values), would
repair still choose the larger value? Or does Cassandra take the higher
absolute value? This would result in an undercount in case of an error
instead of an overcount.

To go further, would it maybe be an idea to count everything twice? Once as
a positive value and once as a negative value. When reading the counters,
the application could just compare the negative and positive counter to get
an error margin.

Has anybody tried something like this? I assume most people would rather
have an under- than an overcount.
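
Something like this on the read side is what I have in mind (the column
names and the readCounter helper are made up):

public class DoubleCountCheck
{
    // stand-in for whatever client call reads a counter column's value
    static long readCounter(String column) { return 0L; }

    public static void main(String[] args)
    {
        long up = readCounter("hits_up");        // incremented by +1 per event
        long down = -readCounter("hits_down");   // incremented by -1 per event, negated on read
        long errorMargin = Math.abs(up - down);
        long conservative = Math.min(up, down);  // prefer the undercount when they disagree
        System.out.println("count ~ " + conservative + " (+/- " + errorMargin + ")");
    }
}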

cheers,
Christian


Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers

2012-07-17 Thread horschi
When they say linear scalability they mean that throughput scales with the
number of machines in your cluster.

Try adding more machines to your cluster and measure the throughput. I'm
pretty sure you'll see linear scalability.

regards,
Christian


On Tue, Jul 17, 2012 at 6:13 AM, Code Box codeith...@gmail.com wrote:

 I am doing Cassandra benchmarking using YCSB to evaluate the best
 performance for my application, which will be both read and write intensive.
 I have set up a three-node cluster environment on EC2 and I am using YCSB in
 the same availability region as a client. I have tried various combinations
 of tuning Cassandra parameters like fsync (setting it to batch and periodic),
 increasing the number of rpc_threads, increasing the number of concurrent
 reads and concurrent writes, and write consistency ONE and QUORUM, but I am
 not getting very great results. Also, I do not see a linear graph in terms
 of scalability, that is, if I increase the number of clients I do not see an
 increase in the throughput.

 Here are some sample numbers that I got:

 *Test 1: Write Consistency set to Quorum, Write Proportion = 100%,
 FSync = Batch and Window = 0 ms*

 Threads | Throughput (writes/sec) | Avg Latency (ms) | TP95 (ms) | TP99 (ms) | Min (ms) | Max (ms)


  102149 3.1984 51.499291   1004070 23.82870 2.22602004151 45.96571301.7
 1242 300419764.68 1154222.09 216


 If you look at the numbers, increasing the number of threads does not
 increase the throughput. Also, the latency values are not that great. I am
 using fsync set to batch with a 0 ms window.

 *Test 2: Write Consistency set to Quorum, Write Proportion = 100%,
 FSync = Periodic and Window = 1000 ms*
 1803 1.23712 1.012312.9Q 10015944 5.343925 1.21579.1Q 200196309.047 1970
 1.17 1851Q
 Are these numbers expected or does Cassandra perform better? Am I missing
 something?