Re: Big Data Question

2023-08-21 Thread daemeon reiydelle
- k8s

   1. depending on the version and networking, number of containers per
   node, nodepooling, etc. you can expect to see 1-2% additional storage IO
   latency (depends on whether all are on the same network vs. separate
   storage IO TCP network)
   2. System overhead may be 3-15% depending on what security mitigations
   are in place (if you own the systems and workload is dedicated, turn them
   off!)
   3. c* pod loss recovery is the big win here. Pod failure and recovery
   (e.g. to another node) will bring up the SAME c* node as of the time of
   the failure, so only a few updates need to catch up. Perhaps 2x
   replication, or none if the storage itself is replicated (see the quick
   check below).
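
A quick way to see that behaviour (this assumes a StatefulSet named
"cassandra" with persistent volumes; names and labels are illustrative):

kubectl delete pod cassandra-2                 # simulate a pod loss
kubectl get pods -l app=cassandra -w           # watch it get rescheduled
kubectl exec cassandra-0 -- nodetool status    # the same host ID should return to UN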

I wonder if you folks have already set out OLA's for "minimum outage" with
no data loss? Write amplification is mostly only a problem when networks
are heavily used. May not even be an issue in your case.
*.*
*Arthur C. Clarke famously said that "technology sufficiently advanced is
indistinguishable from magic." Magic is coming, and it's coming for all of
us*

*Daemeon Reiydelle*
*email: daeme...@gmail.com *
*LI: https://www.linkedin.com/in/daemeonreiydelle/
<https://www.linkedin.com/in/daemeonreiydelle/>*
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*


On Mon, Aug 21, 2023 at 8:49 AM Patrick McFadin  wrote:

> ...and a shameless plug for the Cassandra Summit in December. We have a
> talk from somebody that is doing 70TB per node and will be digging into all
> the aspects that make that work for them. I hope everyone in this thread is
> at that talk! I can't wait to hear all the questions.
>
> Patrick
>
> On Mon, Aug 21, 2023 at 8:01 AM Jeff Jirsa  wrote:
>
>> There's a lot of questionable advice scattered in this thread. Set aside
>> most of the guidance like 2TB/node, it's old and super nuanced.
>>
>> If you're bare metal, do what your organization is good at. If you have
>> millions of dollars in SAN equipment and you know how SANs work and fail
>> and get backed up, run on a SAN if your organization knows how to properly
>> operate a SAN. Just make sure you understand it's a single point of failure.
>>
>> If you're in the cloud, EBS is basically the same concept. You can lose
>> EBS in an AZ, just like you can lose SAN in a DC. Persist outside of that.
>> Have backups. Know how to restore them.
>>
>> The reason the "2TB/node" limit was a thing was around time to recover
>> from failure more than anything else. I described this in detail here, in
>> 2015, before faster-streaming in 4.0 was a thing :
>> https://stackoverflow.com/questions/31563447/cassandra-cluster-data-density-data-size-per-node-looking-for-feedback-and/31690279
>> . With faster streaming, IF you use LCS (so faster streaming works), you
>> can probably go at least 4-5x more dense than before, if you understand how
>> likely your disks are to fail and you can ensure you don't have correlated
>> failures when they age out (that means if you're on bare metal, measuring
>> flash life, and ideally mixing vendors to avoid firmware bugs).
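>>
>> (For reference, switching a table to LCS is a one-line change; the
>> keyspace and table names here are placeholders.)
>>
>> ALTER TABLE my_ks.my_table
>>   WITH compaction = {'class': 'LeveledCompactionStrategy'};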
>>
>> You'll still see risks of huge clusters, largely in gossip and schema
>> propagation. Upcoming CEPs address those. 4.0 is better there (with schema,
>> especially) than 3.0 was, but for "max nodes in a cluster", what you're
>> really comparing is "how many gossip speakers and tokens are in the
>> cluster" (which means your vnode settings matter, for things like pending
>> range calculators).
>>
>> Looking at the roadmap, your real question comes down to :
>> - If you expect to use the transactional features in Accord/5.0 to
>> transact across rows/keys, you probably want to keep one cluster
>> - If you don't ever expect to use multi-key transactions, just de-risk by
>> sharding your cluster into many smaller clusters now, with consistent
>> hashing to map keys to clusters, and have 4 clusters of the same smaller
>> size, with whatever node density you think you can do based on your
>> compaction strategy and streaming rate (and disk type); a toy sketch of
>> the key-to-cluster routing follows.
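>>
>> (This is plain hash-mod with a fixed cluster count rather than a true
>> consistent-hash ring; the key and cluster names are made up.)
>>
>> clusters=4
>> key="customer:12345"
>> bucket=$(( 0x$(printf '%s' "$key" | md5sum | cut -c1-8) % clusters ))
>> echo "route $key to cluster-$bucket"   # the same key always maps to the same cluster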
>>
>> If you have time and budget, create a 3 node cluster with whatever disks
>> you have, fill them, start working on them - expand to 4, treat one as
>> failed and replace it - simulate the operations you'll do at that size.
>> It's expensive to mimic a 500 host cluster, but if you've got budget, try
>> it in AWS and see what happens when you apply your real schema, and then do
>> a schema change.
>>
>>
>>
>>
>>
>> On Mon, Aug 21, 2023 at 7:31 AM Joe Obernberger <
>> joseph.obernber...@gmail.com> wrote:
>>
>>> For our scenario, the goal is to minimize down-time for a single (at
>>> l

Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
I started to respond, then realized I and the other posters are not
thinking the same way: what is the business case for availability and data
loss/reload/recoverability? You all argue for higher availability and damn
the cost. But no one asked "can you lose access, for 20 minutes, to a
portion of the data, 10 times a year, on a 250 node cluster in AWS, if the
data is not actually lost?" Or would you rather lose access only 1-2 times
a year, for the cost of a 500 node cluster holding the same data?

Then we can discuss 32/64g JVM and SSD's.
*.*
*Arthur C. Clarke famously said that "technology sufficiently advanced is
indistinguishable from magic." Magic is coming, and it's coming for all of
us*

*Daemeon Reiydelle*
*email: daeme...@gmail.com *
*LI: https://www.linkedin.com/in/daemeonreiydelle/
<https://www.linkedin.com/in/daemeonreiydelle/>*
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*


On Thu, Aug 17, 2023 at 1:53 PM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Was assuming reaper did incremental?  That was probably a bad assumption.
>
> nodetool repair -pr
> I know it well now!
>
> :)
>
> -Joe
>
> On 8/17/2023 4:47 PM, Bowen Song via user wrote:
> > I don't have experience with Cassandra on Kubernetes, so I can't
> > comment on that.
> >
> > For repairs, may I interest you in incremental repairs? They will make
> > repairs a hell of a lot faster. Of course, an occasional full repair is
> > still needed, but that's another story.
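> >
> > For reference (defaults vary by version, so check yours):
> >
> > nodetool repair            # incremental repair is the default on recent versions
> > nodetool repair -full -pr  # the occasional full repair, primary ranges only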
> >
> >
> > On 17/08/2023 21:36, Joe Obernberger wrote:
> >> Thank you.  Enjoying this conversation.
> >> Agree on blade servers, where each blade has a small number of SSDs.
> >> Yeh/Nah to a kubernetes approach assuming fast persistent storage?  I
> >> think that might be easier to manage.
> >>
> >> In my current benchmarks, the performance is excellent, but the
> >> repairs are painful.  I come from the Hadoop world where it was all
> >> about large servers with lots of disk.
> >> Relatively small number of tables, but some have a high number of
> >> rows, 10bil + - we use spark to run across all the data.
> >>
> >> -Joe
> >>
> >> On 8/17/2023 12:13 PM, Bowen Song via user wrote:
> >>> The optimal node size largely depends on the table schema and
> >>> read/write pattern. In some cases 500 GB per node is too large, but
> >>> in some other cases 10TB per node works totally fine. It's hard to
> >>> estimate that without benchmarking.
> >>>
> >>> Again, just pointing out the obvious, you did not count the off-heap
> >>> memory and page cache. 1TB of RAM for 24GB heap * 40 instances is
> >>> definitely not enough. You'll most likely need between 1.5 and 2 TB
> >>> memory for 40x 24GB heap nodes. You may be better off with blade
> >>> servers than a single server with gigantic memory and disk sizes.
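> >>>
> >>> Rough arithmetic behind that range: 40 x 24 GB = 960 GB of heap alone;
> >>> allow roughly another 14-26 GB per instance for off-heap structures
> >>> (memtables, bloom filters, index summaries) plus page cache, and you
> >>> land at 40 x (24 + 14) ≈ 1.5 TB up to 40 x (24 + 26) = 2 TB.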
> >>>
> >>>
> >>> On 17/08/2023 15:46, Joe Obernberger wrote:
> >>>> Thanks for this - yeah - duh - forgot about replication in my example!
> >>>> So - is 2TBytes per Cassandra instance advisable?  Better to use
> >>>> more/less?  Modern 2U servers can be had with 24 3.8TByte SSDs; so
> >>>> assume 80Tbytes per server, you could do:
> >>>> (1024*3)/80 = 39 servers, but you'd have to run 40 instances of
> >>>> Cassandra on each server; maybe 24G of heap per instance, so a
> >>>> server with 1TByte of RAM would work.
> >>>> Is this what folks would do?
> >>>>
> >>>> -Joe
> >>>>
> >>>> On 8/17/2023 9:13 AM, Bowen Song via user wrote:
> >>>>> Just pointing out the obvious, for 1PB of data on nodes with 2TB
> >>>>> disk each, you will need far more than 500 nodes.
> >>>>>
> >>>>> 1, it is unwise to run Cassandra with replication factor 1. It
> >>>>> usually makes sense to use RF=3, so 1PB data will cost 3PB of
> >>>>> storage space, minimal of 1500 such nodes.
> >>>>>
> >>>>> 2, depending on the compaction strategy you use and the write
> >>>>> access pattern, there's a disk space amplification to consider.
> >>>>> For example, with STCS, the disk usage can be many times of the
> >>>>> actual live data size.
> >>>>>
> >>>>> 3, you will need some extra free disk space as temporary space for
> >>>>> running compactions.
> >>>>>
> >>>>> 4, the data 

Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
A lot of (actually all) the replies seem to be based on local nodes with
1Gb networks of spinning rust. Much of what is mentioned below is TOTALLY
wrong for cloud. So clarify whether you are in the "real world" or the
rusty, slow data center world (definitely not a modern DC either).

E.g. "a node should not handle more than 2TB of ACTIVE disk" was advice for
spinning rust with maybe 1Gb networks. 10TB of modern high speed SSD is
more typical with 10 or 40Gb networks. If data is persisted to cloud
storage, replication should be 1; VMs fail over to new hardware. Obviously
if your storage is ephemeral, you have a different discussion. More of a
monologue with an idiot in Finance, but 
*.*
*Arthur C. Clarke famously said that "technology sufficiently advanced is
indistinguishable from magic." Magic is coming, and it's coming for all of
us*

*Daemeon Reiydelle*
*email: daeme...@gmail.com *
*LI: https://www.linkedin.com/in/daemeonreiydelle/
<https://www.linkedin.com/in/daemeonreiydelle/>*
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*


On Thu, Aug 17, 2023 at 6:13 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> Just pointing out the obvious, for 1PB of data on nodes with 2TB disk
> each, you will need far more than 500 nodes.
>
> 1, it is unwise to run Cassandra with replication factor 1. It usually
> makes sense to use RF=3, so 1PB data will cost 3PB of storage space,
> minimal of 1500 such nodes.
>
> 2, depending on the compaction strategy you use and the write access
> pattern, there's a disk space amplification to consider. For example,
> with STCS, the disk usage can be many times of the actual live data size.
>
> 3, you will need some extra free disk space as temporary space for
> running compactions.
>
> 4, the data is rarely going to be perfectly evenly distributed among all
> nodes, and you need to take that into consideration and size the nodes
> based on the node with the most data.
>
> 5, enough of bad news, here's a good one. Compression will save you (a
> lot) of disk space!
>
> With all the above considered, you probably will end up with a lot more
> than the 500 nodes you initially thought. Your choice of compaction
> strategy and compression ratio can dramatically affect this calculation.
>
>
> On 16/08/2023 16:33, Joe Obernberger wrote:
> > General question on how to configure Cassandra.  Say I have 1PByte of
> > data to store.  The general rule of thumb is that each node (or at
> > least instance of Cassandra) shouldn't handle more than 2TBytes of
> > disk.  That means 500 instances of Cassandra.
> >
> > Assuming you have very fast persistent storage (such as a NetApp,
> > PorterWorx etc.), would using Kubernetes or some orchestration layer
> > to handle those nodes be a viable approach?  Perhaps the worker nodes
> > would have enough RAM to run 4 instances (pods) of Cassandra, you
> > would need 125 servers.
> > Another approach is to build your servers with 5 (or more) SSD devices
> > - one for OS, four for each instance of Cassandra running on that
> > server.  Then build some scripts/ansible/puppet that would manage
> > Cassandra start/stops, and other maintenance items.
> >
> > Where I think this runs into problems is with repairs, or
> > sstablescrubs that can take days to run on a single instance.  How is
> > that handled 'in the real world'?  With seed nodes, how many would you
> > have in such a configuration?
> > Thanks for any thoughts!
> >
> > -Joe
> >
> >
>


Re: TLS/SSL overhead

2022-02-06 Thread daemeon reiydelle
The % numbers seem high for a clean network and a reasonably fast client.
The 5% is really not reasonable. No jumbo frames? No network retries
(netstats)?



*Daemeon Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*

*"Why is it so hard to rhyme either Life or Love?" - Sondheim*


On Sun, Feb 6, 2022 at 6:06 PM Dinesh Joshi  wrote:

> I wish there was an easy answer to this question. Like you pointed out it
> is hardware dependent but software stack plays a big part. For instance,
> the JVM you're running makes a difference too. Cassandra comes with netty
> and IIRC we include tcnative which accelerates TLS. You could also slip
> Amazon's Corretto Crypto Provider into your runtime. I am not suggesting
> using everything all at once but a combination of libraries, runtimes, JVM,
> OS, cipher suites can make a big difference. Therefore it is best to try it
> out on your stack.
>
> Typically modern hardware has accelerators for common encryption
> algorithms. If the software stack enables you to optimally take advantage
> of the hardware then you could see very little to no impact on latencies.
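>
> A quick way to gauge what the accelerator buys you on a given box (this
> assumes OpenSSL built with AES-NI support; the mask value simply disables
> AES-NI for the comparison run):
>
> openssl speed -evp aes-128-gcm
> OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-gcm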
>
> Cassandra maintains persistent connections therefore the visible impact is
> on connection establishment time (TLS handshake is expensive). Encryption
> will make thundering herd problems worse. You should watch out for those
> two issues.
>
> Dinesh
>
>
> On Feb 5, 2022, at 3:53 AM, onmstester onmstester 
> wrote:
>
> Hi,
>
> Anyone measured impact of wire encryption using TLS
> (client_encryption/server_encryption) on cluster latency/throughput?
> It may be dependent on Hardware or even data model but I already did some
> sort of measurements and got to 2% for client encryption and 3-5% for
> client + server encryption and wanted to validate that with community.
>
> Best Regards
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>
>


Re: about memory problem in write heavy system..

2022-01-07 Thread daemeon reiydelle
Maybe SSD's? Take a look at the IO read/write wait times.

FYI, your config changes simply push more activity into memory. Trading IO
for mem footprint ;{)
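
For example (assuming the sysstat package is installed):

iostat -xz 5 3     # per-device await / %util, three 5-second samples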

*Daemeon Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*

Cognitive Bias: (written in 1935) ...
One of the painful things about our time is that those who feel certainty
are stupid, and those with any imagination and understanding are filled
with doubt and indecision. - Bertrand Russel



On Fri, Jan 7, 2022 at 8:27 AM Jeff Jirsa  wrote:

> 3.11.4 is a very old release, with lots of known bugs. It's possible the
> memory is related to that.
>
> If you bounce one of the old nodes, where does the memory end up?
>
>
> On Thu, Jan 6, 2022 at 3:44 PM Eunsu Kim  wrote:
>
>>
>> Looking at the memory usage chart, it seems that the physical memory
>> usage of the existing node has increased since the new node was added with
>> auto_bootstrap=false.
>>
>>
>>
>>
>> On Fri, Jan 7, 2022 at 1:11 AM Eunsu Kim  wrote:
>>
>>> Hi,
>>>
>>> I have a Cassandra cluster(3.11.4) that does heavy writing work.
>>> (14k~16k write throughput per second per node)
>>>
>>> Nodes are physical machine in data center. Number of nodes are 30. Each
>>> node has three data disks mounted.
>>>
>>>
>>> A few days ago, a QueryTimeout problem occurred due to Full GC.
>>> So, referring to this blog(
>>> https://thelastpickle.com/blog/2018/04/11/gc-tuning.html), it seemed to
>>> have been solved by changing the memtable_allocation_type to
>>> offheap_objects.
>>>
>>> But today, I got an alarm saying that some nodes are using more than 90%
>>> of physical memory. (115GiB /125GiB)
>>>
>>> Native memory usage of some nodes is gradually increasing.
>>>
>>>
>>>
>>> All tables use TWCS, and TTL is 2 weeks.
>>>
>>> Below is the applied jvm option.
>>>
>>> -Xms31g
>>> -Xmx31g
>>> -XX:+UseG1GC
>>> -XX:G1RSetUpdatingPauseTimePercent=5
>>> -XX:MaxGCPauseMillis=500
>>> -XX:InitiatingHeapOccupancyPercent=70
>>> -XX:ParallelGCThreads=24
>>> -XX:ConcGCThreads=24
>>> …
>>>
>>>
>>> What additional things can I try?
>>>
>>> I am looking forward to the advice of experts.
>>>
>>> Regards.
>>>
>>
>>


Re: Latest Supported RedHat Linux version for Cassandra 3.11

2021-09-27 Thread daemeon reiydelle
Cassandra 3.11 runs on RHEL through 8.4 for sure.

*Daemeon Reiydelle*
*email: daeme...@gmail.com *
*LI: https://www.linkedin.com/in/daemeonreiydelle/
<https://www.linkedin.com/in/daemeonreiydelle/>*
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*

*“*“I have a different idea of elegance. I don't dress like a fop, it’s
true, but my moral grooming is impeccable. I never appear in public with a
soiled conscience, a tarnished honor, threadbare scruples, or an insult
that I haven't washed away. I'm always immaculately clean, adorned with
independence and frankness. I may not cut a stylish figure, but I hold my
soul erect. I wear my deeds as ribbons, my wit is sharper than the finest
mustache, and when I walk among men I make truths ring like spurs.”

Edmond Rostand, Cyrano de Bergerac


On Mon, Sep 27, 2021 at 9:10 AM Saha, Sushanta K <
sushanta.s...@verizonwireless.com> wrote:

> I am currently running Open Source Apache Cassandra 3.11.1 on RedHat 7.7.
> But, need to upgrade the OS to RedHat to 7.9 or 8.x.
>
> The site
> cassandra.apache.org/doc/latest/cassandra/getting_started/installing.html
> has listed "CentOS & RedHat Enterprise Linux (RHEL) including 6.6 to 7.7".
> FYI.
>
> Question : Can I run Cassandra 3.11.1 on RedHat 7.9 or 8.x?
>
> Thanks
>  Sushanta
>
>


Re: High mutation stage in multi dc deployment

2021-07-19 Thread daemeon reiydelle
You may want to think about the latency impacts of a cluster that has one
node "far away". This is such a basic design flaw that you need to do some
basic learning, and some basic understanding of networking and latency.





On Mon, Jul 19, 2021 at 10:38 AM MyWorld  wrote:

> Hi all,
>
> Currently we have a cluster with 2 DC of 3 nodes each. One DC is in GCP-US
> while other is in GCP-India. Just to add here, configuration of every node
> accross both DC is same. Cpu-6, Ram-32gb, Heap-8gb
>
> We do all our write on US data center. While performing a bulk write on
> GCP US, we observe normal load of 1 on US while this load at GCP India
> spikes to 10.
>
> On observing tpstats further in grafana we found mutation stage at GCP
> India is going to 1million intermittently though our overall write is
> nearly 300 per sec per node. Don't know the reason but whenever we have
> this spike, we are having load issue.
> Please help what could be the possible reason for this?
>
> Regards,
> Ashish
>


Re: underutilized servers

2021-03-05 Thread daemeon reiydelle
You did not specify read and write consistency levels; the default would be
to hit two nodes (one for data, one for digest) with every query. "Network
load is around 50%" is not too helpful on its own. 1Gbit? 10Gbit? 50% of
each direction, or an average of both?
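
For example (assuming the sysstat package and an interface called eth0;
both are guesses):

ethtool eth0 | grep Speed       # is the link 1 or 10 Gbit?
sar -n DEV 5 3                  # rx vs tx kB/s per interface, each direction separately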

Iowait is not great for a system of this size: assuming that you have 3
vm's on THREE SEPARATE physical systems and WITHOUT network attached storage
...


*Daemeon Reiydelle*
*email: daeme...@gmail.com *
*LI: https://www.linkedin.com/in/daemeonreiydelle/
<https://www.linkedin.com/in/daemeonreiydelle/>*
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*

"Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!" - Hunter S. Thompson


On Fri, Mar 5, 2021 at 6:48 AM Attila Wind  wrote:

> Hi guys,
>
> I have a DevOps related question - hope someone here could give some
> ideas/pointers...
>
> We are running a 3 nodes Cassandra cluster
> Recently we realized we do have performance issues. And based on
> investigation we took it seems our bottleneck is the Cassandra cluster. The
> application layer is waiting a lot for Cassandra ops. So queries are
> running slow on Cassandra side however due to our monitoring it looks the
> Cassandra servers still have lots of free resources...
>
> The Cassandra machines are virtual machines (we do own the physical hosts
> too) built with kvm - with 6 CPU cores (3 physical) and 32GB RAM dedicated
> to it.
> We are using Ubuntu Linux 18.04 distro - everywhere the same version (the
> physical and virtual host)
> We are running Cassandra 4.0-alpha4
>
> What we see is
>
>- CPU load is around 20-25% - so we have lots of spare capacity
>- iowait is around 2-5% - so disk bandwidth should be fine
>- network load is around 50% of the full available bandwidth
>- loadavg is max around 4 - 4.5 but typically around 3 (because of the
>cpu count 6 should represent 100% load)
>
> and still, query performance is slow ... and we do not understand what
> could hold Cassandra back to fully utilize the server resources...
>
> We are clearly missing something!
> Anyone any idea / tip?
>
> thanks!
> --
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +49 176 43556932
>
>
>


Re: AWS ephemeral instances + backup

2019-12-05 Thread daemeon reiydelle
If you can handle the slower IO of S3 this can work, but you will have a
window of out-of-date images. You don't have a concept of persistent
snapshots.

<==>
Life lived is not about the size of the dog in the fight:
It is about the size of the fight in the dog.

*Daemeon Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*



On Thu, Dec 5, 2019 at 2:06 PM Jon Haddad  wrote:

> You can easily do this with bcache or LVM
> http://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/.
>
> Medusa might be a good route to go down if you want to do backups instead:
> https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html
>
>
>
> On Thu, Dec 5, 2019 at 12:21 PM Carl Mueller
>  wrote:
>
>> Does anyone have experience tooling written to support this strategy:
>>
>> Use case: run cassandra on i3 instances on ephemerals but synchronize the
>> sstables and commitlog files to the cheapest EBS volume type (those have
>> bad IOPS but decent enough throughput)
>>
>> On node replace, the startup script for the node, back-copies the
>> sstables and commitlog state from the EBS to the ephemeral.
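>>
>> Roughly something like this (paths are made up, and this glosses over
>> flushing/quiescing before the copy):
>>
>> # periodically / on clean shutdown: push state from ephemeral to the EBS mount
>> rsync -a --delete /mnt/ephemeral/cassandra/ /mnt/ebs/cassandra/
>> # on the replacement node, before starting cassandra: pull it back
>> rsync -a /mnt/ebs/cassandra/ /mnt/ephemeral/cassandra/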
>>
>> As can be seen:
>> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
>>
>> the (presumably) spinning rust tops out at 2375 MB/sec (using
>> multiple EBS volumes presumably) that would incur about a ten minute delay
>> for node replacement for a 1TB node, but I imagine this would only be used
>> on higher IOPS r/w nodes with smaller densities, so 100GB would be about a
>> minute of delay only, already within the timeframes of an AWS node
>> replacement/instance restart.
>>
>>
>>


Re: JOB | The Last Pickle (Consultant) in USA

2019-11-20 Thread daemeon reiydelle
Sounds VERY interesting! If the resume passes the BS sniff test (I do big
data, which has included C* for a NUMBER of years), I would love to chat.
FYI I do a fair amount of readiness assessments, before, during (with
laughable results), and now/after my tenure at Accenture/Avanade.

Cheers, D.

<==>
Made weak by time and fate, but strong in will,
To strive, to seek, to find, and not to yield.
Ulysses - A. Lord Tennyson

*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*



On Wed, Nov 20, 2019 at 6:24 AM Mick Semb Wever  wrote:

>
> The Last Pickle is hiring in the US:
> https://thelastpickle.com/blog/2019/10/24/tlp-is-hiring-another-consultant.html
>
> If you enjoy Cassandra like we do, and are keen to join our team, reach
> out (see details in link above).
>
> regards,
> Mick
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Re: Aws instance stop and star with ebs

2019-11-06 Thread daemeon reiydelle
No connection timeouts? No TCP level retries? I am truly sorry, but you
have exceeded my capability. I have never seen a java.io timeout without
either a session half-open failure (no response) or multiple retries.

I am out of my depth, so please feel free to ignore this, but did you see
the packets that are making the initial connection (which must have timed
out)? Out of curiosity, a netstat -arn should be showing bad packets,
timeouts, etc. To see progress, create a simple shell script that dumps the
date, dumps netstat, sleeps 100 seconds, and repeats. During that window,
stop the remote node, wait 10 seconds, and restart it.
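
Something like this (untested sketch):

while true; do
  date
  netstat -s | egrep -i 'retrans|timeout|fail'   # protocol counters worth watching
  sleep 100
done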

<==>
Made weak by time and fate, but strong in will,
To strive, to seek, to find, and not to yield.
Ulysses - A. Lord Tennyson

*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*



On Wed, Nov 6, 2019 at 9:11 AM Rahul Reddy  wrote:

> Thank you.
>
> I have stopped instance in east. i see that all other instances can gossip
> to that instance and only one instance in west having issues gossiping to
> that node.  when i enable debug mode i see below on the west node
>
> i see bellow messages from 16:32 to 16:47
> DEBUG [RMI TCP Connection(272)-127.0.0.1] 2019-11-06 16:44:50,
> 417 StorageProxy.java:2361 - Hosts not in agreement. Didn't get a response
> from everybody:
> 424 StorageProxy.java:2361 - Hosts not in agreement. Didn't get a response
> from everybody:
>
> later i see timeout
>
> DEBUG [MessagingService-Outgoing-/eastip-Gossip] 2019-11-06 16:47:04,831
> OutboundTcpConnection.java:350 - Error writing to /eastip
> java.io.IOException: Connection timed out
>
> then  INFO  [GossipStage:1] 2019-11-06 16:47:05,792 StorageService.j
> ava:2289 - Node /eastip state jump to NORMAL
>
> DEBUG [GossipStage:1] 2019-11-06 16:47:06,244 MigrationManager
> .java:99 - Not pulling schema from /eastip, because sche
> ma versions match: local/real=cdbb639b-1675-31b3-8a0d-84aca18e
> 86bf, local/compatible=49bf1daa-d585-38e0-a72b-b36ce82da9cb, r
> emote=cdbb639b-1675-31b3-8a0d-84aca18e86bf
>
> i tried running some tcpdump during that time i dont see any packet loss
> during that time.  still unsure why east instance which was stopped and
> started unreachable to west node almost for 15 minutes.
>
>
> On Tue, Nov 5, 2019 at 10:14 PM daemeon reiydelle 
> wrote:
>
>> 10 minutes is 600 seconds, and there are several timeouts that are set to
>> that, including the data center timeout as I recall.
>>
>> You may be forced to tcpdump the interface(s) to see where the chatter
>> is. Out of curiosity, when you restart the node, have you snapped the jvm's
>> memory to see if e.g. heap is even in use?
>>
>>
>> On Tue, Nov 5, 2019 at 7:03 PM Rahul Reddy 
>> wrote:
>>
>>> Thanks Ben,
>>> Before stoping the ec2 I did run nodetool drain .so i ruled it out and
>>> system.log also doesn't show commitlogs being applied.
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Nov 5, 2019, 7:51 PM Ben Slater 
>>> wrote:
>>>
>>>> The logs between first start and handshaking should give you a clue but
>>>> my first guess would be replaying commit logs.
>>>>
>>>> Cheers
>>>> Ben
>>>>
>>>> ---
>>>>
>>>>
>>>> *Ben Slater**Chief Product Officer*
>>>>
>>>> <https://www.instaclustr.com/platform/>
>>>>
>>>> <https://www.facebook.com/instaclustr>
>>>> <https://twitter.com/instaclustr>
>>>> <https://www.linkedin.com/company/instaclustr>
>>>>
>>>> Read our latest technical blog posts here
>>>> <https://www.instaclustr.com/blog/>.
>>>>
>>>> This email has been sent on behalf of Instaclustr Pty. Limited
>>>> (Australia) and Instaclustr Inc (USA).
>>>>
>>>> This email and any attachments may contain confidential and legally
>>>> privileged information.  If you are not the intended recipient, do not copy
>>>> or disclose its content, but please reply to this email immediately and
>>>> highlight the error to the sender and then immediately delete the message.
>>>>
>>>>
>>>> On Wed, 6 Nov 2019 at 04:36, Rahul Reddy 
>>>> wrote:
>>>>
>>>>> I can reproduce the issue.
>>>>>
>>>>> I did drain Cassandra node then stop and started Cassandra instance .
>>>>> Cassandra instance comes up but other nodes will be in DN state around 10
>>>>> minutes.
>&g

Re: Aws instance stop and star with ebs

2019-11-05 Thread daemeon reiydelle
10 minutes is 600 seconds, and there are several timeouts that are set to
that, including the data center timeout as I recall.

You may be forced to tcpdump the interface(s) to see where the chatter is.
Out of curiosity, when you restart the node, have you snapped the jvm's
memory to see if e.g. heap is even in use?
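
For example (the pid is a placeholder for Cassandra's process id):

nodetool info | grep -i heap          # Cassandra's own view of heap / off-heap use
jstat -gcutil <cassandra-pid> 5000    # live GC and heap occupancy every 5 seconds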


On Tue, Nov 5, 2019 at 7:03 PM Rahul Reddy  wrote:

> Thanks Ben,
> Before stoping the ec2 I did run nodetool drain .so i ruled it out and
> system.log also doesn't show commitlogs being applied.
>
>
>
>
>
> On Tue, Nov 5, 2019, 7:51 PM Ben Slater 
> wrote:
>
>> The logs between first start and handshaking should give you a clue but
>> my first guess would be replaying commit logs.
>>
>> Cheers
>> Ben
>>
>> ---
>>
>>
>> *Ben Slater**Chief Product Officer*
>>
>> 
>>
>> 
>> 
>> 
>>
>> Read our latest technical blog posts here
>> .
>>
>> This email has been sent on behalf of Instaclustr Pty. Limited
>> (Australia) and Instaclustr Inc (USA).
>>
>> This email and any attachments may contain confidential and legally
>> privileged information.  If you are not the intended recipient, do not copy
>> or disclose its content, but please reply to this email immediately and
>> highlight the error to the sender and then immediately delete the message.
>>
>>
>> On Wed, 6 Nov 2019 at 04:36, Rahul Reddy 
>> wrote:
>>
>>> I can reproduce the issue.
>>>
>>> I did drain Cassandra node then stop and started Cassandra instance .
>>> Cassandra instance comes up but other nodes will be in DN state around 10
>>> minutes.
>>>
>>> I don't see error in the systemlog
>>>
>>> DN  xx.xx.xx.59   420.85 MiB  256  48.2% id  2
>>> UN  xx.xx.xx.30   432.14 MiB  256  50.0% id  0
>>> UN  xx.xx.xx.79   447.33 MiB  256  51.1% id  4
>>> DN  xx.xx.xx.144  452.59 MiB  256  51.6% id  1
>>> DN  xx.xx.xx.19   431.7 MiB  256  50.1% id  5
>>> UN  xx.xx.xx.6421.79 MiB  256  48.9%
>>>
>>> when i do nodetool status 3 nodes still showing down. and i dont see
>>> errors in system.log
>>>
>>> and after 10 mins it shows the other node is up as well.
>>>
>>>
>>> INFO  [HANDSHAKE-/10.72.100.156] 2019-11-05 15:05:09,133
>>> OutboundTcpConnection.java:561 - Handshaking version with /stopandstarted
>>> node
>>> INFO  [RequestResponseStage-7] 2019-11-05 15:16:27,166
>>> Gossiper.java:1019 - InetAddress /nodewhichitwasshowing down is now UP
>>>
>>> what is causing delay for 10mins to be able to say that node is reachable
>>>
>>> On Wed, Oct 30, 2019, 8:37 AM Rahul Reddy 
>>> wrote:
>>>
 And also aws ec2 stop and start comes with new instance with same ip
 and all our file systems are in ebs mounted fine.  Does coming new instance
 with same ip cause any gossip issues?

 On Tue, Oct 29, 2019, 6:16 PM Rahul Reddy 
 wrote:

> Thanks Alex. We have 6 nodes in each DC with RF=3  with CL local
> qourum . and we stopped and started only one instance at a time . Tough
> nodetool status says all nodes UN and system.log says canssandra started
> and started listening . Jmx explrter shows instance stayed down longer how
> do we determine what caused  the Cassandra unavialbe though log says its
> stared and listening ?
>
> On Tue, Oct 29, 2019, 4:44 PM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Tue, Oct 29, 2019 at 9:34 PM Rahul Reddy 
>> wrote:
>>
>>>
>>> We have our infrastructure on aws and we use ebs storage . And aws
>>> was retiring on of the node. Since our storage was persistent we did
>>> nodetool drain and stopped and start the instance . This caused 500 
>>> errors
>>> in the service. We have local_quorum and rf=3 why does stopping one
>>> instance cause application to have issues?
>>>
>>
>> Can you still look up what was the underlying error from Cassandra
>> driver in the application logs?  Was it request timeout or not enough
>> replicas?
>>
>> For example, if you only had 3 Cassandra nodes, restarting one of
>> them reduces your cluster capacity by 33% temporarily.
>>
>> Cheers,
>> --
>> Alex
>>
>>


Re: Ram & Space...

2019-10-23 Thread daemeon reiydelle
Pretty clear evidence of a memory leak, tombstone problem (still memory),
etc.

If this is Apache Cassandra, then you may need to do some heap dumps and
see what is going on (if it is the Java heap that is OOM'ing, which I
suspect). Might want to do some periodic vmstat or equivalent; a brute
force approach is periodic captures of top sorted by %MEM, to see which
process is actually leaking.
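
For example (pid and dump path are placeholders):

top -b -o %MEM -n 1 | head -20
jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof <cassandra-pid>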

But (smiling) could be the spirits telling you to actually USE the cluster?

<==>
Made weak by time and fate, but strong in will,
To strive, to seek, to find, and not to yield.
Ulysses - A. Lord Tennyson

*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*



On Wed, Oct 23, 2019 at 1:45 PM Paul Chandler  wrote:

> We had what sounds like a similar problem with a DSE cluster a little
> while ago, It was not being used, and had no tables in it. The memory kept
> rising until it was killed by the oom-killer.
>
> We spent along time trying to get to the bottom of the problem, but it
> suddenly stopped when the developers started using the cluster. Perhaps the
> same will happen when you start using yours.
>
> Thanks
>
> Paul
>
> On 23 Oct 2019, at 18:26, A  wrote:
>
> Thank you. But I have added any tables yet. It’s empty...
>
>
> Sent from Yahoo Mail for iPhone
> 
>
> On Tuesday, October 22, 2019, 1:15 AM, Matthias Pfau <
> matthias.p...@tutao.de.INVALID> wrote:
>
> Did you check nodetool status and logs? If so, what is reported?
>
> Regarding that more and more memory is used. This might be a problem with
> your table design. I would start analyzing nodetool tablestats output. It
> reports how much memory (especially off heap) is used by which table.
>
> Best,
> Matthias
>
>
> Oct 19, 2019, 18:46 by htt...@yahoo.com.INVALID:
> What are minimum and recommended ram and space requirements to run
> Cassandra in AWS?
>
> Every like 24 hours Cassandra stops working. Even though the service is
> active, it’s dead and non responsive until I restart the service.
>
> Top shows %MEM slowly creeping upwards. Yesterday it showed 75%.
>
> In the logs it throws that Cassandra is running in degraded mode and that
> I should consider adding more space to the free 25G...
>
> Thanks in advance for your help. Newbie here... lots to learn.
>
> Angel
>
>
> Sent from Yahoo Mail for iPhone
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
>


Re: Looking for feedback on automated root-cause system

2019-02-19 Thread daemeon reiydelle
Welcome to the world of testing predictive analytics. I will pass this on
to my folks at Accenture; I know of a couple of C* clients we run, and I'm
wondering what you had in mind?


*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype
daemeon.c.m.reiydelle*



On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

> Howdy,
>
> I’ve been engaged in the Cassandra user community for a long time, almost
> 8 years, and have worked on hundreds of Cassandra deployments. One of the
> things I’ve noticed in myself and a lot of my peers that have done
> consulting, support or worked on really big deployments is that we get
> burnt out. We fight a lot of the same fires over and over again, and don’t
> get to work on new or interesting stuff. Also, what we do is really hard to
> transfer to other people because it’s based on experience.
>
> Over the past year my team and I have been working to overcome that gap,
> creating an assistant that’s able to scale some of this knowledge. We’ve
> got it to the point where it’s able to classify known root causes for an
> outage or an SLA breach in Cassandra with an accuracy greater than 90%. It
> can accurately diagnose bugs, data-modeling issues, or misuse of certain
> features and when it does give you specific remediation steps with links to
> knowledge base articles.
>
> We think we’ve seeded our database with enough root causes that it’ll
> catch the vast majority of issues but there is always the possibility that
> we’ll run into something previously unknown like CASSANDRA-11170 (one of
> the issues our system found in the wild).
>
> We’re looking for feedback and would like to know if anyone is interested
> in giving the product a trial. The process would be a collaboration, where
> we both get to learn from each other and improve how we’re doing things.
>
> Thanks,
> Matt Stump
>
>


Re: benefits of HBase over Cassandra

2018-08-25 Thread daemeon reiydelle
Messenger can tolerate some losses in degenerate infrastructure cases, for
a given infra footprint. There is also some ability to scale up faster as
demand increases, for peak loads, etc. It therefore becomes a use-case
specific optimization. Also, HBase can run inside Hadoop more easily,
leveraging HDFS for blobs, etc. So it depends on your use case.

<==>
Be the reason someone smiles today.
Or the reason they need a drink.
Whichever works.

*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype
daemeon.c.m.reiydelle*



On Fri, Aug 24, 2018 at 10:40 PM Vitaliy Semochkin 
wrote:

> Thank you very much for fast reply, Dinesh!
> I was under impression that with tunable consistency  Cassandra can
> act as CP (in case it is needed), e.g  by setting  ALL on both reads
> and writes.
> Do you agree with this statement?
>
> PS Are there any other benefits of HBase you have found? I'd be glad
> to hear usecases list.
>
>
>
> On Sat, Aug 25, 2018 at 12:44 AM dinesh.jo...@yahoo.com.INVALID
>  wrote:
> >
> > I've worked with both databases. They're suitable for different
> use-cases. If you look at the CAP theorem; HBase is CP while Cassandra is a
> AP. If we talk about a specific use-case, it'll be easier to discuss.
> >
> > Dinesh
> >
> >
> > On Friday, August 24, 2018, 1:56:31 PM PDT, Vitaliy Semochkin <
> vitaliy...@gmail.com> wrote:
> >
> >
> > Hi,
> >
> > I read that once Facebook chose HBase over Cassandra for it's messenger,
> > but I never found what are the benefits for HBase over Cassandra,
> > can someone list, if there are any?
> >
> > Regards,
> > Vitaliy
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: JBOD disk failure

2018-08-14 Thread daemeon reiydelle
you have to explain what you mean by "JBOD". All in one large vdisk?
Separate drives?

At the end of the day, if a device fails in a way that the data housed on
that device (or array) is no longer available, that HDFS storage is marked
down. HDFS now needs to create a 3rd replicant. Various timers control how
long HDFS waits to see if the device comes back on line. But assume
immediately for convenience. Remember that a write is to a (random) copy of
the data, and that datanode then replicates to the next node, and so forth.
The in-process-of-being-created 3rd copy will also get those delete
"updates". Have you read up on how "deleting" a record works?

<==>
Be the reason someone smiles today.
Or the reason they need a drink.
Whichever works.

*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype
daemeon.c.m.reiydelle*



On Tue, Aug 14, 2018 at 6:10 AM Christian Lorenz <
christian.lor...@webtrekk.com> wrote:

> Hi,
>
>
>
> given a cluster with RF=3 and CL=LOCAL_ONE and application is deleting
> data, what happens if the nodes are setup with JBOD and one disk fails? Do
> I get consistent results while the broken drive is replaced and a nodetool
> repair is running on the node with the replaced drive?
>
>
>
> Kind regards,
>
> Christian
>


Re: Size of a single Data Row?

2018-06-10 Thread daemeon reiydelle
I'd like to split your question into two parts.

Part one is around recovery. If you lose a copy of the underlying data
because a node fails, and let's assume you have three copies, how long can
you tolerate the time to restore the third copy?

The second question is about the absolute length of a row. This is more
about the time to read a row: a single super long row can only be read from
one node, whereas if the row is split into multiple shorter rows then in
most cases there is an opportunity to read it in parallel.
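
One common way to do that split is a chunking table (sketch only; the names
and the 1 MB chunk size are illustrative, not a recommendation):

CREATE TABLE document_chunks (
    doc_id   text,
    chunk_no int,
    data     blob,
    PRIMARY KEY ((doc_id, chunk_no))
);
-- the application splits each payload into e.g. 1 MB chunks and records the
-- chunk count elsewhere; chunks land on different replicas, so the client
-- can fetch them in parallel and reassemble them.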

The sizes you're looking at are not in themselves an issue, it's more how
you want to access and use the data.

I might argue that you might not want to use Cassandra if this is your only
use case for it. I might suggest you look at something like the ELK stack;
whether or not you end up using Elasticsearch or Cassandra, the comparison
should get you thinking about your architecture for this particular
business case. But of course, if you have multiple use cases to store, some
with long columns and others shorter, then overall Cassandra would be an
excellent choice.

But as is often the case, and I do hope I'm being helpful in this response,
your overall family of business processes can drive compromises in one
business process to facilitate a single storage solution and simplified
administration.


Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Sun, Jun 10, 2018, 02:54 Ralph Soika  wrote:

> Hi,
> I have a general question concerning the Cassandra technology. I already
> read 2 books but after all I am more and more confused about the question
> if Cassandra is the right technology. My goal is to store Business Data
> form a workflow engine into Cassandra. I want to use Cassandra as a kind of
> archive service because of its fault tolerant and decentralized approach.
>
> But here are two things which are confusing me. On the one hand the
> project claims that a single column value can be 2 GB (1 MB is
> recommended). On the other hand people explain that a partition should not
> be larger than 100MB.
>
> I plan only one single simple table:
>
> CREATE TABLE documents (
>created text,
>id text,
>data text,
>PRIMARY KEY (created,id)
> );
>
> 'created' is the partition key holding the date in ISO fomat (-MM-DD).
> The 'id' is a clustering key and is unique.
>
> But my 'data' column holds a XML document with business data. This cell
> contains many unstructured data and also media data. The data cell will be
> between 1 and 10 MB. BUT it can also hold more than 100MB and less than 2GB
> in some cases.
>
> Is Cassandra able to handle this kind of table? Or is Cassandra at the end
> not recommended for this kind of data?
>
> For example I would like to ask if data for a specific date is available :
>
> SELECT created, id FROM documents WHERE created = '2018-06-10'
>
> I select without the data column and just ask if data exists. Is the
> performance automatically poor only because the data cell (no primary key)
> of some rows is grater then 100MB? Or is cassandra running out of heap
> space in any case? It is perfectly clear that it makes no sense to select
> multiple cells which each contain over 100 MB of data in one single query.
> But this is a fundamental problem and has nothing to do with Cassandra. My
> java application running in Wildfly would also not be able to handle a data
> result with multiple GB of data.  But I would expect hat I can select a set
> of keys just to decide whether to load one single data cell.
>
> Cassandra seems like a great system. But many people seem to claim that it
> is only suitable for mapping a user status list ala Facebook? Is this true?
> Thanks for you comments in advance.
>
>
>
>
> ===
> Ralph
>
>


Re: Mongo DB vs Cassandra

2018-05-31 Thread daemeon reiydelle
If you are starting with a modest amount of data (e.g. under 0.25 PB) and
do not have extremely high availability requirements, then it is easier to
start with MongoDB, avoiding HA clusters. Both are great, but C* scales far
beyond MongoDB FOR A GIVEN LEVEL OF DBA ADMIN AND CONFIG.


<==>
"When I finish a project for a client, I have ... learned their issues with
life,
their personal secrets, I have come to care about them.
Once the project is over, I lose them as if I lost family.
For the client, however, they’ve just dismissed a service worker." ...
"Thought on the Gig Economy" by Francine Brevetti


*Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198/London 44 020 8144
9872/Skype daemeon.c.m.reiydelle*


On Thu, May 31, 2018 at 4:49 AM, Sudhakar Ganesan <
sudhakar.gane...@flex.com.invalid> wrote:

> Team,
>
>
>
> I need to make a decision on Mongo DB vs Cassandra for loading the csv
> file data and store csv file as well. If any of you did such study in last
> couple of months, please share your analysis or observations.
>
>
>
> Regards,
>
> Sudhakar
> Legal Disclaimer :
> The information contained in this message may be privileged and
> confidential.
> It is intended to be read only by the individual or entity to whom it is
> addressed
> or by their designee. If the reader of this message is not the intended
> recipient,
> you are on notice that any distribution of this message, in any form,
> is strictly prohibited. If you have received this message in error,
> please immediately notify the sender and delete or destroy any copy of
> this message!
>


Re: Does Cassandra supports ACID txn

2018-04-25 Thread daemeon reiydelle
If ACID is needed, then C* is the wrong architecture. Your architecture
needs to match to your business processes as Ben pointed out: "Ask if it’s
really needed"

There is a concept of a velocity file (modern tech is memSQL'ish) that
delivers the high performance, acid transactions of lambda architectures.
It means the architecture is designed to support ONLY those functions that
need to be acid'ic. FYI, velocity files are the ultra-fast record of ATM
transactions that "just" happened, and are slowly replicated to the
persistent account balances.

So again, design your application architecture to support your needs. C* is
BASE (basically available, soft state, eventually consistent). Learn to
love it. It is basically your friend in big data, high volume solutions.


<==>
"When I finish a project for a client, I have ... learned their issues with
life,
their personal secrets, I have come to care about them.
Once the project is over, I lose them as if I lost family.
For the client, however, they’ve just dismissed a service worker." ...
"Thought on the Gig Economy" by Francine Brevetti


On Wed, Apr 25, 2018 at 8:33 PM, Ben Slater 
wrote:

> Would be interested to hear if anyone else has any different approaches
> but my approaches would be:
> 1) Ask if it’s really needed - in the example you gave would it really
> matter that, for a small period of time, the hotel appeared in once kind of
> search but not another? (Although clearly there are examples where it might
> matter.)
> 2) Put the state that matters in a single table. In this example, have a
> hotel_enabled table. Search would have to both find the hotel in one of
> your hotel_by_* tables and  then look up the hotel in hotel_enabled to
> check it is really enabled. “deleting” a hotel is then a single write to
> hotel_enabled. hotel_enabled could also be something like hotel_details so
> the other tables really are just indexes. You need to do more reads but
> whatever you do consistency doesn’t come for free. (Rough sketch below.)
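>
> Sketching that idea (table and column names just follow the example above):
>
> CREATE TABLE hotel_details (
>     hotel_id text PRIMARY KEY,
>     name     text,
>     enabled  boolean
> );
>
> -- hotels_by_name / hotels_by_poi stay as pure indexes pointing at hotel_id;
> -- a search hit is only surfaced if the follow-up read finds enabled = true,
> -- so "deleting" a hotel is a single write:
> UPDATE hotel_details SET enabled = false WHERE hotel_id = 'h123';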
>
> Cheers
> Ben
>
>
> On Thu, 26 Apr 2018 at 12:44 Rajesh Kishore 
> wrote:
>
>> Correction from previous query
>>
>>
>> Thanks Ben and all experts.
>>
>> I am almost a newbie to NoSQL world and thus I have a very general
>> question how does consumer application of Cassandra/other NoSQL
>> technologies deal with atomicity & other factors when there is need to 
>> *de-normalize
>> *data. For example:
>>
>> Let us say I have requirement for queries
>> - find all hotels by name
>> - Find all hotels by Point of Interest (POI)
>> - Find POI near by a hotel
>>
>> For these queries I would end up more or less in following tables
>> hotels_by_name(hotel_name,hotel_id,city,) primary key -
>> hotel_name
>> hotels_by_poi(poi_name,poi_id,hotel_id,hotel_name,..) primary key -
>> poi_name
>> poi_by_hotel(hotel_id,poi_name,poi_id,poi_loc,hotel_name,..) primary
>> key - hotel_id
>>
>> So, If I have to add/remove a hotel from/into hotels_by_name , I may need
>> to add/remove into/from tables hotels_by_poi/poi_by_hotel. So, here my
>> assumption is these operations would need to be atomic( and may be
>> supporting other ACID properties) . How these kind of operations/usecases
>> being handled in Cassandra/NoSQL world?
>>
>> Appreciate your response.
>>
>> Thanks,
>> Rajesh
>>
>> On Thu, Apr 26, 2018 at 8:05 AM, Rajesh Kishore 
>> wrote:
>>
>>> Thanks Ben and all experts.
>>>
>>> I am almost a newbie to NoSQL world and thus I have a very general
>>> question how does consumer application of Cassandra/other NoSQL
>>> technologies deal with atomicity & other factors when there is need to
>>> normalize data. For example:
>>>
>>> Let us say I have requirement for queries
>>> - find all hotels by name
>>> - Find all hotels by Point of Interest (POI)
>>> - Find POI near by a hotel
>>>
>>> For these queries I would end up more or less in following tables
>>> hotels_by_name(hotel_name,hotel_id,city,) primary key -
>>> hotel_name
>>> hotels_by_poi(poi_name,poi_id,hotel_id,hotel_name,..) primary key -
>>> poi_name
>>> poi_by_hotel(hotel_id,poi_name,poi_id,poi_loc,hotel_name,..)
>>> primary key - hotel_id
>>>
>>> So, If I have to add/remove a hotel from/into hotels_by_name , I may
>>> need to add/remove into/from tables hotels_by_poi/poi_by_hotel. So, here my
>>> assumption is these operations would need to be atomic( and may be
>>> supporting other ACID properties) . How these kind of operations/usecases
>>> being handled in Cassandra/NoSQL world?
>>>
>>> Appreciate your response.
>>>
>>> Thanks,
>>> Rajesh
>>>
>>>
>>>
>>> On Fri, Apr 20, 2018 at 11:07 AM, Ben Slater >> > wrote:
>>>
 The second SO answer just says the partitions will be collocated (ie on
 the same server) not that the two tables will use the same partition. In
 any event, Cassandra does not have the kind of functionality you are
 looking for. The closest is logged batch but as Sylvain said, 

Re: 答复: A node down every day in a 6 nodes cluster

2018-03-26 Thread daemeon reiydelle
Look for errors on your network interface. I think you have periodic errors
in your network connectivity
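
For example (eth0 is a guess for the interface name):

ip -s link show eth0                    # RX/TX errors and drops
ethtool -S eth0 | grep -iE 'err|drop'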


<==>
"Who do you think made the first stone spear? The Asperger guy.
If you get rid of the autism genetics, there would be no Silicon Valley"
Temple Grandin


*Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198London 44 020 8144 9872*


On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni  wrote:

> Hi Jeff,
>
> I need to restart the node manually every time,only one node has this
> problem.
>
> I have attached the nodetool output,thanks.
>
>
>
> Best Regards,
>
>
>
> 倪项菲*/ **David Ni*
>
> 中移德电网络科技有限公司
>
> Virtue Intelligent Network Ltd, co.
>
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>
> Mob: +86 13797007811 <+86%20137%209700%207811>|Tel: + 86 27 5024 2516
> <+86%2027%205024%202516>
>
>
>
> *发件人:* Jeff Jirsa 
> *发送时间:* 2018年3月27日 11:03
> *收件人:* user@cassandra.apache.org
> *主题:* Re: A node down every day in a 6 nodes cluster
>
>
>
> That warning isn’t sufficient to understand why the node is going down
>
>
>
>
>
> Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3
> is likely a good idea
>
>
>
> Are the nodes coming up on their own? Or are you restarting them?
>
>
>
> Paste the output of nodetool tpstats and nodetool cfstats
>
>
>
>
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Mar 26, 2018, at 7:56 PM, Xiangfei Ni  wrote:
>
> Hi Cassandra experts,
>
>   I am facing an issue,a node downs every day in a 6 nodes cluster,the
> cluster is just in one DC,
>
>   Every node has 4C 16G,and the heap configuration is MAX_HEAP_SIZE=8192m
> HEAP_NEWSIZE=512m,every node load about 200G data,the RF for the business
> CF is 3,a node downs one time every day,the system.log shows below info:
>
> WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128
> CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize
> # for 
>
> ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129
> QueryMessage.java:128 - Unexpected error during query
>
> com.google.common.util.concurrent.UncheckedExecutionException:
> java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException:
> Operation timed out - received only 0 responses.
>
> at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
> ~[guava-18.0.jar:na]
>
> at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
> ~[guava-18.0.jar:na]
>
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
> ~[guava-18.0.jar:na]
>
> at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
> ~[guava-18.0.jar:na]
>
> at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.authorize(ClientState.java:419)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at org.apache.cassandra.service.ClientState.
> checkPermissionOnResourceChain(ClientState.java:352)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at org.apache.cassandra.cql3.statements.ModificationStatement.
> checkAccess(ModificationStatement.java:211) ~[apache-cassandra-3.9.jar:3.
> 9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
> [apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
> [apache-cassandra-3.9.jar:3.9]
>
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(
> SimpleChannelInboundHandler.java:105) [netty-all-4.0.39.Final.jar:4.
> 0.39.Final]
>
> at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:366)

Re: Cassandra on high performance machine: virtualization vs Docker

2018-02-27 Thread daemeon reiydelle
Docker will provide lower per-node overhead.

And yes, virtualizing smaller nodes out of a bigger physical server makes
sense. Of course you lose some per-node failure protection, since several
Cassandra nodes now share one box, but I guess this is not production?

<==>
"Who do you think made the first stone spear? The Asperger guy.
If you get rid of the autism genetics, there would be no Silicon Valley"
Temple Grandin


*Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198London 44 020 8144 9872*


On Tue, Feb 27, 2018 at 8:26 PM, onmstester onmstester 
wrote:

> What I've got to set up my Apache Cassandra cluster are some servers with
> 20-core CPUs * 2 threads, 128 GB RAM and 8 * 2TB disks.
> Having read all over the web "do not use big nodes for your cluster", I'm
> convinced to run multiple nodes on a single physical server.
> So the question is which technology should I use: Docker or virtualization
> (ESX)? Any experience?
>
> Sent using Zoho Mail
>
>
>
>


Re: What kind of Automation you have for Cassandra related operations on AWS ?

2018-02-08 Thread daemeon reiydelle
Terraform plus Ansible. It works, but it's messy. Used for anywhere from 5 to 30,000 nodes plus the supporting infra.
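
As a rough sketch of that workflow (the variable, inventory and playbook names are
made up; it assumes the Terraform config exposes the instance IPs as an output):

  terraform init && terraform apply -var "node_count=6"
  terraform output -json instance_ips > ips.json
  ansible-playbook -i inventory/ec2.ini cassandra.yml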


Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Thu, Feb 8, 2018, 15:57 Ben Wood  wrote:

> Shameless plug of our (DC/OS) Apache Cassandra service:
> https://docs.mesosphere.com/services/cassandra/2.0.3-3.0.14.
>
> You must run DC/OS, but it will handle:
> Restarts
> Replacement of nodes
> Modification of configuration
> Backups and Restores (to S3)
>
> On Thu, Feb 8, 2018 at 3:46 PM, Krish Donald  wrote:
>
>> Hi All,
>>
>> What kind of Automation you have for Cassandra related operations on AWS
>> like restacking, restart of the cluster , changing cassandra.yaml
>> parameters etc ?
>>
>> Thanks
>>
>>
>
>
> --
> Ben Wood
> Software Engineer - Data Agility
> Mesosphere
>


Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread daemeon reiydelle
Good luck with that. PCID has been out since mid-2017, as I recall?


Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Jan 9, 2018 10:31 AM, "Dor Laor"  wrote:

Make sure you pick instances with the PCID CPU capability; their TLB flush
overhead is much smaller
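
A quick way to verify this from inside an instance (a minimal check; PCID has to be
exposed by the hypervisor to the guest for it to help):

  grep -qw pcid /proc/cpuinfo && echo "pcid present" || echo "pcid missing"
  grep -qw invpcid /proc/cpuinfo && echo "invpcid present" || echo "invpcid missing"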

On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Quick follow up.
>
>
>
> Others in AWS reporting/seeing something similar, e.g.:
> https://twitter.com/BenBromhead/status/950245250504601600
>
>
>
> So, while we have seen a relative CPU increase of ~ 50% since Jan 4,
> 2018, we now also have applied a kernel update at OS/VM level on a single
> node (loadtest and not production though), thus more or less double patched
> now. Additional CPU impact by OS/VM level kernel patching is more or less 
> negligible,
> so looks highly Hypervisor related.
>
>
>
> Regards,
>
> Thomas
>
>
>
> *From:* Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
> *Sent:* Freitag, 05. Jänner 2018 12:09
> *To:* user@cassandra.apache.org
> *Subject:* Meltdown/Spectre Linux patch - Performance impact on Cassandra?
>
>
>
> Hello,
>
>
>
> has anybody already some experience/results if a patched Linux kernel
> regarding Meltdown/Spectre is affecting performance of Cassandra negatively?
>
>
>
> In production, all nodes running in AWS with m4.xlarge, we see up to a 50%
> relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4, 2018,
> most likely correlating with Amazon finished patching the underlying
> Hypervisor infrastructure …
>
>
>
> Anybody else seeing a similar CPU increase?
>
>
>
> Thanks,
>
> Thomas
>
>
>
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
> 
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
> 
>


Re:

2017-10-01 Thread daemeon reiydelle
What specifically are you looking to monitor? As per above, Datadog has
superb components for monitoring, and no need to develop and support
anything, for a price of course. I have found management sometimes sees
devops resources as pretty low cost (pay for 40, get 70 hours work per
week). Depends on how big your clusters are, whether they are Hadoop MR,
add Hive, add Spark, add Ignite, etc.

Same sort of questions apply to your etl/ingest: Kafka/NiFi, Streaming, etc.
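
For reference, a minimal sketch of a Nagios-style check built on nodetool (the awk
parsing assumes the standard nodetool status layout and is illustrative only):

  #!/usr/bin/env bash
  # alert if any node in the cluster is not Up/Normal
  down=$(nodetool status 2>/dev/null | awk '/^[UD][NLJM]/ && $1 != "UN" {print $2}')
  if [ -n "$down" ]; then
    echo "CRITICAL: nodes not UN: $down"; exit 2
  fi
  echo "OK: all nodes UN"; exit 0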

We like to say that we don’t get to choose our parents, that they were
given by chance – yet, we can truly choose whose children we wish to be. -
Seneca the Younger



*Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198London 44 020 8144 9872*


On Sun, Oct 1, 2017 at 9:57 AM, Jeff Jirsa  wrote:

> I've seen successful AWS deployments in the past with Datadog and
> Graphite+Seyren
>
>
>
> On Sun, Oct 1, 2017 at 9:14 AM, Bill Walters 
> wrote:
>
>> Hi All,
>>
>> I need some help with deploying a monitoring and alerting system for our
>> new Cassandra 3.0.4 cluster that we are setting up in AWS East region.
>> I have a good experience with Cassandra as we are running some 2.0.16
>> clusters in production on our on-prem servers. We use Nagios tool to
>> monitor and alert our on-call people if the any of the nodes in our on-prem
>> servers go down. (Nagios is the default monitoring and alerting system used
>> by our company)
>> Since, our leadership started a plan to migrate our infrastructure to
>> cloud, we have chosen AWS as our public cloud.
>> We are planning to use same old Nagios as our monitoring and alerting
>> system even for our cloud servers.
>> But not sure if this is the ideal approach; I have seen use cases where Yelp
>> used Sensu and Netflix wrote their own tool for
>> monitoring their cloud Cassandra clusters.
>>
>> Please let me know if there are any cloud native monitoring systems that
>> work well with Cassandra, we will review it for our setup.
>>
>>
>>
>> Thank You,
>> Bill Walters.
>>
>
>


Re: new question ;-) // RE: understanding batch atomicity

2017-09-29 Thread daemeon reiydelle
recall that a delete is actually a corner case of an update, as is an
insert.

As I read the snippet, you are updating multiple tables. The partition key
is table specific, so two sets of update batches are handled here.
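
For concreteness, a hedged sketch of the scenario being discussed (the keyspace,
table and column names are invented):

  # same partition key value, but two different tables in one batch
  cqlsh -e "
  BEGIN BATCH
    UPDATE ks.table_1 SET val = 'x' WHERE pk = 1;
    UPDATE ks.table_2 SET val = 'y' WHERE pk = 1;
  APPLY BATCH;"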

We like to say that we don’t get to choose our parents, that they were
given by chance – yet, we can truly choose whose children we wish to be. -
Seneca the Younger



*Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198London 44 020 8144 9872*


On Fri, Sep 29, 2017 at 8:59 AM, DE VITO Dominique <
dominique.dev...@thalesgroup.com> wrote:

> Thanks DuyHai !
>
>
>
> Does anyone know if BATCH provides atomicity for all mutations of a given
> partition key for a __single__ table ?
>
>
>
> Or if BATCH provides atomicity for all mutations of a given partition key
> for __ALL__ mutated tables into the BATCH ?
>
>
>
> That is, in case of :
>
>
>
> BEGIN BATCH
>
> Update table_1 where PartitionKey_table_1 = 1 … => (A) mutation
>
> Update table_2 where PartitionKey_table_2 = 1 … => (B) mutation
>
> END BATCH
>
>
>
> Here, both mutations occur for the same PartitionKey = 1
>
> => are mutations (A) & (B) done in an atomic way (all or nothing) ?
>
>
>
> Thanks.
>
>
>
> Dominique
>
>
>
>
>
>
>
> [@@ THALES GROUP INTERNAL @@]
>
>
>
> *De :* DuyHai Doan [mailto:doanduy...@gmail.com]
> *Envoyé :* vendredi 29 septembre 2017 17:10
> *À :* user
> *Objet :* Re: understanding batch atomicity
>
>
>
> All updates here means all mutations == INSERT/UPDATE or DELETE
>
>
>
>
>
>
>
> On Fri, Sep 29, 2017 at 5:07 PM, DE VITO Dominique <
> dominique.dev...@thalesgroup.com> wrote:
>
> Hi,
>
>
>
> About BATCH, the Apache doc https://cassandra.apache.org/
> doc/latest/cql/dml.html?highlight=atomicity says :
>
>
>
> “*The BATCH statement group multiple modification statements
> (insertions/updates and deletions) into a single statement. It serves
> several purposes:*
>
> *...*
>
> *All updates in a BATCH belonging to a given partition key are performed
> in isolation*”
>
>
>
> Is “All *updates*” meaning equivalent to “All modifications (whatever
> it’s sources: INSERT or UPDATE statements)” ?
>
>
>
> Or, is “*updates*” meaning partition-level isolation *only* for UPDATE
> statements into the batch (w/o taking into isolation the INSERT other
> statements into the batch) ?
>
>
>
> Thanks
>
>
>
> Regards
>
> Dominique
>
>
>
>
>


Re: cassandra hardware requirements (STAT/SSD)

2017-09-29 Thread daemeon reiydelle
Note to the AWS poster, you have some limited understanding of how disks
are presented to AWS compute nodes. As a result your post is not relevant,
and misleading.

When considering throughput, recall that disk IO is ideally parallel. While
C* handles IO across multiple devices nicely, the unit of storage is a very
large "block". Whether that serial read is adequate, or whether you do RAID
0 (max parallel, no checksum overhead, loss of one drive makes the whole
volume unavailable) is a performance vs. reliability tradeoff.
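
As a sketch of the two ends of that tradeoff (device names, mount points and the
cassandra.yaml snippet are illustrative):

  # Option A, JBOD: let Cassandra spread data across disks via cassandra.yaml
  #   data_file_directories:
  #       - /mnt/disk1/cassandra
  #       - /mnt/disk2/cassandra
  # Option B, RAID 0: one striped volume; losing any one disk loses the volume
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
  mkfs.xfs /dev/md0
  mount -o noatime /dev/md0 /mnt/cassandra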



We like to say that we don’t get to choose our parents, that they were
given by chance – yet, we can truly choose whose children we wish to be. -
Seneca the Younger



*Daemeon C.M. ReiydelleSan Francisco 1.415.501.0198London 44 020 8144 9872*


On Fri, Sep 29, 2017 at 6:11 AM, Lutaya Shafiq Holmes <
lutayasha...@gmail.com> wrote:

> Please try and USE AWS
>
> amazon web services on aws.amazon.com
>
> On 9/29/17, Peng Xiao <2535...@qq.com> wrote:
> > Hi there,
> > we are struggling with hardware selection. We all know that SSD is good, and
> > DataStax suggests we use SSD. But as Cassandra is a CPU-bound DB, we are
> > considering using SATA disks; we noticed that the normal IO throughput is
> > 7MB/s.
> >
> >
> > Could anyone give some advice?
> >
> >
> > Thanks,
> > Peng Xiao
>
>
> --
> Lutaaya Shafiq
> Web: www.ronzag.com | i...@ronzag.com
> Mobile: +256702772721 | +256783564130
> Twitter: @lutayashafiq
> Skype: lutaya5
> Blog: lutayashafiq.com
> http://www.fourcornersalliancegroup.com/?a=shafiqholmes
>
> "The most beautiful people we have known are those who have known defeat,
> known suffering, known struggle, known loss and have found their way out of
> the depths. These persons have an appreciation, a sensitivity and an
> understanding of life that fills them with compassion, gentleness and a
> deep loving concern. Beautiful people do not just happen." - *Elisabeth
> Kubler-Ross*
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Tool to manage cassandra

2017-06-16 Thread daemeon reiydelle
Ambari





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*"It is better to be insulted with the truth than kissed with a lie”*

On Fri, Jun 16, 2017 at 6:01 AM, Ram Bhatia  wrote:

> Hi
>
> May I know, if there a tool similar to Oracle Enterprise Manager for
> managing Cassandra ?
>
> Thank you in advance for your help,
> Ram Bhatia
> - To
> unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional
> commands, e-mail: user-h...@cassandra.apache.org


Re: Restarting nodes and reported load

2017-06-01 Thread daemeon reiydelle
Some random thoughts; I would like to thank you for giving us an
interesting problem. Cassandra can get boring sometimes, it is too stable.

- Do you have a way to monitor the network traffic to see if it is
increasing between restarts or does it seem relatively flat?
- What activities are happening when you observe the (increasing)
latencies? Something must be writing to keyspaces, something I presume is
reading. What is the workload?
- when using SSDs, there are some device-level optimizations for SSDs. I wonder
if those were done (they will cause some IO latency, but not like this); see the sketch below.
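
A minimal checklist sketch for those three points (device and path names are
illustrative):

  sar -n DEV 1 5                         # per-interface rx/tx rates between restarts
  nodetool tpstats                       # pending/blocked thread pools while latency climbs
  nodetool compactionstats               # outstanding compactions
  cat /sys/block/sda/queue/scheduler     # SSDs usually want noop/none
  mount | grep cassandra                 # check for noatime on the data mount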







*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*



On Thu, Jun 1, 2017 at 7:18 AM, Daniel Steuernol 
wrote:

> I am just restarting cassandra. I'm not having any disk space issues I
> think, but we're having issues where operations have increased latency, and
> these are fixed by a restart. It seemed like the load reported by nodetool
> status might be helpful in understanding what is going wrong but I'm not
> sure. Another symptom is that nodes will report as DN in nodetool status
> and then come back up again just a minute later.
>
> I'm not really sure what to track to find out what exactly is going wrong
> on the cluster, so any insight or debugging techniques would be super
> helpful
>
>
> On May 31 2017, at 5:07 pm, Anthony Grasso 
> wrote:
>
>> Hi Daniel,
>>
>> When you say that the nodes have to be restarted, are you just restarting
>> the Cassandra service or are you restarting the machine?
>> How are you reclaiming disk space at the moment? Does disk space free up
>> after the restart?
>>
>> Regarding storage on nodes, keep in mind the more data stored on a node,
>> the longer some operations to maintain that data will take to complete. In
>> addition, the more data that is on each node, the long it will take to
>> stream data to other nodes. Whether it is replacing a down node or
>> inserting a new node, having a large amount of data on each node will mean
>> that it takes longer for a node to join the cluster if it is streaming the
>> data.
>>
>> Kind regards,
>> Anthony
>>
>> On 30 May 2017 at 02:43, Daniel Steuernol  wrote:
>>
>> The cluster is running with RF=3, right now each node is storing about
>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>> volumes with 10k iops. I guess this brings up the question of what's a good
>> marker to decide on whether to increase disk space vs provisioning a new
>> node?
>>
>>
>>
>> On May 29 2017, at 9:35 am, tommaso barbugli 
>> wrote:
>>
>> Hi Daniel,
>>
>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>> data do you store per node and what kind of servers do you use (core count,
>> RAM, disk, ...)?
>>
>> Cheers,
>> Tommaso
>>
>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol 
>> wrote:
>>
>>
>> I am running a 6 node cluster, and I have noticed that the reported load
>> on each node rises throughout the week and grows way past the actual disk
>> space used and available on each node. Also eventually latency for
>> operations suffers and the nodes have to be restarted. A couple questions
>> on this, is this normal? Also does cassandra need to be restarted every few
>> days for best performance? Any insight on this behaviour would be helpful.
>>
>> Cheers,
>> Daniel
>> - To
>> unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>> additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>> - To
>> unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>> additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>> - To
> unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional
> commands, e-mail: user-h...@cassandra.apache.org


Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
Did you notice that HDFS is the distributed file system used?





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Tue, May 30, 2017 at 2:18 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> This isn't an HDFS mailing list.
>
> On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs
>> node. Depends somewhat on whether there is a mix of more and less
>> frequently accessed data. But even storing only hot data, never saw
>> anything less than 20tb hdfs per node.
>>
>>
>>
>>
>>
>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198 <(415)%20501-0198>London
>> (+44) (0) 20 8144 9872 <+44%2020%208144%209872>*
>>
>>
>> *“All men dream, but not equally. Those who dream by night in the dusty
>> recesses of their minds wake up in the day to find it was vanity, but the
>> dreamers of the day are dangerous men, for they may act their dreams with
>> open eyes, to make it possible.” — T.E. Lawrence*
>>
>>
>> On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli <tbarbu...@gmail.com>
>> wrote:
>>
>>> Am I the only one thinking 3TB is way too much data for a single node on
>>> a VM?
>>>
>>> On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <
>>> dan...@sendwithus.com> wrote:
>>>
>>>> I don't believe incremental repair is enabled, I have never enabled it
>>>> on the cluster, and unless it's the default then it is off. Also I don't
>>>> see a setting in cassandra.yaml for it.
>>>>
>>>>
>>>>
>>>> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com>
>>>> wrote:
>>>>
>>>>> Unless there is a bug, snapshots are excluded (they are not HDFS
>>>>> anyway!) from nodetool status.
>>>>>
>>>>> Out of curiousity, is incremenatal repair enabled? This is almost
>>>>> certainly a rat hole, but there was an issue a few releases back where 
>>>>> load
>>>>> would only increase until the node was restarted. Had been fixed ages ago,
>>>>> but wondering what happens if you restart a node, IF you have incremental
>>>>> enabled.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198 <+1%20415-501-0198>London
>>>>> (+44) (0) 20 8144 9872 <+44%2020%208144%209872>*
>>>>>
>>>>>
>>>>> *“All men dream, but not equally. Those who dream by night in the
>>>>> dusty recesses of their minds wake up in the day to find it was vanity, 
>>>>> but
>>>>> the dreamers of the day are dangerous men, for they may act their dreams
>>>>> with open eyes, to make it possible.” — T.E. Lawrence*
>>>>>
>>>>>
>>>>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:
>>>>>
>>>>> Can you please check if you have incremental backup enabled and
>>>>> snapshots are occupying the space.
>>>>>
>>>>> run nodetool clearsnapshot command.
>>>>>
>>>>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
>>>>> dan...@sendwithus.com> wrote:
>>>>>
>>>>> It's 3-4TB per node, and by load rises, I'm talking about load as
>>>>> reported by nodetool status.
>>>>>
>>>>>
>>>>>
>>>>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> When you say "the load rises ... ", could you clarify what you mean by
>>>>> "load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
>>>>> in neither case would that be relevant to transient or persisted disk. Am 
>>>>> I
>>>>> missing something?
>>>>>
>>>>>
>>>>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <
>>>>> tbarbu...@gmail.com> wrote:
>>>>>
>>>>> 3-4 T

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs
node. Depends somewhat on whether there is a mix of more and less
frequently accessed data. But even storing only hot data, never saw
anything less than 20tb hdfs per node.





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli <tbarbu...@gmail.com>
wrote:

> Am I the only one thinking 3TB is way too much data for a single node on a
> VM?
>
> On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <dan...@sendwithus.com>
> wrote:
>
>> I don't believe incremental repair is enabled, I have never enabled it on
>> the cluster, and unless it's the default then it is off. Also I don't see a
>> setting in cassandra.yaml for it.
>>
>>
>>
>> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> Unless there is a bug, snapshots are excluded (they are not HDFS
>>> anyway!) from nodetool status.
>>>
>>> Out of curiousity, is incremenatal repair enabled? This is almost
>>> certainly a rat hole, but there was an issue a few releases back where load
>>> would only increase until the node was restarted. Had been fixed ages ago,
>>> but wondering what happens if you restart a node, IF you have incremental
>>> enabled.
>>>
>>>
>>>
>>>
>>>
>>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198 <+1%20415-501-0198>London
>>> (+44) (0) 20 8144 9872 <+44%2020%208144%209872>*
>>>
>>>
>>> *“All men dream, but not equally. Those who dream by night in the dusty
>>> recesses of their minds wake up in the day to find it was vanity, but the
>>> dreamers of the day are dangerous men, for they may act their dreams with
>>> open eyes, to make it possible.” — T.E. Lawrence*
>>>
>>>
>>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:
>>>
>>> Can you please check if you have incremental backup enabled and
>>> snapshots are occupying the space.
>>>
>>> run nodetool clearsnapshot command.
>>>
>>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
>>> dan...@sendwithus.com> wrote:
>>>
>>> It's 3-4TB per node, and by load rises, I'm talking about load as
>>> reported by nodetool status.
>>>
>>>
>>>
>>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>>> wrote:
>>>
>>> When you say "the load rises ... ", could you clarify what you mean by
>>> "load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
>>> in neither case would that be relevant to transient or persisted disk. Am I
>>> missing something?
>>>
>>>
>>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>> 3-4 TB per node or in total?
>>>
>>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <dan...@sendwithus.com
>>> > wrote:
>>>
>>> I should also mention that I am running cassandra 3.10 on the cluster
>>>
>>>
>>>
>>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>>> wrote:
>>>
>>> The cluster is running with RF=3, right now each node is storing about
>>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>>> volumes with 10k iops. I guess this brings up the question of what's a good
>>> marker to decide on whether to increase disk space vs provisioning a new
>>> node?
>>>
>>>
>>> On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>> Hi Daniel,
>>>
>>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>>> data do you store per node and what kind of servers do you use (core count,
>>> RAM, disk, ...)?
>>>
>>> Cheers,
>>> Tommaso
>>>
>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <dan...@sendwithus.com
>>> > wrote:
>>>
>>>
>>> I am running a 6 node cluster, and I have noticed that the reported load
>>> on each node rises throughout the week and grows way past the actual disk
>>> space used and available on each node. Also eventually latency for
>>> operations suffers and the nodes have to be restarted. A couple questions
>>> on this, is this normal? Also does cassandra need to be restarted every few
>>> days for best performance? Any insight on this behaviour would be helpful.
>>>
>>> Cheers,
>>> Daniel
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>>> additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>>> additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>>
>>>
>


Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
No degradation.





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Tue, May 30, 2017 at 1:54 PM, Daniel Steuernol <dan...@sendwithus.com>
wrote:

> That does sound like what's happening, did performance degrade as the
> reported load increased?
>
>
>
> On May 30 2017, at 1:52 pm, daemeon reiydelle <daeme...@gmail.com> wrote:
>
>> OK, thanks.
>>
>> So there was a bug in a prior version of C*, symptoms were:
>>
>> Nodetool would show increasing load utilization over time. Stopping and
>> restarting C* nodes would reset the storage back to what one would expect
>> on that node, for a while, then it would creep upwards again, until the
>> node(s) are restarted, etc. FYI it ONLY occurred on an in-use system, etc.
>>
>> I know (double checked) that the problem was fixed a while back.
>> Wondering if it resurfaced?
>>
>>
>>
>>
>>
>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198 <(415)%20501-0198>London
>> (+44) (0) 20 8144 9872 <+44%2020%208144%209872>*
>>
>>
>> *“All men dream, but not equally. Those who dream by night in the dusty
>> recesses of their minds wake up in the day to find it was vanity, but the
>> dreamers of the day are dangerous men, for they may act their dreams with
>> open eyes, to make it possible.” — T.E. Lawrence*
>>
>>
>> On Tue, May 30, 2017 at 1:36 PM, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>> I don't believe incremental repair is enabled, I have never enabled it on
>> the cluster, and unless it's the default then it is off. Also I don't see a
>> setting in cassandra.yaml for it.
>>
>>
>> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>> Unless there is a bug, snapshots are excluded (they are not HDFS anyway!)
>> from nodetool status.
>>
>> Out of curiousity, is incremenatal repair enabled? This is almost
>> certainly a rat hole, but there was an issue a few releases back where load
>> would only increase until the node was restarted. Had been fixed ages ago,
>> but wondering what happens if you restart a node, IF you have incremental
>> enabled.
>>
>>
>>
>>
>>
>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*
>>
>>
>> *“All men dream, but not equally. Those who dream by night in the dusty
>> recesses of their minds wake up in the day to find it was vanity, but the
>> dreamers of the day are dangerous men, for they may act their dreams with
>> open eyes, to make it possible.” — T.E. Lawrence*
>>
>>
>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:
>>
>> Can you please check if you have incremental backup enabled and snapshots
>> are occupying the space.
>>
>> run nodetool clearsnapshot command.
>>
>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <dan...@sendwithus.com
>> > wrote:
>>
>> It's 3-4TB per node, and by load rises, I'm talking about load as
>> reported by nodetool status.
>>
>>
>>
>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>> When you say "the load rises ... ", could you clarify what you mean by
>> "load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
>> in neither case would that be relevant to transient or persisted disk. Am I
>> missing something?
>>
>>
>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <tbarbu...@gmail.com>
>> wrote:
>>
>> 3-4 TB per node or in total?
>>
>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>> I should also mention that I am running cassandra 3.10 on the cluster
>>
>>
>>
>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>> The cluster is running with RF=3, right now each node is storing about
>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>> volumes with 10k iops. I guess this brings up the question of what's a good
>> marker to decide on whether to increase disk space 

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
OK, thanks.

So there was a bug in a prior version of C*, symptoms were:

Nodetool would show increasing load utilization over time. Stopping and
restarting C* nodes would reset the storage back to what one would expect
on that node, for a while, then it would creep upwards again, until the
node(s) are restarted, etc. FYI it ONLY occurred on an in-use system, etc.

I know (double checked) that the problem was fixed a while back. Wondering
if it resurfaced?





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Tue, May 30, 2017 at 1:36 PM, Daniel Steuernol <dan...@sendwithus.com>
wrote:

> I don't believe incremental repair is enabled, I have never enabled it on
> the cluster, and unless it's the default then it is off. Also I don't see a
> setting in cassandra.yaml for it.
>
>
> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com> wrote:
>
>> Unless there is a bug, snapshots are excluded (they are not HDFS anyway!)
>> from nodetool status.
>>
>> Out of curiousity, is incremenatal repair enabled? This is almost
>> certainly a rat hole, but there was an issue a few releases back where load
>> would only increase until the node was restarted. Had been fixed ages ago,
>> but wondering what happens if you restart a node, IF you have incremental
>> enabled.
>>
>>
>>
>>
>>
>> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198 <(415)%20501-0198>London
>> (+44) (0) 20 8144 9872 <+44%2020%208144%209872>*
>>
>>
>> *“All men dream, but not equally. Those who dream by night in the dusty
>> recesses of their minds wake up in the day to find it was vanity, but the
>> dreamers of the day are dangerous men, for they may act their dreams with
>> open eyes, to make it possible.” — T.E. Lawrence*
>>
>>
>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:
>>
>> Can you please check if you have incremental backup enabled and snapshots
>> are occupying the space.
>>
>> run nodetool clearsnapshot command.
>>
>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <dan...@sendwithus.com
>> > wrote:
>>
>> It's 3-4TB per node, and by load rises, I'm talking about load as
>> reported by nodetool status.
>>
>>
>>
>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>> When you say "the load rises ... ", could you clarify what you mean by
>> "load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
>> in neither case would that be relevant to transient or persisted disk. Am I
>> missing something?
>>
>>
>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <tbarbu...@gmail.com>
>> wrote:
>>
>> 3-4 TB per node or in total?
>>
>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>> I should also mention that I am running cassandra 3.10 on the cluster
>>
>>
>>
>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>> The cluster is running with RF=3, right now each node is storing about
>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>> volumes with 10k iops. I guess this brings up the question of what's a good
>> marker to decide on whether to increase disk space vs provisioning a new
>> node?
>>
>>
>> On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com>
>> wrote:
>>
>> Hi Daniel,
>>
>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>> data do you store per node and what kind of servers do you use (core count,
>> RAM, disk, ...)?
>>
>> Cheers,
>> Tommaso
>>
>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>>
>> I am running a 6 node cluster, and I have noticed that the reported load
>> on each node rises throughout the week and grows way past the actual disk
>> space used and available on each node. Also eventually latency for
>> operations suffers and the nodes have to be restarted. A couple questions
>> on this, is this normal? Also does cassandra need to be restarted every few
>> days for best performance? Any insight on this behaviour would be helpful.
>>
>> Cheers,
>> Daniel
>> - To
>> unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>> additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>>
>>
>> - To
>> unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>> additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>>
>>


Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
Unless there is a bug, snapshots are excluded (they are not HDFS anyway!)
from nodetool status.

Out of curiousity, is incremenatal repair enabled? This is almost certainly
a rat hole, but there was an issue a few releases back where load would
only increase until the node was restarted. Had been fixed ages ago, but
wondering what happens if you restart a node, IF you have incremental
enabled.
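
If you want to rule snapshots in or out, a quick sketch (paths assume a default
package install):

  nodetool status                               # "load" counts live sstables only
  du -sh /var/lib/cassandra/data                # includes snapshots and backups
  nodetool listsnapshots
  nodetool clearsnapshot                        # drops all snapshots if they are the gap
  grep -E '^incremental_backups' /etc/cassandra/cassandra.yaml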





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:

> Can you please check if you have incremental backup enabled and snapshots
> are occupying the space.
>
> run nodetool clearsnapshot command.
>
> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <dan...@sendwithus.com>
> wrote:
>
>> It's 3-4TB per node, and by load rises, I'm talking about load as
>> reported by nodetool status.
>>
>>
>>
>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> When you say "the load rises ... ", could you clarify what you mean by
>>> "load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
>>> in neither case would that be relevant to transient or persisted disk. Am I
>>> missing something?
>>>
>>>
>>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>> 3-4 TB per node or in total?
>>>
>>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <dan...@sendwithus.com
>>> > wrote:
>>>
>>> I should also mention that I am running cassandra 3.10 on the cluster
>>>
>>>
>>>
>>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>>> wrote:
>>>
>>> The cluster is running with RF=3, right now each node is storing about
>>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>>> volumes with 10k iops. I guess this brings up the question of what's a good
>>> marker to decide on whether to increase disk space vs provisioning a new
>>> node?
>>>
>>>
>>> On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>> Hi Daniel,
>>>
>>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>>> data do you store per node and what kind of servers do you use (core count,
>>> RAM, disk, ...)?
>>>
>>> Cheers,
>>> Tommaso
>>>
>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <dan...@sendwithus.com
>>> > wrote:
>>>
>>>
>>> I am running a 6 node cluster, and I have noticed that the reported load
>>> on each node rises throughout the week and grows way past the actual disk
>>> space used and available on each node. Also eventually latency for
>>> operations suffers and the nodes have to be restarted. A couple questions
>>> on this, is this normal? Also does cassandra need to be restarted every few
>>> days for best performance? Any insight on this behaviour would be helpful.
>>>
>>> Cheers,
>>> Daniel
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>>> additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>>
>>>
>>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>> additional commands, e-mail: user-h...@cassandra.apache.org
>>
>
>


Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
When you say "the load rises ... ", could you clarify what you mean by
"load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
in neither case would that be relevant to transient or persisted disk. Am I
missing something?


On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli 
wrote:

> 3-4 TB per node or in total?
>
> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol 
> wrote:
>
>> I should also mention that I am running cassandra 3.10 on the cluster
>>
>>
>>
>> On May 29 2017, at 9:43 am, Daniel Steuernol 
>> wrote:
>>
>>> The cluster is running with RF=3, right now each node is storing about
>>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>>> volumes with 10k iops. I guess this brings up the question of what's a good
>>> marker to decide on whether to increase disk space vs provisioning a new
>>> node?
>>>
>>>
>>> On May 29 2017, at 9:35 am, tommaso barbugli 
>>> wrote:
>>>
>>> Hi Daniel,
>>>
>>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>>> data do you store per node and what kind of servers do you use (core count,
>>> RAM, disk, ...)?
>>>
>>> Cheers,
>>> Tommaso
>>>
>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol >> > wrote:
>>>
>>>
>>> I am running a 6 node cluster, and I have noticed that the reported load
>>> on each node rises throughout the week and grows way past the actual disk
>>> space used and available on each node. Also eventually latency for
>>> operations suffers and the nodes have to be restarted. A couple questions
>>> on this, is this normal? Also does cassandra need to be restarted every few
>>> days for best performance? Any insight on this behaviour would be helpful.
>>>
>>> Cheers,
>>> Daniel
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>>> additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>>
>


Re: How do you do automatic restacking of AWS instance for cassandra?

2017-05-28 Thread daemeon reiydelle
This is in fact an interesting security practice that makes sense. It
assumes the existing AMI had security holes that WERE ALREADY exploited.
See if you can negotiate moving the hdfs volumes to persistent storage. FYI,
two major banks I have worked with did much the same, but as the storage
was SAN (with VMware) I was able to make adjustments to the Ansible
scripts (the client was providing mobile banking solutions to the bank).

I had another client using AWS, Chef, Terraform. I WAS NOT able to make
this work in Chef. I can do it with Ansible, Terraform, AWS however.
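
For what it's worth, a minimal sketch of the per-node replacement step that such
automation ends up driving (the IP and paths are invented; the
replace_address_first_boot option is discussed further down this thread):

  # on the freshly-built replacement instance, before the first Cassandra start
  echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.1.23"' \
    >> /etc/cassandra/cassandra-env.sh        # 10.0.1.23 = the retired node's address
  service cassandra start
  # wait until the node shows as UN in "nodetool status" before moving on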

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 28, 2017 1:25 AM, "Anthony Grasso" <anthony.gra...@gmail.com> wrote:

> Hi Surbhi,
>
> Please see my comment inline below.
>
> On 28 May 2017 at 12:11, Jeff Jirsa <jji...@apache.org> wrote:
>
>>
>>
>> On 2017-05-27 18:04 (-0700), Surbhi Gupta <surbhi.gupt...@gmail.com>
>> wrote:
>> > Thanks a lot for all of your reply.
>> > Our requirement is :
>> > Our company releases AMI almost every month where they have some or the
>> > other security packages.
>> > So as per our security team we need to move our cassandra cluster to the
>> > new AMI .
>> > As this process happens every month, we would like to automate the
>> process .
>> > Few points to consider here:
>> >
>> > 1. We are using ephemeral drives to store cassandra data
>> > 2. We are on dse 4.8.x
>> >
>> > So currently to do the process, we pinup a new nodes with new DC name
>> and
>> > join that DC, alter the keyspace, do rebuild  and later alter the
>> keyspace
>> > again to remove the old DC .
>> >
>> > But all of this process is manually done as of now.
>> >
>> > So i wanted to understand , on AWS, how do you do above kind of task
>> > automatically ?
>>
>>
>> At a previous employer, they used M4 class instances with data on a
>> dedicated EBS volumes, so we could swap AMIs / stop / start / adjust
>> instances without having to deal with this. This worked reasonably well for
>> their scale (which was petabytes of data).
>>
>
> This is a really good option as it avoids streaming data to replace a node
> which could potentially be quicker if dealing with large amounts of data on
> each node.
>
>
>>
>> Other companies using ephemeral tend to be more willing to just terminate
>> instances and replace them (-Dcassandra.replace_address). If you stop
>> cassandra, then boot a replacement with 'replace_address' set, it'll take
>> over for the stopped instance, including re-streaming all data (as best it
>> can, subject to consistency level and repair status). This may be easier
>> for you to script than switching your fleet to EBS, but it's not without
>> risk.
>>
>
> A quick note if you do decide to go down this path. If you are using
> Cassandra version 2.x.x and above, the cassandra.replace_address_firs
> t_boot can also be used. This option works once when Cassandra is first
> started and the replacement node inserted into the cluster. After that, the
> option is ignored for all subsequent restarts, where as
> cassandra.replace_address needs to be removed from the *cassandra-env.sh*
> file in order to restart the node. Restart behaviour aside, both options
> operate in the same way to replace a node in the cluster.
>
>
>>
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>


Re: How do you do automatic restacking of AWS instance for cassandra?

2017-05-25 Thread daemeon reiydelle
What is restacking?





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Thu, May 25, 2017 at 10:24 AM, Surbhi Gupta 
wrote:

> Hi,
>
> Wanted to understand, how do you do automatic restacking of cassandra
> nodes on AWS?
>
> Thanks
> Surbhi
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread daemeon reiydelle
This sounds exactly like a previous post that ended when I asked the person
to document the number of nodes, EC2 instance type, and size. I suspected a
single-node system. So the poster reposts? Hmm.

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 25, 2017 9:14 AM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:

Sorry for the confusion.  That was for the OP.  I wrote it quickly right
after waking up.

What I'm asking is why does the OP want to keep his data in the memtable
exclusively?  If the goal is to "make reads fast", then just turn on row
caching.

If there's so little data that it fits in memory (300MB), and there aren't
going to be any writes past the initial small dataset, why use Cassandra?
It sounds like the wrong tool for this job.  Sounds like something that
could easily be stored in S3 and loaded in memory when the app is fired up.
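
If row caching is the route taken, a minimal sketch (the keyspace/table name and
cache size are invented):

  # cassandra.yaml: give the row cache some memory (it is off by default)
  #   row_cache_size_in_mb: 512
  cqlsh -e "ALTER TABLE ks.small_table
            WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};"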


On Thu, May 25, 2017 at 8:06 AM Avi Kivity <a...@scylladb.com> wrote:

> Not sure whether you're asking me or the original poster, but the more
> times data gets overwritten in a memtable, the less it has to be compacted
> later on (and even without overwrites, larger memtables result in less
> compaction).
>
> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>
> Why do you think keeping your data in the memtable is a what you need to
> do?
> On Thu, May 25, 2017 at 7:16 AM Avi Kivity <a...@scylladb.com> wrote:
>
>> Then it doesn't have to (it still may, for other reasons).
>>
>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>
>> What if the commit log is disabled?
>>
>> On May 25, 2017 4:31 AM, "Avi Kivity" <a...@scylladb.com> wrote:
>>
>>> Cassandra has to flush the memtable occasionally, or the commit log
>>> grows without bounds.
>>>
>>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>>
>>> Hi,
>>>
>>> I'm running Cassandra with a very small dataset so that the data can
>>> exist on memtable only. Below are my configurations:
>>>
>>> In jvm.options:
>>>
>>> -Xms4G
>>> -Xmx4G
>>>
>>> In cassandra.yaml,
>>>
>>> memtable_cleanup_threshold: 0.50
>>> memtable_allocation_type: heap_buffers
>>>
>>> As per the documentation in cassandra.yaml, the
>>> *memtable_heap_space_in_mb* and *memtable_heap_space_in_mb* will be set
>>> of 1/4 of heap size i.e. 1000MB
>>>
>>> According to the documentation here (http://docs.datastax.com/en/
>>> cassandra/3.0/cassandra/configuration/configCassandra_
>>> yaml.html#configCassandra_yaml__memtable_cleanup_threshold), the
>>> memtable flush will trigger if the total size of memtabl(s) goes beyond
>>> (1000+1000)*0.50=1000MB.
>>>
>>> Now if I perform several write requests which results in almost ~300MB
>>> of the data, memtable still gets flushed since I see sstables being created
>>> on file system (Data.db etc.) and I don't understand why.
>>>
>>> Could anyone explain this behavior and point out if I'm missing
>>> something here?
>>>
>>> Thanks,
>>>
>>> Preetika
>>>
>>>
>>>
>>
>


Re: Replication issue with Multi DC setup in cassandra

2017-05-24 Thread daemeon reiydelle
Cqlsh looks at the cluster, not node

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 16, 2017 2:42 PM, "suraj pasuparthy" <suraj.pasupar...@gmail.com>
wrote:

> So i though the same,
> I see the data via the CQLSH in both the datacenters. consistency is set
> to LQ
>
> thanks
> -Suraj
>
> On Tue, May 16, 2017 at 2:19 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
>
>> Do you see data on other DC or just directory structure? Directory
>> structure would populate because it is DDL but inserts shouldn’t populate,
>> ideally.
>>
>> On May 16, 2017, at 3:19 PM, suraj pasuparthy <suraj.pasupar...@gmail.com>
>> wrote:
>>
>> elp me fig
>>
>>
>>
>
>
> --
> Suraj Pasuparthy
>
> cisco systems
> Software Engineer
> San Jose CA
>


Re: Replication issue with Multi DC setup in cassandra

2017-05-24 Thread daemeon reiydelle
May I inquire if your configuration is actually data center aware? Do you
understand the difference between LQ and replication?
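
As a sketch of what a DC-aware definition looks like for a keyspace that should
stay in one data center only (keyspace and DC names are invented):

  cqlsh -e "ALTER KEYSPACE my_ks
            WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"
  # removing a DC from the map stops new writes from replicating there, but it does
  # not delete data already on that DC's nodes; cleanup there is the usual way out
  nodetool cleanup my_ks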





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Wed, May 24, 2017 at 12:03 PM, Igor Leão  wrote:

> Did you run `nodetool repair` after changing the keyspace? (not sure if it
> makes sense though)
>
> 2017-05-16 19:52 GMT-03:00 Nitan Kainth :
>
>> Strange. Anybody else might share something more important.
>>
>> Sent from my iPhone
>>
>> On May 16, 2017, at 5:23 PM, suraj pasuparthy 
>> wrote:
>>
>> Yes is see them in the datacenter's data directories.. infact i see then
>> even after i bring down the interface between the 2 DC's which further
>> confirms that a local copy is maintained in the DC that was not configured
>> in the strategy ..
>> its quite important that we block the info for this keyspace from
>> replicating :(.. not sure why this does not work
>>
>> Thanks
>> Suraj
>>
>> On Tue, May 16, 2017 at 3:06 PM Nitan Kainth  wrote:
>>
>>> check for datafiles on filesystem in both DCs.
>>>
>>> On May 16, 2017, at 4:42 PM, suraj pasuparthy <
>>> suraj.pasupar...@gmail.com> wrote:
>>>
>>> So i though the same,
>>> I see the data via the CQLSH in both the datacenters. consistency is set
>>> to LQ
>>>
>>> thanks
>>> -Suraj
>>>
>>> On Tue, May 16, 2017 at 2:19 PM, Nitan Kainth  wrote:
>>>
 Do you see data on other DC or just directory structure? Directory
 structure would populate because it is DDL but inserts shouldn’t populate,
 ideally.

 On May 16, 2017, at 3:19 PM, suraj pasuparthy <
 suraj.pasupar...@gmail.com> wrote:

 elp me fig



>>>
>>>
>>> --
>>> Suraj Pasuparthy
>>>
>>> cisco systems
>>> Software Engineer
>>> San Jose CA
>>>
>>>
>>>
>>>
>>>
>>>
>
>
> --
> Igor Leão  Site Reliability Engineer
>
> Mobile: +55 81 99727-1083 
> Skype: *igorvpcleao*
> Office: +55 81 4042-9757 
> Website: inlocomedia.com 


Re: Impact on latency with larger memtable

2017-05-24 Thread daemeon reiydelle
You speak of an increase. Please provide your results with specific examples, e.g.
a 25% increase in memtable size results in an n% increase in latency. Also please
include the number of nodes, total keyspace size, replication factor, etc.

Hopefully this is a 6 node cluster with several hundred gig per keyspace,
not some single node free tier box.

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 24, 2017 9:32 AM, "preetika tyagi" <preetikaty...@gmail.com> wrote:

> Hi,
>
> I'm experimenting with memtable/heap size on my Cassandra server to
> understand how it impacts the latency/throughput for read requests.
>
> I vary heap size (Xms and -Xmx) in jvm.options so memtable will be 1/4 of
> this. When I increase the heap size and hence memtable, I notice the drop
> in throughput and increase in latency. I'm also creating the database such
> that its size doesn't exceed the size of memtable. Therefore, all data
> exist in memtable and I'm not able to reason why bigger size of memtable is
> resulting into higher latency/low throughput.
>
> Since everything is DRAM, shouldn't the throughput/latency remain same in
> all the cases?
>
> Thanks,
> Preetika
>


Re: Cassandra Node Density thresholds

2017-05-19 Thread daemeon reiydelle
500 nodes, 20tb of ACTIVE DATA per node in hdfs, no brainer, no problem.
But remember the cross DC traffic will get substantial.

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 19, 2017 9:05 AM, "ZAIDI, ASAD A" <az1...@att.com> wrote:

> Hello Folks -
>
> I'm using open source apache Cassandra 2.2 .My cluster is spread over 14
> nodes in cluster in two data centers.
>
>
>
> My DC1 data center nodes are reaching 2TB of consumed volume. we don't
> have much space left on disk.
>
> I am wondering if there is guideline available that can point me to
> certain best practice that describe when we should add more nodes to the
> cluster.  should we add more storage or add more nodes. I guess we should
> scale Cassandra horizontally so adding node may be better option.. i am
> looking for a criteria that describes node density thresholds, if there are
> any.
>
> Can you guys please share your thoughts , experience. I'll much appreciate
> your reply. Thanks/Asad
>
>
>
>
>


Re: Can I have multiple datacenter with different versions of Cassandra

2017-05-18 Thread daemeon reiydelle
Yes, or decommission the old one and build anew after the new one is operational

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 18, 2017 8:20 AM, "Chuck Reynolds" <creyno...@ancestry.com> wrote:

> I have a need to create another datacenter and upgrade my existing
> Cassandra from 2.1.13 to Cassandra 3.0.9.
>
>
>
> Can I do this as one step?  Create a new Cassandra ring that is version
> 3.0.9 and replicate the data from an existing ring that is Cassandra 2.1.13?
>
>
>
> After replicating to the new ring if possible them I would upgrade the old
> ring to Cassandra 3.0.9
>


Re: Bootstraping a Node With a Newer Version

2017-05-17 Thread daemeon reiydelle
So you are not upgrading the kernel, you are upgrading the OS. Not what
you asked about. Your devops team is right.

However, depending on what is using Python, the new version of Python may
break older scripts (I do not know, just mentioning this; testing required?).
When I am doing an OS upgrade (and usually ditto with Hadoop), I
add nodes to the cluster at the new OS/HDFS version, decommission
old nodes, and repeat. The replication takes a bit, but there is zero down
time, etc. Since you don't have a lot of storage per node, I don't think
you will have a lot of high network traffic impacting the performance of
nodes.
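
The per-node loop is roughly (host names are illustrative; run one node at a time):

  nodetool -h old-node-01 decommission   # streams its ranges to the remaining replicas
  nodetool status                        # confirm it has left the ring before the next one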





*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*



On Wed, May 17, 2017 at 12:51 AM, Shalom Sagges <shal...@liveperson.com>
wrote:

> Our DevOPS team told me that their policy is not to perform major kernel
> upgrades but simply install a clean new version.
> I also checked online and found a lot of recommendations *not *to do so
> as there might be a lot of dependencies issues that may affect processes
> such as yum.
> e.g.
> https://www.centos.org/forums/viewtopic.php?t=53678
> "The upgrade from CentOS 6 to 7 is a process that is fraught with danger
> and very very untested. Almost no-one succeeds without extreme effort. The
> CentOS wiki page about it has a big fat warning saying "Do not do this". If
> at all possible you should do a parallel install, migrate your data, apps
> and settings to the new box and decommission the old one.
>
> The problem comes about because there are a large number of packages in
> el6 that already have a higher version number than those in el7. This means
> that the el6 packages take precedence in the update and there are quite a
> few orphans left behind and these break lilttle things like yum. For
> example, one that I know about is openldap which is
> openldap-2.4.40-5.el6.x86_64 and openldap-2.4.39-6.el7.x86_64 so the el6
> package is seen as newer than the el7 one. Anything that's linked against
> openldap (a *lot*) now will not function until that package is replaced
> with its el7 equivalent, The easiest way to do this would be to yum
> downgrade openldap but, ooops, one of the things that needs openldap is
> yum so it doesn't work."
>
>
> I've also checked the Centos Wiki page and found the same recommendation:
> https://wiki.centos.org/FAQ/General?highlight=%28upgrade%
> 29%7C%28to%29%7C%28centos7%29#head-3ac1bdb51f0fecde1f98142cef90e8
> 87b1b12a00 :
>
> *"Upgrades in place are not supported nor recommended by CentOS or TUV. A
> backup followed by a fresh install is the only recommended upgrade path.
> See the Migration Guide for more information."*
>
>
> Since I have around twenty 2TB nodes in each DC (2 DCs in 6 different
> farms) and I don't want it to take forever, perhaps the best way would be
> to either leave it with Centos 6 and install Python 2.7 (I understand
> that's not so user friendly) or perform the backup recommendations shown on
> the Centos page (which sounds extremely agonizing as well).
>
> What do you think?
>
> Thanks!
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035 <+972%2074-700-4035>
> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>
>
>
> On Tue, May 16, 2017 at 6:48 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> What makes you think you cannot upgrade the kernel?
>>
>> “All men dream, but not equally. Those who dream by night in the dusty
>> recesses of their minds wake up in the day to find it was vanity, but the
>> dreamers of the day are dangerous men, for they may act their dreams with
>> open eyes, to make it possible.” — T.E. Lawrence
>>
>> sent from my mobile
>> Daemeon Reiydelle
>> skype daemeon.c.m.reiydelle
>> USA 415.501.0198
>>
>> On May 16, 2017 5:27 AM, "Shalom Sagges" <shal...@liveperson.com> wrote:
>>
>>> Hi All,
>>>
>>> Hypothetically speaking, let's say I want to upgrade my Cassandra
>>> cluster, but I also want to perform a major upgrade to the kernel of all
>>> nodes.
>>> In order to upgrade the kernel, I need to reinstall the server, hence
>>> lose all data on the node.
>>>
>>> My question is this, after reinstalling the server with the new kernel,
>>> can I first install the upgraded Cassandra version and then bootstrap it to
>>> the cluster?
>>>
>>> Since there's already no data on the node, I wish to skip the agonizing
>>> sstable upgrade process.
>>>
>>> Does anyone know if this is doable?

Re: Bootstraping a Node With a Newer Version

2017-05-16 Thread daemeon reiydelle
What makes you think you cannot upgrade the kernel?

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 16, 2017 5:27 AM, "Shalom Sagges" <shal...@liveperson.com> wrote:

> Hi All,
>
> Hypothetically speaking, let's say I want to upgrade my Cassandra cluster,
> but I also want to perform a major upgrade to the kernel of all nodes.
> In order to upgrade the kernel, I need to reinstall the server, hence lose
> all data on the node.
>
> My question is this, after reinstalling the server with the new kernel,
> can I first install the upgraded Cassandra version and then bootstrap it to
> the cluster?
>
> Since there's already no data on the node, I wish to skip the agonizing
> sstable upgrade process.
>
> Does anyone know if this is doable?
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>
>
>
> This message may contain confidential and/or privileged information.
> If you are not the addressee or authorized to receive this on behalf of
> the addressee you must not use, copy, disclose or take action based on this
> message or any information herein.
> If you have received this message in error, please advise the sender
> immediately by reply email and delete this message. Thank you.
>


Re: Cassandra as a key/object store for many small (10-60k) files

2017-05-05 Thread daemeon reiydelle
I would guess you have network overload issues; I have seen pretty much
exactly what you describe many times, and (so far ;{) this has always been
the issue, especially with 1gbit networks, no jumbo frames, etc. Get your
network guys to monitor the error/retry packets across ALL of the interfaces
(all the nodes, top-of-rack switch, network switches, etc.). If you see ANY
retries, timeouts, or errors, you have found your problem.

Or it could be something like Java garbage collection, CPU overload,
etc.
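A rough sketch of the per-node checks implied above (standard Linux tools
assumed; the interface name is a placeholder, and the switch counters still
have to come from the network side):

    # Interface-level errors/drops; any non-zero, growing counter is suspect.
    ip -s link show eth0        # or: netstat -i

    # TCP retransmits/timeouts; sample twice, a minute apart, and compare.
    netstat -s | grep -i -E 'retrans|timeout'

    # Cassandra's own view of dropped messages on this node.
    nodetool tpstats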


*...*

*Making a billion dollar startup is easy: "take a human desire, preferably
one that has been around for a really long time … Identify that desire and
use modern technology to take out steps."*


*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Fri, May 5, 2017 at 12:26 PM, Jonathan Guberman <j...@tineye.com> wrote:

> Yes, local storage volumes on each machine.
>
> On May 5, 2017, at 3:25 PM, daemeon reiydelle <daeme...@gmail.com> wrote:
>
> These numbers do not match e.g. AWS, so guessing you are using local
> storage?
>
>
> *...*
>
> *Making a billion dollar startup is easy: "take a human desire, preferably
> one that has been around for a really long time … Identify that desire and
> use modern technology to take out steps."*
>
>
> *Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*
>
> On Fri, May 5, 2017 at 12:19 PM, Jonathan Guberman <j...@tineye.com> wrote:
>
>> Hello,
>>
>> We’re currently testing Cassandra for use as a pure key-object store for
>> data blobs around 10kB - 60kB each. Our use case is storing on the order of
>> 10 billion objects with about 5-20 million new writes per day. A written
>> object will never be updated or deleted. Objects will be read at least
>> once, some time within 10 days of being written. This will generally happen
>> as a batch; that is, all of the images written on a particular day will be
>> read together at the same time. This batch read will only happen one time;
>> future reads will happen on individual objects, with no grouping, and they
>> will follow a long-tail distribution, with popular objects read thousands
>> of times per year but most read never or virtually never.
>>
>> I’ve set up a small four node test cluster and have written test scripts
>> to benchmark writing and reading our data. The table I’ve set up is very
>> simple: an ascii primary key column with the object ID and a blob column
>> for the data. All other settings were left at their defaults.
>>
>> I’ve found write speeds to be very fast most of the time. However,
>> periodically, writes will slow to a crawl for anywhere between half an hour
>> to two hours, after which speeds recover to their previous levels. I assume
>> this is some sort of data compaction or flushing to disk, but I haven’t
>> been able to figure out the exact cause.
>>
>> Read speeds have been more disappointing. Cached reads are very fast, but
>> random read speed averages about 2 MB/sec, which is too slow when we need
>> to read out a batch of several million objects. I don’t think it’s
>> reasonable to assume that these rows will all still be cached by the time
>> we need to read them for that first large batch read.
>>
>> My general question is whether anyone has any suggestions for how to
>> improve performance for our use case. More specifically:
>>
>> - Is there a way to mitigate or eliminate the huge slowdowns I see when
>> writing millions of rows?
>> - Are there settings I should be using in order to maximize read speeds
>> for random reads?
>> - Is there a way to design our tables to improve the read speeds for the
>> initial large batched reads? I was thinking of using a batch ID column that
>> could be used to retrieve the data for the initial block. However, future
>> reads would need to be done by the object ID, not the batch ID, so it seems
>> to me I’d need to duplicate the data, one in a “objects by batch” table,
>> and the other in a simple “objects” table. Is there a better approach than
>> this?
>>
>> Thank you!
>>
>>
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
>
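For the last question in the message above, a minimal CQL sketch of the
duplicated "objects by batch" plus "objects" layout the poster describes; the
table, column, and bucketing choices are illustrative only, not something
proposed in the thread:

    CREATE TABLE objects (
        object_id ascii PRIMARY KEY,   -- long-tail point lookups by object ID
        data      blob
    );

    CREATE TABLE objects_by_batch (
        batch_id  ascii,               -- e.g. the ingest day
        bucket    int,                 -- splits a batch so partitions stay bounded
        object_id ascii,
        data      blob,
        PRIMARY KEY ((batch_id, bucket), object_id)
    );

    -- Each object is written to both tables: the batch table serves the one-time
    -- bulk read, the objects table serves everything afterwards.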


Re: Cassandra as a key/object store for many small (10-60k) files

2017-05-05 Thread daemeon reiydelle
These numbers do not match e.g. AWS, so guessing you are using local
storage?


*...*

*Making a billion dollar startup is easy: "take a human desire, preferably
one that has been around for a really long time … Identify that desire and
use modern technology to take out steps."*


*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Fri, May 5, 2017 at 12:19 PM, Jonathan Guberman  wrote:

> Hello,
>
> We’re currently testing Cassandra for use as a pure key-object store for
> data blobs around 10kB - 60kB each. Our use case is storing on the order of
> 10 billion objects with about 5-20 million new writes per day. A written
> object will never be updated or deleted. Objects will be read at least
> once, some time within 10 days of being written. This will generally happen
> as a batch; that is, all of the images written on a particular day will be
> read together at the same time. This batch read will only happen one time;
> future reads will happen on individual objects, with no grouping, and they
> will follow a long-tail distribution, with popular objects read thousands
> of times per year but most read never or virtually never.
>
> I’ve set up a small four node test cluster and have written test scripts
> to benchmark writing and reading our data. The table I’ve set up is very
> simple: an ascii primary key column with the object ID and a blob column
> for the data. All other settings were left at their defaults.
>
> I’ve found write speeds to be very fast most of the time. However,
> periodically, writes will slow to a crawl for anywhere between half an hour
> to two hours, after which speeds recover to their previous levels. I assume
> this is some sort of data compaction or flushing to disk, but I haven’t
> been able to figure out the exact cause.
>
> Read speeds have been more disappointing. Cached reads are very fast, but
> random read speed averages about 2 MB/sec, which is too slow when we need
> to read out a batch of several million objects. I don’t think it’s
> reasonable to assume that these rows will all still be cached by the time
> we need to read them for that first large batch read.
>
> My general question is whether anyone has any suggestions for how to
> improve performance for our use case. More specifically:
>
> - Is there a way to mitigate or eliminate the huge slowdowns I see when
> writing millions of rows?
> - Are there settings I should be using in order to maximize read speeds
> for random reads?
> - Is there a way to design our tables to improve the read speeds for the
> initial large batched reads? I was thinking of using a batch ID column that
> could be used to retrieve the data for the initial block. However, future
> reads would need to be done by the object ID, not the batch ID, so it seems
> to me I’d need to duplicate the data, one in a “objects by batch” table,
> and the other in a simple “objects” table. Is there a better approach than
> this?
>
> Thank you!
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Service discovery in the Cassandra cluster

2017-05-02 Thread daemeon reiydelle
My compliments to all of you for being adults, excessively kind, and
definitely excessively nice.


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Tue, May 2, 2017 at 5:08 PM, Steve Robenalt 
wrote:

> Hi Roman,
>
> I'm assuming you were intending your first statement to be in jest, but
> it's really not that hard to startup a Cassandra cluster. The defaults are
> pretty usable, so if all you want to do is set the IPs and start it up, the
> cluster probably will just take care of everything else.
>
> So I jest a little bit too. It's normally desirable to set up storage
> properly for your database, and there's a few options for which you might
> want to change the defaults, such as the snitch.
>
> Still, if that means you only need to take note of a couple of IPs and
> designate them as seeds so your cluster can mostly manage itself, you can
> say that's sad, but I'd say it's a small price to pay for all that you
> don't have to do.
>
> Steve
>
> On Mon, May 1, 2017 at 4:55 PM, Roman Naumenko 
> wrote:
>
>> Lol yeah, why
>> I guess I run some ec2 instances, drop some cassandra deb packages on 'em
>> - the thing will figure out how to run...
>>
>> Also, how would you get "initial state of the cluster" if the cluster...
>> is being initialized?
>> Or that's easy, according to the docs - just hardcode some seed IPs into
>> each node, lol
>>
>> It's all kinda funny, but in a sad way.
>>
>> On Mon, May 1, 2017 at 4:45 PM, Jon Haddad 
>> wrote:
>>
>>> Why do you have to figure out what’s up w/ them by accident?  You’ve
>>> gotten all the information you need.  Seeds are used to get the initial
>>> state of the cluster and as an optimization to spread gossip faster.
>>> That’s it.
>>>
>>>
>>>
>>> On May 1, 2017, at 4:37 PM, Roman Naumenko  wrote:
>>>
>>> Well, I guess I have to figure out what’s up with IPs/hostnames by
>>> experiment.
>>> Information about service discovery is practically absent.
>>> Not to mention all important details about fqdns/hostnames, automatic
>>> replacing seed nodes or what not.
>>>
>>> —
>>> Roman
>>>
>>> On May 1, 2017, at 4:14 PM, Jon Haddad 
>>> wrote:
>>>
>>> The in-tree docs do not mention this anywhere, and even have some of the
>>> answers you’re asking:
>>>
>>> https://cassandra.apache.org/doc/latest/faq/index.html?highl
>>> ight=seed#what-are-seeds
>>>
>>> The DataStax docs are maintained outside of the project, you’ll have to
>>> ask them why they’re wrong or misleading.
>>>
>>> Jon
>>>
>>> On May 1, 2017, at 4:10 PM, Roman Naumenko  wrote:
>>>
>>> The docs mention IP addresses everywhere.
>>>
>>> http://docs.datastax.com/en/archived/cassandra/2.0/cassandra
>>> /operations/ops_replace_seed_node.html
>>> Promote an existing node to a seed node by adding its IP address to
>>> -seeds list and remove (demote) the IP address of the dead seed node from
>>> the cassandra.yaml file for each node in the cluster.
>>>
>>> http://docs.datastax.com/en/archived/cassandra/2.0/cassandra
>>> /operations/ops_replace_node_t.html
>>> Note the Address of the dead node; it is used in step 5.
>>>
>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/initiali
>>> ze/initializeSingleDS.html
>>>
>>> Properties to set:
>>> num_tokens: recommended value: 256
>>> -seeds: internal IP address of each seed node
>>>
>>>
>>> I saw also *hostnames *mentioned few times, but it just makes it even
>>> more confusing.
>>>
>>> —
>>> Roman
>>>
>>> On May 1, 2017, at 3:50 PM, Jon Haddad 
>>> wrote:
>>>
>>> Sure, you could use DNS.  Where does it say IP addresses are a
>>> requirement?
>>>
>>> On May 1, 2017, at 1:36 PM, Roman Naumenko  wrote:
>>>
>>> If I understand how Cassandra nodes work, they must contain a list of
>>> seed’s IP addressed in config file.
>>>
>>> This requirement makes cluster setup unnecessarily complicated. Is it
>>> possible to use DNS name for seed nodes?
>>>
>>> Thanks,
>>>
>>> —
>>> Roman
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
>
>
> * Steve Robenalt Software Architect, HighWire Press, Inc. *
> www.highwire.org| Los Gatos, CA| Belfast, NI| Brighton, UK
> 
> 
>
> *HighWire Summer Publishers' Meeting, London, June 12-13
> *
> STM Annual US Conference, April 25-27: Michiel Klein Swormink and Jennifer
> Chang are representing HighWire
> 
> 2017 CSE Annual Meeting: John Sack is presenting on topic of 

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread daemeon reiydelle
Yes, you can use host names. That merely adds another level of
configuration. When using Terraform, I often use node names like
 and just use those. They are only routable within the
region/VPC but are in fact already in DNS. You do have to watch out,
because if you change the seeds (in tf) the cluster can get terminated and
rebuilt. If you have a way to capture these (you can do it in Ansible; I had
been told it is really hard to do in Chef/Puppet) then your config management
system can just adjust cassandra.yaml as needed without fussing with Route 53.
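A hedged cassandra.yaml excerpt showing host names in the seed list, as
discussed above; the names are placeholders:

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "cass-seed-1.internal.example,cass-seed-2.internal.example"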


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Mon, May 1, 2017 at 3:50 PM, Jon Haddad 
wrote:

> Sure, you could use DNS.  Where does it say IP addresses are a requirement?
>
> > On May 1, 2017, at 1:36 PM, Roman Naumenko  wrote:
> >
> > If I understand how Cassandra nodes work, they must contain a list of
> seed’s IP addressed in config file.
> >
> > This requirement makes cluster setup unnecessarily complicated. Is it
> possible to use DNS name for seed nodes?
> >
> > Thanks,
> >
> > —
> > Roman
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Seed nodes as part of cluster

2017-05-01 Thread daemeon reiydelle
Caps below for emphasis, not shouting ;{)

Seed nodes are IDENTICAL to all other Cassandra nodes, or you will wish
otherwise. Folks get confused because of terminology. I refer to this stuff
as "the seed node service of a normal Cassandra node". ANY NODE IS ABLE TO
ACT AS A SEED NODE BY DEFINITION. But ONLY the nodes listed as seeds in
cassandra.yaml will be contacted, however.

The seed "function" is only used by new nodes when they FIRST join the
cluster for the FIRST time, then never used again (once a node joins the
cluster it is using different protocols, a separate list of nodes, etc.).




*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Mon, May 1, 2017 at 2:05 PM, Roman Naumenko  wrote:

> So they are like any other “data” node… but special?
>
> I’m so freaking confused by this seed nodes design.
>
> —
> Roman
>
> On May 1, 2017, at 1:37 PM, vasu gunja  wrote:
>
> Seed will contain meta data + actual data too
>
> On Mon, May 1, 2017 at 3:34 PM, Roman Naumenko 
> wrote:
>
>> Hi,
>>
>> I’d like to confirm that seed nodes doesn’t contain any data. Is it
>> correct?
>>
>> Can the instances for seed nodes be smaller size than for data nodes?
>>
>> Thank you
>> Roman
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>
>


Re: Migrating from Datastax Distribution to Apache Cassandra

2017-04-07 Thread daemeon reiydelle
Having done variants of this, I would suggest you bring up new nodes, at
approximately the same Apache version, as a separate data center in your
same cluster. The replication strategy may need to be tweaked.
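A hedged CQL sketch of the replication tweak being referred to: add the new
data center to each keyspace's replication map, then rebuild it from the old
one. Keyspace and DC names are placeholders:

    ALTER KEYSPACE my_keyspace WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'dc_datastax': 3,
      'dc_apache': 3
    };

    -- then, once on each node in the new data center:
    --   nodetool rebuild dc_datastax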


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Fri, Apr 7, 2017 at 1:55 AM, Eren Yilmaz 
wrote:

> Hi,
>
>
>
> We have Cassandra 3.7 installation on Ubuntu, from Datastax distribution
> (using the repo). Since Datastax has announced that they will no longer
> support a community Cassandra distribution, I want to migrate to Apache
> distribution. Are there any differences between distributions? Can I use
> the upgrading procedures as described in https://docs.datastax.com/en/
> latest-upgrade/upgrade/cassandra/upgrdCassandraDetails.html?
>
>
>
> Thanks,
>
> Eren
>


Re: Cassandra and LINUX CPU Context Switches

2017-04-05 Thread daemeon reiydelle
This would be normal if the switches are user-to-kernel mode (disk and
network IO are kernel mode activities). If your run queue (jobs waiting to
run) is much larger than the number of cores (just a swag, but anything
beyond about 2-3x the number of cores), you might have other issues.
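A quick, rough sketch of how to eyeball the run queue against the core count
(standard Linux tools):

    nproc                # number of cores to compare against
    vmstat 5 5           # "r" = run queue, "cs" = context switches per second
    # "r" persistently beyond roughly 2-3x the core count suggests CPU pressure;
    # a high "cs" on its own, with disk/network IO in flight, is expected.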


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Wed, Apr 5, 2017 at 5:45 AM, William Boutin 
wrote:

> I’ve noticed that my apache-cassandra 2.2.6 process is consistently
> performing CPU Context Switches above 10,000 per second.
>
> Is this to be expected or should I be looking into ways to lower the
> number of context switches done on my Cassandra cluster?
>
> Thanks in advance.
>
>
>
>
> *WILLIAM L. BOUTIN *
> Engineer IV - Sftwr
> BMDA PADB DSE DU CC NGEE
>
>
> *Ericsson*
> 1 Ericsson Drive, US PI06 1.S747
> Piscataway, NJ, 08854, USA
> Phone (913) 241-5574
> Mobile (732) 213-1368
> Emergency (732) 354-1263
> william.bou...@ericsson.com
> www.ericsson.com
>
>
> Legal entity: EUS - ERICSSON INC., registered office in US PI01 4A242.
> This Communication is Confidential. We only send and receive email on the
> basis of the terms set out at www.ericsson.com/email_disclaimer
>
>
>


Re: nodes are always out of sync

2017-04-01 Thread daemeon reiydelle
What you are doing is predictably going to result in this, IF there is
substantial backlog/network/disk or whatever pressure.

What do you think will happen when you write with a replication factor
greater than the consistency level of the write? Perhaps your mental model
of how C* works needs work?


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Sat, Apr 1, 2017 at 11:09 AM, Vladimir Yudovin 
wrote:

> Hi,
>
> did you try to read data with consistency ALL immediately after write with
> consistency ONE? Does it succeed?
>
> Best regards, Vladimir Yudovin,
> *Winguzone  - Cloud Cassandra Hosting*
>
>
>  On Thu, 30 Mar 2017 04:22:28 -0400 *Roland Otta
> >* wrote 
>
> hi,
>
> we see the following behaviour in our environment:
>
> cluster consists of 6 nodes (cassandra version 3.0.7). keyspace has a
> replication factor 3.
> clients are writing data to the keyspace with consistency one.
>
> we are doing parallel, incremental repairs with cassandra reaper.
>
> even if a repair just finished and we are starting a new one
> immediately, we can see the following entries in our logs:
>
> INFO  [RepairJobTask:1] 2017-03-30 10:14:00,782 SyncTask.java:73 -
> [repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:2] 2017-03-30 10:14:00,782 SyncTask.java:73 -
> [repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188
> and /192.168.0.189 have 1 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:4] 2017-03-30 10:14:00,782 SyncTask.java:73 -
> [repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:2] 2017-03-30 10:14:03,997 SyncTask.java:73 -
> [repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26
> and /192.168.0.189 have 2 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:1] 2017-03-30 10:14:03,997 SyncTask.java:73 -
> [repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26
> and /192.168.0.191 have 2 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:4] 2017-03-30 10:14:03,997 SyncTask.java:73 -
> [repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.191 have 2 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:1] 2017-03-30 10:14:05,375 SyncTask.java:73 -
> [repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:2] 2017-03-30 10:14:05,375 SyncTask.java:73 -
> [repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.190 have 1 range(s) out of sync for ad_event_history
> INFO  [RepairJobTask:4] 2017-03-30 10:14:05,375 SyncTask.java:73 -
> [repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.190
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
>
> we cant see any hints on the systems ... so we thought everything is
> running smoothly with the writes.
>
> do we have to be concerned about the nodes always being out of sync or
> is this a normal behaviour in a write intensive table (as the tables
> will never be 100% in sync for the latest inserts)?
>
> bg,
> roland
>
>
>
>


Re: How to add a node with zero downtime

2017-03-21 Thread daemeon reiydelle
Possible areas to check:
- too few nodes (node overload) - you did not indicate either the replication
factor or the number of nodes. I assume the nodes are *rather* full.
- network overload (check your top-of-rack switches' errors, and also the tcp
stats on the relevant nodes)
- look for stop-the-world garbage collection on multiple nodes.


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Tue, Mar 21, 2017 at 11:17 AM, Cogumelos Maravilha <
cogumelosmaravi...@sapo.pt> wrote:

> Hi list,
>
> I'm using C* 3.10;
>
> authenticator: PasswordAuthenticator and authorizer: CassandraAuthorizer
>
> When adding a node and before nodetool repair system_auth finished all my
> clients die with:
>
> cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers',
> {'10.100.100.19': AuthenticationFailed('Failed to authenticate to ...
>
> Thanks in advance.
>


Re: repair performance

2017-03-20 Thread daemeon reiydelle
I would zero in on network throughput, especially interrack trunks
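A rough sketch of how that could be checked (iperf assumed installed; host
names are placeholders):

    # Raw node-to-node bandwidth across racks:
    #   on one node:               iperf -s
    #   on a node in another rack:
    iperf -c other-node -t 30

    # What Cassandra is streaming right now, and its current throttles:
    nodetool netstats
    nodetool getstreamthroughput
    nodetool getcompactionthroughput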


sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On Mar 17, 2017 2:07 PM, "Roland Otta" <roland.o...@willhaben.at> wrote:

> hello,
>
> we are quite inexperienced with cassandra at the moment and are playing
> around with a new cluster we built up for getting familiar with
> cassandra and its possibilites.
>
> while getting familiar with that topic we recognized that repairs in
> our cluster take a long time. To get an idea of our current setup here
> are some numbers:
>
> our cluster currently consists of 4 nodes (replication factor 3).
> these nodes are all on dedicated physical hardware in our own
> datacenter. all of the nodes have
>
> 32 cores @2,9Ghz
> 64 GB ram
> 2 ssds (raid0) 900 GB each for data
> 1 seperate hdd for OS + commitlogs
>
> current dataset:
> approx 530 GB per node
> 21 tables (biggest one has more than 200 GB / node)
>
>
> i already tried setting compactionthroughput + streamingthroughput to
> unlimited for testing purposes ... but that did not change anything.
>
> when checking system resources i cannot see any bottleneck (cpus are
> pretty idle and we have no iowaits).
>
> when issuing a repair via
>
> nodetool repair -local on a node the repair takes longer than a day.
> is this normal or could we normally expect a faster repair?
>
> i also recognized that initalizing of new nodes in the datacenter was
> really slow (approx 50 mbit/s). also here i expected a much better
> performance - could those 2 problems be somehow related?
>
> br//
> roland


Re: Random slow read times in Cassandra

2017-03-17 Thread daemeon reiydelle
check for level 2 (stop the world) garbage collections.
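A minimal sketch of how to spot those pauses; the log path is the package
default and may differ on your install, and the PID is a placeholder:

    # Long stop-the-world pauses are logged by GCInspector.
    grep GCInspector /var/log/cassandra/system.log | tail -20

    # Live GC counters for the Cassandra JVM (JDK jstat, sampled every 5s).
    jstat -gcutil <cassandra-pid> 5s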


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Fri, Mar 17, 2017 at 11:51 AM, Chuck Reynolds 
wrote:

> I have a large Cassandra 2.1.13 ring (60 nodes) in AWS that has
> consistently random high read times.  In general most reads are under 10
> milliseconds, but within the 30 requests there is usually a read time that
> is a couple of seconds.
>
>
>
> Instance type: r4.8xlarge
>
> EBS GP2 volumes, 3tb with 9000 IOPS
>
> 30 Gig Heap
>
>
>
> Data per node is about 170 gigs
>
>
>
> The keyspace is an id & a blob.  When I check the data the slow reads
> don’t seem to have anything to do with size of the blobs
>
>
>
> This system has repairs run once a week because it takes a lot of updates.
>
>
>
> The client makes a call and does 30 request serially to Cassandra and the
> response times look like this in milliseconds.
>
>
>
> What could make these so slow and what can I do to diagnosis this?
>
>
>
>
>
> *Responses*
>
>
>
> Get Person time: 3 319746229:9009:66
>
> Get Person time: 7 1830093695:9009:66
>
> Get Person time: 4 30072253:9009:66
>
> Get Person time: 4 2303790089:9009:66
>
> Get Person time: 2 156792066:9009:66
>
> Get Person time: 8 491230624:9009:66
>
> Get Person time: 7 284904599:9009:66
>
> Get Person time: 4 600370489:9009:66
>
> Get Person time: 2 281007386:9009:66
>
> Get Person time: 4 971178094:9009:66
>
> Get Person time: 1 1322259885:9009:66
>
> Get Person time: 2 1937958542:9009:66
>
> Get Person time: 9 286536648:9009:66
>
> Get Person time: 9 1835633470:9009:66
>
> Get Person time: 2 300867513:9009:66
>
> Get Person time: 3 178975468:9009:66
>
> Get Person time: 2900 293043081:9009:66
>
> Get Person time: 8 214913830:9009:66
>
> Get Person time: 2 1956710764:9009:66
>
> Get Person time: 4 237673776:9009:66
>
> Get Person time: 17 68942206:9009:66
>
> Get Person time: 1800 20072145:9009:66
>
> Get Person time: 2 304698506:9009:66
>
> Get Person time: 2 308177320:9009:66
>
> Get Person time: 2 998436038:9009:66
>
> Get Person time: 10 1036890112:9009:66
>
> Get Person time: 1 1629649548:9009:66
>
> Get Person time: 6 1595339706:9009:66
>
> Get Person time: 4 1079637599:9009:66
>
> Get Person time: 3 556342855:9009:66
>
>
>
>
>
> Get Person time: 5 1856382256:9009:66
>
> Get Person time: 3 1891737174:9009:66
>
> Get Person time: 2 1179373651:9009:66
>
> Get Person time: 2 1482602756:9009:66
>
> Get Person time: 3 1236458510:9009:66
>
> Get Person time: 11 1003159823:9009:66
>
> Get Person time: 2 1264952556:9009:66
>
> Get Person time: 2 1662234295:9009:66
>
> Get Person time: 1 246108569:9009:66
>
> Get Person time: 5 1709881651:9009:66
>
> Get Person time: 3213 11878078:9009:66
>
> Get Person time: 2 112866483:9009:66
>
> Get Person time: 2 201870153:9009:66
>
> Get Person time: 6 227696684:9009:66
>
> Get Person time: 2 1946780190:9009:66
>
> Get Person time: 2 2197987101:9009:66
>
> Get Person time: 18 1838959725:9009:66
>
> Get Person time: 3 1782937802:9009:66
>
> Get Person time: 3 1692530939:9009:66
>
> Get Person time: 9 1765654196:9009:66
>
> Get Person time: 2 1597757121:9009:66
>
> Get Person time: 2 1853127153:9009:66
>
> Get Person time: 3 1533599253:9009:66
>
> Get Person time: 6 1693244112:9009:66
>
> Get Person time: 6 82047537:9009:66
>
> Get Person time: 2 96221961:9009:66
>
> Get Person time: 4 98202209:9009:66
>
> Get Person time: 9 12952388:9009:66
>
> Get Person time: 2 300118652:9009:66
>
> Get Person time: 10 78801084:9009:66
>
>
>
>
>
> Get Person time: 13 1856424913:9009:66
>
> Get Person time: 2 255814186:9009:66
>
> Get Person time: 2 1183397424:9009:66
>
> Get Person time: 5 1828603730:9009:66
>
> Get Person time: 9 132965919:9009:66
>
> Get Person time: 4 1616190071:9009:66
>
> Get Person time: 2 15929337:9009:66
>
> Get Person time: 10 297005427:9009:66
>
> Get Person time: 2 1306460047:9009:66
>
> Get Person time: 5 620139216:9009:66
>
> Get Person time: 2 1364349058:9009:66
>
> Get Person time: 3 629543403:9009:66
>
> Get Person time: 5 1299827034:9009:66
>
> Get Person time: 4 1593205912:9009:66
>
> Get Person time: 2 1755460077:9009:66
>
> Get Person time: 2 1906388666:9009:66
>
> Get Person time: 1 1838653952:9009:66
>
> Get Person time: 2 2249662508:9009:66
>
> Get Person time: 3 1931708432:9009:66
>
> Get Person time: 2 2177004948:9009:66
>
> Get Person time: 2 2042756682:9009:66
>
> Get Person time: 5 41764865:9009:66
>
> Get Person time: 4023 1733384704:9009:66
>
> Get Person time: 1 1614842189:9009:66
>
> Get Person time: 2 2194211396:9009:66
>
> Get Person time: 3 1711330834:9009:66
>
> Get Person time: 2 2264849689:9009:66
>
> Get Person time: 3 1819027970:9009:66
>
> Get Person time: 2 1978614851:9009:66
>
> Get Person time: 1 1863483129:9009:66
>
>
>


Re: Issue with Cassandra consistency in results

2017-03-17 Thread daemeon reiydelle
The prepared statement is needed; if I recall correctly it must remain in the
cache for the query to complete. I don't have the docs handy to dig out the
yaml param that adjusts that cache. I had run into the problem stress testing
a smallish cluster with many queries at once.

Do you have a sense of how many distinct queries are hitting the cluster at
peak?

If many clients, how do you balance the connection load or do you always
hit the same node?
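For what it's worth, on the 3.x line being discussed the relevant knob is
believed to be prepared_statements_cache_size_mb in cassandra.yaml (present
since roughly 3.6; the cluster in this thread is 3.9). A hedged excerpt, with
an illustrative value:

    # Defaults to the greater of 10 MB or 1/256th of the heap; raise it if the
    # logs show prepared statements being evicted/discarded.
    prepared_statements_cache_size_mb: 64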


sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On Mar 16, 2017 3:25 PM, "srinivasarao daruna" <sree.srin...@gmail.com>
wrote:

> Hi reiydelle,
>
> I cannot confirm the range as the volume of data is huge and the query
> frequency is also high.
> If the cache is the cause of issue, can we increase cache size or is there
> solution to avoid dropped prep statements.?
>
>
>
>
>
>
> Thank You,
> Regards,
> Srini
>
> On Thu, Mar 16, 2017 at 2:13 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> The discard due to oom is causing the zero returned. I would guess a
>> cache miss problem of some sort, but not sure. Are you using row, index,
>> etc. caches? Are you seeing the failed prep statement on random nodes (duh,
>> nodes that have the relevant data ranges)?
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*
>>
>> On Thu, Mar 16, 2017 at 10:56 AM, Ryan Svihla <r...@foundev.pro> wrote:
>>
>>> Depends actually, restore just restores what's there, so if only one
>>> node had a copy of the data then only one node had a copy of the data
>>> meaning quorum will still be wrong sometimes.
>>>
>>> On Thu, Mar 16, 2017 at 1:53 PM, Arvydas Jonusonis <
>>> arvydas.jonuso...@gmail.com> wrote:
>>>
>>>> If the data was written at ONE, consistency is not guaranteed. ..but
>>>> considering you just restored the cluster, there's a good chance something
>>>> else is off.
>>>>
>>>> On Thu, Mar 16, 2017 at 18:19 srinivasarao daruna <
>>>> sree.srin...@gmail.com> wrote:
>>>>
>>>>> Want to make read and write QUORUM as well.
>>>>>
>>>>>
>>>>> On Mar 16, 2017 1:09 PM, "Ryan Svihla" <r...@foundev.pro> wrote:
>>>>>
>>>>> Replication factor is 3, and write consistency is ONE and read
>>>>> consistency is QUORUM.
>>>>>
>>>>> That combination is not gonna work well:
>>>>>
>>>>> *Write succeeds to NODE A but fails on node B,C*
>>>>>
>>>>> *Read goes to NODE B, C*
>>>>>
>>>>> If you can tolerate some temporary inaccuracy you can use QUORUM but
>>>>> may still have the situation where
>>>>>
>>>>> Write succeeds on node A at timestamp 1, B succeeds at timestamp 2
>>>>> Read succeeds on node B and C at timestamp 1
>>>>>
>>>>> If you need fully race condition free counts I'm afraid you need to
>>>>> use SERIAL or LOCAL_SERIAL (for in DC only accuracy)
>>>>>
>>>>> On Thu, Mar 16, 2017 at 1:04 PM, srinivasarao daruna <
>>>>> sree.srin...@gmail.com> wrote:
>>>>>
>>>>> Replication strategy is SimpleReplicationStrategy.
>>>>>
>>>>> Smith is : EC2 snitch. As we deployed cluster on EC2 instances.
>>>>>
>>>>> I was worried that CL=ALL have more read latency and read failures.
>>>>> But won't rule out trying it.
>>>>>
>>>>> Should I switch select count (*) to select partition_key column? Would
>>>>> that be of any help.?
>>>>>
>>>>>
>>>>> Thank you
>>>>> Regards
>>>>> Srini
>>>>>
>>>>> On Mar 16, 2017 12:46 PM, "Arvydas Jonusonis" <
>>>>> arvydas.jonuso...@gmail.com> wrote:
>>>>>
>>>>> What are your replication strategy and snitch settings?
>>>>>
>>>>> Have you tried doing a read at CL=ALL? If it's an actual inconsistency
>>>>> issue (missing data), this should cause the correct results to be 
>>>>> returned.
>>>>> You'll need to run a repair to fix the inconsistencies.
>>>>>
>>>>> If all the data is actually there, you might have one or several
>>>>> nodes that aren't identifying the correct replicas.

Re: Issue with Cassandra consistency in results

2017-03-16 Thread daemeon reiydelle
The discard due to oom is causing the zero returned. I would guess a cache
miss problem of some sort, but not sure. Are you using row, index, etc.
caches? Are you seeing the failed prep statement on random nodes (duh,
nodes that have the relevant data ranges)?


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Thu, Mar 16, 2017 at 10:56 AM, Ryan Svihla  wrote:

> Depends actually, restore just restores what's there, so if only one node
> had a copy of the data then only one node had a copy of the data meaning
> quorum will still be wrong sometimes.
>
> On Thu, Mar 16, 2017 at 1:53 PM, Arvydas Jonusonis <
> arvydas.jonuso...@gmail.com> wrote:
>
>> If the data was written at ONE, consistency is not guaranteed. ..but
>> considering you just restored the cluster, there's a good chance something
>> else is off.
>>
>> On Thu, Mar 16, 2017 at 18:19 srinivasarao daruna 
>> wrote:
>>
>>> Want to make read and write QUORUM as well.
>>>
>>>
>>> On Mar 16, 2017 1:09 PM, "Ryan Svihla"  wrote:
>>>
>>> Replication factor is 3, and write consistency is ONE and read
>>> consistency is QUORUM.
>>>
>>> That combination is not gonna work well:
>>>
>>> *Write succeeds to NODE A but fails on node B,C*
>>>
>>> *Read goes to NODE B, C*
>>>
>>> If you can tolerate some temporary inaccuracy you can use QUORUM but may
>>> still have the situation where
>>>
>>> Write succeeds on node A at timestamp 1, B succeeds at timestamp 2
>>> Read succeeds on node B and C at timestamp 1
>>>
>>> If you need fully race condition free counts I'm afraid you need to use
>>> SERIAL or LOCAL_SERIAL (for in DC only accuracy)
>>>
>>> On Thu, Mar 16, 2017 at 1:04 PM, srinivasarao daruna <
>>> sree.srin...@gmail.com> wrote:
>>>
>>> Replication strategy is SimpleReplicationStrategy.
>>>
>>> Smith is : EC2 snitch. As we deployed cluster on EC2 instances.
>>>
>>> I was worried that CL=ALL have more read latency and read failures. But
>>> won't rule out trying it.
>>>
>>> Should I switch select count (*) to select partition_key column? Would
>>> that be of any help.?
>>>
>>>
>>> Thank you
>>> Regards
>>> Srini
>>>
>>> On Mar 16, 2017 12:46 PM, "Arvydas Jonusonis" <
>>> arvydas.jonuso...@gmail.com> wrote:
>>>
>>> What are your replication strategy and snitch settings?
>>>
>>> Have you tried doing a read at CL=ALL? If it's an actual inconsistency
>>> issue (missing data), this should cause the correct results to be returned.
>>> You'll need to run a repair to fix the inconsistencies.
>>>
>>> If all the data is actually there, you might have one or several nodes
>>> that aren't identifying the correct replicas.
>>>
>>> Arvydas
>>>
>>>
>>>
>>> On Thu, Mar 16, 2017 at 5:31 PM, srinivasarao daruna <
>>> sree.srin...@gmail.com> wrote:
>>>
>>> Hi Team,
>>>
>>> We are struggling with a problem related to cassandra counts, after
>>> backup and restore of the cluster. Aaron Morton has suggested to send this
>>> to user list, so some one of the list will be able to help me.
>>>
>>> We are have a rest api to talk to cassandra and one of our query which
>>> fetches count is creating problems for us.
>>>
>>> We have done backup and restore and copied all the data to new cluster.
>>> We have done nodetool refresh on the tables, and did the nodetool repair as
>>> well.
>>>
>>> However, one of our key API call is returning inconsistent results. The
>>> result count is 0 in the first call and giving the actual values for later
>>> calls. The query frequency is bit high and failure rate has also raised
>>> considerably.
>>>
>>> 1) The count query has partition keys in it. Didnt see any read timeout
>>> or any errors from api logs.
>>>
>>> 2) This is how our code of creating session looks.
>>>
>>> val poolingOptions = new PoolingOptions
>>> poolingOptions
>>>   .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
>>>   .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
>>>   .setCoreConnectionsPerHost(HostDistance.REMOTE, 4)
>>>   .setMaxConnectionsPerHost( HostDistance.REMOTE, 10)
>>>
>>> val builtCluster = clusterBuilder.withCredentials(username, password)
>>>   .withPoolingOptions(poolingOptions)
>>>   .build()
>>> val cassandraSession = builtCluster.get.connect()
>>>
>>> val preparedStatement = cassandraSession.prepare(state
>>> ment).setConsistencyLevel(ConsistencyLevel.QUORUM)
>>> cassandraSession.execute(preparedStatement.bind(args :_*))
>>>
>>> Query: SELECT count(*) FROM table_name WHERE parition_column=? AND
>>> text_column_of_clustering_key=? AND date_column_of_clustering_key<=?
>>> AND date_column_of_clustering_key>=?
>>>
>>> 3) Cluster configuration:
>>>
>>> 6 Machines: 3 seeds, we are using apache cassandra 3.9 version. Each
>>> machine is equipped with 16 Cores and 64 GB Ram.
>>>
>>> Replication factor is 3, and write consistency is ONE and read
>>> consistency is QUORUM.
>>>
>>> 4) cassandra is never down on any machine
>>>

Re: Does "nodetool repair" need to be run on each node for a given table?

2017-03-14 Thread daemeon reiydelle
Am I unreasonable in expecting a poster to have looked at the documentation
before posting? And that reposting the same query WITHOUT reading the
documents (when pointed out to them) when asked to do so is not
appropriate? Do we have a way to blackball such?


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Mon, Mar 13, 2017 at 1:30 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> I understand that the nodetool command connects to a specific server and
> for many of the commands, e.g. "info", "compactionstats", etc, the
> information is for that specific node.
>
> While for some other commands like "status", the info is for the whole
> cluster.
>
>
>
> So is "nodetool repair" that operates at a single node level (i.e. repairs
> the partitions contained on the target node?).
>
> If so, what is the recommended approach to doing repairs?
>
>
>
> E.g. we have a large number of tables (20+), large amount of data (40+ TB)
> and a number of nodes (40+).
>
> Do I need to iterate through each server AND each table?
>
>
>
> Thanks,
>
> Jayesh
>
>
>
>
>
>
>


Re: Does "nodetool repair" need to be run on each node for a given table?

2017-03-13 Thread daemeon reiydelle
I find it helpful to read the manual first. After review, I would be happy
to answer specific questions.

https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html
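For the concrete question above, the usual rotation is a primary-range repair
run on every node in turn; a hedged sketch, with the host list and keyspace as
placeholders:

    # "-pr" repairs only each node's primary ranges, so looping over every node
    # covers all data exactly once per cycle.
    for host in node1 node2 node3; do
        ssh "$host" 'nodetool repair -pr my_keyspace'
    done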


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Mon, Mar 13, 2017 at 1:30 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> I understand that the nodetool command connects to a specific server and
> for many of the commands, e.g. "info", "compactionstats", etc, the
> information is for that specific node.
>
> While for some other commands like "status", the info is for the whole
> cluster.
>
>
>
> So is "nodetool repair" that operates at a single node level (i.e. repairs
> the partitions contained on the target node?).
>
> If so, what is the recommended approach to doing repairs?
>
>
>
> E.g. we have a large number of tables (20+), large amount of data (40+ TB)
> and a number of nodes (40+).
>
> Do I need to iterate through each server AND each table?
>
>
>
> Thanks,
>
> Jayesh
>
>
>
>
>
>
>


Re: scylladb

2017-03-11 Thread daemeon reiydelle
Recall that garbage collection on a busy node can occur minutes or seconds
apart. Note that stop the world GC also happens as frequently as every
couple of minutes on every node. Remove that and do the simple arithmetic.


sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On Mar 10, 2017 8:59 AM, "Bhuvan Rawal" <bhu1ra...@gmail.com> wrote:

> Agreed, C++ gives an added advantage to talk to underlying hardware with
> better efficiency; it sounds good, but can a piece of code written in C++
> give 1000% the throughput of a Java app? Is a TPC design 10X more performant
> than a SEDA arch?
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's
> and aerospike's benchmarks it appears that Aerospike is 100X performant
> than C* - I highly doubt that!! )
>
> For a moment lets forget about evaluating 2 different databases, one can
> observe 10X performance difference between a mistuned cassandra cluster and
> one thats tuned as per data model - there are so many Tunables in yaml as
> well as table configs.
>
> Idea is - in order to strengthen your claim, you need to provide complete
> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
> with the configs used. Having plain ops per second and 99p latency is
> blackbox.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <a...@scylladb.com> wrote:
>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>>
>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>
>> I dont think ScyllaDB performance is because of C++. The design decisions
>> in scylladb are indeed different from Cassandra such as getting rid of SEDA
>> and moving to TPC and so on.
>>
>> If someone thinks it is because of C++ then just show the benchmarks that
>> proves it is indeed the C++ which gave 10X performance boost as ScyllaDB
>> claims instead of stating it.
>>
>>
>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mrbur...@gmail.com
>> > wrote:
>>
>>> They spend an enormous amount of time focusing on performance. You can
>>> expect them to continue on with their optimization and keep crushing it.
>>>
>>> P.S., I don't work for ScyllaDB.
>>>
>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <rakeshkumar...@outlook.com
>>> > wrote:
>>>
>>>> In all of their presentation they keep harping on the fact that
>>>> scylladb is written in C++ and does not carry the overhead of Java.  Still
>>>> the difference looks staggering.
>>>> 
>>>> From: daemeon reiydelle <daeme...@gmail.com>
>>>> Sent: Thursday, March 9, 2017 14:21
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: scylladb
>>>>
>>>> The comparison is fair, and conservative. Did substantial performance
>>>> comparisons for two clients, both results returned throughputs that were
>>>> faster than the published comparisons (15x as I recall). At that time the
>>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>>> for OLA compliance.
>>>>
>>>>
>

Re: Disconnecting two data centers

2017-03-08 Thread daemeon reiydelle
I guess it depends on the experience one has. This is a common process to
bring up, move, build full prod copies, etc.

What is outlined is pretty much exactly what I have done 20-50 times (too
many to remember).

FYI, some of this should be done with nodes DOWN.



*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Wed, Mar 8, 2017 at 6:38 AM, Ryan Svihla  wrote:

> it's a bit tricky and I don't advise it, but the typical pattern is (say
> you have DC1 and DC2):
>
> 1. partition the data centers from one another..kill the routing however
> you can (firewall, etc)
> 2. while partitioned log onto DC1 alter schema so that DC2 is not
> replicating), repeat for other.
> 2a. If using propertyfilesnitch remove the DC2 from all the DC1 property
> files and vice versa
> 2b. change the seeds setting in the cassandra.yaml accordingly (DC1 yaml's
> shouldn't have any seeds from DC2, etc)
> 3. rolling restart to account for this.
> 4. run repair (not even sure how necessary this step is, but after doing
> RF changes I do this to prevent hiccups)
>
> I've done this a couple of times but really failing all of that, the more
> well supported and harder to mess up but more work approach is:
>
> 1. Set DC2 to RF 0
> 2. remove all nodes from DC2
> 3. change yamls for seed files (update property file if need be)
> 4. create new cluster in DC2,
> 5. use sstableloader to stream DC1 data to DC2.
>
> On Wed, Mar 8, 2017 at 8:13 AM, Chuck Reynolds 
> wrote:
>
>> I’m running C* 2.1.13 and I have two rings that are replicating data from
>> our data center to one in AWS.
>>
>>
>>
>> We would like to keep both of them for a while but we have a need to
>> disconnect them.  How can this be done?
>>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>
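Step 2 of the first procedure quoted above (altering the schema so the other
DC no longer replicates) comes down to dropping that DC from each keyspace's
replication map. A hedged CQL sketch with placeholder names, repeated per
keyspace (including system_auth if it was replicated to both):

    -- 'DC2' is simply omitted so it no longer holds replicas.
    ALTER KEYSPACE my_keyspace WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'DC1': 3
    };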


Re: AWS NVMe i3 instances performances

2017-03-01 Thread daemeon reiydelle
We did. We found, with both CentOS and Ubuntu (both for application
compatibility reasons), that there is somewhat less IO and better CPU
throughput at the price point. At the time my optimization work for that
client ended, Amazon was looking at the IO issue, as perhaps the frame
configurations needed further optimization; this was 2 months ago. A very
superficial pass (no kernel tuning) done last month seems to indicate the
same tradeoffs. Testing was performed in both cases with the C* stress tool
and with CI test suites. Does this help?
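A rough sketch of the kind of stress run referred to, using the
cassandra-stress tool that ships with Cassandra; operation counts, thread
counts and the node address are placeholders:

    cassandra-stress write n=5000000 -rate threads=200 -node 10.0.0.10
    cassandra-stress read  n=5000000 -rate threads=200 -node 10.0.0.10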


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Wed, Mar 1, 2017 at 3:30 AM, Romain Hardouin  wrote:

> Hi all,
>
> AWS launched i3 instances a few days ago*. NVMe SSDs seem very promising!
>
> Did someone already benchmark an i3 with Cassandra? e.g. i2 vs i3
> If yes, with which OS and kernel version?
> Did you make any system tuning for NVMe? e.g. PCIe IRQ? etc.
>
> We plan to make some benchmarks but Debian is not listed as a supported OS
> so we have to upgrade our kernel and see if it works :P
> Here is what we have in mind for the time being:
> * OS: Debian
> * Kernel: v4.9
> * IRQ: try several configurations
> Also I would like to compare performances between our Debian AMI and a
> standard AWS Linux AMI.
>
> Thanks!
>
> [*] https://aws.amazon.com/fr/blogs/aws/now-available-i3-
> instances-for-demanding-io-intensive-applications/
>
>
>


Re: Current data density limits with Open Source Cassandra

2017-02-08 Thread daemeon reiydelle
YMMV. Think of that storage limit as fairly reasonable for active data
likely to tombstone. Add more for older/historic data. Then think about the
time to recover a node.


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Wed, Feb 8, 2017 at 2:14 PM, Ben Slater 
wrote:

> The major issue we’ve seen with very high density (we generally say <2TB
> node is best) is manageability - if you need to replace a node or add a node
> then restreaming data takes a *long* time and there is a fairly high chance
> of a glitch in the universe meaning you have to start again before it’s
> done.
>
> Also, if you’re uses STCS you can end up with gigantic compactions which
> also take a long time and can cause issues.
>
> Heap limitations are mainly related to partition size rather than node
> density in my experience.
>
> Cheers
> Ben
>
> On Thu, 9 Feb 2017 at 08:20 Hannu Kröger  wrote:
>
>> Hello,
>>
>> Back in the day it was recommended that max disk density per node for
>> Cassandra 1.2 was at around 3-5TB of uncompressed data.
>>
>> IIRC it was mostly because of heap memory limitations? Now that off-heap
>> support is there for certain data and 3.x has different data storage
>> format, is that 3-5TB still a valid limit?
>>
>> Does anyone have experience on running Cassandra with 3-5TB compressed
>> data ?
>>
>> Cheers,
>> Hannu
>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


Re: Instaclustr Masters scholarship

2017-02-07 Thread daemeon reiydelle
A bunch more welcome than here in the US, to our deep shame and foolishness.

Sadly while I am actually involved in this area, I am happy in San
Francisco. I would be interested in being part of a pro bono team should
that transpire.

Thanks, D.


*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Tue, Feb 7, 2017 at 7:24 PM, Ben Bromhead  wrote:

> As part of our commitment to contributing back to the Apache Cassandra
> open source project and the wider community we are always looking for ways
> we can foster knowledge sharing and improve usability of Cassandra itself.
> One of the ways we have done so previously was to open up our internal
> builds and versions of Cassandra (https://github.com/instaclustr/cassandra
> ).
>
> We have also been looking at a few novel or outside the box ways we can
> further contribute back to the community. As such, we are sponsoring a
> masters project in conjunction with the Australian based University of
> Canberra. Instaclustr’s staff will be available to provide advice and
> feedback to the successful candidate.
>
> *Scope*
> Distributed database systems are relatively new technology compared to
> traditional relational databases. Distributed advantages provide
> significant advantages in terms of reliability and scalability but often at
> a cost of increased complexity. This complexity presents challenges for
> testing of these systems to prove correct operation across all possible
> system states. The scope of this masters scholarship is to use the Apache
> Cassandra repair process as an example to consider and improve available
> approaches to distributed database systems testing.
>
> The repair process in Cassandra is a scheduled process that runs to ensure
> the multiple copies of each piece of data that is maintained by Cassandra
> are kept synchronised. Correct operation of repairs has been an ongoing
> challenge for the Cassandra project partly due to the difficulty in
> designing and developing  comprehensive automated tests for this
> functionality.
>
> The expected scope of this project is to:
>
>- survey and understand the existing testing framework available as
>part of the Cassandra project, particularly as it pertains to testing
>repairs
>- consider, research and develop enhanced approaches to testing of
>repairs
>- submit any successful approaches to the Apache Cassandra project for
>feedback and inclusion in the project code base
>
> Australia is a pretty great place to advance your education and is
> welcoming of foreign students.
>
> We are also open to sponsoring a PhD project with a more in depth focus
> for the right candidate.
>
> For more details please don't hesitate to get in touch with myself or
> reach out to i...@instaclustr.com.
>
> Cheers
>
> Ben
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread daemeon reiydelle
This is not a bug, and in fact changing it would be a serious bug.

What it is is a wonderful case of bad coding: would one expect a
java/py/bash script that loops on a bunch of read/execute/update calls,
where each iteration calls time, to return the same exact time for the
duration of the execution of the code? Whether the code runs for 5 seconds
or 5 hours?

Every call to a system call is unique, including within C*. Calling now
PRIOR to initiating multiple inserts is in most cases exactly what one does
to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
identical system time as would be the uuid of the row, one tries to call
time as close to just before the insert as possible. Then repeat.

You have a logic issue in your code. If you want the same value for a set
of calls, the ONLY practice is to set the value before initiating the
sequence of calls.
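A minimal sketch of that client-side approach, here with the DataStax Java
driver 3.x (keyspace, table and column names are illustrative, and the table
is assumed to already exist):

    import com.datastax.driver.core.*;
    import com.datastax.driver.core.utils.UUIDs;
    import java.util.UUID;

    public class SameTimeuuidExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace");
            PreparedStatement ins =
                session.prepare("INSERT INTO events (id, bucket, ts) VALUES (?, ?, ?)");

            // Generate the timeuuid once, before the sequence of inserts, and
            // bind that same value to every statement instead of calling now().
            UUID ts = UUIDs.timeBased();
            for (int bucket = 0; bucket < 3; bucket++) {
                session.execute(ins.bind("sensor-1", bucket, ts));
            }
            cluster.close();
        }
    }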



*...*



*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*

On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey  wrote:

> Getting the same TimeUUID values might be a major problem. Getting two
> different TimeUUIDs that at least have time component would not be a major
> problem as this is the main case today. Getting different time components
> is actually the corner case, and it is a corner case that breaks
> Internet-of-Things applications. We can tightly control clock skew in our
> cluster. We most definitely CANNOT control clock skew on the thousands of
> sensors that write to our cluster.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille  wrote:
>
>> In my opinion, this is not broken and “fixing” it would break existing
>> code. Consider a batch that includes multiple inserts, each of which
>> inserts the value returned by now(). Getting the same UUID for each insert
>> would be a major problem.
>>
>> Cheers
>>
>> Robert
>>
>>
>> On Nov 30, 2016, at 4:46 PM, Todd Fast  wrote:
>>
>> FWIW I'd suggest opening a bug--this behavior is certainly quite
>> unexpected and more than just a documentation issue. In general I can't
>> imagine any desirable properties of the current implementation, and there
>> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>>
>> Todd
>>
>> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu  wrote:
>>
>>> Sorry for my typo. Obviously, I meant:
>>> "It appears that a single query that calls Cassandra's`now()` time
>>> function *multiple times *may actually cause a query to write or return
>>> different times."
>>>
>>> Less of a surprise now that I realize more about the implementation, but
>>> I agree that more explicit documentation around when exactly the
>>> "execution" of each now() statement happens and what implications it has
>>> for the resulting timestamps would be helpful when running into this.
>>>
>>> Thanks for the quick responses!
>>>
>>> -Terry
>>>
>>>
>>>
>>> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek 
>>> wrote:
>>>
>>> every now() call in statement is under the hood "replaced" with newly
>>> generated uuid.
>>>
>>> It can happen that they belong to  different milliseconds in time.
>>>
>>> If you need to have same timestamps you need to set them on the client
>>> side.
>>>
>>>
>>> @msvaljek 
>>>
>>> 2016-11-29 22:49 GMT+01:00 Terry Liu :
>>>
>>> It appears that a single query that calls Cassandra's `now()` time
>>> function may actually cause a query to write or return different times.
>>>
>>> Is this the expected or defined behavior, and if so, why does it behave
>>> like this rather than evaluating `now()` once across an entire statement?
>>>
>>> This really affects UPDATE statements but to test it more easily, you
>>> could try something like:
>>>
>>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>>> FROM keyspace.table
>>> LIMIT 100;
>>>
>>> If you run that a few times, you should eventually see that the
>>> timestamp returned moves onto the next millisecond mid-query.
>>>
>>> --
>>> *Software Engineer*
>>> Turnitin - http://www.turnitin.com
>>> t...@turnitin.com
>>>
>>>
>>>
>>>
>>>
>>> --
>>> *Software Engineer*
>>> Turnitin - http://www.turnitin.com
>>> t...@turnitin.com
>>>
>>
>>
>


Re: Throughput of hints delivery

2016-09-17 Thread daemeon reiydelle
Timeouts indicate network (or equivalent) throughput delays, anywhere from the
physical box's network card out to the other DC's card. If you are using VMs,
add that layer as well. Your network team needs to be looking for ANY
timeouts, retries, packets delivered in a retry window > 0, etc. ANY value
other than zero, ever, is your problem.
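
For what it's worth, a rough sketch of the kind of watch I mean (the interval
and grep patterns are just a starting point); run it on each node and alert on
any counter that keeps growing:

while true; do
  date
  netstat -s | grep -iE 'retrans|timeout'   # TCP retransmits/timeouts; should stay flat
  nodetool tpstats | grep -i hinted         # hinted handoff tasks still pending/blocked
  sleep 10
done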


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Sat, Sep 17, 2016 at 12:02 PM, laxmikanth sadula  wrote:

> Hi Matija,
>
> All nodes are UP & running and even GC patterns are all well. But I see a
> lot of "Timed out replaying hints" in the HintedHandOff Manager, and I suspect
> this might be the reason why GBs of hints are getting piled up instead of
> being properly delivered.
> So this clearly indicates some network-related issues, so I just wanted to
> know the way to monitor hints delivery throughput and also TCP
> throughput (packets sent, received, dropped, etc. on an interface).
>
> If anyone monitoring such stats, please let me know.
>
> On Sat, Sep 17, 2016 at 11:26 PM, Matija Gobec 
> wrote:
>
>> Hi,
>>
>> You should first figure out why you have so many hints and then think
>> about throughput of hints delivery.
>> Hints are generated for dead nodes and in a healthy cluster are not
>> present.
>> Are all your nodes alive and running? What is the issue of inter DC
>> connectivity?
>>
>> Matija
>>
>> --
>>
>> *Matija Gobec*
>> *Co-Founder & Senior Consultant*
>> www.smartcat.io
>>
>>
>>
>> *Data  --> Knowledge
>>  --> Power  *
>>
>> On Sat, Sep 17, 2016 at 3:16 PM, laxmikanth sadula <
>> laxmikanth...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there any way to monitor hints delivery throughout/performance/issue
>>> delivering hints?
>>>
>>> We have 2 DC c* cluster with 2.0.17 with RF=3 setup. Due to inter DC
>>> connectivity issues/some other issues hints shoot upto GBs/node.
>>>
>>> So I would like to monitor hints throughput/pin point the reason for
>>> hints growth on nodes.
>>>
>>> Kindly let us know if any one of you have such thing to monitor hints.
>>>
>>>
>>> Thanks
>>> Laxmikanth
>>>
>>>
>>> --
>>> Regards,
>>> Laxmikanth
>>> 99621 38051
>>>
>>>
>>>
>>
>
>
> --
> Regards,
> Laxmikanth
> 99621 38051
>
>


Re: Questions about anti-entropy repair

2016-07-20 Thread daemeon reiydelle
I don't know if my perspective on this will assist, so YMMV:

Summary

   1. Nodetool repairs are required when a node has issues and can't get
   its resync (e.g. hinted handoff) done: the culprit is usually the network,
   sometimes the container/VM layer, rarely disk.
   2. Scripts to do partition range repair are a pain to maintain, and you have
   to be CONSTANTLY checking for new keyspaces, parsing them, etc. GitHub
   project, anyone?
   3. Monitor/monitor/monitor: if you do a best-practices job of actually
   monitoring the FULL stack, you only need to do repairs when the world goes
   south.
   4. Are you alerted when errors show up in the logs, the network goes wacky,
   etc.? No? Then you have to CYA by doing Hail Mary passes with periodic
   nodetool repairs.
   5. Nodetool repair is a CYA for a cluster whose status is not well
   monitored.

Daemeon's thoughts:

Nodetool repair is not required for a cluster that is, and "always has been",
in a known good state. Monitoring of the relevant logs/network/disk/etc. is
the only way that I know of to assure this state. Because nodes can disappear
(e.g. on AWS, and on EVERY ONE OF my clients' infrastructures with their
screwed-up networks), the cluster *can* get overloaded (network traffic),
causing hinted handoffs to hit all of the worst-case corner cases you could
never hope to see.

So, if you have good monitoring in place to assure that there is known good
cluster behaviour (network, disk, etc.), repairs are not required until you
are alerted that a cluster health problem has occurred. Partition range
repair is a pain in various parts of the anatomy because one has to
CONSTANTLY be updating the scripts that generate the commands (I have not
seen a GitHub project around this; I would love to see responses that point
one out!).
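
As a rough illustration of the kind of wrapper I mean (this assumes a cqlsh new
enough to support -e and that it runs locally on each node), re-discovering the
keyspaces on every pass so newly created ones are not silently skipped:

for ks in $(cqlsh -e "DESCRIBE KEYSPACES"); do
  case "$ks" in system*) continue ;; esac          # skip the system keyspaces
  echo "repairing $ks"
  nodetool repair -pr "$ks" || echo "repair of $ks FAILED" >&2
done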



*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Wed, Jul 20, 2016 at 4:33 AM, Alain RODRIGUEZ  wrote:

> Hi Satoshi,
>
>
>> Q1:
>> According to the DataStax document, it's recommended to run full repair
>> weekly or monthly. Is it needed even if repair with partitioner range
>> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
>> every node in the cluster?
>>
>
> More accurately you need to run a repair for each node and each table
> within the gc_grace_seconds value defined at the table level to ensure no
> deleted data will return. Also running this on a regular basis ensures a
> constantly low entropy in your cluster, allowing better consistency (if not
> using a strong consistency like with CL.R = quorum).
>
> A full repair means every piece of data has been repaired. On a 3 node
> cluster with RF=3, running 'nodetool repair -pr' on the 3 nodes or
> 'nodetool repair' on one node are an equivalent "full repair". The best
> approach is often to run repair with '-pr' on all the nodes indeed. This is
> a full repair.
>
> Is it a good practice to repair a node without using non-repaired
>> snapshots when I want to restore a node because repair process is too slow?
>
>
> I am sorry, this is unclear to me. But from this "actually 1GB data is
> updated because the snapshot is already repaired" I understand you are
> using incremental repairs (or that you think that Cassandra repair uses it
> by default, which is not the case in your version).
> http://www.datastax.com/dev/blog/more-efficient-repairs
>
> Also, be aware that repair is a PITA for all the operators using
> Cassandra, that lead to many tries to improve things:
>
> Range repair: https://github.com/BrianGallew/cassandra_range_repair
> Reaper: https://github.com/spotify/cassandra-reaper
> Ticket to automatically schedule / handle repairs in Cassandra:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Ticket to switch to Mutation Based Repairs (MBR):
> https://issues.apache.org/jira/browse/CASSANDRA-8911
>
> And probably many more... There is a lot to read and try, repair is an
> important yet non trivial topic for any Cassandra operator.
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
>
> 2016-07-14 9:41 GMT+02:00 Satoshi Hikida :
>
>> Hi,
>>
>> I have two questions about anti-entropy repair.
>>
>> Q1:
>> According to the DataStax document, it's recommended to run full repair
>> weekly or monthly. Is it needed even if repair with partitioner range
>> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
>> every node in the cluster?
>>
>> References:
>> - DataStax, "When to run anti-entropy repair",
>> http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsRepairNodesWhen.html
>>
>>
>> Q2:
>> Is it a good practice to repair a node without using non-repaired
>> snapshots when I want to restore a node because repair process is too slow?
>>
>> I've done some simple verifications for anti-entropy repair and found out
>> that the repair process spends 

Re: Problems with cassandra on AWS

2016-07-11 Thread daemeon reiydelle
Well, I seem to recall that the private IPs are valid for communications
WITHIN one VPC. I assume you can log into one machine and ping (or ssh) the
others. If so, check that cassandra.yaml is not set to listen on 127.0.0.1
(localhost).
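
For reference, a minimal sketch of the cassandra.yaml pieces that usually
matter here (the addresses are placeholders; each node uses its own private IP,
which is fine for node-to-node traffic within one VPC):

listen_address: 172.31.7.10         # this node's private IP, never 127.0.0.1
rpc_address: 0.0.0.0                # or the private IP; where clients connect
broadcast_rpc_address: 172.31.7.10  # what ends up advertised in system.peers
# seed list: a few private IPs, e.g. "172.31.7.10,172.31.7.11"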


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Sun, Jul 10, 2016 at 4:54 PM, Kant Kodali  wrote:

> Hi Guys,
>
> I installed a 3 node Cassandra cluster on AWS and my replication factor is
> 3. I am trying to insert some data into a table. I set the consistency
> level of QUORUM at a Cassandra Session level. It only inserts into one node
> and unable to talk to other nodes because it is trying to contact other
> nodes through private IP and obviously that is failing so I am not sure how
> to change settings in say cassandra.yaml or somewhere such that rpc_address
> in system.peers table is updated to public IP's? I tried changing the seeds
> to all public IP's that didn't work as it looks like ec2 instances cannot
> talk to each other using public IP's. any help would be appreciated!
>
> Thanks,
> kant
>


Re: Blog post on Cassandra's inner workings and performance - feedback welcome

2016-07-09 Thread daemeon reiydelle
I saw this really useful post a few days ago. I found the organization and
presentation quite clear and helpful (I often struggle trying to do high
level comparisons of Hadoop and Cass). Thank you!

If there were sections I would like to see your clear thoughts appear
within, they would be around:

   - (1) why networks need to be clean (the impact of "dirty"/erratic
   networks);
   - (2) the impact of Java (off-heap memory, stop-the-world garbage
   collection, why more memory makes things worse);
   - (3) table design decisions (read-mostly, write-mostly, mixed
   read/write, etc.)

A really great writeup, thank you!





*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Fri, Jul 8, 2016 at 11:59 PM, Manuel Kiessling <
kiessling.man...@gmail.com> wrote:

> Yes, the joke's on me. It was a copy error, and I've since posted
> the correct URL (journeymonitor.com
> :4000/tutorials/2016/02/29/cassandra-inner-workings-and-how-this-relates-to-performance/).
>
> Substantial feedback regarding the actual post still very much welcome.
>
> Regards,
> Manuel
>
> Am 09.07.2016 um 03:32 schrieb daemeon reiydelle <daeme...@gmail.com>:
>
> Localhost is a special network address that never leaves the operating
> system. It only goes "half way" down the IP stack. Thanks for your efforts!
>
>
> *...*
>
>
>
> *Daemeon C.M. Reiydelle*
> *USA (+1) 415.501.0198*
> *London (+44) (0) 20 8144 9872*
>
> On Fri, Jul 8, 2016 at 5:53 PM, Joaquin Alzola <joaquin.alz...@lebara.com>
> wrote:
>
>> Hi Manuel
>>
>>
>>
>> I think localhost will not work for people on the internet.
>>
>>
>>
>> BR
>>
>>
>>
>> Joaquin
>>
>>
>>
>> *From:* kiessling.man...@gmail.com [mailto:kiessling.man...@gmail.com] *On
>> Behalf Of *Manuel Kiessling
>> *Sent:* 07 July 2016 14:12
>> *To:* user@cassandra.apache.org
>> *Subject:* Blog post on Cassandra's inner workings and performance -
>> feedback welcome
>>
>>
>>
>> Hi all,
>>
>> I'm currently in the process of understanding the inner workings of
>> Cassandra with regards to network and local storage mechanisms and
>> operations. In order to do so, I've written a blog post about it which is
>> now in a "first final" version.
>>
>> Any feedback, especially corrections regarding misunderstandings on my
>> side, would be highly appreciated. The post really represents my very
>> subjective view on how Cassandra works under the hood, which makes it prone
>> to errors of course.
>>
>> You can access the current version at
>> http://localhost:4000/tutorials/2016/02/29/cassandra-inner-workings-and-how-this-relates-to-performance/
>>
>>
>>
>> Thanks,
>>
>> --
>>
>>  Manuel
>> This email is confidential and may be subject to privilege. If you are
>> not the intended recipient, please do not copy or disclose its content but
>> contact the sender immediately upon receipt.
>>
>
>


Re: Blog post on Cassandra's inner workings and performance - feedback welcome

2016-07-08 Thread daemeon reiydelle
Localhost is a special network address that never leaves the operating
system. It only goes "half way" down the IP stack. Thanks for your efforts!


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Fri, Jul 8, 2016 at 5:53 PM, Joaquin Alzola 
wrote:

> Hi Manuel
>
>
>
> I think localhost will not work for people on the internet.
>
>
>
> BR
>
>
>
> Joaquin
>
>
>
> *From:* kiessling.man...@gmail.com [mailto:kiessling.man...@gmail.com] *On
> Behalf Of *Manuel Kiessling
> *Sent:* 07 July 2016 14:12
> *To:* user@cassandra.apache.org
> *Subject:* Blog post on Cassandra's inner workings and performance -
> feedback welcome
>
>
>
> Hi all,
>
> I'm currently in the process of understanding the inner workings of
> Cassandra with regards to network and local storage mechanisms and
> operations. In order to do so, I've written a blog post about it which is
> now in a "first final" version.
>
> Any feedback, especially corrections regarding misunderstandings on my
> side, would be highly appreciated. The post really represents my very
> subjective view on how Cassandra works under the hood, which makes it prone
> to errors of course.
>
> You can access the current version at
> http://localhost:4000/tutorials/2016/02/29/cassandra-inner-workings-and-how-this-relates-to-performance/
>
>
>
> Thanks,
>
> --
>
>  Manuel
> This email is confidential and may be subject to privilege. If you are not
> the intended recipient, please do not copy or disclose its content but
> contact the sender immediately upon receipt.
>


Re: Is my cluster normal?

2016-07-07 Thread daemeon reiydelle
Those numbers, as I suspected, line up pretty well with your AWS
configuration and network latencies within AWS. It is clear that this is a
WRITE ONLY test. You might want to do a mixed (e.g. 50% read, 50% write)
test for sanity. Note that the test will populate the data BEFORE it begins
doing the read/write tests.
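
Something along these lines would do it (cassandra-stress syntax from the 2.1+
tool; the node address, counts and thread count are placeholders), writing
first so the mixed phase has data to read back:

cassandra-stress write n=1000000 cl=QUORUM -node 10.0.0.1 -rate threads=100
cassandra-stress mixed ratio\(write=1,read=1\) n=1000000 cl=QUORUM -node 10.0.0.1 -rate threads=100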

In a dedicated environment at a recent client, with 10gbit links (just
grabbing one cassandra-stress run from my archives), I see a little under twice
your throughput. Note that your latency max is the result of a stop-the-world
garbage collection. There were huge GC pauses in the run below because that
particular run was using a 24 GB (Cassandra 2.x) Java heap.

op rate   : 21567 [WRITE:21567]
partition rate: 21567 [WRITE:21567]
row rate  : 21567 [WRITE:21567]
latency mean  : 9.3 [WRITE:9.3]
latency median: 7.7 [WRITE:7.7]
latency 95th percentile   : 13.2 [WRITE:13.2]
latency 99th percentile   : 32.6 [WRITE:32.6]
latency 99.9th percentile : 97.2 [WRITE:97.2]
latency max   : 14906.1 [WRITE:14906.1]
Total partitions  : 8333 [WRITE:8333]
Total errors  : 0 [WRITE:0]
total gc count: 705
total gc mb   : 1691132
total gc time (s) : 30
avg gc time(ms)   : 43
stdev gc time(ms) : 13
Total operation time  : 01:04:23


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Jul 7, 2016 at 2:51 PM, Yuan Fang <y...@kryptoncloud.com> wrote:

> Yes, here is my stress test result:
> Results:
> op rate   : 12200 [WRITE:12200]
> partition rate: 12200 [WRITE:12200]
> row rate  : 12200 [WRITE:12200]
> latency mean  : 16.4 [WRITE:16.4]
> latency median: 7.1 [WRITE:7.1]
> latency 95th percentile   : 38.1 [WRITE:38.1]
> latency 99th percentile   : 204.3 [WRITE:204.3]
> latency 99.9th percentile : 465.9 [WRITE:465.9]
> latency max   : 1408.4 [WRITE:1408.4]
> Total partitions  : 100 [WRITE:100]
> Total errors  : 0 [WRITE:0]
> total gc count: 0
> total gc mb   : 0
> total gc time (s) : 0
> avg gc time(ms)   : NaN
> stdev gc time(ms) : 0
> Total operation time  : 00:01:21
> END
>
> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <r...@foundev.pro> wrote:
>
>> Lots of variables you're leaving out.
>>
>> Depends on write size, if you're using logged batch or not, what
>> consistency level, what RF, if the writes come in bursts, etc, etc.
>> However, that's all sort of moot for determining "normal" really you need a
>> baseline as all those variables end up mattering a huge amount.
>>
>> I would suggest using Cassandra stress as a baseline and go from there
>> depending on what those numbers say (just pick the defaults).
>>
>> Sent from my iPhone
>>
>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>
>> yes, it is about 8k writes per node.
>>
>>
>>
>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> Are you saying 7k writes per node? or 30k writes per node?
>>>
>>>
>>> *...*
>>>
>>>
>>>
>>> *Daemeon C.M. Reiydelle*
>>> *USA (+1) 415.501.0198*
>>> *London (+44) (0) 20 8144 9872*
>>>
>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>>
>>>> writes 30k/second is the main thing.
>>>>
>>>>
>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <daeme...@gmail.com>
>>>> wrote:
>>>>
>>>>> Assuming you meant 100k, that likely for something with 16mb of
>>>>> storage (probably way small) where the data is more that 64k hence will 
>>>>> not
>>>>> fit into the row cache.
>>>>>
>>>>>
>>>>> *...*
>>>>>
>>>>>
>>>>>
>>>>> *Daemeon C.M. Reiydelle*
>>>>> *USA (+1) 415.501.0198*
>>>>> *London (+44) (0) 20 8144 9872*
>>>>>
>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <y...@kryptoncloud.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and
>>>>>> 600GB ssd EBS).
>>>>>> I can reach a cluster wide write requests of 30k/second and read
>>>>>> request about 100/second. The cluster OS load constantly above 10. Are
>>>>>> those normal?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Yuan
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Is my cluster normal?

2016-07-07 Thread daemeon reiydelle
Are you saying 7k writes per node? or 30k writes per node?


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <y...@kryptoncloud.com> wrote:

> writes 30k/second is the main thing.
>
>
> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> Assuming you meant 100k, that likely for something with 16mb of storage
>> (probably way small) where the data is more that 64k hence will not fit
>> into the row cache.
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle*
>> *USA (+1) 415.501.0198*
>> *London (+44) (0) 20 8144 9872*
>>
>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>
>>>
>>>
>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB
>>> ssd EBS).
>>> I can reach a cluster wide write requests of 30k/second and read request
>>> about 100/second. The cluster OS load constantly above 10. Are those normal?
>>>
>>> Thanks!
>>>
>>>
>>> Best,
>>>
>>> Yuan
>>>
>>>
>>
>


Re: Is my cluster normal?

2016-07-07 Thread daemeon reiydelle
Assuming you meant 100k, that is likely for something with 16 MB of storage
(probably way too small) where the data is more than 64k and hence will not
fit into the row cache.


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang  wrote:

>
>
> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB
> ssd EBS).
> I can reach a cluster wide write requests of 30k/second and read request
> about 100/second. The cluster OS load constantly above 10. Are those normal?
>
> Thanks!
>
>
> Best,
>
> Yuan
>
>


Re: Debugging high tail read latencies (internal timeout)

2016-07-07 Thread daemeon reiydelle
Hmm. Would you mind looking at your network interface (with the appropriate
netstat commands)? If I am right, you will be seeing packet errors, drops,
retries, out-of-window packet receives, etc.

What you may be missing is that you reported zero cross-node DROPPED latency,
not the mean LATENCY. Check your netstats. ANY VALUE CHANGE IS BAD (except the
total read/write byte counts). If your network guys say otherwise, escalate to
someone who understands TCP retries and sliding windows.
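
Concretely, these are the sorts of counters I mean (the interface name is a
placeholder); every error/drop/retransmit number should stay at zero and never
move:

netstat -i                               # per-interface RX-ERR / TX-ERR / RX-DRP / TX-DRP
ip -s link show eth0                     # same counters with overrun/carrier detail
netstat -s | grep -iE 'retrans|timeout'  # TCP retransmits and timeouts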



*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Jul 7, 2016 at 11:35 AM, Bryan Cheng  wrote:

> Hi Nimi,
>
> My suspicions would probably lie somewhere between GC and large partitions.
>
> The first tool would probably be a trace but if you experience full client
> timeouts from dropped messages you may find it hard to find the issue. You
> can try running the trace with cqlsh's timeouts cranked all the way against
> the local node with CL=ONE to try to force the local machine to answer.
>
> What does nodetool tpstats report for dropped message counts? Are they
> very high? Primarily restricted to READ, or including MUTATION, etc. ?
>
> Are there specific PK's that trigger this behavior, either all the time or
> more consistently? That would finger either very large partition sizes or
> potentially bad hardware on a node. cfhistograms will show you various
> percentile partition sizes and your max as well.
>
> GC should be accessible via JMX and also you should have GCInspector logs
> in cassandra/system.log that should give you per-collection breakdowns.
>
> --Bryan
>
>
> On Wed, Jul 6, 2016 at 6:22 PM, Nimi Wariboko Jr 
> wrote:
>
>> Hi,
>>
>> I've begun experiencing very high tail latencies across my clusters.
>> While Cassandra's internal metrics report <1ms read latencies, measuring
>> responses from within the driver in my applications (roundtrips of
>> query/execute frames), have 90% round trip times of up to a second for very
>> basic queries (SELECT a,b FROM table WHERE pk=x).
>>
>> I've been studying the logs to try and get a handle on what could be
>> going wrong. I don't think there are GC issues, but the logs mention
>> dropped messages due to timeouts while the threadpools are nearly empty -
>>
>> https://gist.github.com/nemothekid/28b2a8e8353b3e60d7bbf390ed17987c
>>
>> Relevant line:
>> REQUEST_RESPONSE messages were dropped in last 5000 ms: 1 for internal
>> timeout and 0 for cross node timeout. Mean internal dropped latency: 54930
>> ms and Mean cross-node dropped latency: 0 ms
>>
>> Are there any tools I can use to start to understand what is causing
>> these issues?
>>
>> Nimi
>>
>>
>


Re: all the hosts are not reachable when running massive deletes

2016-04-04 Thread daemeon reiydelle
Network issues. Could be inconsistent jumbo frames or something else.

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Apr 4, 2016 5:34 AM, "Paco Trujillo"  wrote:

> Hi everyone
>
>
>
> We are having problems with our cluster (7 nodes version 2.0.17) when
> running “massive deletes” on one of the nodes (via cql command line). At
> the beginning everything is fine, but after a while we start getting
> constant NoHostAvailableException using the datastax driver:
>
>
>
> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
> All host(s) tried for query failed (tried: /172.31.7.243:9042
> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
> to acquire available connection (you may want to increase the driver number
> of per-host connections)), /172.31.7.245:9042
> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
> to acquire available connection (you may want to increase the driver number
> of per-host connections)), /172.31.7.246:9042
> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
> to acquire available connection (you may want to increase the driver number
> of per-host connections)), /172.31.7.247:9042, /172.31.7.232:9042, /
> 172.31.7.233:9042, /172.31.7.244:9042 [only showing errors of first 3
> hosts, use getErrors() for more details])
>
>
>
>
>
> All the nodes are running:
>
>
>
> UN  172.31.7.244  152.21 GB  256 14.5%
> 58abea69-e7ba-4e57-9609-24f3673a7e58  RAC1
>
> UN  172.31.7.245  168.4 GB   256 14.5%
> bc11b4f0-cf96-4ca5-9a3e-33cc2b92a752  RAC1
>
> UN  172.31.7.246  177.71 GB  256 13.7%
> 8dc7bb3d-38f7-49b9-b8db-a622cc80346c  RAC1
>
> UN  172.31.7.247  158.57 GB  256 14.1%
> 94022081-a563-4042-81ab-75ffe4d13194  RAC1
>
> UN  172.31.7.243  176.83 GB  256 14.6%
> 0dda3410-db58-42f2-9351-068bdf68f530  RAC1
>
> UN  172.31.7.233  159 GB 256 13.6%
> 01e013fb-2f57-44fb-b3c5-fd89d705bfdd  RAC1
>
> UN  172.31.7.232  166.05 GB  256 15.0%
> 4d009603-faa9-4add-b3a2-fe24ec16a7c1
>
>
>
> but two of them have high cpu load, especially the 232 because I am
> running a lot of deletes using cqlsh in that node.
>
>
>
> I know that deletes generate tombstones, but with 7 nodes in the cluster I
> do not think is normal that all the host are not accesible.
>
>
>
> We have a replication factor of 3 and for the deletes I am not using any
> consistency (so it is using the default ONE).
>
>
>
> I check the nodes which a lot of CPU (near 96%) and th gc activity remains
> on 1.6% (using only 3 GB from the 10 which have assigned). But looking at
> the thread pool stats, the mutation stages pending column grows without
> stop, could be that the problem?
>
>
>
> I cannot find the reason that originates the timeouts. I already have
> increased the timeouts, but It do not think that is a solution because the
> timeouts indicated another type of error. Anyone have a tip to try to
> determine where is the problem?
>
>
>
> Thanks in advance
>


Re: Unexpected high internode network activity

2016-02-25 Thread daemeon reiydelle
Hmm. From the AWS FAQ:

*Q: If I have two instances in different availability zones, how will I be
charged for regional data transfer?*

Each instance is charged for its data in and data out. Therefore, if data
is transferred between these two instances, it is charged out for the first
instance and in for the second instance.


I really am not seeing this factored into your numbers fully. If data
transfer is only twice as much as expected, the above billing would seem to
put the numbers in line. Since (I assume) you have one copy in EACH AZ (dc
aware but really dc=az) I am not seeing the bandwidth as that much out of
line.



*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Feb 25, 2016 at 11:00 PM, Gianluca Borello <gianl...@sysdig.com>
wrote:

> It is indeed very intriguing and I really hope to learn more from the
> experience of this mailing list. To address your points:
>
> - The theory that full data is coming from replicas during reads is not
> enough to explain the situation. In my scenario, over a time window I had
> 17.5 GB of intra node activity (port 7000) for 1 GB of writes and 1.5 GB of
> reads (measured on port 9042), so even if both reads and writes affected
> all replicas, I would have (1 + 1.5) * 3 = 7.5 GB, still leaving 10 GB on
> port 7000 unaccounted
>
> - We are doing regular backups the standard way, using periodic snapshots
> and synchronizing them to S3. This traffic is not part of the anomalous
> traffic we're seeing above, since this one goes on port 80 and it's clearly
> visible with a separate bpf filter, and its magnitude is far lower than
> that anyway
>
> Thanks
>
> On Thu, Feb 25, 2016 at 9:03 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> Intriguing. It's enough data to look like full data is coming from the
>> replicants instead of digests when the read of the copy occurs. Are you
>> doing backup/dr? Are directories copied regularly and over the network or ?
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle*
>> *USA (+1) 415.501.0198*
>> *London (+44) (0) 20 8144 9872*
>>
>> On Thu, Feb 25, 2016 at 8:12 PM, Gianluca Borello <gianl...@sysdig.com>
>> wrote:
>>
>>> Thank you for your reply.
>>>
>>> To answer your points:
>>>
>>> - I fully agree on the write volume, in fact my isolated tests confirm
>>> your estimation
>>>
>>> - About the read, I agree as well, but the volume of data is still much
>>> higher
>>>
>>> - I am writing to one single keyspace with RF 3, there's just one
>>> keyspace
>>>
>>> - I am not using any indexes, the column families are very simple
>>>
>>> - I am aware of the double count, in fact, I measured the traffic on
>>> port 9042 at the client side (so just counted once) and I divided by two
>>> the traffic on port 7000 as measured on each node (35 GB -> 17.5 GB). All
>>> the measurements have been done with iftop with proper bpf filters on the
>>> port and the total traffic matches what I see in cloudwatch (divided by two)
>>>
>>> So unfortunately I still don't have any ideas about what's going on and
>>> why I'm seeing 17 GB of internode traffic instead of ~ 5-6.
>>>
>>> On Thursday, February 25, 2016, daemeon reiydelle <daeme...@gmail.com>
>>> wrote:
>>>
>>>> If read & write at quorum then you write 3 copies of the data then
>>>> return to the caller; when reading you read one copy (assume it is not on
>>>> the coordinator), and 1 digest (because read at quorum is 2, not 3).
>>>>
>>>> When you insert, how many keyspaces get written to? (Are you using e.g.
>>>> inverted indices?) That is my guess, that your db has about 1.8 bytes
>>>> written for every byte inserted.
>>>>
>>>> ​Every byte you write is counted also as a read (system a sends 1gb to
>>>> system b, so system b receives 1gb). You would not be charged if intra AZ,
>>>> but inter AZ and inter DC will get that double count.
>>>>
>>>> So, my guess is reverse indexes, and you forgot to include receive and
>>>> transmit.​
>>>> ​
>>>>
>>>>
>>>> *...*
>>>>
>>>>
>>>>
>>>> *Daemeon C.M. Reiydelle*
>>>> *USA (+1) 415.501.0198*
>>>> *London (+44) (0) 20 8144 9872*
>>&

Re: Unexpected high internode network activity

2016-02-25 Thread daemeon reiydelle
Intriguing. It's enough data to look like full data is coming from the
replicas instead of digests when the read of the copy occurs. Are you
doing backup/DR? Are directories copied regularly over the network, or
something along those lines?


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Feb 25, 2016 at 8:12 PM, Gianluca Borello <gianl...@sysdig.com>
wrote:

> Thank you for your reply.
>
> To answer your points:
>
> - I fully agree on the write volume, in fact my isolated tests confirm
> your estimation
>
> - About the read, I agree as well, but the volume of data is still much
> higher
>
> - I am writing to one single keyspace with RF 3, there's just one keyspace
>
> - I am not using any indexes, the column families are very simple
>
> - I am aware of the double count, in fact, I measured the traffic on port
> 9042 at the client side (so just counted once) and I divided by two the
> traffic on port 7000 as measured on each node (35 GB -> 17.5 GB). All the
> measurements have been done with iftop with proper bpf filters on the
> port and the total traffic matches what I see in cloudwatch (divided by two)
>
> So unfortunately I still don't have any ideas about what's going on and
> why I'm seeing 17 GB of internode traffic instead of ~ 5-6.
>
> On Thursday, February 25, 2016, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> If read & write at quorum then you write 3 copies of the data then return
>> to the caller; when reading you read one copy (assume it is not on the
>> coordinator), and 1 digest (because read at quorum is 2, not 3).
>>
>> When you insert, how many keyspaces get written to? (Are you using e.g.
>> inverted indices?) That is my guess, that your db has about 1.8 bytes
>> written for every byte inserted.
>>
>> ​Every byte you write is counted also as a read (system a sends 1gb to
>> system b, so system b receives 1gb). You would not be charged if intra AZ,
>> but inter AZ and inter DC will get that double count.
>>
>> So, my guess is reverse indexes, and you forgot to include receive and
>> transmit.​
>> ​
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle*
>> *USA (+1) 415.501.0198*
>> *London (+44) (0) 20 8144 9872*
>>
>> On Thu, Feb 25, 2016 at 6:51 PM, Gianluca Borello <gianl...@sysdig.com>
>> wrote:
>>
>>> Hello,
>>>
>>> We have a Cassandra 2.1.9 cluster on EC2 for one of our live
>>> applications. There's a total of 21 nodes across 3 AWS availability zones,
>>> c3.2xlarge instances.
>>>
>>> The configuration is pretty standard, we use the default settings that
>>> come with the datastax AMI and the driver in our application is configured
>>> to use lz4 compression. The keyspace where all the activity happens has RF
>>> 3 and we read and write at quorum to get strong consistency.
>>>
>>> While analyzing our monthly bill, we noticed that the amount of network
>>> traffic related to Cassandra was significantly higher than expected. After
>>> breaking it down by port, it seems like over any given time, the internode
>>> network activity is 6-7 times higher than the traffic on port 9042, whereas
>>> we would expect something around 2-3 times, given the replication factor
>>> and the consistency level of our queries.
>>>
>>> For example, this is the network traffic broken down by port and
>>> direction over a few minutes, measured as sum of each node:
>>>
>>> Port 9042 from client to cluster (write queries): 1 GB
>>> Port 9042 from cluster to client (read queries): 1.5 GB
>>> Port 7000: 35 GB, which must be divided by two because the traffic is
>>> always directed to another instance of the cluster, so that makes it 17.5
>>> GB generated traffic
>>>
>>> The traffic on port 9042 completely matches our expectations, we do
>>> about 100k write operations writing 10KB binary blobs for each query, and a
>>> bit more reads on the same data.
>>>
>>> According to our calculations, in the worst case, when the coordinator
>>> of the query is not a replica for the data, this should generate about (1 +
>>> 1.5) * 3 = 7.5 GB, and instead we see 17 GB, which is quite a lot more.
>>>
>>> Also, hinted handoffs are disabled and nodes are healthy over the period
>>> of observation, and I get the same numbers across pretty much every time
>>> window, even including an entire 24 hours period.
>>>
>>> I tried to replicate this problem in a test environment so I connected a
>>> client to a test cluster done in a bunch of Docker containers (same
>>> parameters, essentially the only difference is the
>>> GossipingPropertyFileSnitch instead of the EC2 one) and I always get what I
>>> expect, the amount of traffic on port 7000 is between 2 and 3 times the
>>> amount of traffic on port 9042 and the queries are pretty much the same
>>> ones.
>>>
>>> Before doing more analysis, I was wondering if someone has an
>>> explanation on this problem, since perhaps we are missing something obvious
>>> here?
>>>
>>> Thanks
>>>
>>>
>>>
>>


Re: Unexpected high internode network activity

2016-02-25 Thread daemeon reiydelle
If read & write at quorum then you write 3 copies of the data then return
to the caller; when reading you read one copy (assume it is not on the
coordinator), and 1 digest (because read at quorum is 2, not 3).

When you insert, how many keyspaces get written to? (Are you using e.g.
inverted indices?) That is my guess, that your db has about 1.8 bytes
written for every byte inserted.

Every byte you write is counted also as a read (system A sends 1 GB to
system B, so system B receives 1 GB). You would not be charged if intra-AZ,
but inter-AZ and inter-DC will get that double count.

So, my guess is reverse indexes, and you forgot to include receive and
transmit.


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Feb 25, 2016 at 6:51 PM, Gianluca Borello 
wrote:

> Hello,
>
> We have a Cassandra 2.1.9 cluster on EC2 for one of our live applications.
> There's a total of 21 nodes across 3 AWS availability zones, c3.2xlarge
> instances.
>
> The configuration is pretty standard, we use the default settings that
> come with the datastax AMI and the driver in our application is configured
> to use lz4 compression. The keyspace where all the activity happens has RF
> 3 and we read and write at quorum to get strong consistency.
>
> While analyzing our monthly bill, we noticed that the amount of network
> traffic related to Cassandra was significantly higher than expected. After
> breaking it down by port, it seems like over any given time, the internode
> network activity is 6-7 times higher than the traffic on port 9042, whereas
> we would expect something around 2-3 times, given the replication factor
> and the consistency level of our queries.
>
> For example, this is the network traffic broken down by port and direction
> over a few minutes, measured as sum of each node:
>
> Port 9042 from client to cluster (write queries): 1 GB
> Port 9042 from cluster to client (read queries): 1.5 GB
> Port 7000: 35 GB, which must be divided by two because the traffic is
> always directed to another instance of the cluster, so that makes it 17.5
> GB generated traffic
>
> The traffic on port 9042 completely matches our expectations, we do about
> 100k write operations writing 10KB binary blobs for each query, and a bit
> more reads on the same data.
>
> According to our calculations, in the worst case, when the coordinator of
> the query is not a replica for the data, this should generate about (1 +
> 1.5) * 3 = 7.5 GB, and instead we see 17 GB, which is quite a lot more.
>
> Also, hinted handoffs are disabled and nodes are healthy over the period
> of observation, and I get the same numbers across pretty much every time
> window, even including an entire 24 hours period.
>
> I tried to replicate this problem in a test environment so I connected a
> client to a test cluster done in a bunch of Docker containers (same
> parameters, essentially the only difference is the
> GossipingPropertyFileSnitch instead of the EC2 one) and I always get what I
> expect, the amount of traffic on port 7000 is between 2 and 3 times the
> amount of traffic on port 9042 and the queries are pretty much the same
> ones.
>
> Before doing more analysis, I was wondering if someone has an explanation
> on this problem, since perhaps we are missing something obvious here?
>
> Thanks
>
>
>


Re: Checking replication status

2016-02-25 Thread daemeon reiydelle
Hmm. What are your processes when a node comes back after "a long time
offline"? Long enough to take the node offline and do a repair? Run the risk
of serving stale data? Parallel repairs?

So, what sort of time frames count as "a long time"?
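
On the mechanics, a few things worth checking on the node that was away, plus
the blunt instrument for forcing convergence after an outage (the keyspace name
is a placeholder):

nodetool netstats                  # any streams still active to/from this node
nodetool tpstats | grep -i hinted  # hinted handoff still being replayed
nodetool repair -pr my_keyspace    # run on every node when you need a guaranteed sync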


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin  wrote:

> hi all,
>
> what are the better ways to check replication overall status of cassandra 
> cluster?
>
>  within a single DC, unless a node is down for long time, most of the time i 
> feel it is pretty much non-issue and things are replicated pretty fast. But 
> when a node come back from a long offline, is there a way to check that the 
> node has finished its data sync with other nodes  ?
>
>  Now across DC, we have frequent VPN outage (sometime short sometims long) 
> between DCs, i also like to know if there is a way to find how the 
> replication progress between DC catching up under this condtion?
>
>  Also, if i understand correctly, the only gaurantee way to make sure data 
> are synced is to run a complete repair job,
> is that correct? I am trying to see if there is a way to "force a quick 
> replication sync" between DCs after vpn outage.
> Or maybe this is unnecessary, as Cassandra will catch up as fast as it can, 
> there is nothing else we/(system admin) can do to make it faster or better?
>
>
>
> Sent from my iPhone
>


Re: Nodes go down periodically

2016-02-23 Thread daemeon reiydelle
If you can, do a few short runs (maybe 10m records, deleting the default
schema between executions) of cassandra-stress against your production
cluster (replication=3, force quorum to 3). Look for a latency max in the 10s
of SECONDS. If your devops team is running a monitoring tool that looks at
the network, look for timeouts/retries/errors/lost packets, etc. during the
run. Worst case, run netstat against the relevant NIC every 10 seconds or so
on the node driving cassandra-stress and look for jumps in those counts. If
monitoring is enabled, look at the monitor's results for ALL of your nodes;
at least one of them is having some issues.


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Tue, Feb 23, 2016 at 8:43 AM, Jack Krupansky 
wrote:

> The reality of modern distributed systems is that connectivity between
> nodes is never guaranteed and distributed software must be able to cope
> with occasional absence of connectivity. GC and network connectivity are
> the two issues that a lot of us are most familiar with. There may be others
> - but most technical problems on a node would be clearly logged on that
> node. If you see a lapse of connectivity no more than once or twice a day,
> consider yourselves lucky.
>
> Is it only one node at a time that goes down, and at widely dispersed
> times?
>
> How many nodes?
>
> -- Jack Krupansky
>
> On Tue, Feb 23, 2016 at 11:01 AM, Joel Samuelsson <
> samuelsson.j...@gmail.com> wrote:
>
>> Hi,
>>
>> Version is 2.0.17.
>> Yes, these are VMs in the cloud though I'm fairly certain they are on a
>> LAN rather than WAN. They are both in the same data centre physically. The
>> phi_convict_threshold is set to default. I'd rather find the root cause of
>> the problem than just hiding it by not convicting a node if it isn't
>> responding though. If pings are <2 ms without a single ping missed in
>> several days, I highly doubt that network is the reason for the downtime.
>>
>> Best regards,
>> Joel
>>
>> 2016-02-23 16:39 GMT+01:00 :
>>
>>> You didn’t mention version, but I saw this kind of thing very often in
>>> the 1.1 line. Often this is connected to network flakiness. Are these VMs?
>>> In the cloud? Connected over a WAN? You mention that ping seems fine. Take
>>> a look at the phi_convict_threshold in c assandra.yaml. You may need to
>>> increase it to reduce the UP/DOWN flapping behavior.
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity
>>>
>>>
>>>
>>> *From:* Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
>>> *Sent:* Tuesday, February 23, 2016 9:41 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Nodes go down periodically
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> Thanks for your reply.
>>>
>>>
>>>
>>> I have debug logging on and see no GC pauses that are that long. GC
>>> pauses are all well below 1s and 99 times out of 100 below 100ms.
>>>
>>> Do I need to enable GC log options to see the pauses?
>>>
>>> I see plenty of these lines:
>>> DEBUG [ScheduledTasks:1] 2016-02-22 10:43:02,891 GCInspector.java (line
>>> 118) GC for ParNew: 24 ms for 1 collections
>>>
>>> as well as a few CMS GC log lines.
>>>
>>>
>>>
>>> Best regards,
>>>
>>> Joel
>>>
>>>
>>>
>>> 2016-02-23 15:14 GMT+01:00 Hannu Kröger :
>>>
>>> Hi,
>>>
>>>
>>>
>>> Those are probably GC pauses. Memory tuning is probably needed. Check
>>> the parameters that you already have customised if they make sense.
>>>
>>>
>>>
>>> http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
>>>
>>>
>>>
>>> Hannu
>>>
>>>
>>>
>>>
>>>
>>> On 23 Feb 2016, at 16:08, Joel Samuelsson 
>>> wrote:
>>>
>>>
>>>
>>> Our nodes go down periodically, around 1-2 times each day. Downtime is
>>> from <1 second to 30 or so seconds.
>>>
>>>
>>>
>>> INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992)
>>> InetAddress /109.74.13.67 is now DOWN
>>>
>>>  INFO [RequestResponseStage:8844] 2016-02-22 10:05:38,331 Gossiper.java
>>> (line 978) InetAddress /109.74.13.67 is now UP
>>>
>>>
>>>
>>> I find nothing odd in the logs around the same time. I logged a ping
>>> with timestamp and checked during the same time and saw nothing weird (ping
>>> is less than 2ms at all times).
>>>
>>>
>>>
>>> Does anyone have any suggestions as to why this might happen?
>>>
>>>
>>>
>>> Best regards,
>>> Joel
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> The information in this Internet Email is confidential and may be
>>> legally privileged. It is intended solely for the addressee. Access to this
>>> Email by anyone else is unauthorized. If you are not the intended
>>> recipient, any disclosure, copying, distribution or any action taken or
>>> omitted to be taken in reliance on it, is prohibited and may be unlawful.
>>> When addressed to our clients any opinions or advice contained in this
>>> Email are subject to the terms and conditions expressed in any applicable
>>> governing The Home Depot terms of 

RE: Restart Cassandra automatically

2016-02-23 Thread daemeon reiydelle
Cassandra nodes do not go down "for no reason". They are not stateless. I
would like to thank you for this marvelous example of a wonderful
antipattern. Absolutely fantastic.

Thank you! I am not being a satirical smartass. I am sometimes challenged
by clients in my presentations about SRE best practices around C*, Hadoop,
and ELK on the grounds that "no one would ever do this in production". Now I
have objective proof!

Daemeon

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Feb 23, 2016 7:53 AM,  wrote:

> Yes, I can see the potential problem in theory. However, we never do your
> #2. Generally, we don’t have unused spare hardware. We just fix the host
> that is down and run repairs. (Side note: while I have seen nodes fight it
> out over who owns a particular token in earlier versions, it seems that
> 1.2+ doesn’t allow that to happen as easily. The second node will just not
> come up.)
>
>
>
> For most of our use cases, I would agree with your Coli Conjecture.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Robert Coli [mailto:rc...@eventbrite.com]
> *Sent:* Tuesday, February 09, 2016 4:41 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Restart Cassandra automatically
>
>
>
> On Tue, Feb 9, 2016 at 6:20 AM,  wrote:
>
> Call me naïve, but we do use an in-house built program for keeping nodes
> started (based on a flag-check). The program is something that was written
> for all kinds of daemon processes here, not Cassandra specifically. The
> basic idea is that is runs a status check. If that fails, and the flag is
> set, start Cassandra. In my opinion, it has helped more than hurt us –
> especially with the very fragile 1.1 releases that were prone to heap
> problems.
>
>
>
> Ok, you're naïve.. ;P
>
>
>
> But seriously, think of this scenario :
>
>
>
> 1) Node A, responsible for range A-M, goes down due to hardware failure of
> a disk in a RAID
>
> 2) Node B is put into service and is made responsible for A-M
>
> 3) Months pass
>
> 4) Node A comes back up, announces that it is responsible for A-M, and the
> cluster agrees
>
>
>
> Consistency is now permanently broken for any involved rows. Why doesn't
> it (usually) matter?
>
>
>
> It's not so much that you are naïve but that you are providing still more
> support for the Coli Conjecture : "If you are using a distributed database
> you probably do not care about consistency, even if you think you do." You
> have repeatedly chosen Availability over Consistency and it has never had a
> negative impact on your actual application.
>
>
>
> =Rob
>
>
>
> --
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>


Re: Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

2016-02-19 Thread daemeon reiydelle
FYI, my observations were with native, not thrift.


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Fri, Feb 19, 2016 at 10:12 AM, Sotirios Delimanolis  wrote:

> Does your cluster contain 24+ nodes or fewer?
>
> We did the same upgrade on a smaller cluster of 5 nodes and we didn't see
> this behavior. On the 24 node cluster, the timeouts only took effect once
> ~5-6-7+ nodes had been upgraded.
>
> We're doing some more upgrades next week, trying different deployment
> plans. I'll report back with the results.
>
> Thanks for the reply (we absolutely want to move to CQL)
>
>
> On Friday, February 19, 2016 1:10 AM, Alain RODRIGUEZ 
> wrote:
>
>
> I performed this exact update a few days ago, excepted clients were using
> native protocol and it wen smoothly. So I think this might be thrift
> related. No idea what is producing this though, just wanted to give the
> info fwiw.
>
> As a side note, unrelated to the issue, performances using native are a
> lot better than thrift starting in C* 2.1. Drivers using native are also
> more modern allowing you to do very interesting stuff. Updating to native
> now that you are using 2.1 is something you might want to do soon enough
> :-).
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-19 3:07 GMT+01:00 Sotirios Delimanolis :
>
> We have a Cassandra cluster with 24 nodes. These nodes were running
> 2.0.16.
>
> While the nodes are in the ring and handling queries, we perform the
> upgrade to 2.1.12 as follows (more or less) one node at a time:
>
>
>1. Stop the Cassandra process
>2. Deploy jars, scripts, binaries, etc.
>3. Start the Cassandra process
>
>
> A few nodes into the upgrade, we start noticing that the majority of
> queries (mostly through Thrift) time out or report unavailable. Looking at
> system information, Cassandra GC time goes through the roof, which is what
> we assume causes the time outs.
>
> Once all nodes are upgraded, the cluster stabilizes and no more (barely
> any) time outs occur.
>
> What could explain this? Does it have anything to do with how a 2.0
> communicates with a 2.1?
>
> Our Cassandra consumers haven't changed.
>
>
>
>
>
>
>
>
>


Re: Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

2016-02-19 Thread daemeon reiydelle
This may be unrelated, but I found highly variable latency (latency max) on
the 2.1 code tree when loading new data (and reading). Others found that G1
vs. CMS does not make a difference, and there is some evidence that 8/12/16 GB
of memory makes no difference either. These were latencies in the 10-30 SECOND
range, and they did cause timeouts. You may not be seeing a 2.0 vs. 2.1 issue,
but rather a 2.1 issue proper. While others did not find this associated with
stop-the-world GC, I saw some evidence of the same (using cassandra-stress,
and I recently reproduced the issue with YCSB!)


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Fri, Feb 19, 2016 at 1:10 AM, Alain RODRIGUEZ  wrote:

> I performed this exact update a few days ago, excepted clients were using
> native protocol and it wen smoothly. So I think this might be thrift
> related. No idea what is producing this though, just wanted to give the
> info fwiw.
>
> As a side note, unrelated to the issue, performances using native are a
> lot better than thrift starting in C* 2.1. Drivers using native are also
> more modern allowing you to do very interesting stuff. Updating to native
> now that you are using 2.1 is something you might want to do soon enough
> :-).
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-19 3:07 GMT+01:00 Sotirios Delimanolis :
>
>> We have a Cassandra cluster with 24 nodes. These nodes were running
>> 2.0.16.
>>
>> While the nodes are in the ring and handling queries, we perform the
>> upgrade to 2.1.12 as follows (more or less) one node at a time:
>>
>>
>>1. Stop the Cassandra process
>>2. Deploy jars, scripts, binaries, etc.
>>3. Start the Cassandra process
>>
>>
>> A few nodes into the upgrade, we start noticing that the majority of
>> queries (mostly through Thrift) time out or report unavailable. Looking at
>> system information, Cassandra GC time goes through the roof, which is what
>> we assume causes the time outs.
>>
>> Once all nodes are upgraded, the cluster stabilizes and no more (barely
>> any) time outs occur.
>>
>> What could explain this? Does it have anything to do with how a 2.0
>> communicates with a 2.1?
>>
>> Our Cassandra consumers haven't changed.
>>
>>
>>
>>
>>
>>
>


Re: Compatibility, performance & portability of Cassandra data types (MAP, UDT & JSON) in DSE Search & Analytics

2016-02-18 Thread daemeon reiydelle
Given you only have 16 columns vs. over 200 ... I would expect a
substantial improvement in writes, but not 5x.
Ditto reads. I would be interested to understand where that 5x comes from.


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Feb 18, 2016 at 8:20 PM, Chandra Sekar KR <
chandraseka...@hotmail.com> wrote:

> Hi,
>
>
> I'm looking for help in arriving at pros & cons of using MAP, UDT & JSON
> (Text) data types in Cassandra & its ease of use/impact across other DSE
> products - Spark & Solr. We are migrating an OLTP database from RDBMS to
> Cassandra which has 200+ columns and with an average daily volume of 25
> million records/day. The access pattern is quite simple and in OLTP the
> access is always based on primary key. For OLAP, there are other access
> patterns with a combination of columns where we are planning to use Spark &
> Solr for search & analytical capabilities (in a separate DC).
>
>
> The average size of each record is ~2KB and the application workload is of
> type INSERT only (no updates/deletes). We conducted performance tests on
> two types of data models
>
> 1) A table with 200+ columns similar to RDBMS
>
> 2) A table with 15 columns where only critical business fields are
> maintained as key/value pairs and the remaining are stored in a single
> column of type TEXT as JSON object.
>
>
> In the results, we noticed significant advantage in the JSON model where
> the performance was 5X times better than columnar data model.
> Alternatively, we are in the process of evaluating performance for other
> data types - MAP & UDT instead of using TEXT for storing JSON object.
> Sample data model structure for columnar, json, map & udt types are given
> below:
>
>
>
>
> I would like to know the performance, transformation, compatibility &
> portability impacts & ease of use of each of these data types from Search &
> Analytics perspective (Spark & Solr). I'm aware that we will have to use
> field transformers in Solr to use index on JSON fields, not sure about MAP
> & UDT. Any help on comparison of these data types in Spark & Solr is highly
> appreciated.
>
>
> Regards, KR
>


Re: High Bloom filter false ratio

2016-02-18 Thread daemeon reiydelle
The bloom filter hashes the keys into a relatively small number of buckets. I have
been surprised by how many cases I see with large cardinality where a few
values populate a given bloom filter bucket, resulting in high false positives and
a surprising impact on latencies!

Are you seeing 2:1 ranges between mean and worst-case latencies (allowing
for GC times)?
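For anyone following along, a minimal sketch of how to confirm and tune this (the keyspace, table, and column names are hypothetical, and 0.005 is only an example value):

    # Current false positive ratio and bloom filter footprint for the table
    nodetool cfstats ks.events | grep -i bloom

    # Trace a read for a partition known not to exist, to see whether the
    # bloom filter rejects it or the read goes to disk anyway
    cqlsh <<'EOF'
    TRACING ON;
    SELECT * FROM ks.events WHERE id1 = 0 AND id2 = 0;
    EOF

    # Tighten the target false positive chance; only SSTables written after
    # the change use the new filter, so rewrite existing ones if needed
    cqlsh -e "ALTER TABLE ks.events WITH bloom_filter_fp_chance = 0.005;"
    nodetool upgradesstables -a ks events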

Daemeon Reiydelle
On Feb 18, 2016 8:57 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:

> You can try slightly lowering the bloom_filter_fp_chance on your table.
>
> Otherwise, it's possible that you're repeatedly querying one or two
> partitions that always trigger a bloom filter false positive.  You could
> try manually tracing a few queries on this table (for non-existent
> partitions) to see if the bloom filter rejects them.
>
> Depending on your Cassandra version, your false positive ratio could be
> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>
> There are also a couple of recent improvements to bloom filters:
> * https://issues.apache.org/jira/browse/CASSANDRA-8413
> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>
>
> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal <anis...@gmail.com>
> wrote:
>
>> Hello,
>>
>> We have a table with a composite partition key with humongous cardinality;
>> it's a combination of (long,long). On the table we have
>> bloom_filter_fp_chance=0.01.
>>
>> On doing "nodetool cfstats" on the 5 nodes we have in the cluster we are
>> seeing "Bloom filter false ratio:" in the range of 0.7-0.9.
>>
>> I thought over time the bloom filter would adjust to the key space
>> cardinality. We have been running the cluster for a long time now, but have
>> added significant traffic from Jan this year, which would not lead to
>> writes in the db but would lead to high reads to see if there are any values.
>>
>> Are there any settings that can be changed to allow a better ratio?
>>
>> Thanks
>> Anishek
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Cassandra Collections performance issue

2016-02-09 Thread daemeon reiydelle
I think the key to your problem might be around "we overwrite every value".
You are creating a large number of tombstones, forcing many reads to pull
current results. You would do well to rethink why you are having to
overwrite values all the time under the same key. You would be better off
figuring out how to add values under a key and then age off the old values. I
would say that (at least at scale) you have a classic anti-pattern in play.
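As a minimal sketch of that append-then-age-off pattern (the keyspace, table, column names, and TTL below are purely illustrative), something along these lines avoids overwriting a map in place:

    cqlsh <<'EOF'
    -- Hypothetical table: each new reading is its own row under the partition
    -- key, ordered newest-first, and aged out by TTL instead of overwritten.
    CREATE TABLE IF NOT EXISTS ks.metric_history (
        key      text,
        observed timestamp,
        name     text,
        value    text,
        PRIMARY KEY ((key), observed, name)
    ) WITH CLUSTERING ORDER BY (observed DESC, name ASC);

    -- New values are plain inserts; the TTL (1 day here) ages off old entries.
    INSERT INTO ks.metric_history (key, observed, name, value)
    VALUES ('sensor-42', dateof(now()), 'temp', '21.4')
    USING TTL 86400;

    -- Reads pick up the most recent values without re-reading overwritten cells.
    SELECT name, value FROM ks.metric_history WHERE key = 'sensor-42' LIMIT 3;
    EOF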


*...*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Mon, Feb 8, 2016 at 5:23 PM, Robert Coli  wrote:

> On Mon, Feb 8, 2016 at 2:10 PM, Agrawal, Pratik 
> wrote:
>
>> Recently we added one of the table fields as a Map in *Cassandra
>> 2.1.11*. Currently we read every field from the Map and overwrite the map
>> values. The Map is of size 3. We saw that writes are 30-40% slower while reads
>> are 70-80% slower. Please find below some metrics that can help.
>>
>> My question is, are there any known issues in Cassandra Map performance?
>> As I understand it, each CQL3 Map entry maps to a column in
>> Cassandra; with that assumption we are just creating 3 columns, right? Any
>> insight on this issue would be helpful.
>>
>
> I have previously heard reports along similar lines, but in the other
> direction.
>
> eg - "I moved from a collection to a TEXT column with JSON in it, and my
> reads and writes both became much faster!"
>
> I'm not sure if the issue has been raised as an Apache Cassandra Jira, iow
> if it is a known and expected limitation as opposed to just a performance
> issue.
>
> If I were you, I would consider filing a repro case as a Jira ticket, and
> responding to this thread with its URL. :D
>
> =Rob
>
>


Re: Need Feedback about cassandra-stress tests

2016-01-23 Thread daemeon reiydelle
Might I suggest you START by using the default schema provided by
cassandra-stress. Using someone else's schema is great AFTER you have
used a standard and generally well understood baseline.

From that you can decide whether a 4-node x 2-DC cluster is right for you.
FYI, given your 6-way replication, the numbers above do not seem unreasonable. You
can see where YOUR test hits a peak and falls back in throughput. But what
are your USER requirements?

With the default schema (you can try distinct keyspaces as well as the default
keyspace1 shared keyspace), look very hard at your latencies. If you can
withstand occasional 10-second (10,000 ms) worst-case latencies during
compaction, great. If your 90/99/99.5 latencies are important, you will be
able to see where your tiny cluster plays out. Only THEN can you try some
other, no doubt fun, schema.
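For reference, a minimal baseline run with the stock schema might look like the sketch below (the node address, counts, and log file names are only placeholders):

    # Write a baseline data set into the default keyspace1/standard1 schema,
    # then run a mixed read/write pass against the same data.
    cassandra-stress write n=1000000 -node 10.41.55.21 -log file=write.log
    cassandra-stress mixed ratio\(write=1,read=3\) n=1000000 -node 10.41.55.21 -log file=mixed.log

    # The per-threadCount summary includes median/.95/.99/.999/max latency;
    # that is where occasional multi-second worst cases during compaction show up.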



*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Sat, Jan 23, 2016 at 10:18 AM, Bhuvan Rawal  wrote:

> Hi All,
>
> I have been trying to set up a cluster for POC and ran cassandra-stress
> tests today on an 8 Node cluster with 2 DC having 4 nodes each. The disk
> used was SSD with I/O of greater than 500MB per sec, CPU- 2.3 Ghz ,4 Core
> Xeon and 8Gigs of RAM in each node.
>
> First of all I would like to thank *Alain Rodriguez* and *Sabastian
> Estevez* for helping me to sort out an issue which was troubling me.
>
> I used the schema and queries mentioned in the gist at this link -
> https://gist.github.com/tjake/fb166a659e8fe4c8d4a3#file-query2-txt-L14. I
> used NetworkTopologyStrategy on the keyspace with an RF of 3 for each DC, meaning
> each key would be replicated to 6 nodes.
>
> I'm not sure if I have managed to get the config right because the read
> operations seem to be slow. I would need your comments on them.
>
> Command - $ cassandra-stress user profile=./blogpost.yaml ops\(insert=1\) 
> -node 10.41.55.21
> Insert Operation
> type id total ops op/s pk/s row/s
> threadCount 4 insert 90259 2994 2994 4088
> threadCount 4 total 90259 2994 2994 4088
> threadCount 8 insert 161754 5362 5362 7335
> threadCount 8 total 161754 5362 5362 7335
> threadCount 16 insert 261958 7796 7796 10653
> threadCount 16 total 261958 7796 7796 10653
> threadCount 24 insert 769757 8132 8132 5
> threadCount 24 total 769757 8132 8132 5
> threadCount 36 insert 571600 9762 9762 13362
> threadCount 36 total 571600 9762 9762 13362
> threadCount 54 insert 884351 10730 10730 14672
> threadCount 54 total 884351 10730 10730 14672
> threadCount 81 insert 623874 10919 10919 14931
> threadCount 81 total 623874 10919 10919 14931
> threadCount 121 insert 1234867 11736 11736 16053
> threadCount 121 total 1234867 11736 11736 16053
> threadCount 181 insert 2377008 10310 10310 14110
> threadCount 181 total 2377008 10310 10310 14110
>
> Command-$cassandra-stress user profile=./blogpost.yaml 
> ops\(singlepost=2,timeline=1,insert=1\) -node 10.41.55.21
> Mixed – Read, Write
> Query – singlepost=2,timeline=1,insert=1
> type id total ops op/s pk/s row/s
> threadCount 4 insert 27179 76 76 103
> threadCount 4 singlepost 54118 151 151 151
> threadCount 4 timeline 27343 76 76 576
> threadCount 4 total 108640 303 303 831
> threadCount 8 insert 8642 149 149 203
> threadCount 8 singlepost 17838 307 307 307
> threadCount 8 timeline 8529 147 147 1107
> threadCount 8 total 35009 602 602 1617
> threadCount 16 insert 18784 176 176 242
> threadCount 16 singlepost 37960 356 356 356
> threadCount 16 timeline 19058 179 179 1359
> threadCount 16 total 75802 710 710 1957
> threadCount 24 insert 21564 151 151 208
> threadCount 24 singlepost 44545 313 313 313
> threadCount 24 timeline 21553 151 151 1150
> threadCount 24 total 87662 615 615 1670
> threadCount 36 insert 38054 185 185 252
> threadCount 36 singlepost 76495 372 372 372
> threadCount 36 timeline 38783 189 189 1443
> threadCount 36 total 153332 746 746 2068
> threadCount 54 insert 92639 167 167 229
> threadCount 54 singlepost 187085 338 338 338
> threadCount 54 timeline 92679 167 167 1292
> threadCount 54 total 372403 673 673 1859
>
> Command - $ cassandra-stress user profile=./blogpost.yaml ops\(singlepost=1\) 
> -node 10.41.55.21
>
> Read only load, single value, include blog text (which is 5000 chars) – 
> singlepost query
> Query- select * from blogposts where domain = ? LIMIT 1
> type id total ops op/s pk/s row/s
> threadCount 4 singlepost 11334 368 368 368
> threadCount 4 total 11334 368 368 368
> threadCount 8 singlepost 18744 547 547 547
> threadCount 8 total 18744 547 547 547
> threadCount 16 singlepost 26038 607 607 607
> threadCount 16 total 26038 607 607 607
> threadCount 24 singlepost 35903 654 654 654
> threadCount 24 total 35903 654 654 654
> 

Re: In UJ status for over a week trying to rejoin cluster in Cassandra 3.0.1

2016-01-17 Thread daemeon reiydelle
What do the logs say on the seed node (and on the UJ node)?

Look for timeout messages.

This problem has occurred for me when there was high network utilization
between the seed and the joining node, and also with routing issues.
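A few generic checks along those lines (the log path is the usual package default and is only an assumption here):

    # On both the seed and the joining .33 node, look for streaming/gossip timeouts
    grep -iE 'stream|timeout|OutboundTcpConnection' /var/log/cassandra/system.log | tail -50

    # Check whether any streams toward the joining node are actually progressing
    nodetool netstats

    # Quick look for packet loss / retransmits on the link between the two hosts
    netstat -s | grep -i retrans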



*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Sun, Jan 17, 2016 at 2:24 PM, Kai Wang  wrote:

> Carlos,
>
> so you essentially replaced the .33 node. Did you follow this:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html?
> The link is for 2.x, not sure about 3.x. What if you change the new node to
> .34?
>
>
>
> On Mon, Jan 11, 2016 at 12:57 AM, Carlos A  wrote:
>
>> Hello all,
>>
>> I have a small dev environment with 4 machines. One of them (.33) I had
>> removed from the cluster because I wanted to upgrade its HD to an SSD.
>> I then reinstalled it and tried to rejoin it. It has been in UJ status for a
>> week now with no changes.
>>
>> I have tried nodetool repair etc. but nothing helped.
>>
>> nodetool status output
>>
>> Datacenter: DC1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address   Load   Tokens   OwnsHost ID
>>   Rack
>> UN  192.168.1.30  16.13 MB   256  ?
>> 0e524b1c-b254-45d0-98ee-63b8f34a8531  RAC1
>> UN  192.168.1.31  20.12 MB   256  ?
>> 1f8000f5-026c-42c7-8189-cf19fbede566  RAC1
>> UN  192.168.1.32  17.73 MB   256  ?
>> 7b06f9e9-7c41-4364-ab18-f6976fd359e4  RAC1
>> UJ  192.168.1.33  877.6 KB   256  ?
>> 7a1507b5-198e-4a3a-a9fd-7af9e588fde2  RAC1
>>
>> Note: Non-system keyspaces don't have the same replication settings,
>> effective ownership information is meaningless
>>
>> Any tips on fixing this?
>>
>> Thanks,
>>
>> C.
>>
>
>


Re: electricity outage problem

2016-01-15 Thread daemeon reiydelle
A node needs about a 60-90 second delay before it can start accepting
connections as a seed node. A seed node also needs time to accept a node
starting up and syncing to other nodes (on 10 gigabit the max is only 1 or 2
new nodes; on 1 gigabit it can handle at least 3-4 new nodes connecting).
In a large cluster (500 nodes) I have seen a weird condition where nodetool
status shows overlapping subsets of nodes, and the problem does not go away
even after an hour on a 10 gigabit network.
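As a rough sketch of that staggered start (the host names, the service command, and the exact delays are assumptions to adapt to your environment):

    # Start the seeds first, one at a time, then let the remaining nodes join
    # with a gap between each so gossip can settle.
    SEEDS="cass-seed-1 cass-seed-2"
    OTHERS="cass-3 cass-4 cass-5 cass-6 cass-7 cass-8"

    for h in $SEEDS; do
      ssh "$h" 'sudo service cassandra start'
      sleep 90   # give each seed time to come up and accept connections
    done

    for h in $OTHERS; do
      ssh "$h" 'sudo service cassandra start'
      sleep 60   # let each node finish joining before starting the next
    done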



*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Fri, Jan 15, 2016 at 9:17 AM, Adil <adil.cha...@gmail.com> wrote:

> Hi,
> we did a full restart of the cluster but nodetool status is still giving
> incoherent info from different nodes: some nodes appear UP from one node but
> appear DOWN from another, and in the log, as mentioned, we still see the
> message "received an invalid gossip generation for peer /x.x.x.x".
> The cassandra version is 2.1.2. We want to execute the purge operation as
> explained here
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_gossip_purge.html
> but we cannot find the peers folder. Should we do it via CQL by deleting the
> system.peers content? Should we do it for all nodes?
>
> thanks
>
>
> 2016-01-12 17:42 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>
>> Sometimes you may have to clear out the saved Gossip state:
>>
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html
>>
>> Note the instruction about bringing up the seed nodes first. Normally
>> seed nodes are only relevant when initially joining a node to a cluster
>> (and then the Gossip state will be persisted locally), but if you clear the
>> persisted Gossip state the seed nodes will again be needed to find the rest
>> of the cluster.
>>
>> I'm not sure whether a power outage is the same as stopping and
>> restarting an instance (AWS) in terms of whether the restarted instance
>> retains its current public IP address.
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, Jan 12, 2016 at 10:02 AM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> This happens when there is insufficient time for nodes coming up to join
>>> a network. It takes a few seconds for a node to come up, e.g. your seed
>>> node. If you tell a node to join a cluster you can get this scenario
>>> because of high network utilization as well. I wait 90 seconds after the
>>> first (i.e. my first seed) node comes up to start the next one. Any nodes
>>> that are seeds need some 60 seconds, so the additional 30 seconds is a
>>> buffer. Additional nodes each wait 60 seconds before joining (although this
>>> is a parallel tree for large clusters).
>>>
>>>
>>>
>>>
>>>
>>> *...*
>>>
>>>
>>>
>>>
>>>
>>>
>>> *“Life should not be a journey to the grave with the intention of
>>> arriving safely in a pretty and well preserved body, but rather to skid in
>>> broadside in a cloud of smoke, thoroughly used up, totally worn out, and
>>> loudly proclaiming “Wow! What a Ride!” - Hunter Thompson*
>>> *Daemeon C.M. Reiydelle*
>>> *USA (+1) 415.501.0198*
>>> *London (+44) (0) 20 8144 9872*
>>>
>>> On Tue, Jan 12, 2016 at 6:56 AM, Adil <adil.cha...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> we have two DCs with 5 nodes in each cluster. Yesterday there was an
>>>> electricity outage causing all nodes to go down. We restarted the clusters, but when
>>>> we run nodetool status on DC1 it reports that some nodes are DN; the
>>>> strange thing is that running the command from a different node in DC1 doesn't
>>>> report the same nodes as down. We have noticed this message in the log:
>>>> "received an invalid gossip generation for peer". Does anyone know how to
>>>> resolve this problem? Should we purge the gossip state?
>>>>
>>>> thanks
>>>>
>>>> Adil
>>>>
>>>
>>>
>>
>


Re: Encryption in cassandra

2016-01-14 Thread daemeon reiydelle
The keys don't have to be on the box. You do need a login/password for C*.

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Jan 14, 2016 5:16 PM, "oleg yusim"  wrote:

> Greetings,
>
> Guys, can you please help me to understand following:
>
> I'm reading through the way the keystore and truststore are implemented, and
> it is all fine and great, but at the end the Cassandra documentation
> instructs you to extract all the keystore content and leave all certs and
> keys in the clear.
>
> Am I missing something here? Why are we doing it? What is the point of even
> having a keystore then? It doesn't look very secure to me...
>
> Another item - cassandra.yaml has the keystore and truststore passwords in
> clear text... what is the point of having these stores then, if the passwords
> are out?
>
> Thanks,
>
> Oleg
>


Re: electricity outage problem

2016-01-12 Thread daemeon reiydelle
This happens when there is insufficient time for nodes coming up to join a
network. It takes a few seconds for a node to come up, e.g. your seed node.
If you tell a node to join a cluster you can get this scenario because of
high network utilization as well. I wait 90 seconds after the first (i.e.
my first seed) node comes up to start the next one. Any nodes that are
seeds need some 60 seconds, so the additional 30 seconds is a buffer.
Additional nodes each wait 60 seconds before joining (although this is a
parallel tree for large clusters).





*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Tue, Jan 12, 2016 at 6:56 AM, Adil  wrote:

> Hi,
>
> we have two DCs with 5 nodes in each cluster. Yesterday there was an
> electricity outage causing all nodes to go down. We restarted the clusters, but when
> we run nodetool status on DC1 it reports that some nodes are DN; the
> strange thing is that running the command from a different node in DC1 doesn't
> report the same nodes as down. We have noticed this message in the log:
> "received an invalid gossip generation for peer". Does anyone know how to
> resolve this problem? Should we purge the gossip state?
>
> thanks
>
> Adil
>


Re: Three questions about cassandra

2015-11-27 Thread daemeon reiydelle
There is a window after a node goes down during which changes that node should have
gotten will be kept. If the node is down LONGER than that, it will serve
stale data. If the consistency is greater than two, its data will be
ignored; if the consistency is one, its data could be the first returned; if the
consistency is two, then the application needs to be able to handle such a
situation. Nodetool repair needs to be run in this case to get the data
consistent. Cleanup does more than make things pretty, but it will do that.

The comment about disabling the Thrift listener is related to preventing
the node from serving old data if the timeout I mention above has expired
between the time the node comes online and the time the repair is
completed.

One of the advantages of using e.g. Ansible is that it can be configured to
whack an errant node's Thrift listener BEFORE it starts the node's Cassandra
instance. Agent-based tools like Puppet and Chef can also have this magic
performed. Whether to automatically start Cassandra vs. NOT automatically start the
service sometimes makes for interesting religious wars. And obviously if
the node didn't stop but just lost network connections, there are
advantages to agent-based tools.
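A minimal sketch of that disable-traffic-then-repair idea, run on the recovered node (whether you also need to disable the native protocol depends on your clients):

    # Stop serving client traffic while the node catches up, repair, then re-enable.
    nodetool disablethrift     # stop Thrift clients
    nodetool disablebinary     # stop native-protocol clients too, if any are in use
    nodetool repair -pr
    nodetool enablebinary
    nodetool enablethrift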





*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Fri, Nov 27, 2015 at 3:51 AM, Hadmut Danisch  wrote:

> Thanks!
>
> Hadmut
>


Re: Repair Hangs while requesting Merkle Trees

2015-11-11 Thread daemeon reiydelle
Have you checked the network statistics on that machine (netstat -tas)
while attempting to repair? If netstat shows ANY issues you have a
problem. Could you put the command in a loop running every 60 seconds for
maybe 15 minutes and post back?

Out of curiosity, how many remote DC nodes are getting successfully
repaired?
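Something along these lines would capture the statistics being asked for (the grep pattern is just a convenience and can be dropped to post the full output):

    # Sample TCP statistics once a minute for ~15 minutes while the repair runs
    for i in $(seq 1 15); do
      date
      netstat -tas | grep -iE 'retrans|failed|reset|timeout'
      sleep 60
    done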



*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Wed, Nov 11, 2015 at 1:06 PM, Anuj Wadehra 
wrote:

> Hi,
>
> we are using 2.0.14. We have 2 DCs at remote locations with 10GBps
> connectivity.We are able to complete repair (-par -pr) on 5 nodes. On only
> one node in DC2, we are unable to complete repair as it always hangs. Node
> sends Merkle Tree requests, but one or more nodes in DC1 (remote) never
> show that they sent the merkle tree reply to requesting node.
> Repair hangs infinitely.
>
> After increasing request_timeout_in_ms on the affected node, we were able to
> successfully run repair on one of the two occasions.
>
> Any comments, why this is happening on just one node? In
> OutboundTcpConnection.java,  when isTimeOut method always returns false for
> non-droppable verb such as Merkle Tree Request(verb=REPAIR_MESSAGE),why
> increasing request timeout solved problem on one occasion ?
>
>
> Thanks
> Anuj Wadehra
>
>
>
> On Thursday, 12 November 2015 2:35 AM, Anuj Wadehra <
> anujw_2...@yahoo.co.in> wrote:
>
>
> Hi,
>
> We have 2 DCs at remote locations with 10GBps connectivity.We are able to
> complete repair (-par -pr) on 5 nodes. On only one node in DC2, we are
> unable to complete repair as it always hangs. Node sends Merkle Tree
> requests, but one or more nodes in DC1 (remote) never show that they sent
> the merkle tree reply to requesting node.
> Repair hangs infinitely.
>
> After increasing request_timeout_in_ms on the affected node, we were able to
> successfully run repair on one of the two occasions.
>
> Any comments, why this is happening on just one node? In
> OutboundTcpConnection.java,  when isTimeOut method always returns false for
> non-droppable verb such as Merkle Tree Request(verb=REPAIR_MESSAGE),why
> increasing request timeout solved problem on one occasion ?
>
>
> Thanks
> Anuj Wadehra
>
>
>


Re: Can consistency-levels be different for "read" and "write" in Datastax Java-Driver?

2015-10-26 Thread daemeon reiydelle
If one rethinks "consistency" to mean "copies returned" and "copies
written", then one can have different values for the former (in the DataStax
driver) and the latter (within Cassandra). The latter changes eventual
consistency (e.g. two copies must be written); the former can speed up a result at
the (slight) risk of stale data. I have no experience with the former, just
recall it somewhere in the documentation: n-copy eventual consistency is
fine for all of my work.
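For illustration only (the host, keyspace, table, and values are hypothetical): read and write consistency can be set independently, shown here with cqlsh's CONSISTENCY command; the DataStax Java driver allows the same thing per statement.

    # Write at QUORUM, then read the same row back at ONE.
    cqlsh my-cassandra-host <<'EOF'
    CONSISTENCY QUORUM;
    INSERT INTO ks.users (id, name) VALUES (42, 'ajay');
    CONSISTENCY ONE;
    SELECT name FROM ks.users WHERE id = 42;
    EOF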



*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Mon, Oct 26, 2015 at 11:52 AM, Jonathan Haddad  wrote:

> What's your query?  Do you have IF NOT EXISTS in there?
>
> On Mon, Oct 26, 2015 at 11:17 AM Ajay Garg  wrote:
>
>> Right now, I have set up "LOCAL_QUORUM" as the consistency level in the
>> driver, but it seems that "SERIAL" is being used during writes, and I
>> consistently get this error of type ::
>>
>> *Cassandra timeout during write query at consistency SERIAL (3 replica
>> were required but only 0 acknowledged the write)*
>>
>>
>> Am I missing something?
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>


Re: How much disk is needed to compact Leveled compaction?

2015-04-05 Thread daemeon reiydelle
You appear to have multiple java binaries in your path. That needs to be
resolved.
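A couple of generic checks to see which java binaries are actually on the box (the macOS-specific commands apply because the warning in the quoted message comes from a Mac JDK install):

    # Show every java on the PATH and which one resolves first
    which -a java
    java -version

    # On macOS, list the installed JDKs the launcher can choose from
    /usr/libexec/java_home -V
    ls /Library/Java/JavaVirtualMachines/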

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Apr 5, 2015 1:40 AM, Jean Tremblay jean.tremb...@zen-innovations.com
wrote:

  Hi,
 I have a cluster of 5 nodes. We use cassandra 2.1.3.

  The 5 nodes use about 50-57% of the 1T SSD.
  One node managed to compact all its data. During one compaction this node
 used almost 100% of the drive. The other nodes refuse to continue
 compaction claiming that there is not enough disk space.

  From the documentation LeveledCompactionStrategy should be able to
 compact my data, well at least this is what I understand.

  Size-tiered compaction requires at least as much free disk space for
 compaction as the size of the largest column family. Leveled compaction
 needs much less space for compaction, only 10 * sstable_size_in_mb.
 However, even if you’re using leveled compaction, you should leave much
 more free disk space available than this to accommodate streaming, repair,
 and snapshots, which can easily use 10GB or more of disk space.
 Furthermore, disk performance tends to decline after 80 to 90% of the disk
 space is used, so don’t push the boundaries.

  This is the disk usage. Node 4 is the only one that could compact
 everything.
  node0: /dev/disk1 931Gi 534Gi 396Gi 57% /
 node1: /dev/disk1 931Gi 513Gi 417Gi 55% /
 node2: /dev/disk1 931Gi 526Gi 404Gi 57% /
 node3: /dev/disk1 931Gi 507Gi 424Gi 54% /
 node4: /dev/disk1 931Gi 475Gi 456Gi 51% /

  When I try to compact the other ones I get this:

  objc[18698]: Class JavaLaunchHelper is implemented in both /Library/Java/
 JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java and
 /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre/lib/libinstrument.dylib.
 One of the two will be used. Which one is undefined.
 error: Not enough space for compaction, estimated sstables = 2894,
 expected write size = 485616651726
 -- StackTrace --
 java.lang.RuntimeException: Not enough space for compaction, estimated
 sstables = 2894, expected write size = 485616651726
 at org.apache.cassandra.db.compaction.CompactionTask.
 checkAvailableDiskSpace(CompactionTask.java:293)
 at org.apache.cassandra.db.compaction.CompactionTask.
 runMayThrow(CompactionTask.java:127)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(
 CompactionTask.java:76)
 at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(
 AbstractCompactionTask.java:59)
 at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(
 CompactionManager.java:512)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

   I did not set sstable_size_in_mb; I use the 160MB default.

  Is it normal that during compaction it needs so much disk space? What
 would be the best solution to overcome this problem?

  Thanks for your help




Re: COMMERCIAL:Re: Cross-datacenter requests taking a very long time.

2015-04-02 Thread daemeon reiydelle
You might want to check what quorum level is configured? I meant to ask that earlier.



*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Apr 2, 2015 at 12:39 PM, Andrew Vant andrew.v...@rackspace.com
wrote:

 On Mar 31, 2015, at 4:59 PM, daemeon reiydelle daeme...@gmail.com wrote:
  What is your replication factor?

 NetworkTopologyStrategy with replfactor: 2 in each DC.

 Someone else asked about the endpoint snitch I'm using; it's set to
 GossipingPropertyFileSnitch.

  Any idea how much data has to be processed under the query?

 It does not matter what query I use, or what size; the problem occurs even
 just selecting a single user from the users table.

  While running the query against both DC's, you can take a look at
 netstats
  to get a really quick-and-dirty idea of network traffic.

 I'll try that. I should add that one of the other teams here has a similar
 setup (3 nodes in 3 DCs) that is working correctly. We're going to go
 through the config files and see if we can figure out what's different.

 --

 Andrew


Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread daemeon reiydelle
Jack did a superb job of explaining all of your issues, and his last
sentence seems to fit your needs (and my experience) very well. The only
other point I would add is to ascertain whether the use patterns commend
microservices to abstract from data locality, even if the initial
deployment is a noop to a single cluster. This depends on whether you see a
rapid stream of special-purpose business functions. A second question is
about data access ... does Pig support your data access response times?
Many clients find Hadoop ideally suited to a sophisticated ECTL (extract,
cleanup, transformation, and load) model feeding fast, schema-oriented
repositories like e.g. MySQL. It all depends on the use case, growth &
fragmentation expectations for your business model(s), etc.

Good luck.

PS, Jack, thanks for your succinct comment.




On Thu, Apr 2, 2015 at 6:33 AM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 There is an old saying in the software industry: The structure of a system
 follows from the structure of the organization that created it (Conway's
 Law). Seriously, the main, first question for your end is who owns the
 applications in terms of executive management, such that if management
 makes a decision that dramatically affects the app's impact on the cluster,
 is it likely that they will have done so with the concurrence of management
 who owns the other app. Trust me, you do not want to be in the middle when
 two managers are in dispute over whose app is more important. IOW, if one
 manager owns both apps, you are probably safe, but if two different
 managers might have differing views of each other's priorities, tread with
 caution.

 In any case, be prepared to move one of the apps to a different cluster if
 and when usage patterns cause them to conflict.

 There is also the concept of devOps, where the app developers also own
 operations. You really can't have two separate development teams administer
 operations for one set of hardware.

 If you are dedicated to operations for both app teams and the teams seem
 to be reasonably compatible, then it could be fine.

  In short, sure, technically a single cluster can support any number of
  keyspaces, but mostly it will come down to whether there might be an
  excess of contention for load and operations of the cluster in production.

 And then little things like software upgrades - one app might really need
 a disruptive or risky upgrade or need to bounce the entire cluster, but
 then the other app may be impacted even though it had no need for the
 upgrade or be bounced.

 Are the apps synergistic in some way, such that there is an architectural
 benefit from running on the same hardware?

 In the end, the simplest solution is typically the better solution, unless
 any of these other factors loom too large.


 -- Jack Krupansky

 On Thu, Apr 2, 2015 at 9:06 AM, Ian Rose ianr...@fullstory.com wrote:

 Hi all -

 We currently have a single cassandra cluster that is dedicated to a
 relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
 for another, unrelated, system, and my debate is whether to just add the
 new tables to our existing cassandra cluster or whether to spin up an
 entirely new, separate cluster for this new system.

 Does anyone have pros/cons to share on this?  It appears from watching
 talks and such online that the big users (e.g. Netflix, Spotify) tend to
 favor multiple, single-purpose clusters, and thus that was my initial
 preference.  But we are (for now) no where close to them in traffic so I'm
 wondering if running an entirely separate cluster would be a premature
 optimization which wouldn't pay for the (nontrivial) overhead in
 configuration management and ops.  While we are still small it might be
 much smarter to reuse our existing clusters so that I can get it done
 faster...

 Thanks!
 - Ian





Re: Cluster status instability

2015-04-02 Thread daemeon reiydelle
Do you happen to be using a tool like Nagios or Ganglia that is able to
report utilization (CPU, load, disk IO, network)? There are plugins for
both that will also notify you (depending on whether you enabled the
intermediate GC logging) about what is happening.



On Thu, Apr 2, 2015 at 8:35 AM, Jan cne...@yahoo.com wrote:

 Marcin;

 are all your nodes within the same Region?
 If not in the same region, what is the Snitch type that you are using?

 Jan/



   On Thursday, April 2, 2015 3:28 AM, Michal Michalski 
 michal.michal...@boxever.com wrote:


 Hey Marcin,

 Are they actually going up and down repeatedly (flapping) or just down and
 they never come back?
 There might be different reasons for flapping nodes, but to list what I
 have at the top of my head right now:

 1. Network issues. I don't think it's your case, but you can read about
 the issues some people are having when deploying C* on AWS EC2 (keyword to
 look for: phi_convict_threshold)

 2. Heavy load. Node is under heavy load because of massive number of reads
 / writes / bulkloads or e.g. unthrottled compaction etc., which may result
 in extensive GC.

  Could any of these be a problem in your case? I'd start by investigating
  the GC logs, e.g. to see how long the stop-the-world full GC takes (GC
  logs should be on by default from what I can see [1])

 [1] https://issues.apache.org/jira/browse/CASSANDRA-5319
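A quick way to do that check (log locations are the usual package defaults and may differ on your install):

    # Long stop-the-world pauses are reported by Cassandra's GCInspector
    grep GCInspector /var/log/cassandra/system.log | tail -20

    # If JVM GC logging is enabled in cassandra-env.sh, pauses appear in gc.log too
    grep -E 'Full GC|Total time for which application threads were stopped' \
        /var/log/cassandra/gc.log | tail -20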

 Michał


 Kind regards,
 Michał Michalski,
 michal.michal...@boxever.com

 On 2 April 2015 at 11:05, Marcin Pietraszek mpietras...@opera.com wrote:

 Hi!

  We have a 56 node cluster with C* 2.0.13 + the CASSANDRA-9036 patch
  installed. Assume we have nodes A, B, C, D, E. On some irregular basis
  one of those nodes starts to report that a subset of the other nodes is in
  the DN state although the C* daemon on all nodes is running:

 A$ nodetool status
 UN B
 DN C
 DN D
 UN E

 B$ nodetool status
 UN A
 UN C
 UN D
 UN E

 C$ nodetool status
 DN A
 UN B
 UN D
 UN E

  After a restart of node A, C and D report that A is in UN, and A also
  claims that the whole cluster is in the UN state. Right now I don't have any
  clear steps to reproduce that situation; do you guys have any idea
  what could be causing such behaviour? How could this be prevented?

  It seems like when node A is the coordinator and gets a request for some
  data being replicated on C and D, it responds with an Unavailable
  exception; after restarting A that problem disappears.

 --
 mp







Re: Frequent timeout issues

2015-04-02 Thread daemeon reiydelle
May not be relevant, but what is the default heap size you have deployed?
It should be no more than 16GB (and be aware of the impact of GC on that
large a heap); I suggest not smaller than 8-12GB.



On Wed, Apr 1, 2015 at 11:28 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 Are you writing multiple CFs at the same time?
 Please run nodetool tpstats to make sure that FlushWriter etc. doesn't have
 high "All time blocked" counts. A blocked memtable FlushWriter may block/drop
 writes. If that's the case you may need to increase the memtable flush
 writers. If you have many secondary indexes in the CF, make sure that the
 memtable flush queue size is set at least equal to the number of indexes.

 Monitoring iostat and GC logs may help.
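A quick way to run those checks (the cassandra.yaml path is the usual package default and only an assumption):

    # Look for blocked flush writers (the "All time blocked" column)
    nodetool tpstats | grep -iE 'blocked|flush'

    # The relevant knobs if FlushWriter is blocking (values are workload dependent)
    grep -E 'memtable_flush_writers|memtable_flush_queue_size' /etc/cassandra/cassandra.yaml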

 Thanks
 Anuj Wadehra
 --
   *From*:Amlan Roy amlan@cleartrip.com
 *Date*:Wed, 1 Apr, 2015 at 9:27 pm
 *Subject*:Re: Frequent timeout issues

 Did not see any exception in cassandra.log and system.log. Monitored using
 JConsole. Did not see anything wrong. Do I need to see any specific info?
 Doing almost 1000 writes/sec.

 HBase and Cassandra are running on different clusters. For Cassandra I
 have 6 nodes with 64GB RAM (heap is at the default setting) and 32 cores.

 On 01-Apr-2015, at 8:43 pm, Eric R Medley emed...@xylocore.com wrote:




Re: Column value not getting updated

2015-04-02 Thread daemeon reiydelle
Interesting that you are finding excessive drift from public time servers.
I only once saw that problem with AWS' time servers. To be conservative I
sometimes recommend that clients spool up their own time server, but
realize IT will also drift if the public time servers do! Somewhat
different if in your own DC, but the same time server drift issues apply.

Google has resorted to putting tier one time servers (cesium clocks or
whatever) in every data center due to the public drift issues. Does anyone
know if AWS' time service is now stratum 1 backed?

However, it is better to have two (at least) in AWS; make sure their
private IPs are not in the same /24 CIDR subnet!

Of course this can get troublesome if load sharing between e.g. AWS East
and West.
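A quick check of how well each node is actually synced (the host names are placeholders, and ntpstat may not be installed on every distro):

    # On each node: show the selected peers and their offset/jitter
    ntpq -p
    ntpstat

    # Rough fleet-wide skew check from one host
    for h in cass-1 cass-2 cass-3; do
      echo -n "$h: "
      ssh "$h" date +%s.%N
    done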



*...*






*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Tue, Mar 31, 2015 at 10:49 PM, Saurabh Sethi saurabh_se...@symantec.com
wrote:

 Thanks Mark. A great post indeed and saved me a lot of trouble.

 - Saurabh
 From: Mark Greene green...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Tuesday, March 31, 2015 at 10:15 PM
 To: user@cassandra.apache.org user@cassandra.apache.org

 Subject: Re: Column value not getting updated

 Hey Saurabh,

 We're actually preparing for this ourselves and spinning up our own NTP
 server pool. The public NTP pools have a lot of drift and should not be
 relied upon for cluster technology that is sensitive to time skew like C*.

 The folks at Logentries did a great write up about this which we used as a
 guide.



 - https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
 - https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/


 -Mark

 On Tue, Mar 31, 2015 at 5:59 PM, Saurabh Sethi saurabh_se...@symantec.com
  wrote:

 That’s what I found out that the clocks were not in sync.

 But I have setup NTP on all 3 nodes and would expect the clocks to be in
 sync.

 From: Nate McCall n...@thelastpickle.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Tuesday, March 31, 2015 at 2:50 PM
 To: Cassandra Users user@cassandra.apache.org
 Subject: Re: Column value not getting updated

 You would see that if the servers' clocks were out of sync.

 Make sure the time on the servers is in sync or set the client timestamps
 explicitly.

 On Tue, Mar 31, 2015 at 3:23 PM, Saurabh Sethi 
 saurabh_se...@symantec.com wrote:

 I have written a unit test that creates a column family, inserts a row
 in that column family and then updates the value of one of the columns.

  After updating, the unit test immediately tries to read the updated value
  for that column, but Cassandra returns the old value.

- I am using QueryBuilder API and not CQL directly.
- I am using the consistency level of QUORUM for everything –
insert, update and read.
- Cassandra is running as a 3 node cluster with replication factor
of 3.


 Anyone has any idea what is going on here?

 Thanks,
 Saurabh




 --
 -
 Nate McCall
 Austin, TX
 @zznate

 Co-Founder  Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com




