Re: Compression Tuning Tutorial
Great post, Jonathan! Thank you very much. ~Eric On Wed, Aug 8, 2018 at 2:34 PM Jonathan Haddad wrote: > Hey folks, > > We've noticed a lot over the years that people create tables usually > leaving the default compression parameters, and have spent a lot of time > helping teams figure out the right settings for their cluster based on > their workload. I finally managed to write some thoughts down along with a > high level breakdown of how the internals function that should help people > pick better settings for their cluster. > > This post focuses on a mixed 50:50 read:write workload, but the same > conclusions are drawn from a read heavy workload. Hopefully this helps > some folks get better performance / save some money on hardware! > > http://thelastpickle.com/blog/2018/08/08/compression_performance.html > > > -- > Jon Haddad > Principal Consultant, The Last Pickle >
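To make the tradeoff in Jon's post concrete: Cassandra compresses SSTables in fixed-size chunks (the chunk_length_in_kb table option), so smaller chunks mean less data decompressed per read, but a worse overall compression ratio. A minimal sketch of that tradeoff, using Python's zlib as a stand-in for Cassandra's default LZ4 and a synthetic payload (the data and sizes here are illustrative, not from the post):

```python
import zlib

def compressed_size(data: bytes, chunk_kb: int) -> int:
    """Compress data in fixed-size chunks (as Cassandra does per
    chunk_length_in_kb) and return the total compressed size."""
    chunk = chunk_kb * 1024
    return sum(
        len(zlib.compress(data[i:i + chunk]))
        for i in range(0, len(data), chunk)
    )

# Repetitive payload standing in for compressible SSTable data.
data = b"sensor_reading,2018-08-08T14:34:00,value=42.0\n" * 20_000

small = compressed_size(data, 4)    # smaller chunks: less read amplification
large = compressed_size(data, 64)   # larger chunks: better compression ratio
print(small, large)  # expect small >= large: per-chunk overhead adds up
```

The right setting depends on whether your workload is dominated by small random reads (favoring small chunks) or by disk footprint (favoring large ones), which is exactly the benchmark the post walks through.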
Re: RE: Time serial column family design
Jon, Great article. Thank you. (I have nothing to do with this issue, but I appreciate nuggets of information I glean from the list) Regards, Eric On Tue, Apr 17, 2018 at 10:57 PM Jonathan Haddad wrote: > To add to what Nate suggested, we have an entire blog post on scaling time > series data models: > > > http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html > > Jon > > > On Tue, Apr 17, 2018 at 7:39 PM Nate McCall > wrote: > >> I disagree. Create date as a raw integer is an excellent surrogate for >> controlling time series "buckets" as it gives you complete control over the >> granularity. You can even have multiple granularities in the same table - >> remember that partition key "misses" in Cassandra are pretty lightweight as >> they won't make it past the bloom filter on the read path. >> >> On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja >> wrote: >> >>> Hi David, >>> >>> Could you describe why you chose to include the create date in the >>> partition key? If the vin is enough "partitioning", meaning that the size >>> (number of rows x size of row) of each partition is less than 100MB, then >>> remove the date and just use the create_time, because the date is already >>> included in that column anyway. >>> >>> For example, if columns "a" and "b" (from your table) are of max 256 UTF8 >>> characters, then you can have approx 100MB / (2*256*2Bytes) = 100,000 rows >>> per partition. You can actually have many more, but you don't want to go >>> much higher for performance reasons. >>> >>> If this is not enough you could use create_month instead of create_date, >>> for example, to reduce the partition size while not being too granular. >>> >>> >>> On Tue, 17 Apr 2018, 22:17 Nate McCall, wrote: >>> Your table design will work fine as you have appropriately bucketed by an integer-based 'create_date' field. Your goal for this refactor should be to remove the "IN" clause from your code. 
This will move the rollup of multiple partition keys being retrieved into the client instead of relying on the coordinator assembling the results. You have to do more work and add some complexity, but the trade off will be much higher performance as you are removing the single coordinator as the bottleneck. On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni wrote: > Hi Nate, > > Thanks for your reply! > > Is there any other way to design this table to meet this requirement? > > > > Best Regards, > > > > 倪项菲*/ **David Ni* > > 中移德电网络科技有限公司 > > Virtue Intelligent Network Ltd, co. > > Add: 2003, 20F, No.35 Luojia creative city, Luoyu Road, Wuhan, HuBei > > Mob: +86 13797007811 | Tel: +86 27 5024 2516 > > > > *From:* Nate McCall > *Sent:* April 17, 2018 7:12 > *To:* Cassandra Users > *Subject:* Re: Time serial column family design > > > > > > Select * from test where vin = "ZD41578123DSAFWE12313" and create_date > in (20180416, 20180415, 20180414, 20180413, 20180412...); > > But this causes the CQL query to be very long, and I don't know whether > there is a limitation on the length of the CQL. > > Please give me some advice, thanks in advance. > > > > Using the SELECT ... IN syntax means that: > > - the driver will not be able to route the queries to the nodes which > have the partition > > - a single coordinator must scatter-gather the query and results > > > > Break this up into a series of single statements using the > executeAsync method and gather the results via something like Futures in > Guava or similar. > -- - Nate McCall Wellington, NZ @zznate CTO Apache Cassandra Consulting http://www.thelastpickle.com >>> >> >> >> -- >> - >> Nate McCall >> Wellington, NZ >> @zznate >> >> CTO >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> 
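Nate's advice above (bucket by an integer create_date and replace the IN clause with one single-partition query per bucket, gathering the futures) can be sketched as follows. The bucket helper is plain Python; the commented-out driver calls are illustrative only (the Python driver's session.execute_async is the analogue of the Java driver's executeAsync), and the table/column names are the ones from the thread:

```python
from datetime import date, timedelta

def day_buckets(start: date, end: date) -> list:
    """Integer yyyymmdd partition buckets covering [start, end],
    newest first -- the create_date values the thread queries with IN."""
    days = (end - start).days
    return [
        int((end - timedelta(days=i)).strftime("%Y%m%d"))
        for i in range(days + 1)
    ]

buckets = day_buckets(date(2018, 4, 12), date(2018, 4, 16))
print(buckets)  # [20180416, 20180415, 20180414, 20180413, 20180412]

# Instead of one SELECT ... IN (...), issue one single-partition query
# per bucket and gather the futures, so each request can be routed
# token-aware to the right replica:
# futures = [session.execute_async(
#     "SELECT * FROM test WHERE vin = %s AND create_date = %s",
#     ("ZD41578123DSAFWE12313", b)) for b in buckets]
# rows = [row for f in futures for row in f.result()]
```

Each per-bucket statement targets exactly one partition, so the driver routes it directly to a replica and no single coordinator has to scatter-gather the whole result set.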
Re: Gathering / Curating / Organizing Cassandra Best Practices & Patterns
*Kenneth, * How did you get "caught in the middle" of this "stuff"? You are the one bringing it up? Also, your tone switched between calling Chris a "well intended ASF" board member, to calling him an "idiot". He asked a perfectly reasonable question, and then other questions followed as a result. If you want to contribute to the community, please start by being respectful to all members of the community. Regards, Eric Plowe On Mon, Feb 26, 2018 at 12:35 PM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote: > I got caught in the middle of this stuff. I feel for everyone. I said > my two cents. I had to vent. I’m back to concentrating on helping the > group. > > > > Kenneth Brotman > > > > *From:* Eric Evans [mailto:john.eric.ev...@gmail.com] > *Sent:* Monday, February 26, 2018 9:16 AM > *To:* user@cassandra.apache.org > *Subject:* Re: Gathering / Curating / Organizing Cassandra Best Practices > & Patterns > > > > > > > > On Sun, Feb 25, 2018 at 8:45 AM, Kenneth Brotman < > kenbrot...@yahoo.com.invalid> wrote: > > Chris Mattmann acted without authority and completely improperly as an > Apache Software Foundation board member as a board member on their own has > no authority. Their authority is to participate and vote at board > meetings. They are not allowed to transact business, they are not supposed > to force themselves on anyone or order anyone around. The one that was > acting controlling was this idiot board member that has caused this > situation between DataStax and the rest of our community. > > > > Furthermore, when he instructed Cassandra legend Jonathan Ellis, the > Cassandra PMC Chair to include certain information in a report to the > Apache Software Foundation board that escalated the matter to something > that was before the board. > > > > I am not an attorney and this should not be taken as legal advice! 
> > > > It is clear to me as someone who is experienced and trained as a board > member that Chris Mattmann and the ASF itself probably will find themselves > in court over this. I think a lot of folks should raise this matter with > their legal counsel. > > > > What happened is not trivial. It is newsworthy. I suggest people talk > to the media about this story. Ask them to investigate and report the > story. > > > > Is ASF interfering with other communities? > > > > Kenneth, I really think you need to pump the brakes here. You're leveling > some pretty serious accusations, and have now resorted to personal attacks; > this is not constructive. > > > > *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] > *Sent:* Saturday, February 24, 2018 3:29 PM > *To:* user@cassandra.apache.org > *Subject:* RE: Gathering / Curating / Organizing Cassandra Best Practices > & Patterns > *Importance:* High > > > > If you read the email message, the first link below, you'll see that it's > a well-intentioned Apache Foundation board member who could not grasp how our > community functioned. Apache Foundation messed up our community by the way > they handled a routine inquiry, leaving no option for DataStax but to seek > legal counsel. I've been there. Your own legal counsel deals the final > blow. They tell you all communication has to go through them. They tell > you there has to be clear separation. They say you have to take their > advice or they will not keep defending you and you will not have any personal > protection. Anyone can be sued and you will be liable for defending > yourself. Sound familiar? > > > > Everyone kept saying that everything was good. That the community, our > community liked the way things worked. > > > > I call on Apache Foundation to reach out to DataStax and fix the mess > forthwith! Report openly on your efforts. You can fix your mess Apache > Foundation. This email says it all. 
A total miscall: > https://www.mail-archive.com/dev@cassandra.apache.org/msg09090.html. And > the guy has a PhD! > > > > Kenneth Brotman > > > > *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID > <kenbrot...@yahoo.com.INVALID>] > *Sent:* Saturday, February 24, 2018 12:58 PM > *To:* user@cassandra.apache.org > *Subject:* RE: Gathering / Curating / Organizing Cassandra Best Practices > & Patterns > > > > Jon, > > > > This is considered the start of the problem: > https://www.mail-archive.com/dev@cassandra.apache.org/msg09050.html > > > > That’s according to this well sourced article called “Fear of Staxit: What > next for ASF’s Cassandra as biggest donor cuts back” > https://www.theregister.co.uk/2016
Re: Cassandra Needs to Grow Up by Version Five!
Cassandra, hard to use? I disagree completely. With that said, there are definitely deficiencies in certain parts of the documentation, but nothing that is a show stopper. We've been using Cassandra since the sub-1.0 days and have had nothing but great things to say about it. That said, it's an open source project; you get from it what you're willing to put in. If you just expect something that installs, asks a couple of questions, and you're off to the races, Cassandra might not be for you. If you're willing to put in the time to understand how Cassandra works, how it fits into your use case, and whether it is the right fit for your use case, you'll be more than happy, I bet. If there are things that are lacking that you can't find a workaround for, submit a PR! That's the beauty of open source projects. On Thu, Feb 22, 2018 at 2:55 AM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Wed, Feb 21, 2018 at 7:54 PM, Durity, Sean R < > sean_r_dur...@homedepot.com> wrote: > >> >> >> However, I think the shots at Cassandra are generally unfair. When I >> started working with it, the DataStax documentation was some of the best >> documentation I had seen on any project, especially an open source one. >> > > Oh, don't get me started on documentation, especially the DataStax one. I > come from Postgres. In comparison, Cassandra documentation is mostly > non-existent (and this is just a way to avoid listing other uncomfortable > epithets). > > Not sure if I would be able to submit patches to improve that, however, > since most of the time it would require me to already know the answer to my > questions when the doc is incomplete. > > The move from DataStax to Apache.org for docs is actually good, IMO, since > the docs were maintained very poorly and there was no real leverage to > influence that. > > Cheers, > -- > Alex > >
Re: Don't print Ping caused error logs
The driver has load balancing policies built in. Behind a load balancer you'd lose the benefit of things like the TokenAwarePolicy. On Mon, Jun 19, 2017 at 3:49 PM Jonathan Haddad wrote: > The driver grabs all the cluster information from the nodes you provide > the driver and connects automatically to the rest. You don't need (and > shouldn't use) a load balancer. > > Jon > > On Mon, Jun 19, 2017 at 12:28 PM Daniel Hölbling-Inzko < > daniel.hoelbling-in...@bitmovin.com> wrote: > >> Just out of curiosity, how do you then make sure all nodes get the same >> amount of traffic from clients without having to maintain a manual contact >> points list of all Cassandra nodes in the client applications? >> Especially with big C* deployments this sounds like a lot of work >> whenever adding/removing nodes. Putting them behind an LB that can auto >> discover nodes (or manually adding them to the LB rotation etc.) sounds like >> a much easier way. >> I am thinking mostly about cloud LB systems like AWS ELB or GCP LB. >> >> Or can the client libraries discover nodes and use other contact points >> for subsequent requests? Having a bunch of seed nodes would be easier, I >> guess. >> >> Greetings, Daniel >> Akhil Mehra wrote on Mon, 19 Jun 2017 at 11:44: >> >>> Just in case you are not aware, using a load balancer is an anti-pattern. >>> Please refer to ( >>> http://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatLoadBal >>> ) >>> >>> You can turn off logging for a particular class using nodetool >>> setlogginglevel ( >>> http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSetLogLev.html >>> ). >>> >>> In your case you can try setting the log level for >>> org.apache.cassandra.transport.Message to warn using the following command: >>> >>> nodetool setlogginglevel org.apache.cassandra.transport.Message WARN >>> >>> Obviously this will suppress all info level logging in the Message >>> class. 
>>> >>> I hope that helps. >>> >>> Cheers, >>> Akhil >>> >>> >>> >>> On 19/06/2017, at 9:13 PM, wxn...@zjqunshuo.com wrote: >>> >>> Hi, >>> Our cluster nodes are behind an SLB (Service Load Balancer) with a VIP and >>> the Cassandra clients access the cluster via the VIP. >>> System.log prints the below IOException every several seconds. I guess >>> it's the SLB service which pings port 9042 of the Cassandra node >>> periodically and causes the exceptions to be printed. >>> Is there any method to prevent the Ping-caused exceptions from being printed? >>> >>> >>> INFO [SharedPool-Worker-1] 2017-06-19 16:54:15,997 Message.java:605 - >>> Unexpected exception during request; channel = [id: 0x332c09b7, / >>> 10.253.106.210:9042] >>> java.io.IOException: Error while read(...): Connection reset by peer >>> >>> at io.netty.channel.epoll.Native.readAddress(Native Method) >>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final] >>> >>> at >>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675) >>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final] >>> >>> at >>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) >>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final] >>> >>> at >>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) >>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final] >>> >>> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) >>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final] >>> >>> at >>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) >>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final] >>> >>> at >>> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) >>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final] >>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85] >>> >>> Cheers, >>> -Simon >>> >>> >>>
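To illustrate why a load balancer in front of the cluster forfeits token awareness: a token-aware driver computes, client-side, which node owns a partition's token and sends the request straight to a replica, which a generic LB cannot do. A toy sketch of that ring lookup (the tokens and addresses are hypothetical; real Cassandra uses Murmur3 64-bit tokens with replication strategies layered on top):

```python
from bisect import bisect_left

def route(ring, token):
    """Toy token-ring lookup: the node at ring position p owns tokens in
    (previous position, p], wrapping around past the largest position.
    A token-aware driver does this lookup client-side per request; a
    load balancer sees only opaque TCP connections."""
    positions = [pos for pos, _ in ring]
    idx = bisect_left(positions, token) % len(ring)
    return ring[idx][1]

# Hypothetical 3-node ring.
ring = sorted([(100, "10.0.0.1"), (200, "10.0.0.2"), (300, "10.0.0.3")])

print(route(ring, 150))  # 10.0.0.2 owns (100, 200]
print(route(ring, 350))  # wraps around to 10.0.0.1
```

Behind an LB, every request lands on an arbitrary node, which must then coordinate a second hop to the owning replica, adding latency on every read and write.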
Re: ONE has much higher latency than LOCAL_ONE
Yes, your request from the client is going to the LocalDC that you've defined for the data center aware load balancing policy, but with a consistency level of ONE, there is a chance for the coordinator (the node your client has connected to) to route the request across DCs. Please see: https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dmlClientRequestsRead.html#dmlClientRequestsRead__two-dc-one A two datacenter cluster with a consistency level of ONE: "In a multiple datacenter cluster with a replication factor of 3, and a read consistency of ONE, the closest replica for the given row, regardless of datacenter, is contacted to fulfill the read request. In the background a read repair is potentially initiated, based on the read_repair_chance setting of the table, for the other replicas." A two datacenter cluster with a consistency level of LOCAL_ONE <https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dmlClientRequestsRead.html#dmlClientRequestsRead__two-dc-local-one>: "In a multiple datacenter cluster with a replication factor of 3, and a read consistency of LOCAL_ONE, the closest replica for the given row in the same datacenter as the coordinator node is contacted to fulfill the read request. In the background a read repair is potentially initiated, based on the read_repair_chance setting of the table, for the other replicas." Dynamic snitching also comes into play with reads. Even though your client is using TokenAware and should connect to the appropriate replica node (which then becomes your coordinator), dynamic snitching can route your read request away from what it believes to be poorly performing nodes to another replica, which could be in the other DC with CL = ONE. Read more about the dynamic snitch here: https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchDynamic_c.html Regards, Eric Plowe On Wed, Mar 22, 2017 at 12:21 PM Shannon Carey <sca...@expedia.com> wrote: I understand all that, but it doesn't explain why the latency increases. 
The requests are not going to a remote DC. I know this because currently all requests are coming from the client in one particular DC. The read request rate of the Cassandra nodes in the other DC remained flat (near zero) the whole time, compared to ~200 reads/s on the Cassandra nodes in the DC local to the client doing the reads. This is expected, because the DCAwareRoundRobinPolicy will cause local nodes to be used preferentially whenever possible. What's not expected is the dramatic latency increase. Btw, this client is read-only: no writes. From: Eric Plowe <eric.pl...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Tuesday, March 21, 2017 at 7:23 PM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: ONE has much higher latency than LOCAL_ONE ONE means at least one replica node must ack the write, but it doesn't require that the coordinator route the request to a node in the local data center. LOCAL_ONE was introduced to handle the case where you have multiple data centers and cross data center traffic is not desirable. "In multiple datacenter clusters, a consistency level of ONE is often desirable, but cross-DC traffic is not. LOCAL_ONE accomplishes this. For security and quality reasons, you can use this consistency level in an offline datacenter to prevent automatic connection to online nodes in other datacenters if an offline node goes down." From: https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html Regards, Eric On Tue, Mar 21, 2017 at 7:49 PM Shannon Carey <sca...@expedia.com> wrote: The cluster is in two DCs, and yes the client is deployed locally to each DC. From: Matija Gobec <matija0...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Tuesday, March 21, 2017 at 2:56 PM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: ONE has much higher latency than LOCAL_ONE Are you running a multi DC cluster? 
If yes, do you have the application in both data centers/regions? On Tue, Mar 21, 2017 at 8:07 PM, Shannon Carey <sca...@expedia.com> wrote: I am seeing unexpected behavior: consistency level ONE increases read latency 99th percentile to ~108ms (95th percentile to 5ms-90ms) up from ~5ms (99th percentile) when using LOCAL_ONE. I am using DSE 5.0 with Datastax client 3.0.0. The client is configured with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with usedHostsPerRemoteDc set to a very high number. Cassandra cluster has two datacenters. I would expect that when the cluster is operating normally (all local nodes reachable), ONE would behave the same as LOCAL_ONE. Does anyone know why this is not the case?
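The difference between the two consistency levels debated in this thread comes down to which replicas the coordinator may pick. A minimal sketch (the hosts are hypothetical; the real selection also weighs dynamic snitch scores, which is what lets a read at ONE land in the remote DC):

```python
def candidate_replicas(replicas, local_dc, consistency):
    """Sketch of the replica set eligible to serve a read: at ONE any
    replica in any DC may be chosen (and dynamic snitching may prefer
    a remote one it scores better); LOCAL_ONE restricts the choice to
    the coordinator's own datacenter."""
    if consistency == "LOCAL_ONE":
        return [r for r in replicas if r["dc"] == local_dc]
    return list(replicas)  # ONE: any datacenter is fair game

replicas = [
    {"host": "10.0.1.1", "dc": "dc1"},
    {"host": "10.0.1.2", "dc": "dc1"},
    {"host": "10.0.2.1", "dc": "dc2"},
]

print(len(candidate_replicas(replicas, "dc1", "ONE")))        # 3
print(len(candidate_replicas(replicas, "dc1", "LOCAL_ONE")))  # 2
```

This is why ONE and LOCAL_ONE can show identical behavior when every local replica is healthy and well-scored, yet diverge sharply (as Shannon observed) the moment the snitch decides a remote replica looks faster.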
Re: EC2 storage options for C*
http://m.theregister.co.uk/2013/08/26/amazon_ebs_cloud_problems/ That's what I'm worried about. Granted, that's an article from 2013, and while the general purpose EBS volumes are performant for a production C* workload, I'm worried about EBS outages. If EBS is down, my cluster is down. On Monday, February 1, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > Yes, but getting at why you think EBS is going down is the real point. New > GM in 2011. Very different product. 35:40 in the video > > > -- > Jeff Jirsa > > > On Jan 31, 2016, at 9:57 PM, Eric Plowe <eric.pl...@gmail.com> wrote: > > Jeff, > > If EBS goes down, then EBS Gp2 will go down as well, no? I'm not > discounting EBS, but prior outages are worrisome. > > On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > >> Free to choose what you'd like, but EBS outages were also addressed in >> that video (second half, discussion by Dennis Opacki). 2016 EBS isn't the >> same as 2011 EBS. >> >> -- >> Jeff Jirsa >> >> >> On Jan 31, 2016, at 8:27 PM, Eric Plowe <eric.pl...@gmail.com> wrote: >> >> Thank you all for the suggestions. I'm torn between GP2 vs Ephemeral. GP2 >> after testing is a viable contender for our workload. The only worry I have >> is EBS outages, which have happened. >> >> On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> >> wrote: >> >>> Also in that video - it's long but worth watching >>> >>> We tested up to 1M reads/second as well, blowing out page cache to >>> ensure we weren't "just" reading from memory >>> >>> >>> >>> -- >>> Jeff Jirsa >>> >>> >>> On Jan 31, 2016, at 9:52 AM, Jack Krupansky <jack.krupan...@gmail.com> >>> wrote: >>> >>> How about reads? Any differences between read-intensive and >>> write-intensive workloads? 
>>> >>> -- Jack Krupansky >>> >>> On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com> >>> wrote: >>> >>>> Hi John, >>>> >>>> We run using 4T GP2 volumes, which guarantee 10k iops. Even at 1M >>>> writes per second on 60 nodes, we didn’t come close to hitting even 50% >>>> utilization (10k is more than enough for most workloads). PIOPS is not >>>> necessary. >>>> >>>> >>>> >>>> From: John Wong >>>> Reply-To: "user@cassandra.apache.org" >>>> Date: Saturday, January 30, 2016 at 3:07 PM >>>> To: "user@cassandra.apache.org" >>>> Subject: Re: EC2 storage options for C* >>>> >>>> For production I'd stick with ephemeral disks (aka instance storage) if >>>> you are running a lot of transactions. >>>> However, for a regular small testing/qa cluster, or something you know >>>> you want to reload often, EBS is definitely good enough and we haven't had >>>> issues 99% of the time. The 1% is a kind of anomaly where we had flushes blocked. >>>> >>>> But Jeff, kudos that you are able to use EBS. I didn't go through the >>>> video; do you actually use PIOPS or just standard GP2 in your production >>>> cluster? >>>> >>>> On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng <br...@blockcypher.com> >>>> wrote: >>>> >>>>> Yep, that motivated my question "Do you have any idea what kind of >>>>> disk performance you need?". If you need the performance, it's hard to beat >>>>> ephemeral SSD in RAID 0 on EC2, and it's a solid, battle-tested >>>>> configuration. If you don't, though, EBS GP2 will save a _lot_ of >>>>> headache. >>>>> >>>>> Personally, on small clusters like ours (12 nodes), we've found our >>>>> choice of instance dictated much more by the balance of price, CPU, and >>>>> memory. We're using GP2 SSD and we find that for our patterns the disk is >>>>> rarely the bottleneck. YMMV, of course. 
>>>>> >>>>> On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa < >>>>> jeff.ji...@crowdstrike.com> wrote: >>>>> >>>>>> If you have to ask that question, I strongly recommend m4 or c4 >>>>>> instances with GP2 EBS. When you don’t care about replacing a node >>>>>> because >>>>>> of an instance failure, go with i2+ephemerals. Until then, GP2 EBS is >>>>>> capable of amazing things, and greatly simplifies life. >>>>>> >>>>>> We gave a talk on this topic at both Cassandra Summit and AWS >>>>>> re:Invent: https://www.youtube.com/watch?v=1R-mgOcOSd4 It’s very >>>>>> much a viable option, despite any old documents online that say >>>>>> otherwise. >>>>>> >>>>>> >>>>>> >>>>>> From: Eric Plowe >>>>>> Reply-To: "user@cassandra.apache.org" >>>>>> Date: Friday, January 29, 2016 at 4:33 PM >>>>>> To: "user@cassandra.apache.org" >>>>>> Subject: EC2 storage options for C* >>>>>> >>>>>> My company is planning on rolling out a C* cluster in EC2. We are >>>>>> thinking about going with ephemeral SSDs. The question is this: Should we >>>>>> put two in RAID 0 or just go with one? We currently run a cluster in our >>>>>> data center with 2 250gig Samsung 850 EVO's in RAID 0 and we are happy >>>>>> with >>>>>> the performance we are seeing thus far. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> Eric >>>>>> >>>>> >>>>> >>>> >>>
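Jeff's "4T GP2 volumes, which guarantee 10k iops" earlier in the thread follows from gp2's published baseline of 3 IOPS per provisioned GB. A quick sketch of that arithmetic (the 10,000 figure was gp2's per-volume ceiling at the time of this thread; AWS later raised it to 16,000):

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a gp2 EBS volume: 3 IOPS per provisioned GB,
    floored at 100 and capped at 10,000 (the gp2 ceiling when this
    thread was written)."""
    return min(max(100, 3 * size_gb), 10_000)

print(gp2_baseline_iops(4000))  # 4T volume -> 10000, matching the thread
print(gp2_baseline_iops(500))   # 1500
```

Volumes of roughly 3.3 TB and up hit the cap, which is why sizing gp2 volumes large is itself a throughput decision, not just a capacity one.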
Re: EC2 storage options for C*
Thank you all for the suggestions. I'm torn between GP2 and ephemeral. After testing, GP2 is a viable contender for our workload. The only worry I have is EBS outages, which have happened. On Sunday, January 31, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > Also in that video - it's long but worth watching > > We tested up to 1M reads/second as well, blowing out page cache to ensure we weren't "just" reading from memory > > -- > Jeff Jirsa > > On Jan 31, 2016, at 9:52 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > > How about reads? Any differences between read-intensive and write-intensive workloads? > > -- Jack Krupansky > > On Sun, Jan 31, 2016 at 3:13 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: >> Hi John, >> >> We run 4T GP2 volumes, which guarantee 10k IOPS. Even at 1M writes per second on 60 nodes, we didn't come close to hitting even 50% utilization (10k is more than enough for most workloads). PIOPS is not necessary. >> >> From: John Wong >> Reply-To: "user@cassandra.apache.org" >> Date: Saturday, January 30, 2016 at 3:07 PM >> To: "user@cassandra.apache.org" >> Subject: Re: EC2 storage options for C* >> >> For production I'd stick with ephemeral disks (aka instance storage) if you are running a lot of transactions. However, for a regular small testing/QA cluster, or something you know you want to reload often, EBS is definitely good enough, and we haven't had issues 99% of the time. The 1% is a kind of anomaly where we had a blocked flush. >> >> But Jeff, kudos that you are able to use EBS. I didn't go through the video; do you actually use PIOPS or just standard GP2 in your production cluster? >> >> On Sat, Jan 30, 2016 at 1:28 PM, Bryan Cheng <br...@blockcypher.com> wrote: >>> Yep, that motivated my question "Do you have any idea what kind of disk performance you need?". If you need the performance, it's hard to beat ephemeral SSD in RAID 0 on EC2, and it's a solid, battle-tested configuration. If you don't, though, EBS GP2 will save a _lot_ of headache. >>> >>> Personally, on small clusters like ours (12 nodes), we've found our choice of instance dictated much more by the balance of price, CPU, and memory. We're using GP2 SSD and we find that for our patterns the disk is rarely the bottleneck. YMMV, of course. >>> >>> On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: >>>> If you have to ask that question, I strongly recommend m4 or c4 instances with GP2 EBS. When you don't care about replacing a node because of an instance failure, go with i2 + ephemerals. Until then, GP2 EBS is capable of amazing things, and greatly simplifies life. >>>> >>>> We gave a talk on this topic at both Cassandra Summit and AWS re:Invent: https://www.youtube.com/watch?v=1R-mgOcOSd4 It's very much a viable option, despite any old documents online that say otherwise. >>>> >>>> From: Eric Plowe >>>> Reply-To: "user@cassandra.apache.org" >>>> Date: Friday, January 29, 2016 at 4:33 PM >>>> To: "user@cassandra.apache.org" >>>> Subject: EC2 storage options for C* >>>> >>>> My company is planning on rolling out a C* cluster in EC2. We are thinking about going with ephemeral SSDs. The question is this: Should we put two in RAID 0 or just go with one? We currently run a cluster in our data center with 2 250-gig Samsung 850 EVOs in RAID 0 and we are happy with the performance we are seeing thus far. >>>> >>>> Thanks! >>>> >>>> Eric
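Jeff's "4T GP2 guarantees 10k IOPS" figure lines up with AWS's published gp2 baseline formula of that era (3 IOPS per GiB, floor of 100, capped at 10,000; current limits are higher, so check the AWS docs). A quick sketch of the arithmetic:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for an EBS gp2 volume, using the 2016-era published
    formula: 3 IOPS per GiB, minimum 100, capped at 10,000."""
    return min(10_000, max(100, 3 * size_gib))

# A 4 TiB (4096 GiB) volume hits the cap, matching Jeff's numbers:
print(gp2_baseline_iops(4096))  # -> 10000 (3 * 4096 = 12288, capped)
print(gp2_baseline_iops(1000))  # -> 3000
```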
Re: EC2 storage options for C*
Bryan, Correct, I should have clarified that. I'm evaluating instance types based on one SSD or two in RAID 0. I'm thinking it's going to be two in RAID 0, but as I've had no experience running a production C* cluster in EC2, I wanted to reach out to the list. Sorry for the half-baked question :) Eric On Friday, January 29, 2016, Bryan Cheng <br...@blockcypher.com> wrote: > Do you have any idea what kind of disk performance you need? > > Cassandra with RAID 0 is a fairly common configuration (Al's awesome tuning guide has a blurb on it: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html), so if you feel comfortable with the operational overhead it seems like a solid choice. > > To clarify, though, by "just one", do you mean just using one of the two ephemeral disks available to the instance, or are you evaluating different instance types based on one disk vs. two? > > On Fri, Jan 29, 2016 at 4:33 PM, Eric Plowe <eric.pl...@gmail.com> wrote: >> My company is planning on rolling out a C* cluster in EC2. We are thinking about going with ephemeral SSDs. The question is this: Should we put two in RAID 0 or just go with one? We currently run a cluster in our data center with 2 250-gig Samsung 850 EVOs in RAID 0 and we are happy with the performance we are seeing thus far. >> >> Thanks! >> >> Eric
Re: EC2 storage options for C*
RAID 0 regardless of instance type* On Friday, January 29, 2016, Eric Plowe <eric.pl...@gmail.com> wrote: > Bryan, > > Correct, I should have clarified that. I'm evaluating instance types based on one SSD or two in RAID 0. I'm thinking it's going to be two in RAID 0, but as I've had no experience running a production C* cluster in EC2, I wanted to reach out to the list. > > Sorry for the half-baked question :) > > Eric
EC2 storage options for C*
My company is planning on rolling out a C* cluster in EC2. We are thinking about going with ephemeral SSDs. The question is this: Should we put two in RAID 0 or just go with one? We currently run a cluster in our data center with 2 250gig Samsung 850 EVO's in RAID 0 and we are happy with the performance we are seeing thus far. Thanks! Eric
Re: which astyanax version to use?
Lijun, That is correct. If you have an investment in Astyanax, you'll need to stay in the 2.0 series. You'll either need to wait until Astyanax is updated to support 2.1 (if that is going to happen) or migrate to the DataStax Java driver. ~Eric On Wed, Nov 18, 2015 at 12:04 AM, Lijun Huang wrote: > Thank you Minh, > > So it means if I want to use Cassandra 2.1+, no version of Astyanax is compatible with it? Because we are already using Astyanax, it may be heavy work to change from Astyanax to the DataStax Java Driver. > > On Wed, Nov 18, 2015 at 11:52 AM, Minh Do wrote: >> The latest version of Astyanax won't work with Cassandra 2.1+. So you are better off using the Java Driver from DataStax. >> >> /Minh >> >> On Tue, Nov 17, 2015 at 7:29 PM, Lijun Huang wrote: >>> Hi All, >>> >>> I have a similar problem: if I use Cassandra 2.1, which Astyanax version is the best one for me? The versions on the Astyanax GitHub pages make me a little confused, and I could use some experience with this. Thanks in advance. >>> >>> Thanks, >>> Lijun Huang >>> >>> -- Original -- >>> *From: * "Lu, Boying"; >>> *Date: * Mon, Nov 9, 2015 04:56 PM >>> *To: * "user@cassandra.apache.org"; >>> *Subject: * which astyanax version to use? >>> >>> Hi, All, >>> >>> We plan to upgrade Cassandra from 2.0.17 to 2.1.11 (the latest stable release recommended for production environments) in our product. >>> >>> Currently we are using Astyanax 1.56.49 as the Java client. I found there are many newer Astyanax releases at https://github.com/Netflix/astyanax/releases >>> >>> So which version should we use in a product environment 3.8.0? >>> >>> Thanks >>> >>> Boying
Re: GossipingPropertyFileSnitch and nodetool info
Ah. My bad for not checking the Jira first. Thanks! On Friday, October 2, 2015, Robert Coli <rc...@eventbrite.com> wrote: > On Thu, Oct 1, 2015 at 1:07 PM, Eric Plowe <eric.pl...@gmail.com> wrote: >> I am using C* 2.1.9 and GossipingPropertyFileSnitch. I noticed that when I run nodetool info I am seeing the data center and rack as >> ... >> Is this just a bug with nodetool? > > Yep, fixed in 2.1.10. > > https://issues.apache.org/jira/browse/CASSANDRA-10382 > > =Rob
GossipingPropertyFileSnitch and nodetool info
I am using C* 2.1.9 and GossipingPropertyFileSnitch. I noticed that when I run nodetool info I am seeing the data center and rack as DC1 r1, which comes from the default entry in cassandra-topology.properties. From my understanding, that file should only be used when PropertyFileSnitch is in use, and only as a fallback when using GossipingPropertyFileSnitch. When I run nodetool status, and when looking at the cluster in OpsCenter, everything reports correctly. As a test, I renamed cassandra-topology.properties to cassandra-topology.properties.back on a node and restarted C*. Now when I run nodetool info I get:

Data Center: UNKNOWN_DC
Rack : UNKNOWN_RACK

nodetool status and OpsCenter still report correctly. Is this just a bug with nodetool? Regards, Eric Plowe
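For reference, with GossipingPropertyFileSnitch each node declares only its own location in cassandra-rackdc.properties and gossips it to the rest of the cluster; cassandra-topology.properties is consulted only as a fallback for nodes that have not gossiped their location. A minimal example (DC and rack names here are illustrative):

```properties
# cassandra-rackdc.properties, read by GossipingPropertyFileSnitch.
# Each node declares only its own DC and rack; peers learn it via gossip.
dc=DC1
rack=RAC1
```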
Re: Cassandra Summit 2015 Roll Call!
I'm here! Bearded guy in a blue gingham shirt. I'll be at the reception. On Tue, Sep 22, 2015 at 2:59 PM, Jonathan Haddad wrote: > Yo. It's me. Haddad, aka rustyrazorblade. 6'1", hair probably in a bun, and a beard. Helping with training today, giving a talk on PySpark & on the Python driver tomorrow. I'll be at the MVP dinner. Wearing a DataStax training t-shirt today, not sure about the rest of the time though. > > Here I am: > https://pbs.twimg.com/profile_images/555812979187265536/uFbfp2q1.jpeg > > Thanks Rob for starting the thread, good idea. > > Please feel free to come say hi even if we've never talked, I love meeting people in the community. > > Jon > > On Tue, Sep 22, 2015 at 11:42 AM Russell Bradberry wrote: >> I will be wearing a red t-shirt that says SimpleReach and I will be at the reception tonight, the MVP dinner, and the summit both days. I'm about 5'11" and probably going to be the best looking person there. ;) >> >> See you all at the summit. >> >> On Tue, Sep 22, 2015 at 11:27 AM, Robert Coli wrote: >>> Cassandra Summit 2015 is upon us! >>> >>> Every year, the conference gets bigger and bigger, and the chance of IRL meeting people you've "met" online gets smaller and smaller. >>> >>> To improve everyone's chances, if you are attending the summit: >>> >>> 1) respond on-thread with a brief introduction (and physical description of yourself if you want others to be able to spot you!) >>> 2) join #cassandra on freenode IRC (irc.freenode.org) to chat and connect with other attendees! >>> >>> MY CONTRIBUTION: >>> -- >>> I will be at the summit on Wednesday and Thursday. I am 5'8" or so, and will be wearing glasses and either a red or blue "Eventbrite Engineering" t-shirt with a graphic logo of gears on it. Come say hello! :D >>> >>> =Rob
Is it normal to see a node version handshake with itself?
I noticed this in the system.log of one of my nodes:

INFO [HANDSHAKE-mia1-cas-001.bongojuice.com/172.16.245.1] 2015-09-10 16:00:37,748 OutboundTcpConnection.java:485 - Handshaking version with mia1-cas-001.bongojuice.com/172.16.245.1

The machine I am on is mia1-cas-001. If it's nothing, never mind; it just stood out to me. ~Eric
Re: Question about consistency
Interesting. I'll give it a try and report back my findings. Thank you, Michael. On Wednesday, September 9, 2015, Laing, Michael <michael.la...@nytimes.com> wrote: > Perhaps a variation on https://issues.apache.org/jira/browse/CASSANDRA-9753? > > You could try setting speculative retry to 0 to avoid cross-DC reads.
Re: Question about consistency
read_repair_chance: 0
dclocal_read_repair_chance: 0.1

On Wednesday, September 9, 2015, Laing, Michael <michael.la...@nytimes.com> wrote: > What are your read repair settings?
Re: Question about consistency
Would this work:

ALTER TABLE session_state WITH speculative_retry = '0ms';
ALTER TABLE session_state WITH speculative_retry = '0PERCENTILE';

I can't set it to 0, but was wondering if these would have the same effect? ~Eric On Wed, Sep 9, 2015 at 8:19 AM, Eric Plowe <eric.pl...@gmail.com> wrote: > Interesting. I'll give it a try and report back my findings. > > Thank you, Michael.
Re: Question about consistency
Yeah, that's what I did. Just wanted to verify that it will indeed turn it off. On Wednesday, September 9, 2015, Laing, Michael <michael.la...@nytimes.com> wrote: > "alter table test.test_root WITH speculative_retry = '0.0PERCENTILE';" > > seemed to work for me with C* version 2.1.7
Re: Question about consistency
So I set speculative_retry to NONE and I encountered the situation about 30 minutes ago. On Wednesday, September 9, 2015, Laing, Michael <michael.la...@nytimes.com> wrote: > It appears that: "alter table test.test_root with speculative_retry = 'NONE';" is also valid. > > Seems a bit more definitive :)
Question about consistency
I'm using Cassandra as a storage mechanism for session state persistence for an ASP.NET web application. I am seeing issues where the session state is persisted on one page (setting a value: Session["key"] = "value") and when it redirects to another page (from a post-back event) and checks for the existence of the value that was set, it doesn't exist.

It's a 12-node cluster with 2 data centers (6 and 6) running 2.1.9. The keyspace that the column family lives in has an RF of 3 for each data center. The session state provider is using the DataStax C# driver v2.1.6. Writes and reads are at LOCAL_QUORUM.

The cluster and web servers have their time synced and we've ruled out clock-drift issues. The issue doesn't happen all the time, maybe two to three times a day. Any insight as to what to look at next? Thanks! ~Eric Plowe
Re: Question about consistency
Rob, All writes/reads are happening from DC1. DC2 is a backup. The web app does not handle live requests from DC2. Regards, Eric Plowe On Tuesday, September 8, 2015, Robert Coli <rc...@eventbrite.com> wrote: > On Tue, Sep 8, 2015 at 4:40 PM, Eric Plowe <eric.pl...@gmail.com> wrote: >> I'm using Cassandra as a storage mechanism for session state persistence for an ASP.NET web application. I am seeing issues where the session state is persisted on one page (setting a value: Session["key"] = "value") and when it redirects to another page (from a post-back event) and checks for the existence of the value that was set, it doesn't exist. >> >> It's a 12-node cluster with 2 data centers (6 and 6) running 2.1.9. The keyspace that the column family lives in has an RF of 3 for each data center. The session state provider is using the DataStax C# driver v2.1.6. Writes and reads are at LOCAL_QUORUM. > > 1) Write to DC_A with LOCAL_QUORUM > 2) Replication to DC_B takes longer than it takes to... > 3) Read from DC_B with LOCAL_QUORUM, do not see the write from 1) > > If you want to be able to read your writes from DC_A in DC_B, you're going to need to use EACH_QUORUM. > > =Rob
Re: Question about consistency
To further expand: we have two data centers, Miami and Dallas. Dallas is our disaster recovery data center. The cluster has 12 nodes, 6 in Miami and 6 in Dallas. The servers in Miami only read/write to Miami using the data-center-aware load balancing policy of the driver. We have the problem when writing and reading to the Miami cluster with LOCAL_QUORUM. Regards, Eric On Tuesday, September 8, 2015, Eric Plowe <eric.pl...@gmail.com> wrote: > Rob, > > All writes/reads are happening from DC1. DC2 is a backup. The web app does not handle live requests from DC2. > > Regards, > > Eric Plowe
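Rob's earlier explanation comes down to quorum arithmetic: within one DC at RF=3, a LOCAL_QUORUM write and a LOCAL_QUORUM read each touch 2 of the 3 local replicas, so by the pigeonhole principle they must share at least one replica; a LOCAL_QUORUM read in the other DC shares no replicas with the write. A sketch of that arithmetic (pure math, not driver code):

```python
def quorum(rf: int) -> int:
    """Number of replicas required for a quorum at a given replication factor."""
    return rf // 2 + 1

def read_sees_write(write_acks: int, read_touches: int, rf: int) -> bool:
    """True if a write-acked replica is guaranteed to be among those read:
    two replica subsets of an rf-sized set must overlap iff their sizes
    sum to more than rf (pigeonhole principle)."""
    return write_acks + read_touches > rf

rf_per_dc = 3
q = quorum(rf_per_dc)  # 2

# Same DC: LOCAL_QUORUM write then LOCAL_QUORUM read -> guaranteed overlap.
print(read_sees_write(q, q, rf_per_dc))  # -> True (2 + 2 > 3)

# Other DC: the local read touches none of the replicas the write waited on.
print(read_sees_write(0, q, rf_per_dc))  # -> False (0 + 2 <= 3)
```

Since Eric's reads and writes both stay in Miami, the overlap should hold, which is why the CASSANDRA-9753 speculative-retry angle (cross-DC reads sneaking in) became the suspect.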
Re: auto_bootstrap=false broken?
I think reading the relevant documentation might have helped: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html On Fri, Aug 7, 2015 at 9:04 AM, horschi hors...@gmail.com wrote: Hi Cyril, thanks for backing me up. I'm under siege from all sides here ;-) > That's something we're trying to do too. However, disabling client connections (closing the Thrift and native ports) does not prevent other nodes (acting as coordinators) from requesting it ... Honestly we'd like to restart a node that needs to deploy HH and make it serve reads only when that's done. And to be more precise, we know when it's done and don't need it to work by itself (automatically). If you use auto_bootstrap=false in the same DC, then I think you are stuck. AFAIK auto_bootstrap=false can only work in a new DC, where you can control reads via LOCAL consistency levels. kind regards, Christian
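The procedure in that link boils down to bringing the new DC's nodes up with bootstrap disabled, pointing the keyspace's replication at the new DC, and then streaming the data explicitly. The key setting (a sketch; keyspace and DC names are illustrative):

```yaml
# cassandra.yaml on every node in the NEW datacenter:
# join the ring without streaming any data at startup.
auto_bootstrap: false
```

After the new nodes are up, alter the keyspace to replicate to the new DC (e.g. ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};) and run nodetool rebuild with the name of the existing DC on each new node so it streams its replicas. As horschi notes, within a single DC this trick gives you no clean way to keep clients off the still-empty node.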
Re: Cassandra communication between 2 datacenter
Are you sure that both DCs can communicate with each other over the necessary ports? On Thu, Nov 13, 2014 at 3:46 PM, Adil adil.cha...@gmail.com wrote: yeah, we started the nodes one at a time. My doubt is whether we should also configure cassandra-topology.properties or not? We left it with default values 2014-11-13 21:05 GMT+01:00 Robert Coli rc...@eventbrite.com: On Thu, Nov 13, 2014 at 10:26 AM, Adil adil.cha...@gmail.com wrote: Hi, we have two datacenters with this info: Cassandra version 2.1.0, DC1 with 5 nodes, DC2 with 5 nodes. We set the snitch to GossipingPropertyFileSnitch and in cassandra-rackdc.properties we put: in DC1: dc=DC1 rack=RAC1; in DC2: dc=DC2 rack=RAC1. And in every node's cassandra.yaml we define two seeds of DC1 and two seeds of DC2. Do you start the nodes one at a time, and then consult nodetool ring (etc.) to see if the cluster coalesces in the way you expect? If so, a Keyspace created in one should very quickly be created in the other. =Rob http://twitter.com/rcolidba
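A quick way to verify inter-DC reachability is to attempt a TCP connect to each Cassandra port from a node in the other DC. A minimal sketch (Python stdlib; the target address is a placeholder and the ports are Cassandra's defaults — adjust for your configuration):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Defaults: 7000 inter-node/gossip, 7001 SSL inter-node, 9042 native, 9160 thrift.
# Replace 127.0.0.1 with the address of a node in the other data center.
for port in (7000, 7001, 9042, 9160):
    print(port, port_open("127.0.0.1", port))
```

If 7000 (or 7001 when inter-node SSL is on) is blocked between the DCs, gossip and schema propagation cannot happen regardless of the snitch settings.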
Re: Operating on large cluster
I am a big fan of perl-ssh-tools (https://github.com/tobert/perl-ssh-tools) to let me manage my nodes and SVN to store configs. ~Eric Plowe On Thu, Oct 23, 2014 at 3:07 PM, Michael Shuler mich...@pbandjelly.org wrote: On 10/23/2014 04:18 AM, Alain RODRIGUEZ wrote: I was wondering about how do you guys handle a large cluster (50+ machines). Configuration management tools are awesome, until they aren't. Having used or played with all the popular ones, and having been bitten by failures of those tools on large clusters, my long-time preference has been using a VCS to check configs and scripts in/out and parallel ssh (whichever one you like). Simple is good. If you don't deeply understand the config management system you have chosen, the unexpected may (will?) eventually happen. To all the servers at once. Even when you are careful, we are human. No tool can prevent *all* mistakes. Test everything in a staging environment, first! -- Kind regards, Michael PS. even staging doesn't prevent fallibility.. :) https://twitter.com/mshuler/status/520667739615395840
Cassandra, vnodes, and spark
Hello. http://stackoverflow.com/questions/19969329/why-not-enable-virtual-node-in-an-hadoop-node/19974621#19974621 Based on this stackoverflow question, vnodes affect the number of mappers Hadoop needs to spawn, which in turn affects performance. With the Spark connector for Cassandra, would the same situation happen? Would vnodes affect performance in a similar situation to Hadoop?
Re: Cassandra, vnodes, and spark
Sorry. Trigger finger on the send. Would vnodes affect performance for Spark in a similar fashion? On Monday, September 15, 2014, Eric Plowe eric.pl...@gmail.com wrote: Hello. http://stackoverflow.com/questions/19969329/why-not-enable-virtual-node-in-an-hadoop-node/19974621#19974621 Based on this stackoverflow question, vnodes affect the number of mappers Hadoop needs to spawn, which in turn affects performance. With the Spark connector for Cassandra, would the same situation happen? Would vnodes affect performance in a similar situation to Hadoop?
Re: Cassandra, vnodes, and spark
As Hadoop*, again, sorry. On Monday, September 15, 2014, Eric Plowe eric.pl...@gmail.com wrote: Sorry. Trigger finger on the send. Would vnodes affect performance for Spark in a similar fashion? On Monday, September 15, 2014, Eric Plowe eric.pl...@gmail.com wrote: Hello. http://stackoverflow.com/questions/19969329/why-not-enable-virtual-node-in-an-hadoop-node/19974621#19974621 Based on this stackoverflow question, vnodes affect the number of mappers Hadoop needs to spawn, which in turn affects performance. With the Spark connector for Cassandra, would the same situation happen? Would vnodes affect performance in a similar situation to Hadoop?
Re: Cassandra, vnodes, and spark
Interesting. The way I understand the Spark connector is that it's basically a client executing a CQL query and filling a Spark RDD. Spark will then handle the partitioning of data. Again, this is my understanding, and it may be incorrect. On Monday, September 15, 2014, Robert Coli rc...@eventbrite.com wrote: On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe eric.pl...@gmail.com wrote: Based on this stackoverflow question, vnodes affect the number of mappers Hadoop needs to spawn, which in turn affects performance. With the Spark connector for Cassandra, would the same situation happen? Would vnodes affect performance in a similar situation to Hadoop? I don't know what specifically Spark does here, but if it has the same locality expectations as Hadoop generally, my belief would be: yes. =Rob
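The linked stackoverflow answer's concern is arithmetic: classic Hadoop input formats create at least one input split per token range, and every vnode owns one range. A rough sketch of the numbers involved (illustrative; the Spark connector groups many token ranges into each Spark partition, so the per-range overhead should be far smaller there, but that behavior is worth verifying against the connector version in use):

```python
def total_token_ranges(nodes, num_tokens_per_node):
    # Each vnode owns one contiguous slice of the ring, so the ring is
    # divided into nodes * num_tokens ranges in total.
    return nodes * num_tokens_per_node

classic = total_token_ranges(6, 1)    # single-token nodes: 6 ranges
vnodes = total_token_ranges(6, 256)   # default num_tokens=256: 1536 ranges
print(classic, vnodes)                # 6 1536
```

With one mapper per range, that is a 256x jump in task-scheduling overhead for the same data volume, which is the effect the stackoverflow answer describes for Hadoop.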
Re: binary protocol server side sockets
Michael, The ask is for letting keepalive be configurable for the native transport, with Socket.setKeepAlive. By default, SO_KEEPALIVE is false ( http://docs.oracle.com/javase/7/docs/api/java/net/StandardSocketOptions.html#SO_KEEPALIVE). Regards, Eric Plowe On Wed, Apr 9, 2014 at 1:25 PM, Michael Shuler mich...@pbandjelly.org wrote: On 04/09/2014 11:39 AM, graham sanderson wrote: Thanks, but I would think that just sets keep alive from the client end; I'm talking about the server end… this is one of those issues where there is something (e.g. switch, firewall, VPN in between the client and the server) and we get left with orphaned established connections to the server when the client is gone. There would be no server setting for any service, not just c*, that would correct misconfigured connection-assassinating network gear between the client and server. Fix the gear to allow persistent connections. Digging through the various timeouts in c*.yaml didn't lead me to a simple answer for something tunable, but I think this may be more basic networking related. I believe it's up to the client to keep the connection open as Duy indicated. I don't think c* will arbitrarily sever connections - something that disconnects the client may happen. In that case, the TCP connection on the server should drop to TIME_WAIT. Is this what you are seeing in `netstat -a` on the server - a bunch of TIME_WAIT connections hanging around? Those should eventually be recycled, but that's tunable in the network stack, if they are being generated at a high rate. -- Michael
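For reference, SO_KEEPALIVE is a single socket option; the request here is for Cassandra to set it on accepted native-transport sockets the way rpc_keepalive already does for Thrift. A stdlib sketch of the option itself (Python rather than Java, purely to show the default-off behavior the Java docs above describe):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Matches the Java docs cited above: SO_KEEPALIVE defaults to off.
before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(before, after)  # 0 1
sock.close()
```

Once enabled, the kernel probes idle connections (after net.ipv4.tcp_keepalive_time seconds on Linux, 2 hours by default) and tears down ones whose peer has silently vanished — exactly the orphaned ESTABLISHED sockets described in this thread.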
Suggestions for upgrading cassandra
I have a cluster that is running 1.2.6. I'd like to upgrade that cluster to 2.0.7. Any suggestions/tips that would make the upgrade process smooth?
Re: Suggestions for upgrading cassandra
Thanks guys! Yea, I am going to do this on a test environment that simulates our production environment. I was just looking for any potential gotchas, etc. On Tue, May 27, 2014 at 6:42 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, May 27, 2014 at 1:24 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, May 27, 2014 at 10:57 AM, Eric Plowe eric.pl...@gmail.com wrote: I have a cluster that is running 1.2.6. I'd like to upgrade that cluster to 2.0.7. Any suggestions/tips that would make the upgrade process smooth? As indicated in NOTES.txt, upgrades to 2.0.x must pass through at least (iirc?) 1.2.9. s/NOTES/NEWS/ =Rob
Re: Filtering on Collections
Collection types cannot be used for filtering (as part of the WHERE clause). They cannot be used as a primary key or part of a primary key. Secondary indexes are not supported either. On Mon, May 19, 2014 at 12:50 PM, Raj Janakarajan r...@zephyrhealthinc.com wrote: Hello all, I am using Cassandra version 2.0.7. I am wondering if collections are efficient for filtering. We are thinking of using collections to maintain a list for a customer row, but we have to be able to filter on the collection values. Select UUID from customer where eligibility_state IN (CA, NC) Eligibility_state being a collection. The above query would be used frequently. Would you recommend collections for modeling from a performance perspective? Raj -- Data Architect ❘ Zephyr Health 589 Howard St. ❘ San Francisco, CA 94105 m: +1 9176477433 ❘ f: +1 415 520-9288 o: +1 415 529-7649 | s: raj.janakarajan http://www.zephyrhealth.com
Re: Filtering on Collections
Ah, that is interesting, Patricia. Since they can be a secondary index, it's not too far off for them being able to be a primary key, no? On Mon, May 19, 2014 at 1:54 PM, Patricia Gorla patri...@thelastpickle.com wrote: Raj, Secondary indexes across CQL3 collections were introduced in 2.1 beta1, so will be available in future versions. See https://issues.apache.org/jira/browse/CASSANDRA-4511 If your main concern is performance, then you should find another way to model the data: each collection is read entirely into memory to access a single item. On Mon, May 19, 2014 at 11:03 AM, Eric Plowe eric.pl...@gmail.com wrote: Collection types cannot be used for filtering (as part of the WHERE clause). They cannot be used as a primary key or part of a primary key. Secondary indexes are not supported either. On Mon, May 19, 2014 at 12:50 PM, Raj Janakarajan r...@zephyrhealthinc.com wrote: Hello all, I am using Cassandra version 2.0.7. I am wondering if collections are efficient for filtering. We are thinking of using collections to maintain a list for a customer row, but we have to be able to filter on the collection values. Select UUID from customer where eligibility_state IN (CA, NC) Eligibility_state being a collection. The above query would be used frequently. Would you recommend collections for modeling from a performance perspective? Raj -- Data Architect ❘ Zephyr Health 589 Howard St. ❘ San Francisco, CA 94105 m: +1 9176477433 ❘ f: +1 415 520-9288 o: +1 415 529-7649 | s: raj.janakarajan http://www.zephyrhealth.com -- Patricia Gorla @patriciagorla Consultant Apache Cassandra Consulting http://www.thelastpickle.com
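Patricia's warning that each collection is read whole points at the conventional alternative: denormalize the collection into its own lookup table keyed by the value being filtered on. A sketch in CQL (table and column names are illustrative, not from Raj's schema):

```
-- One row per (state, customer) pair, written alongside the customer row.
CREATE TABLE customer_by_eligibility_state (
    eligibility_state text,
    customer_id uuid,
    PRIMARY KEY (eligibility_state, customer_id)
);

-- "Customers eligible in CA or NC" becomes cheap partition reads.
SELECT customer_id FROM customer_by_eligibility_state
 WHERE eligibility_state IN ('CA', 'NC');
```

The trade-off is that the application must keep the lookup table in sync on every update to a customer's eligibility list.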
Re: Deleting column names
Setting the columns to null is essentially deleting them, from my understanding. A delete operation works on the entire row. On Monday, April 21, 2014, Andreas Wagner andreas.josef.wag...@googlemail.com wrote: Hi cassandra users, hi Sebastian, I'd be interested in this ... is there any update/solution? Thanks so much ;) Andreas On 04/16/2014 11:43 AM, Sebastian Schmidt wrote: Hi, I'm using a Cassandra table to store some data. I created the table like this: CREATE TABLE IF NOT EXISTS table_name (s BLOB, p BLOB, o BLOB, c BLOB, PRIMARY KEY (s, p, o, c)); I need at least the p column to be sorted, so that I can use it in a WHERE clause. So as far as I understand, the s column is now the row key, and (p, o, c) is the column name. I tried to delete single entries with a prepared statement like this: DELETE p, o, c FROM table_name WHERE s = ? AND p = ? AND o = ? AND c = ?; That didn't work, because p is a primary key part. It failed during preparation. I also tried to use variables like this: DELETE ?, ?, ? FROM table_name WHERE s = ?; This also failed during preparation, because ? is an unknown identifier. Since I have multiple different p, o, c combinations per s, deleting the whole row identified by s is no option. So how can I delete an s, p, o, c tuple, without deleting other s, p, o, c tuples with the same s? I know that this worked with Thrift/Hector before. Regards, Sebastian
Re: Deleting column names
Also, I don't think you can null out columns that are part of the primary key after they've been set. On Monday, April 21, 2014, Andreas Wagner andreas.josef.wag...@googlemail.com wrote: Hi cassandra users, hi Sebastian, I'd be interested in this ... is there any update/solution? Thanks so much ;) Andreas On 04/16/2014 11:43 AM, Sebastian Schmidt wrote: Hi, I'm using a Cassandra table to store some data. I created the table like this: CREATE TABLE IF NOT EXISTS table_name (s BLOB, p BLOB, o BLOB, c BLOB, PRIMARY KEY (s, p, o, c)); I need at least the p column to be sorted, so that I can use it in a WHERE clause. So as far as I understand, the s column is now the row key, and (p, o, c) is the column name. I tried to delete single entries with a prepared statement like this: DELETE p, o, c FROM table_name WHERE s = ? AND p = ? AND o = ? AND c = ?; That didn't work, because p is a primary key part. It failed during preparation. I also tried to use variables like this: DELETE ?, ?, ? FROM table_name WHERE s = ?; This also failed during preparation, because ? is an unknown identifier. Since I have multiple different p, o, c combinations per s, deleting the whole row identified by s is no option. So how can I delete an s, p, o, c tuple, without deleting other s, p, o, c tuples with the same s? I know that this worked with Thrift/Hector before. Regards, Sebastian
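Worth noting for Sebastian's original question: CQL can delete a single clustering row without listing any columns before FROM, by naming the full primary key in the WHERE clause. Against the schema quoted above, this removes exactly one (p, o, c) tuple and leaves the other tuples under the same partition key s intact:

```
DELETE FROM table_name
 WHERE s = ? AND p = ? AND o = ? AND c = ?;
```

The error Sebastian hit came from listing primary-key columns (p, o, c) between DELETE and FROM, which asks Cassandra to null out key components rather than to remove the row.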
Re: binary protocol server side sockets
The situation I am seeing is this: to access my company's development environment I need to VPN. I do some development on the application, and for some reason my VPN drops, but I had established connections to my development cassandra server. When I reconnect and check netstat I see the connections I had established previously still there, and they never go away. I have had connections that were held open from almost 7 days ago. I ran 'netstat -tulpn' per the request of Nate McCall, and the receive and send queues are 0. I just did a test where I changed the code of my application to use thrift (using the FluentCassandra driver). I start the application, kill my VPN connection, and reconnect. When I check the cassandra server, I still see the thrift (9160) connection established, but it is eventually removed because of keep alive. If I change rpc_keepalive to false in cassandra.yaml, restart cassandra, and then run the same test I outlined above using thrift, the connection will stay, like the native transport connections, until cassandra, or the box, is restarted. It seems the lack of keep alive support for native transport is the culprit. Regards, Eric Plowe On Fri, Apr 11, 2014 at 1:12 PM, Nate McCall n...@thelastpickle.com wrote: Out of curiosity, any folks seeing backups in the send or receive queues via netstat while this is happening? (netstat -tulpn for example) I feel like I had this happen once and it ended up being a sysconfig tuning issue (net.core.* and net.ipv4.* stuff specifically). Can't seem to find anything in my notes though, unfortunately. On Fri, Apr 11, 2014 at 10:16 AM, Phil Luckhurst phil.luckhu...@powerassure.com wrote: We have considered this but wondered how well it would work as the Cassandra Java Driver opens multiple connections internally to each Cassandra node. I suppose it depends how those connections are used internally; if it's round robin then it should work. Perhaps we just need to try it.
-- Thanks, Phil Chris Lohfink wrote: TCP keep alives (by the setTimeout) are notoriously useless... The default 2 hours is generally far longer than any timeout in NAT translation tables (generally ~5 min), and even if you decrease the keep alive to a sane value, a lot of networks actually throw away TCP keep alive packets. You see that a lot more in cell networks though. It's almost always a good idea to have a software keep alive, although it seems to be not implemented in this protocol. You can make a super simple CF with 1 value and query it every minute a connection is idle, or something. i.e. select * from DummyCF where id = 1 -- *Chris Lohfink* Engineer 415.663.6738 | Skype: clohfink.blackbirdit *Blackbird* 775.345.3485 | www.blackbirdIT.com *Formerly PalominoDB/DriveDev* On Fri, Apr 11, 2014 at 3:04 AM, Phil Luckhurst phil.luckhurst@ wrote: We are also seeing this in our development environment. We have a 3 node Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting from a Tomcat based application running on Windows using the 2.0.0 Cassandra Java Driver. We have setKeepAlive(true) when building the cluster in the application, and this does keep one connection open on the client side to each of the 3 Cassandra nodes, but we still see the build up of 'old' ESTABLISHED connections on each of the Cassandra servers.
We are also getting that same Unexpected exception during request exception appearing in the logs ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824 ErrorMessage.java (line 222) Unexpected exception during request java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(Unknown Source) at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) at sun.nio.ch.IOUtil.read(Unknown Source) at sun.nio.ch.SocketChannelImpl.read(Unknown Source) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Initially we thought this was down to a firewall that is between our development machines and the Cassandra nodes but that has now been configured not to 'kill' any connections on port 9042. We also have
Re: binary protocol server side sockets
I am having the exact same issue. I see the connections pile up and pile up, but they never seem to come down. Any insight into this would be amazing. Eric Plowe On Wed, Apr 9, 2014 at 4:17 PM, graham sanderson gra...@vast.com wrote: Thanks Michael, Yup, keepalive is not the default. It is possible they are going away after nf_conntrack_tcp_timeout_established; will have to do more digging (it is hard to tell how old a connection is - there are no visible timers (thru netstat) on an ESTABLISHED connection)... This is actually low on my priority list, I was just spending a bit of time trying to track down the source of ERROR [Native-Transport-Requests:3833603] 2014-04-09 17:46:48,833 ErrorMessage.java (line 222) Unexpected exception during request java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) errors, which are spamming our server logs quite a lot (I originally thought this might be caused by KEEPALIVE, which is when I realized that the connections weren't in keep alive and were building up) - it would be nice if netty would tell us a little about the socket channel in the
error message (maybe there is a way to do this by changing log levels, but as I say I haven't had time to go digging there) I will probably file a JIRA issue to add the setting (since I can't see any particular harm to setting keepalive) On Apr 9, 2014, at 1:34 PM, Michael Shuler mich...@pbandjelly.org wrote: On 04/09/2014 12:41 PM, graham sanderson wrote: Michael, it is not that the connections are being dropped, it is that the connections are not being dropped. Thanks for the clarification. These server side sockets are ESTABLISHED, even though the client connection on the other side of the network device is long gone. This may well be an issue with the network device (it is valiantly trying to keep the connection alive it seems). Have you tested if they *ever* time out on their own, or do they just keep sticking around forever? (maybe 432000 sec (120 hours), which is the default for nf_conntrack_tcp_timeout_established?) Trying out all the usage scenarios is really the way to track it down - directly on switch, behind/in front of firewall, on/off the VPN. That said KEEPALIVE on the server side would not be a bad idea. At least then the OS on the server would eventually (probably after 2 hours of inactivity) attempt to ping the client. At that point hopefully something interesting would happen perhaps causing an error and destroying the server side socket (note KEEPALIVE is also good for preventing idle connections from being dropped by other network devices along the way) Tuning net.ipv4.tcp_keepalive_* could be helpful, if you know they timeout after 2 hours, which is the default. 
rpc_keepalive on the server sets keep alive on the server side sockets for thrift, and is true by default There doesn't seem to be a setting for the native protocol Note this isn't a huge issue for us, they can be cleaned up by a rolling restart, and this particular case is not production, but related to development/testing against alpha by people working remotely over VPN - and it may well be the VPNs fault in this case... that said and maybe this is a dev list question, it seems like the option to set keepalive should exist. Yeah, but I agree you shouldn't have to restart to clean up connections - that's why I think it is lower in the network stack, and that a bit of troubleshooting and tuning might be helpful. That setting sounds like a good Jira request - keepalive may be the default, I'm not sure. :) -- Michael On Apr 9, 2014, at 12:25 PM, Michael Shuler mich...@pbandjelly.org wrote: On 04/09/2014 11:39 AM, graham sanderson wrote: Thanks, but I would think that just sets keep alive from the client end; I'm talking about the server end... this is one of those issues where there is something (e.g. switch
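On the tcp_keepalive_* tuning Michael mentions: on Linux these are kernel-wide sysctls that control when an idle connection (one that has SO_KEEPALIVE set) is first probed and when it is declared dead. Example lines for /etc/sysctl.conf (the values are illustrative, not recommendations):

```
# First probe after 10 minutes idle instead of the 2-hour default,
# then every 60 seconds, declaring the peer dead after 5 failed probes.
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5
```

Note these only take effect for sockets that actually opt into SO_KEEPALIVE, which is exactly the gap being discussed for the native transport's server-side sockets.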
Re: sending notifications through data replication on remote clusters
You should be able to achieve what you're looking for with a trigger vs. a modification to the core of Cassandra. http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support On Mon, Mar 10, 2014 at 10:06 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: On 03/10/2014 07:49 AM, DE VITO Dominique wrote: If I update a data on DC1, I just want apps connected-first to DC2 to be informed when this data is available on DC2 after replication. If I run a SELECT, I'm going to receive the latest data per the read conditions (ONE, TWO, QUORUM), regardless of location of the client connection. If using network aware topology, you'll get the most current data in that DC. When using Thrift, one way could be to modify CassandraServer class, to send notification to apps according to data coming in into the coordinator node of DC2. Is it common (~ the way to do it) ? Is there another way to do so ? When using CQL, is there a precise src code place to modify for the same purpose ? Notifying connected clients about random INSERT or UPDATE statements that ran somewhere seems to be far, far outside the scope of storing data. Just configure your client to SELECT in the manner that you need. I may not fully understand your problem and could be simplifying things in my head, so feel free to expand. -- Michael First of all, thanks for you answer and your attention. I know about SELECT. The idea, here, is to avoid doing POLLING regularly, as it could be easily a performance nightmare. The idea is to replace POLLING with PUSH, just like in many cases like SEDA architecture, or CQRS architecture, or continuous querying with some data stores. So, following this PUSH idea, it would be nice to inform apps connected to a preferred DC that some new data have been replicated, and is now available. I hope it's clearer. Dominique
Re: Noticing really high read latency
Disregard... heh. Was reading the latency as SECONDS. Sorry, it's been one of those weeks. On Wed, Mar 5, 2014 at 1:44 AM, Eric Plowe eric.pl...@gmail.com wrote: Background info: 6 node cluster. 24 gigs of RAM per machine, 8 gigs of RAM dedicated to c*, 4 4-core CPUs, 2 250-gig SSDs in RAID 0. Running c* 1.2.6. The CF is configured as follows: CREATE TABLE behaviors ( uid text, buid int, name text, expires text, value text, PRIMARY KEY (uid, buid, name) ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'sstable_size_in_mb': '160', 'class': 'LeveledCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; I am noticing that the read latency is very high when I look at the output of nodetool cfstats. This is the example output of one of the nodes: Column Family: behaviors SSTable count: 2 SSTables in each level: [1, 1, 0, 0, 0, 0, 0, 0, 0] Space used (live): 171496198 Space used (total): 171496591 Number of Keys (estimate): 1153664 Memtable Columns Count: 14445 Memtable Data Size: 1048576 Memtable Switch Count: 1 Read Count: 1894 Read Latency: 0.497 ms. Write Count: 7169 Write Latency: 0.041 ms. Pending Tasks: 0 Bloom Filter False Positives: 4 Bloom Filter False Ratio: 0.00862 Bloom Filter Space Used: 3533152 Compacted row minimum size: 125 Compacted row maximum size: 9887 Compacted row mean size: 365 The write latency is awesome, but the read latency, not so much. The output of iostat doesn't show anything out of the ordinary. The CPU utilization is between 1% and 5%. All read queries are issued with a CL of ONE. We always include WHERE uid = 'somevalue' in the queries. If there is any more info I can provide, please let me know. At this point in time, I am a bit stumped. Regards, Eric Plowe
Noticing really high read latency
Background info: 6 node cluster. 24 gigs of RAM per machine, 8 gigs of RAM dedicated to c*, 4 4-core CPUs, 2 250-gig SSDs in RAID 0. Running c* 1.2.6. The CF is configured as follows: CREATE TABLE behaviors ( uid text, buid int, name text, expires text, value text, PRIMARY KEY (uid, buid, name) ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'sstable_size_in_mb': '160', 'class': 'LeveledCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; I am noticing that the read latency is very high when I look at the output of nodetool cfstats. This is the example output of one of the nodes: Column Family: behaviors SSTable count: 2 SSTables in each level: [1, 1, 0, 0, 0, 0, 0, 0, 0] Space used (live): 171496198 Space used (total): 171496591 Number of Keys (estimate): 1153664 Memtable Columns Count: 14445 Memtable Data Size: 1048576 Memtable Switch Count: 1 Read Count: 1894 Read Latency: 0.497 ms. Write Count: 7169 Write Latency: 0.041 ms. Pending Tasks: 0 Bloom Filter False Positives: 4 Bloom Filter False Ratio: 0.00862 Bloom Filter Space Used: 3533152 Compacted row minimum size: 125 Compacted row maximum size: 9887 Compacted row mean size: 365 The write latency is awesome, but the read latency, not so much. The output of iostat doesn't show anything out of the ordinary. The CPU utilization is between 1% and 5%. All read queries are issued with a CL of ONE. We always include WHERE uid = 'somevalue' in the queries. If there is any more info I can provide, please let me know. At this point in time, I am a bit stumped. Regards, Eric Plowe