Re: Not-sticky sessions with Sling?
Hi Lance,

Ok, so given the situation (an eventually consistent repo replicating the Oak login token, and no way to use sticky sessions), I suggest you go with something else, something which does *not* need the repository for persistence. This means you might want to investigate your own authentication handler, or look at other options in Sling, for example the old form-based login (not sure what its state is, though), or good ol' HTTP Basic (at other costs, like no support for "logout").

Regards
Felix

> On 18.01.2017 at 02:43, lancedolan wrote:
>
> lancedolan wrote
>> I must know what determines the duration of this revision catch-up time
>> ...
>
> While I don't know where to look in the source code to answer this, I did
> run a very revealing experiment.
>
> It pretty much always takes exactly 1 second for a Sling instance to get
> the latest revision, and thus the latest data. When it is not 1 second, it
> takes exactly 2 seconds. If you increase load on the server, the likelihood
> of taking 2 seconds increases, and you also begin to see it take exactly 3
> seconds in some rare cases. Increasing load increases the number of seconds
> before a "sync," but it's always a near-exact 1-second interval.
>
> It seems impossible for this to be a natural coincidence. I smell a setting
> somewhere (or perhaps a hardcoded value) which is telling Sling to check
> the latest JCR revision at 1-second intervals. When that window can't be
> hit, it checks on the next 1-second interval, and so on.
>
> Is there a Sling dev who can tell me whether this is configurable? I have a
> load of questions about this discovery:
>
> - Am I wrong? (I'll be shocked)
> - Perhaps we can speed it up?
> - What event is causing it to "miss the window" and wait until the next
>   1-second sync interval?
> - If we do decrease the interval, will that just increase the likelihood of
>   taking more intervals anyway?
> - Is there a maximum number of 1-second intervals before the thing just
>   gets the latest?
>
> progress.
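Felix's suggestion of an authentication handler that does not need the repository usually comes down to a self-validating token: an HMAC-signed payload that any instance can verify with a shared secret, with no repository (and hence no eventual-consistency) lookup on the read path. Here is a minimal sketch of the idea in Python; the token format, the `SECRET` handling, and the expiry policy are illustrative assumptions, not Sling's actual encapsulated-token format or the `AuthenticationHandler` API:

```python
import hashlib
import hmac
import time

# Assumption: this secret is distributed to every cluster node out of band.
SECRET = b"shared-cluster-secret"

def issue_token(user_id, ttl_seconds=3600, now=None):
    """Create a self-validating token: payload plus HMAC-SHA256 signature."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    payload = "%s:%d" % (user_id, expires)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (payload, sig)

def verify_token(token, now=None):
    """Return the user id if signature and expiry check out, else None.
    No repository lookup is needed, so any cluster node can verify."""
    try:
        user_id, expires, sig = token.rsplit(":", 2)
        expires = int(expires)
    except ValueError:
        return None
    payload = "%s:%d" % (user_id, expires)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or signed with a different secret
    if (now if now is not None else time.time()) >= expires:
        return None  # expired
    return user_id
```

Logout remains the weak spot Felix mentions: a signed token cannot be revoked server-side without reintroducing shared state, which is the same trade-off HTTP Basic has.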
RE: Not-sticky sessions with Sling?
This is tempting, but I know in my dev instinct that we won't have the time to solve all the unsolved problems in that effort. Thank you for suggesting it, though :)

--
View this message in context: http://apache-sling.73963.n3.nabble.com/Not-sticky-sessions-with-Sling-tp4069530p4069712.html
Sent from the Sling - Users mailing list archive at Nabble.com.
Re: Not-sticky sessions with Sling?
lancedolan wrote
> I must know what determines the duration of this revision catch-up time
> ...

While I don't know where to look in the source code to answer this, I did run a very revealing experiment.

It pretty much always takes exactly 1 second for a Sling instance to get the latest revision, and thus the latest data. When it is not 1 second, it takes exactly 2 seconds. If you increase load on the server, the likelihood of taking 2 seconds increases, and you also begin to see it take exactly 3 seconds in some rare cases. Increasing load increases the number of seconds before a "sync," but it's always a near-exact 1-second interval.

It seems impossible for this to be a natural coincidence. I smell a setting somewhere (or perhaps a hardcoded value) which is telling Sling to check the latest JCR revision at 1-second intervals. When that window can't be hit, it checks on the next 1-second interval, and so on.

Is there a Sling dev who can tell me whether this is configurable? I have a load of questions about this discovery:

- Am I wrong? (I'll be shocked)
- Perhaps we can speed it up?
- What event is causing it to "miss the window" and wait until the next 1-second sync interval?
- If we do decrease the interval, will that just increase the likelihood of taking more intervals anyway?
- Is there a maximum number of 1-second intervals before the thing just gets the latest?

progress.

--
View this message in context: http://apache-sling.73963.n3.nabble.com/Not-sticky-sessions-with-Sling-tp4069530p4069711.html
Sent from the Sling - Users mailing list archive at Nabble.com.
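An aside on the whole-second pattern observed above: it is exactly what a fixed-interval background poll would produce. Oak's DocumentNodeStore does, as far as I recall, run periodic background reads to pick up other cluster nodes' changes, with a default interval on the order of one second (the DocumentMK `asyncDelay` setting; worth confirming against the Oak source for your version). A toy model of the arithmetic, with the polling node treated as a clock that ticks every `poll_interval` seconds and may skip ticks under load:

```python
import math

def visible_at(write_time, poll_interval=1.0, missed_polls=0):
    """Time at which a write becomes visible on another instance that polls
    for new revisions every poll_interval seconds. A busy instance that
    skips ticks slips visibility by whole intervals, matching the observed
    'exactly 2 seconds' and 'exactly 3 seconds' cases under load."""
    next_tick = math.ceil(write_time / poll_interval) * poll_interval
    if next_tick == write_time:
        # Write landed exactly on a tick; the next tick picks it up.
        next_tick += poll_interval
    return next_tick + missed_polls * poll_interval

def catch_up_delay(write_time, poll_interval=1.0, missed_polls=0):
    """Delay between the write and its visibility on the polling instance."""
    return visible_at(write_time, poll_interval, missed_polls) - write_time
```

Under this model the delay is always in (0, 1] seconds plus a whole second per missed poll, which would answer two of the questions above: decreasing the interval shrinks the base delay, and there is no hard maximum number of intervals, only a load-dependent chance of missing each one.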
RE: Not-sticky sessions with Sling?
Not sure if this is of any help for your use case, but do you need the full JCR feature set and complexity underneath Sling, or only a Sling cluster plus storage in MongoDB?

If you need only basic resource read and write features via the Sling API, you might bypass JCR completely and directly use a NoSQL resource provider for MongoDB, see [1] and [2]. But please be aware that:

1. The code might not be production-ready for heavy usage yet (not sure how much it is used).
2. It does not add any support for cluster synchronization etc.; if your multiple nodes write to the same path, you have to take care of concurrency yourself.
3. The code is not yet migrated to the latest ResourceProvider SPI from Sling 9-SNAPSHOT, but should still run with it.
4. It has no built-in support for ACLs etc.; you have to take care of this yourself.

This resource provider is only a thin layer above the MongoDB Java client, so it should be possible to have full control over which MongoDB features are used in which way.

stefan

[1] http://sling.apache.org/documentation/bundles/nosql-resource-providers.html
[2] https://github.com/apache/sling/tree/trunk/contrib/nosql
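Regarding caveat 2 above (concurrent writes to the same path are your problem): the usual way to handle this on a document store is optimistic locking, i.e. a version field plus a conditional update that only succeeds if nobody wrote in between. The sketch below shows the compare-and-set idea with an in-memory stand-in; it is not the NoSQL resource provider's API, and a real implementation would go through a conditional update in the MongoDB client:

```python
class VersionedStore:
    """Toy document store with compare-and-set semantics, mimicking an
    optimistic lock via a version field on each document."""

    def __init__(self):
        self._docs = {}  # path -> (version, data)

    def read(self, path):
        """Return (version, data); missing paths read as version 0."""
        return self._docs.get(path, (0, None))

    def write(self, path, data, expected_version):
        """Succeed only if nobody else wrote since we read (optimistic lock).
        On failure the caller must re-read and retry."""
        current_version, _ = self._docs.get(path, (0, None))
        if current_version != expected_version:
            return False  # lost the race
        self._docs[path] = (current_version + 1, data)
        return True
```

Two Sling nodes that both read version 0 and then write will see exactly one write succeed; the loser re-reads and retries, which is the concurrency care-taking stefan refers to.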
Re: Not-sticky sessions with Sling?
Bertrand Delacretaz wrote
> That would be a pity, as I suppose you're starting to like Sling now ;-)

Man, you have no idea haha! I've got almost every dev in the office all excited about this now haha. However, it seems our hands are tied.

I wrote local consistency test scripts which POST and immediately GET a property, checking for consistency. Results on a 2-member Sling cluster and localhost MongoDB:

- 0% consistency with a 50ms delay between POST and GET
- 35% to 50% consistency with a 1-second delay between POST and GET
- 90% consistency with a 2-second delay
- 98% to 100% consistency after a 3-second delay

So yes, you are all correct. True, we could use sticky sessions to avoid inconsistency... but only until we scale our server farm up or down, which we do daily. So sticky sessions don't really solve anything for us.

If you already understand how scaling nullifies the benefit of sticky sessions, you can skip past this paragraph and move on to the next: Each time we scale, users will lose their "stickiness." We have thousands of write users ("authors"), hundreds of them concurrent. Compare that to typical AEM projects, which have fewer than 10 authors, and rarely more than 1 concurrently (I've got several global-scale AEM implementations under my belt). For us, it's a requirement that we add or remove app servers multiple times per day, optimizing between AWS costs and performance. Each time we remove an instance, those users will go to a new Sling instance and experience the inconsistency. Each time we add an instance, we will invalidate all stickiness, and users will get re-assigned to a new Sling instance and experience the inconsistency. If we don't do this invalidation and re-assignment on scaling up, it can potentially take hours for a scale-up to positively impact an overloaded cluster where all users are permanently stuck to their current app server instance.

As you can see, we need to deal with the inconsistency problem regardless of whether we use sticky sessions.
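A consistency test like the one described above reduces to: POST a unique value, wait, GET it back, compare, repeat, and report the hit rate. Here is a sketch of that harness with the HTTP calls injected as plain functions, so the measurement logic is independent of any particular Sling endpoint (against a live cluster, `post` and `get` would be small wrappers around urllib hitting a property path through the load balancer; the exact URLs and payloads are up to you):

```python
import time
import uuid

def measure_consistency(post, get, delay_seconds, trials=100):
    """POST a unique value, wait delay_seconds, GET it back, and return the
    fraction of trials where the GET saw the freshly written value.
    post(value) and get() are injected so this works against any store."""
    hits = 0
    for _ in range(trials):
        value = uuid.uuid4().hex  # unique per trial so stale reads are detectable
        post(value)
        time.sleep(delay_seconds)
        if get() == value:
            hits += 1
    return hits / trials
```

Sweeping `delay_seconds` over 0.05, 1, 2, and 3 seconds reproduces the kind of curve reported above (0%, ~35-50%, ~90%, ~100%).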
I have some ideas, but none are appealing, and I would benefit greatly from your knowledge:

1) Race condition. If this delay to "catch up" to the latest revision is mostly predictable, doesn't grow as the repo grows in size, and doesn't change due to other variables, we can measure it and then account for it reliably with user feedback (loading screen or whatever). This *might* be a race condition we can live with. My results above show as much as 3 or 4 seconds to "catch up." I must know what determines the duration of this revision catch-up time. Is it a function of repo size? Does the delay grow as the repo grows? Does the delay grow as usage increases? Does the delay grow as the number of Sling instances in the cluster grows? Does the delay grow as network latency grows? (I'm testing everything on the same machine, with practically no latency compared to a distributed production deployment.) Is there any Sling dev, familiar with the algorithm that Sling uses to select a "newer" revision, who could answer this for me? ... Perhaps it's just polling on a predictable time period! :)

2) Browser knows what revision it's on. The browser could know what JCR revision it's on, learning that revision after every POST or PUT, perhaps in some response header. When its future requests are sent to a Sling instance on an older revision, it could wait until that instance "catches up." This sounds like a horrible example of client code operating on knowledge of underlying implementation details, and we're not at all excited about the chaos of implementing it. That being said, can we programmatically check the revision that the current Sling instance is reading from?

3) "Pause" during scale-up or scale-down. Each time we add or remove a Sling instance, all users experience a "pause" screen while their new Sling instance "catches up." This is essentially the same as the race condition in #1, except we'd constrain users to only experience it when we scale up or down.
However, we are *extremely* unhappy to impact our users just because we're scaling up or down, especially when we must do so frequently. Anybody have any other ideas?

Other questions:

1) When a brand new Sling instance discovers an existing JCR (Mongo), does it automatically and immediately go to the latest head revision? Or is there some progression through the revisions, so that it takes time for the Sling instance to catch up to the latest?

2) Is there any reason, BESIDES JCR CONSISTENCY, why a Sling cluster must be deployed with sticky sessions? What other problems would we introduce by not having sticky sessions?

I seem to have used this email to track my own thoughts more than anything; my sincere thanks if you've taken the time to read the whole thing.

--
View this message in context: http://apache-sling.73963.n3.nabble.com/Not-sticky-sessions-with-Sling-tp4069530p4069709.html
Sent from the Sling - Users mailing list archive at Nabble.com.
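For illustration, idea 2 above would look roughly like this: the server stamps each write response with the revision it produced (a hypothetical X-Revision header; as far as I know Sling/Oak does not expose this out of the box), and the client refuses to trust a read from an instance still behind that revision, retrying until it catches up. A sketch of the client-side wait loop against a toy instance that stands in for one cluster node:

```python
import itertools
import time

class ToyInstance:
    """Stand-in for one cluster node that catches up one revision per poll."""
    def __init__(self, revision=0):
        self.revision = revision
    def current_revision(self):
        return self.revision
    def catch_up(self):
        self.revision += 1

def read_at_least(instance, min_revision, max_attempts=10, wait=0.0):
    """Retry until the instance has seen min_revision (taken from the last
    write's hypothetical X-Revision header); give up after max_attempts."""
    for attempt in itertools.count(1):
        if instance.current_revision() >= min_revision:
            return True  # safe to trust this instance's reads now
        if attempt >= max_attempts:
            return False  # still behind; surface an error or loading screen
        time.sleep(wait)
        instance.catch_up()  # in real life: just wait for the background sync
```

The ugliness Lance anticipates is visible even in the sketch: the client is coupled to a repository implementation detail, and the retry budget is a guess about cluster lag.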
Re: Not-sticky sessions with Sling?
My bad: CAP = consistency, availability, and partition tolerance.

Jörg

2017-01-17 19:35 GMT+01:00 Jörg Hoh:
> Hi Lance,
>
> 2017-01-17 19:19 GMT+01:00 lancedolan:
>
>> ...
>>
>> If "being eventual" is the reason we can't go stateless, then how is adobe
>> getting away with it if we know their architecture is also eventual?? What
>> am I missing? I understand that the documentation I linked is a distributed
>> segment store architecture and mine is a shared documentstore datastore, but
>> what is the REASON for them allowing a stateless (not sticky) architecture,
>> if the REASON is not eventual consistency? Both architectures are eventual.
>
> It depends a lot on your use case. For example, Facebook is also eventually
> consistent (I sometimes think that the timeline is different on every
> reload). Also, the CAP theorem says that you can choose only 2 of
> "consistency, atomicity and partition-tolerance".
>
> In the case of independent segment stores (in Adobe speak: publish
> instances, stateless load balancing) you have a lot of individual requests
> from multiple users. So you as an individual cannot decide if another user
> gets the very same content as you. And as long as this eventual consistency
> is not causing annoyances and friction on the end-user side (e.g. you hit
> an intra-site link which returns a 404), I would not consider it a problem.
> And these problems occur so rarely that many (including me and many other
> users of AEM) ignore them for daily work. But this is only valid for a
> read-only use case!
>
> The situation is different on the clustered DocumentNodeStore (in Adobe
> speak: authoring, sticky connections). Due to write skew, write operations
> will be visible with a small delay on all cluster nodes, but there it
> matters that a user sees the changes he just made. To overcome this
> limitation with the write skew, the recommendation is to use sticky
> sessions.
--
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com
Twitter: @joerghoh
Re: Not-sticky sessions with Sling?
Hi Lance,

2017-01-17 19:19 GMT+01:00 lancedolan:

> ...
>
> If "being eventual" is the reason we can't go stateless, then how is adobe
> getting away with it if we know their architecture is also eventual?? What
> am I missing? I understand that the documentation I linked is a distributed
> segment store architecture and mine is a shared documentstore datastore, but
> what is the REASON for them allowing a stateless (not sticky) architecture,
> if the REASON is not eventual consistency? Both architectures are eventual.

It depends a lot on your use case. For example, Facebook is also eventually consistent (I sometimes think that the timeline is different on every reload). Also, the CAP theorem says that you can choose only 2 of "consistency, atomicity and partition-tolerance".

In the case of independent segment stores (in Adobe speak: publish instances, stateless load balancing) you have a lot of individual requests from multiple users. So you as an individual cannot decide if another user gets the very same content as you. And as long as this eventual consistency is not causing annoyances and friction on the end-user side (e.g. you hit an intra-site link which returns a 404), I would not consider it a problem. And these problems occur so rarely that many (including me and many other users of AEM) ignore them for daily work. But this is only valid for a read-only use case!

The situation is different on the clustered DocumentNodeStore (in Adobe speak: authoring, sticky connections). Due to write skew, write operations will be visible with a small delay on all cluster nodes, but there it matters that a user sees the changes he just made. To overcome this limitation with the write skew, the recommendation is to use sticky sessions.

Jörg

--
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com
Twitter: @joerghoh
Re: Not-sticky sessions with Sling?
Ok. First of all, I GENUINELY appreciate the heck out of your time and patience!!

... and THIS is really interesting. If THIS is true:

chetan mehrotra wrote
> If you are running a cluster with Sling on Oak/Mongo then sticky
> sessions would be required due to eventual consistent nature of
> repository.

and THIS is true:

chetan mehrotra wrote
> Cluster which involves multiple datastores (tar)
> is also eventually consistent.

then why is Adobe recommending that its multi-million-dollar projects go stateless with the encapsulated token here, if those architectures are *also* eventual:
https://docs.adobe.com/docs/en/aem/6-1/administer/security/encapsulated-token.html

If "being eventual" is the reason we can't go stateless, then how is Adobe getting away with it, if we know their architecture is also eventual? What am I missing? I understand that the documentation I linked describes a distributed segment store architecture and mine is a shared documentstore architecture, but what is the REASON for them allowing a stateless (not sticky) architecture, if the REASON is not eventual consistency? Both architectures are eventual.

Again, thanks for your patience and sticking with me on this one... whoa, pun!

--
View this message in context: http://apache-sling.73963.n3.nabble.com/Not-sticky-sessions-with-Sling-tp4069530p4069698.html
Sent from the Sling - Users mailing list archive at Nabble.com.
Re: Sling Model and Request Parameters
Hi everyone,

There was a discussion back on Oct 11, 2016 about an extra injector and annotation to inject request parameters into Sling Models. I couldn't find any JIRA issue about that, and I haven't seen anything like it in the code, so I don't think this is currently in progress.

I just joined the mailing list and started developing with AEM + Sling recently, but I have quite some experience with Spring and think this would be a useful annotation, especially for handling form submits with Sling Models. If there is still interest in having that annotation, I would be willing to take care of it: I have already written the missing code and could provide a diff or pull request.

Best regards,
Christophe