Re: Not-sticky sessions with Sling?

2017-01-17 Thread Felix Meschberger
Hi Lance

Ok, so given the situation as it is (an eventually consistent repo replicating the 
Oak login token, and no ability to use sticky sessions), I suggest you go with 
something else, which does *not* need the repository for persistence.

This means you might want to investigate writing your own authentication handler, 
or look at other options in Sling, for example the old form-based login (not 
sure what its state is, though), or good ol' HTTP Basic (at some other price, 
like no support for "logout").
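
If it helps, here is a minimal sketch of that custom-handler direction, assuming a
self-contained signed cookie: the cookie name, the signature check and the
token-to-user mapping are placeholders, not a finished design.

  // Rough sketch only: a stateless Sling AuthenticationHandler that trusts a
  // self-contained, signed cookie instead of a repository-persisted token.
  // Cookie name, HMAC check and token-to-user mapping are placeholders.
  import javax.servlet.http.Cookie;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;
  import org.apache.sling.auth.core.spi.AuthenticationHandler;
  import org.apache.sling.auth.core.spi.AuthenticationInfo;
  import org.osgi.service.component.annotations.Component;

  @Component(service = AuthenticationHandler.class,
             property = { AuthenticationHandler.PATH_PROPERTY + "=/" })
  public class StatelessTokenAuthenticationHandler implements AuthenticationHandler {

      public AuthenticationInfo extractCredentials(HttpServletRequest request,
                                                   HttpServletResponse response) {
          Cookie token = findCookie(request, "stateless-login-token");
          if (token == null || !signatureIsValid(token.getValue())) {
              return null; // not ours; let other handlers (or anonymous) take over
          }
          // Map the verified token to a user id; how Oak then resolves that user
          // without a repository round-trip is the part that still needs thought.
          return new AuthenticationInfo("STATELESS_TOKEN", userIdFromToken(token.getValue()));
      }

      public boolean requestCredentials(HttpServletRequest request, HttpServletResponse response) {
          return false; // e.g. redirect to a login form here instead
      }

      public void dropCredentials(HttpServletRequest request, HttpServletResponse response) {
          // expire the cookie to implement "logout"
      }

      private Cookie findCookie(HttpServletRequest request, String name) {
          if (request.getCookies() != null) {
              for (Cookie c : request.getCookies()) {
                  if (name.equals(c.getName())) {
                      return c;
                  }
              }
          }
          return null;
      }

      private boolean signatureIsValid(String value) { return false; } // HMAC check goes here
      private String userIdFromToken(String value) { return null; }    // parse user id out of token
  }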

Regards
Felix

> On 18.01.2017 at 02:43, lancedolan wrote:
> 
> lancedolan wrote
>> I must know what determines the duration of this revision catch-up time
>> ... 
> 
> While I don't know where to look in src code to answer this, I did run a
> very revealing experiment.
> 
> It pretty much always takes 1 second exactly for a Sling instance to get the
> latest revision, and thus the latest data. When not 1 second, it takes 2
> seconds exactly. If you increase load on the server, the likelihood of
> taking 2 seconds increases, and you also begin to see it take exactly 3
> seconds in some rare cases. Increasing load increases the number of seconds
> before a "sync," however it's always near-exactly a second interval.
> 
> It seems impossible for this to be a natural coincidence - I smell a setting
> somewhere (or perhaps a hardcoded value) which is telling Sling to check for
> the latest JCR revision at 1-second intervals. When that window can't be hit,
> it checks on the next second interval, and so on. 
> 
> Is there a Sling dev who can tell me whether this is configurable? I have a
> load of questions about this discovery:
> 
> - Am I wrong? (I'll be shocked)
> - Perhaps we can speed it up? 
> - What event is causing it to "miss the window" and wait until the next
> 1-second sync interval?
> - If we do decrease the interval, will that just increase the likelihood of
> taking more intervals anyhow?
> - Is there a maximum number of 1-second intervals before the thing just
> gets the latest?
> 
> progress.
> 
> 
> 



RE: Not-sticky sessions with Sling?

2017-01-17 Thread lancedolan
This is tempting, but my dev instinct tells me that we won't have the
time to solve all the unsolved problems in that effort. Thank you for suggesting
it though :)





Re: Not-sticky sessions with Sling?

2017-01-17 Thread lancedolan
lancedolan wrote
> I must know what determines the duration of this revision catch-up time
> ... 

While I don't know where to look in src code to answer this, I did run a
very revealing experiment.

It pretty much always takes 1 second exactly for a Sling instance to get the
latest revision, and thus the latest data. When not 1 second, it takes 2
seconds exactly. If you increase load on the server, the likelihood of
taking 2 seconds increases, and you also begin to see it take exactly 3
seconds in some rare cases. Increasing load increases the number of seconds
before a "sync," however it's always near-exactly a second interval.

It seems impossible for this to be a natural coincidence - I smell a setting
somewhere (or perhaps a hardcoded value) which is telling Sling to check for
the latest JCR revision at 1-second intervals. When that window can't be hit,
it checks on the next second interval, and so on. 
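
My current guess (completely unverified) is that this is Oak's DocumentNodeStore
background-read delay; built programmatically it looks roughly like the snippet
below, where setAsyncDelay() defaults to 1000 ms. Whether that is really the
knob, and whether the Sling launchpad exposes it anywhere, is exactly what I'm
asking:

  // Unverified guess: Oak's DocumentNodeStore polls for external changes on a
  // background-read delay (asyncDelay, default 1000 ms). This is only the
  // programmatic form; where a Sling launchpad would expose it is unknown to me.
  import com.mongodb.DB;
  import com.mongodb.MongoClient;
  import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
  import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;

  public class AsyncDelayGuess {
      public static void main(String[] args) {
          DB db = new MongoClient("localhost", 27017).getDB("sling");
          DocumentNodeStore store = new DocumentMK.Builder()
                  .setMongoDB(db)
                  .setAsyncDelay(500) // background read/write delay in milliseconds
                  .getNodeStore();
          // ... hand the store to the repository setup, then shut it down cleanly
          store.dispose();
      }
  }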

Is there a Sling dev who can tell me whether this is configurable? I have a
load of questions about this discovery:

- Am I wrong? (I'll be shocked)
- Perhaps we can speed it up? 
> - What event is causing it to "miss the window" and wait until the next
> 1-second sync interval?
- If we do decrease the interval, will that just increase the likelihood of
taking more intervals anyhow?
> - Is there a maximum number of 1-second intervals before the thing just
> gets the latest?

progress.





RE: Not-sticky sessions with Sling?

2017-01-17 Thread Stefan Seifert
not sure if this is of any help for your use case - but do you need the full JCR 
feature set and complexity underneath Sling, or only a Sling cluster + storage in 
MongoDB?

if you need only basic resource read and write features via the Sling API you 
might bypass JCR completely and directly use a NoSQL resource provider for 
MongoDB, see [1] and [2].

but please be aware that:
1. the code might not be production-ready for heavy usage yet (not sure how 
much it is used)
2. it does not add any support for cluster synchronization etc.; if multiple 
nodes write to the same path you have to take care of concurrency 
yourself
3. the code is not yet migrated to the latest ResourceProvider SPI from Sling 
9-SNAPSHOT, but should still run with it
4. it has no built-in support for ACLs etc.; you have to take care of this 
yourself

this resource provider is only a thin layer above the MongoDB Java client, so 
it should be possible to have full control over which MongoDB features are used 
and in which way.
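
to give an idea what this looks like, here is a minimal (untested) sketch of
writing and reading a resource purely through the Sling Resource API on top of
that provider - the /nosql mount point and the way the ResourceResolver is
obtained are just assumptions for the example:

  // minimal, untested sketch: plain Sling Resource API usage that would sit on
  // top of the NoSQL MongoDB resource provider. the /nosql mount point and how
  // you obtain the ResourceResolver are assumptions for the example.
  import java.util.Collections;
  import java.util.Map;
  import org.apache.sling.api.resource.Resource;
  import org.apache.sling.api.resource.ResourceResolver;
  import org.apache.sling.api.resource.ValueMap;

  public class NoSqlResourceExample {

      public void writeAndRead(ResourceResolver resolver) throws Exception {
          Resource parent = resolver.getResource("/nosql"); // provider root (assumed)
          Map<String, Object> props =
                  Collections.<String, Object>singletonMap("title", "hello");
          resolver.create(parent, "demo", props); // plain resource write, no JCR involved
          resolver.commit();

          ValueMap read = resolver.getResource("/nosql/demo").getValueMap();
          System.out.println(read.get("title", String.class));
      }
  }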

stefan

[1] http://sling.apache.org/documentation/bundles/nosql-resource-providers.html
[2] https://github.com/apache/sling/tree/trunk/contrib/nosql




Re: Not-sticky sessions with Sling?

2017-01-17 Thread lancedolan
Bertrand Delacretaz wrote
> That would be a pity, as I suppose you're starting to like Sling now ;-)

Man, you have no idea haha! I've got almost every dev in the office all
excited about this now haha. However, it seems our hands are tied.

I wrote local consistency test scripts which POST and immediately GET a
property, checking for consistency. 
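
The scripts are nothing fancy - roughly the idea below, with the /content/test
path, the single load-balancer URL and the admin credentials as stand-ins for
our actual setup:

  // Sketch of the probe (assumptions: a Sling instance behind the non-sticky
  // load balancer at http://localhost:8080, admin:admin credentials, and a
  // throwaway /content/test node; vary delayMs per run).
  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.nio.charset.StandardCharsets;
  import java.util.Base64;
  import java.util.Scanner;

  public class ConsistencyProbe {

      public static void main(String[] args) throws Exception {
          String auth = "Basic " + Base64.getEncoder()
                  .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
          long delayMs = 50; // vary: 50, 1000, 2000, 3000
          int consistent = 0, runs = 100;

          for (int i = 0; i < runs; i++) {
              String value = "v" + System.nanoTime();

              // POST the property (handled by the default Sling POST servlet)
              HttpURLConnection post = (HttpURLConnection)
                      new URL("http://localhost:8080/content/test").openConnection();
              post.setRequestMethod("POST");
              post.setRequestProperty("Authorization", auth);
              post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
              post.setDoOutput(true);
              post.getOutputStream().write(("prop=" + value).getBytes(StandardCharsets.UTF_8));
              post.getResponseCode();

              Thread.sleep(delayMs);

              // GET it back, possibly served by the other cluster member
              HttpURLConnection get = (HttpURLConnection)
                      new URL("http://localhost:8080/content/test.json").openConnection();
              get.setRequestProperty("Authorization", auth);
              if (get.getResponseCode() == 200) {
                  String body = new Scanner(get.getInputStream(), "UTF-8")
                          .useDelimiter("\\A").next();
                  if (body.contains(value)) {
                      consistent++;
                  }
              }
          }
          System.out.println("consistent reads: " + consistent + "/" + runs);
      }
  }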

Results on a 2-member Sling cluster and localhost mongodb:

- 0% consistency with a 50ms delay between POST and GET
- 35% to 50% consistency with a 1-second delay between POST and GET
- 90% consistency with a 2-second delay
- 98% to 100% consistency with a 3-second delay.

So yes, you are all correct. 

True, we could use sticky sessions to avoid inconsistency... but only until
we scale our server farm up or down, which we do daily. So sticky
sessions don't really solve anything for us.

If you already understand how scaling nullifies the benefit of sticky
sessions, you can skip past this paragraph and move on to the next:
Each time we scale, users will lose their "stickiness." We have thousands of
write users ("authors"). Hundreds concurrently. Compare that to typical AEM
projects have less than 10 authors, and rarely more than 1 concurrently
(I've got several global-scale AEM implementations under my belt). For us,
it's a requirement that we add or remove app servers multiple times per day,
optimizing between AWS costs and performance. Each time we remove an
instance, those users will go to a new Sling instance, and experience the
inconsistency. Each time we add an instance, we will invalidate all
stickiness and users will get re-assigned to a new Sling instance, and
experience the inconsistency. If we don't do this invalidation and
re-assignment on scaling-up, it can takes hours potentially for a scale-up
to positively impact an overloaded cluster where all users are permanently
stuck to their current app server instance.

As you can see, we need to deal with the inconsistency problem, regardless
of whether we use sticky sessions.

I have some ideas, but none of them are appealing, and I would benefit greatly
from your knowledge:

1) Race condition
If this delay to "catch up" to the latest revision is mostly predictable, if it
doesn't grow as the repo grows in size, and if it doesn't change due to other
variables, we can measure it and then account for it reliably with
user feedback (a loading screen or whatever). This *might* be a race condition
we can live with. 

My results above show as much as 3 or 4 seconds to "catch up." I must know
what determines the duration of this revision catch-up time. Is it a
function of repo size? Does the delay grow as the repo size grows? Does it
grow as usage increases? Does it grow as the number of Sling instances in the
cluster grows? Does it grow as network latency grows? (I'm testing everything
on the same machine, with practically no latency compared to a distributed
production deployment.) Is there any Sling dev familiar with the algorithm
that Sling uses to select a "newer" revision who could answer this for me?
... perhaps it's just polling on a predictable time period! :) 

2) Browser knows what revision it's on.
The browser could know what JCR revision it's on, learning that revision
after every POST or PUT, perhaps in some response header. When its future
requests are sent to a Sling instance on an older revision, it could wait
until that instance "catches up." This sounds like a horrible example of
client code operating on knowledge of underlying implementation details, and
we're not at all excited about the chaos of implementing it (a rough sketch of
the server side of this idea follows the third idea below). That being said,
can we programmatically check the revision that the current Sling instance
is reading from?

3) "Pause" during scale-up or scale-down.
Each time we add or remove a Sling instance, all users experience a "pause"
screen while their new Sling instance "catches up." This is essentially the
same as the race condition in #1, except we'd constrain users to only
experience this when we scale up or down. However, we are *extremely*
unhappy to impact our users just because we're scaling up or down,
especially when we must do so frequently. 
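
For what it's worth, the server side of idea #2 might be no more than a request
filter that stamps each response with whatever revision token we can obtain.
This is purely hypothetical, because obtaining that token is exactly the open
question above:

  // Hypothetical sketch for idea #2: a Sling request filter that exposes a
  // repository revision token to the browser in a response header. How to
  // actually obtain the current Oak revision from Sling code is the open
  // question; currentRevisionToken() is just a placeholder.
  import java.io.IOException;
  import javax.servlet.Filter;
  import javax.servlet.FilterChain;
  import javax.servlet.FilterConfig;
  import javax.servlet.ServletException;
  import javax.servlet.ServletRequest;
  import javax.servlet.ServletResponse;
  import javax.servlet.http.HttpServletResponse;
  import org.osgi.service.component.annotations.Component;

  @Component(service = Filter.class, property = { "sling.filter.scope=REQUEST" })
  public class RevisionHeaderFilter implements Filter {

      public void init(FilterConfig filterConfig) {
      }

      public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
              throws IOException, ServletException {
          // stamp every response so the browser can compare revisions later
          ((HttpServletResponse) response).setHeader("X-Repo-Revision", currentRevisionToken());
          chain.doFilter(request, response);
      }

      public void destroy() {
      }

      private String currentRevisionToken() {
          return "unknown"; // placeholder: the revision lookup is what we'd need from Oak/Sling
      }
  }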

Anybody have any other ideas?

Other questions:

1) When a brand new Sling instance discovers an existing JCR (Mongo), does
it automatically and immediately go to the latest head revision? Or is there
some progression through the revisions, and it takes time for the Sling
instance to catch up to the latest?

2) Is there any reason, BESIDES JCR CONSISTENCY, why a Sling cluster must be
deployed with sticky sessions? What other problems would we introduce by not
having sticky sessions?

I seem to have used this email to track my own thoughts more than anything;
my sincere thanks if you've taken the time to read the whole thing.






Re: Not-sticky sessions with Sling?

2017-01-17 Thread Jörg Hoh
My bad:
CAP = consistency, availability and partition-tolerance.

Jörg

2017-01-17 19:35 GMT+01:00 Jörg Hoh :

> Hi Lance,
>
> 2017-01-17 19:19 GMT+01:00 lancedolan :
>
>> ...
>>
>> If "being eventual" is the reason we can't go stateless, then how is adobe
>> getting away with it if we know their architecture is also eventual?? What
>> am I missing? I understand that the documentation I linked is a
>> distributed
>> segment store architecture and mine is a share documentstore datastore,
>> but
>> what is the REASON for them allowing a stateless (not sticky)
>> architecture,
>> if the REASON is not eventual consistency ? Both architectures are
>> eventual.
>>
>>
> It depends a lot on your use case. For example, Facebook is also eventually
> consistent (I sometimes think that the timeline is different on every
> reload). Also, the CAP theorem says that you can choose only 2 of
> "consistency, atomicity and partition-tolerance".
>
> In the case of independent segment stores (in Adobe speak: publish
> instances, stateless load balancing) you have a lot of individual requests
> from multiple users. So you as an individual cannot tell whether another user
> gets the very same content as you. And as long as this eventual consistency
> is not causing annoyances and friction on the end-user side (e.g. you hit an
> intra-site link which results in a 404), I would not consider it a
> problem. And these problems occur so rarely that many (including me and
> many other users of AEM) ignore them in daily work. But this is only valid
> for a read-only use case!
>
> The situation is different on the clustered DocumentNodeStore (in Adobe
> speak: authoring, sticky connections). Due to write skew, write operations
> only become visible on all cluster nodes with a small delay. But there
> it matters that a user sees the changes he just made. And to overcome this
> limitation of the write skew, the recommendation is to use
> sticky sessions.
>
>
>
> Jörg
>
>
> --
> Cheers,
> Jörg Hoh,
>
> http://cqdump.wordpress.com
> Twitter: @joerghoh
>



-- 
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com
Twitter: @joerghoh


Re: Not-sticky sessions with Sling?

2017-01-17 Thread Jörg Hoh
Hi Lance,

2017-01-17 19:19 GMT+01:00 lancedolan :

> ...
>
> If "being eventual" is the reason we can't go stateless, then how is adobe
> getting away with it if we know their architecture is also eventual?? What
> am I missing? I understand that the documentation I linked is a distributed
> segment store architecture and mine is a share documentstore datastore, but
> what is the REASON for them allowing a stateless (not sticky) architecture,
> if the REASON is not eventual consistency ? Both architectures are
> eventual.
>
>
It depends a lot on your use case. For example, Facebook is also eventually
consistent (I sometimes think that the timeline is different on every
reload). Also, the CAP theorem says that you can choose only 2 of
"consistency, atomicity and partition-tolerance".

In the case of independent segment stores (in Adobe speak: publish
instances, stateless load balancing) you have a lot of individual requests
from multiple users. So you as an individual cannot tell whether another user
gets the very same content as you. And as long as this eventual consistency is
not causing annoyances and friction on the end-user side (e.g. you hit an
intra-site link which results in a 404), I would not consider it a
problem. And these problems occur so rarely that many (including me and
many other users of AEM) ignore them in daily work. But this is only valid
for a read-only use case!

The situation is different on the clustered DocumentNodeStore (in Adobe
speak: authoring, sticky connections). Due to write skew, write operations
only become visible on all cluster nodes with a small delay. But there
it matters that a user sees the changes he just made. And to overcome this
limitation of the write skew, the recommendation is to use
sticky sessions.



Jörg


-- 
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com
Twitter: @joerghoh


Re: Not-sticky sessions with Sling?

2017-01-17 Thread lancedolan
Ok, first of all - I GENUINELY appreciate the heck out of your time and
patience!! 

... and THIS is really interesting:

If THIS is true:


chetan mehrotra wrote
> If you are running a cluster with Sling on Oak/Mongo then sticky
> sessions would be required due to the eventually consistent nature of
> the repository.

and THIS is true:


chetan mehrotra wrote
> A cluster which involves multiple datastores (tar)
> is also eventually consistent. 

Then why is Adobe recommending that its multi-million-dollar projects go
stateless with the encapsulated token here, if those architectures are
*also* eventually consistent:
https://docs.adobe.com/docs/en/aem/6-1/administer/security/encapsulated-token.html

If "being eventual" is the reason we can't go stateless, then how is adobe
getting away with it if we know their architecture is also eventual?? What
am I missing? I understand that the documentation I linked is a distributed
segment store architecture and mine is a share documentstore datastore, but
what is the REASON for them allowing a stateless (not sticky) architecture,
if the REASON is not eventual consistency ? Both architectures are eventual.

Again, thanks for your patience and sticking with me on this one... whoa
pun!





Re: Sling Model and Request Parameters

2017-01-17 Thread Christophe Jelger
Hi everyone,

There was a discussion back on Oct 11, 2016 about an extra injector+annotation 
to inject request parameters into Sling models.
I couldn’t find any JIRA issue about that, and I haven’t seen anything like 
that in the code so I don’t think this is currently in progress.

I just joined the mailing list and started developing with AEM + Sling 
recently, but I have quite a bit of experience with Spring and think this would 
be a useful annotation, especially for handling form submissions with Sling 
Models.
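
To make it concrete, below is the kind of model such an annotation would enable.
The @RequestParameter annotation in the sketch is made up for illustration (it
is not part of Sling Models today), so treat its name and shape as placeholders:

  // Illustration only: this @RequestParameter annotation is invented for the
  // example and is not part of Sling Models today; a real version would also
  // carry the injector-binding meta-annotations of the new injector.
  import java.lang.annotation.ElementType;
  import java.lang.annotation.Retention;
  import java.lang.annotation.RetentionPolicy;
  import java.lang.annotation.Target;
  import org.apache.sling.api.SlingHttpServletRequest;
  import org.apache.sling.models.annotations.Model;

  @Target(ElementType.FIELD)
  @Retention(RetentionPolicy.RUNTIME)
  @interface RequestParameter {
  }

  // A model using the hypothetical annotation to pick up form-submit values.
  @Model(adaptables = SlingHttpServletRequest.class)
  public class ContactFormModel {

      // would be populated from the "email" request parameter of the submitted form
      @RequestParameter
      private String email;

      public String getEmail() {
          return email;
      }
  }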

If there is still interest in having that annotation, I would be willing to take 
care of it: I have already written the missing code and could provide a diff 
or pull request.

Best regards,
Christophe