Hi Sebastien

I thought I would give you some quick input from our experience supporting both online and offline use. We work in conditions where networks are poor and unreliable, and after quite some pain and trial and error we have reverted to using the replication mechanism to save data even for users working online. In our case, a business process and workflow tool, it is absolutely essential that all documents arrive in the correct version. So we built an ACID-style transaction engine: when the user saves, it triggers a limited replication based on document ids. That has given us orders of magnitude greater stability on poor networks. Each user saves into a per-user database and replicates to the server; the transaction engine then processes the documents into the actual target database, completing the save.
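
Roughly, the save-triggered step looks something like the sketch below (Python with the requests package; the server URL, credentials, and database names are illustrative, not our actual code). It posts an id-limited replication to CouchDB's _replicate endpoint:

    import requests  # third-party HTTP client, assumed available

    COUCH = "http://localhost:5984"  # illustrative server URL
    AUTH = ("admin", "secret")       # illustrative credentials

    def replicate_saved_docs(user_db, doc_ids):
        """Push only the just-saved documents from the per-user
        database to a server-side staging database, where the
        transaction engine picks them up."""
        body = {
            "source": user_db,
            "target": "staging",  # hypothetical name for the inbox DB
            "doc_ids": doc_ids,   # limit the replication to these ids
        }
        r = requests.post(f"{COUCH}/_replicate", json=body, auth=AUTH)
        r.raise_for_status()
        return r.json()

    # e.g. right after the user saves document "order-1234":
    # replicate_saved_docs("userdb-willem", ["order-1234"])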

Regards

Willem

On 2019/10/17 10:06, Sebastien wrote:
In the end, we've decided not to rely on filtered replication for our use
case.

The issue is that we will not only support an offline-first mode where a
filtered copy of the data will be retrieved, but there will also be an
online-only mode (e.g., when accessing the app from an untrusted device,
where the users might prefer not to store anything locally). In the
online-only mode, the users will need to access the database directly, but
that access will also need to be filtered, and I'm not sure there's a safe
way to do that.

What we've chosen to do now is to keep the information colocated in _users
and to go through an API to retrieve the subset of information that is
required (e.g., n properties of all members of database X). This works
fine in the online-only scenario, but also for the offline-first one, since
we can persist the information after having retrieved it once. We also keep
better control over what happens with the data (to some extent) and can
wipe it if/when necessary.
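
As a sketch of what that endpoint can look like (Flask and requests are
assumptions, not necessarily our stack; field and database names are
illustrative): it reads a database's _security object and returns only a
whitelisted subset of each member's _users document:

    import requests
    from flask import Flask, jsonify

    app = Flask(__name__)
    COUCH = "http://localhost:5984"  # illustrative
    AUTH = ("admin", "secret")       # illustrative
    PUBLIC_FIELDS = ("name", "fullname", "avatar_url")  # the "n properties"

    @app.route("/api/databases/<db>/members")
    def members(db):
        # (Authenticating the caller and checking that they may see
        # this database is omitted for brevity.)
        sec = requests.get(f"{COUCH}/{db}/_security", auth=AUTH).json()
        names = sec.get("members", {}).get("names", [])
        profiles = []
        for name in names:
            doc = requests.get(
                f"{COUCH}/_users/org.couchdb.user:{name}", auth=AUTH
            ).json()
            # Expose only the whitelisted properties.
            profiles.append({k: doc.get(k) for k in PUBLIC_FIELDS})
        return jsonify(profiles)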

This issue is rather hairy from a privacy protection point of view, but
such use cases are critical for multi-user offline-first systems.

Thanks again for the useful feedback!

kr,
Sébastien

On Sun, Oct 13, 2019 at 10:34 AM Stefan Klein <[email protected]> wrote:

Hi Sebastien,

On Sat, Oct 12, 2019 at 15:55, Sebastien <[email protected]> wrote:
Taking that as a starting point, one option could indeed be, as you
propose, to copy a subset of that "persons" database into each other
database (of course again only a subset of the info, ideally controllable
by the end users). One problem that I imagine with that is mainly the
amount of incurred data duplication.
With duplication, it needs to be absolutely clear which copy is the
authoritative version of the document and which are just copies; then it's
manageable.

For instance, imagine that persons contains [A, B, C, D, E, F], then:
- If [A, B, C] have access to database X, then those users should have a
copy of [A, B, C] locally
- If [A, D, E] have access to database Y, then those users should have a
copy of [A, D, E] locally
Consequently, A should have [A, B, C, D, E] in his local "persons" database
copy. If at some point E is removed from database Y, then user A should no
longer have E in his local database.

Does that sound like something that can be handled through filtered
replication?
I am not aware of any way to delete documents in the target that still
exist in the source.
But if you have a copy of E in Y and delete E from Y at a later point,
this delete will be replicated to the local DB too (if you don't filter
out deleted documents).
Since you probably have some kind of management system to remove E from
Y's _security, you could either delete E's profile from Y in the same
step (see the sketch below) or have a cron job or similar to remove the
redundant profiles from the databases.
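
A rough sketch of that "same step" (Python with requests; URL, credentials,
and the profile-document id scheme are made up for illustration): it strips
the user from the database's _security members and deletes the redundant
profile copy, so the deletion tombstone replicates down to the local DBs:

    import requests  # assumed available

    COUCH = "http://localhost:5984"  # illustrative
    AUTH = ("admin", "secret")       # illustrative

    def revoke_access(db, username):
        # 1. Remove the user from the database's members list.
        sec_url = f"{COUCH}/{db}/_security"
        sec = requests.get(sec_url, auth=AUTH).json()
        names = sec.setdefault("members", {}).setdefault("names", [])
        if username in names:
            names.remove(username)
            requests.put(sec_url, json=sec, auth=AUTH).raise_for_status()
        # 2. Delete the profile copy; the _deleted revision then
        #    replicates to every local database that still holds it.
        doc_url = f"{COUCH}/{db}/profile:{username}"  # hypothetical id scheme
        resp = requests.get(doc_url, auth=AUTH)
        if resp.ok:
            rev = resp.json()["_rev"]
            requests.delete(doc_url, params={"rev": rev}, auth=AUTH)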

One possible issue here though:
If E gains access to Y again while E's profile hasn't changed, the former
_deleted revision is still the "current" revision, and E's profile stays
_deleted in database Y.
You would have to modify E's person document in the persons database so
that it gets a new revision (a sketch follows).
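
For example, something like this "touch" (again a Python sketch; the field
name and id scheme are illustrative) would force a fresh revision that the
next replication into Y picks up, resurrecting the profile there:

    import time
    import requests

    COUCH = "http://localhost:5984"  # illustrative
    AUTH = ("admin", "secret")       # illustrative

    def touch_person(person_id):
        """Rewrite the person document so CouchDB assigns a new _rev;
        the new, non-deleted revision then wins over the _deleted leaf
        in the destination database after replication."""
        url = f"{COUCH}/persons/{person_id}"
        doc = requests.get(url, auth=AUTH).json()
        doc["touched_at"] = int(time.time())  # any change yields a new _rev
        requests.put(url, json=doc, auth=AUTH).raise_for_status()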

I hope that my system will be able to handle hundreds/thousands of
databases with 1-100 users in each database; each user having access to
~1-10 databases, thus potentially having access to ~1K user documents
locally (this is really just an early guesstimate).
Can't comment on pouchdb.
From my experience CouchDB doesn't care about how many databases exist;
as long as there is no current access to a database, it is just a file in
the file system.

The system currently doesn't allow users to manage their own profile, but
it's indeed a requirement. I'll probably only allow users to modify their
own information while online, through a dedicated API endpoint that checks
the user's identity instead of letting them write directly to the "persons"
database.
With this you do have a clear dataflow:
Users modify their profile via the API, which changes the persons database.
Documents from the persons database are distributed to the destination
databases.
So there should be no issue with data duplication. A sketch of that write
path follows.

regards,
Stefan
