Re: Summarizing - key areas of discomfort around CouchDB?

Dale Harvey Thu, 25 Jul 2013 01:50:58 -0700

Lloyd asked me to send along some first thoughts and a demo. Unfortunately,
due to Firefox OS commitments, I haven't had a whole lot of time to build
the demo; however, yesterday freed up, and if I can get Persona behaving
nicely inside the FxOS browser I should have one before the meeting or, at
the latest, tomorrow. Apologies, since I did promise that a while ago.

First I'll introduce myself to get any bias out of the way. Prior to
Mozilla I worked at Couch.io -> CouchOne, the CouchDB companies with all the
founders of the project. This was then bought by Couchbase (an entirely
different database), which I worked on for a few years before moving to
Mozilla. While at Mozilla I started a side project reimplementing the full
CouchDB API in JavaScript (http://pouchdb.com/), which uses various storage
engines (IndexedDB / LevelDB / WebSQL / HTTP), and I am a (not very active)
CouchDB contributor. At Mozilla I work on Firefox OS, am on the browser
team, and mostly work somewhere between the front end and Gecko.

There were 3 main issues mentioned here that I wanted to address.

For the Couch server, I don't really believe having attack mitigation,
auth, etc. sitting in front of CouchDB servers is an issue; this is very
much standard practice. Pretty much the only way CouchDB is exposed
directly as a web-facing server (in production) is when it is in a secure
mode and a token server has authorised the client beforehand. These are
problems that will have to be solved whether or not Couch is involved.

As for not being able to control local space usage, I am not entirely sure
I understand what you mean here, but there are a variety of ways to control
what gets saved: functionality built into Couch (filtered replication),
external processes (purgers / compactors), and trivial modifications to
Couch (or a client).
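
To make the filtered-replication option concrete, here is a minimal sketch
in Python of the idea: the replicator only copies documents for which a
filter predicate returns true, so a space-constrained client decides what
it stores. This is an in-memory illustration, not CouchDB's replicator
(real CouchDB filter functions are JavaScript functions stored in a design
document); the document fields and the `filtered_replicate` /
`keep_active_bookmarks` names are hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def filtered_replicate(source: List[Doc], target: Dict[str, Doc],
                       keep: Callable[[Doc], bool]) -> int:
    """Copy into `target` only the source docs that pass `keep`.

    Returns how many docs were written. Docs rejected by the filter
    never reach the client, which is what bounds local storage.
    """
    written = 0
    for doc in source:
        if keep(doc):
            target[str(doc["_id"])] = doc
            written += 1
    return written


def keep_active_bookmarks(doc: Doc) -> bool:
    """Hypothetical policy: keep bookmarks, drop history and archived items."""
    return doc.get("type") == "bookmark" and not doc.get("archived", False)


server = [
    {"_id": "a", "type": "bookmark"},
    {"_id": "b", "type": "history"},
    {"_id": "c", "type": "bookmark", "archived": True},
]
local: Dict[str, Doc] = {}
count = filtered_replicate(server, local, keep_active_bookmarks)
print(count, sorted(local))  # 1 ['a']
```

The policy lives entirely in the predicate, so changing what a device keeps
means swapping the filter rather than changing the sync machinery.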

Now for the data models and protocol, which I would / do have concerns
around. Couch has various reasonable solutions for representing trees, but
it is obviously a data model not designed for them. I have previously seen
references to a treeSync 'on top' of Couch; however, I can only really see
that working with collections implemented as a batch edit / materialised
view while still using the existing per-object / per-document sync.
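
One way to read the "collections as materialised views" idea, sketched in
Python under stated assumptions: every bookmark or folder is its own
document carrying a `parent` pointer and a `pos` field (both hypothetical
field names), each document syncs independently, and the tree is rebuilt
locally as a derived parent -> children index. Because ordering lives on
each node rather than in a shared folder document, edits to different
nodes touch different documents.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_children_view(docs: List[dict]) -> Dict[Optional[str], List[str]]:
    """Materialise a parent -> ordered child ids index from flat node docs.

    Deleted docs are skipped. Siblings are ordered by their own `pos`
    field, with the doc id as a tie-breaker for concurrent inserts.
    """
    children: Dict[Optional[str], List[dict]] = defaultdict(list)
    for doc in docs:
        if doc.get("_deleted"):
            continue
        children[doc.get("parent")].append(doc)

    def order(kids: List[dict]) -> List[str]:
        ranked = sorted(kids, key=lambda d: (d.get("pos", 0), d["_id"]))
        return [d["_id"] for d in ranked]

    return {parent: order(kids) for parent, kids in children.items()}


docs = [
    {"_id": "root", "parent": None},
    {"_id": "toolbar", "parent": "root", "pos": 1},
    {"_id": "menu", "parent": "root", "pos": 0},
    {"_id": "mdn", "parent": "toolbar", "pos": 0},
    {"_id": "old", "parent": "toolbar", "pos": 1, "_deleted": True},
]
view = build_children_view(docs)
print(view["root"])  # ['menu', 'toolbar']
```

The tree shape exists only in the derived view; replication still moves
individual node documents, which is the per-document sync described above.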

The replication protocol currently shared by Couch and its clients
definitely has some major shortcomings, and I would very much not expect it
to be used as is: it's terribly inefficient when replicating an existing
data set to a new profile, as well as when resuming long-disconnected
peers. There is a difference between implementation and protocol
definition, though, and I have yet to see whether there are irreconcilable
flaws in the protocol or implementation issues that can be fixed.
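
To illustrate why a fresh profile or a long-disconnected peer is the
expensive case, here is a toy model in Python loosely patterned on the
general shape of Couch-style replication: read changes since a saved
checkpoint, ask the target which revisions it lacks, copy those, then
record a new checkpoint. The `Db` class, `replicate` function, batch size
and round-trip accounting are all assumptions of this sketch, and unlike a
real changes feed (which lists each document once, at its latest update)
the toy log keeps every write. What it shows is that the work is driven by
how far behind the checkpoint is, so starting from checkpoint 0 walks the
whole log even when nothing is actually missing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Db:
    """Toy store: an append-only changes log plus the revisions held per doc."""
    changes: List[Tuple[int, str, str]] = field(default_factory=list)
    revs: Dict[str, Set[str]] = field(default_factory=dict)

    def write(self, doc_id: str, rev: str) -> None:
        self.changes.append((len(self.changes) + 1, doc_id, rev))
        self.revs.setdefault(doc_id, set()).add(rev)


def replicate(source: Db, target: Db, checkpoint: int,
              batch: int = 2) -> Tuple[int, int]:
    """Push source changes after `checkpoint` to target, one batch at a time.

    Returns (new_checkpoint, round_trips). Each batch costs one changes read
    and one "which revs are missing?" query, plus one transfer if anything
    is actually missing.
    """
    round_trips = 0
    pending = [c for c in source.changes if c[0] > checkpoint]
    for i in range(0, len(pending), batch):
        chunk = pending[i:i + batch]
        round_trips += 2
        missing = [(d, r) for _, d, r in chunk
                   if r not in target.revs.get(d, set())]
        if missing:
            round_trips += 1
            for doc_id, rev in missing:
                target.write(doc_id, rev)
        checkpoint = chunk[-1][0]  # safe to resume from here next time
    return checkpoint, round_trips


src, tgt = Db(), Db()
for n in range(5):
    src.write(f"doc{n}", "1-a")
print(replicate(src, tgt, 0))  # fresh profile: (5, 9)
src.write("doc0", "2-b")
print(replicate(src, tgt, 5))  # resumed from checkpoint: (6, 3)
print(replicate(src, tgt, 0))  # checkpoint lost: (6, 6), nothing copied
```

Whether costs like these are inherent to the protocol or just to a given
implementation (batching, bulk transfers) is exactly the open question
raised above.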

My approach for building PouchDB has been to copy the core storage +
protocol of CouchDB, flaws and all, with the aim of getting to a known good
state and starting optimisations from there. Obviously sync is a terribly
hard problem, and despite CouchDB having a fairly simple protocol I have
still seen 4 years of edge cases found in everything from the protocol to
the assumptions that the disk storage has to uphold (as well as 4 years of
failed attempts at building a decent custom sync solution).

I am not particularly good at 'inventing' protocols / technologies, so my
preference has always gone towards reusing / retooling what works. PouchDB
as a server replacement for Couch works and is light enough to be heavily
modified without too much concern; this would be one way to start at a
'known good' and make explicit changes when required. Having it reach
anywhere near the robustness / scalability of CouchDB (on the server) will
be a large job, though: the current implementation is LevelDB with no
transactions, and it's definitely not thread safe. CouchDB has been battle
tested enough that I would always look to fit my problems on top of what it
currently does, and improvements to replication performance etc. are
reasonably trivial. As a client, PouchDB is young enough that there are
plenty of fairly trivial bugs, but I am confident that it is now sound at
the base and will quickly stabilise. On coming up with a new custom sync
protocol I don't really have much experience to offer as advice, just a
warning that it will be very hard, which I don't doubt you know.

Some other points of note from a Firefox OS perspective (I don't speak
authoritatively here): we only really have 2 options, a solution that works
in web content or a new WebAPI. We have a few very minor holes through
which the Firefox OS system can talk to chrome, but they are exceptional,
not very flexible (mostly system messages), and are to be gotten rid of.
There are a few places where we could make use of a 'one-off WebAPI' (such
as a download manager, or hooking into Places), but those do seem to get
the lowest priority. I think something that can't work in web content would
take Firefox OS off the list of targets for a long time.

I can't find them now, but I saw numbers relating to the data stored by
Sync users and worries about IndexedDB performance. As they were, the
numbers seemed manageable, and I think the opportunity to share a code base
between 3 projects (desktop / Android / FxOS) that uses a WebAPI we already
want to be as fast as possible could be a big win.

And last of all, I wanted to mention that all the above comes from years of
working with Couch and very little experience with, or context into,
Firefox Sync / Weave and its requirements / constraints. So I'm not making a
judgement call or anything, just sharing what I have learnt, and I'm pretty
excited to learn more about what's in store (having Sync on Firefox OS is
probably my #1 awaited feature, possibly #2 behind Spotify).

Sorry for the essay
Cheers
Dale

On 25 July 2013 06:43, Richard Newman <[email protected]> wrote:

> > Monitoring, attack mitigation, and security auditing don't seem unique
> to couchdb.
>
> Many of the concerns one might have about Couch are concerns one would
> have about most COTS software, or at least OTS document stores. I don't
> think that means they aren't concerns we have about applying Couch to this
> problem.
>
> > "Protocol Limitations" and perhaps "Scalability" and maybe "Data
> Representation" (thinking that protocol limitations can pretty deeply
> affect representation design; not everyone agrees).
>
> Yes; if a hypothetical protocol+server supported transactions, for
> example, we could use different representations.
>
> > Ok, I'm admitting we're going to have a pretty protracted conversation
> on this tomorrow, and just wanted to ensure concerns were represented.
> >
> > I *think* I've done a fair job.  We shall see! :}
>
> Yeah, only one way to find out :)  It's an enormous topic.
>
> Will be there to help!
>
> _______________________________________________
> Sync-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/sync-dev
>
