Hello guys,

I'd like to announce a jq-based view server for CouchDB. It's extremely
rudimentary, but works as a proof of concept of what can be achieved:

https://github.com/rakoo/jqouch

A bit of background: jq is a CLI tool to extract and render information
from any JSON you give it, with a custom but powerful syntax:

$ curl localhost:5984 | jq '.vendor .version'
"1.6.1"

$ curl localhost:5984/mydb | jq '.disk_size - .data_size'
80892224

Looks like I'd better compact!

If you're dabbling with JSON and not using jq already, I encourage you
to check it out.

Basically, jq is invoked with a filter (that's the '.vendor .version'
from the example above); you then feed it JSON documents on stdin, and
it writes all matches and transformations to stdout. jqouch works by
taking the function given in "add_fun", spawning an external jq process
with this function as its filter, and forwarding the documents received
in "map_doc" to it. All output from jq is then sent back to CouchDB
through jqouch. The jq processes are not killed after each doc: they
stay alive as long as their stdin is not closed, which jqouch never does
until it dies.
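
To make that flow concrete, here is a minimal sketch of such a loop in
Python. It is not jqouch's actual code: the names JqFilter, handle and
serve are made up for illustration, it assumes a jq binary on the PATH,
and it assumes every filter prints exactly one JSON value per document
(a filter that errors on a doc would leave the readline waiting).
Closing the jq processes on "reset" and the shape of the error reply are
choices of this sketch, not something taken from jqouch.

```python
import json
import subprocess


class JqFilter:
    """One long-lived jq process per map function (assumes jq is on PATH)."""

    def __init__(self, source):
        # -c: one compact JSON value per line; --unbuffered: flush after each value.
        self.proc = subprocess.Popen(
            ["jq", "-c", "--unbuffered", source],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )

    def map(self, doc):
        # Feed one document and read back the single line jq prints for it.
        self.proc.stdin.write(json.dumps(doc) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin is what makes jq exit.
        self.proc.stdin.close()
        self.proc.wait()


def handle(command, filters, make_filter=JqFilter):
    """Answer one query-server command; `filters` holds the active map functions."""
    name, args = command[0], command[1:]
    if name == "reset":
        for f in filters:
            f.close()
        filters.clear()
        return True
    if name == "add_fun":
        filters.append(make_filter(args[0]))
        return True
    if name == "map_doc":
        # One list of [key, value] pairs per registered function.
        return [f.map(args[0]) for f in filters]
    return {"error": "unknown_command", "reason": name}


def serve(stdin, stdout, make_filter=JqFilter):
    """Read one JSON command per line and write one JSON reply per line."""
    filters = []
    for line in stdin:
        if not line.strip():
            continue
        reply = handle(json.loads(line), filters, make_filter)
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()
```

Calling serve(sys.stdin, sys.stdout) would wire this up as a process
CouchDB can talk to; the make_filter parameter is only there so the loop
can be exercised without jq installed.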

I have included some examples in the repo; here they are. They run
against documents from a dump of... I don't know exactly what, but a
sample is here:

https://github.com/rakoo/jqouch/blob/master/sample.json

taken from http://parltrack.euwiki.org/dumps/eurlex.json.xz. That's
22925 documents. I ran some benchmarks on CouchDB 1.6.

Here's a really simple view in js:

    function(doc) {
      emit(doc.title, null)
    }

it maps all docs in ~ 35s

And the equivalent in jq:

    [ [.title, null] ]

it maps all docs in ~ 19s

Each map function returns a list of kv pairs; there's no emit() anymore.
That list is actually the format a query server has to return for each
mapping function. It may not be ideal, but it works.
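
To illustrate the correspondence between emit() calls and that list,
here is a small Python sketch. The helper collect_pairs, the stand-in
by_title and the sample document are all made up for the example; the
point is only that every emit(key, value) becomes one [key, value] entry
in the returned list.

```python
def collect_pairs(map_fun, doc):
    """Run an emit()-style map function and return the [key, value] pairs it emitted."""
    pairs = []
    map_fun(doc, lambda key, value: pairs.append([key, value]))
    return pairs


def by_title(doc, emit):
    # Python stand-in for: function(doc) { emit(doc.title, null) }
    emit(doc.get("title"), None)


doc = {"_id": "example", "title": "Some title"}  # made-up document
pairs = collect_pairs(by_title, doc)
print(pairs)  # [['Some title', None]]
```

Serialized as JSON this is [["Some title", null]], the same shape the jq
filter builds directly.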

Here's another, more "useful" pair of views:

    function(doc) {
      for (var i = 0; i < doc.dates.length; i++) {
        emit([doc.dates[i].type, doc.dates[i].date], null)
      }
    }

runs in ~ 32s

    [ .dates[] | [[.type, .date], null] ]

runs in ~ 19s

There are a few things we can say:

* For all 4 pairs of example views (see repo), jq is consistently almost
  twice as fast as the equivalent js. Moreover, the couchjs process is
  always eating a large part of my CPU when running, whereas the jq
  process is never over 30%. This suggests that some of the time is
  overhead spent on passing documents between processes, which I'm going
  to investigate with the jq C API.

* jq views can be hard to understand and write, but they can be tested
  directly with the jq CLI tool, or even online with jqplay
  (https://jqplay.org/)

* using jq doesn't (AFAIK) allow one to output non-deterministic values,
  by default

* jq is "sandboxed" in that it can't do anything other than transform
  documents, contrary to standard languages

* jq filters are in my opinion very clear on what they do, such that a
  one-line filter can be enough in most cases
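
For reference, the arithmetic behind "almost twice as fast" for the two
pairs of views timed above (the other two pairs are only in the repo, so
they are not included). The function name summarize is just for this
sketch.

```python
DOCS = 22925  # documents in the dump


def summarize(js_seconds, jq_seconds):
    """Return (js docs/s, jq docs/s, speedup of jq over js)."""
    return DOCS / js_seconds, DOCS / jq_seconds, js_seconds / jq_seconds


# Timings in seconds as (js, jq), taken from the runs quoted above.
timings = {"title": (35, 19), "dates": (32, 19)}

for name, (js_s, jq_s) in timings.items():
    js_rate, jq_rate, speedup = summarize(js_s, jq_s)
    print(f"{name}: js {js_rate:.0f} docs/s, jq {jq_rate:.0f} docs/s, {speedup:.2f}x")
# title: js 655 docs/s, jq 1207 docs/s, 1.84x
# dates: js 716 docs/s, jq 1207 docs/s, 1.68x
```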

Of course, it's not all rainbows and unicorns:

* there are still some quirks in the jq views: they can output something
  like [null, null] when they should not return anything because the
  view doesn't apply to the doc.

* jqouch currently doesn't understand anything other than "reset",
  "add_fun" and "map_doc"

* I don't see the jq language as being enough for more generic functions
  such as show and list, but who knows
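
For the [null, null] quirk, one possible workaround is to filter on the
server side before replying to CouchDB. The sketch below is only that, a
sketch: drop_null_keys is a hypothetical helper, not something jqouch
has, and it assumes a null key is never what the view author wanted. The
other option is to guard inside the filter itself with jq's select, so
that nothing is produced for docs the view doesn't apply to.

```python
def drop_null_keys(pairs):
    """Discard [key, value] pairs whose key is null, e.g. [null, null]."""
    return [pair for pair in pairs if pair[0] is not None]


# What a title view might return for a doc with and without a title.
print(drop_null_keys([["Some title", None]]))  # [['Some title', None]]
print(drop_null_keys([[None, None]]))          # []
```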

Anyway, there may be some value in using jq to define basic views, the
ones that just index a document on some value and don't do much more. As
a non-serious CouchDB user I've never had to use really fancy views.

Thoughts?
