Hello guys, I'd like to announce a jq-based view server for couchdb. It's extremely rudimentary, but works as a proof of concept of what can be achieved:
https://github.com/rakoo/jqouch

A bit of background: jq is a CLI tool to extract and render information from any JSON you give it, with a custom but powerful syntax:

    $ curl localhost:5984 | jq '.vendor .version'
    "1.6.1"
    $ curl localhost:5984/mydb | jq '.disk_size - .data_size'
    80892224

Looks like I'd better compact! If you're dabbling with JSON and not using it already, I encourage you to check it out.

Basically, jq is invoked with a filter (that's the '.vendor .version' from the example above); you then feed jq a JSON document on stdin, and it writes all matches and transformations to stdout.

jqouch works by taking the function given in "add_fun", spawning an external jq process with that function as its filter, and forwarding the documents received in "map_doc" to it. All output from jq is then sent back to CouchDB through jqouch. The jq processes are not killed after each doc: they stay alive as long as their stdin is not closed, which jqouch never does until it dies.

I have included some examples in the repo; here they are. They run against a dump of... I don't know exactly what, but a sample is here: https://github.com/rakoo/jqouch/blob/master/sample.json taken from http://parltrack.euwiki.org/dumps/eurlex.json.xz. That's 22925 documents. I made some benchmarks on CouchDB 1.6.

Here's a really simple view in js:

    function(doc) {
      emit(doc.title, null)
    }

It maps all docs in ~ 35s.

And the equivalent in jq:

    [ [.title, null] ]

It maps all docs in ~ 19s.

Each map function returns a list of key/value pairs; there's no more emit(). That is actually the format a query server has to return for each mapping function. It may not be ideal, but it works.

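
To make the message flow described above more concrete, here is a minimal sketch in Python of the three commands involved ("reset", "add_fun" and "map_doc"). It is not jqouch's actual code: MapServer and fake_compile are hypothetical names, and fake_compile is a stand-in that only recognises the single title filter from the example instead of spawning a real jq process. The replies (true for "reset" and "add_fun", one list of [key, value] pairs per registered function for "map_doc") follow my understanding of the CouchDB 1.x query server protocol; the error object for unknown commands is an assumption.

```python
import json


def fake_compile(source):
    """Hypothetical stand-in for a long-lived jq process.

    It only knows the title filter from the example above and returns a
    callable mapping a document to a list of [key, value] pairs.
    """
    known = {
        "[ [.title, null] ]": lambda doc: [[doc.get("title"), None]],
    }
    return known[source]


class MapServer:
    """Toy model of the three commands: reset, add_fun, map_doc."""

    def __init__(self, compile_fun):
        self.compile_fun = compile_fun
        self.funs = []

    def handle(self, line):
        """Take one JSON line from CouchDB and return one JSON line as reply."""
        cmd, *args = json.loads(line)
        if cmd == "reset":
            # Forget every map function registered so far.
            self.funs = []
            return json.dumps(True)
        if cmd == "add_fun":
            # args[0] is the source of the map function (a jq filter here).
            self.funs.append(self.compile_fun(args[0]))
            return json.dumps(True)
        if cmd == "map_doc":
            # One list of [key, value] pairs per registered function.
            doc = args[0]
            return json.dumps([fun(doc) for fun in self.funs])
        # Assumed error shape for anything this sketch does not handle.
        return json.dumps({"error": "unknown_command", "reason": cmd})


server = MapServer(fake_compile)
print(server.handle('["reset"]'))
print(server.handle('["add_fun", "[ [.title, null] ]"]'))
print(server.handle('["map_doc", {"_id": "a", "title": "T"}]'))
```

With a single registered function, the reply to "map_doc" is a list containing one list of pairs, which is why the jq filter itself has to produce a list of [key, value] pairs rather than call emit().
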
Here's another, more "useful" set of views:

    function(doc) {
      for (var i = 0; i < doc.dates.length; i++) {
        emit([doc.dates[i].type, doc.dates[i].date], null)
      }
    }

It runs in ~ 32s.

    [ .dates[] | [[.type, .date], null] ]

It runs in ~ 19s.

There are a few things we can say:

* For all 4 pairs of example views (see repo), jq is consistently almost twice as fast as the equivalent js. Moreover, the couchjs process is always eating a large part of my CPU when running, whereas the jq process is never over 30%. This indicates some overhead is spent on passing documents between processes, which I'm going to investigate with the jq C API.
* jq views can be hard to understand and write, but they can be tested directly through the jq CLI tool, or even online with jqplay (https://jqplay.org/)
* using jq doesn't (AFAIK) allow one to output non-deterministic values, by default
* jq is "sandboxed" in that it can't do anything other than transform documents, contrary to standard languages
* jq filters are in my opinion very clear about what they do, such that a one-line filter can be enough in most cases

Of course, it's not all rainbows and unicorns:

* there are still some quirks in the jq views: they can output something like [null, null] when they should not return anything because the view doesn't apply to the doc.
* jqouch currently doesn't understand anything other than "reset", "add_fun" and "map_doc"
* I don't see the jq language as being enough for more generic functions such as show and list, but who knows

Anyway, there may be some value in using jq to define basic views, the ones that just index a document on some value and don't do much more. As a non-serious CouchDB user I've never had to use really fancy views.

Thoughts?
