On Wed, May 13, 2009 at 12:50:20PM +0100, Brian Candler wrote:
> I want to write a reduce function which, when reducing over a range of keys,
> gives the minimum and maximum *key* found in that range. (*)
>
> This could be done very easily and efficiently if I could rely on the
> following two properties:
>
> (1) keys/values passed into a first reduce are in increasing order of key
>
> (2) reduced values passed into a re-reduce are for increasing key ranges
>
> The question is, can I rely on both of these properties?
To answer my own question, experimentation shows that clearly I can't.
Here's my test code, with the efficient reduce function I wanted to use:

---- 8< ----
require 'rubygems'
require 'restclient'
require 'json'

DB = "http://127.0.0.1:5984/test"

RestClient.delete DB rescue nil
RestClient.put DB, {}.to_json

docs = []
(1..50).each do |i|
  docs << {"foo" => i*10}
  docs << {"foo" => i*10 + 1000}
  docs << {"foo" => i*10 + 2000}
end
RestClient.post "#{DB}/_bulk_docs", {'docs' => docs}.to_json

RestClient.put "#{DB}/_design/test", {
  "views" => {
    "test" => {
      "map" => <<-MAP,
        function(doc) {
          if (doc.foo) {
            emit(doc.foo, null);
          }
        }
      MAP
      "reduce" => <<-REDUCE,
        function(ks, vs, co) {
          if (co) {
            var c = 0;
            for (var k in vs) { c += vs[k].count; }
            return {
              count: c,
              min: vs[0].min,
              max: vs[vs.length-1].max,
            }
          } else {
            return {
              count: ks.length,
              min: ks[0][0],
              max: ks[ks.length-1][0],
            }
          }
        }
      REDUCE
    }
  }
}.to_json

puts "\nreduce across all says:"
puts RestClient.get("#{DB}/_design/test/_view/test")

puts "\nreduce across 25..55 says:"
puts RestClient.get("#{DB}/_design/test/_view/test?startkey=25&endkey=55")

puts "\nreduce across 2385..2405 says:"
puts RestClient.get("#{DB}/_design/test/_view/test?startkey=2385&endkey=2405")

puts "\nreduce across 15..2405 says:"
puts RestClient.get("#{DB}/_design/test/_view/test?startkey=15&endkey=2405")
---- 8< ----

The output I get is:

---- 8< ----
reduce across all says:
{"rows":[
{"key":null,"value":{"count":150,"min":240,"max":10}}
]}

reduce across 25..55 says:
{"rows":[
{"key":null,"value":{"count":3,"min":50,"max":30}}
]}

reduce across 2385..2405 says:
{"rows":[
{"key":null,"value":{"count":2,"min":2400,"max":2390}}
]}

reduce across 15..2405 says:
{"rows":[
{"key":null,"value":{"count":139,"min":240,"max":20}}
]}
---- 8< ----

So for small key ranges, i.e. just reduce (not re-reduce), it appears that
the keys are passed in *reverse* order into the function. Swapping min/max
fixes that.
But then the reduce function doesn't work for large ranges (in particular
the last one, 15..2405), so it seems I can't rely on any particular ordering
of reduced values passed into the re-reduce function either.

Never mind. Back to a traditional comparison min/max function.

Regards,

Brian.
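P.S. For anyone finding this in the archives: the order-independent version
is straightforward, just scan every element instead of trusting the first and
last positions. A sketch (assuming the same map function emitting (doc.foo,
null), and recalling that the first-level reduce receives keys as
[key, docid] pairs):

```javascript
// Order-independent count/min/max reduce: makes no assumption about the
// order in which keys or previously-reduced values arrive.
function reduceMinMax(keys, values, rereduce) {
  var result = { count: 0, min: Infinity, max: -Infinity };
  if (rereduce) {
    // values are previously reduced objects covering arbitrary key ranges
    for (var i = 0; i < values.length; i++) {
      result.count += values[i].count;
      result.min = Math.min(result.min, values[i].min);
      result.max = Math.max(result.max, values[i].max);
    }
  } else {
    // keys is an array of [emitted_key, doc_id] pairs
    for (var i = 0; i < keys.length; i++) {
      result.count += 1;
      result.min = Math.min(result.min, keys[i][0]);
      result.max = Math.max(result.max, keys[i][0]);
    }
  }
  return result;
}
```

This costs a comparison per element rather than two array lookups per block,
but it is correct regardless of the ordering CouchDB happens to use.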
