Re: Sorting a view by a reduce result (or a better document structure or M/R)

Filippo Fadda Sat, 12 Oct 2013 13:39:22 -0700

So, the short answer is: you can't. The long answer is that I was figuring the 
same problem and I have found the following solution.

Every time a user vote, the application saves a new document having doc.type == 
'vote'. The 'vote' document has a property called 'recorded', that is really 
important.

The application uses a design document called 'votes', where is stored a view 
called 'notRecorded', that contains just a map function like this one:

              function($doc) {
                if ((doc.type == 'vote') and (!doc.recorded))
                  emit(doc.timestamp, doc);
              }

As you can see, there is not a reduce function, since we need to mark those 
documents as recorded, we'll reduce them later on our own.

There is a cron job that runs, in my case, a PHP script. You can choose to run 
the script every minute, every five minutes, etc. The script queries the view 
defined above to get all the 'not recorded' votes.
Once we have all the 'not recorded' votes, we use a simple function to reduce 
them. In my case the function reduces them by postId. The function returns an 
associative array with the sum of the 'not recorded' votes, using as keys the 
post identifiers.
Finally the same script queries another view of another design document 
scores/perPost, using as keys the same post identifiers. This is the map 
function:

              function(doc)) {
                if (doc.type == 'score')
                  emit(doc.postId, doc);
              }

In the end, the script updates every single document of type 'score' retrieved 
before, adding to the score.total the total of new voted counted. Then the 
script updates every single 'vote' document marking it as recorded. So we'll 
never count them anymore. There is no race condition here, no conflicts at all.

To get the posts ordered by score you just need to query another view, let's 
call it posts/byScore (ddoc/viewName) like this:

              function(doc)) {
                if (doc.type == 'score')
                  emit(score.total, doc.postId);
              }

I suggest to use _bulk_docs API, within all_or_nothing option, to be sure to 
change all the documents as an atomic operation.

I don't know if you are using PHP, but my library has some interesting 
features. First of all, you can process a query result chunk by chunk (I use 
this approach to reduce votes), and you can also returns null rows in case some 
post doesn't still have a score, because you never registered votes. This 
simplify a lot your job.

This solution counts just the new votes. In a new vote is generated in the 
meantime, nothing bad is gonna happen, no conflict is generated. You just 
operate on a strict set of posts, just the one have new votes. You'll just 
change the score related to those documents. That's all. :-)

So 3 views and 3 document types: post, vote, score. In your case you'll have 
user instead of post, but the mechanic remains the same.

Bye

-Filippo

On Oct 11, 2013, at 5:09 PM, Tom van Neerijnen wrote:

> Hi all
> 
> I'm developing a ranking service I'd like to throw scores for multiple
> users at and later be able to retrieve a list of users ranked/ordered by
> their scores. Problem is if I make it time series data and accumulate the
> total score in a reduce I can't put the score in the key to sort by.
> 
> Does anyone have suggestions as to how I could re-jig my documents and/or
> M/R to accommodate this kind of output or should I simply consider storing
> a total document along side all my time series documents? My concern with
> storing a total document is I can never be sure the total in the document
> exactly matches my M/R over my time series data because I'd have to have an
> app query the view and, using the result of the view query, update the
> total document, so that's a last resort.
> 
> I considered sorting in a list function but we're potentially looking at
> millions of users so that's not really an option.
> 
> I'm imagining time series documents like so:
> "abcdefghi" : {
>  "userId": "jim",
>  "scoreChange": 10,
>  "country": "GB",
>  "game": "solitaire",
>  "date": "2013-10-08T13:50:19"
> }
> 
> "jklmnopqr" : {
>  "userId": "jim",
>  "scoreChange": -5,
>  "country": "GB",
>  "game": "solitaire",
>  "date": "2013-10-08T13:51:19"
> }
> 
> "jklmnopqr" : {
>  "userId": "james",
>  "scoreChange": 4,
>  "country": "GB",
>  "game": "freecell",
>  "date": "2013-10-08T13:52:19"
> }
> 
> I could then have this M/R for, as an example, ranks by game then user:
> "by_game": {
>    "map": "function(doc) { emit([doc.game, doc.userId], doc.scoreChange)}",
>    "reduce": "function(keys, values, rereduce) { return sum(values); }"
> }
> 
> and get this list from it:
> {"key":["freecell","james"],"value":4}
> {"key":["solitaire","jim"],"value":5},
> 
> Thanks

Re: Sorting a view by a reduce result (or a better document structure or M/R)

Reply via email to