Thanks Mark. I've ended up using a combination of Martin's, Robert's and your approaches.
Seems to be giving me everything I need now.

On 17 August 2010 18:45, Mark J. Reed wrote:
> Bear in mind that this approach doesn't distinguish between multiple
> instances of a given title within a single doc vs multiple docs. From
> your original email I thought you wanted to collapse multiples of the
> same title within one document but count multiples separately if they
> come from different docs.
>
> If that's the case, you'll still want to make the list unique before
> emitting it... but a double loop isn't the way to go there; something
> like this will work:
>
> map: function(doc) {
>   var titles = {};
>   for (var i = 0; i < doc.titles.length; ++i) {
>     titles[doc.titles[i]] = 1;
>   }
>   for (var title in titles) {
>     emit(doc.author, title);
>   }
> }
>
>
> On Tue, Aug 17, 2010 at 8:17 AM, Ian Wootten wrote:
>> Thanks guys. I'd been working toward a solution with multiple level
>> keys but had missed this approach for some reason. It's nice to know
>> that at least some part of it has to be implemented in code.
>>
>> Not fully understanding what was being received by the reduce function
>> and how it could be worked upon was the source of my problems.
>>
>> Anyway, I can get what I require from my view now, thanks for the help.
>>
>> On 17 August 2010 11:37, Robert Newson wrote:
>>> If you emit([doc.docAuthor, doc.titles[title]], 1) instead you could
>>> use the built-in Erlang reduce function "_sum", which is faster.
>>>
>>> B.
>>>
>>> On Tue, Aug 17, 2010 at 10:24 AM, Martin Higham wrote:
>>>> I think it would be better to use the View to split the titles and create a
>>>> list of Authors and Titles. A Map function such as
>>>>
>>>> function(doc) {
>>>>   for (var title in doc.titles)
>>>>     emit([doc.docAuthor, doc.titles[title]], null);
>>>> }
>>>>
>>>> does just this.
>>>>
>>>> You now have a list of keys in the form [Author, title] and they are sorted
>>>> by Author.
>>>>
>>>> It's easy to then take these and produce a list of unique Author/title
>>>> combinations and a count of their frequency with the Reduce function.
>>>>
>>>> function(keys, values, rereduce) {
>>>>   if (rereduce) {
>>>>     return sum(values);
>>>>   }
>>>>   else {
>>>>     return values.length;
>>>>   }
>>>> }
>>>>
>>>> However it is difficult for reduce to produce a list of the top 3. Any
>>>> processing within the Reduce can only operate on the data passed in. It
>>>> doesn't know what data is yet to come. If you were to output only the top 3
>>>> entries passed in to a given invocation of the Reduce you would produce
>>>> inaccurate results as you would potentially throw away rows that might yet
>>>> accumulate into the all time top 3.
>>>>
>>>> Once you have a list of unique Author/title pairs and their frequency you
>>>> can either sort and filter them within the client or within a list function.
>>>>
>>>> Hope this helps
>>>>
>>>> Martin
>>>>
>>>>
>>>> On 17 August 2010 09:26, Ian Wootten wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I was hoping somebody might be able to solve a problem I'm having
>>>>> attempting to implement a view at the moment.
>>>>>
>>>>> Essentially, what it does is to take a collection of documents which
>>>>> each have a single author and a list of names (which are possibly
>>>>> repeated). There may be multiple documents by the same author, with
>>>>> the same names within. Here's an example doc.
>>>>>
>>>>> doc.author
>>>>> doc.titles = ['sometitle', 'someothertitle', 'sometitle', 'anothertitle']
>>>>>
>>>>> I would like to return a list of the top 3 titles for each
>>>>> author across all documents. I have tried and failed for several days
>>>>> to get this working correctly.
>>>>>
>>>>> So far, my map is as follows, giving the unique titles for a document,
>>>>> not ordered at all:
>>>>>
>>>>> function(doc) {
>>>>>
>>>>>   var unique_titles = [];
>>>>>
>>>>>   for(var i in doc.titles)
>>>>>   {
>>>>>     var count=0;
>>>>>
>>>>>     for(var j in unique_titles)
>>>>>     {
>>>>>       if(doc.titles[i]==unique_titles[j])
>>>>>       {
>>>>>         count++;
>>>>>       }
>>>>>     }
>>>>>
>>>>>     if(count==0)
>>>>>     {
>>>>>       unique_titles.push(doc.titles[i]);
>>>>>     }
>>>>>   }
>>>>>
>>>>>   for(var k=0; k<unique_titles.length;k++)
>>>>>   {
>>>>>     emit(doc.author, unique_titles[k]);
>>>>>   }
>>>>> }
>>>>>
>>>>> My reduce is as follows; this returns two unique titles from a single
>>>>> document when only a single document exists for an author (I think):
>>>>>
>>>>> function(keys, values, rereduce) {
>>>>>   return values.splice(0,2);
>>>>> }
>>>>>
>>>>> My problem is that:
>>>>>
>>>>> a) I can't return more than 2 items from the values array (if I set
>>>>> the splice length to 3, futon spits back a non-reducing error at me).
>>>>> b) Where multiple documents exist for the same author, in some
>>>>> instances I see a weird multi-dimensional array returned (rather than
>>>>> just two values). For example:
>>>>> [['sometitle','someothertitle'],['anothertitle'],['afurthertitle']]
>>>>>
>>>>> Presumably b) is the result of multiple documents for a single author
>>>>> interfering with one another, though I'm confused as to how I
>>>>> configure my map/reduce in order to get the information I'm after (I
>>>>> also wonder if it's even possible).
>>>>>
>>>>> I've attempted to understand the documentation on reduce functions,
>>>>> taking a look at the many examples that exist too, but I'm unable to
>>>>> understand them well enough to solve my problem.
>>>>>
>>>>> I'd appreciate any help on this!
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ian
>>>>>
>>>>
>>>
>>
>
>
>
> --
> Mark J. Reed
>
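
For reference, the combined approach discussed in this thread might look roughly like the sketch below. It is only an illustration that runs outside CouchDB: the sample `docs`, the `emit` stub, and the helpers `groupSum` and `topTitles` are hypothetical names made up for this example. `groupSum` merely imitates the rows one would expect from the built-in "_sum" reduce when the view is queried with group=true, and the top 3 selection is done afterwards on the client, as Martin suggested.

```javascript
// Hypothetical sample docs; field names follow the thread (doc.author, doc.titles).
const docs = [
  { author: 'ian', titles: ['a', 'b', 'a', 'c'] },
  { author: 'ian', titles: ['a', 'b', 'd'] },
  { author: 'ian', titles: ['a', 'd', 'e'] },
  { author: 'mark', titles: ['x', 'x', 'y'] }
];

// Stand-in for CouchDB's emit(): collects the emitted rows in memory.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map: collapse repeated titles within one doc (Mark's object-as-set idea),
// then emit [author, title] with a value of 1 (Robert's suggestion).
function map(doc) {
  const seen = {};
  for (let i = 0; i < doc.titles.length; ++i) {
    seen[doc.titles[i]] = true;
  }
  for (const title in seen) {
    emit([doc.author, title], 1);
  }
}
docs.forEach(map);

// Assumption: imitates "_sum" with group=true by adding up values per distinct key.
function groupSum(emitted) {
  const totals = new Map();
  emitted.forEach(function (row) {
    const k = JSON.stringify(row.key);
    totals.set(k, (totals.get(k) || 0) + row.value);
  });
  return Array.from(totals, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

// Client-side step: keep the n most frequent titles for each author.
function topTitles(grouped, n) {
  const byAuthor = {};
  grouped.forEach(function (row) {
    const author = row.key[0];
    byAuthor[author] = byAuthor[author] || [];
    byAuthor[author].push({ title: row.key[1], count: row.value });
  });
  Object.keys(byAuthor).forEach(function (author) {
    byAuthor[author].sort(function (a, b) {
      // Highest count first; break ties alphabetically by title.
      return b.count - a.count || (a.title < b.title ? -1 : 1);
    });
    byAuthor[author] = byAuthor[author].slice(0, n);
  });
  return byAuthor;
}

const top3 = topTitles(groupSum(rows), 3);
console.log(top3);
```

With the sample docs above, the top three for 'ian' are a (3 docs), b (2) and d (2). The repeated 'a' inside the first doc is counted once, which is the per-document collapsing Mark described.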
