Re: 'Grouping' documents so that a set of documents is passed to the view function

hhsuper Thu, 25 Jun 2009 23:04:31 -0700

Great job, Brian. I saw you have updated the wiki
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views with the
content of this discussion; it'll be a great help for many people.


On Fri, Jun 26, 2009 at 1:32 PM, hhsuper <[email protected]> wrote:

> By the way, Brian, I see many people say that with CouchDB you can
> download the data from a view and then do sorting/filtering/pagination on
> the client. Yes, you can do that, but I think we absolutely need these
> features at the CouchDB level, just as an RDBMS supports them. For example,
> when I have a view with a million records I absolutely need paging/sorting
> at the CouchDB level. The sorting also needs to work on any column of the
> returned keys and returned values, but right now that's difficult to
> implement with CouchDB, isn't it?
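
On the paging point quoted above: view rows come back sorted by key, and the view query accepts limit, startkey and descending parameters, so key-ordered paging can be done on the server. Below is a minimal sketch in Python that only builds such a query URL. The host, the database name "scores", the design document "board" and the view "by_score" are all hypothetical, and sorting on some other column would still need a separate view keyed on that column.

```python
import json
from urllib.parse import urlencode


def view_page_url(base, db, ddoc, view, limit, startkey=None, descending=False):
    """Build a URL for one page of a view, ordered by the view key.

    All names passed in are placeholders chosen for this sketch.
    """
    params = {"limit": limit}
    if startkey is not None:
        # View keys are JSON values, so the key is JSON-encoded before URL-encoding.
        params["startkey"] = json.dumps(startkey)
    if descending:
        params["descending"] = "true"
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{urlencode(params)}"


# First page of 10 rows, then a next page starting from the last key seen.
first_page = view_page_url("http://localhost:5984", "scores", "board", "by_score", 10)
next_page = view_page_url(
    "http://localhost:5984", "scores", "board", "by_score", 10, startkey=["u1", 90]
)
```

Paging by startkey rather than by a large skip keeps each request proportional to the page size, which matters for a view with a million rows.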
>
>
> On Fri, Jun 26, 2009 at 10:39 AM, hhsuper <[email protected]> wrote:
>
>> Thanks again, Brian. I totally understand your description of the
>> realistic map/reduce scenario; this is just what worries me about my view.
>> My reduce function is non-linear: I actually need the logic in the reduce
>> part, and the re-reduce code is obviously wrong. But when a realistic
>> reduce occurs, as you say, "It's quite possible given N documents that
>> couchdb will reduce the first N-1, then reduce the last 1", and then my
>> logic isn't run.
>>
>> It seems difficult to execute this logic in reduce unless I return a large
>> object (which I don't like) so that I can execute the same logic in the
>> re-reduce part. I could return an additional value holding
>> {'dialogid':bestScore,....} from the reduce function, which would ensure
>> the same logic can run in the re-reduce part, but as users study more and
>> more dialogs, the reduce value gets larger and larger.
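
The idea in the paragraph above, carrying a {dialogid: bestScore} hash through reduce so that re-reduce can apply the same logic, can be modelled as follows. This is a Python sketch of the (keys, values, rereduce) calling convention, not CouchDB's own JavaScript. The shape of the mapped values, (dialogid, score) pairs, and the final summing step are assumptions made for illustration.

```python
def reduce_best_scores(keys, values, rereduce):
    """Model of a reduce that keeps the best score per dialog.

    Assumed input shapes (hypothetical):
      rereduce=False: values is a list of (dialog_id, score) pairs from map
      rereduce=True:  values is a list of dicts returned by earlier calls
    """
    best = {}
    if rereduce:
        pairs = (item for partial in values for item in partial.items())
    else:
        pairs = iter(values)
    for dialog_id, score in pairs:
        # Taking the maximum is order-independent, so any partitioning
        # of the rows merges to the same hash.
        if dialog_id not in best or score > best[dialog_id]:
            best[dialog_id] = score
    return best


rows = [("d1", 40), ("d2", 70), ("d1", 90), ("d2", 55)]

# Reduce everything in one go.
whole = reduce_best_scores(None, rows, False)

# Reduce the first N-1 rows, then the last one, then re-reduce the two results.
split = reduce_best_scores(
    None,
    [reduce_best_scores(None, rows[:3], False),
     reduce_best_scores(None, rows[3:], False)],
    True,
)

# The non-linear step (summing the best scores) is applied only at the end.
total = sum(whole.values())
```

The cost is the one already noted: the returned hash grows with the number of distinct dialogs per user.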
>>
>> Should I conclude that logic like this isn't suitable to implement
>> in a CouchDB view?
>> Also, as you say, I could download all the data to the client to calculate,
>> but this is very costly and has scalability problems.
>>
>> > I don't really understand why you need a subquery in rdbms. I would just
>> > select all results where uid=x, and process them as required (for example:
>> > build a hash of dialogid=>bestScore and update it from each received row)
>>
>> Oh, maybe I didn't describe it clearly: with a subquery I can use a single
>> SQL statement to get all of the user's results (I implemented a scoreboard)
>> without any other program code, and within the query I can implement
>> pagination (physical paging) and sorting.
>>
>> On Thu, Jun 25, 2009 at 5:08 PM, Brian Candler <[email protected]> wrote:
>>
>>> On Thu, Jun 25, 2009 at 09:34:31AM +0100, Brian Candler wrote:
>>> > Perhaps it will help you to understand this if you consider the limiting
>>> > case where exactly one document is fed into the 'reduce' function at a time,
>>> > and then the outputs of the reduce functions are combined with a large
>>> > re-reduce phase.
>>>
>>> Incidentally, this is a partly realistic scenario. It's quite possible given
>>> N documents that couchdb will reduce the first N-1, then reduce the last 1,
>>> then re-reduce those two values. This might be because of how the documents
>>> are split between Btree nodes, or there may be a limit on the number of
>>> documents passed to the reduce function in one go. This is entirely an
>>> implementation issue which you have no control over, so you must write your
>>> reduce/rereduce to give the same answer for *any* partitioning of documents.
>>>
>>> More info at
>>> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>>>
>>> "To make incremental Map/Reduce possible, the Reduce function has the
>>> requirement that not only must it be referentially transparent, but it must
>>> also be commutative and associative for the array value input, to be able
>>> reduce on its own output and get the same answer, like this:
>>>
>>> f(Key, Values) == f(Key, [ f(Key, Values) ] )"
>>>
>>> Now, at first glance your re-reduce function appears to satisfy that
>>> condition, so perhaps there should be another one: namely, that for any
>>> partitioning of Values into subsets Values1, Values2, ... then
>>>
>>>  f(Key, Values) == f(Key, [ f(Key,Values1), f(Key,Values2), ... ] )
>>>
>>> But I am not a mathematician so I'm not sure if this condition is actually
>>> stronger.
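
One way to probe the question above is to try both conditions on a concrete function. The Python sketch below uses the arithmetic mean as both reduce and re-reduce; that choice, and the helper name holds_for_all_splits, are assumptions for illustration and do not come from the thread. The helper only tries two-way contiguous splits, so passing it is a necessary check rather than a proof.

```python
def mean(values):
    """Arithmetic mean, used here as both the reduce and the re-reduce step."""
    return sum(values) / len(values)


def holds_for_all_splits(f, values):
    """Check f(values) == f([f(left), f(right)]) for every two-way contiguous split."""
    whole = f(values)
    return all(
        f([f(values[:i]), f(values[i:])]) == whole
        for i in range(1, len(values))
    )


values = [1, 2, 3, 10]

# First condition from the wiki: f(Values) == f([f(Values)]).
self_consistent = mean(values) == mean([mean(values)])

# Second condition: the same answer for a partitioning of Values.
partition_safe_mean = holds_for_all_splits(mean, values)
partition_safe_sum = holds_for_all_splits(sum, values)
```

Here the mean satisfies the first condition but fails the second: reducing [1, 2, 3] and [10] separately gives 2 and 10, whose mean is 6 rather than 4. A plain sum passes both. So, at least for this example, the partitioning condition is the stronger one.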
>>>
>>> Regards,
>>>
>>> Brian.
>>>
>>
>>
>>
>> --
>> Yours sincerely
>>
>> Jack Su
>>
>
>
>
> --
> Yours sincerely
>
> Jack Su
>



-- 
Yours sincerely

Jack Su
