Thanks for the suggestions, all. I took a look at memcached - turns out it's pretty good at this type of thing. :) Looks like it works fine for my needs.
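For the archives, here's a rough sketch of what the caching ended up looking like on my end. Names are made up, and `FakeCache` is a dict-backed stand-in for a real memcached client (the python-memcached client exposes the same get/set interface); the idea is just to key the prepared payload on (user, last-known-seq) so a retry after a timeout hits the cache instead of re-running the filtered `_changes` query:

```python
class FakeCache:
    """Dict-backed stand-in for a memcached client (get/set only)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, time=0):
        self._data[key] = value


def get_payload(cache, user_id, last_seq, compute_payload):
    """Return the prepared sync payload for (user, seq), doing the
    slow filtered-_changes work only on a cache miss."""
    key = "sync:%s:%s" % (user_id, last_seq)
    payload = cache.get(key)
    if payload is None:
        # compute_payload is whatever hits _changes and prepares the
        # phone's data payload -- the expensive part.
        payload = compute_payload(user_id, last_seq)
        cache.set(key, payload, time=300)  # keep it around for retries
    return payload
```

So the first request may still be slow, but a retry with the same token comes straight out of the cache.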
Cory

On Wed, Oct 20, 2010 at 11:16 AM, Simon Metson <[email protected]> wrote:
> Hi,
> That will improve things, but it's still potentially got to skip
> through a lot of records to return `limit` records. Say you have a
> very prolific user and one who is less active. The prolific guy has 1000
> messages for every one the laid-back guy gets. If you're filtering by user,
> you're going to have to go through ~10000 records to return 10 documents
> for the less active user (assuming I'm understanding _changes right...). The
> problem is not getting the 10 documents out but skipping the 10000 documents
> that don't match the filter.
>
> If the records are split across databases (one per user) you'll only hit the
> changes relevant to your user. Of course that might not be possible for your
> use case...
> Cheers
> Simon
>
>
> On 20 Oct 2010, at 14:31, [email protected] wrote:
>
>> Hello,
>>
>> Perhaps you can use the limit option together with the since option to
>> retrieve the changes feed. That way, no matter whether the application is
>> doing a first-time initialization or starting from the last known sync,
>> your code will always be the same:
>> 1. Retrieve N change infos starting from sequence S (S = 0 the very
>> first time).
>> 2. Do I have N changes in the CouchDB response? If so, repeat step 1,
>> setting S to the last seq number of the CouchDB response.
>>
>> Regards,
>>
>> Mickael
>>
>> ----- Original Mail -----
>> From: "Cory Zue" <[email protected]>
>> To: "user" <[email protected]>
>> Sent: Wednesday, 20 October 2010, 14:06:59 GMT +01:00 Amsterdam / Berlin /
>> Berne / Rome / Stockholm / Vienna
>> Subject: Re: Slow filtered _changes feed
>>
>> Thanks Simon,
>>
>> On Wed, Oct 20, 2010 at 6:43 AM, Simon Metson
>> <[email protected]> wrote:
>>>
>>> Hi,
>>> One thought: do you query the last N changes or the whole feed?
>>> If I apply a simple filter to my test database it is slow to get the
>>> full result, but it's relatively fast to skip to the more recent changes
>>> (the ones I've already consumed) and go from there.
>>
>> When the phones sync they provide a token containing information about
>> the last known sync, which we use to skip ahead in the changes feed.
>> However, on a first-time initialization the phone processes the
>> entire feed, and if this doesn't succeed then subsequent attempts likely
>> won't either.
>>
>>> Also, the _changes feed streams the documents down to the client - can
>>> your client/server deal with a streaming response?
>>
>> It doesn't yet, although we could theoretically add this.
>>
>> My latest plan is basically to save a copy of all the relevant
>> information in a couch doc after every attempted sync. In this case
>> the first operation would still time out, but if the client retried, all
>> the relevant doc ids could be retrieved from that document and only a
>> small update would have to be applied. Since this is only likely to be a
>> problem when a long time passes between syncs, I think it could work OK
>> (definitely not ideal, though). Does this seem sane?
>>
>>> Cheers
>>> Simon
>>>
>>> On 20 Oct 2010, at 03:44, Cory Zue wrote:
>>>
>>>> Howdy,
>>>>
>>>> I'm bringing up a problem I chatted about with a few folks on IRC
>>>> today but was unable to solve. My app is using the _changes feed to
>>>> detect which updates need to go to particular clients (in this case
>>>> cell phones) based on some filtered information the phones send up in
>>>> the sync request. The flow looks something like:
>>>>
>>>> Phone ---HTTP POST---> Server
>>>> Server ---filtered _changes---> CouchDB
>>>> [Server prepares couch results for phone]
>>>> Server ---Data Payload---> Phone
>>>>
>>>> All of the above represents a single HTTP POST and response between
>>>> the phone and server.
>>>>
>>>> The problem I am seeing is that hitting the _changes feed from the
>>>> server is prohibitively slow, and these requests are timing out before
>>>> the server can send data back down to the phone.
>>>>
>>>> I was led to believe on IRC that changing my filter from JavaScript to
>>>> Erlang would drastically improve performance, but I'm not observing
>>>> this at all. In fact the Erlang version seems slightly slower.
>>>>
>>>> I set up an Erlang view server following these instructions:
>>>> http://wiki.apache.org/couchdb/EnableErlangViews
>>>>
>>>> Am I missing something? Is my Erlang so bad as to negate the
>>>> performance gain from switching over? Was I lied to? Is my
>>>> whole approach dumb, and do I need to implement filtered caching
>>>> inside my server and outside of couch?
>>>>
>>>> Any thoughts or feedback would be much appreciated.
>>>>
>>>> thanks,
>>>> Cory
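For anyone finding this thread later: Mickael's limit/since loop is easy to sketch in Python. The `fetch_changes` callable here is a placeholder for the actual HTTP request (a `GET /db/_changes?since=S&limit=N` with whatever HTTP or couch library you use); this version just exercises the loop logic:

```python
def consume_changes(fetch_changes, batch_size, since=0):
    """Page through a _changes feed batch_size rows at a time.

    fetch_changes(since, limit) must return the decoded _changes
    response, i.e. {"results": [...], "last_seq": N}.  Rows are
    yielded one at a time; a short batch means we've caught up.
    """
    while True:
        resp = fetch_changes(since, batch_size)
        results = resp["results"]
        for row in results:
            yield row
        if len(results) < batch_size:
            break  # fewer than N changes came back: we're caught up
        since = resp["last_seq"]  # resume from here on the next round
```

Because the loop is the same whether `since` starts at 0 (first-time init) or at the phone's last-known token, the first-sync and incremental-sync code paths collapse into one, which is the appeal of the scheme.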
