That was a clever one I must say :)

OK I'll redistribute the articles first and then get back to you on a few
of the details regarding the above.

The above can then also be used to fetch articles by feed, tag or any other
attribute, since the results must always be sorted by date.



On Tue, Apr 20, 2010 at 5:22 PM, Alexander Burger <a...@software-lab.de> wrote:

> Hi Henrik,
>
> > So with the refs in place I could use the full remote logic to run pilog
> > queries on the remotes.
>
> OK
>
> > Now a search is made for all articles containing the word "picolisp" for
> > instance. I then need to be able to get an arbitrary slice back of the
> > total which needs to be sorted by time. I have a hard time understanding
> > how this can be achieved in any sensible way except through one of the
> > following:
> >
> > Central Command:
> > ...
> > Cascading:
> > ...
>
> I think both solutions are feasible. This is because you are in the
> lucky situation that you can separate the articles on the remote machines
> according to their age. In the general case (e.g. if the data were not
> "archived" like here, but subject to permanent change), this would be
> much harder.
>
>
> However: I think there is a solution that is simpler, as well as more
> general (not assuming anything about the locations of the articles).
>
> I did this in another project, where I collected items from remote
> machines sorted by attributes (not date, but counts and sizes).
>
>
> The first thing is that you define the index to be an +Aux, combining
> the search key with the date. So if you search for a key like "picolisp"
> on a single remote machine, you get all hits sorted by date. No extra
> sorting required.
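>
> (A minimal sketch with made-up names: an '+Article' entity with a
> 'dat' date attribute, and a single search key per article for
> brevity:)
>
>   (class +Article +Entity)
>   (rel dat (+Date))                     # Article date
>   (rel txt (+Aux +Ref +String) (dat))   # Search key, combined with 'dat'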
>
> Then, each remote machine has a function (e.g. 'sendRefDatArticles')
> defined, which simply iterates the index tree (with 'collect', or better
> a pilog query) and sends each found object with 'pr' to the current
> output channel. When it has sent all hits, it terminates.
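>
> (A sketch under the above assumptions. Here each hit is sent as a
> (date . object) pair, so that the central server can compare dates
> without further remote accesses; the exact 'collect' range arguments
> may need adjusting to the index layout:)
>
>   (de sendRefDatArticles (Word Dat)
>      # Step the index from (Word Dat) down to (Word), i.e. all hits
>      # for Word up to that date, newest first
>      (for Obj (collect 'txt '+Article (list Word Dat) (list Word))
>         (pr (cons (; Obj dat) Obj)) ) )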
>
> Then on the central server you open connections with *Ext enabled to
> each remote client. This can be done with +Agent objects taking care of
> the details (maintaining the connections, communicating via 'ext' etc.).
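>
> ('+Agent' is not a builtin, just an application class; a rough
> sketch of what it might hold, with made-up slots:)
>
>   (class +Agent)
>   # host port ofs sock
>
>   (dm T (Host Port Ofs)
>      (=: host Host)
>      (=: port Port)
>      (=: ofs Ofs) )               # 'ext' offset, unique per machine
>
>   (dm start> @                    # Send a command over the connection
>      (unless (: sock)
>         (=: sock (connect (: host) (: port))) )
>      (out (: sock)
>         (pr (rest)) ) )
>
>   (dm rd> ()                      # Read one object (the look-ahead)
>      (ext (: ofs)
>         (in (: sock) (rd)) ) )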
>
> The actual query then sends out a remote command like
>
>   (start> Agent 'sendRefDatArticles "picolisp" (someDate))
>
> Now all remote databases start sending their results, ordered by date.
> They are actually busy only until the TCP queue fills up, or until the
> connection is closed. If the queue fills up, they will block, so it is
> advisable that they all be fork'ed children.
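>
> (On the remote side, the listener can follow the usual PicoLisp
> fork-per-connection pattern; function name and port are made up:)
>
>   (de articleServer ()
>      (let P (port 4040)
>         (loop
>            (setq *Sock (listen P))
>            (NIL (fork) (close P))     # Child exits the loop
>            (close *Sock) )            # Parent waits for the next call
>         (in *Sock                     # Child: read one command and
>            (out *Sock (eval (rd))) )  # evaluate it, output to the socket
>         (bye) ) )                     # Terminating closes the connection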
>
> The central server then reads a single object from each connection into
> a list. Now, to return the results one by one to the actual caller (e.g.
> the GUI), it always picks the object with the highest date from that
> list, and reads the next item into that place in the list. The list is
> effectively a single-object look-ahead on each connection. When one of
> the connections returns NIL, it means the list of hits on that machine
> is exhausted, the remote child process has terminated, and the connection
> can be closed.
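>
> (A sketch of that merge, assuming a global '*Agents' holding one
> (Agent Date . Object) entry per connection, primed by a first 'rd>':)
>
>   (de nextArticle ()
>      (let L (filter cdr *Agents)   # Skip exhausted connections
>         (when L
>            (let A (maxi cadr L)    # The entry with the newest date
>               (prog1 (cddr A)      # Return its object, and read the
>                  (con A (rd> (car A))) ) ) ) ) )   # next pair in its place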
>
> So the GUI calls a function (or, more probably, a proper pilog predicate)
> which always returns the next available object with the highest date.
> With that, you can fetch 1, 25, or thousands of objects in order.
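>
> (E.g., with two hypothetical agents 'AgentA' and 'AgentB', priming
> the look-ahead and then fetching the first 25 hits in date order:)
>
>   (setq *Agents
>      (mapcar
>         '((A)
>            (start> A 'sendRefDatArticles "picolisp" (date))
>            (cons A (rd> A)) )      # Prime the one-object look-ahead
>         (list AgentA AgentB) ) )
>
>   (make
>      (do 25
>         (NIL (nextArticle))        # Stop early if the hits run out
>         (link @) ) )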
>
> Cheers,
> - Alex
>
