I've been reading up a bit on the remote stuff. I haven't made the articles
distributed yet, but let's assume I have, with 10 000 articles per remote.
Let's also assume that I have remade the word indexes to now work with real
+Ref +Links on each remote that link words and articles (not simply numbers
for subsequent use with (id) locally).
So with the refs in place I could use the full remote logic to run pilog
queries on the remotes.
Now say a search is made for all articles containing the word "picolisp". I
then need to be able to get back an arbitrary slice of the total result,
which needs to be sorted by time. I have a hard time understanding how this
can be achieved in any sensible way except through one of the following:
1.) The remotes are set up so that remote one contains the oldest articles,
remote two the second oldest articles and so on (this is the case naturally,
as a new remote is spawned when the newest one is "full").
2.) Each remote then returns how many articles it has that contain
"picolisp". This is needed for the pagination anyway in order to display the
correct number of page numbers, and can be done pretty trivially through the
count tree mechanism described earlier in this thread.
3.) The local logic now determines which remote(s) should be queried in
order to get 25 correct articles, issues the queries to be executed remotely
and displays the returned articles.
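
A rough sketch of the arithmetic in step 3, in Python rather than PicoLisp
just to show the logic. It assumes results are displayed newest first and
that the per-remote counts from step 2 are already known; plan_slice and the
example counts are made up for illustration and are not part of any
PicoLisp API.

```python
import math

def plan_slice(counts, offset, limit):
    """Decide which remotes to query for one page of results.

    counts: matches per remote, index 0 = oldest remote (hypothetical input).
    offset/limit: the requested slice in the global, newest-first ordering.
    Returns (remote_index, local_offset, local_limit) tuples, newest first.
    """
    plan = []
    # Walk the remotes newest first, since that is the display order.
    for idx in range(len(counts) - 1, -1, -1):
        if limit <= 0:
            break
        n = counts[idx]
        if offset >= n:
            # This whole remote lies before the requested slice.
            offset -= n
            continue
        take = min(limit, n - offset)
        plan.append((idx, offset, take))
        limit -= take
        offset = 0  # later remotes are read from their newest match
    return plan

counts = [40, 10, 30]                # made-up counts, oldest remote first
pages = math.ceil(sum(counts) / 25)  # number of page links to display
second_page = plan_slice(counts, offset=25, limit=25)
```

With these made-up counts the second page needs the last 5 matches of the
newest remote, all 10 of the middle one and the 10 newest of the oldest one,
so only the remotes that actually hold part of the slice get queried.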
If pagination is scrapped, the total count is not needed. It's possible to
have a "More Results" button instead, and I'm fine with that kind of
interface too. In most cases the count is not important for the user anyway.
In that case the following might be possible:
1.) The newest remote is queried first and can quickly determine through the
count tree whether it has the requested articles. If so, it fetches them and
returns them.
2.) If it doesn't contain them it will pass on the request to the second
newest remote which might contain all of the requested articles, or a subset
in which case the missing ones will be returned from the third newest remote
through the same mechanism.
3.) The end result is that the correct articles now end up in the first
remote which will return them to the local.
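
The chained variant could be sketched like this, again in Python only to
illustrate the control flow. Each remote is modelled as a plain list of its
matching articles, already sorted newest first, and len() stands in for the
count tree lookup. fetch_chain and the sample data are assumptions for the
example, not existing functions.

```python
def fetch_chain(remotes, skip, limit):
    """Fetch `limit` articles after skipping `skip`, newest first.

    remotes: list of remotes, newest remote first; each remote is a list
    of its matching articles sorted newest first (hypothetical model).
    """
    if not remotes or limit <= 0:
        return []
    head, rest = remotes[0], remotes[1:]
    n = len(head)  # stands in for the count tree lookup
    if skip >= n:
        # Nothing requested lives here, pass the request on to the next remote.
        return fetch_chain(rest, skip - n, limit)
    own = head[skip:skip + limit]
    missing = limit - len(own)
    # Whatever this remote cannot supply comes from the older remotes.
    return own + fetch_chain(rest, 0, missing)

# Made-up data: newest remote first, articles newest first within each.
remotes = [["c3", "c2", "c1"], ["b2", "b1"], ["a4", "a3", "a2", "a1"]]
first = fetch_chain(remotes, skip=2, limit=4)
more = fetch_chain(remotes, skip=6, limit=4)  # the "More Results" click
```

Here no total count is ever computed. Each remote only needs its own count
to decide whether to answer, pass the request on, or do a bit of both.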
Did I miss something? Might this problem be solved in a cleverer way?
On Thu, Apr 15, 2010 at 12:55 PM, Henrik Sarvell <hsarv...@gmail.com> wrote:
> To simply be able to pass along simple commands like collect and db ie. the
> *Ext stuff was overkill, which works just fine except in this special case
> when there are thousands of articles to a feed.
> I'm planning to distribute the whole DB except users and what feeds they
> subscribe to. Everything else will be article centric and remote. I will
> also keep local records of which feeds have articles in which remote so I
> don't query remotes for nothing.
> On Thu, Apr 15, 2010 at 12:17 PM, Alexander Burger
>> On Thu, Apr 15, 2010 at 09:12:18AM +0200, Henrik Sarvell wrote:
>> > On the other hand, if I'm to follow my own thinking to its logical
>> > conclusion I should make the articles distributed too, with blobs and
>> What was the rationale to use object IDs instead of direct remote access
>> via '*Ext'? I can't remember at the moment.