I see, I should've known about that one (I'm using it to get similar
articles already).

What's additionally needed is:

1.) Calculating the total count somehow, without retrieving all articles.

2.) Somehow sorting by date so I get, say, the first 25 articles.
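
For what it's worth, both might be doable with the same prove-loop pattern Alex describes below, without loading the article objects themselves. This is an untested sketch; it assumes the feed index on +ArFeLink for the count, and a hypothetical 'date' +Ref index on a +Article class for the newest-25 part:

   # Untested. (1): count the hits for feed 'Obj' by stepping through
   # the index with 'prove', never touching the article objects.
   (de countArticles (Obj)
      (let (Q (goal (quote @Obj Obj (db feed +ArFeLink @Obj @Lnk)))  N 0)
         (while (prove Q)
            (inc 'N) )
         N ) )

   # (2): walk a hypothetical 'date' +Ref index of +Article newest-first
   # (a descending range from today down to 1), stopping after 25 hits.
   (de newestArticles ()
      (let Q (goal (quote (db date +Article (cons (date) 1) @Art)))
         (make
            (do 25
               (NIL (prove Q))
               (link (get @ '@Art)) ) ) ) )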

If those two can also be achieved without fetching all articles, then I can
use Pilog in this manner to fetch the results when it comes to getting all
articles under all feeds under a specific tag. At the moment I'm fetching
all of them at once and using 'head', which is not optimal.

However, it won't work with the word indexes; I think a redesign of how the
system works is needed.

When searching for articles belonging to a certain feed and containing a
word in the content, I currently let the distributed indexes return all
matching articles and then simply use 'filter' to get at the ones I want.
To do that I of course need to fetch all the articles in a certain feed,
which works fine for most feeds but not for Twitter, as it now probably
contains more than 10 000 articles.

The only solution I can see to this is to simply store the feed -> article
mapping remotely too, i.e. each word index server also contains this info
for the articles it is indexing, resulting in an E/R section looking like
this:

(class +WordCount +Entity)
(rel article   (+Ref +Number))
(rel word      (+Aux +Ref +Number) (article))
(rel count     (+Number))

(class +ArFeLink +Entity)
(rel article   (+Aux +Ref +Number) (feed))
(rel feed      (+Ref +Number))

Then I could simply filter by feed remotely.
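
With the (+Aux +Ref +Number) key on 'article' above, each membership test
becomes a single 'aux' lookup in the index. An untested sketch of the remote
side (names are illustrative): given the article numbers a word index
returned, keep only those belonging to feed number 'Feed':

   # Untested sketch, assuming the proposed +ArFeLink schema above.
   # 'aux' finds the +ArFeLink object via the combined (article feed)
   # key, so each test is one index access, no article fetch needed.
   (de filterByFeed (Feed Arts)
      (filter
         '((Art) (aux 'article '+ArFeLink Art Feed))
         Arts ) )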

/Henrik


On Sun, Apr 11, 2010 at 9:25 AM, Alexander Burger <a...@software-lab.de> wrote:

> Hi Henrik,
>
> > (class +ArFeLink +Entity)
> > (rel article   (+Aux +Ref +Link) (feed) NIL (+Article))
> > (rel feed      (+Ref +Link) NIL (+Feed))
> >
> > (collect 'feed '+ArFeLink Obj Obj 'article) takes forever (2 mins); I need
> > it to take something like a maximum of 2 seconds...
> >
> > Can this be fixed by adding some index or key, or do I need to make this
> > part of the DB distributed and chopped up so I can run this in parallel?
>
> This is already the proper index. Is it perhaps the case that there are
> simply too many articles fetched at once? How many articles does the
> above 'collect' return? And are these articles all needed at that time?
>
> If you talk about 2 seconds, I assume you don't want the user having to
> wait, so it is a GUI interaction. In such cases it is typical not to
> fetch all data from the DB, but only the first chunk e.g. to display
> them in the GUI. It would be better then to use a Pilog query, returning
> the results one by one (as done in +QueryChart).
>
> To get results analog to the above 'collect', you could create a query
> like
>
>   (let Q
>      (goal
>         (quote
>            @Obj Obj
>            (db feed +ArFeLink @Obj @Feed)
>            (val @Article @Feed article) ) )
>      ...
>      (do 20   # Then fetch the first 20 articles
>         (NIL (prove Q))  # More?
>         (bind @   # Bind the result values
>            (println @Article)  # Use the article
>            ...
>
> Instead of 'bind' you could also simply use 'get' to extract the
> @Article: (get @ '@Article).
>
> Before doing so, I would test it interactively, e.g.
>
> : (? (db feed +ArFeLink {ART} @Feed) (val @Article @Feed article))
>
> if '{ART}' is an article.
>
> Note that the above is not tested.
>
> Cheers,
> - Alex
