Re: Scaling issue

Henrik Sarvell Sun, 11 Apr 2010 05:25:27 -0700

Thanks Alex, I will go for the reversed range and check out select/3.

I'm already using collect with dates extensively but in this case it
wouldn't work as I need the 25 newest regardless of exactly when they were
published.


/Henrik

On Sun, Apr 11, 2010 at 1:27 PM, Alexander Burger <a...@software-lab.de> wrote:

> On Sun, Apr 11, 2010 at 12:25:42PM +0200, Henrik Sarvell wrote:
> > What's additionally needed is:
> >
> > 1.) Calculating total count somehow without retrieving all articles.
>
> If it is simply the count of all articles in the DB, you can get it
> directly from a '+Key' or '+Ref' index. I don't quite remember the E/R
> model, but I found this in an old mail:
>
>   (class +Article +Entity)
>   (rel aid       (+Key +Number))
>   (rel title     (+Idx +String))
>   (rel htmlUrl   (+Key +String))
>
> With that, (count (tree 'aid '+Article)) or (count (tree 'htmlUrl
> '+Article)) will give all articles having the property 'aid' or
> 'htmlUrl' (not, however, via 'title', as an '+Idx' index creates more
> than one tree node per object).
>
> If you need distinguished counts (e.g. for groups of articles or
> according to certain features), it might be necessary to build more
> indexes, or simply maintain counts during import.
>
>
> > 2.) Somehow sorting by date so I get say the 25 first articles.
>
> This is also best done with a dedicated index, e.g.
>
>   (rel dat (+Ref +Date))
>
> in '+Article'. Then you could specify a reversed range (T . NIL) for a
> pilog query
>
>   (? (db dat +Article (T . NIL) @Article) (show @Article))
>
> This will start with the newest article, and step backwards. Even easier
> might be if you specify a range of dates, say from today till one week
> ago. Then you could use 'collect'
>
>   (collect 'dat '+Article (date) (- (date) 7))
>
> or, as 'today' is not very informative,
>
>   (collect 'dat '+Article T (- (date) 7))
>
>
> > When searching for articles belonging to a certain feed containing a
> > word in the content I now let the distributed indexes return all
> > articles and then I simply use filter to get at the articles. And to do
> > that I of course need to fetch all the articles in a certain feed, which
> > works fine for most feeds but not Twitter as it now probably contains
> > more than 10 000 articles.
>
> I think that usually it should not be necessary to fetch all articles,
> if you build a combined query with the 'select/3' predicate.
>
>
> > The only solution I can see to this is to simply store the feed ->
> > article mapping remotely too, ie each word index server contains this
> > info too for
> > ...
> > Then I could simply filter by feed remotely.
>
> Not sure. But I feel that I would use distributed processing here only
> if there is no other way (i.e. the parallel search with 'select/3').
>
> Cheers,
> - Alex
