Thanks Alex, I will go for the reversed range and check out select/3. I'm already using collect with dates extensively, but in this case it wouldn't work as I need the 25 newest regardless of exactly when they were published.

/Henrik

On Sun, Apr 11, 2010 at 1:27 PM, Alexander Burger <a...@software-lab.de> wrote:

> On Sun, Apr 11, 2010 at 12:25:42PM +0200, Henrik Sarvell wrote:
> > What's additionally needed is:
> >
> > 1.) Calculating total count somehow without retrieving all articles.
>
> If it is simply the count of all articles in the DB, you can get it
> directly from a '+Key' or '+Ref' index. I don't quite remember the E/R
> model, but I found this in an old mail:
>
>    (class +Article +Entity)
>    (rel aid (+Key +Number))
>    (rel title (+Idx +String))
>    (rel htmlUrl (+Key +String))
>
> With that, (count (tree 'aid '+Article)) or (count (tree 'htmlUrl
> '+Article)) will give all articles having the property 'aid' or
> 'htmlUrl' (not, however, via 'title', as an '+Idx' index creates more
> than one tree node per object).
>
> If you need distinguished counts (e.g. for groups of articles or
> according to certain features), it might be necessary to build more
> indexes, or simply maintain counts during import.
>
>
> > 2.) Somehow sorting by date so I get say the 25 first articles.
>
> This is also best done with a dedicated index, e.g.
>
>    (rel dat (+Ref +Date))
>
> in '+Article'. Then you could specify a reversed range (T . NIL) for a
> pilog query
>
>    (? (db dat +Article (T . NIL) @Article) (show @Article))
>
> This will start with the newest article, and step backwards. Even easier
> might be if you specify a range of dates, say from today till one week
> ago. Then you could use 'collect'
>
>    (collect 'dat '+Article (date) (- (date) 7))
>
> or, as 'today' is not very informative,
>
>    (collect 'dat '+Article T (- (date) 7))
>
>
> > When searching for articles belonging to a certain feed containing a word in
> > the content I now let the distributed indexes return all articles and then I
> > simply use filter to get at the articles. And to do that I of course need to
> > fetch all the articles in a certain feed, which works fine for most feeds
> > but not Twitter as it now probably contains more than 10 000 articles.
>
> I think that usually it should not be necessary to fetch all articles,
> if you build a combined query with the 'select/3' predicate.
>
>
> > The only solution I can see to this is to simply store the feed -> article
> > mapping remotely too, ie each word index server contains this info too for
> > ...
> > Then I could simply filter by feed remotely.
>
> Not sure. But I feel that I would use distributed processing here only
> if there is no other way (i.e. the parallel search with 'select/3').
>
> Cheers,
> - Alex