Re: Scaling issue

2010-05-14 Thread Alexander Burger
On Thu, May 13, 2010 at 09:12:06PM +0200, Henrik Sarvell wrote:
> One thing first though, since articles are indexed when they're parsed
> and PL isn't doing any kind of sorting automatically on insert then
> they should be sorted by date automatically with the latest articles
> at the end of the database file since I suppose they're just appended?

While this is correct in principle, I would not rely on it. If there
should ever be an object deleted from that database file, the space
would be reused by the next new object, and the assumption would break.
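
To illustrate why, here is a toy model in Python. It is only a sketch of
the general "reuse freed slots" idea, not the real database file layout;
the class `SlotFile` and its methods are made-up names for this example.

```python
class SlotFile:
    """Toy model of a file whose freed slots are reused by later inserts."""

    def __init__(self):
        self.slots = []   # physical order of objects in the "file"
        self.free = []    # positions released by delete()

    def insert(self, obj):
        # Prefer a freed slot; only append when none is available.
        if self.free:
            pos = self.free.pop()
            self.slots[pos] = obj
        else:
            pos = len(self.slots)
            self.slots.append(obj)
        return pos

    def delete(self, pos):
        self.slots[pos] = None
        self.free.append(pos)

    def scan(self):
        # Forward walk over the physical slots, skipping empty ones.
        return [obj for obj in self.slots if obj is not None]


f = SlotFile()
for date in ["2010-01", "2010-02", "2010-03"]:
    f.insert(date)
f.delete(0)            # remove the oldest article
f.insert("2010-04")    # the newest article lands in the freed first slot
physical_order = f.scan()
```

In this model the newest object ends up at the front of the file, so
walking backwards from the end no longer visits the latest articles first.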


> How can I simply start walking from the end of the file until I've
> found say 25 matches? This procedure should be the absolutely fastest
> way of getting what I want.

Currently I see no easy way. The only function that walks a database
file directly is 'seq', but it can only step forwards.


> I know about your iter example earlier and it seems like a good fit if
> it starts walking in the right end?

Yes, 'iter' (and the related 'scan') can walk in both directions. You
need only to pass inverted keys (i.e. Beg > End).
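
The idea can be sketched in Python over a plain sorted list. This is a
model of the traversal only, not of 'iter' or 'scan' themselves, and it
assumes "inverted" means the start key is the larger one; `walk` and
`first_matches` are hypothetical helper names.

```python
from bisect import bisect_left, bisect_right


def walk(index, beg, end):
    """Yield values whose key lies between beg and end (inclusive).

    index is a list of (key, value) pairs sorted by key. If beg > end,
    the range is walked from the largest key down to the smallest.
    """
    keys = [k for k, _ in index]
    lo, hi = min(beg, end), max(beg, end)
    span = index[bisect_left(keys, lo):bisect_right(keys, hi)]
    if beg > end:
        span = reversed(span)
    for _, value in span:
        yield value


def first_matches(index, beg, end, pred, limit):
    """Collect at most `limit` values satisfying pred, then stop walking."""
    found = []
    for value in walk(index, beg, end):
        if pred(value):
            found.append(value)
            if len(found) == limit:
                break
    return found


index = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
forward = list(walk(index, 2, 4))
backward = list(walk(index, 4, 2))
newest_two = first_matches(index, 5, 1, lambda v: v in "ace", 2)
```

Because the walk stops as soon as the limit is reached, finding the
latest 25 matches only touches as much of the index as it needs to.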


If I understand it right, however, you solved the problem in your next
mail(s) by using the date index, and starting at 6 months ago?

Cheers,
- Alex
-- 
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe


Re: Scaling issue

2010-05-14 Thread Henrik Sarvell
OK, since I can't rely on sorting by date anyway, let's forget that idea.

Yes, since it seemed I had to involve dates anyway, I simply chose a
date far enough back in time that if someone is looking for something
older they might as well use Google.

Anyway, the above is scanning 19 remotes, each containing indexes for
10 000 articles, and returns in 3-4 seconds, which is OK for me. Problem
solved as far as I'm concerned. I have to add, though, that all remotes
are currently on the same machine; had they been truly distributed it
would be faster, especially if the other machines were in the same
data center.

