Re: [HACKERS] Adding a suffix array index

2004-12-03 Thread Troels Arvin
On Sun, 28 Nov 2004 17:53:38 -0500, Tom Lane wrote: But is it cheaper, IO-wise to jump around in an index than to go back and forth between index and tuple blocks? Perhaps not --- but why would you be jumping around? Wouldn't the needed info appear in consecutive locations in the index?
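
To illustrate the point about consecutive locations: in a suffix array the suffixes are kept in sorted order, so all suffixes that start with a given pattern sit next to each other. The following is a minimal in-memory sketch, not PostgreSQL index code; the function names are hypothetical and the construction is deliberately naive.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes of text, in sorted suffix order.

    Naive construction for illustration only; real implementations use
    much faster algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def match_range(text: str, sa: list[int], pattern: str) -> tuple[int, int]:
    """Return (lo, hi) such that sa[lo:hi] are exactly the suffixes
    starting with pattern. The range is contiguous because sa is sorted."""
    m = len(pattern)
    # Truncating sorted suffixes to m characters keeps them in sorted order,
    # so plain binary search applies.
    keys = [text[i:i + m] for i in sa]
    return bisect_left(keys, pattern), bisect_right(keys, pattern)


text = "GATTACA"
sa = build_suffix_array(text)
lo, hi = match_range(text, sa, "A")
print(sorted(sa[lo:hi]))  # offsets in text where "A" occurs
```

If the suffix text itself is stored in the index, a lookup reads one contiguous run of entries. If the index holds only pointers, each comparison during the search has to fetch the suffix from somewhere else, which is the access pattern being questioned in this exchange.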

Re: [HACKERS] Adding a suffix array index

2004-12-03 Thread Tom Lane
Troels Arvin [EMAIL PROTECTED] writes: How much of [1] is still the case today? Reference 1: Stonebraker and Olson: Large Object Support in POSTGRES (1993) http://epoch.cs.berkeley.edu:8000/postgres/papers/S2K-93-30.pdf Probably almost none of it ... the only thing I know about the Berkeley-era

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Troels Arvin
On Fri, 19 Nov 2004 10:35:20 -0500, Tom Lane wrote: 2. Does someone know of interesting documentation (perhaps in the form of interesting code comments) which I should read, as a basis for creating a non-standard index type in PostgreSQL? There's not a whole lot :-( and you should

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Tom Lane
Troels Arvin [EMAIL PROTECTED] writes: What kind of (logical) block identifier should I point to in my index? CTID (block # + line #) is the only valid pointer from an index to a table. It doesn't change over the life of an index entry. I think though that you'd be making a serious mistake by

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Troels Arvin
On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote: CTID (block # + line #) is the only valid pointer from an index to a table. Thanks. I think though that you'd be making a serious mistake by not duplicating the suffixes into the index (rather than expecting to retrieve them from the

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Tom Lane
Troels Arvin [EMAIL PROTECTED] writes: On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote: You need to be able to scan the index and identify rows matching a query without making lots of probes into the table. But is it cheaper, IO-wise to jump around in an index than to go back and forth

[HACKERS] Adding a suffix array index

2004-11-19 Thread Troels Arvin
Hello, I'm working on a thesis project where I explore the addition of a specialized, bioinformatics-related data type to an RDBMS. My choice of RDBMS is PostgreSQL, of course, and I've started by adding a dnaseq (DNA sequence) data type, using PostgreSQL's APIs for type additions. The idea is to

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Oleg Bartunov
Hi, your project looks very attractive. In principle, suffix array should be implemented using GiST framework. String Btree should be very useful for your problem. My student is working on string btree library, but we have no plan to integrate it into PostgreSQL. Oleg On Fri, 19 Nov 2004,

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Hannu Krosing
On R, 2004-11-19 at 12:42, Troels Arvin wrote: The basic parts of the type are pretty much done. Those interested may have a look at http://troels.arvin.dk/svn-snap/postgresql-dnaseq/ (the code organization needs some clean-up). The basic type implementation should be improved by adding more

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Adam Witney
Hi Troels, This is not related to the database aspects of your question... But there are more than 4 possible letters in DNA sequences, 16 in fact. Depending on the accuracy of the DNA sequences you are storing, you may come across ambiguity DNA bases, so your type will have to take these into
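
The message is cut off here, so the following rests on an assumption: one common way to arrive at 16 symbols is the IUPAC nucleotide codes (15 letters) plus a gap symbol. Each symbol then corresponds to a subset of {A, C, G, T} and fits in four bits. A minimal sketch, with hypothetical names:

```python
# One bit per unambiguous base; an ambiguity code is the OR of its members.
A, C, G, T = 1, 2, 4, 8

IUPAC = {
    "A": A, "C": C, "G": G, "T": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T,
    "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
    "-": 0,  # gap: matches no base (assumed convention for the 16th symbol)
}


def compatible(x: str, y: str) -> bool:
    """True if the two symbols share at least one possible base."""
    return bool(IUPAC[x] & IUPAC[y])


print(len(IUPAC), compatible("R", "A"), compatible("R", "C"))
```

With this encoding a sequence needs four bits per position rather than two, which is the cost of supporting ambiguous bases.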

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Troels Arvin
Hello Oleg, On Fri, 2004-11-19 at 15:35 +0300, Oleg Bartunov wrote: your project looks very attractive. Thanks. In principle, suffix array should be implemented using GiST framework. But in a previous conversation between the two of us, you wrote that the GiST wasn't suitable for this

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Troels Arvin
On Fri, 19 Nov 2004 14:38:20 +0200, Hannu Krosing wrote: Part of my current code concerns packing DNA characters: As the alphabet of DNA strings is very small (four characters), it seems like a straightforward optimization to store each character in two bits. My advice would be to get it
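
For readers unfamiliar with the two-bit packing mentioned in the quoted text, here is a minimal sketch of the idea. It is not taken from the dnaseq code; the helper names and the bit order (first base in the high bits) are assumptions made for illustration.

```python
# Hypothetical 2-bit codes; any fixed assignment of the four bases works.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
LETTERS = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string over A/C/G/T into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        # Base i goes into byte i // 4, first base in the two highest bits.
        out[i // 4] |= CODES[ch] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases. The length has to be stored
    separately, because padding bits in the last byte look like 'A'."""
    return "".join(
        LETTERS[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(length)
    )


packed = pack("GATTACA")
print(len(packed), unpack(packed, 7))
```

Note that this scheme has no room for anything outside the four-letter alphabet: `pack` raises `KeyError` on any other character.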

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Oleg Bartunov
On Fri, 19 Nov 2004, Troels Arvin wrote: Hello Oleg, On Fri, 2004-11-19 at 15:35 +0300, Oleg Bartunov wrote: your project looks very attractive. Thanks. In principle, suffix array should be implemented using GiST framework. But in a previous conversation between the two of us, you wrote that the

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Tom Lane
Troels Arvin [EMAIL PROTECTED] writes: 2. Does someone know of interesting documentation (perhaps in the form of interesting code comments) which I should read, as a basis for creating a non-standard index type in PostgreSQL? There's not a whole lot :-( and you should definitely

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Simon Riggs
On Fri, 2004-11-19 at 10:42, Troels Arvin wrote: Hello, I'm working on a thesis project where I explore the addition of a specialized, bioinformatics-related data type to an RDBMS. My choice of RDBMS is PostgreSQL, of course, and I've started by adding a dnaseq (DNA sequence) data type,