Re: [HACKERS] Adding a suffix array index

2004-12-03 Thread Tom Lane
Troels Arvin <[EMAIL PROTECTED]> writes: > How much of[1] is still the case today? > Reference 1: > Stonebraker & Olson: Large Object Support in POSTGRES (1993) > http://epoch.cs.berkeley.edu:8000/postgres/papers/S2K-93-30.pdf Probably almost none of it ... the only thing I know about the Berkeley

Re: [HACKERS] Adding a suffix array index

2004-12-03 Thread Troels Arvin
On Sun, 28 Nov 2004 17:53:38 -0500, Tom Lane wrote: >> But is it cheaper, IO-wise to "jump" around in an index than to go back >> and forth between index and tuple blocks? > > Perhaps not --- but why would you be "jumping around"? Wouldn't the > needed info appear in consecutive locations in the

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Tom Lane
Troels Arvin <[EMAIL PROTECTED]> writes: > On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote: >> You need to be able >> to scan the index and identify rows matching a query without making lots >> of probes into the table. > But is it cheaper, IO-wise to "jump" around in an index than to go back >

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Troels Arvin
On Sun, 28 Nov 2004 16:52:47 -0500, Tom Lane wrote: > CTID (block # + line #) is the only valid pointer from an index to a > table. Thanks. > I think > though that you'd be making a serious mistake by not duplicating the > suffixes into the index (rather than expecting to retrieve them from the

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Tom Lane
Troels Arvin <[EMAIL PROTECTED]> writes: > What kind of (logical) block identifier should I point to in my index? CTID (block # + line #) is the only valid pointer from an index to a table. It doesn't change over the life of an index entry. I think though that you'd be making a serious mistake b
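[Editor's sketch, not part of the original message.] To illustrate the two points above, here is a toy Python model in which every index entry carries its own copy of the suffix plus a (block #, line #) pair standing in for the CTID. The names `Entry`, `build_index` and `search` are made up for illustration and have nothing to do with PostgreSQL's actual index access method API. Copying whole suffixes costs quadratic space in the sequence length, so a real index would presumably store a bounded prefix; that refinement is left out here.

```python
from bisect import bisect_left
from typing import NamedTuple


class Entry(NamedTuple):
    suffix: str   # copy of the suffix, kept in the index itself
    block: int    # stand-in for the CTID block number
    line: int     # stand-in for the CTID line number
    offset: int   # where the suffix starts inside the row's sequence


def build_index(rows):
    """rows maps (block, line) -> sequence; returns entries sorted by suffix."""
    entries = [
        Entry(seq[i:], block, line, i)
        for (block, line), seq in rows.items()
        for i in range(len(seq))
    ]
    entries.sort()
    return entries


def search(entries, pattern):
    """Return (block, line, offset) for every occurrence of pattern.

    All suffixes that start with pattern sit next to each other in sorted
    order, so one binary search plus a sequential scan is enough, and the
    table rows are never touched.
    """
    keys = [e.suffix for e in entries]  # rebuilt per call; fine for a sketch
    hits = []
    for e in entries[bisect_left(keys, pattern):]:
        if not e.suffix.startswith(pattern):
            break
        hits.append((e.block, e.line, e.offset))
    return hits
```

With `rows = {(0, 1): "GATTACA", (0, 2): "TTAG"}`, calling `search(build_index(rows), "TA")` reports one hit in each row without reading either row.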

Re: [HACKERS] Adding a suffix array index

2004-11-28 Thread Troels Arvin
On Fri, 19 Nov 2004 10:35:20 -0500, Tom Lane wrote: >> 2. Does someone know of interesting documentation (perhaps >>in the form of interesting code comments) which I should >>read, as a basis for creating a non-standard index type >>in PostgreSQL? > > There's not a whole lot :-( and y

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Simon Riggs
On Fri, 2004-11-19 at 10:42, Troels Arvin wrote: > Hello, > > I'm working on a thesis project where I explore the addition of a > specialized, bioinformatics-related data type to an RDBMS. My choice of > RDBMS is PostgreSQL, of course, and I've started by adding a "dnaseq" (DNA > sequence) data typ

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Tom Lane
Troels Arvin <[EMAIL PROTECTED]> writes: > 2. Does someone know of interesting documentation (perhaps >in the form of interesting code comments) which I should >read, as a basis for creating a non-standard index type >in PostgreSQL? There's not a whole lot :-( and you should definitely

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Oleg Bartunov
On Fri, 19 Nov 2004, Troels Arvin wrote: Hello Oleg, On Fri, 2004-11-19 at 15:35 +0300, Oleg Bartunov wrote: your project looks very attractive. Thanks. In principle, suffix array should be implemented using GiST framework. But in a previous conversation between the two of us, you wrote that the Gi

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Troels Arvin
On Fri, 19 Nov 2004 14:38:20 +0200, Hannu Krosing wrote: >> Part of my current code concerns packing DNA characters: As the alphabet >> of DNA strings is very small (four characters), it seems like a >> straightforward optimization to store each character in two bits. > > My advice would be to
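[Editor's sketch, not part of the original message.] The two-bits-per-character idea quoted above can be illustrated in a few lines of Python. The A/C/G/T to 0..3 mapping and the function names are assumptions made for this example; the thread does not say which encoding the dnaseq code actually uses. The sequence length has to be stored separately, because padding bits in the last byte would otherwise decode as extra 'A's.

```python
# Hypothetical 2-bit codes; the thread does not specify a mapping.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_dna(seq: str) -> bytes:
    """Pack four bases per byte, first base in the two lowest bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack_dna(packed: bytes, length: int) -> str:
    """Reverse pack_dna; length is needed to ignore the padding bits."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )
```

A seven-base sequence such as "GATTACA" fits in two bytes this way instead of seven.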

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Troels Arvin
Hello Oleg, On Fri, 2004-11-19 at 15:35 +0300, Oleg Bartunov wrote: > your project looks very attractive. Thanks. > In principle, suffix array should be implemented using GiST framework. But in a previous conversation between the two of us, you wrote that the GiST wasn't suitable for this pro

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Adam Witney
Hi Troels, This is not related to the database aspects of your question... But there are more than 4 possible letters in DNA sequences, 16 in fact. Depending on the accuracy of the DNA sequences you are storing, you may come across ambiguity DNA bases, so your type will have to take these into ac
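[Editor's sketch, not part of the original message.] One common way to handle the ambiguity codes mentioned above is to spend four bits per symbol, one bit per base, so that each IUPAC nucleotide code becomes the set of bases it may stand for. Counting the gap symbol as the empty set gives exactly 16 values. Whether the dnaseq type ended up using this layout is not stated in the thread; the names below are made up for illustration.

```python
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}

# IUPAC nucleotide codes and the bases each one may stand for; "-" is a gap.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "-": "",
}
MASK = {sym: sum(BASE_BITS[b] for b in bases) for sym, bases in IUPAC.items()}
SYMBOL = {mask: sym for sym, mask in MASK.items()}


def pack_iupac(seq: str) -> bytes:
    """Pack two symbols per byte, first symbol in the low nibble."""
    out = bytearray((len(seq) + 1) // 2)
    for i, sym in enumerate(seq):
        out[i // 2] |= MASK[sym] << (4 * (i % 2))
    return bytes(out)


def unpack_iupac(packed: bytes, length: int) -> str:
    """Reverse pack_iupac; length is needed to ignore the padding nibble."""
    return "".join(
        SYMBOL[(packed[i // 2] >> (4 * (i % 2))) & 0xF] for i in range(length)
    )


def symbols_match(x: str, y: str) -> bool:
    """Two symbols are compatible if they share at least one possible base."""
    return bool(MASK[x] & MASK[y])
```

A side effect of the bitmask layout is that compatibility checks reduce to a single AND: "R" (A or G) matches "G" but not "Y" (C or T).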

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Hannu Krosing
On Fri, 2004-11-19 at 12:42, Troels Arvin wrote: > The basic parts of the type are pretty much done. Those interested may > have a look at http://troels.arvin.dk/svn-snap/postgresql-dnaseq/ (the > code organization needs some clean-up). The basic type implementation > should be improved by adding mor

Re: [HACKERS] Adding a suffix array index

2004-11-19 Thread Oleg Bartunov
Hi, your project looks very attractive. In principle, suffix array should be implemented using GiST framework. String Btree should be very useful for your problem. My student is working on string btree library, but we have no plan to integrate it into postgresql. Oleg On Fri, 19 Nov 2004,

[HACKERS] Adding a suffix array index

2004-11-19 Thread Troels Arvin
Hello, I'm working on a thesis project where I explore the addition of a specialized, bioinformatics-related data type to an RDBMS. My choice of RDBMS is PostgreSQL, of course, and I've started by adding a "dnaseq" (DNA sequence) data type, using PostgreSQL's APIs for type additions. The idea is t