Scott Cain <[EMAIL PROTECTED]> writes:
> A few days ago, I asked for advice on speeding up substring queries on
> the GENERAL mailing list. Joe Conway helpfully pointed out the ALTER
> TABLE STORAGE EXTERNAL documentation. After doing the alter,
> the queries got slower! Here is the background:
Scott Cain wrote:
Oh, and I forgot to mention: it's highly compressed (bzip2 -9) and is
109M.
Thanks. I'll grab a copy from home later today and see if I can find
some time to poke at it.
Joe
Oh, and I forgot to mention: it's highly compressed (bzip2 -9) and is
109M.
Scott
On Tue, 2003-08-05 at 11:01, Scott Cain wrote:
> Joe,
>
> Good idea, since I may not get around to profiling it this week. I
> created a dump of the data set I was working with. It is available at
> http://www.gmod.org/string_dump.bz2
> snipped much discussion about EXTERNAL storage and substring speed
Joe and Tom,
Thanks for all of your help; I went back to my (nearly) production
database, and executed the `update feature set residues = residues
||'';` and then redid my benchmark. Here's a summary of the results:
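For readers reconstructing the steps: per the thread, EXTERNAL storage keeps the value out of line but uncompressed, which is what lets substring() fetch only the slices it needs; but ALTER ... SET STORAGE only affects values stored afterwards, so the no-op concatenation is what forces existing rows to be rewritten. A sketch of the full sequence (table and column names taken from the thread):

```sql
-- Store residues out-of-line and uncompressed so substring() can fetch slices.
ALTER TABLE feature ALTER COLUMN residues SET STORAGE EXTERNAL;

-- SET STORAGE only affects subsequently stored values; force a rewrite of
-- the existing rows with a no-op update, then clean up the dead tuples.
UPDATE feature SET residues = residues || '';
VACUUM ANALYZE feature;
```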
substr in
Joe Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Ah-hah, I've sussed it ... you didn't actually change the storage
>> representation. You wrote:
> Yeah, I came to the same conclusion this morning (update longdna set dna
> = dna || '';), but it still seems that the chunked table is ver
Tom Lane wrote:
Scott Cain <[EMAIL PROTECTED]> writes:
A few days ago, I asked for advice on speeding up substring queries on
the GENERAL mailing list. Joe Conway helpfully pointed out the ALTER
TABLE STORAGE EXTERNAL documentation. After doing the alter,
the queries got slower! Here is the background:
Joe,
Good idea, since I may not get around to profiling it this week. I
created a dump of the data set I was working with. It is available at
http://www.gmod.org/string_dump.bz2
Thanks,
Scott
On Mon, 2003-08-04 at 16:29, Joe Conway wrote:
> Is there a sample table schema and dataset available
Scott Cain wrote:
On Mon, 2003-08-04 at 11:53, Tom Lane wrote:
I find it really, really hard to believe that a crude reimplementation
in plpgsql of the TOAST concept could beat the built-in implementation
at all, let alone beat it by two orders of magnitude.
Either there's something unrealistic abo
> > 2. If you want to search for a sequence you'll need to deal with the case
> > where it starts in one chunk and ends in another.
>
> I forgot about searching--I suspect that application is why I faced
> opposition for shredding in my schema development group. Maybe I should
> push that off to the file system and use grep (or BLAST).
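Joe's boundary caveat can also be handled inside the database by overlapping each chunk with the first pattern_length - 1 characters of its successor. A hypothetical sketch (the dna_chunks table layout and the pattern are invented for illustration):

```sql
-- Look for a 7-character pattern that may straddle a chunk boundary by
-- extending each chunk with the first 6 characters of the next one.
SELECT c.chunk_no,
       position('GATTACA' in
                c.chunk || coalesce(substring(n.chunk from 1 for 6), '')) AS pos
  FROM dna_chunks c
  LEFT JOIN dna_chunks n
    ON n.feature_id = c.feature_id AND n.chunk_no = c.chunk_no + 1
 WHERE c.feature_id = 1
   AND position('GATTACA' in
                c.chunk || coalesce(substring(n.chunk from 1 for 6), '')) > 0;
```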
On Mon, 2003-08-04 at 11:53, Tom Lane wrote:
> Scott Cain <[EMAIL PROTECTED]> writes:
> > At least this appears to work and is much faster, completing substring
> > operations like above in about 0.27 secs (that's about two orders of
> > magnitude improvement!)
>
> I find it really, really hard to believe that a crude reimplementation
> in plpgsql of the TOAST concept could beat the built-in implementation
> at all, let alone beat it by two orders of magnitude.
On 4 Aug 2003 at 12:14, Scott Cain wrote:
> I forgot about searching--I suspect that application is why I faced
> opposition for shredding in my schema development group. Maybe I should
> push that off to the file system and use grep (or BLAST). Otherwise, I
> could write a function that would se
On Mon, 2003-08-04 at 11:55, Richard Huxton wrote:
> On Monday 04 August 2003 16:25, Scott Cain wrote:
> [snip]
> > [snip]
>
> You might want some checks to make sure that smin < smax, otherwise looks like
> it does the job in a good clean fashion.
Good point--smin < smax generally by virtue of
On Monday 04 August 2003 16:25, Scott Cain wrote:
[snip]
> In an effort to find the best way to do this operation, I decided to
> look at what is my "worst case" scenario: the DNA sequence for human
> chromosome 1, which is about 250 million characters long (previous
> strings were about 20 million characters long)
Scott Cain <[EMAIL PROTECTED]> writes:
> At least this appears to work and is much faster, completing substring
> operations like above in about 0.27 secs (that's about two orders of
> magnitude improvement!)
I find it really, really hard to believe that a crude reimplementation
> in plpgsql of the TOAST concept could beat the built-in implementation
> at all, let alone beat it by two orders of magnitude.
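For context, the "crude reimplementation of TOAST" Tom refers to is the chunked-table scheme discussed upthread: split each sequence into fixed-size rows so a substring only touches the chunks it overlaps. A hypothetical reconstruction (table name, function name, and chunk size are all invented, and this is written for modern PostgreSQL syntax, not the 7.3-era server in the thread):

```sql
-- Sequences split into fixed-size chunks, keyed by position.
CREATE TABLE dna_chunks (
    feature_id integer,
    chunk_no   integer,   -- 0-based chunk index
    chunk      text,      -- csize characters per chunk
    PRIMARY KEY (feature_id, chunk_no)
);

CREATE OR REPLACE FUNCTION chunk_substr(fid integer, start integer, len integer)
RETURNS text AS $$
DECLARE
    csize       constant integer := 2000;           -- chunk size used at load
    first_chunk integer := (start - 1) / csize;      -- first chunk touched
    last_chunk  integer := (start + len - 2) / csize; -- last chunk touched
    joined      text;
BEGIN
    -- Fetch and reassemble only the chunks the substring overlaps.
    SELECT string_agg(chunk, '' ORDER BY chunk_no) INTO joined
      FROM dna_chunks
     WHERE feature_id = fid AND chunk_no BETWEEN first_chunk AND last_chunk;
    RETURN substring(joined FROM start - first_chunk * csize FOR len);
END;
$$ LANGUAGE plpgsql;
```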
Hello,
Note: there is a SQL question way at the bottom of this narrative :-)
Last week I asked about doing substring operations on very long strings
(>10 million characters). I was given a suggestion to use EXTERNAL
storage on the column via the ALTER TABLE ... SET STORAGE command. In
one test
So it is possible that if I had a fast scsi drive, the performance might
be better?
On Thu, 2003-07-31 at 16:31, Joe Conway wrote:
> Scott Cain wrote:
> > Index Scan using feature_pkey on feature (cost=0.00..3.01 rows=1
> > width=153) (actual time=954.13..954.14 rows=1 loops=1)
> > Index Cond: (feature_id = 1)
On Thu, 2003-07-31 at 15:31, Joe Conway wrote:
> Scott Cain wrote:
> > Index Scan using feature_pkey on feature (cost=0.00..3.01 rows=1
> > width=153) (actual time=954.13..954.14 rows=1 loops=1)
> > Index Cond: (feature_id = 1)
> > Total runtime: 954.26 msec
> > (3 rows)
> >
> > Whoa! That's not what I expected, the time to do the query got more
> > than twice as long.
Joe,
I'm working on the comparison--I think the best way to do it is to
reload the original data into a new database and compare them, so it
will take a while.
I have tuned postgresql.conf according to the page that everybody around
here seems to cite. I'll probably post back tomorrow with another comparison.
Scott Cain wrote:
I am not against recompiling. I am currently using an RPM version, but
I could probably recompile; the compilation is probably straight forward
(adding something like `--with_profiling` to ./configure), but how
straight forward is actually doing the profiling? Is there a documen
Scott Cain <[EMAIL PROTECTED]> writes:
> I see, encoding is a per database option. Since I've never set it, all
> my databases use sql_ascii.
Okay, then you've dodged the obvious bullet; time to try profiling I
guess. The way I usually do it is (given a clean, configured source
tree):
c
Scott Cain wrote:
So it is possible that if I had a fast scsi drive, the performance might
be better?
Faster drives are always better ;-)
Did you try the comparison with shorter substrings? Also, maybe not
related to your specific question, but have you tuned any other
postgresql.conf settings?
Scott Cain <[EMAIL PROTECTED]> writes:
> explain analyze select substring(residues from 100 for 2)
> from feature where feature_id=1;
> where feature is a table with ~3 million rows, and residues is a text
> column, where for the majority of the rows of feature, it is null, for a
> large m
Scott Cain <[EMAIL PROTECTED]> writes:
>> (BTW, if you are using a multibyte database encoding, then that's your
>> problem right there --- the optimization is practically useless unless
>> character and byte indexes are the same.)
> I shouldn't be, but since it is an RPM, I can't be sure.
Look a
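Tom's caveat about multibyte encodings is easy to see directly: under such an encoding, a character offset no longer maps to a fixed byte offset, so substring() cannot be turned into a byte-range fetch of the out-of-line value. An illustrative query (assumes a UTF8 database; the thread's databases were SQL_ASCII, where the two lengths always agree):

```sql
-- In UTF8, 'é' occupies two bytes, so character and byte positions diverge
-- and the sliced-access optimization for EXTERNAL storage cannot be used.
SELECT char_length('séquence') AS chars,   -- 8 characters
       octet_length('séquence') AS bytes;  -- 9 bytes in UTF8
```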
Scott Cain wrote:
Index Scan using feature_pkey on feature (cost=0.00..3.01 rows=1
width=153) (actual time=954.13..954.14 rows=1 loops=1)
Index Cond: (feature_id = 1)
Total runtime: 954.26 msec
(3 rows)
Whoa! That's not what I expected, the time to do the query got more
than twice as long.
On Thu, 2003-07-31 at 16:32, Tom Lane wrote:
> Scott Cain <[EMAIL PROTECTED]> writes:
> >> (BTW, if you are using a multibyte database encoding, then that's your
> >> problem right there --- the optimization is practically useless unless
> >> character and byte indexes are the same.)
>
> > I shouldn't be, but since it is an RPM, I can't be sure.
On Thu, 2003-07-31 at 15:44, Tom Lane wrote:
> Scott Cain <[EMAIL PROTECTED]> writes:
> > explain analyze select substring(residues from 100 for 2)
> > from feature where feature_id=1;
>
> > where feature is a table with ~3 million rows, and residues is a text
> > column, where for the maj
Hello,
A few days ago, I asked for advice on speeding up substring queries on
the GENERAL mailing list. Joe Conway helpfully pointed out the ALTER
TABLE STORAGE EXTERNAL documentation. After doing the alter,
the queries got slower! Here is the background:
A freshly loaded database is VACUUM ANALYZEd.