Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Manfred Koizar
On Mon, 04 Aug 2003 16:10:18 +0200, I wrote:
>SELECT DISTINCT ON (
>cp.id_instalacion,
>cp.id_contenido,
>cp.generar_vainilla,
>cp.fecha_publicacion
>   )

Cut'n'paste error!  fecha_publicacion should not be in the DISTINCT ON
list.  The same error is in my second suggestion (FROM (subselect)).

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [osdldbt-general] Re: [PERFORM] OSDL Database Test Suite 3 is

2003-08-04 Thread scott.marlowe
On 4 Aug 2003, Jenny Zhang wrote:

> On Mon, 2003-08-04 at 06:33, Manfred Koizar wrote:
> > | effective_cache_size   | 1000
> > 
> > With 4GB of memory this is definitely too low and *can* (note that I
> > don't say *must*) lead the planner to wrong decisions.
> > 
> I changed the default to effective_cache_size=393216 as calculated by
> Scott.  Another way to check the execution plan is to go to the results
> dir:
> http://khack.osdl.org/stp/276917/results
> There is a 'power_plan.out' file to record the execution plan.  I am
> running a test with the changed effective_cache_size, I will see how it
> affects the plan.
> 
> > | shared_buffers | 15200
> > 
> > ... looks reasonable.  Did you test with other values?
> I have only one with shared_buffers=120 at:
> http://khack.osdl.org/stp/276847/
> The performance degraded.  

Well, that's truly huge, even for a machine with lots-o-ram.  Most tests 
find that once shared_buffers is big enough to use more than about 25 
to 33% of RAM, it's too big; you get little return beyond that.
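
For instance, 25% of a 4 gig box, counted in 8k buffers, works out to:

SELECT 4 * 1024 * 1024 / 4 / 8 AS shared_buffers;  -- 25% of 4 gigs, in 8k buffers: 131072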

> > | sort_mem   | 524288
> > 
> > This is a bit high, IMHO, but might be ok given that DBT3 is not run
> > with many concurrent sessions (right?).
> > http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows
> > some swapping activity towards the end of the run which could be
> > caused by a too high sort_mem setting.
> Right, I run only 4 streams.  Setting this parameter lower caused more
> reading/writing to pgsql_tmp.  I guess the database has to do it if
> it cannot do the sorting in memory. 

Note that IF your sort_mem really is 1/2 gig, then you'll likely have LESS 
than 3 gigs left for OS system cache.  About how big does top show buff 
and cached to be on that box under load?  Not that it's a big deal if you 
get the effective cache size off by a little bit; it's more of a rubber 
mallet setting than a jeweler's screwdriver setting.


Thanks a bunch for all the great testing.  It's a very nice tool to have 
for convincing the bosses to go with PostgreSQL.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [osdldbt-general] Re: [PERFORM] OSDL Database Test Suite 3 is

2003-08-04 Thread Craig Thomas
On Mon, 2003-08-04 at 09:39, Shridhar Daithankar wrote:
> On 4 Aug 2003 at 15:33, Manfred Koizar wrote:
> 
> > On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
> > >A sample OSDL-DBT3 test result report can be found at:
> > >http://khack.osdl.org/stp/276912/
> > >
> > >Your comments are welcome,
> 
> I could not get postgresql.conf so I will combine the comments.
> 
> 1. Effective cache size, already mentioned
> 2. Sort memory already mentioned.
> 3. Was WAL put on different drive?
> 4. Can you try with autovacuum daemon and 7.4beta when it comes out..
> 5. What was the file system? Ext2/Ext3/reiser/XFS?
> 
> 
> 
> Is there any comparison available for other databases.. Could be interesting to 
> see..:-)

OSDL has run workloads using SAP DB and PostgreSQL.  However, each
of the workloads has been tweaked to work around deficiencies of each
database with respect to the TPC benchmarks from which the DBT workloads
are derived.  Since there are extensive modifications to each workload,
and since each workload operates under different conditions
(SAP DB uses raw disk, PostgreSQL uses a file system), it is not
meaningful to compare numbers between the two databases.

Remember, the intent is to provide a tool kit that can be used to 
benefit the community.  From other postings, it appears that these 
workloads we have available can be used to help the PostgreSQL community
develop a better database; one that is better able to handle the kinds
of stress these workloads can produce when scaled to large database
sizes.

We have been using these kits to characterize the abilities of the Linux
kernel.  To show that these workloads work with two different databases
implies that Linux is capable of supporting these two databases.

The other tool kits, by the way, are being ported to PostgreSQL as well.
Help is needed to tune the workloads to exercise PostgreSQL better.  It
would be great if you could get involved with the porting efforts and
assist with the tuning of the PostgreSQL kit.


> 
> 
> 
> Thanks for the good work. I understand it must have been quite an effort to run 
> it..
> 
> Keep it up..
> 
> Bye
>  Shridhar
> 
> --
> Fourth Law of Revision: It is usually impractical to worry beforehand about
> interferences -- if you have none, someone will make one for you.
-- 
Craig Thomas
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [osdldbt-general] Re: [PERFORM] OSDL Database Test Suite 3 is

2003-08-04 Thread Jenny Zhang
Thanks all for your feedback.

I think I should explain more about how to use this test kit. 

The main purpose of putting the test kit on Scalability Test
Platform(STP) is that testers can run the workload against the database
with different parameters and Linux kernels to see performance
differences.  Though the test kit picks up default parameters if they
are not provided, command-line parameters override the defaults.
Currently, the following parameters are supported:

-s <scale_factor> -n <num_streams> -d '<database_parameters>' -r <{0|1}> -x <{0|1}>

where:
-s <scale_factor> is the TPC-H database scale factor; right now, only SF=1
is available.

-n <num_streams> is the number of throughput test streams, which
corresponds to the number of simultaneous database connections during the
throughput test.

-d <database_parameters> is the set of database parameters used when starting
the postmaster.  For example:
-B 12 -c effective_cache_size=393216 -c sort_mem=524288 -c
stats_command_string=true -c stats_row_level=true -c
stats_block_level=true

-r {0|1}: indicates if the database dir base//pgsql_tmp is
put on a separate disk drive

-x {0|1}: indicates if the WAL is put on a separate disk drive.
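
So a typical invocation (the driver script name here is hypothetical; check
the kit for the real one) would look something like:

run_dbt3 -s 1 -n 4 -d '-B 12 -c effective_cache_size=393216 -c sort_mem=524288' -r 1 -x 1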

The other comments are in-lined:

On Mon, 2003-08-04 at 06:33, Manfred Koizar wrote:
> | effective_cache_size   | 1000
> 
> With 4GB of memory this is definitely too low and *can* (note that I
> don't say *must*) lead the planner to wrong decisions.
> 
I changed the default to effective_cache_size=393216 as calculated by
Scott.  Another way to check the execution plan is to go to the results
dir:
http://khack.osdl.org/stp/276917/results
There is a 'power_plan.out' file to record the execution plan.  I am
running a test with the changed effective_cache_size, I will see how it
affects the plan.

> | shared_buffers | 15200
> 
> ... looks reasonable.  Did you test with other values?
I have only one with shared_buffers=120 at:
http://khack.osdl.org/stp/276847/
The performance degraded.  
> 
> | sort_mem   | 524288
> 
> This is a bit high, IMHO, but might be ok given that DBT3 is not run
> with many concurrent sessions (right?).
> http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows
> some swapping activity towards the end of the run which could be
> caused by a too high sort_mem setting.
Right, I run only 4 streams.  Setting this parameter lower caused more
reading/writing to pgsql_tmp.  I guess the database has to do it if
it cannot do the sorting in memory. 

On 4 Aug 2003 at 15:33, Manfred Koizar wrote:
> 
> I could not get postgresql.conf so I will combine the comments.
It is under database monitor data: database parameters
> 
> 1. Effective cache size, already mentioned
> 2. Sort memory already mentioned.
> 3. Was WAL put on different drive?
That run did not put WAL on different drive.  I changed it this morning
so that it is configurable.  Also I changed the result page so that the
testers can tell from the result page.
> 4. Can you try with autovacuum daemon and 7.4beta when it comes out..
I'd be happy to run it.  We would like to improve our Patch Life
Management (PLM) system so that it can accept PG patches and run
performance tests on those patches.  Right now PLM only manages Linux
kernel patches.  I would like to ask the PostgreSQL community if this
kind of tool is of interest.
> 5. What was the file system? Ext2/Ext3/reiser/XFS?
> 
> 
> 
It is Ext2.  Yeah, it is not reported on the page.
> Is there any comparison available for other databases.. Could be interesting to 
> see..:-)
> 
> 
> 

Let me know if you have any suggestions about how to improve the test
kit (parameters, reported information, etc.), or how to make it more
useful to the PG community.

Thanks,
-- 
Jenny Zhang
Open Source Development Lab Inc 
12725 SW Millikan Way
Suite 400
Beaverton, OR 97005
(503)626-2455 ext 31


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Tom Lane
"Fernando Papa" <[EMAIL PROTECTED]> writes:
> Thanks Tom. I vacuum full every night. Now I dropped the functional index and
> changed the upper(). Nothing changed (I know, total time rose because we are
> doing other things on the database now).

>  ->  Seq Scan on cont_publicacion
> (cost=0.00..5329.47 rows=10 width=35) (actual time=41.74..18004.75
> rows=97 loops=40)
>Filter: (((generar_vainilla = 'S'::character
> varying) OR (generar_vainilla = 's'::character varying)) AND
> (fecha_publicacion = (subplan)))
>SubPlan
>  ->  Aggregate  (cost=11.86..11.86 rows=1
> width=8) (actual time=40.15..40.15 rows=1 loops=17880)

Something fishy going on here.  Why did it switch to a seqscan,
considering it still (mistakenly) thinks there are only going to be 10
or 20 rows matching the generar_vainilla condition?  How many rows have
generar_vainilla equal to 's' or 'S', anyway?

In any case, the real problem is to get rid of the subselect.
Now that I look at your original query, I see that what you really seem
to be after is the publications with latest pub date among each group with
identical id_instalacion, id_contenido, and generar_vainilla.  You would
probably do well to reorganize the query using SELECT DISTINCT ON, viz

SELECT * FROM
(SELECT DISTINCT ON (id_instalacion, id_contenido, generar_vainilla)
  ...
 FROM ...
 WHERE ...
 ORDER BY
  id_instalacion, id_contenido, generar_vainilla, fecha_publicacion DESC)
AS ss
ORDER BY fecha_publicacion desc 
LIMIT 10
OFFSET 0

See the "weather reports" example in the SELECT reference page for
motivation.
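
(Roughly, that example fetches the most recent report for each location:

SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;

The same pattern applies here, with three DISTINCT ON columns instead of one.)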

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Fernando Papa

Thanks Tom. I vacuum full every night. Now I dropped the functional index and
changed the upper(). Nothing changed (I know, total time rose because we are
doing other things on the database now). But as you can see, if there was any
performance gain, I didn't see it. Actually I get better results when I
disable nested loops or disable merge joins, as I wrote in an older post.

Thanks!

 Limit  (cost=9.76..9.76 rows=1 width=479) (actual
time=720480.00..720480.03 rows=8 loops=1)
   ->  Sort  (cost=9.76..9.76 rows=1 width=479) (actual
time=720479.99..720480.00 rows=8 loops=1)
 Sort Key: cont_publicacion.fecha_publicacion
 ->  Merge Join  (cost=9.73..9.75 rows=1 width=479) (actual
time=323197.81..720477.96 rows=8 loops=1)
   Merge Cond: (("outer".id_instalacion =
"inner".id_instalacion) AND ("outer".id_contenido =
"inner".id_contenido))
   ->  Nested Loop  (cost=0.00..213197.04 rows=4 width=367)
(actual time=12136.55..720425.66 rows=40 loops=1)
 Join Filter: (("inner".id_contenido =
"outer".id_contenido) AND ("inner".id_instalacion =
"outer".id_instalacion))
 ->  Index Scan using jue_conf_pk on juegos_config
(cost=0.00..12.19 rows=40 width=332) (actual time=34.13..92.02 rows=40
loops=1)
 ->  Seq Scan on cont_publicacion
(cost=0.00..5329.47 rows=10 width=35) (actual time=41.74..18004.75
rows=97 loops=40)
   Filter: (((generar_vainilla = 'S'::character
varying) OR (generar_vainilla = 's'::character varying)) AND
(fecha_publicacion = (subplan)))
   SubPlan
 ->  Aggregate  (cost=11.86..11.86 rows=1
width=8) (actual time=40.15..40.15 rows=1 loops=17880)
   ->  Index Scan using
cont_pub_gen_vainilla on cont_publicacion cp1  (cost=0.00..11.86 rows=1
width=8) (actual time=16.89..40.01 rows=7 loops=17880)
 Index Cond: (generar_vainilla =
$2)
 Filter: ((id_instalacion = $0)
AND (id_contenido = $1))
   ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual
time=30.96..31.00 rows=8 loops=1)
 Sort Key: cont_contenido.id_instalacion,
cont_contenido.id_contenido
 ->  Seq Scan on cont_contenido  (cost=0.00..9.70
rows=3 width=112) (actual time=0.65..28.98 rows=8 loops=1)
   Filter: ((id_instalacion = 2::numeric) AND
(id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 720595.77 msec
(20 rows)

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 4, 2003 18:28
To: Fernando Papa
CC: [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44
msec 


"Fernando Papa" <[EMAIL PROTECTED]> writes:
>->  Nested Loop  (cost=0.00..1828.46 rows=1 width=367) 
> (actual time=7525.51..436843.27 rows=40 loops=1)
>  Join Filter: (("inner".id_contenido =
> "outer".id_contenido) AND ("inner".id_instalacion =
> "outer".id_instalacion))
>  ->  Index Scan using jue_conf_pk on juegos_config

> (cost=0.00..12.19 rows=40 width=332) (actual time=0.38..6.63 rows=40
> loops=1)
>  ->  Index Scan using idx_generar_vainilla_ci on 
> cont_publicacion  (cost=0.00..45.39 rows=1 width=35) (actual 
> time=48.81..10917.53 rows=97 loops=40)
>Index Cond: 
> (upper((generar_vainilla)::text) = 'S'::text)
>Filter: (subplan)
>SubPlan
>  ->  Aggregate  (cost=15.85..15.85 rows=1
> width=8) (actual time=24.30..24.30 rows=0 loops=17880)

As best I can tell, the problem here is coming from a drastic
underestimate of the number of rows selected by
"upper(generar_vainilla) = 'S'".  Evidently there are about 450 such
rows (since in 40 repetitions of the inner index scan, the aggregate
subplan gets evaluated 17880 times), but the planner seems to think
there will be only about two such rows.  Had it made a more correct
estimate, it would never have picked a plan that required multiple
repetitions of the indexscan.

One thing I'm wondering is if you've VACUUM ANALYZEd cont_publicacion
lately --- the cost estimate seems on the small side, and I'm wondering
if the planner thinks the table is much smaller than it really is.  But
assuming you didn't make that mistake, the only solution I can see is to
not use a functional index.  The planner is not good about making row
count estimates for functional indexes.  You could replace the index on 
upper(generar_vainilla) with a plain index on generar_vainilla, and
change the query condition from "upper(generar_vainilla) = 'S'" to
"generar_vainilla IN ('S', 's')".  I think the planner would have a lot
better chance at understanding the statistics that way.

regards, tom lane

---(end of broadcast)---

Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Tom Lane
"Fernando Papa" <[EMAIL PROTECTED]> writes:
>->  Nested Loop  (cost=0.00..1828.46 rows=1 width=367)
> (actual time=7525.51..436843.27 rows=40 loops=1)
>  Join Filter: (("inner".id_contenido =
> "outer".id_contenido) AND ("inner".id_instalacion =
> "outer".id_instalacion))
>  ->  Index Scan using jue_conf_pk on juegos_config
> (cost=0.00..12.19 rows=40 width=332) (actual time=0.38..6.63 rows=40
> loops=1)
>  ->  Index Scan using idx_generar_vainilla_ci on
> cont_publicacion  (cost=0.00..45.39 rows=1 width=35) (actual
> time=48.81..10917.53 rows=97 loops=40)
>Index Cond: (upper((generar_vainilla)::text)
> = 'S'::text)
>Filter: (subplan)
>SubPlan
>  ->  Aggregate  (cost=15.85..15.85 rows=1
> width=8) (actual time=24.30..24.30 rows=0 loops=17880)

As best I can tell, the problem here is coming from a drastic
underestimate of the number of rows selected by
"upper(generar_vainilla) = 'S'".  Evidently there are about 450 such
rows (since in 40 repetitions of the inner index scan, the aggregate
subplan gets evaluated 17880 times), but the planner seems to think
there will be only about two such rows.  Had it made a more correct
estimate, it would never have picked a plan that required multiple
repetitions of the indexscan.

One thing I'm wondering is if you've VACUUM ANALYZEd cont_publicacion
lately --- the cost estimate seems on the small side, and I'm wondering
if the planner thinks the table is much smaller than it really is.  But
assuming you didn't make that mistake, the only solution I can see is to
not use a functional index.  The planner is not good about making row
count estimates for functional indexes.  You could replace the index on 
upper(generar_vainilla) with a plain index on generar_vainilla, and
change the query condition from "upper(generar_vainilla) = 'S'" to
"generar_vainilla IN ('S', 's')".  I think the planner would have a lot
better chance at understanding the statistics that way.
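
That is, something like (untested):

CREATE INDEX idx_generar_vainilla ON cont_publicacion (generar_vainilla);

and then in the query

... AND generar_vainilla IN ('S', 's')

in place of the upper() condition.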

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Indexes not used for "min()"

2003-08-04 Thread Bruno Wolff III
On Mon, Aug 04, 2003 at 15:05:08 -0600,
  "Valsecchi, Patrick" <[EMAIL PROTECTED]> wrote:
> Sir,
> 
> I did a search with the "index" keyword on the mailing list archive and it came up
> with no result. Sorry if it's a known bug.

It isn't a bug. It is a design trade off. The database has no special
knowledge about the min and max aggregate functions that would allow it
to use indexes.
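
The usual workaround is to rewrite min() or max() as an ORDER BY ... LIMIT
query, which can use an ordinary index (table and column names here are just
placeholders):

SELECT col FROM tab ORDER BY col LIMIT 1;   -- instead of SELECT min(col) FROM tab
SELECT col FROM tab ORDER BY col DESC LIMIT 1;  -- instead of max(col), if col has no NULLs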

> But in general, I think the indexes are under-used. I have several queries that are
> taking at least 30 minutes and that would take less than one minute if indexes were
> used (based on a comparison I made with Oracle). In particular, I have the feeling
> that indexes are not used for "IN" statements (within WHERE clauses).

There are known performance problems with IN.  These are addressed in 7.4,
which will be going into beta any time now.  You can usually rewrite IN
queries to use EXISTS instead, which will speed things up.
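
For example (tables hypothetical), a query like

SELECT * FROM orders WHERE customer_id IN
(SELECT id FROM customers WHERE region = 'EU');

can usually be rewritten as

SELECT * FROM orders o WHERE EXISTS
(SELECT 1 FROM customers c
 WHERE c.id = o.customer_id AND c.region = 'EU');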

Also be sure to run analyze (or vacuum analyze) so that the database
server has accurate statistics on which to base its decisions.

> On the same subject, I'd add that the ability to provide plan "hints" within the
> queries (as provided in Oracle) would be helpful. I know that the Postgres
> optimizer is supposed to do a better job than the one from Oracle, but an optimizer
> cannot be perfect for every case.

"Hints" aren't going to happen. They cause maintainance problems.
You can disable features for a session (such as sequential scans)
and try to get a plan you like. But generally, rather than adding
this to you application code, you should try to find out why the
planner is making the wrong choice. Adjusting the relative costs
for doing things might allow the planner to do a much better job for
you.
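
For example, to see what plan you get with sequential scans disabled for the
current session:

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;  -- your problem query here
SET enable_seqscan = on;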

This kind of thing gets discussed on the performance list.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Joe Conway
Scott Cain wrote:
> On Mon, 2003-08-04 at 11:53, Tom Lane wrote:
>> I find it really, really hard to believe that a crude reimplementation
>> in plpgsql of the TOAST concept could beat the built-in implementation
>> at all, let alone beat it by two orders of magnitude.
>>
>> Either there's something unrealistic about your testing of the
>> dna_string function, or your original tests are not causing TOAST to be
>> invoked in the expected way, or there's a bug we need to fix.  I'd
>> really like to see some profiling of the poor-performing
>> external-storage case, so we can figure out what's going on.
>
> I was really hoping for a "Good job and glad to hear it" from you :-)
>
> I don't think there is anything unrealistic about my function or its
> testing, as it is very much along the lines of the types of things we do
> now.  I will really try to do some profiling this week to help figure
> out what is going on.
Is there a sample table schema and dataset available (external-storage 
case) that we can play with?

Joe

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Fernando Papa

I was playing with nested loops, and I found this:


Original explain:

 Limit  (cost=9.75..9.76 rows=1 width=479) (actual
time=436858.90..436858.93 rows=8 loops=1)
   ->  Sort  (cost=9.75..9.76 rows=1 width=479) (actual
time=436858.88..436858.89 rows=8 loops=1)
 Sort Key: cont_publicacion.fecha_publicacion
 ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual
time=196970.93..436858.04 rows=8 loops=1)
   Merge Cond: (("outer".id_instalacion =
"inner".id_instalacion) AND ("outer".id_contenido =
"inner".id_contenido))
   ->  Nested Loop  (cost=0.00..1828.46 rows=1 width=367)
(actual time=7525.51..436843.27 rows=40 loops=1)
 Join Filter: (("inner".id_contenido =
"outer".id_contenido) AND ("inner".id_instalacion =
"outer".id_instalacion))
 ->  Index Scan using jue_conf_pk on juegos_config
(cost=0.00..12.19 rows=40 width=332) (actual time=0.38..6.63 rows=40
loops=1)
 ->  Index Scan using idx_generar_vainilla_ci on
cont_publicacion  (cost=0.00..45.39 rows=1 width=35) (actual
time=48.81..10917.53 rows=97 loops=40)
   Index Cond: (upper((generar_vainilla)::text)
= 'S'::text)
   Filter: (subplan)
   SubPlan
 ->  Aggregate  (cost=15.85..15.85 rows=1
width=8) (actual time=24.30..24.30 rows=0 loops=17880)
   Filter: (max(fecha_publicacion) = $3)
   ->  Seq Scan on cont_publicacion cp1
(cost=0.00..15.84 rows=1 width=8) (actual time=10.17..24.12 rows=7
loops=17880)
 Filter: ((id_instalacion = $0)
AND (id_contenido = $1) AND (generar_vainilla = $2))
   ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual
time=8.91..8.95 rows=8 loops=1)
 Sort Key: cont_contenido.id_instalacion,
cont_contenido.id_contenido
 ->  Seq Scan on cont_contenido  (cost=0.00..9.70
rows=3 width=112) (actual time=0.45..7.59 rows=8 loops=1)
   Filter: ((id_instalacion = 2::numeric) AND
(id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 436860.84 msec
(21 rows)

With enable_nestloop set to off:

  QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=55.15..55.16 rows=1 width=479) (actual
time=11394.79..11394.82 rows=8 loops=1)
   ->  Sort  (cost=55.15..55.16 rows=1 width=479) (actual
time=11394.77..11394.79 rows=8 loops=1)
 Sort Key: cont_publicacion.fecha_publicacion
 ->  Merge Join  (cost=55.13..55.14 rows=1 width=479) (actual
time=11380.12..11394.01 rows=8 loops=1)
   Merge Cond: (("outer".id_instalacion =
"inner".id_instalacion) AND ("outer".id_contenido =
"inner".id_contenido))
   ->  Merge Join  (cost=45.40..45.41 rows=1 width=367)
(actual time=11358.48..11380.18 rows=40 loops=1)
 Merge Cond: (("outer".id_instalacion =
"inner".id_instalacion) AND ("outer".id_contenido =
"inner".id_contenido))
 ->  Index Scan using jue_conf_pk on juegos_config
(cost=0.00..12.19 rows=40 width=332) (actual time=0.23..5.62 rows=40
loops=1)
 ->  Sort  (cost=45.40..45.40 rows=1 width=35)
(actual time=11357.48..11357.68 rows=97 loops=1)
   Sort Key: cont_publicacion.id_instalacion,
cont_publicacion.id_contenido
   ->  Index Scan using idx_generar_vainilla_ci
on cont_publicacion  (cost=0.00..45.39 rows=1 width=35) (actual
time=48.81..11339.80 rows=97 loops=1)
 Index Cond:
(upper((generar_vainilla)::text) = 'S'::text)
 Filter: (fecha_publicacion = (subplan))
 SubPlan
   ->  Aggregate  (cost=15.84..15.84
rows=1 width=8) (actual time=25.21..25.22 rows=1 loops=447)
 ->  Seq Scan on
cont_publicacion cp1  (cost=0.00..15.84 rows=1 width=8) (actual
time=10.21..25.07 rows=7 loops=447)
   Filter: ((id_instalacion
= $0) AND (id_contenido = $1) AND (generar_vainilla = $2))
   ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual
time=8.77..8.79 rows=8 loops=1)
 Sort Key: cont_contenido.id_instalacion,
cont_contenido.id_contenido
 ->  Seq Scan on cont_contenido  (cost=0.00..9.70
rows=3 width=112) (actual time=0.45..7.41 rows=8 loops=1)
   Filter: ((id_instalacion = 2::numeric) AND
(id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 11397.66 msec
(22 rows)



Why doesn't PostgreSQL choose to avoid the nested loop on its own?  Why does it
estimate the nested loop plan as cheaper when it actually takes so much longer?

Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Matt Clark
> > 2. If you want to search for a sequence you'll need to deal with the case
> > where it starts in one chunk and ends in another.
>
> I forgot about searching--I suspect that application is why I faced
> opposition for shredding in my schema development group.  Maybe I should
> push that off to the file system and use grep (or BLAST).  Otherwise, I
> could write a function that would search the chunks first, then after
> failing to find the substring in those, I could start sewing the chunks
> together to look for the query string.  That could get ugly (and
> slow--but if the user knows that and expects it to be slow, I'm ok with
> that).

If you know the max length of the sequences being searched for, and this is much less 
than the chunk size, then you could simply
have the chunks overlap by that much, thus guaranteeing every substring will be found 
in its entirety in at least one chunk.
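
A loader along these lines could do it (completely untested; the function name
is made up, and it assumes Scott's dna table):

CREATE OR REPLACE FUNCTION load_overlapping_chunks (text, integer, integer)
RETURNS integer AS '
DECLARE
    longdna   ALIAS FOR $1;  -- the full sequence
    chunksize ALIAS FOR $2;
    maxpat    ALIAS FOR $3;  -- longest pattern you will ever search for
    pos       INTEGER := 1;
    n         INTEGER := 0;
BEGIN
    -- each chunk carries maxpat - 1 extra characters from the next one,
    -- so any match of length <= maxpat lies wholly inside some chunk
    WHILE pos <= length(longdna) LOOP
        INSERT INTO dna (foffset, pdna)
        VALUES (pos - 1, substring(longdna FROM pos FOR chunksize + maxpat - 1));
        pos := pos + chunksize;
        n := n + 1;
    END LOOP;
    RETURN n;
END;
' LANGUAGE 'plpgsql';

Note that reassembly code like the dna_string function would then have to trim
the overlap off each chunk before concatenating.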



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL

2003-08-04 Thread Shridhar Daithankar
On 4 Aug 2003 at 15:33, Manfred Koizar wrote:

> On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
> >A sample OSDL-DBT3 test result report can be found at:
> >http://khack.osdl.org/stp/276912/
> >
> >Your comments are welcome,

I could not get postgresql.conf so I will combine the comments.

1. Effective cache size, already mentioned
2. Sort memory already mentioned.
3. Was WAL put on different drive?
4. Can you try with autovacuum daemon and 7.4beta when it comes out..
5. What was the file system? Ext2/Ext3/reiser/XFS?



Is there any comparison available for other databases.. Could be interesting to 
see..:-)



Thanks for the good work. I understand it must have been quite an effort to run 
it..

Keep it up..

Bye
 Shridhar

--
Fourth Law of Revision: It is usually impractical to worry beforehand about 
interferences -- if you have none, someone will make one for you.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Scott Cain
On Mon, 2003-08-04 at 11:53, Tom Lane wrote:
> Scott Cain <[EMAIL PROTECTED]> writes:
> > At least this appears to work and is much faster, completing substring
> > operations like above in about 0.27 secs (that's about two orders of
> > magnitude improvement!)
> 
> I find it really, really hard to believe that a crude reimplementation
> in plpgsql of the TOAST concept could beat the built-in implementation
> at all, let alone beat it by two orders of magnitude.
> 
> Either there's something unrealistic about your testing of the
> dna_string function, or your original tests are not causing TOAST to be
> invoked in the expected way, or there's a bug we need to fix.  I'd
> really like to see some profiling of the poor-performing
> external-storage case, so we can figure out what's going on.
> 
I was really hoping for a "Good job and glad to hear it" from you :-)

I don't think there is anything unrealistic about my function or its
testing, as it is very much along the lines of the types of things we do
now.  I will really try to do some profiling this week to help figure
out what is going on.

Scott

-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL

2003-08-04 Thread scott.marlowe
On Mon, 4 Aug 2003, Manfred Koizar wrote:

> On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
> >A sample OSDL-DBT3 test result report can be found at:
> >http://khack.osdl.org/stp/276912/
> >
> >Your comments are welcome,
> 
> | effective_cache_size   | 1000
> 
> With 4GB of memory this is definitely too low and *can* (note that I
> don't say *must*) lead the planner to wrong decisions.
> 
> | shared_buffers | 15200
> 
> ... looks reasonable.  Did you test with other values?
> 
> | sort_mem   | 524288
> 
> This is a bit high, IMHO, but might be ok given that DBT3 is not run
> with many concurrent sessions (right?).
> http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows
> some swapping activity towards the end of the run which could be
> caused by a too high sort_mem setting.

Sorry, my last email shot off out of the gun before it was completed...

To repeat:  Don't forget to set effective_cache_size the same way as 
shared buffers (i.e. it's in 8k blocks for most systems.)  If you have a 
machine with 4 gig ram, and 3 gigs is available as disk cache, then 
divide out 3 gigs by 8k to get the right number.  My quick calculation 
shows that being about 393216 blocks.
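
That is:

SELECT 3 * 1024 * 1024 / 8 AS effective_cache_size;  -- 3 gigs in kB / 8 = 393216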


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL

2003-08-04 Thread scott.marlowe
On Mon, 4 Aug 2003, Manfred Koizar wrote:

> On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
> >A sample OSDL-DBT3 test result report can be found at:
> >http://khack.osdl.org/stp/276912/
> >
> >Your comments are welcome,
> 
> | effective_cache_size   | 1000
> 
> With 4GB of memory this is definitely too low and *can* (note that I
> don't say *must*) lead the planner to wrong decisions.
> 
> | shared_buffers | 15200
> 
> ... looks reasonable.  Did you test with other values?
> 
> | sort_mem   | 524288
> 
> This is a bit high, IMHO, but might be ok given that DBT3 is not run
> with many concurrent sessions (right?).
> http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows
> some swapping activity towards the end of the run which could be
> caused by a too high sort_mem setting.

And, as always, don't forget to set effective_cache_size.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Shridhar Daithankar
On 4 Aug 2003 at 12:14, Scott Cain wrote:
> I forgot about searching--I suspect that application is why I faced
> opposition for shredding in my schema development group.  Maybe I should
> push that off to the file system and use grep (or BLAST).  Otherwise, I
> could write a function that would search the chunks first, then after
> failing to find the substring in those, I could start sewing the chunks
> together to look for the query string.  That could get ugly (and
> slow--but if the user knows that and expects it to be slow, I'm ok with
> that).

I assume your DNA sequence is compacted.  Your best bet would be to fetch them 
from the database and run BLAST on them in client memory.  No point duplicating 
BLAST functionality.  Last I tried, it beat every text-searching technique 
where heuristics are involved.

Bye
 Shridhar

--
There are two types of Linux developers - those who can spell, and those who
can't.  There is a constant pitched battle between the two. (From one of the
post-1.1.54 kernel update messages posted to c.o.l.a)


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Scott Cain
On Mon, 2003-08-04 at 11:55, Richard Huxton wrote:
> On Monday 04 August 2003 16:25, Scott Cain wrote:
> [snip]
> > [snip]
> 
> You might want some checks to make sure that smin < smax, otherwise looks like 
> it does the job in a good clean fashion.

Good point--smin < smax generally by virtue of the application using the
database, but I shouldn't assume that will always be the case.
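
I'll probably add something like this at the top of the function body (quotes
doubled because the body sits inside single quotes):

IF smin > smax THEN
    RAISE EXCEPTION ''dna_string: smin (%) is greater than smax (%)'', smin, smax;
END IF;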
> 
> Glad to hear it's going to solve your problems. Two things you might want to 
> bear in mind:
> 1. There's probably a "sweet spot" where the chunk size interacts well with 
> your data, usage patterns and PGs backend to give you peak performance. 
> You'll have to test.

Yes, I had a feeling that was probably the case-- since this is an open
source project, I will need to write directions for installers on
picking a reasonable chunk size.

> 2. If you want to search for a sequence you'll need to deal with the case 
> where it starts in one chunk and ends in another.

I forgot about searching--I suspect that application is why I faced
opposition for shredding in my schema development group.  Maybe I should
push that off to the file system and use grep (or BLAST).  Otherwise, I
could write a function that would search the chunks first, then after
failing to find the substring in those, I could start sewing the chunks
together to look for the query string.  That could get ugly (and
slow--but if the user knows that and expects it to be slow, I'm ok with
that).

Thanks,
Scott

-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Richard Huxton
On Monday 04 August 2003 16:25, Scott Cain wrote:
[snip]
> In an effort to find the best way to do this operation, I decided to
> look at what is my "worst case" scenario: the DNA sequence for human
> chromosome 1, which is about 250 million characters long (previous
> strings were about 20 million characters long).  I wrote a perl script
> to do several substring operations over this very long string, with
> substring lengths varying between 1000 and 40,000 characters spread out
> over various locations along the string.  While EXTENDED storage won in
> this case, it was a hollow victory: 38 seconds per operation versus 40
> seconds, both of which are way too long for an interactive
> application.
>
> Time for a new method.  A suggestion from my boss was to "shred" the DNA
> into smallish chunks and a column giving offsets from the beginning of
> the string, so that it can be reassembled when needed. Here is the test
> table:
>
> string=> \d dna
>   Table "public.dna"
>  Column  |  Type   | Modifiers
> -+-+---
>  foffset | integer |
>  pdna| text|
> Indexes: foffset_idx btree (foffset)

[snipped plpgsql function which stitches chunks together and then substrings]

> So here's the question: I've never written a plpgsql function before, so
> I don't have much experience with it; is there anything obviously wrong
> with this function, or are there things that could be done better?  At
> least this appears to work and is much faster, completing substring
> operations like above in about 0.27 secs (that's about two orders of
> magnitude improvement!)

You might want some checks to make sure that smin < smax, otherwise looks like 
it does the job in a good clean fashion.

Glad to hear it's going to solve your problems. Two things you might want to 
bear in mind:
1. There's probably a "sweet spot" where the chunk size interacts well with 
your data, usage patterns and PGs backend to give you peak performance. 
You'll have to test.
2. If you want to search for a sequence you'll need to deal with the case 
where it starts in one chunk and ends in another.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Tom Lane
Scott Cain <[EMAIL PROTECTED]> writes:
> At least this appears to work and is much faster, completing substring
> operations like above in about 0.27 secs (that's about two orders of
> magnitude improvement!)

I find it really, really hard to believe that a crude reimplementation
in plpgsql of the TOAST concept could beat the built-in implementation
at all, let alone beat it by two orders of magnitude.

Either there's something unrealistic about your testing of the
dna_string function, or your original tests are not causing TOAST to be
invoked in the expected way, or there's a bug we need to fix.  I'd
really like to see some profiling of the poor-performing
external-storage case, so we can figure out what's going on.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Fernando Papa
Err... you're right... one of us said the same thing when I showed
Volker's mail...

-Original Message-
From: Manfred Koizar [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 4, 2003 12:17
To: Fernando Papa
CC: Volker Helm; [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44
msec


On Mon, 4 Aug 2003 12:02:46 -0300, "Fernando Papa" <[EMAIL PROTECTED]>
wrote:
>FROM 
>cont_contenido   
>,juegos_config
>,cont_publicacion 
>,(SELECT max(cp1.fecha_publicacion) as max_pub
--change here
> FROM cont_publicacion cp1) a --change here

But this calculates the global maximum, not per id_instalacion,
id_contenido, and generar_vainilla as in

>AND cont_publicacion.fecha_publicacion = (SELECT
max(cp1.fecha_publicacion) 
> FROM cont_publicacion cp1  
> WHERE cp1.id_instalacion =
cont_publicacion.id_instalacion 
>   AND cp1.id_contenido =
cont_publicacion.id_contenido  
>   AND cp1.generar_vainilla =
cont_publicacion.generar_vainilla) 

Servus
 Manfred

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] EXTERNAL storage and substring on long strings

2003-08-04 Thread Scott Cain
Hello,

Note: there is a SQL question way at the bottom of this narrative :-)

Last week I asked about doing substring operations on very long strings
(>10 million characters).  I was given a suggestion to use EXTERNAL
storage on the column via the ALTER TABLE ... SET STORAGE command.  In
one test case, the performance of substring actually got worse using
EXTERNAL storage.  

In an effort to find the best way to do this operation, I decided to
look at what is my "worst case" scenario: the DNA sequence for human
chromosome 1, which is about 250 million characters long (previous
strings were about 20 million characters long).  I wrote a perl script
to do several substring operations over this very long string, with
substring lengths varying between 1000 and 40,000 characters spread out
over various locations along the string.  While EXTENDED storage won in
this case, it was a hollow victory: 38 seconds per operation versus 40
seconds, both of which are way too long for an interactive
application.

Time for a new method.  A suggestion from my boss was to "shred" the DNA
into smallish chunks and a column giving offsets from the beginning of
the string, so that it can be reassembled when needed. Here is the test
table:

string=> \d dna
  Table "public.dna"
 Column  |  Type   | Modifiers
-+-+---
 foffset | integer |
 pdna| text|
Indexes: foffset_idx btree (foffset)

In practice, there would also be a foreign key column to give the
identifier of the dna.  Then I wrote the following function (here's the
SQL part promised above):

CREATE OR REPLACE FUNCTION dna_string (integer, integer) RETURNS TEXT AS '
DECLARE
    smin ALIAS FOR $1;
    smax ALIAS FOR $2;
    longdna TEXT := '''';   -- accumulator; starts as the empty string
    dna_row dna%ROWTYPE;
    dnastring   TEXT;
    firstchunk  INTEGER;
    lastchunk   INTEGER;
    in_longdnastart INTEGER;
    in_longdnalen   INTEGER;
    chunksize   INTEGER;
BEGIN
    -- infer the chunk size from the smallest nonzero offset
    SELECT INTO chunksize min(foffset) FROM dna WHERE foffset>0;

    -- offsets of the chunks containing smin and smax
    firstchunk :=  chunksize*(smin/chunksize);
    lastchunk  :=  chunksize*(smax/chunksize);

    -- position of smin within the first chunk, and total length wanted
    in_longdnastart := smin % chunksize;
    in_longdnalen   := smax - smin + 1;

    -- stitch together every chunk that overlaps the requested range
    FOR dna_row IN
        SELECT * FROM dna
        WHERE foffset >= firstchunk AND foffset <= lastchunk
        ORDER BY foffset
    LOOP
        longdna := longdna || dna_row.pdna;
    END LOOP;

    -- cut the requested window out of the stitched string
    dnastring := substring(longdna FROM in_longdnastart FOR in_longdnalen);

    RETURN dnastring;
END;
' LANGUAGE 'plpgsql';

So here's the question: I've never written a plpgsql function before, so
I don't have much experience with it; is there anything obviously wrong
with this function, or are there things that could be done better?  At
least this appears to work and is much faster, completing substring
operations like above in about 0.27 secs (that's about two orders of
magnitude improvement!)
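
For example, a call that pulls a 40,000-character window out of the chromosome
looks like:

SELECT dna_string(1000000, 1040000);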

Thanks,
Scott


-- 

Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Manfred Koizar
On Mon, 4 Aug 2003 12:02:46 -0300, "Fernando Papa" <[EMAIL PROTECTED]>
wrote:
>FROM 
>cont_contenido   
>,juegos_config
>,cont_publicacion 
>,(SELECT max(cp1.fecha_publicacion) as max_pub
>  --change here
> FROM cont_publicacion cp1) a --change here

But this calculates the global maximum, not per id_instalacion,
id_contenido, and generar_vainilla as in

>AND cont_publicacion.fecha_publicacion = (SELECT 
> max(cp1.fecha_publicacion) 
> FROM cont_publicacion cp1  
> WHERE cp1.id_instalacion = 
> cont_publicacion.id_instalacion 
>   AND cp1.id_contenido = 
> cont_publicacion.id_contenido  
>   AND cp1.generar_vainilla = 
> cont_publicacion.generar_vainilla) 

Servus
 Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Fernando Papa
Hi Volker!!! I think you're right. Look at times:

   QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=23.37..23.37 rows=1 width=487) (actual time=2245.61..2245.61 rows=0 loops=1)
   ->  Sort  (cost=23.37..23.37 rows=1 width=487) (actual time=2245.60..2245.60 rows=0 loops=1)
         Sort Key: cont_publicacion.fecha_publicacion
         ->  Nested Loop  (cost=23.33..23.36 rows=1 width=487) (actual time=2244.10..2244.10 rows=0 loops=1)
               Join Filter: ("outer".fecha_publicacion = "inner".max_pub)
               ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual time=918.73..1988.43 rows=16 loops=1)
                     Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
                     ->  Nested Loop  (cost=0.00..409.35 rows=1 width=367) (actual time=35.44..1967.20 rows=82 loops=1)
                           Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
                           ->  Index Scan using jue_conf_pk on juegos_config  (cost=0.00..12.19 rows=40 width=332) (actual time=0.42..6.73 rows=40 loops=1)
                           ->  Index Scan using idx_generar_vainilla_ci on cont_publicacion  (cost=0.00..9.90 rows=2 width=35) (actual time=0.20..35.19 rows=447 loops=40)
                                 Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
                     ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual time=10.42..10.48 rows=15 loops=1)
                           Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
                           ->  Seq Scan on cont_contenido  (cost=0.00..9.70 rows=3 width=112) (actual time=0.57..8.11 rows=8 loops=1)
                                 Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
               ->  Subquery Scan a  (cost=13.60..13.60 rows=1 width=8) (actual time=15.89..15.90 rows=1 loops=16)
                     ->  Aggregate  (cost=13.60..13.60 rows=1 width=8) (actual time=15.87..15.88 rows=1 loops=16)
                           ->  Seq Scan on cont_publicacion cp1  (cost=0.00..12.48 rows=448 width=8) (actual time=0.05..11.62 rows=448 loops=16)
 Total runtime: 2250.92 msec
(20 rows)

The problem was the subquery, no doubt.
 

  
-Original Message-
From: Volker Helm [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 4, 2003 11:45
To: Fernando Papa
Subject: AW: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

Hi,

just use the subquery as an inline view and join the tables:

SELECT
 cont_contenido.id_contenido
 ,cont_contenido.pertenece_premium
 ,cont_contenido.Titulo_esp as v_sufix
 ,cont_contenido.url_contenido
 ,cont_contenido.tipo_acceso
 ,cont_contenido.id_sbc
 ,cont_contenido.cant_vistos
 ,cont_contenido.cant_votos
 ,cont_contenido.puntaje_total
 ,cont_contenido.id_contenido_padre
 ,juegos_config.imagen_tapa_especial
 ,juegos_config.info_general_esp as info_general
 ,juegos_config.ayuda
 ,juegos_config.tips_tricks_esp as tips_tricks
 ,juegos_config.mod_imagen_tapa_especial
 ,cont_publicacion.fecha_publicacion as fecha_publicacion
 ,cont_publicacion.generar_Vainilla
FROM
 cont_contenido
 ,juegos_config
 ,cont_publicacion
 ,(SELECT max(cp1.fecha_publicacion) as max_pub   --change here
   FROM cont_publicacion cp1) a                   --change here
WHERE
 cont_contenido.id_instalacion = 2
 AND cont_contenido.id_sbc = 619
 AND cont_contenido.id_tipo = 2
 AND cont_contenido.id_instalacion = juegos_config.id_instalacion
 AND cont_contenido.id_contenido = juegos_config.id_contenido
 AND upper(cont_publicacion.generar_Vainilla) = 'S'
 AND cont_publicacion.id_instalacion = cont_contenido.id_instalacion
 AND cont_publicacion.id_contenido = cont_contenido.id_contenido
 AND cont_publicacion.fecha_publicacion = a.max_pub   -- change here
ORDER BY cont_publicacion.fecha_publicacion desc

hope it helps,

Volker Helm
  
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On behalf of Fernando Papa
Sent: Friday, August 1, 2003 23:17
To: [EMAIL PROTECTED]
Subject: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

Hi all!

Really I don't know what happened with this query. I'm running PG 7.3.1 on
solaris, vacc
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Fernando Papa

I created the index, but it doesn't help much:

   QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=9.75..9.76 rows=1 width=479) (actual time=486421.35..486421.38 rows=8 
loops=1)
   ->  Sort  (cost=9.75..9.76 rows=1 width=479) (actual time=486421.33..486421.34 
rows=8 loops=1)
 Sort Key: cont_publicacion.fecha_publicacion
 ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual 
time=220253.76..486420.35 rows=8 loops=1)
   Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND 
("outer".id_contenido = "inner".id_contenido))
   ->  Nested Loop  (cost=0.00..1828.35 rows=1 width=367) (actual 
time=8347.78..486405.02 rows=40 loops=1)
 Join Filter: (("inner".id_contenido = "outer".id_contenido) AND 
("inner".id_instalacion = "outer".id_instalacion))
 ->  Index Scan using jue_conf_pk on juegos_config  
(cost=0.00..12.19 rows=40 width=332) (actual time=0.23..6.73 rows=40 loops=1)
 ->  Index Scan using idx_generar_vainilla_ci on cont_publicacion  
(cost=0.00..45.39 rows=1 width=35) (actual time=56.01..12156.48 rows=97 loops=40)
   Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
   Filter: (fecha_publicacion = (subplan))
   SubPlan
 ->  Aggregate  (cost=15.84..15.84 rows=1 width=8) (actual 
time=27.03..27.03 rows=1 loops=17880)
   ->  Seq Scan on cont_publicacion cp1  
(cost=0.00..15.84 rows=1 width=8) (actual time=11.21..26.86 rows=7 loops=17880)
 Filter: ((id_instalacion = $0) AND 
(id_contenido = $1) AND (generar_vainilla = $2))
   ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual time=9.28..9.32 
rows=8 loops=1)
 Sort Key: cont_contenido.id_instalacion, 
cont_contenido.id_contenido
 ->  Seq Scan on cont_contenido  (cost=0.00..9.70 rows=3 
width=112) (actual time=0.47..7.48 rows=8 loops=1)
   Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 
619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 486445.19 msec
(20 rows)


-Original Message-
From: Mendola Gaetano [mailto:[EMAIL PROTECTED]]
Sent: Saturday, August 2, 2003 7:36
To: [EMAIL PROTECTED]
CC: Fernando Papa
Subject: Re: I can't wait too much: Total runtime 432478.44 msec


From: ""Fernando Papa"" <[EMAIL PROTECTED]>

> AND upper(cont_publicacion.generar_Vainilla) = 'S'


>  Filter: ((upper((generar_vainilla)::text) = 'S'::text) AND
(fecha_publicacion = (subplan)))

using a functional index on this field should help

create index idx_generar_vainilla_ci on cont_publicacion (
upper(generar_Vainilla) )



Regards
Gaetano Mendola



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Fernando Papa

Hi Josh... a little worse time:

EXPLAIN ANALYZE 
SELECT
 cont_contenido.id_contenido
 ,cont_contenido.pertenece_premium
 ,cont_contenido.Titulo_esp as v_sufix 
 ,cont_contenido.url_contenido
 ,cont_contenido.tipo_acceso
 ,cont_contenido.id_sbc
 ,cont_contenido.cant_vistos
 ,cont_contenido.cant_votos 
 ,cont_contenido.puntaje_total 
 ,cont_contenido.id_contenido_padre 
 ,juegos_config.imagen_tapa_especial 
 ,juegos_config.info_general_esp as info_general 
 ,juegos_config.ayuda 
 ,juegos_config.tips_tricks_esp as tips_tricks 
 ,juegos_config.mod_imagen_tapa_especial 
 ,cont_publicacion.fecha_publicacion as fecha_publicacion 
 ,cont_publicacion.generar_Vainilla 
 FROM 
 cont_contenido   
 ,juegos_config
,cont_publicacion 
 WHERE 
 cont_contenido.id_instalacion= 2
 AND cont_contenido.id_sbc   = 619
 AND cont_contenido.id_tipo   = 2
 AND cont_contenido.id_instalacion  = juegos_config.id_instalacion 
 AND cont_contenido.id_contenido   = juegos_config.id_contenido 
 AND upper(cont_publicacion.generar_Vainilla) = 'S'
 AND cont_publicacion.id_instalacion = cont_contenido.id_instalacion 
 AND cont_publicacion.id_contenido = cont_contenido.id_contenido 
 AND EXISTS (SELECT max(cp1.fecha_publicacion)
 FROM cont_publicacion cp1
WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
AND cp1.id_contenido = cont_publicacion.id_contenido
AND cp1.generar_vainilla = cont_publicacion.generar_vainilla
   HAVING max(cp1.fecha_publicacion) =
cont_publicacion.fecha_publicacion)
ORDER BY  cont_publicacion.fecha_publicacion desc 
 LIMIT 10
OFFSET 0
;


 
QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=9.75..9.76 rows=1 width=479) (actual
time=449760.88..449760.91 rows=8 loops=1)
   ->  Sort  (cost=9.75..9.76 rows=1 width=479) (actual
time=449760.87..449760.88 rows=8 loops=1)
 Sort Key: cont_publicacion.fecha_publicacion
 ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual
time=202257.20..449759.00 rows=8 loops=1)
   Merge Cond: (("outer".id_instalacion =
"inner".id_instalacion) AND ("outer".id_contenido =
"inner".id_contenido))
   ->  Nested Loop  (cost=0.00..284556.86 rows=1 width=367)
(actual time=7794.28..449741.85 rows=40 loops=1)
 Join Filter: (("inner".id_contenido =
"outer".id_contenido) AND ("inner".id_instalacion =
"outer".id_instalacion))
 ->  Index Scan using jue_conf_pk on juegos_config
(cost=0.00..12.19 rows=40 width=332) (actual time=0.43..8.12 rows=40
loops=1)
 ->  Seq Scan on cont_publicacion
(cost=0.00..7113.60 rows=1 width=35) (actual time=24.10..11239.67
rows=97 loops=40)
   Filter: ((upper((generar_vainilla)::text) =
'S'::text) AND (subplan))
   SubPlan
 ->  Aggregate  (cost=15.85..15.85 rows=1
width=8) (actual time=25.03..25.03 rows=0 loops=17880)
   Filter: (max(fecha_publicacion) = $3)
   ->  Seq Scan on cont_publicacion cp1
(cost=0.00..15.84 rows=1 width=8) (actual time=10.51..24.85 rows=7
loops=17880)
 Filter: ((id_instalacion = $0)
AND (id_contenido = $1) AND (generar_vainilla = $2))
   ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual
time=10.49..10.52 rows=8 loops=1)
 Sort Key: cont_contenido.id_instalacion,
cont_contenido.id_contenido
 ->  Seq Scan on cont_contenido  (cost=0.00..9.70
rows=3 width=112) (actual time=0.59..8.07 rows=8 loops=1)
   Filter: ((id_instalacion = 2::numeric) AND
(id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 449765.69 msec
(20 rows)



-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 1, 2003 18:32
To: Christopher Browne; Fernando Papa
CC: [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44
msec


Fernando,

>  AND cont_publicacion.fecha_publicacion = (SELECT
max(cp1.fecha_publicacion) 
>   FROM cont_publicacion cp1  
>   WHERE cp1.id_instalacion =
cont_publicacion.id_instalacion 
> AND cp1.id_contenido =
cont_publicacion.id_contenido  
> AND cp1.generar_vainilla =
cont_publicacion.generar_vainilla) 

Or event changing it to:

AND EXISTS (SELECT max(cp1.fecha_publicacion)
FROM cont_publicacion cp1
WHERE cp1.id_instalacion = cont_publicacion.id_instalacion 
AND cp1.id_contenido = cont_publicacion.id_contenido  
AND cp1.generar_vainilla = cont_publicacion.generar_vainilla
HAVING max(cp1.fecha_publicacion) =
cont_publicacion.fecha_publicacion)
-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco

Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Fernando Papa

Sorry Chris... a little slower...

esdc=> EXPLAIN ANALYZE 
SELECT
  cont_contenido.id_contenido
 ,cont_contenido.pertenece_premium
 ,cont_contenido.Titulo_esp as v_sufix 
 ,cont_contenido.url_contenido
 ,cont_contenido.tipo_acceso
 ,cont_contenido.id_sbc
 ,cont_contenido.cant_vistos
 ,cont_contenido.cant_votos 
 ,cont_contenido.puntaje_total 
 ,cont_contenido.id_contenido_padre 
 ,juegos_config.imagen_tapa_especial 
 ,juegos_config.info_general_esp as info_general 
 ,juegos_config.ayuda 
 ,juegos_config.tips_tricks_esp as tips_tricks 
 ,juegos_config.mod_imagen_tapa_especial 
 ,cont_publicacion.fecha_publicacion as fecha_publicacion 
 ,cont_publicacion.generar_Vainilla 
FROM 
 cont_contenido   
 ,juegos_config
 ,cont_publicacion 
WHERE 
 cont_contenido.id_instalacion= 2
 AND cont_contenido.id_sbc   = 619
 AND cont_contenido.id_tipo   = 2
 AND cont_contenido.id_instalacion  = juegos_config.id_instalacion 
 AND cont_contenido.id_contenido   = juegos_config.id_contenido   
 AND upper(cont_publicacion.generar_Vainilla) = 'S'
 AND cont_publicacion.id_instalacion = cont_contenido.id_instalacion 
 AND cont_publicacion.id_contenido = cont_contenido.id_contenido 
 AND cont_publicacion.fecha_publicacion = (SELECT cp1.fecha_publicacion
FROM cont_publicacion cp1
 WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
 AND cp1.id_contenido = cont_publicacion.id_contenido
 AND cp1.generar_vainilla = cont_publicacion.generar_vainilla
 ORDER BY fecha_publicacion LIMIT 1)
ORDER BY  cont_publicacion.fecha_publicacion desc 
 LIMIT 10
 OFFSET 0
 ;

  QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=9.75..9.76 rows=1 width=479) (actual time=465085.25..465085.27 rows=8 
loops=1)
   ->  Sort  (cost=9.75..9.76 rows=1 width=479) (actual time=465085.23..465085.24 
rows=8 loops=1)
 Sort Key: cont_publicacion.fecha_publicacion
 ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual 
time=210743.83..465083.31 rows=8 loops=1)
   Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND 
("outer".id_contenido = "inner".id_contenido))
   ->  Nested Loop  (cost=0.00..284756.79 rows=1 width=367) (actual 
time=8319.87..464981.68 rows=40 loops=1)
 Join Filter: (("inner".id_contenido = "outer".id_contenido) AND 
("inner".id_instalacion = "outer".id_instalacion))
 ->  Index Scan using jue_conf_pk on juegos_config  
(cost=0.00..12.19 rows=40 width=332) (actual time=52.93..142.31 rows=40 loops=1)
 ->  Seq Scan on cont_publicacion  (cost=0.00..7118.60 rows=1 
width=35) (actual time=51.79..11617.12 rows=97 loops=40)
   Filter: ((upper((generar_vainilla)::text) = 'S'::text) AND 
(fecha_publicacion = (subplan)))
   SubPlan
 ->  Limit  (cost=15.85..15.85 rows=1 width=8) (actual 
time=25.86..25.86 rows=1 loops=17880)
   ->  Sort  (cost=15.85..15.86 rows=1 width=8) 
(actual time=25.82..25.82 rows=2 loops=17880)
 Sort Key: fecha_publicacion
 ->  Seq Scan on cont_publicacion cp1  
(cost=0.00..15.84 rows=1 width=8) (actual time=10.68..25.32 rows=7 loops=17880)
   Filter: ((id_instalacion = $0) AND 
(id_contenido = $1) AND (generar_vainilla = $2))
   ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual time=94.91..94.93 
rows=8 loops=1)
 Sort Key: cont_contenido.id_instalacion, 
cont_contenido.id_contenido
 ->  Seq Scan on cont_contenido  (cost=0.00..9.70 rows=3 
width=112) (actual time=21.70..92.96 rows=8 loops=1)
   Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 
619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 465088.66 msec
(21 rows)



-Original Message-
From: Christopher Browne [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 1, 2003 18:27
To: Fernando Papa
CC: [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec



I'd point at the following as a sterling candidate for the cause of this
being slow...

 AND cont_publicacion.fecha_publicacion = (SELECT max(cp1.fecha_publicacion) 
  FROM cont_publicacion cp1  
  WHERE cp1.id_instalacion = cont_publicacion.id_instalacion 
    AND cp1.id_contenido = cont_publicacion.id_contenido  
    AND cp1.generar_vainilla = cont_publicacion.generar_vainilla) 

May I suggest changing it to:

 AND cont_publicacion.fecha_p

Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

2003-08-04 Thread Manfred Koizar
On Fri, 1 Aug 2003 18:17:17 -0300, "Fernando Papa" <[EMAIL PROTECTED]>
wrote:
> AND cont_publicacion.fecha_publicacion = (SELECT
>max(cp1.fecha_publicacion) 
>  FROM cont_publicacion cp1  
>  WHERE cp1.id_instalacion =
>cont_publicacion.id_instalacion 
>AND cp1.id_contenido = cont_publicacion.id_contenido
>
>AND cp1.generar_vainilla =
>cont_publicacion.generar_vainilla) 

If certain uniqueness conditions are met, the Postgres specific
DISTINCT ON clause could help totally eliminating the subselect:

SELECT DISTINCT ON (
cp.id_instalacion,
cp.id_contenido,
cp.generar_vainilla,
cp.fecha_publicacion
   )
  cc.id_contenido
 ,cc.pertenece_premium
 ,cc.Titulo_esp as v_sufix 
 ,cc.url_contenido
 ,cc.tipo_acceso
 ,cc.id_sbc
 ,cc.cant_vistos
 ,cc.cant_votos 
 ,cc.puntaje_total 
 ,cc.id_contenido_padre 
 ,jc.imagen_tapa_especial 
 ,jc.info_general_esp as info_general 
 ,jc.ayuda 
 ,jc.tips_tricks_esp as tips_tricks 
 ,jc.mod_imagen_tapa_especial 
 ,cp.fecha_publicacion as fecha_publicacion 
 ,cp.generar_Vainilla 
 FROM 
 cont_contenido cc
 ,juegos_config jc
 ,cont_publicacion cp
WHERE 
 cc.id_instalacion= 2
 AND cc.id_sbc   = 619
 AND cc.id_tipo   = 2
 AND cc.id_instalacion  = jc.id_instalacion 
 AND cc.id_contenido   = jc.id_contenido 
 AND upper(cp.generar_Vainilla) = 'S'
 AND cp.id_instalacion = cc.id_instalacion 
 AND cp.id_contenido = cc.id_contenido 
 ORDER BY  
   cp.id_instalacion,
   cp.id_contenido,
   cp.generar_vainilla,
   cp.fecha_publicacion desc

However, this doesn't get the result in the original order, so you
have to wrap another SELECT ... ORDER BY ... LIMIT around it.  Or try
to move the subselect into the FROM clause:

SELECT
  cc.id_contenido
 ,cc.pertenece_premium
 ,cc.Titulo_esp as v_sufix 
 ,cc.url_contenido
 ,cc.tipo_acceso
 ,cc.id_sbc
 ,cc.cant_vistos
 ,cc.cant_votos 
 ,cc.puntaje_total 
 ,cc.id_contenido_padre 
 ,jc.imagen_tapa_especial 
 ,jc.info_general_esp as info_general 
 ,jc.ayuda 
 ,jc.tips_tricks_esp as tips_tricks 
 ,jc.mod_imagen_tapa_especial 
 ,cp.fecha_publicacion as fecha_publicacion 
 ,cp.generar_Vainilla 
 FROM 
 cont_contenido cc
 ,juegos_config jc
 ,(SELECT DISTINCT ON (
id_instalacion,
id_contenido,
generar_vainilla,
fecha_publicacion
   )
  *
 FROM cont_publicacion
ORDER BY
   id_instalacion,
   id_contenido,
   generar_vainilla,
   fecha_publicacion desc
  ) AS cp
WHERE 
 cc.id_instalacion= 2
 AND cc.id_sbc   = 619
 AND cc.id_tipo   = 2
 AND cc.id_instalacion  = jc.id_instalacion 
 AND cc.id_contenido   = jc.id_contenido 
 AND upper(cp.generar_Vainilla) = 'S'
 AND cp.id_instalacion = cc.id_instalacion 
 AND cp.id_contenido = cc.id_contenido 
 ORDER BY  
   cp.fecha_publicacion desc
 LIMIT 10
 OFFSET 0
 
[completely untested]

Servus
 Manfred

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL

2003-08-04 Thread Manfred Koizar
On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
>A sample OSDL-DBT3 test result report can be found at:
>http://khack.osdl.org/stp/276912/
>
>Your comments are welcome,

| effective_cache_size   | 1000

With 4GB of memory this is definitely too low and *can* (note that I
don't say *must*) lead the planner to wrong decisions.

| shared_buffers | 15200

... looks reasonable.  Did you test with other values?

| sort_mem   | 524288

This is a bit high, IMHO, but might be ok given that DBT3 is not run
with many concurrent sessions (right?).
http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows
some swapping activity towards the end of the run which could be
caused by a too high sort_mem setting.

Servus
 Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org