Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
On Mon, 04 Aug 2003 16:10:18 +0200, I wrote:
>SELECT DISTINCT ON (
>   cp.id_instalacion,
>   cp.id_contenido,
>   cp.generar_vainilla,
>   cp.fecha_publicacion
> )

Cut'n'paste error!  fecha_publicacion should not be in the DISTINCT ON list.  The same error is in my second suggestion (FROM (subselect)).

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [osdldbt-general] Re: [PERFORM] OSDL Database Test Suite 3 is
On 4 Aug 2003, Jenny Zhang wrote:

> On Mon, 2003-08-04 at 06:33, Manfred Koizar wrote:
> > | effective_cache_size | 1000
> >
> > With 4GB of memory this is definitely too low and *can* (note that I
> > don't say *must*) lead the planner to wrong decisions.
>
> I changed the default to effective_cache_size=393216 as calculated by
> Scott. Another way to check the execution plan is to go to the results
> dir: http://khack.osdl.org/stp/276917/results
> There is a 'power_plan.out' file to record the execution plan. I am
> running a test with the changed effective_cache_size, I will see how it
> affects the plan.
>
> > | shared_buffers | 15200
> >
> > ... looks reasonable. Did you test with other values?
>
> I have only one with shared_buffers=120 at:
> http://khack.osdl.org/stp/276847/
> The performance degraded.

Well, that's truly huge, even for a machine with lots-o-ram. Most tests find that once the shared_buffers are big enough to use more than about 25 to 33% of RAM, they're too big, as you get little return.

> > | sort_mem | 524288
> >
> > This is a bit high, IMHO, but might be ok given that DBT3 is not run
> > with many concurrent sessions (right?).
> > http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows
> > some swapping activity towards the end of the run which could be
> > caused by a too high sort_mem setting.
>
> Right, I run only 4 streams. Setting this parameter lower caused more
> reading/writing to the pgsql/tmp. I guess the database has to do it if
> it can not do sorting in memory.

Note that IF your sort_mem really is 1/2 gig, then you'll likely have LESS than 3 gigs left for OS system cache. About how big does top show buff and cached to be on that box under load?

Not that it's a big deal if you get the effective cache size off by a little bit; it's more of a rubber mallet setting than a jeweler's screwdriver setting.

Thanks a bunch for all the great testing. 
It's a very nice tool to have for convincing the bosses to go with PostgreSQL. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [osdldbt-general] Re: [PERFORM] OSDL Database Test Suite 3 is
On Mon, 2003-08-04 at 09:39, Shridhar Daithankar wrote:
> On 4 Aug 2003 at 15:33, Manfred Koizar wrote:
> > On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
> > > A sample OSDL-DBT3 test result report can be found at:
> > > http://khack.osdl.org/stp/276912/
> > >
> > > Your comments are welcome,
>
> I could not get postgresql.conf so I will combine the comments.
>
> 1. Effective cache size, already mentioned
> 2. Sort memory, already mentioned.
> 3. Was WAL put on a different drive?
> 4. Can you try with the autovacuum daemon and 7.4beta when it comes out..
> 5. What was the file system? Ext2/Ext3/reiser/XFS?
>
> Is there any comparison available for other databases.. Could be interesting to see..:-)

OSDL has run workloads using the SAP DB and PostgreSQL. However, each of the workloads has been tweaked to work around deficiencies of each database with respect to the TPC benchmarks from which the DBT workloads are derived. Given the extensive modifications to each workload, and the fact that each workload operates under different conditions (SAP DB uses raw disk, PostgreSQL uses a file system), it is not meaningful to compare numbers between different databases.

Remember, the intent is to provide a tool kit that can be used to benefit the community. From other postings, it appears that the workloads we have available can be used to help the PostgreSQL community develop a better database; one that is better able to handle the kinds of stress these workloads can produce when scaled to large database sizes.

We have been using these kits to characterize the abilities of the Linux kernel. Showing that these workloads work with two different databases implies that Linux is capable of supporting both databases.

The other tool kits, by the way, are being ported to PostgreSQL as well. Help is needed to tune the workloads to exercise PostgreSQL better. 
It would be great if you could get involved with the porting efforts and assist with the tuning of the PostgreSQL kit.

> Thanks for the good work. I understand it must have been quite an effort to run it..
>
> Keep it up..
>
> Bye
> Shridhar
> --
> Fourth Law of Revision: It is usually impractical to worry beforehand about
> interferences -- if you have none, someone will make one for you.
>
> ___
> osdldbt-general mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/osdldbt-general

--
Craig Thomas
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [osdldbt-general] Re: [PERFORM] OSDL Database Test Suite 3 is
Thanks all for your feedback. I think I should explain more about how to use this test kit.

The main purpose of putting the test kit on the Scalability Test Platform (STP) is that testers can run the workload against the database with different parameters and Linux kernels to see performance differences. Though the test kit picks up default parameters if they are not provided, the command line parameters override the default ones. Currently, the following parameters are supported:

-s -n -d '' -r <{0|1}> -x <{0|1}>

where:
-s is the TPC-H database scale factor; right now, only SF=1 is available.
-n is the number of throughput test streams, which corresponds to the number of simultaneous database connections during the throughput test.
-d is the database parameter string used when starting the postmaster, for example: -B 12 -c effective_cache_size=393216 -c sort_mem=524288 -c stats_command_string=true -c stats_row_level=true -c stats_block_level=true
-r {0|1}: indicates whether the database dir base//pgsql_tmp is put on a separate disk drive
-x {0|1}: indicates whether the WAL is put on a separate disk drive.

The other comments are in-lined:

On Mon, 2003-08-04 at 06:33, Manfred Koizar wrote:
> | effective_cache_size | 1000
>
> With 4GB of memory this is definitely too low and *can* (note that I
> don't say *must*) lead the planner to wrong decisions.

I changed the default to effective_cache_size=393216 as calculated by Scott. Another way to check the execution plan is to go to the results dir: http://khack.osdl.org/stp/276917/results
There is a 'power_plan.out' file to record the execution plan. I am running a test with the changed effective_cache_size, I will see how it affects the plan.

> | shared_buffers | 15200
>
> ... looks reasonable. Did you test with other values?

I have only one with shared_buffers=120 at: http://khack.osdl.org/stp/276847/
The performance degraded. 
> | sort_mem | 524288
>
> This is a bit high, IMHO, but might be ok given that DBT3 is not run
> with many concurrent sessions (right?).
> http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows
> some swapping activity towards the end of the run which could be
> caused by a too high sort_mem setting.

Right, I run only 4 streams. Setting this parameter lower caused more reading/writing to pgsql/tmp. I guess the database has to do that if it can not do the sorting in memory.

On 4 Aug 2003 at 15:33, Manfred Koizar wrote:
> I could not get postgresql.conf so I will combine the comments.

It is under database monitor data: database parameters

> 1. Effective cache size, already mentioned
> 2. Sort memory already mentioned.
> 3. Was WAL put on different drive?

That run did not put WAL on a different drive. I changed it this morning so that it is configurable. I also changed the result page so that testers can tell from the result page.

> 4. Can you try with autovacuum daemon and 7.4beta when it comes out..

I'd be happy to run it. We would like to improve our Patch Life Management (PLM) system so that it can accept PG patches and run performance tests on those patches. Right now PLM only manages Linux kernel patches. I would like to ask the PostgreSQL community if this kind of tool is of interest.

> 5. What was the file system? Ext2/Ext3/reiser/XFS?

It is Ext2. Yeah, it is not reported on the page.

> Is there any comparison available for other databases.. Could be interesting to see..:-)

Let me know if you have any suggestions about how to improve the test kit (parameters, reported information, etc.), or how to make it more useful to the PG community.

Thanks,
--
Jenny Zhang
Open Source Development Lab Inc
12725 SW Millikan Way Suite 400
Beaverton, OR 97005
(503)626-2455 ext 31

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
"Fernando Papa" <[EMAIL PROTECTED]> writes:
> Thanks Tom. I vacuumed full every night. Now I drop function index and
> change the upper. Nothing change (I know, total time rise because we are
> doing other things on database now).

> -> Seq Scan on cont_publicacion (cost=0.00..5329.47 rows=10 width=35) (actual time=41.74..18004.75 rows=97 loops=40)
>      Filter: (((generar_vainilla = 'S'::character varying) OR (generar_vainilla = 's'::character varying)) AND (fecha_publicacion = (subplan)))
>      SubPlan
>        -> Aggregate (cost=11.86..11.86 rows=1 width=8) (actual time=40.15..40.15 rows=1 loops=17880)

Something fishy going on here. Why did it switch to a seqscan, considering it still (mistakenly) thinks there are only going to be 10 or 20 rows matching the generar_vainilla condition? How many rows have generar_vainilla equal to 's' or 'S', anyway?

In any case, the real problem is to get rid of the subselect. Now that I look at your original query, I see that what you really seem to be after is the publications with the latest pub date among each group with identical id_instalacion, id_contenido, and generar_vainilla. You would probably do well to reorganize the query using SELECT DISTINCT ON, viz

SELECT * FROM
  (SELECT DISTINCT ON (id_instalacion, id_contenido, generar_vainilla) ...
   FROM ... WHERE ...
   ORDER BY id_instalacion, id_contenido, generar_vainilla, fecha_publicacion DESC) AS ss
ORDER BY fecha_publicacion desc
LIMIT 10 OFFSET 0

See the "weather reports" example in the SELECT reference page for motivation.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
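[Editor's sketch] For concreteness, here is a filled-in version of Tom's skeleton, with the elided FROM/WHERE parts guessed from the plans elsewhere in this thread -- treat the table list and filter as placeholders, not Fernando's actual query:

```sql
SELECT *
FROM  (SELECT DISTINCT ON (cp.id_instalacion, cp.id_contenido, cp.generar_vainilla)
              cp.*
       FROM   cont_publicacion cp
       WHERE  cp.generar_vainilla IN ('S', 's')   -- placeholder filter
       ORDER  BY cp.id_instalacion, cp.id_contenido, cp.generar_vainilla,
                 cp.fecha_publicacion DESC) AS ss
ORDER BY fecha_publicacion DESC
LIMIT 10 OFFSET 0;
```

DISTINCT ON keeps the first row of each group, which under this ORDER BY is the one with the latest fecha_publicacion -- so the MAX() subselect disappears entirely.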
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
Thanks Tom. I vacuumed full every night. Now I drop the function index and change the upper. Nothing changed (I know, total time rose because we are doing other things on the database now). But as you can see, if there was any performance gain, I didn't see it. Actually I get better results when I disable nested loops or disable merge joins, as I wrote in an older post.

Thanks!

Limit (cost=9.76..9.76 rows=1 width=479) (actual time=720480.00..720480.03 rows=8 loops=1)
  -> Sort (cost=9.76..9.76 rows=1 width=479) (actual time=720479.99..720480.00 rows=8 loops=1)
     Sort Key: cont_publicacion.fecha_publicacion
     -> Merge Join (cost=9.73..9.75 rows=1 width=479) (actual time=323197.81..720477.96 rows=8 loops=1)
        Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
        -> Nested Loop (cost=0.00..213197.04 rows=4 width=367) (actual time=12136.55..720425.66 rows=40 loops=1)
           Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
           -> Index Scan using jue_conf_pk on juegos_config (cost=0.00..12.19 rows=40 width=332) (actual time=34.13..92.02 rows=40 loops=1)
           -> Seq Scan on cont_publicacion (cost=0.00..5329.47 rows=10 width=35) (actual time=41.74..18004.75 rows=97 loops=40)
              Filter: (((generar_vainilla = 'S'::character varying) OR (generar_vainilla = 's'::character varying)) AND (fecha_publicacion = (subplan)))
              SubPlan
                -> Aggregate (cost=11.86..11.86 rows=1 width=8) (actual time=40.15..40.15 rows=1 loops=17880)
                   -> Index Scan using cont_pub_gen_vainilla on cont_publicacion cp1 (cost=0.00..11.86 rows=1 width=8) (actual time=16.89..40.01 rows=7 loops=17880)
                      Index Cond: (generar_vainilla = $2)
                      Filter: ((id_instalacion = $0) AND (id_contenido = $1))
        -> Sort (cost=9.73..9.74 rows=3 width=112) (actual time=30.96..31.00 rows=8 loops=1)
           Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
           -> Seq Scan on cont_contenido (cost=0.00..9.70 rows=3 width=112) (actual time=0.65..28.98 rows=8 loops=1)
              Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
Total runtime: 720595.77 msec
(20 rows)

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 4, 2003 18:28
To: Fernando Papa
CC: [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

"Fernando Papa" <[EMAIL PROTECTED]> writes:
> -> Nested Loop (cost=0.00..1828.46 rows=1 width=367) (actual time=7525.51..436843.27 rows=40 loops=1)
>    Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
>    -> Index Scan using jue_conf_pk on juegos_config (cost=0.00..12.19 rows=40 width=332) (actual time=0.38..6.63 rows=40 loops=1)
>    -> Index Scan using idx_generar_vainilla_ci on cont_publicacion (cost=0.00..45.39 rows=1 width=35) (actual time=48.81..10917.53 rows=97 loops=40)
>       Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
>       Filter: (subplan)
>       SubPlan
>         -> Aggregate (cost=15.85..15.85 rows=1 width=8) (actual time=24.30..24.30 rows=0 loops=17880)

As best I can tell, the problem here is coming from a drastic underestimate of the number of rows selected by "upper(generar_vainilla) = 'S'". Evidently there are about 450 such rows (since in 40 repetitions of the inner index scan, the aggregate subplan gets evaluated 17880 times), but the planner seems to think there will be only about two such rows. Had it made a more correct estimate, it would never have picked a plan that required multiple repetitions of the indexscan.

One thing I'm wondering is if you've VACUUM ANALYZEd cont_publicacion lately --- the cost estimate seems on the small side, and I'm wondering if the planner thinks the table is much smaller than it really is. But assuming you didn't make that mistake, the only solution I can see is to not use a functional index. The planner is not good about making row count estimates for functional indexes. 
You could replace the index on upper(generar_vainilla) with a plain index on generar_vainilla, and change the query condition from "upper(generar_vainilla) = 'S'" to "generar_vainilla IN ('S', 's')". I think the planner would have a lot better chance at understanding the statistics that way.

regards, tom lane

---(end of broadcast)---
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
"Fernando Papa" <[EMAIL PROTECTED]> writes:
> -> Nested Loop (cost=0.00..1828.46 rows=1 width=367) (actual time=7525.51..436843.27 rows=40 loops=1)
>    Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
>    -> Index Scan using jue_conf_pk on juegos_config (cost=0.00..12.19 rows=40 width=332) (actual time=0.38..6.63 rows=40 loops=1)
>    -> Index Scan using idx_generar_vainilla_ci on cont_publicacion (cost=0.00..45.39 rows=1 width=35) (actual time=48.81..10917.53 rows=97 loops=40)
>       Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
>       Filter: (subplan)
>       SubPlan
>         -> Aggregate (cost=15.85..15.85 rows=1 width=8) (actual time=24.30..24.30 rows=0 loops=17880)

As best I can tell, the problem here is coming from a drastic underestimate of the number of rows selected by "upper(generar_vainilla) = 'S'". Evidently there are about 450 such rows (since in 40 repetitions of the inner index scan, the aggregate subplan gets evaluated 17880 times), but the planner seems to think there will be only about two such rows. Had it made a more correct estimate, it would never have picked a plan that required multiple repetitions of the indexscan.

One thing I'm wondering is if you've VACUUM ANALYZEd cont_publicacion lately --- the cost estimate seems on the small side, and I'm wondering if the planner thinks the table is much smaller than it really is. But assuming you didn't make that mistake, the only solution I can see is to not use a functional index. The planner is not good about making row count estimates for functional indexes. You could replace the index on upper(generar_vainilla) with a plain index on generar_vainilla, and change the query condition from "upper(generar_vainilla) = 'S'" to "generar_vainilla IN ('S', 's')". I think the planner would have a lot better chance at understanding the statistics that way. 
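[Editor's sketch] Spelled out as SQL, Tom's suggestion would look roughly like this; the table and the `_ci` index name come from the plan above, but `idx_generar_vainilla` is a name invented here:

```sql
-- Replace the functional index with a plain index on the column itself:
DROP INDEX idx_generar_vainilla_ci;
CREATE INDEX idx_generar_vainilla ON cont_publicacion (generar_vainilla);
ANALYZE cont_publicacion;

-- Then rewrite the predicate so the planner can use the column's statistics:
--   before:  upper(generar_vainilla) = 'S'
--   after:   generar_vainilla IN ('S', 's')
```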
regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Indexes not used for "min()"
On Mon, Aug 04, 2003 at 15:05:08 -0600, "Valsecchi, Patrick" <[EMAIL PROTECTED]> wrote:
> Sir,
>
> I did a search with the "index" keyword on the mailing list archive and it came up with no result. Sorry if it's a known bug.

It isn't a bug. It is a design trade-off. The database has no special knowledge about the min and max aggregate functions that would allow it to use indexes.

> But in general, I think the indexes are under-used. I have several queries that are taking at least 30 minutes and that would take less than one minute if indexes were used (comes from a comparison I made with Oracle). In particular, I have the feeling that indexes are not used for "IN" statements (within a where clause).

There are known performance problems with IN. These are addressed in 7.4, which will be going into beta any time now. You can usually rewrite IN queries to use EXISTS instead, which will speed things up.

Also be sure to run analyze (or vacuum analyze) so that the database server has accurate statistics on which to base its decisions.

> On the same subject, I'd add that the ability to provide plan "hints" within the queries (like provided in Oracle) would be helpful. I know that the Postgres optimizer is supposed to do a better job than the one from Oracle, but an optimizer cannot be perfect in every case.

"Hints" aren't going to happen. They cause maintenance problems. You can disable features for a session (such as sequential scans) and try to get a plan you like. But generally, rather than adding this to your application code, you should try to find out why the planner is making the wrong choice. Adjusting the relative costs for doing things might allow the planner to do a much better job for you. This kind of thing gets discussed on the performance list.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
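[Editor's sketch] The usual pre-7.4 IN-to-EXISTS rewrite looks like this, shown on hypothetical tables `orders`/`customers` rather than the poster's schema:

```sql
-- Often slow on 7.3 and earlier: uncorrelated IN subquery
SELECT o.*
FROM   orders o
WHERE  o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'EU');

-- Usually much faster on those versions: a correlated EXISTS
SELECT o.*
FROM   orders o
WHERE  EXISTS (SELECT 1
               FROM   customers c
               WHERE  c.id = o.customer_id
               AND    c.region = 'EU');
```

The two forms return the same rows; the EXISTS version lets the old planner probe `customers` once per candidate `orders` row instead of materializing the whole IN list.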
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
Scott Cain wrote:
> On Mon, 2003-08-04 at 11:53, Tom Lane wrote:
> > I find it really, really hard to believe that a crude reimplementation
> > in plpgsql of the TOAST concept could beat the built-in implementation
> > at all, let alone beat it by two orders of magnitude.
> >
> > Either there's something unrealistic about your testing of the
> > dna_string function, or your original tests are not causing TOAST to be
> > invoked in the expected way, or there's a bug we need to fix. I'd
> > really like to see some profiling of the poor-performing
> > external-storage case, so we can figure out what's going on.
>
> I was really hoping for a "Good job and glad to hear it" from you :-)
> I don't think there is anything unrealistic about my function or its
> testing, as it is very much along the lines of the types of things we do
> now. I will really try to do some profiling this week to help figure out
> what is going on.

Is there a sample table schema and dataset available (external-storage case) that we can play with?

Joe

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
I was playing with nested loops, and I found this:

Original explain:

Limit (cost=9.75..9.76 rows=1 width=479) (actual time=436858.90..436858.93 rows=8 loops=1)
  -> Sort (cost=9.75..9.76 rows=1 width=479) (actual time=436858.88..436858.89 rows=8 loops=1)
     Sort Key: cont_publicacion.fecha_publicacion
     -> Merge Join (cost=9.73..9.74 rows=1 width=479) (actual time=196970.93..436858.04 rows=8 loops=1)
        Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
        -> Nested Loop (cost=0.00..1828.46 rows=1 width=367) (actual time=7525.51..436843.27 rows=40 loops=1)
           Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
           -> Index Scan using jue_conf_pk on juegos_config (cost=0.00..12.19 rows=40 width=332) (actual time=0.38..6.63 rows=40 loops=1)
           -> Index Scan using idx_generar_vainilla_ci on cont_publicacion (cost=0.00..45.39 rows=1 width=35) (actual time=48.81..10917.53 rows=97 loops=40)
              Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
              Filter: (subplan)
              SubPlan
                -> Aggregate (cost=15.85..15.85 rows=1 width=8) (actual time=24.30..24.30 rows=0 loops=17880)
                   Filter: (max(fecha_publicacion) = $3)
                   -> Seq Scan on cont_publicacion cp1 (cost=0.00..15.84 rows=1 width=8) (actual time=10.17..24.12 rows=7 loops=17880)
                      Filter: ((id_instalacion = $0) AND (id_contenido = $1) AND (generar_vainilla = $2))
        -> Sort (cost=9.73..9.74 rows=3 width=112) (actual time=8.91..8.95 rows=8 loops=1)
           Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
           -> Seq Scan on cont_contenido (cost=0.00..9.70 rows=3 width=112) (actual time=0.45..7.59 rows=8 loops=1)
              Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
Total runtime: 436860.84 msec
(21 rows)

With set enable_nestloop to off:

Limit (cost=55.15..55.16 rows=1 width=479) (actual time=11394.79..11394.82 rows=8 loops=1)
  -> Sort (cost=55.15..55.16 rows=1 width=479) (actual time=11394.77..11394.79 rows=8 loops=1)
     Sort Key: cont_publicacion.fecha_publicacion
     -> Merge Join (cost=55.13..55.14 rows=1 width=479) (actual time=11380.12..11394.01 rows=8 loops=1)
        Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
        -> Merge Join (cost=45.40..45.41 rows=1 width=367) (actual time=11358.48..11380.18 rows=40 loops=1)
           Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
           -> Index Scan using jue_conf_pk on juegos_config (cost=0.00..12.19 rows=40 width=332) (actual time=0.23..5.62 rows=40 loops=1)
           -> Sort (cost=45.40..45.40 rows=1 width=35) (actual time=11357.48..11357.68 rows=97 loops=1)
              Sort Key: cont_publicacion.id_instalacion, cont_publicacion.id_contenido
              -> Index Scan using idx_generar_vainilla_ci on cont_publicacion (cost=0.00..45.39 rows=1 width=35) (actual time=48.81..11339.80 rows=97 loops=1)
                 Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
                 Filter: (fecha_publicacion = (subplan))
                 SubPlan
                   -> Aggregate (cost=15.84..15.84 rows=1 width=8) (actual time=25.21..25.22 rows=1 loops=447)
                      -> Seq Scan on cont_publicacion cp1 (cost=0.00..15.84 rows=1 width=8) (actual time=10.21..25.07 rows=7 loops=447)
                         Filter: ((id_instalacion = $0) AND (id_contenido = $1) AND (generar_vainilla = $2))
        -> Sort (cost=9.73..9.74 rows=3 width=112) (actual time=8.77..8.79 rows=8 loops=1)
           Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
           -> Seq Scan on cont_contenido (cost=0.00..9.70 rows=3 width=112) (actual time=0.45..7.41 rows=8 loops=1)
              Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
Total runtime: 11397.66 msec
(22 rows)

Why doesn't PostgreSQL avoid the nested loop on its own? Why does it estimate the nested-loop plan as cheaper when it actually takes so much longer?
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
> > 2. If you want to search for a sequence you'll need to deal with the case > > where it starts in one chunk and ends in another. > > I forgot about searching--I suspect that application is why I faced > opposition for shredding in my schema development group. Maybe I should > push that off to the file system and use grep (or BLAST). Otherwise, I > could write a function that would search the chunks first, then after > failing to find the substring in those, I could start sewing the chunks > together to look for the query string. That could get ugly (and > slow--but if the user knows that and expects it to be slow, I'm ok with > that). If you know the max length of the sequences being searched for, and this is much less than the chunk size, then you could simply have the chunks overlap by that much, thus guaranteeing every substring will be found in its entirety in at least one chunk. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
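[Editor's sketch] The overlap idea can be expressed in SQL roughly as follows. Assumptions not in the original: a hypothetical `raw_sequence` table holding the unshredded string, a 2000-character chunk size, a 100-character maximum pattern length, and `generate_series()`, which is built in only on PostgreSQL releases later than the 7.3-era servers discussed in this thread:

```sql
-- Shred the sequence into 2000-char chunks that overlap by 100 chars,
-- so any pattern of length <= 100 falls wholly inside at least one chunk.
INSERT INTO dna (foffset, pdna)
SELECT o, substr(s.seq, o + 1, 2000 + 100)
FROM   raw_sequence s,
       generate_series(0, length(s.seq) - 1, 2000) AS o;
```

The cost is a 5% storage increase (100/2000); matches landing in the overlap region are found twice and would need de-duplication by absolute position.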
Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL
On 4 Aug 2003 at 15:33, Manfred Koizar wrote:
> On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote:
> > A sample OSDL-DBT3 test result report can be found at:
> > http://khack.osdl.org/stp/276912/
> >
> > Your comments are welcome,

I could not get postgresql.conf so I will combine the comments.

1. Effective cache size, already mentioned
2. Sort memory, already mentioned.
3. Was WAL put on a different drive?
4. Can you try with the autovacuum daemon and 7.4beta when it comes out..
5. What was the file system? Ext2/Ext3/reiser/XFS?

Is there any comparison available for other databases.. Could be interesting to see..:-)

Thanks for the good work. I understand it must have been quite an effort to run it..

Keep it up..

Bye
Shridhar
--
Fourth Law of Revision: It is usually impractical to worry beforehand about interferences -- if you have none, someone will make one for you.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
On Mon, 2003-08-04 at 11:53, Tom Lane wrote: > Scott Cain <[EMAIL PROTECTED]> writes: > > At least this appears to work and is much faster, completing substring > > operations like above in about 0.27 secs (that's about two orders of > > magnitude improvement!) > > I find it really, really hard to believe that a crude reimplementation > in plpgsql of the TOAST concept could beat the built-in implementation > at all, let alone beat it by two orders of magnitude. > > Either there's something unrealistic about your testing of the > dna_string function, or your original tests are not causing TOAST to be > invoked in the expected way, or there's a bug we need to fix. I'd > really like to see some profiling of the poor-performing > external-storage case, so we can figure out what's going on. > I was really hoping for a "Good job and glad to hear it" from you :-) I don't think there is anything unrealistic about my function or its testing, as it is very much along the lines of the types of things we do now. I will really try to do some profiling this week to help figure out what is going on. Scott -- Scott Cain, Ph. D. [EMAIL PROTECTED] GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL
On Mon, 4 Aug 2003, Manfred Koizar wrote: > On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote: > >A sample OSDL-DBT3 test result report can be found at: > >http://khack.osdl.org/stp/276912/ > > > >Your comments are welcome, > > | effective_cache_size | 1000 > > With 4GB of memory this is definitely too low and *can* (note that I > don't say *must*) lead the planner to wrong decisions. > > | shared_buffers | 15200 > > ... looks reasonable. Did you test with other values? > > | sort_mem | 524288 > > This is a bit high, IMHO, but might be ok given that DBT3 is not run > with many concurrent sessions (right?). > http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows > some swapping activity towards the end of the run which could be > caused by a too high sort_mem setting. Sorry, my last email shot off out of the gun before it was completed... To repeat: Don't forget to set effective_cache_size the same way as shared buffers (i.e. it's in 8k blocks for most systems.) If you have a machine with 4 gig ram, and 3 gigs is available as disk cache, then divide out 3 gigs by 8k to get the right number. My quick calculation shows that being about 393216 blocks. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
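[Editor's note] Scott's arithmetic can be checked in SQL: 3 GB of OS disk cache is 3*1024*1024 kB, and dividing by the 8 kB block size gives the value he quotes.

```sql
-- effective_cache_size is expressed in 8 kB blocks on most systems:
SELECT (3 * 1024 * 1024) / 8 AS effective_cache_size;  -- 3 GB in kB / 8 kB
-- Result: 393216
```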
Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL
On Mon, 4 Aug 2003, Manfred Koizar wrote: > On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote: > >A sample OSDL-DBT3 test result report can be found at: > >http://khack.osdl.org/stp/276912/ > > > >Your comments are welcome, > > | effective_cache_size | 1000 > > With 4GB of memory this is definitely too low and *can* (note that I > don't say *must*) lead the planner to wrong decisions. > > | shared_buffers | 15200 > > ... looks reasonable. Did you test with other values? > > | sort_mem | 524288 > > This is a bit high, IMHO, but might be ok given that DBT3 is not run > with many concurrent sessions (right?). > http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows > some swapping activity towards the end of the run which could be > caused by a too high sort_mem setting. And, as always, don't forget to set effective_cache_size. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
On 4 Aug 2003 at 12:14, Scott Cain wrote:
> I forgot about searching--I suspect that application is why I faced
> opposition for shredding in my schema development group. Maybe I should
> push that off to the file system and use grep (or BLAST). Otherwise, I
> could write a function that would search the chunks first, then after
> failing to find the substring in those, I could start sewing the chunks
> together to look for the query string. That could get ugly (and
> slow--but if the user knows that and expects it to be slow, I'm ok with
> that).

I assume your DNA sequence is compacted. Your best bet would be to fetch them from the database and run BLAST on them in client memory. No point duplicating BLAST functionality. Last I tried, it beat every technique of text searching when heuristics are involved.

Bye
Shridhar
--
There are two types of Linux developers - those who can spell, and those who can't. There is a constant pitched battle between the two. (From one of the post-1.1.54 kernel update messages posted to c.o.l.a)

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
On Mon, 2003-08-04 at 11:55, Richard Huxton wrote: > On Monday 04 August 2003 16:25, Scott Cain wrote: > [snip] > > [snip] > > You might want some checks to make sure that smin < smax, otherwise looks like > it does the job in a good clean fashion. Good point--smin < smax generally by virtue of the application using the database, but I shouldn't assume that will always be the case. > > Glad to hear it's going to solve your problems. Two things you might want to > bear in mind: > 1. There's probably a "sweet spot" where the chunk size interacts well with > your data, usage patterns and PGs backend to give you peak performance. > You'll have to test. Yes, I had a feeling that was probably the case-- since this is an open source project, I will need to write directions for installers on picking a reasonable chunk size. > 2. If you want to search for a sequence you'll need to deal with the case > where it starts in one chunk and ends in another. I forgot about searching--I suspect that application is why I faced opposition for shredding in my schema development group. Maybe I should push that off to the file system and use grep (or BLAST). Otherwise, I could write a function that would search the chunks first, then after failing to find the substring in those, I could start sewing the chunks together to look for the query string. That could get ugly (and slow--but if the user knows that and expects it to be slow, I'm ok with that). Thanks, Scott -- Scott Cain, Ph. D. [EMAIL PROTECTED] GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
On Monday 04 August 2003 16:25, Scott Cain wrote:
[snip]
> In an effort to find the best way to do this operation, I decided to
> look at what is my "worst case" scenario: the DNA sequence for human
> chromosome 1, which is about 250 million characters long (previous
> strings were about 20 million characters long). I wrote a perl script
> to do several substring operations over this very long string, with
> substring lengths varying between 1000 and 40,000 characters spread out
> over various locations along the string. While EXTENDED storage won in
> this case, it was a hollow victory: 38 seconds per operation versus 40
> seconds, both of which are way too long for an interactive application.
>
> Time for a new method. A suggestion from my boss was to "shred" the DNA
> into smallish chunks and a column giving offsets from the beginning of
> the string, so that it can be reassembled when needed. Here is the test
> table:
>
> string=> \d dna
>       Table "public.dna"
>  Column  |  Type   | Modifiers
> ---------+---------+-----------
>  foffset | integer |
>  pdna    | text    |
> Indexes: foffset_idx btree (foffset)

[snipped plpgsql function which stitches chunks together and then substrings]

> So here's the question: I've never written a plpgsql function before, so
> I don't have much experience with it; is there anything obviously wrong
> with this function, or are there things that could be done better? At
> least this appears to work and is much faster, completing substring
> operations like above in about 0.27 secs (that's about two orders of
> magnitude improvement!)

You might want some checks to make sure that smin < smax, otherwise it looks like it does the job in a good clean fashion.

Glad to hear it's going to solve your problems. Two things you might want to bear in mind:
1. There's probably a "sweet spot" where the chunk size interacts well with your data, usage patterns and PG's backend to give you peak performance. You'll have to test.
2.
If you want to search for a sequence you'll need to deal with the case where it starts in one chunk and ends in another. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
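Richard's second caveat is the subtle one: a match can straddle a chunk boundary. A minimal Python sketch (hypothetical helper, not from the thread) of one common way to handle it: search each chunk in a window extended by the first len(pattern)-1 characters of the next chunk, so boundary-spanning matches are found without stitching the whole string back together. It assumes the pattern is shorter than the chunk size.

```python
def search_chunks(chunks, pattern):
    """Find all start offsets of `pattern` in a sequence stored as
    (offset, text) chunks sorted by offset.  Matches that straddle a
    chunk boundary are caught by extending each chunk's search window
    with the first len(pattern) - 1 characters of the next chunk.
    Assumes len(pattern) is no larger than the chunk size."""
    m = len(pattern)
    hits = []
    for i, (off, text) in enumerate(chunks):
        overlap = chunks[i + 1][1][:m - 1] if i + 1 < len(chunks) else ""
        window = text + overlap
        pos = window.find(pattern)
        while pos != -1:
            if pos < len(text):          # matches starting inside the overlap
                hits.append(off + pos)   # belong to the next chunk's window
            pos = window.find(pattern, pos + 1)
    return hits
```

This keeps the search per-chunk (so only two chunks are ever in memory at once) while still reporting a hit like a pattern that begins in chunk 0 and ends in chunk 1.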
Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings
Scott Cain <[EMAIL PROTECTED]> writes: > At least this appears to work and is much faster, completing substring > operations like above in about 0.27 secs (that's about two orders of > magnitude improvement!) I find it really, really hard to believe that a crude reimplementation in plpgsql of the TOAST concept could beat the built-in implementation at all, let alone beat it by two orders of magnitude. Either there's something unrealistic about your testing of the dna_string function, or your original tests are not causing TOAST to be invoked in the expected way, or there's a bug we need to fix. I'd really like to see some profiling of the poor-performing external-storage case, so we can figure out what's going on. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
Err... you're right... one of us said the same thing when I showed him the Volker mail...

-----Original Message-----
From: Manfred Koizar [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 4, 2003 12:17
To: Fernando Papa
CC: Volker Helm; [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

On Mon, 4 Aug 2003 12:02:46 -0300, "Fernando Papa" <[EMAIL PROTECTED]> wrote:
>FROM
>cont_contenido
>,juegos_config
>,cont_publicacion
>,(SELECT max(cp1.fecha_publicacion) as max_pub --change here
> FROM cont_publicacion cp1) a --change here

But this calculates the global maximum, not per id_instalacion, id_contenido, and generar_vainilla as in

>AND cont_publicacion.fecha_publicacion = (SELECT max(cp1.fecha_publicacion)
> FROM cont_publicacion cp1
> WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
> AND cp1.id_contenido = cont_publicacion.id_contenido
> AND cp1.generar_vainilla = cont_publicacion.generar_vainilla)

Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] EXTERNAL storage and substring on long strings
Hello,

Note: there is a SQL question way at the bottom of this narrative :-)

Last week I asked about doing substring operations on very long strings (>10 million characters). I was given a suggestion to use EXTERNAL storage on the column via the ALTER TABLE ... SET STORAGE command. In one test case, the performance of substring actually got worse using EXTERNAL storage.

In an effort to find the best way to do this operation, I decided to look at what is my "worst case" scenario: the DNA sequence for human chromosome 1, which is about 250 million characters long (previous strings were about 20 million characters long). I wrote a perl script to do several substring operations over this very long string, with substring lengths varying between 1000 and 40,000 characters spread out over various locations along the string. While EXTENDED storage won in this case, it was a hollow victory: 38 seconds per operation versus 40 seconds, both of which are way too long for an interactive application.

Time for a new method. A suggestion from my boss was to "shred" the DNA into smallish chunks and a column giving offsets from the beginning of the string, so that it can be reassembled when needed. Here is the test table:

string=> \d dna
      Table "public.dna"
 Column  |  Type   | Modifiers
---------+---------+-----------
 foffset | integer |
 pdna    | text    |
Indexes: foffset_idx btree (foffset)

In practice, there would also be a foreign key column to give the identifier of the dna.
Then I wrote the following function (here's the SQL part promised above):

CREATE OR REPLACE FUNCTION dna_string (integer, integer) RETURNS TEXT AS '
DECLARE
    smin            ALIAS FOR $1;
    smax            ALIAS FOR $2;
    longdna         TEXT := '''';
    dna_row         dna%ROWTYPE;
    dnastring       TEXT;
    firstchunk      INTEGER;
    lastchunk       INTEGER;
    in_longdnastart INTEGER;
    in_longdnalen   INTEGER;
    chunksize       INTEGER;
BEGIN
    SELECT INTO chunksize min(foffset) FROM dna WHERE foffset>0;

    firstchunk      := chunksize*(smin/chunksize);
    lastchunk       := chunksize*(smax/chunksize);
    in_longdnastart := smin % chunksize;
    in_longdnalen   := smax - smin + 1;

    FOR dna_row IN
        SELECT * FROM dna
        WHERE foffset >= firstchunk AND foffset <= lastchunk
        ORDER BY foffset
    LOOP
        longdna := longdna || dna_row.pdna;
    END LOOP;

    dnastring := substring(longdna FROM in_longdnastart FOR in_longdnalen);

    RETURN dnastring;
END;
' LANGUAGE 'plpgsql';

So here's the question: I've never written a plpgsql function before, so I don't have much experience with it; is there anything obviously wrong with this function, or are there things that could be done better? At least this appears to work and is much faster, completing substring operations like above in about 0.27 secs (that's about two orders of magnitude improvement!)

Thanks,
Scott
--
Scott Cain, Ph. D. [EMAIL PROTECTED]
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
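As a cross-check of the chunk arithmetic in the function above, the same logic can be sketched in Python (hypothetical helper, not from the thread). One caveat worth flagging: with 1-based positions, the posted function's `smin % chunksize` indexing appears to misbehave when smin is an exact multiple of chunksize, so this sketch converts to 0-based offsets first.

```python
def dna_substring(chunks, smin, smax, chunksize):
    """Return the 1-based, inclusive substring [smin, smax] of a DNA
    string stored as (foffset, text) chunks, mirroring the plpgsql
    approach: select only the chunks covering the window, concatenate
    them, then cut the requested range out of the concatenation."""
    lo, hi = smin - 1, smax - 1                      # 0-based positions
    firstchunk = chunksize * (lo // chunksize)        # offset of first needed chunk
    lastchunk = chunksize * (hi // chunksize)         # offset of last needed chunk
    longdna = "".join(text for off, text in sorted(chunks)
                      if firstchunk <= off <= lastchunk)
    start = lo - firstchunk                           # position inside the window
    return longdna[start : start + (smax - smin + 1)]
```

With chunksize 4 and chunks `[(0, "ABCD"), (4, "EFGH"), (8, "IJKL")]`, asking for positions 2..5 touches only the first two chunks and returns "BCDE"; the boundary case smin=4 also resolves correctly.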
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
On Mon, 4 Aug 2003 12:02:46 -0300, "Fernando Papa" <[EMAIL PROTECTED]> wrote: >FROM >cont_contenido >,juegos_config >,cont_publicacion >,(SELECT max(cp1.fecha_publicacion) as max_pub > --change here > FROM cont_publicacion cp1) a --change here But this calculates the global maximum, not per id_instalacion, id_contenido, and generar_vainilla as in >AND cont_publicacion.fecha_publicacion = (SELECT > max(cp1.fecha_publicacion) > FROM cont_publicacion cp1 > WHERE cp1.id_instalacion = > cont_publicacion.id_instalacion > AND cp1.id_contenido = > cont_publicacion.id_contenido > AND cp1.generar_vainilla = > cont_publicacion.generar_vainilla) Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
Hi Volker!!! I think you're right. Look at times:

QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=23.37..23.37 rows=1 width=487) (actual time=2245.61..2245.61 rows=0 loops=1)
   ->  Sort  (cost=23.37..23.37 rows=1 width=487) (actual time=2245.60..2245.60 rows=0 loops=1)
         Sort Key: cont_publicacion.fecha_publicacion
         ->  Nested Loop  (cost=23.33..23.36 rows=1 width=487) (actual time=2244.10..2244.10 rows=0 loops=1)
               Join Filter: ("outer".fecha_publicacion = "inner".max_pub)
               ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual time=918.73..1988.43 rows=16 loops=1)
                     Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
                     ->  Nested Loop  (cost=0.00..409.35 rows=1 width=367) (actual time=35.44..1967.20 rows=82 loops=1)
                           Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
                           ->  Index Scan using jue_conf_pk on juegos_config  (cost=0.00..12.19 rows=40 width=332) (actual time=0.42..6.73 rows=40 loops=1)
                           ->  Index Scan using idx_generar_vainilla_ci on cont_publicacion  (cost=0.00..9.90 rows=2 width=35) (actual time=0.20..35.19 rows=447 loops=40)
                                 Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
                     ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual time=10.42..10.48 rows=15 loops=1)
                           Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
                           ->  Seq Scan on cont_contenido  (cost=0.00..9.70 rows=3 width=112) (actual time=0.57..8.11 rows=8 loops=1)
                                 Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
               ->  Subquery Scan a  (cost=13.60..13.60 rows=1 width=8) (actual time=15.89..15.90 rows=1 loops=16)
                     ->  Aggregate  (cost=13.60..13.60 rows=1 width=8) (actual time=15.87..15.88 rows=1 loops=16)
                           ->  Seq Scan on cont_publicacion cp1  (cost=0.00..12.48 rows=448 width=8) (actual time=0.05..11.62 rows=448 loops=16)
 Total runtime: 2250.92 msec
(20 rows)

The problem was the subquery, no doubt.
-----Original Message-----
From: Volker Helm [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 4, 2003 11:45
To: Fernando Papa
Subject: RE: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

Hi,

just use the subquery as an inline view and join the tables:

SELECT cont_contenido.id_contenido
      ,cont_contenido.pertenece_premium
      ,cont_contenido.Titulo_esp as v_sufix
      ,cont_contenido.url_contenido
      ,cont_contenido.tipo_acceso
      ,cont_contenido.id_sbc
      ,cont_contenido.cant_vistos
      ,cont_contenido.cant_votos
      ,cont_contenido.puntaje_total
      ,cont_contenido.id_contenido_padre
      ,juegos_config.imagen_tapa_especial
      ,juegos_config.info_general_esp as info_general
      ,juegos_config.ayuda
      ,juegos_config.tips_tricks_esp as tips_tricks
      ,juegos_config.mod_imagen_tapa_especial
      ,cont_publicacion.fecha_publicacion as fecha_publicacion
      ,cont_publicacion.generar_Vainilla
FROM cont_contenido
    ,juegos_config
    ,cont_publicacion
    ,(SELECT max(cp1.fecha_publicacion) as max_pub  --change here
      FROM cont_publicacion cp1) a                  --change here
WHERE cont_contenido.id_instalacion = 2
  AND cont_contenido.id_sbc = 619
  AND cont_contenido.id_tipo = 2
  AND cont_contenido.id_instalacion = juegos_config.id_instalacion
  AND cont_contenido.id_contenido = juegos_config.id_contenido
  AND upper(cont_publicacion.generar_Vainilla) = 'S'
  AND cont_publicacion.id_instalacion = cont_contenido.id_instalacion
  AND cont_publicacion.id_contenido = cont_contenido.id_contenido
  AND cont_publicacion.fecha_publicacion = a.max_pub  -- change here
ORDER BY cont_publicacion.fecha_publicacion desc

hope it helps,

Volker Helm

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Fernando Papa
Sent: Friday, August 1, 2003 23:17
To: [EMAIL PROTECTED]
Subject: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

Hi all! Really I don't know what happened with this query. I'm running PG 7.3.1 on solaris, vacc
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
I created the index, but it doesn't help much:

QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=9.75..9.76 rows=1 width=479) (actual time=486421.35..486421.38 rows=8 loops=1)
   ->  Sort  (cost=9.75..9.76 rows=1 width=479) (actual time=486421.33..486421.34 rows=8 loops=1)
         Sort Key: cont_publicacion.fecha_publicacion
         ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual time=220253.76..486420.35 rows=8 loops=1)
               Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
               ->  Nested Loop  (cost=0.00..1828.35 rows=1 width=367) (actual time=8347.78..486405.02 rows=40 loops=1)
                     Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
                     ->  Index Scan using jue_conf_pk on juegos_config  (cost=0.00..12.19 rows=40 width=332) (actual time=0.23..6.73 rows=40 loops=1)
                     ->  Index Scan using idx_generar_vainilla_ci on cont_publicacion  (cost=0.00..45.39 rows=1 width=35) (actual time=56.01..12156.48 rows=97 loops=40)
                           Index Cond: (upper((generar_vainilla)::text) = 'S'::text)
                           Filter: (fecha_publicacion = (subplan))
                           SubPlan
                             ->  Aggregate  (cost=15.84..15.84 rows=1 width=8) (actual time=27.03..27.03 rows=1 loops=17880)
                                   ->  Seq Scan on cont_publicacion cp1  (cost=0.00..15.84 rows=1 width=8) (actual time=11.21..26.86 rows=7 loops=17880)
                                         Filter: ((id_instalacion = $0) AND (id_contenido = $1) AND (generar_vainilla = $2))
               ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual time=9.28..9.32 rows=8 loops=1)
                     Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
                     ->  Seq Scan on cont_contenido  (cost=0.00..9.70 rows=3 width=112) (actual time=0.47..7.48 rows=8 loops=1)
                           Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 486445.19 msec
(20 rows)

-----Original Message-----
From: Mendola Gaetano [mailto:[EMAIL PROTECTED]]
Sent: Saturday, August 2, 2003 7:36
To: [EMAIL PROTECTED]
CC: Fernando Papa
Subject: Re: I can't wait too much: Total runtime 432478.44
msec

From: "Fernando Papa" <[EMAIL PROTECTED]>
> AND upper(cont_publicacion.generar_Vainilla) = 'S'
> Filter: ((upper((generar_vainilla)::text) = 'S'::text) AND (fecha_publicacion = (subplan)))

Using a functional index on this field should help:

create index idx_generar_vainilla_ci on cont_publicacion ( upper(generar_Vainilla) );

Regards
Gaetano Mendola

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
Hi Josh... a little worse time:

EXPLAIN ANALYZE
SELECT cont_contenido.id_contenido
      ,cont_contenido.pertenece_premium
      ,cont_contenido.Titulo_esp as v_sufix
      ,cont_contenido.url_contenido
      ,cont_contenido.tipo_acceso
      ,cont_contenido.id_sbc
      ,cont_contenido.cant_vistos
      ,cont_contenido.cant_votos
      ,cont_contenido.puntaje_total
      ,cont_contenido.id_contenido_padre
      ,juegos_config.imagen_tapa_especial
      ,juegos_config.info_general_esp as info_general
      ,juegos_config.ayuda
      ,juegos_config.tips_tricks_esp as tips_tricks
      ,juegos_config.mod_imagen_tapa_especial
      ,cont_publicacion.fecha_publicacion as fecha_publicacion
      ,cont_publicacion.generar_Vainilla
FROM cont_contenido
    ,juegos_config
    ,cont_publicacion
WHERE cont_contenido.id_instalacion = 2
  AND cont_contenido.id_sbc = 619
  AND cont_contenido.id_tipo = 2
  AND cont_contenido.id_instalacion = juegos_config.id_instalacion
  AND cont_contenido.id_contenido = juegos_config.id_contenido
  AND upper(cont_publicacion.generar_Vainilla) = 'S'
  AND cont_publicacion.id_instalacion = cont_contenido.id_instalacion
  AND cont_publicacion.id_contenido = cont_contenido.id_contenido
  AND EXISTS (SELECT max(cp1.fecha_publicacion)
              FROM cont_publicacion cp1
              WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
                AND cp1.id_contenido = cont_publicacion.id_contenido
                AND cp1.generar_vainilla = cont_publicacion.generar_vainilla
              HAVING max(cp1.fecha_publicacion) = cont_publicacion.fecha_publicacion)
ORDER BY cont_publicacion.fecha_publicacion desc
LIMIT 10 OFFSET 0;

QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=9.75..9.76 rows=1 width=479) (actual time=449760.88..449760.91 rows=8 loops=1)
   ->  Sort  (cost=9.75..9.76 rows=1 width=479) (actual time=449760.87..449760.88 rows=8 loops=1)
         Sort Key: cont_publicacion.fecha_publicacion
         ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual time=202257.20..449759.00 rows=8 loops=1)
               Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
               ->  Nested Loop  (cost=0.00..284556.86 rows=1 width=367) (actual time=7794.28..449741.85 rows=40 loops=1)
                     Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
                     ->  Index Scan using jue_conf_pk on juegos_config  (cost=0.00..12.19 rows=40 width=332) (actual time=0.43..8.12 rows=40 loops=1)
                     ->  Seq Scan on cont_publicacion  (cost=0.00..7113.60 rows=1 width=35) (actual time=24.10..11239.67 rows=97 loops=40)
                           Filter: ((upper((generar_vainilla)::text) = 'S'::text) AND (subplan))
                           SubPlan
                             ->  Aggregate  (cost=15.85..15.85 rows=1 width=8) (actual time=25.03..25.03 rows=0 loops=17880)
                                   Filter: (max(fecha_publicacion) = $3)
                                   ->  Seq Scan on cont_publicacion cp1  (cost=0.00..15.84 rows=1 width=8) (actual time=10.51..24.85 rows=7 loops=17880)
                                         Filter: ((id_instalacion = $0) AND (id_contenido = $1) AND (generar_vainilla = $2))
               ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual time=10.49..10.52 rows=8 loops=1)
                     Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
                     ->  Seq Scan on cont_contenido  (cost=0.00..9.70 rows=3 width=112) (actual time=0.59..8.07 rows=8 loops=1)
                           Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 449765.69 msec
(20 rows)

-----Original Message-----
From: Josh Berkus [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 1, 2003 18:32
To: Christopher Browne; Fernando Papa
CC: [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

Fernando,

> AND cont_publicacion.fecha_publicacion = (SELECT max(cp1.fecha_publicacion)
> FROM cont_publicacion cp1
> WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
> AND cp1.id_contenido = cont_publicacion.id_contenido
> AND cp1.generar_vainilla = cont_publicacion.generar_vainilla)

Or even changing it to:

AND EXISTS (SELECT max(cp1.fecha_publicacion)
            FROM cont_publicacion cp1
            WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
            AND cp1.id_contenido = cont_publicacion.id_contenido
            AND cp1.generar_vainilla =
cont_publicacion.generar_vainilla HAVING max(cp1.fecha_publicacion) = cont_publicacion.fecha_publicacion) -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
Sorry Chris... a little slower...

esdc=> EXPLAIN ANALYZE
SELECT cont_contenido.id_contenido
      ,cont_contenido.pertenece_premium
      ,cont_contenido.Titulo_esp as v_sufix
      ,cont_contenido.url_contenido
      ,cont_contenido.tipo_acceso
      ,cont_contenido.id_sbc
      ,cont_contenido.cant_vistos
      ,cont_contenido.cant_votos
      ,cont_contenido.puntaje_total
      ,cont_contenido.id_contenido_padre
      ,juegos_config.imagen_tapa_especial
      ,juegos_config.info_general_esp as info_general
      ,juegos_config.ayuda
      ,juegos_config.tips_tricks_esp as tips_tricks
      ,juegos_config.mod_imagen_tapa_especial
      ,cont_publicacion.fecha_publicacion as fecha_publicacion
      ,cont_publicacion.generar_Vainilla
FROM cont_contenido
    ,juegos_config
    ,cont_publicacion
WHERE cont_contenido.id_instalacion = 2
  AND cont_contenido.id_sbc = 619
  AND cont_contenido.id_tipo = 2
  AND cont_contenido.id_instalacion = juegos_config.id_instalacion
  AND cont_contenido.id_contenido = juegos_config.id_contenido
  AND upper(cont_publicacion.generar_Vainilla) = 'S'
  AND cont_publicacion.id_instalacion = cont_contenido.id_instalacion
  AND cont_publicacion.id_contenido = cont_contenido.id_contenido
  AND cont_publicacion.fecha_publicacion = (SELECT cp1.fecha_publicacion
              FROM cont_publicacion cp1
              WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
                AND cp1.id_contenido = cont_publicacion.id_contenido
                AND cp1.generar_vainilla = cont_publicacion.generar_vainilla
              ORDER BY fecha_publicacion LIMIT 1)
ORDER BY cont_publicacion.fecha_publicacion desc
LIMIT 10 OFFSET 0;

QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=9.75..9.76 rows=1 width=479) (actual time=465085.25..465085.27 rows=8 loops=1)
   ->  Sort  (cost=9.75..9.76 rows=1 width=479) (actual time=465085.23..465085.24 rows=8 loops=1)
         Sort Key: cont_publicacion.fecha_publicacion
         ->  Merge Join  (cost=9.73..9.74 rows=1 width=479) (actual time=210743.83..465083.31 rows=8 loops=1)
               Merge Cond: (("outer".id_instalacion = "inner".id_instalacion) AND ("outer".id_contenido = "inner".id_contenido))
               ->  Nested Loop  (cost=0.00..284756.79 rows=1 width=367) (actual time=8319.87..464981.68 rows=40 loops=1)
                     Join Filter: (("inner".id_contenido = "outer".id_contenido) AND ("inner".id_instalacion = "outer".id_instalacion))
                     ->  Index Scan using jue_conf_pk on juegos_config  (cost=0.00..12.19 rows=40 width=332) (actual time=52.93..142.31 rows=40 loops=1)
                     ->  Seq Scan on cont_publicacion  (cost=0.00..7118.60 rows=1 width=35) (actual time=51.79..11617.12 rows=97 loops=40)
                           Filter: ((upper((generar_vainilla)::text) = 'S'::text) AND (fecha_publicacion = (subplan)))
                           SubPlan
                             ->  Limit  (cost=15.85..15.85 rows=1 width=8) (actual time=25.86..25.86 rows=1 loops=17880)
                                   ->  Sort  (cost=15.85..15.86 rows=1 width=8) (actual time=25.82..25.82 rows=2 loops=17880)
                                         Sort Key: fecha_publicacion
                                         ->  Seq Scan on cont_publicacion cp1  (cost=0.00..15.84 rows=1 width=8) (actual time=10.68..25.32 rows=7 loops=17880)
                                               Filter: ((id_instalacion = $0) AND (id_contenido = $1) AND (generar_vainilla = $2))
               ->  Sort  (cost=9.73..9.74 rows=3 width=112) (actual time=94.91..94.93 rows=8 loops=1)
                     Sort Key: cont_contenido.id_instalacion, cont_contenido.id_contenido
                     ->  Seq Scan on cont_contenido  (cost=0.00..9.70 rows=3 width=112) (actual time=21.70..92.96 rows=8 loops=1)
                           Filter: ((id_instalacion = 2::numeric) AND (id_sbc = 619::numeric) AND (id_tipo = 2::numeric))
 Total runtime: 465088.66 msec
(21 rows)

-----Original Message-----
From: Christopher Browne [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 1, 2003 18:27
To: Fernando Papa
CC: [EMAIL PROTECTED]
Subject: Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec

I'd point at the following as being a sterling candidate for being a cause of this being slow...

AND cont_publicacion.fecha_publicacion = (SELECT max(cp1.fecha_publicacion)
    FROM cont_publicacion cp1
    WHERE cp1.id_instalacion = cont_publicacion.id_instalacion
    AND cp1.id_contenido = cont_publicacion.id_contenido
    AND cp1.generar_vainilla = cont_publicacion.generar_vainilla)

May I suggest changing it to:

AND cont_publicacion.fecha_p
Re: [PERFORM] I can't wait too much: Total runtime 432478.44 msec
On Fri, 1 Aug 2003 18:17:17 -0300, "Fernando Papa" <[EMAIL PROTECTED]> wrote: > AND cont_publicacion.fecha_publicacion = (SELECT >max(cp1.fecha_publicacion) > FROM cont_publicacion cp1 > WHERE cp1.id_instalacion = >cont_publicacion.id_instalacion >AND cp1.id_contenido = cont_publicacion.id_contenido > >AND cp1.generar_vainilla = >cont_publicacion.generar_vainilla) If certain uniqueness conditions are met, the Postgres specific DISTINCT ON clause could help totally eliminating the subselect: SELECT DISTINCT ON ( cp.id_instalacion, cp.id_contenido, cp.generar_vainilla, cp.fecha_publicacion ) cc.id_contenido ,cc.pertenece_premium ,cc.Titulo_esp as v_sufix ,cc.url_contenido ,cc.tipo_acceso ,cc.id_sbc ,cc.cant_vistos ,cc.cant_votos ,cc.puntaje_total ,cc.id_contenido_padre ,jc.imagen_tapa_especial ,jc.info_general_esp as info_general ,jc.ayuda ,jc.tips_tricks_esp as tips_tricks ,jc.mod_imagen_tapa_especial ,cp.fecha_publicacion as fecha_publicacion ,cp.generar_Vainilla FROM cont_contenido cc ,juegos_config jc ,cont_publicacion cp WHERE cc.id_instalacion= 2 AND cc.id_sbc = 619 AND cc.id_tipo = 2 AND cc.id_instalacion = jc.id_instalacion AND cc.id_contenido = jc.id_contenido AND upper(cp.generar_Vainilla) = 'S' AND cp.id_instalacion = cc.id_instalacion AND cp.id_contenido = cc.id_contenido ORDER BY cp.id_instalacion, cp.id_contenido, cp.generar_vainilla, cp.fecha_publicacion desc However, this doesn't get the result in the original order, so you have to wrap another SELECT ... ORDER BY ... LIMIT around it. 
Or try to move the subselect into the FROM clause: SELECT cc.id_contenido ,cc.pertenece_premium ,cc.Titulo_esp as v_sufix ,cc.url_contenido ,cc.tipo_acceso ,cc.id_sbc ,cc.cant_vistos ,cc.cant_votos ,cc.puntaje_total ,cc.id_contenido_padre ,jc.imagen_tapa_especial ,jc.info_general_esp as info_general ,jc.ayuda ,jc.tips_tricks_esp as tips_tricks ,jc.mod_imagen_tapa_especial ,cp.fecha_publicacion as fecha_publicacion ,cp.generar_Vainilla FROM cont_contenido cc ,juegos_config jc ,(SELECT DISTINCT ON ( id_instalacion, id_contenido, generar_vainilla, fecha_publicacion ) * FROM cont_publicacion ORDER BY id_instalacion, id_contenido, generar_vainilla, fecha_publicacion desc ) AS cp WHERE cc.id_instalacion= 2 AND cc.id_sbc = 619 AND cc.id_tipo = 2 AND cc.id_instalacion = jc.id_instalacion AND cc.id_contenido = jc.id_contenido AND upper(cp.generar_Vainilla) = 'S' AND cp.id_instalacion = cc.id_instalacion AND cp.id_contenido = cc.id_contenido ORDER BY cp.fecha_publicacion desc LIMIT 10 OFFSET 0 [completely untested] Servus Manfred ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
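A way to see why the DISTINCT ON form can replace the correlated subselect: with ORDER BY on the three grouping columns followed by fecha_publicacion desc, Postgres emits only the first row of each group, i.e. the one with the latest publication date. A Python sketch of that per-group semantics (hypothetical row dicts, not the real schema):

```python
def distinct_on_latest(rows, group_keys, order_key):
    """Emulate SELECT DISTINCT ON (group_keys) ...
       ORDER BY group_keys, order_key DESC:
    keep, for each group, the row with the greatest order_key, then
    (like the suggested outer wrapper) sort the survivors by order_key
    descending."""
    best = {}
    for row in rows:
        k = tuple(row[g] for g in group_keys)
        if k not in best or row[order_key] > best[k][order_key]:
            best[k] = row
    return sorted(best.values(), key=lambda r: r[order_key], reverse=True)
```

One pass over the table replaces one correlated max() subselect per candidate row, which is exactly the cost profile the EXPLAIN output elsewhere in the thread shows (the 17880-loop SubPlan disappearing).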
Re: [PERFORM] OSDL Database Test Suite 3 is available on PostgreSQL
On 01 Aug 2003 11:04:10 -0700, Jenny Zhang <[EMAIL PROTECTED]> wrote: >A sample OSDL-DBT3 test result report can be found at: >http://khack.osdl.org/stp/276912/ > >Your comments are welcome, | effective_cache_size | 1000 With 4GB of memory this is definitely too low and *can* (note that I don't say *must*) lead the planner to wrong decisions. | shared_buffers | 15200 ... looks reasonable. Did you test with other values? | sort_mem | 524288 This is a bit high, IMHO, but might be ok given that DBT3 is not run with many concurrent sessions (right?). http://khack.osdl.org/stp/276912/results/plot/vmstat_swap.png shows some swapping activity towards the end of the run which could be caused by a too high sort_mem setting. Servus Manfred ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
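For reference, effective_cache_size is expressed in disk blocks (8 kB by default), so the value of 393216 mentioned elsewhere in this thread corresponds to about 3 GB of expected OS cache on the 4 GB box. A quick sanity check in Python (assuming the default 8192-byte block size):

```python
def effective_cache_size_pages(cache_bytes, block_size=8192):
    """Convert an expected OS-cache size in bytes into the page count
    that effective_cache_size expects (8 kB blocks by default)."""
    return cache_bytes // block_size

# 3 GB of expected OS cache on the 4 GB box:
pages = effective_cache_size_pages(3 * 1024**3)
```

As the thread notes, this is a rubber-mallet setting: being off by a few hundred megabytes either way matters far less than leaving it at the tiny default of 1000 blocks (8 MB).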