Re: [PERFORM] select count(*) performance
On Aug 11, 5:54 pm, Detlef Rudolph [EMAIL PROTECTED] wrote: Hello Group, I've tried the VACUUM ANALYSE, that doesn't help much, but VACUUM FULL improves performance down from about 40 secs to 8. I think in future I would use the reltuples value from pg_class for the table. Thanks a lot for your answers and a good Sunday, Det just do not forget that reltuples is counted and updated in pg_class only when a table is vacuumed or analyzed... so the value is only APPROXIMATE -- Valentine ---(end of broadcast)--- TIP 6: explain analyze is your friend
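The reltuples shortcut mentioned above can be illustrated with a quick catalog query; the table name here is just a placeholder:

```sql
-- Fast, approximate row count taken from the planner statistics.
-- reltuples is refreshed only by VACUUM or ANALYZE, so treat the
-- result as an estimate, never as an exact count.
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'mytable';
```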
Re: [PERFORM] Dell Hardware Recommendations
On Aug 10, 2007, at 4:36 PM, Merlin Moncure wrote: I'm not so sure I agree. They are using LSI firmware now (and so is everyone else). The servers are well built (highly subjective, I admit) and configurable. I have had some bad experiences with IBM gear (Adaptec controller though), and white box parts, 3ware, etc. I can tell you that Dell got us the storage and the server in record time. I do agree on Adaptec, however. Ok, perhaps you got luckier... I have two PowerVault 220 rack mounts with U320 SCSI drives in them. With an LSI 320-2X controller, it *refuses* to recognize some of the drives (channel 1 on either array). Dell blames LSI, LSI blames Dell's backplane. This is consistent across multiple controllers we tried, and two different Dell disk arrays. Dropping the SCSI speed to 160 is the only way to make them work. I tend to believe LSI here. The Adaptec 2230SLP controller recognizes the arrays fine, but tends to drop devices at inopportune moments. Re-seating dropped devices starts a rebuild, but the speed is recognized as 1 and the rebuild takes two lifetimes to complete unless you insert a reboot of the system in there. Totally unacceptable. Again, dropping the SCSI rate to 160 seems to make it more stable. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to ENABLE SQL capturing???
On 8/10/07, Joshua D. Drake [EMAIL PROTECTED] wrote: smiley2211 wrote: Jeff, You are CORRECT...my queries were going to /var/log/messages...had to get the Linux Admin to grant me READ access to the file... You may want to actually get that to stop. Syslog is a notorious performance bottleneck for postgresql. Can you elaborate? The only reference to this I could find was a thread from 2004 where someone wasn't rotating his logs. -Jonathan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Help optimize view
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Friday, August 10, 2007 5:44 PM To: Relyea, Mike Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Help optimize view Try increasing join_collapse_limit --- you have just enough tables here that the planner isn't going to consider all possible join orders. And it sorta looks like it's picking a bad one. regards, tom lane I tried increasing join_collapse_limit with no significant change in run time, although a different plan was chosen. I've included a re-send of my original post; it looks like it didn't go through - it's not in the archives. I've also included an explain analyze before and after the join_collapse_limit change. I have the following view as part of a larger, aggregate query that is running slower than I'd like. There are 4 views total, each very similar to this one. Each of the views is then left joined with data from some other tables to give me the final result that I'm looking for. I'm hoping that if I can get some insight into how to make this view execute faster, I can apply that learning to the other 3 views and thereby decrease the run time for my aggregate query. I'm running 8.2.4 on Windows XP with a single 10K rpm disk dedicated to the data directory and 1.5 GB memory.
shared_buffers = 12288
work_mem = 262144
maintenance_work_mem = 131072
max_fsm_pages = 204800
random_page_cost = 2.0
effective_cache_size = 10000
autovacuum = on

EXPLAIN ANALYZE SELECT PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID, avg(ParameterValues.ParameterValue) AS Mottle_NMF FROM AnalysisModules JOIN (tblColors JOIN (tblTPNamesAndColors JOIN PrintSamples ON tblTPNamesAndColors.TestPatternName::text = PrintSamples.TestPatternName::text JOIN (DigitalImages JOIN PrintSampleAnalyses ON DigitalImages.ImageID = PrintSampleAnalyses.ImageID JOIN (ParameterNames JOIN (Measurements JOIN ParameterValues ON Measurements.MeasurementID = ParameterValues.MeasurementID) ON ParameterNames.ParameterID = ParameterValues.ParameterID) ON PrintSampleAnalyses.psaID = Measurements.psaID) ON PrintSamples.PrintSampleID = DigitalImages.PrintSampleID) ON tblColors.ColorID = tblTPNamesAndColors.ColorID) ON AnalysisModules.MetricID = Measurements.MetricID GROUP BY PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID, AnalysisModules.AnalysisModuleName, ParameterNames.ParameterName, PrintSamples.TestPatternName HAVING AnalysisModules.AnalysisModuleName::text = 'NMF'::text AND ParameterNames.ParameterName::text = 'NMF'::text AND tblColors.ColorID 3 AND PrintSamples.TestPatternName::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=519801.96..519898.00 rows=7683 width=70) (actual time=121101.027..121146.385 rows=14853 loops=1) - Hash Join (cost=286101.76..519667.51 rows=7683 width=70) (actual time=52752.600..120989.713 rows=15123 loops=1) Hash Cond: (Measurements.MetricID = AnalysisModules.MetricID) - Hash Join (cost=286099.98..519260.45 rows=87588 width=61) (actual time=52752.502..120933.784 rows=15123 loops=1) Hash Cond: (ParameterValues.MeasurementID = Measurements.MeasurementID) - Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=165.510..67811.086 rows=289724 loops=1) - Seq Scan on ParameterNames (cost=0.00..1.94 rows=1 width=17) 
(actual time=0.012..0.026 rows=1 loops=1) Filter: ((ParameterName)::text = 'NMF'::text) - Bitmap Heap Scan on ParameterValues (cost=8054.81..231033.70 rows=608089 width=12) (actual time=165.481..67094.656 rows=289724 loops=1) Recheck Cond: (ParameterNames.ParameterID = ParameterValues.ParameterID) - Bitmap Index Scan on PVParameterID_idx (cost=0.00..7902.79 rows=608089 width=0) (actual time=141.013..141.013 rows=289724 loops=1) Index Cond: (ParameterNames.ParameterID = ParameterValues.ParameterID) - Hash (cost=259861.12..259861.12 rows=1454724 width=48) (actual time=52573.270..52573.270 rows=961097 loops=1) - Hash Join (cost=8139.75..259861.12 rows=1454724 width=48) (actual time=1399.575..50896.641 rows=961097 loops=1) Hash Cond: (Measurements.psaID = PrintSampleAnalyses.psaID) - Seq Scan on Measurements (cost=0.00..199469.09 rows=7541009 width=12) (actual time=6.697..37199.702 rows=7539838 loops=1) - Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=1392.743..1392.743 rows=18901 loops=1) - Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=986.589..1358.908 rows=18901 loops=1) Hash Cond: (PrintSampleAnalyses.ImageID = DigitalImages.ImageID) - Seq Scan on PrintSampleAnalyses (cost=0.00..2334.25 rows=78825 width=8) (actual time=13.747..158.867 rows=78859 loops=1) - Hash (cost=4879.10..4879.10 rows=15211
Re: [PERFORM] How to ENABLE SQL capturing???
Jonathan Ellis wrote: On 8/10/07, Joshua D. Drake [EMAIL PROTECTED] wrote: smiley2211 wrote: Jeff, You are CORRECT...my queries were going to /var/log/messages...had to get the Linux Admin to grant me READ access to the file... You may want to actually get that to stop. Syslog is a notorious performance bottleneck for postgresql. Can you elaborate? The only reference to this I could find was a thread from 2004 where someone wasn't rotating his logs. I am not sure what to elaborate on :). Syslog is slow, logging to file isn't. Although both will certainly slow down your installation quite a bit, syslog will slow it down more. If I recall correctly, it is because syslog is blocking. Joshua D. Drake -Jonathan -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
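If syslog is the bottleneck, the 8.2-era alternative Joshua alludes to is the built-in log collector; a sketch of the relevant postgresql.conf settings (values are illustrative, not a recommendation):

```
log_destination = 'stderr'      # stop sending log lines to syslog
redirect_stderr = on            # the 8.x name; renamed logging_collector in 8.3
log_directory = 'pg_log'        # relative to the data directory
log_filename = 'postgresql-%Y-%m-%d.log'
```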
Re: [PERFORM] How to ENABLE SQL capturing???
Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: If I recall correctly, it is because syslog is blocking. Are you sure it isn't just that syslog fsyncs its log files after every log message? Nope I am not sure at all ;). Darcy actually found the issue and can speak better to it, I never use syslog and have always logged direct to file. I don't think the individual syslogs are synchronous but if syslog falls behind the buffer will fill and throttle the sender. If your Postgres data is on the same device as the syslogs those fsyncs will probably cause a big slowdown directly on Postgres's I/O as well. You can turn off the fsyncs in syslog by putting a - before the filename. -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] How to ENABLE SQL capturing???
Joshua D. Drake [EMAIL PROTECTED] writes: If I recall correctly, it is because syslog is blocking. Are you sure it isn't just that syslog fsyncs its log files after every log message? I don't think the individual syslogs are synchronous but if syslog falls behind the buffer will fill and throttle the sender. If your Postgres data is on the same device as the syslogs those fsyncs will probably cause a big slowdown directly on Postgres's I/O as well. You can turn off the fsyncs in syslog by putting a - before the filename. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
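Gregory's "- before the filename" trick is a classic syslogd feature; a sketch of what the /etc/syslog.conf entry might look like (the facility and file path here are examples):

```
# The leading "-" tells traditional syslogd not to sync the file
# after every message, removing the per-message fsync overhead.
local0.*    -/var/log/postgresql.log
```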
Re: [PERFORM] Help optimize view
On Mon, Aug 13, 2007 at 10:35 AM, in message [EMAIL PROTECTED], Relyea, Mike [EMAIL PROTECTED] wrote: I'm running 8.2.4 on Windows XP with 1.5 GB memory. shared_buffers = 12288 effective_cache_size = 10000 For starters, you might want to adjust one or both of these. It looks to me like you're telling it that it only has 78.125 MB cache space. That will make it tend to want to scan entire tables, on the assumption that the cache hit ratio will be poor for random reads. Since you're on 8.2.4, you can use units of measure to help make this easier to read. You could, for example, say: shared_buffers = 96MB effective_cache_size = 1200MB -Kevin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Help optimize view
On Mon, Aug 13, 2007 at 10:35 AM, in message [EMAIL PROTECTED] .net, Relyea, Mike [EMAIL PROTECTED] wrote: I'm running 8.2.4 on Windows XP with 1.5 GB memory. shared_buffers = 12288 effective_cache_size = 1 For starters, you might want to adjust one or both of these. It looks to me like you're telling it that it only has 78.125 MB cache space. That will make it tend to want to scan entire tables, on the assumption that the cache hit ratio will be poor for random reads. Since you're on 8.2.4, you can use units of measure to help make this easier to read. You could, for example, say: shared_buffers = 96MB effective_cache_size = 1200MB -Kevin I've increased shared_buffers to 128MB, and restarted the server. My total run time didn't really change. SELECT set_config('effective_cache_size', '1000MB', false); I have another app that uses about 500MB. SELECT set_config('join_collapse_limit', '20', false); explain analyze SELECT PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID, avg(ParameterValues.ParameterValue) AS Mottle_NMF FROM AnalysisModules JOIN (tblColors JOIN (tblTPNamesAndColors JOIN PrintSamples ON tblTPNamesAndColors.TestPatternName::text = PrintSamples.TestPatternName::text JOIN (DigitalImages JOIN PrintSampleAnalyses ON DigitalImages.ImageID = PrintSampleAnalyses.ImageID JOIN (ParameterNames JOIN (Measurements JOIN ParameterValues ON Measurements.MeasurementID = ParameterValues.MeasurementID) ON ParameterNames.ParameterID = ParameterValues.ParameterID) ON PrintSampleAnalyses.psaID = Measurements.psaID) ON PrintSamples.PrintSampleID = DigitalImages.PrintSampleID) ON tblColors.ColorID = tblTPNamesAndColors.ColorID) ON AnalysisModules.MetricID = Measurements.MetricID GROUP BY PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID, AnalysisModules.AnalysisModuleName, ParameterNames.ParameterName, PrintSamples.TestPatternName HAVING AnalysisModules.AnalysisModuleName::text = 'NMF'::text AND ParameterNames.ParameterName::text = 
'NMF'::text AND tblColors.ColorID 3 AND PrintSamples.TestPatternName::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=489274.71..489372.94 rows=7858 width=70) (actual time=117632.844..117663.228 rows=14853 loops=1) - Hash Join (cost=256774.03..489137.20 rows=7858 width=70) (actual time=50297.022..117530.665 rows=15123 loops=1) Hash Cond: (ParameterValues.MeasurementID = Measurements.MeasurementID) - Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=172.341..66959.288 rows=289724 loops=1) - Seq Scan on ParameterNames (cost=0.00..1.94 rows=1 width=17) (actual time=0.020..0.034 rows=1 loops=1) Filter: ((ParameterName)::text = 'NMF'::text) - Bitmap Heap Scan on ParameterValues (cost=8054.81..231033.70 rows=608089 width=12) (actual time=172.297..66241.380 rows=289724 loops=1) Recheck Cond: (ParameterNames.ParameterID = ParameterValues.ParameterID) - Bitmap Index Scan on PVParameterID_idx (cost=0.00..7902.79 rows=608089 width=0) (actual time=147.690..147.690 rows=289724 loops=1) Index Cond: (ParameterNames.ParameterID = ParameterValues.ParameterID) - Hash (cost=247087.84..247087.84 rows=130510 width=57) (actual time=50109.022..50109.022 rows=15123 loops=1) - Hash Join (cost=8141.52..247087.84 rows=130510 width=57) (actual time=11095.022..50057.777 rows=15123 loops=1) Hash Cond: (Measurements.psaID = PrintSampleAnalyses.psaID) - Hash Join (cost=1.77..234364.57 rows=661492 width=21) (actual time=31.457..48123.380 rows=289724 loops=1) Hash Cond: (Measurements.MetricID = AnalysisModules.MetricID) - Seq Scan on Measurements (cost=0.00..199469.09 rows=7541009 width=12) (actual time=10.920..37814.792 rows=7539838 loops=1) - Hash (cost=1.71..1.71 rows=5 width=17) (actual time=0.066..0.066 rows=5 loops=1) - Seq Scan on AnalysisModules (cost=0.00..1.71 rows=5 width=17) (actual time=0.032..0.049 rows=5 loops=1) Filter: ((AnalysisModuleName)::text = 'NMF'::text) - Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=1424.025..1424.025 rows=18901 
loops=1) - Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=1007.901..1387.787 rows=18901 loops=1) Hash Cond: (PrintSampleAnalyses.ImageID = DigitalImages.ImageID) - Seq Scan on PrintSampleAnalyses (cost=0.00..2334.25 rows=78825 width=8) (actual time=4.432..153.090 rows=78859 loops=1) - Hash (cost=4879.10..4879.10 rows=15211 width=44) (actual time=1003.424..1003.424 rows=18901 loops=1) - Hash Join (cost=2220.11..4879.10 rows=15211 width=44) (actual time=348.841..968.194 rows=18901 loops=1) Hash Cond: (DigitalImages.PrintSampleID = PrintSamples.PrintSampleID) - Seq Scan on DigitalImages
[PERFORM] Proposal: Pluggable Optimizer Interface
Hi All, Tomas Kovarik and I have presented at PGCon 2007 in Ottawa the ideas about other possible optimizer algorithms to be used in PostgreSQL. We are quite new to the PostgreSQL project, so it took us some time to go through the sources and explore the possibilities of how things could be implemented. There is a proposal attached to this mail about the interface we would like to implement for switching between different optimizers. Please review it and provide feedback to us. Thank You. Regards Julius Stroffek

Proposal for Pluggable Optimizer Interface ==

Overview

We have presented at PGCon 2007 in Ottawa a couple of other approaches and algorithms that can be used for query optimization, see http://www.pgcon.org/2008/papers/Execution_Plan_Optimization_Techniques_Julius_Stroffek.pdf We have focused on algorithms for searching the space of all possible orders of joins, including bushy trees. The possible algorithms include
* Dynamic Programming (already implemented in PostgreSQL)
* Genetic Algorithm (already implemented in PostgreSQL)
* Dijkstra's Algorithm
* A* Search Algorithm
* Greedy "Nearest Neighbor" Algorithm
* Hill-Climbing
* Simulated Annealing
* Iterative Improvement
* Two Phase Optimization
* Toured Simulated Annealing

Choosing the best algorithm from the above list is difficult. We have to consider the length of the optimizer computation vs. the quality of the solution which we would like to achieve. We may want to do some hybrid optimization: run a couple of the algorithms from the above and choose the best solution found. We might run some very fast algorithm at the beginning and, depending on the solution cost, decide whether it is worthwhile to try to optimize the plan even more (using another algorithm with a longer running time but producing better solutions). Therefore we would like to propose an interface which can be used to switch between different optimizer algorithms and/or allow a user to write and use his own implementation. 
It would allow the community to ship more optimizer algorithms as contrib modules and users may then decide which of those algorithms should be used for their queries.

Creating an optimizer =

We would propose to create a catalog holding the available optimizers in the system called pg_optimizer. Users could then use a SET command to switch between different optimizers.

postgres=# select * from pg_optimizer;
 optname |      optproc
---------+-------------------
 geqo    | geqo_optimizer
 dynamic | dynamic_optimizer
 greedy  | greedy_optimizer
(3 rows)

postgres=# set optimizer=greedy;
SET

Optimizer Invocation Point ==

There is code in the function make_rel_from_joinlist which decides whether to invoke dynamic programming or the genetic algorithm for query optimization. We would propose to place the invocation of the plugged optimizer at the same place and with the same parameters as the functions geqo and make_one_rel_by_joins are currently invoked.

Creating and dropping an optimizer ==

The optimizer function has to be implemented as a C-Language Function using "Version 1 Calling Conventions". The return type of the function is RelOptInfo * and the arguments passed to the function are
1. PlannerInfo *root
2. int levels_needed
3. List *initial_rels

The proper "CREATE FUNCTION" statement has to be used to create the optimizer function. CREATE FUNCTION greedyoptimizer(internal, int, internal) RETURNS internal AS 'mylib', 'greedy_optimizer' LANGUAGE C; Once the optimizer function is created, the user may create an optimizer using the function with the statement CREATE OPTIMIZER greedy ( function = greedyoptimizer, comment = 'Greedy Nearest Neighbor Optimizer' ); If the user decides not to use the optimizer anymore he can invoke DROP OPTIMIZER greedy; The user also has to drop the optimizer function with DROP FUNCTION greedyoptimizer;

Project TODO List =

1. Create a pg_optimizer catalog to hold available optimizers. 
2. Create wrappers above the current dynamic programming and genetic algorithm optimizers to be used to call those implementations.
3. Modify the parser and add the functions to handle and execute the CREATE/DROP OPTIMIZER statements.
4. Modify the GUC machinery so that it is possible to switch between optimizers.
5. Change the code at the optimizer invocation point so that the appropriate optimizer function is called.
6. Handle object dependencies: make an entry in pg_depend so that an optimizer depends on its function.
7. Implement the '\dO' command that will list the available optimizers.
8. Create a contrib module and ship some other optimizer algorithms.
9. Any other suggestions, comments and changes that come out of the review of this proposal.

Things to Decide
1. Rights. Who can create/drop optimizers? Who can use
Re: [PERFORM] Help optimize view
On Mon, Aug 13, 2007 at 1:48 PM, in message [EMAIL PROTECTED], Relyea, Mike [EMAIL PROTECTED] wrote: I've increased shared_buffers to 128MB, and restarted the server. My total run time didn't really change. Please forgive me if this guess doesn't help either, but could you try eliminating the GROUP BY options which don't echo values in the select value list, and move the HAVING conditions to a WHERE clause? Something like: explain analyze SELECT PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID, avg(ParameterValues.ParameterValue) AS Mottle_NMF FROM AnalysisModules JOIN ( tblColors JOIN ( tblTPNamesAndColors JOIN PrintSamples ON (tblTPNamesAndColors.TestPatternName::text = PrintSamples.TestPatternName::text) JOIN ( DigitalImages JOIN PrintSampleAnalyses ON (DigitalImages.ImageID = PrintSampleAnalyses.ImageID) JOIN ( ParameterNames JOIN ( Measurements JOIN ParameterValues ON Measurements.MeasurementID = ParameterValues.MeasurementID ) ON ParameterNames.ParameterID = ParameterValues.ParameterID ) ON PrintSampleAnalyses.psaID = Measurements.psaID ) ON PrintSamples.PrintSampleID = DigitalImages.PrintSampleID ) ON tblColors.ColorID = tblTPNamesAndColors.ColorID ) ON AnalysisModules.MetricID = Measurements.MetricID WHERE AnalysisModules.AnalysisModuleName::text = 'NMF'::text AND ParameterNames.ParameterName::text = 'NMF'::text AND PrintSamples.TestPatternName::text ~~ 'IQAF-TP8%'::text AND tblColors.ColorID 3 GROUP BY PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID ; I'd also be inclined to simplify the FROM clause by eliminating the parentheses and putting the ON conditions closer to where they are used, but that would be more for readability than any expectation that it would affect the plan. -Kevin ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Help optimize view
Relyea, Mike [EMAIL PROTECTED] writes: I've increased shared_buffers to 128MB, and restarted the server. My total run time didn't really change. It doesn't look like you can hope for much in terms of improving the plan. The bulk of the time is going into scanning ParameterValues and Measurements, but AFAICS there is no way for the query to pull fewer rows from those tables than it is doing, and the size of the join means that a nestloop indexscan is likely to suck. (You could try forcing one by setting enable_hashjoin and enable_mergejoin to OFF, but I don't have much hope for that.) If you haven't played with work_mem yet, increasing that might make the hash joins go a bit faster --- but it looks like most of the time is going into the raw relation scans, so there's not going to be a lot of win to be had there either. Basically, joining lots of rows like this takes awhile. If you have to have a faster answer, I can only suggest rethinking your table design. Sometimes denormalization of the schema is necessary for performance. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Julius Stroffek wrote: Hi All, Tomas Kovarik and I have presented at PGCon 2007 in Ottawa the ideas about other possible optimizer algorithms to be used in PostgreSQL. We are quite new to PostgreSQL project so it took us some time to go through the sources end explore the possibilities how things could be implemented. There is a proposal attached to this mail about the interface we would like to implement for switching between different optimizers. Please review it and provide a feedback to us. Thank You. hmm - how is that proposal different from what got implemented with: http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Stable function optimisation
Hello! Here's my test database: # table CREATE TABLE public.t ( id integer NOT NULL, a integer NOT NULL, CONSTRAINT pk_t PRIMARY KEY (id) ) CREATE INDEX idx_t_a ON public.t USING btree (a); # function CREATE OR REPLACE FUNCTION public.f() RETURNS integer AS $BODY$BEGIN RETURN 1; END$BODY$ LANGUAGE 'plpgsql' STABLE; # view CREATE OR REPLACE VIEW public.v AS SELECT t.id, t.a FROM public.t WHERE public.f() = t.a; # f() is stable test=# explain analyze select * from public.v; QUERY PLAN Seq Scan on t (cost=0.00..1991.00 rows=51200 width=8) (actual time=0.060..458.476 rows=50003 loops=1) Filter: (f() = a) Total runtime: 626.341 ms (3 rows) # changing f() to immutable test=# explain analyze select * from public.v; QUERY PLAN Seq Scan on t (cost=0.00..1741.00 rows=51200 width=8) (actual time=0.165..199.215 rows=50003 loops=1) Filter: (1 = a) Total runtime: 360.819 ms (3 rows) # changing f() to volatile test=# explain analyze select * from public.v; QUERY PLAN Seq Scan on t (cost=0.00..1991.00 rows=5 width=8) (actual time=0.217..560.426 rows=50003 loops=1) Filter: (f() = a) Total runtime: 732.655 ms (3 rows) The biggest question here is: Why is the runtime of the query with the stable function not near the runtime of the immutable function? It's definitely one query and the manual states that a stable function does not change in one statement and therefore can be optimised. Is this a pg problem or did I do something wrong? Thank you for your help! Philipp ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Help optimize view
On Mon, Aug 13, 2007 at 1:48 PM, in message [EMAIL PROTECTED] .net, Relyea, Mike [EMAIL PROTECTED] wrote: I've increased shared_buffers to 128MB, and restarted the server. My total run time didn't really change. Please forgive me if this guess doesn't help either, but could you try eliminating the GROUP BY options which don't echo values in the select value list, and move the HAVING conditions to a WHERE clause? Something like: explain analyze SELECT PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID, avg(ParameterValues.ParameterValue) AS Mottle_NMF FROM AnalysisModules JOIN ( tblColors JOIN ( tblTPNamesAndColors JOIN PrintSamples ON (tblTPNamesAndColors.TestPatternName::text = PrintSamples.TestPatternName::text) JOIN ( DigitalImages JOIN PrintSampleAnalyses ON (DigitalImages.ImageID = PrintSampleAnalyses.ImageID) JOIN ( ParameterNames JOIN ( Measurements JOIN ParameterValues ON Measurements.MeasurementID = ParameterValues.MeasurementID ) ON ParameterNames.ParameterID = ParameterValues.ParameterID ) ON PrintSampleAnalyses.psaID = Measurements.psaID ) ON PrintSamples.PrintSampleID = DigitalImages.PrintSampleID ) ON tblColors.ColorID = tblTPNamesAndColors.ColorID ) ON AnalysisModules.MetricID = Measurements.MetricID WHERE AnalysisModules.AnalysisModuleName::text = 'NMF'::text AND ParameterNames.ParameterName::text = 'NMF'::text AND PrintSamples.TestPatternName::text ~~ 'IQAF-TP8%'::text AND tblColors.ColorID 3 GROUP BY PrintSamples.MachineID, PrintSamples.PrintCopyID, tblColors.ColorID ; I'd also be inclined to simplify the FROM clause by eliminating the parentheses and putting the ON conditions closer to where they are used, but that would be more for readability than any expectation that it would affect the plan. -Kevin Thanks for your help. Re-writing the view like this maybe bought me something. I've pasted the explain analyze results below. Tough to tell because I also increased some of the statistics. 
From what Tom says, it sounds like if I want the data returned faster I'm likely to have to get beefier hardware. ALTER TABLE ParameterValues ALTER MeasurementID SET STATISTICS 500; ALTER TABLE ParameterValues ALTER ParameterID SET STATISTICS 500; ANALYZE ParameterValues; ALTER TABLE Measurements ALTER COLUMN MetricID SET STATISTICS 500; ALTER TABLE Measurements ALTER COLUMN psaID SET STATISTICS 500; ANALYZE Measurements; Running the above SQL: HashAggregate (cost=461541.53..461634.88 rows=7468 width=16) (actual time=110002.041..110024.777 rows=14853 loops=1) - Hash Join (cost=230789.57..461464.70 rows=7683 width=16) (actual time=56847.814..109936.722 rows=15123 loops=1) Hash Cond: (Measurements.MetricID = AnalysisModules.MetricID) - Hash Join (cost=230787.80..461057.64 rows=87588 width=20) (actual time=56847.697..109884.122 rows=15123 loops=1) Hash Cond: (ParameterValues.MeasurementID = Measurements.MeasurementID) - Nested Loop (cost=6353.15..234044.47 rows=454038 width=8) (actual time=179.154..52780.680 rows=289724 loops=1) - Seq Scan on ParameterNames (cost=0.00..1.94 rows=1 width=4) (actual time=0.012..0.027 rows=1 loops=1) Filter: ((ParameterName)::text = 'NMF'::text) - Bitmap Heap Scan on ParameterValues (cost=6353.15..228047.32 rows=479617 width=12) (actual time=179.123..52102.572 rows=289724 loops=1) Recheck Cond: (ParameterNames.ParameterID = ParameterValues.ParameterID) - Bitmap Index Scan on PVParameterID_idx (cost=0.00..6233.25 rows=479617 width=0) (actual time=152.752..152.752 rows=289724 loops=1) Index Cond: (ParameterNames.ParameterID = ParameterValues.ParameterID) - Hash (cost=206253.42..206253.42 rows=1454498 width=20) (actual time=56657.022..56657.022 rows=961097 loops=1) - Nested Loop (cost=5069.24..206253.42 rows=1454498 width=20) (actual time=932.249..55176.315 rows=961097 loops=1) - Hash Join (cost=5069.24..7949.67 rows=15206 width=16) (actual time=908.275..1257.120 rows=18901 loops=1) Hash Cond: (PrintSampleAnalyses.ImageID = 
DigitalImages.ImageID) - Seq Scan on PrintSampleAnalyses (cost=0.00..2334.25 rows=78825 width=8) (actual time=10.440..139.945 rows=78859 loops=1) - Hash (cost=4879.10..4879.10 rows=15211 width=16) (actual time=897.776..897.776 rows=18901 loops=1) - Hash Join (cost=2220.11..4879.10 rows=15211 width=16) (actual time=297.330..868.632 rows=18901 loops=1) Hash Cond: (DigitalImages.PrintSampleID = PrintSamples.PrintSampleID) - Seq Scan on DigitalImages (cost=0.00..1915.50
Re: [PERFORM] Stable function optimisation
Philipp Specht [EMAIL PROTECTED] writes: The biggest question here is: Why is the runtime of the query with the stable function not near the runtime of the immutable function? Stable functions don't get folded to constants. It's definitely one query and the manual states that a stable function does not change in one statement and therefore can be optimised. That's not the type of optimization that gets done with it. What STABLE is for is marking functions that are safe to use in index conditions. If you'd been using an indexable condition you'd have seen three different behaviors here. (I see that you do have an index on t.a, but apparently there are too many matching rows for the planner to think the index is worth using.) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
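To see the behavior Tom describes, the filter has to be selective enough for the planner to favor idx_t_a; a sketch against the test table from the earlier post (the actual plan depends on the data distribution):

```sql
-- With few rows matching, a STABLE public.f() can be evaluated once at
-- scan start and used as an index condition on idx_t_a.  A VOLATILE
-- function disqualifies the index scan entirely, while an IMMUTABLE
-- one goes further and is folded to a constant at plan time.
EXPLAIN ANALYZE SELECT * FROM public.t WHERE a = public.f();
```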
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Julius Stroffek wrote: There is a proposal attached to this mail about the interface we would like to implement for switching between different optimizers. Please review it and provide feedback to us. hmm - how is that proposal different from what got implemented with: http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php Well, it's a very different level of abstraction. The planner_hook would allow you to replace the *entire* planner, but if you only want to replace GEQO (that is, only substitute some other heuristics for partial search of a large join-order space), doing it from planner_hook will probably require duplicating a great deal of code. A hook right at the place where we currently choose geqo or regular would be a lot easier to experiment with. Replacing GEQO sounds like a fine area for investigation to me; I've always been dubious about whether it's doing a good job. But I'd prefer a simple hook function pointer designed in the same style as planner_hook (ie, intended to be overridden by a loadable module). The proposed addition of a system catalog and SQL-level management commands sounds like a great way to waste a lot of effort on mere decoration, before ever getting to the point of being able to demonstrate that there's any value in it. Also, while we might accept a small hook-function patch for 8.3, there's zero chance of any of that other stuff making it into this release cycle. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Help optimize view
On Mon, Aug 13, 2007 at 4:00 PM, in message [EMAIL PROTECTED], Relyea, Mike [EMAIL PROTECTED] wrote: Re-writing the view like this maybe bought me something. Tough to tell because I also increased some of the statistics. I don't know whether it was the finer-grained statistics or the simplification, but it bought you a new plan. I don't know if the seven second improvement is real or within the run-to-run variation, though; it could be because you happened to be better-cached at the time. From what Tom says, it sounds like if I want the data returned faster I'm likely to have to get beefier hardware. That's not what he suggested. If you introduce redundancy in a controlled fashion, you could have a single table with an index to more quickly get you to the desired set of data. That can be maintained on an ongoing basis (possibly using triggers) or could be materialized periodically or prior to running a series of reports or queries. Such redundancies violate the normalization rules which are generally used in database design, but some denormalization is often needed for acceptable performance. -Kevin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
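Kevin's suggestion of controlled redundancy can be sketched as follows. The table and column names are taken from the EXPLAIN output earlier in the thread, but the exact column set is hypothetical; this is an illustration of the technique, not the poster's schema:

```sql
-- Materialize the expensive join into one flat, indexed table.
CREATE TABLE MeasurementSummary AS
SELECT pv.MeasurementID,
       pn.ParameterName,
       m.MetricID
FROM ParameterValues pv
JOIN ParameterNames pn ON pn.ParameterID = pv.ParameterID
JOIN Measurements   m  ON m.MeasurementID = pv.MeasurementID;

-- An index on the search column gets straight to the desired rows.
CREATE INDEX MeasurementSummary_name_idx
    ON MeasurementSummary (ParameterName);

-- Refresh periodically, or just before a reporting run:
-- TRUNCATE MeasurementSummary;
-- INSERT INTO MeasurementSummary SELECT ... ;
```

As Kevin says, the alternative is to keep the summary current with triggers on the base tables; either way the redundancy is deliberate and maintained in one place.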
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Stefan, thanks for pointing this out. I missed this change. We would like to place the hooks at a different place in the planner - we would like to replace just the non-deterministic algorithm searching for the best order of joins and keep the rest of the planner untouched. I am not quite sure about the usage, from the user's point of view, of what got implemented. I read just the code of the patch. Are there more explanations somewhere else? I understood that if the user creates his own implementation of the planner, which can be stored in some external library, he has to provide some C language function as a hook activator which will assign the desired value to the planner_hook variable. Both the activator function and the new planner implementation have to be located in the same dynamic library, which will be loaded when the CREATE FUNCTION statement is used on the hook activator function. Am I correct? Have I missed something? If the above is the case then it is exactly what we wanted, except we would like to have the hook also in a different place. There are more things in the proposal, such as a new pg_optimizer catalog and a different way of configuring the hooks. However, these things are not mandatory for the functionality but are more user friendly. Thanks Julo Stefan Kaltenbrunner wrote: Julius Stroffek wrote: Hi All, Tomas Kovarik and I presented at PGCon 2007 in Ottawa the ideas about other possible optimizer algorithms to be used in PostgreSQL. We are quite new to the PostgreSQL project, so it took us some time to go through the sources and explore the possibilities of how things could be implemented. There is a proposal attached to this mail about the interface we would like to implement for switching between different optimizers. Please review it and provide feedback to us. Thank You. hmm - how is that proposal different from what got implemented with: http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php Stefan
Re: [PERFORM] Performance on writable views
On Aug 11, 2007, at 8:58 AM, Joshua D. Drake wrote: Heikki Linnakangas wrote: Enrico Weigelt wrote: I'm often using writable views as interfaces to clients, so they only see virtual objects and never have to cope with the actual storage, ie. to give some client a totally denormalized view of certain things, containing only the information required for certain kinds of operations. Now I've got the strange feeling that this makes updates slow, since it always has to run the whole view query to fetch a record to be updated (ie. to get OLD.*). There is some overhead in rewriting the query, but it shouldn't be significantly slower than issuing the statements behind the view directly. I wouldn't worry about it, unless you have concrete evidence that it's causing problems. I don't know about that; at least when using rules for partitioning, the impact can be significant in comparison to triggers. That's because you have to re-evaluate the input query for each rule that's defined, so even if you only have rules for 2 partitions in a table (which is really about the minimum you can have, at least for some period of overlap surrounding the time when you switch to a new partition), you're looking at evaluating every input query twice. In this case, the rules presumably are just simply re-directing DML, so there'd only be one rule in play at a time. That means the only real overhead is in the rewrite engine. -- Decibel!, aka Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
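The contrast drawn above can be sketched in SQL (all object names are hypothetical). A writable view with a single unconditional rule rewrites the input query once, whereas rule-based partitioning re-evaluates the input query once per conditional rule:

```sql
-- One unconditional rule: rewrite overhead only.
CREATE VIEW client_view AS
  SELECT id, name FROM storage_table;

CREATE RULE client_view_upd AS ON UPDATE TO client_view
  DO INSTEAD
  UPDATE storage_table SET name = NEW.name WHERE id = OLD.id;

-- Contrast: conditional rules for partitioning. With two partitions,
-- every INSERT into parent is evaluated once per rule.
-- CREATE RULE ins_2007_07 AS ON INSERT TO parent
--   WHERE NEW.ts <  '2007-08-01'
--   DO INSTEAD INSERT INTO part_2007_07 VALUES (NEW.*);
-- CREATE RULE ins_2007_08 AS ON INSERT TO parent
--   WHERE NEW.ts >= '2007-08-01'
--   DO INSTEAD INSERT INTO part_2007_08 VALUES (NEW.*);
```

This is why Jim concludes the writable-view case is cheap: with only one rule in play, the cost is just the rewrite itself.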
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Julius Stroffek [EMAIL PROTECTED] writes: I understood that if the user creates his own implementation of the planner, which can be stored in some external library, he has to provide some C language function as a hook activator which will assign the desired value to the planner_hook variable. Both the activator function and the new planner implementation have to be located in the same dynamic library, which will be loaded when the CREATE FUNCTION statement is used on the hook activator function. You could do it that way if you wanted, but a minimalistic solution is just to install the hook from the _PG_init function of a loadable library, and then LOAD is sufficient for a user to execute the thing. There's a small example at http://archives.postgresql.org/pgsql-patches/2007-05/msg00421.php Also, having the loadable module add a custom GUC variable would likely be a preferable solution for control purposes than making specialized functions. I attach another small hack I made recently, which simply scales all the planner's relation size estimates by a scale_factor GUC; this is handy for investigating how a plan will change with relation size, without having to actually create gigabytes of test data. There are more things in the proposal, such as a new pg_optimizer catalog and a different way of configuring the hooks. However, these things are not mandatory for the functionality but are more user friendly. Granted, but at this point we are talking about infrastructure for planner-hackers to play with, not something that's intended to be a long-term API for end users. It may or may not happen that we ever need a user API for this at all. I think a planner that just does the right thing is far preferable to one with a lot of knobs that users have to know how to twiddle, so I see this more as scaffolding on which someone can build and test the replacement for GEQO; which ultimately would go in without any user-visible API additions. 
regards, tom lane

#include "postgres.h"

#include <math.h>

#include "fmgr.h"
#include "commands/explain.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

void	_PG_init(void);
void	_PG_fini(void);

static double scale_factor = 1.0;

static void my_get_relation_info(PlannerInfo *root, Oid relationObjectId,
					 bool inhparent, RelOptInfo *rel);

/*
 * Get control during planner's get_relation_info() function, which sets up
 * a RelOptInfo struct based on the system catalog contents.  We can modify
 * the struct contents to cause the planner to work with a hypothetical
 * situation rather than what's actually in the catalogs.
 *
 * This simplistic example just scales all relation size estimates by a
 * user-settable factor.
 */
static void
my_get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
					 RelOptInfo *rel)
{
	ListCell   *ilist;

	/* Do nothing for an inheritance parent RelOptInfo */
	if (inhparent)
		return;

	rel->pages = (BlockNumber) ceil(rel->pages * scale_factor);
	rel->tuples = ceil(rel->tuples * scale_factor);

	foreach(ilist, rel->indexlist)
	{
		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ilist);

		ind->pages = (BlockNumber) ceil(ind->pages * scale_factor);
		ind->tuples = ceil(ind->tuples * scale_factor);
	}
}

/*
 * _PG_init() - library load-time initialization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_init(void)
{
	/* Get into the hooks we need to be in all the time */
	get_relation_info_hook = my_get_relation_info;

	/* Make scale_factor accessible through GUC */
	DefineCustomRealVariable("scale_factor",
							 "",
							 "",
							 &scale_factor,
							 0.0001,
							 1e9,
							 PGC_USERSET,
							 NULL,
							 NULL);
}

/*
 * _PG_fini() - library unload-time finalization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_fini(void)
{
	/* Get out of all the hooks (just to be sure) */
	get_relation_info_hook = NULL;
}

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
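For context, a session using Tom's attached module might look like the following sketch. The library name and table are assumptions (the attachment carries no filename); the LOAD-then-SET pattern is the minimalistic usage Tom describes above:

```sql
-- Assuming the module is built as scale_factor.so and installed in $libdir:
LOAD 'scale_factor';

-- Pretend every relation is 100x its catalog size and watch the plan shift,
-- without creating gigabytes of test data.
SET scale_factor = 100;
EXPLAIN SELECT * FROM some_table WHERE some_col = 42;

-- Back to the real estimates.
SET scale_factor = 1;
```

Because the hook is installed from _PG_init, no CREATE FUNCTION activator is needed; LOAD alone wires it up for the current session.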
Re: [PERFORM] Help optimize view
On Mon, Aug 13, 2007 at 4:25 PM, in message [EMAIL PROTECTED], Kevin Grittner [EMAIL PROTECTED] wrote: On Mon, Aug 13, 2007 at 4:00 PM, in message [EMAIL PROTECTED], Relyea, Mike [EMAIL PROTECTED] wrote: From what Tom says, it sounds like if I want the data returned faster I'm likely to have to get beefier hardware. That's not what he suggested. If you introduce redundancy in a controlled fashion, you could have a single table with an index to more quickly get you to the desired set of data. That can be maintained on an ongoing basis (possibly using triggers) or could be materialized periodically or prior to running a series of reports or queries. Such redundancies violate the normalization rules which are generally used in database design, but some denormalization is often needed for acceptable performance. One last thought regarding your table structure -- I noticed you were often joining on column names ending in ID and selecting using column names ending in Name, where the values for the name columns were only a few characters long. It is not always a good idea to create a meaningless ID number for a primary key if you have a meaningful value (or combination of values) which would uniquely identify a row. If you were able to use the columns in your search criteria as keys, you would have them in the Measurements table without creating any troublesome redundancy. You could then add Measurements indexes on these columns, and your query might run in under a second. The down side of meaningful keys (oft cited by proponents of surrogate keys) is that if you decide that everything with an AnalysisModuleName name of 'NMF' should now be named 'NMX', you would have to update all rows which contain the old value. To be able to do this safely and reliably, you would want to use DOMAIN definitions rigorously. If you link through meaningless ID numbers (and what would be the point of changing those?) 
you can change 'NMF' to 'NMX' in one place, and everything would reflect the new value, since it would always join to one place for those characters. -Kevin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
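Kevin's natural-key approach can be sketched as follows. The table and column names follow the thread's schema, but the column types and the second table are hypothetical; note that a FOREIGN KEY with ON UPDATE CASCADE also removes the "update all rows" objection he mentions:

```sql
-- A DOMAIN keeps the key's definition in one place, as Kevin recommends.
CREATE DOMAIN analysis_module_name AS varchar(10);

CREATE TABLE AnalysisModules (
    AnalysisModuleName analysis_module_name PRIMARY KEY
);

-- The meaningful key lives directly in Measurements, so search criteria
-- can use an index there without joining through an ID table.
CREATE TABLE Measurements (
    MeasurementID      integer PRIMARY KEY,
    AnalysisModuleName analysis_module_name
        REFERENCES AnalysisModules ON UPDATE CASCADE
);

CREATE INDEX Measurements_amn_idx ON Measurements (AnalysisModuleName);

-- Renaming 'NMF' to 'NMX' then cascades to every referencing row:
-- UPDATE AnalysisModules SET AnalysisModuleName = 'NMX'
--  WHERE AnalysisModuleName = 'NMF';
```

The trade-off remains as Kevin states it: the name is stored redundantly in every referencing row, but it is exactly the column the queries filter on.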
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Tom, Also, while we might accept a small hook-function patch for 8.3, there's zero chance of any of that other stuff making it into this release cycle. I don't think anyone was thinking about 8.3. This is pretty much 8.4 stuff; Julius is just raising it now because they don't want to go down the wrong path and waste everyone's time. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Josh Berkus [EMAIL PROTECTED] writes: Tom, Also, while we might accept a small hook-function patch for 8.3, there's zero chance of any of that other stuff making it into this release cycle. I don't think anyone was thinking about 8.3. This is pretty much 8.4 stuff; Julius is just raising it now because they don't want to go down the wrong path and waste everyone's time. Well, if they get the hook in now, then in six months or so when they have something to play with, people would be able to play with it. If not, there'll be zero uptake till after 8.4 is released... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Dell Hardware Recommendations
On 13-Aug-07, at 9:50 AM, Vivek Khera wrote: On Aug 10, 2007, at 4:36 PM, Merlin Moncure wrote: I'm not so sure I agree. They are using LSI firmware now (and so is everyone else). The servers are well built (highly subjective, I admit) and configurable. I have had some bad experiences with IBM gear (adaptec controller though), and white box parts 3ware, etc. I can tell you that dell got us the storage and the server in record time do agree on adaptec however Ok, perhaps you got luckier... I have two PowerVault 220 rack mounts with U320 SCSI drives in them. With an LSI 320-2X controller, it *refuses* to recognize some of the drives (channel 1 on either array). Dell blames LSI, LSI blames dell's backplane. This is consistent across multiple controllers we tried, and two different Dell disk arrays. Dropping the SCSI speed to 160 is the only way to make them work. I tend to believe LSI here. This is the crux of the argument here. Perc/5 is a dell trademark. They can ship any hardware they want and call it a Perc/5. Dave The Adaptec 2230SLP controller recognizes the arrays fine, but tends to drop devices at inopportune moments. Re-seating dropped devices starts a rebuild, but the speed is recognized as 1 and the rebuild takes two lifetimes to complete unless you insert a reboot of the system in there. Totally unacceptable. Again, dropping the scsi rate to 160 seems to make it more stable. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match