Re: [PERFORM] NFS, file system cache and shared_buffers
John Melesky wrote:
>> I just learned that NFS does not use a file system cache on the client side.
>
> That's ... incorrect. NFS is cache-capable. NFSv3 (I think? It may have been v2) started sending metadata on file operations that was intended to allow for client-side caches. NFSv4 added all sorts of stateful behavior which allows for much more aggressive caching.

What do you mean by "allows"? Does it cache files in memory or not? Do you need additional software? Special configuration?

> Where did you read that you could not use caching with NFS?

I have it by hearsay from somebody who seemed knowledgeable, and the existence of CacheFS seemed to indicate it was true.

Yours,
Laurenz Albe

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
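One way to make the "does it cache files in memory or not" question concrete is an empirical check: read the same file from the NFS mount twice and compare timings. With a working client-side cache the second read is served from local RAM and is markedly faster. A minimal sketch (the file path is a placeholder, and wall-clock timings are only indicative, not a rigorous benchmark):

```python
# Rough empirical check for client-side caching on a mount: read the same
# file twice and compare wall-clock timings. With a working page cache the
# second read is usually served from RAM. Point the path at a large file
# on the NFS mount under test (the path here is a placeholder).
import time

def timed_read(path):
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return len(data), time.perf_counter() - start

def cache_check(path):
    size, cold = timed_read(path)   # first read: may have to hit the server
    _, warm = timed_read(path)      # second read: local if caching works
    print(f"{size} bytes: cold {cold:.3f}s, warm {warm:.3f}s")
    return cold, warm
```

Note that this only shows whether caching is happening at all; it says nothing about cache-coherency guarantees between multiple clients, which is the part NFSv4's stateful protocol improves.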
Re: [PERFORM] NFS, file system cache and shared_buffers
Jeff Janes wrote:
> All that said, there has always been a recommendation of caution around using NFS as a backing store for PG, or any RDBMS.

>> I know that Oracle recommends it - they even built an NFS client into their database server to make the most of it.
>
> Last I heard (which has been a while), Oracle supported specific brand-name implementations of NFS, and warned against any others on a data integrity basis.

I couldn't find any detailed information, but it seems that only certain NFS devices are supported.

> Why would they implement their own client? Did they have to do something special in their client to make it safe?

I think it is mostly a performance issue. Each backend mounts its own copy of the data files it needs.

Yours,
Laurenz Albe
Re: [PERFORM] Planner doesn't take indexes into account
random_page_cost = 4.0
seq_page_cost = 1.0

There are about 500,000 rows, and about 500 new rows each business day. About 96% of the rows meet the given conditions, that is, the count should be about 480,000.

BR,
Grzegorz Olszewski

Date: Tue, 27 May 2014 14:14:21 -0700
Subject: Re: [PERFORM] Planner doesn't take indexes into account
From: rumman...@gmail.com
To: grzegorz.olszew...@outlook.com
CC: pgsql-performance@postgresql.org

What are random_page_cost and seq_page_cost on your server? And how many rows does the table have?

On Tue, May 27, 2014 at 2:09 PM, Grzegorz Olszewski <grzegorz.olszew...@outlook.com> wrote:

Hi,

I wonder why the planner uses a Seq Scan instead of an Index Scan.

Here is my table (partial):

content.contents
        Column       |            Type             |            Modifiers
  -------------------+-----------------------------+---------------------------------
  id                 | bigint                      | not null default nextval('content.contents_id_seq'::regclass)
  version            | integer                     | not null
  date_published     | timestamp without time zone |
  moderation_status  | character varying(50)       |
  publication_status | character varying(30)       |

And indexes (there are some other indexes too):

  contents_id_pkey PRIMARY KEY, btree (id)
  contents_date_published_idx btree (date_published)
  contents_moderation_status_idx btree (moderation_status)
  contents_publication_status_idx btree (publication_status)

I also tried creating the following indexes:

  contents_date_published_publication_status_moderation_statu_idx btree (date_published, publication_status, moderation_status)
  contents_publication_status_idx1 btree ((publication_status::text))
  contents_moderation_status_idx1 btree ((moderation_status::text))

Then for this query (generated by Hibernate):

  explain (analyze, buffers)
  select count(*) as y0_
  from content.contents this_
  inner join content.content_categories cat1_
      on this_.CONTENT_CATEGORY_ID = cat1_.ID
  where cat1_.name in ([...])
    and this_.date_published <= '2014-05-26 12:23:31.557000 +02:00:00'
    and (this_.PUBLICATION_STATUS is null or this_.PUBLICATION_STATUS <> 'DRAFT')
    and (this_.moderation_status is null or this_.moderation_status <> 'DANGEROUS')
    and exists (
        select *
        from content.content_visibilities cv
        where cv.content_id = this_.ID
          and cv.user_group_id in (1, 2)
    );

the planner creates this plan:

                                    QUERY PLAN
  ------------------------------------------------------------------------------
  Hash Semi Join  (cost=31706.84..106020.81 rows=21871 width=2076) (actual time=1197.658..6012.406 rows=430218 loops=1)
    Hash Cond: (this_.id = cv.content_id)
    Buffers: shared hit=5 read=59031 written=3, temp read=47611 written=47549
    ->  Hash Join  (cost=2.22..56618.11 rows=22881 width=2076) (actual time=0.163..1977.304 rows=430221 loops=1)
          Hash Cond: (this_.content_category_id = cat1_.id)
          Buffers: shared hit=1 read=46829 written=1
          ->  Seq Scan on contents this_  (cost=0.00..54713.92 rows=446176 width=2030) (actual time=0.048..915.724 rows=450517 loops=1)
                Filter: ((date_published <= '2014-05-26 12:23:31.557'::timestamp without time zone) AND ((publication_status IS NULL) OR ((publication_status)::text <> 'DRAFT'::text)) AND ((moderation_status IS NULL) OR ((moderation_status)::text <> 'DANGEROUS'::text)))
                Rows Removed by Filter: 50
                Buffers: shared read=46829 written=1
          ->  Hash  (cost=2.17..2.17 rows=4 width=46) (actual time=0.089..0.089 rows=4 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 1kB
                Buffers: shared hit=1
                ->  Seq Scan on content_categories cat1_  (cost=0.00..2.17 rows=4 width=46) (actual time=0.053..0.076 rows=4 loops=1)
                      Filter: ((name)::text = ANY ('{przeglad-prasy/rp,przeglad-prasy/parkiet,komunikat-z-rynku-pap-emitent,komunikat-z-rynku-pap-depesze}'::text[]))
                      Rows Removed by Filter: 74
                      Buffers: shared hit=1
    ->  Hash  (cost=24435.09..24435.09 rows=443083 width=8) (actual time=1197.146..1197.146 rows=447624 loops=1)
          Buckets: 4096  Batches: 32  Memory Usage: 560kB
          Buffers: shared hit=4 read=12202 written=2, temp written=1467
          ->  Bitmap Heap Scan on content_visibilities cv  (cost=7614.55..24435.09 rows=443083 width=8) (actual time=61.034..647.729 rows=447624 loops=1)
                Recheck Cond: (user_group_id = ANY ('{1,2}'::bigint[]))
                Buffers: shared hit=4
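For what it's worth, the page-cost settings and the EXPLAIN numbers above already contain a back-of-envelope answer. The following is only a sketch, not PostgreSQL's actual cost model (the real one also charges cpu_tuple_cost, cpu_index_tuple_cost, and adjusts for index correlation), but it shows the order-of-magnitude gap the planner sees between the two plans:

```python
# Back-of-envelope comparison of sequential vs. index scan cost, using the
# planner's page-cost settings from this thread. This is a simplification of
# PostgreSQL's real cost model; the table and row figures are taken from the
# EXPLAIN output above.
seq_page_cost = 1.0
random_page_cost = 4.0

table_pages = 46829        # "Buffers: shared read=46829" for the Seq Scan
matching_rows = 450517     # rows the filter actually kept

# Sequential scan: every page is read exactly once, sequentially.
seq_cost = table_pages * seq_page_cost

# Hypothetical index scan returning ~96% of the table: in the worst case,
# each matching row costs one random page fetch.
index_cost = matching_rows * random_page_cost

print(f"seq scan   ~ {seq_cost:,.0f}")
print(f"index scan ~ {index_cost:,.0f}")
print(f"index scan is ~{index_cost / seq_cost:.0f}x more expensive")
```

With these inputs the hypothetical index scan comes out dozens of times more expensive than the sequential scan, which is why the planner never considers the indexes for this predicate.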
Re: [PERFORM] Planner doesn't take indexes into account
On 05/28/2014 12:59 PM, Grzegorz Olszewski wrote:
> random_page_cost = 4.0
> seq_page_cost = 1.0
>
> There are about 500,000 rows and about 500 new rows each business day.
> About 96% of the rows meet the given conditions, that is, the count should be about 480,000.

When such a large percentage of the rows match, a sequential scan is indeed a better plan than an index scan. Sequential access is much faster than random access.

- Heikki
Re: [PERFORM] Planner doesn't take indexes into account
On 05/28/2014 04:59 AM, Grzegorz Olszewski wrote:
> There are about 500,000 rows and about 500 new rows each business day.
> About 96% of the rows meet the given conditions, that is, the count should be about 480,000.

Heikki is right on this. Indexes are not a magic secret sauce that are always used simply because they exist. Think of it like this:

If the table really matches about 480,000 rows, forcing it to use the index means performing *at least* 480,000 random seeks. Even if you have a high-performance SSD array that can do 100,000 random reads per second, you will need about five seconds just to read the data. A sequential scan can perform that same operation in a fraction of a second, because it's faster to read the entire table and filter out the *non*-matching rows.

Indexes are really only used, or useful, when the number of matches is much lower than the row count of the table. I highly recommend reading up on cardinality and selectivity before creating more indexes. This page in the documentation does a really good job:

http://www.postgresql.org/docs/9.3/static/row-estimation-examples.html

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com
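Shaun's back-of-envelope numbers check out. A quick sketch, where the 100,000 random reads per second and the 500 MB/s sequential rate are assumed figures (the first from the text above, the second a typical SSD-array number), and the page count comes from the EXPLAIN output earlier in the thread:

```python
# Sanity-check of the arithmetic above (assumed figures, not measurements).
matching_rows = 480_000          # rows the count(*) must visit
random_reads_per_sec = 100_000   # optimistic SSD-array figure from the text

index_scan_seconds = matching_rows / random_reads_per_sec
print(f"index scan: ~{index_scan_seconds:.1f} s of random reads")

# Sequential read of the whole table, assuming ~46,829 8 kB pages
# (the Seq Scan's buffer count earlier in the thread) at 500 MB/s:
table_bytes = 46_829 * 8192
seq_scan_seconds = table_bytes / (500 * 1024**2)
print(f"seq scan:   ~{seq_scan_seconds:.2f} s of sequential I/O")
```

That is roughly 4.8 seconds of random reads versus well under a second of sequential I/O, which matches the "about five seconds" versus "a fraction of a second" estimate above.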
Re: [PERFORM] NFS, file system cache and shared_buffers
On Tue, May 27, 2014 at 4:06 AM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> I just learned that NFS does not use a file system cache on the client side.

That's ... incorrect. NFS is cache-capable. NFSv3 (I think? It may have been v2) started sending metadata on file operations that was intended to allow for client-side caches. NFSv4 added all sorts of stateful behavior which allows for much more aggressive caching.

Where did you read that you could not use caching with NFS?

--
John Melesky | Sr Database Administrator
503.284.7581 x204 | john.mele...@rentrak.com
RENTRAK | www.rentrak.com | NASDAQ: RENT
Re: [PERFORM] Planner doesn't take indexes into account
OK, thank you very much.

I've tried a similar query, but with very few rows matching. In that case the index was present in the plan.

BR,
Grzegorz Olszewski

Date: Wed, 28 May 2014 08:31:38 -0500
From: stho...@optionshouse.com
To: grzegorz.olszew...@outlook.com; rumman...@gmail.com
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Planner doesn't take indexes into account

[quoted text of Shaun Thomas's reply omitted; it duplicates the message above]
Re: [PERFORM] NFS, file system cache and shared_buffers
> Why would they implement their own client? Did they have to do something special in their client to make it safe?
>
> I think it is mostly a performance issue. Each backend mounts its own copy of the data files it needs.

I personally would never put PostgreSQL on an NFS share on Linux. Unless things have changed radically in the last couple of years, Linux's NFS code is flaky and unreliable, including flat-out lying about whether stuff has been sent and received or not. This is why NetApp's NFS servers came with their own proprietary Linux kernel module.

NFS on Solaris/Illumos is a different story. Not sure about FreeBSD.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com