Re: [PERFORM] NFS, file system cache and shared_buffers

2014-05-28 Thread Albe Laurenz
John Melesky wrote:
 I just learned that NFS does not use a file system cache on the client side.
 
 That's ... incorrect. NFS is cache-capable. NFSv3 (I think? It may have been 
 v2) started sending
 metadata on file operations that was intended to allow for client-side 
 caches. NFSv4 added all sorts
 of stateful behavior which allows for much more aggressive caching.

What do you mean by "allows"?  Does it cache files in memory or not?
Do you need additional software?  Special configuration?

 Where did you read that you could not use caching with NFS?

I have it by hearsay from somebody who seemed knowledgeable, and the
existence of CacheFS seemed to indicate it was true.

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NFS, file system cache and shared_buffers

2014-05-28 Thread Albe Laurenz
Jeff Janes wrote:
 All that said, there has always been a recommendation of caution around
 using NFS as a backing store for PG, or any RDBMS.
 
  I know that Oracle recommends it - they even built an NFS client
  into their database server to make the most of it.
 
 Last I heard (which has been a while), Oracle supported specific brand-name
 implementations of NFS, and warned against any others on a data integrity basis.

I couldn't find any detailed information, but it seems that only certain
NFS devices are supported.

 Why would they implement their own client?  Did they have to do something 
 special in their client to
 make it safe?

I think it is mostly a performance issue.  Each backend mounts its own copy
of the data files it needs.

Yours,
Laurenz Albe



Re: [PERFORM] Planner doesn't take indexes into account

2014-05-28 Thread Grzegorz Olszewski
random_page_cost = 4.0
seq_page_cost = 1.0

There are about 500,000 rows, with about 500 new rows each business day.

About 96% of rows meet given conditions, that is, count should be about 480,000.

BR,
Grzegorz Olszewski

Date: Tue, 27 May 2014 14:14:21 -0700
Subject: Re: [PERFORM] Planner doesn't take indexes into account
From: rumman...@gmail.com
To: grzegorz.olszew...@outlook.com
CC: pgsql-performance@postgresql.org

What is random_page_cost and seq_page_cost on your server? And how many rows
does the table have?

On Tue, May 27, 2014 at 2:09 PM, Grzegorz Olszewski 
grzegorz.olszew...@outlook.com wrote:




Hi,

I wonder why planner uses Seq Scan instead of Index Scan.

Here is my table (partial):
content.contents
       Column        |            Type             |                          Modifiers
---------------------+-----------------------------+---------------------------------------------------------------
 id                  | bigint                      | not null default nextval('content.contents_id_seq'::regclass)
 version             | integer                     | not null
 date_published      | timestamp without time zone | 
 moderation_status   | character varying(50)       | 
 publication_status  | character varying(30)       | 

And indexes (there are some other indexes too):
contents_id_pkey PRIMARY KEY, btree (id)

contents_date_published_idx btree (date_published)
contents_moderation_status_idx btree (moderation_status)
contents_publication_status_idx btree (publication_status)


I tried also creating following indexes:
contents_date_published_publication_status_moderation_statu_idx btree 
(date_published, publication_status, moderation_status)
contents_publication_status_idx1 btree ((publication_status::text))

contents_moderation_status_idx1 btree ((moderation_status::text))

Then for this query (generated by Hibernate):
explain (analyze, buffers) select count(*) as y0_ from content.contents this_
inner join content.content_categories cat1_ on this_.CONTENT_CATEGORY_ID=cat1_.ID
where cat1_.name in ([...])
and this_.date_published <= '2014-05-26 12:23:31.557000 +02:00:00'
and (this_.PUBLICATION_STATUS is null or this_.PUBLICATION_STATUS <> 'DRAFT')
and (this_.moderation_status is null or this_.moderation_status <> 'DANGEROUS')
and exists(select * from content.content_visibilities cv where cv.content_id =
this_.ID and cv.user_group_id in (1,2));

Planner creates such plan:

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=31706.84..106020.81 rows=21871 width=2076) (actual time=1197.658..6012.406 rows=430218 loops=1)
   Hash Cond: (this_.id = cv.content_id)
   Buffers: shared hit=5 read=59031 written=3, temp read=47611 written=47549
   ->  Hash Join  (cost=2.22..56618.11 rows=22881 width=2076) (actual time=0.163..1977.304 rows=430221 loops=1)
         Hash Cond: (this_.content_category_id = cat1_.id)
         Buffers: shared hit=1 read=46829 written=1
         ->  Seq Scan on contents this_  (cost=0.00..54713.92 rows=446176 width=2030) (actual time=0.048..915.724 rows=450517 loops=1)
               Filter: ((date_published <= '2014-05-26 12:23:31.557'::timestamp without time zone) AND ((publication_status IS NULL) OR ((publication_status)::text <> 'DRAFT'::text)) AND ((moderation_status IS NULL) OR ((moderation_status)::text <> 'DANGEROUS'::text)))
               Rows Removed by Filter: 50
               Buffers: shared read=46829 written=1
         ->  Hash  (cost=2.17..2.17 rows=4 width=46) (actual time=0.089..0.089 rows=4 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               Buffers: shared hit=1
               ->  Seq Scan on content_categories cat1_  (cost=0.00..2.17 rows=4 width=46) (actual time=0.053..0.076 rows=4 loops=1)
                     Filter: ((name)::text = ANY ('{przeglad-prasy/rp,przeglad-prasy/parkiet,komunikat-z-rynku-pap-emitent,komunikat-z-rynku-pap-depesze}'::text[]))
                     Rows Removed by Filter: 74
                     Buffers: shared hit=1
   ->  Hash  (cost=24435.09..24435.09 rows=443083 width=8) (actual time=1197.146..1197.146 rows=447624 loops=1)
         Buckets: 4096  Batches: 32  Memory Usage: 560kB
         Buffers: shared hit=4 read=12202 written=2, temp written=1467
         ->  Bitmap Heap Scan on content_visibilities cv  (cost=7614.55..24435.09 rows=443083 width=8) (actual time=61.034..647.729 rows=447624 loops=1)
               Recheck Cond: (user_group_id = ANY ('{1,2}'::bigint[]))
               Buffers: shared hit=4 

Re: [PERFORM] Planner doesn't take indexes into account

2014-05-28 Thread Heikki Linnakangas

On 05/28/2014 12:59 PM, Grzegorz Olszewski wrote:

random_page_cost = 4.0
seq_page_cost = 1.0

There is about 500,000 rows and about 500 new rows each business day.

About 96% of rows meet given conditions, that is, count should be about 480,000.


When such a large percentage of the rows match, a sequential scan is 
indeed a better plan than an index scan. Sequential access is much 
faster than random access.
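
Heikki's argument can be put in numbers using the planner settings quoted above
(seq_page_cost = 1.0, random_page_cost = 4.0). A minimal back-of-envelope
sketch, not the planner's real cost formula: the heap page count is borrowed
from the Seq Scan buffer figure in the posted plan, and index page reads and
caching are ignored for simplicity.

```python
# Back-of-envelope comparison of the two access paths (illustrative only).
pages = 46829            # heap pages: the Seq Scan in the plan read 46829 buffers
matching_rows = 480_000  # ~96% of the ~500,000 rows match the filter

seq_page_cost = 1.0      # settings quoted in this thread
random_page_cost = 4.0

# Sequential scan: every heap page is read once, in order.
seq_scan_cost = pages * seq_page_cost

# Index scan: in the worst case each matching row triggers a random
# heap page fetch (index pages and cache hits ignored).
index_scan_cost = matching_rows * random_page_cost

print(seq_scan_cost)    # 46829.0
print(index_scan_cost)  # 1920000.0
```

With 96% of the table matching, the index path comes out roughly 40 times more
expensive under these constants, which is why the planner sticks to the
sequential scan.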


- Heikki




Re: [PERFORM] Planner doesn't take indexes into account

2014-05-28 Thread Shaun Thomas

On 05/28/2014 04:59 AM, Grzegorz Olszewski wrote:


There is about 500,000 rows and about 500 new rows each business day.

About 96% of rows meet given conditions, that is, count should be about
480,000.


Heikki is right on this. Indexes are not a magic secret sauce that are 
always used simply because they exist. Think of it like this...


If the table really matches about 480,000 rows, by forcing it to use the 
index, it has to perform *at least* 480,000 random seeks. Even if you 
have a high-performance SSD array that can do 100,000 random reads per 
second, you will need about five seconds just to read the data.


A sequential scan can perform that same operation in a fraction of a 
second because it's faster to read the entire table and filter out the 
*non*-matching rows.
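
Spelled out as arithmetic: the 480,000 seeks and 100,000 reads per second are
the figures from the paragraph above; the 8 kB page size is PostgreSQL's
default, the page count is borrowed from the plan earlier in the thread, and
the 500 MB/s sequential throughput is an assumed illustration.

```python
# Random access: 480,000 matching rows, each costing a random seek.
random_reads = 480_000
reads_per_sec = 100_000          # high-end SSD array, per the example above
seek_time = random_reads / reads_per_sec
print(seek_time)                 # 4.8 seconds just to read the data

# Sequential access: read the whole table once and filter in memory.
table_bytes = 46829 * 8192       # pages from the posted plan * 8 kB default page size
throughput = 500 * 1024 * 1024   # assumed 500 MB/s sequential read rate
seq_time = table_bytes / throughput
print(round(seq_time, 2))        # 0.73 seconds for the entire table
```

Even with generous assumptions for the index path, the sequential scan finishes
in a fraction of the time.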


Indexes are really only used, or useful, when the number of matches is 
much lower than the row count of the table. I highly recommend reading 
up on cardinality and selectivity before creating more indexes. This 
page in the documentation does a really good job:


http://www.postgresql.org/docs/9.3/static/row-estimation-examples.html

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com





Re: [PERFORM] NFS, file system cache and shared_buffers

2014-05-28 Thread John Melesky
On Tue, May 27, 2014 at 4:06 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:

 I just learned that NFS does not use a file system cache on the client
 side.


That's ... incorrect. NFS is cache-capable. NFSv3 (I think? It may have
been v2) started sending metadata on file operations that was intended to
allow for client-side caches. NFSv4 added all sorts of stateful behavior
which allows for much more aggressive caching.

Where did you read that you could not use caching with NFS?



-- 
John Melesky | Sr Database Administrator
503.284.7581 x204 | john.mele...@rentrak.com john.mele...@rentrakmail.com
RENTRAK | www.rentrak.com | NASDAQ: RENT



Re: [PERFORM] Planner doesn't take indexes into account

2014-05-28 Thread Grzegorz Olszewski
OK, thank you very much. I've tried a similar query, but with very few rows
matching. In that case the index was present in the plan.

BR,
Grzegorz Olszewski

 Date: Wed, 28 May 2014 08:31:38 -0500
 From: stho...@optionshouse.com
 To: grzegorz.olszew...@outlook.com; rumman...@gmail.com
 CC: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Planner doesn't take indexes into account
 
 On 05/28/2014 04:59 AM, Grzegorz Olszewski wrote:
 
  There is about 500,000 rows and about 500 new rows each business day.
 
  About 96% of rows meet given conditions, that is, count should be about
  480,000.
 
 Heikki is right on this. Indexes are not a magic secret sauce that are 
 always used simply because they exist. Think of it like this...
 
 If the table really matches about 480,000 rows, by forcing it to use the 
 index, it has to perform *at least* 480,000 random seeks. Even if you 
 have a high-performance SSD array that can do 100,000 random reads per 
 second, you will need about five seconds just to read the data.
 
 A sequential scan can perform that same operation in a fraction of a 
 second because it's faster to read the entire table and filter out the 
 *non*-matching rows.
 
 Indexes are really only used, or useful, when the number of matches is 
 much lower than the row count of the table. I highly recommend reading 
 up on cardinality and selectivity before creating more indexes. This 
 page in the documentation does a really good job:
 
 http://www.postgresql.org/docs/9.3/static/row-estimation-examples.html
 

Re: [PERFORM] NFS, file system cache and shared_buffers

2014-05-28 Thread Josh Berkus

 Why would they implement their own client?  Did they have to do something 
 special in their client to
 make it safe?
 
 I think it is mostly a performance issue.  Each backend mounts its own copy
 of the data files it needs.

I personally would never put PostgreSQL on an NFS share on Linux.
Unless things have changed radically in the last couple years, Linux's
NFS code is flaky and unreliable, including flat-out lying about whether
stuff has been sent and received or not.  This is why NetApp's NFS
servers came with their own, proprietary, Linux kernel module.

NFS on Solaris/Illumos is a different story.  Not sure about FreeBSD.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

