Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: I have installed postgresql from sourcecode.I would like to know how pages are replaced in the bufferpool when we join two relations.I tried to trace it by editing files pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I am missing some information after observing extracted information abt buffer replacement.My input datasize is 10times more than main memory/RAM size.When I joined two relations,postgresql accessed both relations sequentially one by one and that too only once.Then how is it joining two relations by accessing only once? Is it storing that accessed relations some where other than main memory/bufferpool(Becos they cant fit into main memory). What kind of a join is it? If it's a merge join, using indexes, it would only have to visit each heap page once. So can anybdy tell me is there is ne thing I am missing? Is there any concept like postgresql cache similar to kernel cache otherthan sharedbuffers.If so how can we figure it out. No. All access to relations (except temporary relations) go through bufmgr and the shared memory buffer cache. Is there any way by which postgresql is accessing database relations through,other than rotines in bufmgr.c nd freelist.c(I mean any other routines like ReadBuffer,StrategyGet etc.) No. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N [EMAIL PROTECTED] writes: hai... I have installed postgresql from sourcecode.I would like to know how pages are replaced in the bufferpool when we join two relations.I tried to trace it by editing files pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I am missing some information after observing extracted information abt buffer replacement. Try explain select ... The output may be enlightening. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: Let me explain once more. I have two relations which are 10 times more than bufferpool size.I have observed the following things when joined that two relations(it using merge join to join both relations) 1.It first accessed system catalog tables 2.Relation 1 3.Relation 2 my doubt is one whole relation cant fit in the main memory.That too when we use merge join, it should keep some part of 1st relations and should scan second relation as bufferpool size is less compared to size of each relation.similarly for the remainin part of 1st relation.But it is not happening here.First whole Relation1 is scanned and then Relation 2 is scanned. Then how is it joining two relations using merge join? Am I missing something? Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it really does, otherwise we're just guessing. I traced scanning of relation by editing the functions ReadBuffer() and BufferAlloc(),StrategyGetBuffer(). That sounds valid. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] postgresql shared buffers
Heikki Linnakangas wrote: Praveen Kumar N wrote: Let me explain once more. I have two relations which are 10 times more than bufferpool size.I have observed the following things when joined that two relations(it using merge join to join both relations) 1.It first accessed system catalog tables 2.Relation 1 3.Relation 2 my doubt is one whole relation cant fit in the main memory.That too when we use merge join, it should keep some part of 1st relations and should scan second relation as bufferpool size is less compared to size of each relation.similarly for the remainin part of 1st relation.But it is not happening here.First whole Relation1 is scanned and then Relation 2 is scanned. Then how is it joining two relations using merge join? Am I missing something? Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it really does, otherwise we're just guessing. Another option would be that the tuples are written to a sort tape, I think. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] postgresql shared buffers
Following is the output of query i have executed. praveen=# explain select count(*) from a_9000_0,b_9000_0 where a2=b2; QUERY PLAN Aggregate (cost=1924635.42..1924635.43 rows=1 width=0) - Merge Join (cost=109515.42..1665435.42 rows=10368 width=0) Merge Cond: (a_9000_0.a2 = b_9000_0.b2) - Sort (cost=54757.71..55117.71 rows=144000 width=260) Sort Key: a_9000_0.a2 - Seq Scan on a_9000_0 (cost=0.00..6979.00 rows=144000 width=260) - Sort (cost=54757.71..55117.71 rows=144000 width=260) Sort Key: b_9000_0.b2 - Seq Scan on b_9000_0 (cost=0.00..6979.00 rows=144000 width=260) (9 rows) On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Date: Fri, 08 Sep 2006 14:57:57 +0100 From: Heikki Linnakangas [EMAIL PROTECTED] To: Praveen Kumar N [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] postgresql shared buffers Praveen Kumar N wrote: Let me explain once more. I have two relations which are 10 times more than bufferpool size.I have observed the following things when joined that two relations(it using merge join to join both relations) 1.It first accessed system catalog tables 2.Relation 1 3.Relation 2 my doubt is one whole relation cant fit in the main memory.That too when we use merge join, it should keep some part of 1st relations and should scan second relation as bufferpool size is less compared to size of each relation.similarly for the remainin part of 1st relation.But it is not happening here.First whole Relation1 is scanned and then Relation 2 is scanned. Then how is it joining two relations using merge join? Am I missing something? Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it really does, otherwise we're just guessing. I traced scanning of relation by editing the functions ReadBuffer() and BufferAlloc(),StrategyGetBuffer(). That sounds valid. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: Following is the output of query i have executed. Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. If you want to see the behavior you expected, I think you need to define indexes on a2 and b2, if you don't have them already, and coerce the planner to choose a nested loop join. I'd suggest using SET enable_seqscan=false; SET enable_bitmapscan=false; and see if that gets you a nested loop join. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] postgresql shared buffers
On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Date: Fri, 08 Sep 2006 15:22:19 +0100 From: Heikki Linnakangas [EMAIL PROTECTED] To: Praveen Kumar N [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] postgresql shared buffers Praveen Kumar N wrote: Following is the output of query i have executed. Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. could you tell me how can we trace that? I mean which functions shall I checkout for that. If you want to see the behavior you expected, I think you need to define indexes on a2 and b2, if you don't have them already, and coerce the planner to choose a nested loop join. I'd suggest using SET enable_seqscan=false; SET enable_bitmapscan=false; and see if that gets you a nested loop join. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. could you tell me how can we trace that? I mean which functions shall I checkout for that. The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. Can't remember function names from the top of my head. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] postgresql shared buffers
one more doubt.in the following example we dont need sort right.But in this case also relations were scanned sequentially one by one.So is it the case that any relation is accessed only once from database while executing a given query? praveen=# explain select count(*) from a_9000_0,b_9000_0; QUERY PLAN -- Aggregate (cost=537566595.00..537566595.01 rows=1 width=0) - Nested Loop (cost=7616.00..485726595.00 rows=2073600 width=0) - Seq Scan on a_9000_0 (cost=0.00..6979.00 rows=144000 width=0) - Materialize (cost=7616.00..9549.00 rows=144000 width=0) - Seq Scan on b_9000_0 (cost=0.00..6979.00 rows=144000 width=0) (5 rows) Regards, Praveen On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Date: Fri, 08 Sep 2006 15:30:37 +0100 From: Heikki Linnakangas [EMAIL PROTECTED] To: Praveen Kumar N [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] postgresql shared buffers Praveen Kumar N wrote: On Fri, 8 Sep 2006, Heikki Linnakangas wrote: Looks like Alvaro guessed right. It reads both relations in sequence, sorts them in temporary storage, outside bufmgr, and then does a merge join on the sorted inputs. could you tell me how can we trace that? I mean which functions shall I checkout for that. The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. Can't remember function names from the top of my head. -- N Praveen Kumar Btech-IV CSE IIIT,Hyd AP,India Imagination is more important than knowledge... --Albert Einstein ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] postgresql shared buffers
Praveen Kumar N wrote: one more doubt.in the following example we dont need sort right.But in this case also relations were scanned sequentially one by one. In this case it's because the result from one seqscan was materialized. So is it the case that any relation is accessed only once from database while executing a given query? Not in general -- you'll see that behavior only in particular cases. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings