Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas

Praveen Kumar N wrote:

I have installed postgresql from sourcecode.I would like to know
how pages are replaced in the bufferpool when we join two relations.I 
tried to trace it by editing files 
pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
am missing some information after observing extracted information abt 
buffer replacement.My input datasize is 10times more than main 
memory/RAM size.When I joined two relations,postgresql accessed both 
relations sequentially one by one and that too only once.Then how is 
it joining two relations by accessing only once? Is it storing that 
accessed relations some where other than main memory/bufferpool(Becos 
they cant fit into main memory).


What kind of a join is it? If it's a merge join, using indexes, it would 
only have to visit each heap page once.


So can anybdy tell me is there is ne thing I am missing? Is there any 
concept like postgresql cache similar to kernel cache otherthan 
sharedbuffers.If so how can we figure it out.


No. All access to relations (except temporary relations) go through 
bufmgr and the shared memory buffer cache.


Is there any way by which postgresql is accessing database relations 
through,other than rotines in bufmgr.c nd freelist.c(I mean any other 
routines like ReadBuffer,StrategyGet etc.)


No.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Gregory Stark
Praveen Kumar N [EMAIL PROTECTED] writes:

 hai...

 I have installed postgresql from sourcecode.I would like to know
 how pages are replaced in the bufferpool when we join two relations.I tried to
 trace it by editing files
 pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
 am missing some information after observing extracted information abt buffer
 replacement.

Try

explain select ...

The output may be enlightening.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas

Praveen Kumar N wrote:

Let me explain once more.

I have two relations which are 10 times more than bufferpool size.I 
have observed the following things when joined that two relations(it 
using merge join to join both relations)


1.It first accessed system catalog tables
2.Relation 1
3.Relation 2

my doubt is one whole relation cant fit in the main memory.That too 
when we use merge join, it should keep some part of 1st relations and 
should scan second relation as bufferpool size is less compared to 
size of each relation.similarly for the remainin part of 1st  
relation.But it is not happening here.First whole Relation1 is scanned 
and then Relation 2 is scanned. Then how is it joining two relations 
using merge join? Am I missing something?


Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what 
it really does, otherwise we're just guessing.


I traced scanning of relation by editing the functions ReadBuffer() 
and BufferAlloc(),StrategyGetBuffer().


That sounds valid.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 Praveen Kumar N wrote:
 Let me explain once more.
 
 I have two relations which are 10 times more than bufferpool size.I 
 have observed the following things when joined that two relations(it 
 using merge join to join both relations)
 
 1.It first accessed system catalog tables
 2.Relation 1
 3.Relation 2
 
 my doubt is one whole relation cant fit in the main memory.That too 
 when we use merge join, it should keep some part of 1st relations and 
 should scan second relation as bufferpool size is less compared to 
 size of each relation.similarly for the remainin part of 1st  
 relation.But it is not happening here.First whole Relation1 is scanned 
 and then Relation 2 is scanned. Then how is it joining two relations 
 using merge join? Am I missing something?
 
 Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what 
 it really does, otherwise we're just guessing.

Another option would be that the tuples are written to a sort tape, I
think.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N


Following is the output of query i have executed.

praveen=# explain select count(*) from a_9000_0,b_9000_0 where a2=b2;
 QUERY PLAN

 Aggregate  (cost=1924635.42..1924635.43 rows=1 width=0)
   -  Merge Join  (cost=109515.42..1665435.42 rows=10368 width=0)
 Merge Cond: (a_9000_0.a2 = b_9000_0.b2)
 -  Sort  (cost=54757.71..55117.71 rows=144000 width=260)
   Sort Key: a_9000_0.a2
   -  Seq Scan on a_9000_0  (cost=0.00..6979.00 rows=144000 
width=260)

 -  Sort  (cost=54757.71..55117.71 rows=144000 width=260)
   Sort Key: b_9000_0.b2
   -  Seq Scan on b_9000_0  (cost=0.00..6979.00 rows=144000 
width=260)

(9 rows)




On Fri, 8 Sep 2006, Heikki Linnakangas wrote:


Date: Fri, 08 Sep 2006 14:57:57 +0100
From: Heikki Linnakangas [EMAIL PROTECTED]
To: Praveen Kumar N [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql shared buffers

Praveen Kumar N wrote:

Let me explain once more.

I have two relations which are 10 times more than bufferpool size.I have 
observed the following things when joined that two relations(it using merge 
join to join both relations)


1.It first accessed system catalog tables
2.Relation 1
3.Relation 2

my doubt is one whole relation cant fit in the main memory.That too when we 
use merge join, it should keep some part of 1st relations and should scan 
second relation as bufferpool size is less compared to size of each 
relation.similarly for the remainin part of 1st  relation.But it is not 
happening here.First whole Relation1 is scanned and then Relation 2 is 
scanned. Then how is it joining two relations using merge join? Am I 
missing something?


Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it 
really does, otherwise we're just guessing.


I traced scanning of relation by editing the functions ReadBuffer() and 
BufferAlloc(),StrategyGetBuffer().


That sounds valid.




--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas

Praveen Kumar N wrote:


Following is the output of query i have executed.


Looks like Alvaro guessed right. It reads both relations in sequence, 
sorts them in temporary storage, outside bufmgr, and then does a merge 
join on the sorted inputs.


If you want to see the behavior you expected, I think you need to define 
indexes on a2 and b2, if you don't have them already, and coerce the 
planner to choose a nested loop join. I'd suggest using SET 
enable_seqscan=false; SET enable_bitmapscan=false; and see if that gets 
you a nested loop join.


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N

On Fri, 8 Sep 2006, Heikki Linnakangas wrote:


Date: Fri, 08 Sep 2006 15:22:19 +0100
From: Heikki Linnakangas [EMAIL PROTECTED]
To: Praveen Kumar N [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql shared buffers

Praveen Kumar N wrote:


Following is the output of query i have executed.


Looks like Alvaro guessed right. It reads both relations in sequence, sorts 
them in temporary storage, outside bufmgr, and then does a merge join on the 
sorted inputs.


could you tell me how can we trace that? I mean which functions shall I 
checkout for that.





If you want to see the behavior you expected, I think you need to define 
indexes on a2 and b2, if you don't have them already, and coerce the planner 
to choose a nested loop join. I'd suggest using SET enable_seqscan=false; 
SET enable_bitmapscan=false; and see if that gets you a nested loop join.





--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas

Praveen Kumar N wrote:

On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

Looks like Alvaro guessed right. It reads both relations in sequence, 
sorts them in temporary storage, outside bufmgr, and then does a 
merge join on the sorted inputs.


could you tell me how can we trace that? I mean which functions shall 
I checkout for that.


The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. 
Can't remember function names from the top of my head.


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N


one more doubt.in the following example we dont need sort right.But in 
this case also relations were scanned sequentially one by one.So is it the 
case that any relation is accessed only once from database while executing 
a given query?




praveen=# explain select count(*) from a_9000_0,b_9000_0;
QUERY PLAN
--
 Aggregate  (cost=537566595.00..537566595.01 rows=1 width=0)
   -  Nested Loop  (cost=7616.00..485726595.00 rows=2073600 width=0)
 -  Seq Scan on a_9000_0  (cost=0.00..6979.00 rows=144000 
width=0)

 -  Materialize  (cost=7616.00..9549.00 rows=144000 width=0)
   -  Seq Scan on b_9000_0  (cost=0.00..6979.00 rows=144000 
width=0)

(5 rows)


Regards,
Praveen


On Fri, 8 Sep 2006, Heikki Linnakangas wrote:


Date: Fri, 08 Sep 2006 15:30:37 +0100
From: Heikki Linnakangas [EMAIL PROTECTED]
To: Praveen Kumar N [EMAIL PROTECTED]
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql shared buffers

Praveen Kumar N wrote:

On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

Looks like Alvaro guessed right. It reads both relations in sequence, 
sorts them in temporary storage, outside bufmgr, and then does a merge 
join on the sorted inputs.


could you tell me how can we trace that? I mean which functions shall I 
checkout for that.


The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. Can't 
remember function names from the top of my head.





--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Alvaro Herrera
Praveen Kumar N wrote:
 
 one more doubt.in the following example we dont need sort right.But in 
 this case also relations were scanned sequentially one by one.

In this case it's because the result from one seqscan was materialized.

 So is it the 
 case that any relation is accessed only once from database while executing 
 a given query?

Not in general -- you'll see that behavior only in particular cases.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings