Re: [PERFORM] Performance problem with table containing a lot of text (blog)

2007-08-29 Thread Heikki Linnakangas
Kari Lavikka wrote:
 It would be nice if I could flag a column to be toasted always,
 regardless of it's length.

The idea of being able to set the toast threshold per column was
discussed during 8.3 development, but no patch was produced IIRC. We
might do that in the future. If you're willing to compile from source,
you can lower TOAST_TUPLE_THRESHOLD.

You could also use ALTER TABLE ... ALTER COLUMN ... SET STORAGE EXTERNAL
to force the long blog entries to be stored in the toast table instead
of compressing them in the main table. Values smaller than
TOAST_TUPLE_THRESHOLD (2k by default?) still wouldn't be toasted,
though, so it might not make much difference.

 Because there isn't such option maybe I should create a separate table
 for blog text content. Does anybody have better ideas for this? :)

That's probably the easiest solution. You can put a view on top of them
to hide it from the application.

 P.S. Here's a plan for query #3. Users can have several bookmark groups
 they are following. User can limit visibility of an entry to some of
 his/her bookmark group. Those are not any kind of bottlenecks anyway...

If the user_bookmark table is not clustered by uid, I'm surprised the
planner didn't choose a bitmap index scan. Which version of PostgreSQL
is this?

PS. EXPLAIN ANALYZE is much more helpful than plain EXPLAIN.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Performance problem with table containing a lot of text (blog)

2007-08-29 Thread Kari Lavikka

On Wed, 29 Aug 2007, Heikki Linnakangas wrote:


The idea of being able to set the toast threshold per column was
discussed during 8.3 development, but no patch was produced IIRC. We
might do that in the future. If you're willing to compile from source,
you can lower TOAST_TUPLE_THRESHOLD.


We are currently using Postgres 8.1 but have to upgrade to 8.2 shortly. 
New version fixes some vacuum problems.


I always compile postgres from source. Maybe I have to do some 
calculations because that setting affects all tables and databases. Most 
of our text/varchar columns are quite short but setting the threshold too 
low causes excessive seeks to toast tables... right?



Because there isn't such option maybe I should create a separate table
for blog text content. Does anybody have better ideas for this? :)


That's probably the easiest solution. You can put a view on top of them
to hide it from the application.


Yeh.


If the user_bookmark table is not clustered by uid, I'm surprised the
planner didn't choose a bitmap index scan.


Drumroll... there are:
user_bookmark_pkey PRIMARY KEY, btree (bookmark_group_id, marked_uid), tablespace 
lun3
user_bookmark_marked_uid btree (marked_uid)
user_bookmark_uid btree (uid) CLUSTER, tablespace lun3

Queries are mostly like Gimme all of my bookmarked friends in all of my 
bookmark groups and rarely the opposite Gimme all users who have 
bookmarked me


I have clustered the table using uid to minimize random page fetches.

 - Kari



--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performance problem with table containing a lot of text (blog)

2007-08-29 Thread Heikki Linnakangas
Kari Lavikka wrote:
 On Wed, 29 Aug 2007, Heikki Linnakangas wrote:
 
 The idea of being able to set the toast threshold per column was
 discussed during 8.3 development, but no patch was produced IIRC. We
 might do that in the future. If you're willing to compile from source,
 you can lower TOAST_TUPLE_THRESHOLD.
 
 We are currently using Postgres 8.1 but have to upgrade to 8.2 shortly.
 New version fixes some vacuum problems.
 
 I always compile postgres from source. Maybe I have to do some
 calculations because that setting affects all tables and databases. Most
 of our text/varchar columns are quite short but setting the threshold
 too low causes excessive seeks to toast tables... right?

Right. If you have trouble finding the right balance, you can also use
ALTER STORAGE PLAIN to force the other columns not to be toasted.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance problem with table containing a lot of text (blog)

2007-08-28 Thread Dan Harris

Kari Lavikka wrote:

Hello!

Some background info.. We have a blog table that contains about eight 
million blog entries. Average length of an entry is 1200 letters. 
Because each 8k page can accommodate only a few entries, every query 
that involves several entries causes several random seeks to disk.  We 
are having problems with queries like:


1) give me a list of months when I have written someting
2) give me id's of entries I have written on month X year X
3) give me the number of blog entries my friends have written since last
   time


I didn't see your schema, but couldn't these problems be solved by storing the 
article id, owner id, and blog date in a separate table?  It seems that if you 
don't actually need the content of the blogs, all of those questions could be 
answered by querying a very simple table with minimal I/O overhead.




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance problem with table containing a lot of text (blog)

2007-08-28 Thread Kari Lavikka


I didn't see your schema, but couldn't these problems be solved by storing 
the article id, owner id, and blog date in a separate table?  It seems that 
if you don't actually need the content of the blogs, all of those questions 
could be answered by querying a very simple table with minimal I/O overhead.


Yes. I was suggesting this as an option but I'm wondering if there 
are other solutions.


|\__/|
( oo )Kari Lavikka - [EMAIL PROTECTED] - (050) 380 3808
__ooO(  )Ooo___ _ ___ _ _  _   __  _  _
  

On Tue, 28 Aug 2007, Dan Harris wrote:


Kari Lavikka wrote:

Hello!

Some background info.. We have a blog table that contains about eight 
million blog entries. Average length of an entry is 1200 letters. Because 
each 8k page can accommodate only a few entries, every query that involves 
several entries causes several random seeks to disk.  We are having 
problems with queries like:


1) give me a list of months when I have written someting
2) give me id's of entries I have written on month X year X
3) give me the number of blog entries my friends have written since last
   time





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

 http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org