Re: [HACKERS] How to REINDEX in high volume environments?

2002-10-03 Thread Bruce Momjian


Jim, glad you are still around.  Yes, we would love to get tablespaces
in 7.4.  I think we need to think bigger and get something where we can
name tablespaces and place tables/indexes into these named spaces.  I
can reread the TODO.detail stuff and give you an outline.  How does that
sound?  Thomas Lockhart is also interested in this feature.

---

Jim Buttafuoco wrote:
 Just wanted to pipe in here.  I am still very interested in tablespaces ( I have 
many database systems that are over
 500GB and growing) and am willing to port my tablespace patch to 7.4.  I have 
everything (but only tested here) working
 in 7.2 but the patch was not accepted.  I didn't see a great speed improvement but 
the patch helps with storage management.
 
 Recap.  the patch would enable the following
 
 a database to have a default data tablespace and index tablespace
 a user to have a default data and index tablespace
 a table to have a specific tablespace
 an index to have a specfic tablespace
 
 I would like to also add  namespace (schema) to have a default data and index 
tablespaces
 
 Jim
 
 
 
 
  Justin Clift [EMAIL PROTECTED] writes:
   Shridhar Daithankar wrote:
   Looks like we should have a subdirectory in database directory which stores
   index.
  
   That was my first thought also, but an alternative/additional approach
   would be this (not sure if it's workable):
  
  See the tablespaces TODO item.  I'm not excited about building
  half-baked versions of tablespaces before we get around to doing the
  real thing ...
  
  regards, tom lane
  
  ---(end of broadcast)---
  TIP 5: Have you checked our extensive FAQ?
  
  http://www.postgresql.org/users-lounge/docs/faq.html
 
 
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-30 Thread Jim Buttafuoco

Just wanted to pipe in here.  I am still very interested in tablespaces ( I have many 
database systems that are over
500GB and growing) and am willing to port my tablespace patch to 7.4.  I have 
everything (but only tested here) working
in 7.2 but the patch was not accepted.  I didn't see a great speed improvement but the 
patch helps with storage management.

Recap.  the patch would enable the following

a database to have a default data tablespace and index tablespace
a user to have a default data and index tablespace
a table to have a specific tablespace
an index to have a specfic tablespace

I would like to also add  namespace (schema) to have a default data and index 
tablespaces

Jim




 Justin Clift [EMAIL PROTECTED] writes:
  Shridhar Daithankar wrote:
  Looks like we should have a subdirectory in database directory which stores
  index.
 
  That was my first thought also, but an alternative/additional approach
  would be this (not sure if it's workable):
 
 See the tablespaces TODO item.  I'm not excited about building
 half-baked versions of tablespaces before we get around to doing the
 real thing ...
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/users-lounge/docs/faq.html





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-29 Thread Shridhar Daithankar

On 28 Sep 2002 at 12:18, Tom Lane wrote:

 Justin Clift [EMAIL PROTECTED] writes:
  Shridhar Daithankar wrote:
  Looks like we should have a subdirectory in database directory which stores
  index.
 
  That was my first thought also, but an alternative/additional approach
  would be this (not sure if it's workable):
 
 See the tablespaces TODO item.  I'm not excited about building
 half-baked versions of tablespaces before we get around to doing the
 real thing ...

I wen thr. the messages posted regarding tablespaces. It looks like

Tablesspaces should provide

1. Managability 
2. Performance tuning
3. Better Administration..

Creating a directory for each object or object type would allow to do same 
thing.

Why directory?

1. You can mount it someplace else.
2. You can symlink it without worrying about postgresql creating new files 
instead of symlink while drop/recreate.

Whether to choose directory or tablespaces? I say directory. Why?

1. PostgreSQL philosphy has always been using facilities provided by OS and not 
to duplicate that work. Tablespaces directly violates that. Directory mounting 
does not.

2. Tablespaces combines objects on them, adding a layer of abstraction. and 
then come ideas like vacuuming a tablespace. Frankly given what vacuum does, I 
can't imagine what vacuuming tablespace would exactly do.

3. Tablespace would be a single file or structure of directories? How do we 
configure it? What tuning option do we provide?

Basically table spaces I feel is a layer of abstraction that can be avoided if 
we layout the DB in a directory tree with sufficient levels. That would be easy 
to deal with as configuration and maitainance delegated to OS and it would be 
flexible enough to.

Anyway if we have a directory per object/object type, how much different it's 
going to be from a table space? 

Frankly I am wary of table spaces because I have seen them in oracle and not 
eaxctly convinced that's the best way of doing things. 

If we introdude word tablespace, users will be expecting all those idiocies 
like taking a table space offline/online, adding data files aka pre-claiming 
space etc. All these are responsibilities of OS. Let OS handle it. PostgreSQL 
should just create a file structure which would grow as and when required.

The issue looks similimar to having raw disk I/O. Oracle might have good reason 
to do it but are we sure postgresql needs this? Just another policy decision 
waiting..

Here are some links I found in archive. Would like to know more about this 
issue..

http://candle.pha.pa.us/mhonarc/todo.detail/tablespaces/msg6.html
http://candle.pha.pa.us/mhonarc/todo.detail/tablespaces/msg7.html

Just a thought..

Bye
 Shridhar

--
The sooner our happiness together begins, the longer it will last.  -- 
Miramanee, The Paradise Syndrome, stardate 4842.6


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-29 Thread Shridhar Daithankar

On 29 Sep 2002 at 0:43, Justin Clift wrote:

 Shridhar Daithankar wrote:
 The reason that I was thinking of having a different path per index
 would be for high volume situations like this:
 
 /dev/dsk1 : /pgdata - data here
 /dev/dsk2 : /pgindexes1 - some indexes here
 /dev/dsk3 : /pgindexes2 - some ultra-high volume activity here

I would say this would look better..

/pgdata
-indexes
--index1
---indexfiles
--index2
---indexfiles

Where index1 and index2 are two different indexes. Just like each table gets 
it's own directory, each index gets it's own directory as well. 

So the admin would/can tune on per object basis rather than worrying about 
creating right group of objects and then tuning about that group.

If required throwing per database transaction log there as well might prove a 
good idea. It would insulate one db from load of other, as far as I/O is 
concerned..

This possiblity is not lost with this scheme but it just gets something simpler 
IMO..

Just illustration of my another post on hackers on this topic.. 


Bye
 Shridhar

--
You're too beautiful to ignore.  Too much woman.-- Kirk to Yeoman 
Rand, The 
Enemy Within, stardate unknown


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-29 Thread Mario Weilguni

Am Samstag, 28. September 2002 10:17 schrieb Shridhar Daithankar:
(snip)
 I have to disagree.. Completely.. This is like turning PG-Metadata into
 registry...

 And what happens when index starts splitting when it grows beyond 1GB in
 size?

 Putting indexes into a separate subdirectoy and mount/link that directory
 on a device that is on a separate SCSI channel is what I can think of as
 last drop of performance out of it..
(snip)

I think a good approach would be the introduction of tablespaces like oracle has, and 
assigning locations to that tablespace.

Best regards,
Mario Weilguni

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Justin Clift

Hi all,

Am experimenting to find out what kind of performance gain are achieved
from moving indexes to a different scsi drives than the WAL files, than
the data itself, etc.

Have come across an interesting problem.

Have moved the indexes to another drive, then created symlinks to them.

Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
ANALYZE'd, prepared to re-run the benchmark again and guess what?

The indexes were back on the original drive.

The process of REINDEX-ing obviously creates another file then drops the
original.

Is there a way to allow REINDEX to work without having this side affect?

Pre-creating a bunch of dangling symlinks doesn't work (tried that, it
gives a ERROR:  cannot create accounts_pkey: File exists on FreeBSD
4.6.2 when using the REINDEX).

Any suggestions?

:-)

Regards and best wishes,

Justin Clift

-- 
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
   - Indira Gandhi

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Shridhar Daithankar

On 28 Sep 2002 at 17:08, Justin Clift wrote:

 Have moved the indexes to another drive, then created symlinks to them.
 Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
 ANALYZE'd, prepared to re-run the benchmark again and guess what?
 
 The indexes were back on the original drive.
 Is there a way to allow REINDEX to work without having this side affect?
 
 Pre-creating a bunch of dangling symlinks doesn't work (tried that, it
 gives a ERROR:  cannot create accounts_pkey: File exists on FreeBSD
 4.6.2 when using the REINDEX).

Looks like we should have a subdirectory in database directory which stores 
index.

May be transaction logs, indexes goes in separte directory which can be 
symlinked. Linking a directory is much simpler solution than linking a file.

I suggest we have per database transaction log and indexes created in separate 
subdirectories for each database. Furhter given that large tables are segmented 
after one GB size, a table should have it's own subdirectory optionally..

At the cost of few inodes, postgresql would gain much more flexibility and 
hence tunability..

May be TODO for 7.4? Anyone?

Bye
 Shridhar

--
Software, n.:   Formal evening attire for female computer analysts.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Justin Clift

Shridhar Daithankar wrote:
snip
 Looks like we should have a subdirectory in database directory which stores
 index.

That was my first thought also, but an alternative/additional approach
would be this (not sure if it's workable):

 - As each index already has a bunch of information stored stored for
it, would it be possible to have an additional column added called
'idxpath' or something?

 - This would mean that the index location would be stable per index,
and would allow for *really* high volume environments to keep different
indexes on different drives.

Not sure what the default value would be, maybe the PGDATA directory,
maybe something as a GUC variable, etc, but that's the concept.

:-)

Regards and best wishes,

Justin Clift

 
 May be transaction logs, indexes goes in separte directory which can be
 symlinked. Linking a directory is much simpler solution than linking a file.
 
 I suggest we have per database transaction log and indexes created in separate
 subdirectories for each database. Furhter given that large tables are segmented
 after one GB size, a table should have it's own subdirectory optionally..
 
 At the cost of few inodes, postgresql would gain much more flexibility and
 hence tunability..
 
 May be TODO for 7.4? Anyone?
 
 Bye
  Shridhar
 
 --
 Software, n.:   Formal evening attire for female computer analysts.
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster

-- 
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
   - Indira Gandhi

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Shridhar Daithankar

On 28 Sep 2002 at 17:51, Justin Clift wrote:

 Shridhar Daithankar wrote:
 snip
  Looks like we should have a subdirectory in database directory which stores
  index.
 
 That was my first thought also, but an alternative/additional approach
 would be this (not sure if it's workable):
 
  - As each index already has a bunch of information stored stored for
 it, would it be possible to have an additional column added called
 'idxpath' or something?
 
  - This would mean that the index location would be stable per index,
 and would allow for *really* high volume environments to keep different
 indexes on different drives.

I have to disagree.. Completely.. This is like turning PG-Metadata into 
registry...

And what happens when index starts splitting when it grows beyond 1GB in size?

Putting indexes into a separate subdirectoy and mount/link that directory on a 
device that is on a separate SCSI channel is what I can think of as last drop 
of performance out of it..

Just a thought, as usual..

I don't know how much efforts it would take but if we have pg_xlog in separte 
configurable dir. now, putting indexes as well and having per database pg_xlog 
should be on the same line. The later aspect is also important IMO..

Bye
 Shridhar

--
VMS, n.:The world's foremost multi-user adventure game.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Justin Clift

Shridhar Daithankar wrote:
snip
 And what happens when index starts splitting when it grows beyond 1GB in size?

Having an index directory:

i.e. $PGDATA/data/oid/indexes/

(that's the kind of thing you mean isn't it?)

Sounds workable, and sounds better than the present approach.

The reason that I was thinking of having a different path per index
would be for high volume situations like this:

/dev/dsk1 : /pgdata - data here
/dev/dsk2 : /pgindexes1 - some indexes here
/dev/dsk3 : /pgindexes2 - some ultra-high volume activity here

Let's say that there's a bunch of data on /dev/dsk1, and for performance
reasons it's been decided to move the indexes to another drive
/dev/dsk2.

Now, if just one of those indexes is getting *a lot* of the drive
activity, it would make sense to move it to it's own dedicated drive. 
Having an um... PGINDEX (that's just an identifier for this example, not
an environment variable suggestion) directory location defined would
mean that each time a REINDEX operation occurs, then all new indexes
would be created in the same spot.  That sounds better than the present
approach thus far, but wouldn't work for situations where indexes are
spread across multiple disk drives.

The suggestion of having some kind of path info for each index is merely
a thought of how to meet that potential future need, not necessarily the
best method anyone has ever thought of.  Like someone might pipe up and
say Nah, it could be done better XYZ way, etc.

:-)

Regards and best wishes,

Justin Clift

 
 Putting indexes into a separate subdirectoy and mount/link that directory on a
 device that is on a separate SCSI channel is what I can think of as last drop
 of performance out of it..
 
 Just a thought, as usual..
 
 I don't know how much efforts it would take but if we have pg_xlog in separte
 configurable dir. now, putting indexes as well and having per database pg_xlog
 should be on the same line. The later aspect is also important IMO..
 
 Bye
  Shridhar
 
 --
 VMS, n.:The world's foremost multi-user adventure game.
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster

-- 
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
   - Indira Gandhi

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Alvaro Herrera

Justin Clift dijo: 

Hi,

 Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
 ANALYZE'd, prepared to re-run the benchmark again and guess what?
 
 The indexes were back on the original drive.

Yes, this is expected.  Same for CLUSTER.  They create a different
filenode and point the relation (table or index) at it.

I think the separate space for indexes is a good idea.  However, and
this is orthogonal, I feel the way REINDEX works now is not the best,
because it precludes you from using the index while you are doing it.

I'm trying to implement a way to concurrently compact the indexes.
I hope to have it for 7.4.

-- 
Alvaro Herrera (alvherre[a]atentus.com)
Y una voz del caos me hablo y me dijo
Sonrie y se feliz, podria ser peor.
Y sonrei. Y fui feliz.
Y fue peor.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Greg Copeland

On Sat, 2002-09-28 at 02:16, Shridhar Daithankar wrote:
 On 28 Sep 2002 at 17:08, Justin Clift wrote:
 
  Have moved the indexes to another drive, then created symlinks to them.
  Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
  ANALYZE'd, prepared to re-run the benchmark again and guess what?
  
  The indexes were back on the original drive.
  Is there a way to allow REINDEX to work without having this side affect?
  
  Pre-creating a bunch of dangling symlinks doesn't work (tried that, it
  gives a ERROR:  cannot create accounts_pkey: File exists on FreeBSD
  4.6.2 when using the REINDEX).
 
 Looks like we should have a subdirectory in database directory which stores 
 index.
 
 May be transaction logs, indexes goes in separte directory which can be 
 symlinked. Linking a directory is much simpler solution than linking a file.
 
 I suggest we have per database transaction log and indexes created in separate 
 subdirectories for each database. Furhter given that large tables are segmented 
 after one GB size, a table should have it's own subdirectory optionally..
 
 At the cost of few inodes, postgresql would gain much more flexibility and 
 hence tunability..
 
 May be TODO for 7.4? Anyone?


Very neat idea!  Sounds like an excellent way of gaining lots of
granularity!

I can't even think of a reason not to use the directory per table scheme
all the time.  Perhaps simply allowing for a script/tool that will
automatically perform such a physical table migration to a distinct 
directory would be in order too.

Either way, sounds like a good idea.

Greg





signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] How to REINDEX in high volume environments?

2002-09-28 Thread Tom Lane

Justin Clift [EMAIL PROTECTED] writes:
 Shridhar Daithankar wrote:
 Looks like we should have a subdirectory in database directory which stores
 index.

 That was my first thought also, but an alternative/additional approach
 would be this (not sure if it's workable):

See the tablespaces TODO item.  I'm not excited about building
half-baked versions of tablespaces before we get around to doing the
real thing ...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html