Re: [HACKERS] Views, views, views! (long)

2005-05-09 Thread Darren King
On Friday the 6th of May 2005, Mr. Treat opined: 

> I also don't think it is any harder to learn to query the
> system tables than it would be to learn to query these new
> views (with a few caveats that I will come back to) and it
> might actually be better.

Admin tools are in a sense already a GUI "view" into the database, so
whether they elect to use these new views seems rather immaterial to
this discussion, to me at least.

The more important aspect to me is as a user.  As someone pointed out
last week (though it seems to have been lost in the tool discussion),
these views give the user easy access to system information from the
command line.  This is where typing a multi-join query against the
system catalog is error-prone and a PITA.

An even bigger point is that these views can be used in scripts or
procedures where the results can be used in another query or put into a
table.  The '\d' command might show me a list of tables, but I for one
find the Oracle-style views such as ALL_TABLES, etc. more useful.

I've always found the slash-style commands useful for showing what's
there, but using them to build reports or extract data for use in other
queries is painful compared to having queriable views of the same data.

Darren

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Minor TODO list changes

2004-11-04 Thread Darren King
You are correct.

I would envision being able to alter a table "read-write" at any point.
If the index(es) on the table are completely filled from being created
in read-only mode, then the affected pages should be split with the
default fillfactor when/if a row is inserted or updated.  Altering the
table back to read-only would simply leave the index as-is, with a few
pages not filled but still fully functional.

The insert-only index is intriguing as well though.

Darren

-Original Message-
From: Simon Riggs [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 04, 2004 1:50 PM
To: Rod Taylor
Cc: Darren King; Bruce Momjian; PostgreSQL Development
Subject: Re: [HACKERS] Minor TODO list changes

On Thu, 2004-11-04 at 18:15, Rod Taylor wrote:
> > At some point it would also be nice to be able to mark tables as 
> > read-only and then any indexes created on that table after that 
> > would have a fillfactor of 100%.  Then I'd be able to load the 
> > table, alter it to be read-only, then add the appropriate indexes 
> > that are automatically compacted.
> 
> If it's read-only, you might as well remove a number of the internal 
> fields used for visibility as well.

Yes, should be able to save 16 bytes/row for an INSERT only table that
would still allow multiple simultaneous COPY jobs against it, with no
more than 1 statement per transaction. I'd like to create an additional
tuple layout using the tuple version bits, so you'd be able to set a
flag at CREATE TABLE time to use that as an optional alternative from
the standard one. UPDATEs and DELETEs would be permanently disallowed
against such tables, just as if privileges had not been granted.
TRUNCATE would still work, however.

Call it something like NOMODIFY? You could then alter VACUUM to skip
such tables, so you'd be able to do a VACUUM database without scanning
all of the largest tables in your system.

Darren's first idea would then be interpreted as automatically setting
FILLFACTOR=100 on indexes of NOMODIFY tables.

Darren's second idea was dynamic: i.e. an ALTER TABLE READONLY after
loading, rather than using a different tuple layout, which would need to
be done before loading, probably at CREATE TABLE time.

Darren's second idea of READONLY tables is related, but not necessarily
the same as the NOMODIFY concept that Rod brings up.

--
Best Regards, Simon Riggs






Re: [HACKERS] Minor TODO list changes

2004-11-04 Thread Darren King
In my data warehousing situation, I'd like to be able to specify that
the indexes be as compact as possible (fillfactor = 100%) in order to
touch as few index pages as possible.

For summary tables there will not be any more inserts or deletions so
the index will not change either.  In that case, there's no point to
leaving any extra room for page-splitting. 

At some point it would also be nice to be able to mark tables as
read-only and then any indexes created on that table after that would
have a fillfactor of 100%.  Then I'd be able to load the table, alter it
to be read-only, then add the appropriate indexes that are automatically
compacted.
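
A sketch of that load-then-index workflow, using the `WITH (fillfactor
= ...)` index storage parameter that later PostgreSQL releases grew (it
did not exist when this was written; table and file names are made up):

```sql
-- Load the summary table once; no further inserts or updates expected.
CREATE TABLE monthly_summary (area int, month date, total numeric);
COPY monthly_summary FROM '/tmp/summary.csv' WITH (FORMAT csv);

-- Pack the index pages completely: no free space reserved for splits.
CREATE INDEX monthly_summary_idx
    ON monthly_summary (area, month)
    WITH (fillfactor = 100);
```

Since the table never changes, no page ever splits, and the fully packed
index stays 10-30% smaller than one built with the default fillfactor.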

Darren

-Original Message-
From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 04, 2004 12:19 PM
To: Simon Riggs
Cc: [EMAIL PROTECTED]
Subject: Re: [HACKERS] Minor TODO list changes

Simon Riggs wrote:
> On Thu, 2004-11-04 at 16:51, Bruce Momjian wrote:
> > OK, I updated all your items.  
> 
> Thanks
> 
> > I removed fillfactor because I thought I was the only one who 
> > thought it was valuable and as I remember it was mostly useful for 
> > ISAM, which we don't support.  Can you think of a use for a non-100%
> > fillfactor?
> > 
> 
> I was under the impression the factor was 67% for data loaded on the 
> leading-edge of an index, and 50% for other INSERTs.
> (backend/access/nbtree/nbtinsert.c)
> 
> Not sure, without checking, what CREATE INDEX and COPY do, but I'm 
> guessing it is similar?
> 
> Other RDBMS use a higher leading-edge/standard fill factor.
> 
> There are situations where I'd want to set it at 90%, or even 100%. If
> I know the update rate is likely to be zero, then I'd like my indexes
> to fit in 10-30% less memory and disk, please.
> 
> Or am I missing something?

Oh, good point.  I was thinking of just the leaf pages which I think are
100% filled.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] contrib vs. gborg/pgfoundry for replication solutions

2004-04-22 Thread Darren King
> I agree with the notion that "contrib" be removed
> from the main distribution. There is, however, a
> disconnect between supporting projects and the main system.
>
> Take a look at the www.postgresql.org web site.
> Most people visually filter out the side bars. I've
> been looking over effectiveness of paid advertisements
> in the various search engines, and the numbers are clear
> in that the paid links at the top of the page get the hits
> while the side hits get almost completely ignored. What I
> think this means is that people looking for projects will
> simply filter out any reference to gborg.
>
> What would be good, is to bring gborg a little closer to
> PostgreSQL by putting a link in the main site menu. Rather than:

What would be even better, IMHO, is to bring "gborg" a little closer to
English, or to a word that makes sense to people visiting the PostgreSQL
site.

Is it "g-b-org" or "g-borg" or what?  Is it a carry-over from the Great
Bridge days or some lame hacker reference to the Borg?  I understand the
reference to Great Bridge, but do newbies?

Same goes for "pgfoundry".  It might be obvious to the people already
using PostgreSQL, but those aren't the people that need help finding
something at the web site.  What does a foundry have to do with computer
projects?

We're not associated with an iron-works or Bethlehem Steel, we're a
database project trying to attract and keep visitors.  If users can't
find what they want at our site, they'll find it somewhere else or move
on to another product that helps them do what they want.

Instead of "gborg projects", why not just call it what it is, "supported
projects"?

Darren



Re: [HACKERS] Dreaming About Redesigning SQL

2003-10-24 Thread Darren King
> "Bob" <[EMAIL PROTECTED]> spewed forth...
>
> We achieved 8 times the performance with exactly the same 
> hardware. What the hell is this idiot talking about us
> relying on hardware? He is a moron. You will do everyone
> a favour if you just bounce him off the bottom of your
> killfile.
> ...
> [profanity-laced responses deleted]
> ...

It's about time to take this discussion private, guys.  It has long
since stopped being on-topic for hacking PostgreSQL internals.

Darren



Re: [HACKERS] suggestions to improve postgresql suitability for data-mining

2003-07-22 Thread Darren King

> You want to process all invoices to count them
> and to sum up the amounts on a per month/area/type
> basis. The initial data size is in GB, but the
> size of the expected result is in KB (namely 2 data
> for each 100 areas * 12 months * 4 types).

The key to handling large datasets for data mining is pre-aggregation based on the 
smallest time frame needed for details.

I'd suggest running these large queries and storing the results in other tables, and 
then writing a set of functions to work with those aggregate tables.

No sense in summing up the same set of static data more than once if you can help it.
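
A sketch of that pre-aggregation approach (table and column names are
made up for illustration):

```sql
-- One pass over the multi-GB invoice table, stored for reuse.
CREATE TABLE invoice_summary AS
SELECT date_trunc('month', invoice_date) AS month,
       area, type,
       count(*)    AS invoice_count,
       sum(amount) AS total_amount
FROM invoices
GROUP BY 1, 2, 3;

-- Later reports hit the small summary, not the detail table.
SELECT month, sum(total_amount)
FROM invoice_summary
WHERE area = 42
GROUP BY month;
```

The summary is only a few KB (100 areas * 12 months * 4 types), so every
report after the first scans kilobytes instead of gigabytes.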

Darren



Re: [HACKERS] Index location patch for review

2001-09-14 Thread Darren King

> > Attached is a patch that adds support for specifying a location  for
> > indexes via the "create database" command.
> > 
> > I believe this patch is complete, but it is my first .
> 
> This patch allows index locations to be specified as 
> different from data locations.  Is this a feature direction
> we want to go in?  Comments?

Having the table and index on separate drives can do wonders for i/o
performance. :)
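
For what it's worth, later PostgreSQL releases addressed this more
generally with tablespaces rather than a CREATE DATABASE option; a
sketch, assuming the two paths live on physically separate drives:

```sql
-- Directories assumed to be on separate spindles.
CREATE TABLESPACE data_disk  LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE index_disk LOCATION '/mnt/disk2/pgidx';

-- Heap reads and index reads now go to different devices.
CREATE TABLE orders (id int, total numeric) TABLESPACE data_disk;
CREATE INDEX orders_id_idx ON orders (id) TABLESPACE index_disk;
```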

darrenk





RE: [HACKERS] Performance TODO items

2001-07-30 Thread Darren King

> 3)  I am reading the Solaris Internals book and there is mention of a
> "free behind" capability with large sequential scans.  When a large
> sequential scan happens that would wipe out all the old cache entries,
> the kernel detects this and places its previous pages first
> on the free list.  For our code, if we do a sequential scan of a table
> that is larger than our buffer cache size, I think we should detect
> this and do the same.  See http://techdocs.postgresql.org for my
> performance paper for an example.
>
> New TODO entries are:
>
>   * Order duplicate index entries by tid
>   * Add queue of backends waiting for spinlock
>   * Add free-behind capability for large sequential scans

So why do we cache sequentially-read pages?  Or at least, why not have
an option to control it?

Oracle (to the best of my knowledge) does NOT cache pages read by a
sequential scan, for at least two reasons/assumptions (two being
all that I can recall):

1. Caching pages for sequential scans over sufficiently large tables
will just cycle the cache.  The pages that will be cached at the end of
the query will be the last N pages of the table, so when the same
sequential query is run again, the scan from the beginning of the table
will start flushing the oldest cached pages which are more than likely
going to be the ones that will be needed at the end of the scan, etc,
etc.  In a multi-user environment, the effect is worse.

2. Concurrent or consecutive queries in a dynamic database will not
generate plans that use the same sequential scans, so they will tend to
thrash the cache.

Now there are some databases where the same general queries are run time
after time and caching the pages from sequential scans does make sense,
but in larger, enterprise-type systems, indices are created to help
speed up the most used queries and the sequential cache entries only
serve to clutter the cache and flush the useful pages.

Is there any way that caching pages read in by a sequential scan could
be made a configurable-option?

Any chance someone could run pgbench on a test system set up to not
cache sequential reads?

Darren





RE: [HACKERS] What I do with PostgreSQL

2001-07-17 Thread Darren King

>> I am pumping about 200gb a week through the pg database,
>> and our estimated database size is something like 4tb by
>> the end of the year.
>
> Can anyone say 'Woof!'?

Amen, Lamar.  I was trying to think of something myself besides
'Wow!'...

As a side note, there's a blurb in the July 16, 2001 Interactive Week
about the MySQL AB vs NuSphere spat and the last paragraph of the
article casts a very favorable nod towards PostgreSQL.

I quote (any typos are mine) ...

"Analysts said MySQL must find a way to generate a development community
and support if it wants to compete with another open source database,
PostgreSQL, distributed by Red Hat and Great Bridge."

Article doesn't say who the "analysts" are, but the implication that
MySQL isn't up to competing with PostgreSQL was interesting to my eyes!
:)

Darren





RE: [HACKERS] Re: Re: "--tuning" compile and runtime option (?)

2001-04-10 Thread Darren King

> I can't see how any configure option would be faster or
> better than the existing command line /config file parameters
> -- it would only serve to make things harder to deal with IMHO.
> "Tuning" PostgreSQL is pretty simple, and is explained pretty
> well throughout the manual (especially in the section titled
> "Understanding Performance"). We have -S -B and the fsync
> options,  That's about it..  Voice> --- right?
> All are explained in the manual and are as easy to use as anyone could
> ask...  Any OS tuning should be left up to the administrator
> as that's what administrators are for :-)

Someday, maybe other parameters (such as some of the optimizer's default
values) will be configurable/changeable on the fly.  It would seem silly
to have to stop/start the server to change those.

Just an example, I'm sure there are other internal values that admins
would like to tweak without recompiling or restarting.  Would be much
nicer to login as the superuser to do it rather than shut down my site.
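
This is in fact where things eventually went: later releases expose many
settings through SET/SHOW and a reloadable configuration file.  A sketch
of the kind of on-the-fly tweaking meant here (ALTER SYSTEM and
pg_reload_conf arrived much later than this post):

```sql
-- Inspect and change an optimizer default without a restart.
SHOW random_page_cost;
SET random_page_cost = 2.0;               -- this session only

ALTER SYSTEM SET random_page_cost = 2.0;  -- persisted across restarts
SELECT pg_reload_conf();                  -- apply without shutting down
```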

darrenk





RE: [HACKERS] 7.1 pg_dump fails for user-defined types (release stopper?)

2001-03-30 Thread Darren King

> A more promising idea is to hack function creation
> so that the OID assigned to the function is lower
> than the OIDs assigned to any shell types created
> when the function is defined.  Or we could try to
> hack pg_dump to fix this, but that doesn't seem
> appetizing.

Requiring OID ordering would open up a new can of worms.

What happens if the user does a drop/create on the function after
creating it?

The function could potentially be recreated with a higher OID and then
the user would be in the same situation.

If the system requires (or works around) creation ordering when creating
functions and types, ISTM that pg_dump should have to do the same.

darrenk





RE: [HACKERS] Bug in CREATE OPERATOR

2000-12-20 Thread Darren King

>
> CREATE OPERATOR testbit (
> leftarg = bitset,
> rightarg = int4,
> procedure = testbit,
> commutator = testbit
> );
>
> Now we have a big problem, as the DROP OPERATOR command
> cannot delete the illegally named operator.

Have you tried deleting it directly from pg_operator instead of using
DROP OPERATOR?
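
A sketch of that direct-catalog deletion (superuser only, and risky in
general since no dependency checks are performed; the operator name is
taken from the example above):

```sql
-- Remove the mis-created operator row directly from the catalog.
-- Back up pg_operator, or wrap this in a transaction, before trying it.
DELETE FROM pg_operator WHERE oprname = 'testbit';
```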

Darren




RE: [HACKERS] Hash index on macaddr -> crash

2000-12-08 Thread Darren King

> We could fix this either by adding a new hash function to support
> macaddr, or by removing the pg_amXXX entries that claim macaddr is
> hashable.  Either change will not take effect without an initdb,
> however, and I'm loath to force one now that we've started beta.

How about creating an SQL statement that will make the change and
putting a blurb about it in the README, INSTALL and/or FAQ?

This wouldn't require an initdb and would let people have the fix.

For things like this that update existing fields (vs adding/deleting
fields hard-wired for use in the backend), it should work, no?

Darren