Re: [GENERAL] [HACKERS] [pgsql-advocacy] Need concrete Why Postgres

2003-08-22 Thread Andrew Dunstan


Shridhar Daithankar wrote:

BTW any comments on storing an entire database in single file? I don't trust 
any file system for performance and data integrity if I have single 100GB file. 
I would rather have multiple of them..

I don't see why not. Entire file systems are stored within a single file 
sometimes. Examples: vmware, and IIRC UserMode Linux.

cheers

andrew

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] [HACKERS] [pgsql-advocacy] Need concrete Why Postgres

2003-08-22 Thread Shridhar Daithankar
On Friday 22 August 2003 13:59, Andrew Dunstan wrote:
 Shridhar Daithankar wrote:
 BTW any comments on storing an entire database in single file? I don't
  trust any file system for performance and data integrity if I have single
  100GB file. I would rather have multiple of them..

 I don't see why not. Entire file systems are stored within a single file
 sometimes. Examples: vmware, and IIRC UserMode Linux.

Well, in the half day I have spent on the Interbase documentation, I didn't see 
any WAL-type logs. If transactions go directly to the database and the entire 
database is one file, I have serious doubts about performance and recovery.

UML and VMware are emulators. You don't want to use them in production, right?

I would really love it if UML allowed access to filesystems: a jail-type feature 
including access control, right in memory. That would really rock, but it's a 
different story..

 Shridhar


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] [HACKERS] [pgsql-advocacy] Need concrete 'Why Postgres

2003-08-22 Thread Andrew Dunstan
 On Friday 22 August 2003 13:59, Andrew Dunstan wrote:
 Shridhar Daithankar wrote:
 BTW any comments on storing an entire database in single file? I
 don't
  trust any file system for performance and data integrity if I have
  single 100GB file. I would rather have multiple of them..

 I don't see why not. Entire file systems are stored within a single
 file sometimes. Examples: vmware, and IIRC UserMode Linux.

 Well, half the day that I have spent on interbase documnetation, I
 didn't see  any WAL type logs. If transactions directly go to database
 and entire  database is file, I seriously doubt about performance and
 recovery.


I am not saying I would do it that way, merely that I could see it
working. I agree about logs, though. I could see it working as 2 files,
one for the base db and one for the log.

 UML and VMware are emulators. You don't want to use them in production
 right?


I know companies using VMware extensively. It makes some sense in a 
multi-platform environment. If it had lots of corruption problems people 
wouldn't use it. (Personally I prefer to use VNC in such an environment).

(Interesting as this is, it's probably OT for -hackers, though.)

cheers

andrew



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] [SQL] SELECT IN Still Broken in 7.4b

2003-08-22 Thread Shridhar Daithankar
On 21 Aug 2003 at 16:42, Tom Lane wrote:

 Stephan Szabo [EMAIL PROTECTED] writes:
  Within the scope of the new hashed IN stuff I believe so in at least some
  cases.  I have a few million row table of integers where searching for
  values IN (~1 values) takes longer than creating the temp table,
  copying into it and doing the in subquery.
 
 I did some profiling and soon realized that the main problem is the
 executor's trick for not returning the same row multiple times in a
 multiple-indexscan plan node.  The point is that given
   WHERE a = 1 OR b = 1
 you could create a plan that first indexscans on a, then indexscans on
 b --- but you mustn't return any tuples in the second scan that you
 already returned in the first.  IndexNext solves this by evaluating the
 prior-scan index conditions to see if they are true.  Which is okay if
 there aren't too many of them.  But when you have an N-element IN list
 this means you are going to do O(N^2) index expression evaluations.
 In the 1-element IN-list test case, ExecQual gets called almost
 50 million times :-(
 
 I'm toying with the notion of ripping out that logic and instead
 building an in-memory hashtable of already-returned TIDs.  This could
 theoretically run out of memory if the multiple indexscan returns enough
 tuples, but I think in practice that wouldn't happen because the planner
 wouldn't choose an indexscan when very large numbers of tuples are being
 selected.
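
For illustration, the already-returned-TID bookkeeping described above could be 
sketched with the backend's dynahash routines roughly like this. This is not the 
actual executor code, and it borrows the present-day dynahash flags (HASH_BLOBS) 
for brevity; it only shows the idea of an O(1) duplicate check per tuple:

#include "postgres.h"
#include "storage/itemptr.h"
#include "utils/hsearch.h"

/* Illustrative sketch only: remember heap TIDs already returned by an
 * earlier indexscan of the same OR plan, so later scans can skip them
 * with one hash probe instead of re-evaluating all prior index conditions. */
static HTAB *returned_tids = NULL;

static void
init_returned_tids(void)
{
	HASHCTL		ctl;

	MemSet(&ctl, 0, sizeof(ctl));
	ctl.keysize = sizeof(ItemPointerData);
	ctl.entrysize = sizeof(ItemPointerData);
	returned_tids = hash_create("returned TIDs", 1024, &ctl,
								HASH_ELEM | HASH_BLOBS);
}

/* Returns true the first time a given TID is seen, false for duplicates. */
static bool
tid_is_new(ItemPointer tid)
{
	bool		found;

	(void) hash_search(returned_tids, tid, HASH_ENTER, &found);
	return !found;
}

Such a table grows only with the number of tuples actually returned, which is why 
the planner's reluctance to choose an indexscan for huge result sets acts as the 
safety valve mentioned above.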

If I understand it correctly, we are essentially looking at ~1 individual 
index scans in the above example, aren't we? So if the planner takes this into account, 
it probably would not opt for a seq. scan.

Another idea could be to index the IN list first and search the index using the 
locality of values in the IN list. If the 10K values in the list are between 
10K and 20K, they should be pretty easy to locate with a single index sweep, assuming 
uniform distribution. Maybe the planner could build this IN-list index before 
drawing up the plan, to determine the cardinality and distribution of the individual values.

At the very least, it could draw up a plan that fetches the tuple range between the min 
and max of the IN values and builds an in-memory hash of those values for 
comparison. That would surely be cheaper than scanning the entire table/result 
set over and over...

Frankly, I would say a temp table is a far better solution.. :-)

Just a thought...

Bye
 Shridhar

--
Youth doesn't excuse everything.-- Dr. Janice Lester (in Kirk's body), 
Turnabout Intruder,  stardate 5928.5.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] [SQL] SELECT IN Still Broken in 7.4b

2003-08-22 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 Finally, I suspect that once we get rid of the O(N^2) behavior in the
 executor, we will find that the next biggest bottleneck is in the
 planner; adding more work for it to do per OR-clause item will make
 things worse not better.

But time spent in the planner doesn't matter if you're using prepared queries, 
which is typical for OLTP applications, where a small number of distinct queries are 
executed many, many times. I hear web sites are quite popular these days.

I missed the beginning of this thread so I'm not sure if it's relevant. But 
wouldn't it be possible to just check all the clauses and see if they're 
using precisely the same index with an equality-type operator? That won't 
catch things like a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6, but it will catch 
things like a IN (1,2,3,4,5,6). And I don't see how it wouldn't be linear 
in the number of clauses.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] Implementing FQA Indexes (in Italian) -- Help?

2003-08-22 Thread Josh Berkus
Folks,

Howdy!   Meet Enrico, who will soon be subscribing to the list.

Enrico wants to implement a new kind of indexing on PostgreSQL, and will do 
the programming, provided that he can get some help with the PostgreSQL code.   
The catch is that he would do better if someone could help who knew some 
Italian, since English is not Enrico's first language.   I do, but I know 
very little about our indexing code -- plus I'm out of town for a week and a 
half.

Let me post Enrico's proposal, as translated by Stefano:

In pratica si tratta di fare ricerche di similarità non esatta, ovvero se ho
un db contente immagini si tratterebbe di fare una query sull'immagine per
poter vedere qual'è quella più simile rispetto all'immagine passata come
riferimento per la query.(Ad es. ricerca di impronte digitali, sequenze di
dna etc...)

It's about making queries for proximity, that is, if I have a db 
containing images, [and if I make a query], it should return the most 
similar image to the one chosen for the query. E.g., for fingerprints, 
DNA sequences, etc.

In inglese si dovrebbe tradurre come qualcosa del tipo Proximity search in
metric spaces, la struttura che ho utilizzato io è la FQA (Fixed Query
Array) ideata dal prof Edgar Chavez dell'università di Città el Messico.
Quello che ho realizzato io insieme ai prof. Marco Patella e Paolo Ciaccia
dell'univ. di Bologna è stata la realizzazione di un algoritmo ottimale per
la ricerca del oggetto più vicino a all'oggetto query, quello che in gergo
viene chiamato Neirest Neighboor (N.N.), e di un algoritmo approssimato
sempre per il calcolo del N.N.

In English it should be translated as something like Proximity search in metric spaces; 
the structure I used is the FQA (Fixed Query Array), created by prof. Edgar 
Chavez, from the university of [Ciudad del Mexico]. What I've done along 
with prof. Marco Patella and prof. Paolo Ciaccia from the university of 
Bologna [one of the oldest and most important universities of Italy] is an 
optimal algorithm that searches for the object nearest to the given query object, a 
thing called Nearest Neighbor (N.N.), and an approximate algorithm, also for 
computing the N.N.

Il tutto è stato realizzato con file testuali (le immagini venivano
convertite in stringhe numeriche a n dimensioni una per ogni caratteristica,
ad es. forma, colore etc..)
Ora siccome ho visto che postgresql già implementa gli R-Tree per la ricerca
multidimensionale in ambiente multimediale l'idea era quella di implentare
le FQA su un server database vero.

It was done using text files (images were converted to n-dimensional 
numeric strings, one dimension for each feature, e.g. shape, color, etc.).
As PostgreSQL already uses R-Trees for multidimensional searches in 
multimedia environments, my idea was to implement FQAs on a real 
database server.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [SQL] SELECT IN Still Broken in 7.4b

2003-08-22 Thread Stephan Szabo
On 22 Aug 2003, Greg Stark wrote:


 Tom Lane [EMAIL PROTECTED] writes:

  Finally, I suspect that once we get rid of the O(N^2) behavior in the
  executor, we will find that the next biggest bottleneck is in the
  planner; adding more work for it to do per OR-clause item will make
  things worse not better.

 But time spent in the planner doesn't matter if you're using prepared queries
 which is true for OLTP applications where there are very few queries being
 executed many many times. I hear web sites are quite popular these days.

True, but not every application is such.  You still need to balance
planning time with other concerns.

 I missed the beginning of this thread so I'm not sure if it's relevant. But
 wouldn't it be possible to just check if all the clauses and see if they're
 using precisely the same index with an equality type operator? That won't
 catch things like a BETWEEN 1 AND 2 OR a BETWEEN 5 AND 6 but it will catch
 things like a IN (1,2,3,4,5,6). And I don't see how it wouldn't be linear
 in the number of clauses.

But AFAICT that wouldn't help unless you also made sure that the values being 
compared were not the same (since the point would be to not need to check whether 
the rows were already returned), and a=1 OR a=1 is legal, if meaningless. 
And that's not necessarily immediately obvious, depending on the behavior 
of the = operator, without trying it; for example, does WHERE a='c ' OR 
a='c' have a redundancy or not?


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Josh Berkus
Guys,

 BTW any comments on storing an entire database in single file? I don't
  trust any file system for performance and data integrity if I have single
  100GB file. I would rather have multiple of them..

 I don't see why not. Entire file systems are stored within a single file
 sometimes. Examples: vmware, and IIRC UserMode Linux.

Several database systems use a single file for data storage.   The problem 
with this is that it's not really a single file -- it's a proprietary file 
system on top of the host file system.   This sort of design makes a couple of 
assumptions:

1) That the database is better than the host filesystem/OS and storage system 
at regulating its use of, and storage of, data files;
2) that your data file will not exceed the maximum file size for the host OS.

Both of these assumptions are, IMHO, based on antiquated data (or on Windows).  
Modern *nix filesystems and RAID are very, very efficient at file access and 
only a database with the development budget of Oracle could hope to keep up.  
Additionally, databases larger than 2GB are becoming increasingly common.

Single-file databases also introduce a number of problems:

1) The database file is extremely vulnerable to corruption, and if corruption 
occurs it is usually not localized but destroys the entire database due to 
corruption of the internal file structure.  Recovery of raw data out of a 
damaged single-file database inevitably requires specialized tools if it is 
possible at all.
2) Often DBAs are prevented from using normal file operations on the database 
files to maintain their systems.  For example, try moving a partition on an 
MS SQL Server installation.  Go on, I double-dog dare you.
3) Due to the necessity of maintaining not only data and metadata, but a file 
partitioning structure as well, maintenance on single-file databases is often 
more time-consuming but at the same time more crucial (to prevent #1).

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Jan Wieck
Some well-known database that is very popular amongst people who care 
more for their data than for license fees uses a few very big files that 
are statically allocated (if using files instead of raw devices).

Sure, Oracle does internally maintain some sort of filesystem. But this 
is more due to other reasons.

If a filesystem contains only very few big files (and nothing else) and 
these files do not grow or shrink during normal operation and are really 
fully allocated in the block tables, then said filesystem's metadata does 
not change and that means that the filesystem will never ever be corrupt 
from the OS's point of view (except due to hardware failure). Plus, an 
FSCK on a filesystem with very few huge files is fast, really *fast*. So 
in the case of an OS crash, your system is up in no time again, no 
matter how big your database is.

From there the DB itself maintains its own metadata and has control 
with its WAL and other mechanisms over what needs to be redone, undone 
and turned around to get back into a consistent state.

Jan

Josh Berkus wrote:

Guys,

BTW any comments on storing an entire database in single file? I don't
 trust any file system for performance and data integrity if I have single
 100GB file. I would rather have multiple of them..
I don't see why not. Entire file systems are stored within a single file
sometimes. Examples: vmware, and IIRC UserMode Linux.
Several database systems use a single file for data storage.   The problem 
with this is that it's not really a single file  it's a proprietary file 
system on top of the host file system.   This sort of design makes a couple 
assumptions:

1) That the database is better than the host filesystem/OS and storage system 
at regulating its use of, and storage of, data files;
2) that your data file will not exceed the maximum file size for the host OS.

Both of these assumptions are, IMHO, based on antiquated data (or on Windows).  
Modern *nix filesystems and RAID are very, very efficient at file access and 
only a database with the development budget of Oracle could hope to keep up.  
Additionally, databases larger than 2GB are becoming increasingly common.

Single-file databases also introduce a number of problems:

1) The database file is extremely vulnerable to corruption, and if corruption 
occurs it is usually not localized but destroys the entire database due to 
corruption of the internal file structure.  Recovery of raw data out of a 
damaged single-file database inevitably requires specialized tools if it is 
possible at all.
2) Often DBAs are prevented from using normal file operations on the database 
files to maintain their systems.  For example, try moving a partition on an 
MS SQL Server installation.  Go on, I double-dog dare you.
3) Due to the necessity of maintaining not only data and metadata, but a file 
partitioning structure as well, maintenance on single-file databases is often 
more time-consuming but at the same time more crucial (to prevent #1).



--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Andrew Rawnsley
On Friday, August 22, 2003, at 12:07 PM, Josh Berkus wrote:

Single-file databases also introduce a number of problems:

1) The database file is extremely vulnerable to corruption, and if 
corruption
occurs it is usually not localized but destroys the entire database 
due to
corruption of the internal file structure.  Recovery of raw data out 
of a
damaged single-file database inevitably requires specialized tools if 
it is
possible at all.

snip

Having fallen victim to Oracle crapping in its own nest and doing this 
exact thing, and having to drop some stupid amount of $$ to Oracle for 
them to use their specialized tool to try to recover data (which they 
really didn't do much of), I concur with this statement.

Boy, was that a lousy experience.

--
Josh Berkus
Aglio Database Solutions
San Francisco
---(end of 
broadcast)---
TIP 1: subscribe and unsubscribe commands go to 
[EMAIL PROTECTED]



Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Josh Berkus
Jan,

 If a filesystem contains only very few big files (and nothing else) and 
 these files do not grow or shrink during normal operation and are really 
 fully allocated in the block tables, then said filesystems metadata does 
 not change and that means that the filesystem will never ever be corrupt 
 from the OS's point of view (except due to hardware failure). Plus, an 
 FSCK on a filesystem with very few huge files is fast, really *fast*. So 
 in the case of an OS crash, your system is up in no time again, no 
 matter how big your database is.

I'm not talking about problems with the host filesystem.  I'm talking about 
problems with the data file itself.   From my perspective, the length of time 
it takes to do an FSCK is inconsequential, because I do one maybe once every 
two years.  

It does you little good, though, to have the host OS reporting that the files 
are OK, when the database won't run.

  From there the DB itself maintains it's own metadata and has control 
 with it's WAL and other mechanisms over what needs to be redone, undone 
 and turned around to get back into a consistent state.

Yes, but you've just added a significant amount to the work the DB system 
needs to do in recovery.   PostgreSQL just needs to check for, and recover 
from, issues with LSN headers and transactions.   Single-file DBs, like SQL 
Server, need to also check and audit the internal file partitioning.

In my experience (a lot of MS SQL, more MS Access than I want to talk about, 
and a little Oracle) corruption failures on single-file databases are more 
frequent than on databases which depend on the host OS, and such failures are 
much more severe when they occur.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Mike Mascari
Josh Berkus wrote:

 Jan,

 In my experience (a lot of MS SQL, more MS Access than I want to talk about, 
 and a little Oracle) corruption failures on single-file databases are more 
 frequent than databases which depend on the host OS, and such failures are 
 much more severe when the occur.


Vadim seemed to think differently:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=00030722102200.00601%40lorc.wgcr.org&rnum=9&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26q%3DVadim%2Bsingle%2Bfile%2Bpostgres

In addition to Jan's points, using a single pre-allocated file also
reduces file descriptor consumption, although I don't know what the
costs are regarding maintaining the LRU of file descriptors, the price
of opens and closes, the price of having a high upper limit of file
descriptors, etc.

Just because Oracle and MS do something doesn't necessarily make it 
wrong. :-)


Mike Mascari
[EMAIL PROTECTED]









---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Maksim Likharev
Just want to add my $0.03 on that.
A single-file DB has some benefits, such as:
1. The ability to allocate one continuous chunk on disk, which
   significantly reduces disk seeks. Mostly ancestral, but still true
   today (I still see DBs living on non-RAID storage).
2. And of course countless design considerations.

Saying that corruption in a single-file DB is more severe is probably not
true, because one way or another you have to restore from backup, unless
you can yank the corrupted table/index/etc. file from some place else.

And more important, I have no idea how Oracle does it, but for MS SQL the
transaction log is a separate file, so if you lose the transaction log...
that is another story, but nothing to do with single-file vs. multi-file dbs.

And as for SQL Server, the DB is usually 2 files, db and log, which is very,
very useful in handling: I can just copy or zip one file, or attach/detach
the db.

Anyway, there is no black and white here; different designs have different
limitations. For instance, on Windows it is very reasonable, and I would say
correct, to have a single db file. On *nix it is difficult (using POSIX
calls) to have a single file, due to the 2GB limit on 32-bit CPUs, and so on...
Choices, choices...


-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED]
Sent: Friday, August 22, 2003 1:05 PM
To: Jan Wieck
Cc: Andrew Dunstan; PostgreSQL-development
Subject: Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres



Yes, but you've just added a significant amount to the work the DB
system 
needs to do in recovery.   PostgreSQL just needs to check for, and
recover 
from, issues with LSN headers and transactions.   Single-file DBs, like
SQL 
Server, need to also check and audit the internal file partitioning.

In my experience (a lot of MS SQL, more MS Access than I want to talk
about, 
and a little Oracle) corruption failures on single-file databases are
more 
frequent than databases which depend on the host OS, and such failures
are 
much more severe when the occur.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Collation rules and multi-lingual databases

2003-08-22 Thread Greg Stark

So, I needed a way to sort using collation rules other than the one the
database was built with. So I wrote up the following function exposing strxfrm
with an extra parameter to specify the LC_COLLATE value to use.

This is my first C function so I'm really unsure that I've done the right
thing. For the most part I pattern-matched off the string_io code in the
contrib directory. 

In particular I'm unsure about the postgres-interfacing code in 
c_varcharxfrm which makes an extra copy of both parameters that are passed in 
and an extra copy of the result value. Are varchars guaranteed to be 
nul-terminated? If so I can dispose of two of the copies. And I can probably 
eliminate the copying of the result by allotting extra space when I allocate it 
initially.

But more generally. Would it make more sense to use text or bytea or something
else to store these opaque binary strings? At least with glibc they tend to be
unreadable anyways.

Other caveats: It's condemned to be permanently non-threadsafe because the
whole locale system is a non thread-safe API. Also I fear some systems will
leak memory like a sieve when you call setlocale a few thousand times instead
of the 1 time at initialization that they foresaw. At least glibc doesn't seem
to leak in my brief testing.

If it's deemed a reasonable approach and nobody has any fatal flaws then I
expect it would be useful to put in the contrib directory?


/*
 * This software is distributed under the GNU General Public License
 * either version 2, or (at your option) any later version.
 */

#include "postgres.h"

#include <locale.h>

#include "utils/builtins.h"

static 
unsigned char * xfrm(unsigned char *data, int size, const unsigned char *locale, int localesize);

unsigned char * c_varcharxfrm(unsigned char *s, const unsigned char *locale);


static unsigned char *
xfrm(unsigned char *data, int size, const unsigned char *locale, int localesize)
{
  size_t length = size*3+4;
  char *transformed;
  size_t transformed_length;
  char *oldlocale, *newlocale;
 
  /* First try a buffer perhaps big enough.  */
  transformed = palloc (length);
 
  oldlocale = setlocale(LC_COLLATE, NULL);
  if (!oldlocale) {
    elog(ERROR, "setlocale(LC_COLLATE,NULL) failed to return a locale");
    return NULL;
  }

  newlocale = setlocale(LC_COLLATE, locale);
  if (!newlocale) {
    elog(ERROR, "setlocale(LC_COLLATE,%s) failed to return a locale", locale);
    return NULL;
  }

  transformed_length = strxfrm (transformed, data, length);

  /* If the buffer was not large enough, resize it and try again.  */
  if (transformed_length >= length) {
    elog(INFO, "Calling strxfrm again because result didn't fit (%d >= %d)", transformed_length, length);
    length = transformed_length + 1;
    transformed = palloc(length);
    strxfrm (transformed, data, length);
  }
 
  newlocale = setlocale(LC_COLLATE, oldlocale);

  Assert(newlocale && !strcmp(newlocale,oldlocale));
  if (!newlocale || strcmp(newlocale,oldlocale)) {
    elog(ERROR, "Failed to reset locale (trying to reset locale to %s from %s, instead got %s)", oldlocale, locale, newlocale);
  }
  
  return transformed;
}


unsigned char *
c_varcharxfrm(unsigned char *s, const unsigned char *l)
{
  int lens = 0, lenl = 0, lenr = 0;
  unsigned char *str, *locale, *retval, *retval2;

  if (s) {
    lens = *(int32 *) s - 4;
    str = palloc(lens+1);
    memcpy(str, s+4, lens);
    str[lens]='\0';
  }

  if (l) {
    lenl = *(int32 *) l - 4;
    locale = palloc(lenl+1);
    memcpy(locale, l+4, lenl);
    locale[lenl]='\0';
  }

  retval = xfrm(str, lens, locale, lenl);
  
  lenr = strlen(retval);
  retval2 = palloc(lenr+5);
  memcpy(retval2+4, retval, lenr+1);
  *(int32 *)retval2 = lenr + 4;	/* varlena length includes the 4-byte header */
  
  return retval2;
}





/*
 * Local Variables:
 *	tab-width: 4
 *	c-indent-level: 4
 *	c-basic-offset: 4
 * End:
 */
SET search_path = public;

SET autocommit TO 'on';

CREATE OR REPLACE FUNCTION xfrm(varchar, varchar)
RETURNS varchar
AS 'strxfrm.so', 'c_varcharxfrm'
LANGUAGE 'C' STRICT IMMUTABLE ;



-- 
greg

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Josh Berkus
Mike,
 Vadim seemed to think differently:
 
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=00030722102200.00601%40lorc.wgcr.org&rnum=9&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26q%3DVadim%2Bsingle%2Bfile%2Bpostgres

Bad link.  This gives me a post by Lamar Owen talking about using OIDs to name 
files.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Collation rules and multi-lingual databases

2003-08-22 Thread Peter Eisentraut
Greg Stark writes:

 This is my first C function so I'm really unsure that I've done the right
 thing. For the most part I pattern-matched off the string_io code in the
 contrib directory.

That was just about the worst example you could have picked.  Please
forget everything you have seen and start by reading the documentation.
In particular, learn about the version 1 call convention and about the
PostgreSQL license.  And read some code under src/backend/utils/adt.
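
For reference, a bare-bones version-1 function looks roughly like this. It is a 
generic length-of-text example, not a rewrite of Greg's function; the point is 
that arguments and results go through the fmgr macros rather than raw pointers:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(my_textlen);

/* Minimal version-1 example: the argument is fetched with a PG_GETARG
 * macro (detoasted, NOT null-terminated) and the result is returned
 * with a PG_RETURN macro instead of a bare char pointer. */
Datum
my_textlen(PG_FUNCTION_ARGS)
{
	text	   *arg = PG_GETARG_TEXT_P(0);

	PG_RETURN_INT32(VARSIZE(arg) - VARHDRSZ);
}

The varlena header handled by VARSIZE/VARHDRSZ is the same 4-byte length field 
that the hand-rolled code above manipulates with *(int32 *) casts.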

 In particular I'm unsure about the code postgres-interfacing code in
 c_varcharxfrm which makes an extra copy of both parameters that are passed in
 and an extra copy of the result value. Are varchars guaranteed to be
 nul-terminated?

They are guaranteed not to be null-terminated.

 But more generally. Would it make more sense to use text or bytea or something
 else to store these opaque binary strings? At least with glibc they tend to be
 unreadable anyways.

bytea

 If it's deemed a reasonable approach and nobody has any fatal flaws then I
 expect it would be useful to put in the contrib directory?

I'd expect it to be too slow to be useful.  Have you run performance tests?

-- 
Peter Eisentraut   [EMAIL PROTECTED]


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Collation rules and multi-lingual databases

2003-08-22 Thread Stephan Szabo

On 22 Aug 2003, Greg Stark wrote:


 So, I needed a way to sort using collation rules other than the one the
 database was built with. So I wrote up the following function exposing strxfrm
 with an extra parameter to specify the LC_COLLATE value to use.

 This is my first C function so I'm really unsure that I've done the right
 thing. For the most part I pattern-matched off the string_io code in the
 contrib directory.

 In particular I'm unsure about the code postgres-interfacing code in
 c_varcharxfrm which makes an extra copy of both parameters that are passed in
 and an extra copy of the result value. Are varchars guaranteed to be
 nul-terminated? If so I can dispose of two of the copies. And I can probably
 eliminate the copying of the result by alloting extra space when I allocate it
 initially.

 But more generally. Would it make more sense to use text or bytea or something
 else to store these opaque binary strings? At least with glibc they tend to be
 unreadable anyways.

 Other caveats: It's condemned to be permanently non-threadsafe because the
 whole locale system is a non thread-safe API. Also I fear some systems will
 leak memory like a sieve when you call setlocale a few thousand times instead
 of the 1 time at initialization that they foresaw. At least glibc doesn't seem
 to leak in my brief testing.

 If it's deemed a reasonable approach and nobody has any fatal flaws then I
 expect it would be useful to put in the contrib directory?

I'm not sure that ERROR if the locale cannot be put back is sufficient 
(although that case should be rare or non-existent). Unless something else 
in the system resets the locale, after your transaction rolls back, you're 
in a dangerous state.  I'd think FATAL would be better.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Collation rules and multi-lingual databases

2003-08-22 Thread Tom Lane
Stephan Szabo [EMAIL PROTECTED] writes:
 On 22 Aug 2003, Greg Stark wrote:
 If it's deemed a reasonable approach and nobody has any fatal flaws then I
 expect it would be useful to put in the contrib directory?

 I'm not sure that ERROR if the locale cannot be put back is sufficient
 (although that case should be rare or non-existant).

A bigger risk is that something might elog(ERROR) while you have the
wrong locale set, denying you the chance to put back the right one.
I think this code is not nearly paranoid enough about how much it does
while the wrong locale is set.

 Unless something else
 in the system resets the locale, after your transaction rolls back, you're
 in a dangerous state.  I'd think FATAL would be better.

I'd go so far as to make it a critical section --- that ensures that any
ERROR will be turned to FATAL, even if it's in a subroutine you call.
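
A rough sketch of that idea, assuming the existing START_CRIT_SECTION/END_CRIT_SECTION 
macros from miscadmin.h (per the comments on those macros, an error raised inside the 
section escalates rather than unwinding normally, so the backend can never be left 
running under the wrong LC_COLLATE); the helper name here is made up for illustration:

#include "postgres.h"
#include "miscadmin.h"		/* START_CRIT_SECTION / END_CRIT_SECTION */
#include <locale.h>
#include <string.h>

/* Sketch only: keep the window in which the backend runs under a
 * temporary LC_COLLATE inside a critical section, so nothing can throw
 * a plain ERROR in between and leave the wrong locale in effect. */
static size_t
xfrm_in_locale(char *dst, const char *src, size_t dstlen, const char *collate)
{
	char	   *oldlocale = pstrdup(setlocale(LC_COLLATE, NULL));
	size_t		result;

	START_CRIT_SECTION();
	setlocale(LC_COLLATE, collate);
	result = strxfrm(dst, src, dstlen);
	setlocale(LC_COLLATE, oldlocale);
	END_CRIT_SECTION();

	pfree(oldlocale);
	return result;
}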

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Collation rules and multi-lingual databases

2003-08-22 Thread Stephan Szabo
On Fri, 22 Aug 2003, Tom Lane wrote:

 Stephan Szabo [EMAIL PROTECTED] writes:
  On 22 Aug 2003, Greg Stark wrote:
  If it's deemed a reasonable approach and nobody has any fatal flaws then I
  expect it would be useful to put in the contrib directory?

  I'm not sure that ERROR if the locale cannot be put back is sufficient
  (although that case should be rare or non-existant).

 A bigger risk is that something might elog(ERROR) while you have the
 wrong locale set, denying you the chance to put back the right one.
 I think this code is not nearly paranoid enough about how much it does
 while the wrong locale is set.

True, there are calls to palloc, elog, etc inside there, although the elog
could be removed.

  Unless something else
  in the system resets the locale, after your transaction rolls back, you're
  in a dangerous state.  I'd think FATAL would be better.

 I'd go so far as to make it a critical section --- that ensures that any
 ERROR will be turned to FATAL, even if it's in a subroutine you call.

I didn't know we could do that; it could be handy, although the comments 
imply that it turns into a PANIC, which would force a complete restart.  Then 
again, it's better than a corrupted database.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] strerror_r and gethostbyname_r?

2003-08-22 Thread Larry Rosenman
How can I --enable-thread-safety and JUST need the getpwuid_r function and
not the strerror_r and gethostbyname_r functions?
UnixWare doesn't have strerror_r and gethostbyname_r, and we DON'T NEED THEM!

Please help me fix this.

Also, Olivier is right, there is an issue with the unixware template.

the below fixes that issue.

(What is running the template file? Which shell, or GMAKE?)

Index: src/template/unixware
===
RCS file: /projects/cvsroot/pgsql-server/src/template/unixware,v
retrieving revision 1.15
diff -u -r1.15 unixware
--- src/template/unixware   16 Aug 2003 15:35:51 -  1.15
+++ src/template/unixware   23 Aug 2003 02:22:11 -
@@ -11,4 +11,4 @@
SUPPORTS_THREADS=yes
NEED_REENTRANT_FUNC_NAMES=yes
-THREAD_CFLAGS += -D_REENTRANT
+THREAD_CFLAGS="$THREAD_CFLAGS -D_REENTRANT"
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


unixware.patch
Description: Binary data

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Single-file DBs WAS: Need concrete Why Postgres

2003-08-22 Thread Lamar Owen
On Friday 22 August 2003 18:42, Josh Berkus wrote:
 Bad link.  This gives me a post by Lamar Owen talking about usng OIDs to
 name files.

I think he may be referring to the last paragraph.  Vadim had said that the 
tablenames would go to OIDs.  They have always been individual files.  Been a 
long time since I wrote that e-mail
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Header files installed for contrib modules?

2003-08-22 Thread Robert Creager

Hey,

Just wondering if there is currently any mechanism in the contrib makefile
hierarchy for installing header files into an appropriate directory.  I didn't
find anything.

I'm using PGSphere (spherical types/operators), and trying to help them out a
little here and there.  When I converted my application over to use their
type/operators, I found that although the shared library is installed fine,
there appears to be no method for installing the header files into the
production directory scheme.

Since the library is installed into the pgsql/lib directory, shouldn't the
header files required by server-side development be installed into
pgsql/include? Maybe not by default, but with an 'install-all-headers'-like
option?

Thanks,
Rob

-- 
 22:05:34 up 21 days, 14:47,  4 users,  load average: 2.01, 2.03, 2.00


pgp0.pgp
Description: PGP signature


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Shridhar Daithankar
On Friday 22 August 2003 16:23, Manfred Koizar wrote:
 On Fri, 22 Aug 2003 12:15:33 +0530, Shridhar Daithankar

 [EMAIL PROTECTED] wrote:
  Which leads us to a zero gravity vacuum, that does the lazy vacuum for
  pages currently available in the buffer cache only. [...]
 
 Since autovacuum issues vacuum analyze only, is it acceptable to say that
  this is taken care of already?

 Even a plain VACUUM (without FULL) scans the whole relation to find
 the (possibly few) pages that need to be changed.  We are trying to
 find a way to avoid those needless reads of clean pages, because (a)
 they are IOs competing with other disk operations and (b) they push
 useful pages out of OS cache and (c) of PG shared buffers.  The latter
 might become a non-issue with LRU-k, 2Q or ARC.  But (a) and (b)
 remain.

Umm.. what does the FSM do then? I was under the impression that the FSM stores page 
pointers and vacuum works on FSM information only. In that case, it wouldn't 
have to waste time finding out which pages to clean.

 Shridhar


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] [HACKERS] [pgsql-advocacy] Need concrete Why Postgres not MySQL bullet list

2003-08-22 Thread Shridhar Daithankar
On 21 Aug 2003 at 21:30, Manfred Koizar wrote:

 On Thu, 21 Aug 2003 15:05:52 +0200, I wrote:
 Just wondering, what other databases has transactable DDLs?
 
 Firebird.
 
 Stop!  I withdraw that statement.  I must have mis-read some feature
 list :-(
 
 Tests with InterBase 6 showed that you can change metadata within a
 transaction, but when you ROLLBACK, metadata changes persist.

Well, the isql documentation mentions that DDLs don't go to the database unless you 
commit, and the autoddl parameter defaults to true.

Looks like their definition of transactable does not include the rollback case. 
Oops!

BTW any comments on storing an entire database in single file? I don't trust 
any file system for performance and data integrity if I have single 100GB file. 
I would rather have multiple of them..

Bye
 Shridhar

--
Moore's Constant:   Everybody sets out to do something, and everybody   does 
something, but no one does what he sets out to do.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Manfred Koizar
On Fri, 22 Aug 2003 16:27:53 +0530, Shridhar Daithankar
[EMAIL PROTECTED] wrote:
What does FSM does then?

FSM = Free Space Map.  VACUUM writes information into the FSM, INSERTs
consult the FSM to find pages with free space for new tuples.

 I was under impression that FSM stores page 
pointers and vacuum work on FSM information only. In that case, it wouldn't 
have to waste time to find out which pages to clean.

This was discussed yesterday here and on -hackers ("Decent VACUUM 
(was: Buglist)").  We were talking about inventing a second data 
structure: the RSM.

Servus
 Manfred

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Shridhar Daithankar wrote:

On Friday 22 August 2003 16:23, Manfred Koizar wrote:
On Fri, 22 Aug 2003 12:15:33 +0530, Shridhar Daithankar

[EMAIL PROTECTED] wrote:
 Which leads us to a zero gravity vacuum, that does the lazy vacuum for
 pages currently available in the buffer cache only. [...]

Since autovacuum issues vacuum analyze only, is it acceptable to say that
 this is taken care of already?
Even a plain VACUUM (without FULL) scans the whole relation to find
the (possibly few) pages that need to be changed.  We are trying to
find a way to avoid those needless reads of clean pages, because (a)
they are IOs competing with other disk operations and (b) they push
useful pages out of OS cache and (c) of PG shared buffers.  The latter
might become a non-issue with LRU-k, 2Q or ARC.  But (a) and (b)
remain.
Umm.. What does FSM does then? I was under impression that FSM stores page 
pointers and vacuum work on FSM information only. In that case, it wouldn't 
have to waste time to find out which pages to clean.
It's the other way around! VACUUM scans the tables to find and reclaim 
free space and remembers that free space in the FSM. The regular 
backends that need storage space to insert tuples then use the free 
space in the pages that are recorded in the FSM instead of adding new 
pages at the end of the relations.
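
As a rough illustration of that division of labor, the insert side could be 
sketched like this (the call is the present-day freespace.h interface, used here 
only as an example; the 2003-era interface differed in detail, and vacuum's side 
records the space with the corresponding RecordPageWithFreeSpace call):

#include "postgres.h"
#include "storage/freespace.h"

/* Illustration of the insert path described above: ask the FSM for a page
 * that VACUUM previously reported as having room, and only fall back to
 * extending the relation when nothing suitable is recorded. */
static BlockNumber
pick_insert_block(Relation rel, Size tuple_size)
{
	BlockNumber blkno = GetPageWithFreeSpace(rel, tuple_size);

	if (blkno != InvalidBlockNumber)
		return blkno;			/* reuse space reclaimed by VACUUM */

	return InvalidBlockNumber;	/* caller adds a new page at the end */
}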

Jan
 Shridhar

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 Shridhar Daithankar wrote:
 Umm.. What does FSM does then? I was under impression that FSM stores page 
 pointers and vacuum work on FSM information only. In that case, it wouldn't 
 have to waste time to find out which pages to clean.

 It's the other way around! VACUUM scan's the tables to find and reclaim 
 free space and remembers that free space in the FSM.

Right.  One big question mark in my mind about these partial vacuum
proposals is whether they'd still allow adequate FSM information to be
maintained.  If VACUUM isn't looking at most of the pages, there's no
very good way to acquire info about where there's free space.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Shridhar Daithankar
On 22 Aug 2003 at 10:45, Tom Lane wrote:

 Jan Wieck [EMAIL PROTECTED] writes:
  Shridhar Daithankar wrote:
  Umm.. What does FSM does then? I was under impression that FSM stores page 
  pointers and vacuum work on FSM information only. In that case, it wouldn't 
  have to waste time to find out which pages to clean.
 
  It's the other way around! VACUUM scan's the tables to find and reclaim 
  free space and remembers that free space in the FSM.
 
 Right.  One big question mark in my mind about these partial vacuum
 proposals is whether they'd still allow adequate FSM information to be
 maintained.  If VACUUM isn't looking at most of the pages, there's no
 very good way to acquire info about where there's free space.

Somehow it needs to get two types of information.

A. If any transaction is accessing a page
B. If a page contains any free space.

Vacuum needs to look for pages not in A but in B. Can the storage manager 
maintain two lists/hashes with minimal cost? In that case, all unlocked, 
not-in-transaction pages could be a much smaller subset.

Does that sound bizarre?


Bye
 Shridhar

--
Chemicals, n.:  Noxious substances from which modern foods are made.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:
Shridhar Daithankar wrote:
Umm.. What does FSM does then? I was under impression that FSM stores page 
pointers and vacuum work on FSM information only. In that case, it wouldn't 
have to waste time to find out which pages to clean.

It's the other way around! VACUUM scan's the tables to find and reclaim 
free space and remembers that free space in the FSM.
Right.  One big question mark in my mind about these partial vacuum
proposals is whether they'd still allow adequate FSM information to be
maintained.  If VACUUM isn't looking at most of the pages, there's no
very good way to acquire info about where there's free space.
That's why I think it needs one more pg_stat column to count the number 
of vacuumed tuples. If one does

tuples_updated + tuples_deleted - tuples_vacuumed

he'll get approximately the number of tuples a regular vacuum might be 
able to reclaim. If that number is really small, no need for autovacuum 
to cause any big trouble by scanning the relation.

Another way to give autovacuum some hints would be to return some number 
as the command tuple count from vacuum, like the number of tuples actually 
vacuumed. That together with the new number of reltuples in pg_class 
will tell autovacuum how frequently a relation really needs scanning.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Jan Wieck wrote:

Another way to give autovacuum some hints would be to return some number 
as commandtuples from vacuum. like the number of tuples actually 
vacuumed. That together with the new number of reltuples in pg_class 
will tell autovacuum how frequent a relation really needs scanning.
Which actually would be much better because it'd work without the 
statistics collector configured for gathering IO stats.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Shridhar Daithankar
On 22 Aug 2003 at 11:03, Jan Wieck wrote:

 Tom Lane wrote:
 
  Jan Wieck [EMAIL PROTECTED] writes:
  Shridhar Daithankar wrote:
  Umm.. What does FSM does then? I was under impression that FSM stores page 
  pointers and vacuum work on FSM information only. In that case, it wouldn't 
  have to waste time to find out which pages to clean.
  
  It's the other way around! VACUUM scan's the tables to find and reclaim 
  free space and remembers that free space in the FSM.
  
  Right.  One big question mark in my mind about these partial vacuum
  proposals is whether they'd still allow adequate FSM information to be
  maintained.  If VACUUM isn't looking at most of the pages, there's no
  very good way to acquire info about where there's free space.
 
 That's why I think it needs one more pg_stat column to count the number 
 of vacuumed tuples. If one does
 
  tuples_updated + tuples_deleted - tuples_vacuumed
 
 he'll get approximately the number of tuples a regular vacuum might be 
 able to reclaim. If that number is really small, no need for autovacuum 
 to cause any big trouble by scanning the relation.
 
 Another way to give autovacuum some hints would be to return some number 
 as commandtuples from vacuum. like the number of tuples actually 
 vacuumed. That together with the new number of reltuples in pg_class 
 will tell autovacuum how frequent a relation really needs scanning.

This kind of information does not really help autovacuum. If we are talking 
about modifying the backend stat collection algorithm so that vacuum does minimum 
work, it has to translate to a cheaper vacuum analyze so that autovacuum can fire it 
at will any time. In the best case, another resident process like the stat 
collector can keep cleaning out the dead tuples.

This information must be in terms of pages and actually be maintained as a 
per-page stat. Looking at tuple counts does not give vacuum any idea of how 
it is going to flush cache lines, either in PostgreSQL or in the OS. I doubt it 
will help the vacuum command in itself to be any lighter or more efficient.

If it is easy to do, I would favour maintaining two page maps, as I mentioned in 
another mail. One for pages in cache but not locked by any transaction and 
another for pages which have some free space. If it is rare for a page to be 
full, we can skip the latter one. I think that could be good enough.




Bye
 Shridhar

--
Office Automation:  The use of computers to improve efficiency in the office   
 by 
removing anyone you would want to talk with over coffee.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 11:08, Jan Wieck wrote:
  Another way to give autovacuum some hints would be to return some number 
  as commandtuples from vacuum. like the number of tuples actually 
  vacuumed. That together with the new number of reltuples in pg_class 
  will tell autovacuum how frequent a relation really needs scanning.
 
 Which actually would be much better because it'd work without the 
 statistics collector configured for gathering IO stats.

Which is certainly a good thing.  Using the stats system is a measurable
performance hit.

I still want to play with pg_autovacuum ignoring the stats system and
just looking at the FSM data.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 11:17, Shridhar Daithankar wrote:
 On 22 Aug 2003 at 11:03, Jan Wieck wrote:
  That's why I think it needs one more pg_stat column to count the number 
  of vacuumed tuples. If one does
  
   tuples_updated + tuples_deleted - tuples_vacuumed
  
  he'll get approximately the number of tuples a regular vacuum might be 
  able to reclaim. If that number is really small, no need for autovacuum 
  to cause any big trouble by scanning the relation.
  
  Another way to give autovacuum some hints would be to return some number 
  as commandtuples from vacuum. like the number of tuples actually 
  vacuumed. That together with the new number of reltuples in pg_class 
  will tell autovacuum how frequent a relation really needs scanning.
 
 This kind of information does not really help autovacuum. If we are talking 
 about modifying backend stat collection algo., so that vacuum does minimum 
 work, is has translate to cheaper vacuum analyze so that autovacuum can fire it 
 at will any time. In the best case, another resident process like stat 
 collector can keep cleaning the deads.

I believe what Jan is talking about is knowing when to use a normal
vacuum, and when to do a vacuum decent.  So his proposal is working
under the assumption that there would be a cheaper vacuum analyze that
can be run most of the time.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Matthew T. O'Connor
On Fri, 2003-08-22 at 10:45, Tom Lane wrote:
 Jan Wieck [EMAIL PROTECTED] writes:
 Right.  One big question mark in my mind about these partial vacuum
 proposals is whether they'd still allow adequate FSM information to be
 maintained.  If VACUUM isn't looking at most of the pages, there's no
 very good way to acquire info about where there's free space.

Well, pg_autovacuum really needs to be looking at the FSM anyway.  It
could look at the FSM, and choose to do a vacuum normal when
the amount of FSM data becomes inadequate.  Of course I'm not sure how
you would differentiate a busy table with inadequate FSM data from an
inactive table that doesn't even register in the FSM.  Perhaps you would
still need to consult the stats system.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Okay, my proposal would be to have a VACUUM mode where it tells the 
buffer manager to only return a page if it is already in memory, and 
some "not cached" result if it would have to read it from disk, and simply skip 
the page in that case. It probably needs some modifications in vacuum's FSM 
handling, but basically that's it. It'll still cause IO for the 
resulting index bulk cleaning, so I don't know how efficient it'll be 
after all.
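
Schematically, the scan loop could look like this (ReadBufferIfCached is an 
invented name for the hypothetical "return only if already in shared buffers" 
call; no such buffer-manager function exists, so this is a sketch of the 
proposal, not working code):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/* Hypothetical cache-only heap scan for VACUUM.  ReadBufferIfCached()
 * is an invented name for a call that would return InvalidBuffer
 * instead of issuing a disk read when the page is not already in
 * shared buffers. */
static void
vacuum_cached_pages_only(Relation onerel, BlockNumber nblocks)
{
	BlockNumber blkno;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf = ReadBufferIfCached(onerel, blkno);

		if (!BufferIsValid(buf))
			continue;			/* not cached: skip it, no I/O at all */

		/* ... reclaim dead tuples on this page, record its free space ... */

		ReleaseBuffer(buf);
	}
}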

The number of vacuumed tuples returned will tell autovacuum how 
useful this vacuum scan was. The less useful it is, the less frequently 
it'll be scheduled. There is no point in vacuuming a 50M row table every 
hour when the average number of tuples reclaimed is in the hundreds. I 
don't intend to avoid a full table scan completely. I only intend to 
lower the frequency of them. It will require some fuzzy logic in 
autovacuum to figure out if a CACHEONLY vacuum for a table needs to be 
more or less frequent to find more tuples though.

So far, that's what I have in mind. Now what are you proposing down there? 
Where do you intend to hold that per-page stat, and what exactly is 
maintaining it? And please don't give us any vague "some other resident 
process". That only indicates you don't really know what it requires for 
a process to be able to read or write data in PostgreSQL.

Jan

Shridhar Daithankar wrote:

On 22 Aug 2003 at 11:03, Jan Wieck wrote:

Tom Lane wrote:

 Jan Wieck [EMAIL PROTECTED] writes:
 Shridhar Daithankar wrote:
 Umm.. What does FSM does then? I was under impression that FSM stores page 
 pointers and vacuum work on FSM information only. In that case, it wouldn't 
 have to waste time to find out which pages to clean.
 
 It's the other way around! VACUUM scan's the tables to find and reclaim 
 free space and remembers that free space in the FSM.
 
 Right.  One big question mark in my mind about these partial vacuum
 proposals is whether they'd still allow adequate FSM information to be
 maintained.  If VACUUM isn't looking at most of the pages, there's no
 very good way to acquire info about where there's free space.

That's why I think it needs one more pg_stat column to count the number 
of vacuumed tuples. If one does

 tuples_updated + tuples_deleted - tuples_vacuumed

he'll get approximately the number of tuples a regular vacuum might be 
able to reclaim. If that number is really small, no need for autovacuum 
to cause any big trouble by scanning the relation.

Another way to give autovacuum some hints would be to return some number 
as commandtuples from vacuum. like the number of tuples actually 
vacuumed. That together with the new number of reltuples in pg_class 
will tell autovacuum how frequent a relation really needs scanning.
This kind of information does not really help autovacuum. If we are talking 
about modifying backend stat collection algo., so that vacuum does minimum 
work, is has translate to cheaper vacuum analyze so that autovacuum can fire it 
at will any time. In the best case, another resident process like stat 
collector can keep cleaning the deads.

This information must be in terms of pages and actually be maintained as per 
page stat. Looking at number of tuples values does not give any idea to vacuum 
how it is going to flush cache lines, either in postgresql or on OS. I doubt it 
will help vacuum command in itself to be any lighter or more efficient.

If it is easy to do, I would favour maitaining two page maps as I mentioned in 
another mail. One for pages in cache but not locked by any transaction and 
another for pages which has some free space. If it is rare for a page to be 
full, we can skip the later one. I think that could be good enough.



Bye
 Shridhar
--
Office Automation:	The use of computers to improve efficiency in the office	by 
removing anyone you would want to talk with over coffee.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Manfred Koizar
On Fri, 22 Aug 2003 10:45:50 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
One big question mark in my mind about these partial vacuum
proposals is whether they'd still allow adequate FSM information to be
maintained.  If VACUUM isn't looking at most of the pages, there's no
very good way to acquire info about where there's free space.

VACUUM has accurate information about the pages it just visited.  Free
space information for pages not touched by VACUUM is still in the FSM,
unless free space on a page is too low to be interesting.  VACUUM has
to merge these two lists and throw away entries with little free space
if running out of room.

Thus we might end up with new almost full pages in the FSM while there
are pages with more free space lying around that a previous VACUUM
failed to register because there was more free space at that time.

Considering that
 .  FSM is lossy per definitionem
 .  we are targeting relations with large passive areas
 .  decent VACUUM shall not replace lazy VACUUM
I see no problem here.

Future advice could be: VACUUM DECENT every hour, VACUUM daily, 
VACUUM FULL once a year, where the first two could be scheduled by 
autovacuum ...

Servus
 Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Manfred Koizar
On Fri, 22 Aug 2003 12:18:02 -0400, Jan Wieck [EMAIL PROTECTED]
wrote:
Okay, my proposal would be to have a VACUUM mode where it tells the 
buffer manager to only return a page if it is already in memory

But how can it know?  Yes, we know exactly what we have in PG shared
buffers.  OTOH we keep telling people that they should configure
moderate values for shared_buffers because the OS is better at
caching.  Your CACHEONLY VACUUM wouldn't catch those pages that are in
the OS cache but not in the shared buffers, although they are
retrievable at almost the same low cost.

We should not try to avoid _any_ physical disk access.  It's good
enough to avoid useless reads.  Hence my proposal for a reclaimable
space list ...

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Tom Lane
Jan Wieck [EMAIL PROTECTED] writes:
 Okay, my proposal would be to have a VACUUM mode where it tells the 
 buffer manager to only return a page if it is already in memory, and 
 some not cached if it would have to read it from disk, and simply skip 
 the page in that case.

Since no such call is available at the OS level, this would only work
well with very large shared_buffers settings (ie, you try to rely on
PG shared buffers to the exclusion of kernel disk cache).  AFAIK the
general consensus is that that's not a good way to run Postgres.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Buglist

2003-08-22 Thread Jan Wieck
Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:
Okay, my proposal would be to have a VACUUM mode where it tells the 
buffer manager to only return a page if it is already in memory, and 
some not cached if it would have to read it from disk, and simply skip 
the page in that case.
Since no such call is available at the OS level, this would only work
well with very large shared_buffers settings (ie, you try to rely on
PG shared buffers to the exclusion of kernel disk cache).  AFAIK the
general consensus is that that's not a good way to run Postgres.
Oh-kay ... so yes Manfred, your RSM is probably the better way.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly