Re: [HACKERS] Cygwin - make check broken

2005-08-04 Thread Reini Urban

Rocco Altier wrote:

It looks like when we changed regress/GNUmakefile to pull rules from
Makefile.shlib, cygwin got broken in the process.

The problem is that regress.dll ends up being a symlink back to itself,
because we do a:
$(NAME)$(DLSUFFIX): $(shlib)
rm -f $(NAME)$(DLSUFFIX)
$(LN_S) $(shlib) $(NAME)$(DLSUFFIX)

And from Makefile.shlib (for cygwin)
ifeq ($(PORTNAME), cygwin)
  shlib = $(NAME)$(DLSUFFIX)

Thus regress.dll gets unhappy :-(

I don't know enough about the rest of the way the cygwin port is put
together, but it seems that the other platforms all have
shlib=lib$(NAME)...


For cygwin the normal rule is cyg$(NAME)$(DLSUFFIX),
but the postgresql maintainers refused to change the prefix for 8.0.

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
http://phpwiki.org/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] MySQL to PostgreSQL for SugarCRM

2005-08-04 Thread Sergio A. Kessler

dennis, look at vtiger crm (http://www.vtiger.com)

vtiger is a fork of sugarcrm and is 100% committed to open source
(unlike sugar, who offer the full version only for purchase).

and they actively want to support more databases.

regards,
/sak

Denis Lussier wrote:
At EnterpriseDB we're doing a little project along these lines.   In our 
lab, and soon for our company, we are running SugarCRM on 
EDB-Postgres.   Alas you say, but SugarCRM only supports MySQL:
 
EDB ships a nifty java based ETL tool with our product that is 99.5% 
based on the Enhydra Octopus LGPL project.  This allows for easily 
converting schema and data from a populated MySQL SugarCRM database into 
Postgres.   Then we hacked the PHP code of SugarCRM for a couple days 
and it is now working just fine on EDB-Postgres.
 
I think that whenever we come across an Open Source (or even a 
commercial/proprietary) product out there that supports MySQL, but not 
PostgreSQL, we need to work collectively as a group to make it 
shamefully simple for the project to work with PostgreSQL also.  
SugarCRM has 250,000 downloads and every single one of those customers 
is guaranteed to be introduced to MySQL and only MySQL.
 
--Luss
 
PS1  Yes, we are going to try and work with the SugarCRM folks and get 
Postgres supported natively without customers having to hack the way we 
did to get it working.
 
PS2  I've hired many interns for summer and/or co-op jobs over the 
years.  The trick is to give them something interesting to do AND pay 
them a little more than they can make flipping burgers.
 



---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] problem building 7.3 on FreeBSD 6

2005-08-04 Thread Palle Girgensohn

Hi!

Can someone with a FreeBSD 6.x installation confirm the build error 
referred to here:




I maintain the postgresql ports for FreeBSD, but I have no version 6 
installed yet.


Thanks,
Palle


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Questions about timestamptz

2005-08-04 Thread Pavel Stehule

Hello

I am working on the trunc and round functions now. I have some problems with 
the timestamp with time zone datatype. First question:


Is this correct behaviour (I am in the GMT+2 time zone)?

select current_timestamp, date_trunc('year', current_timestamp), 
date_trunc('month', current_timestamp);


timestamptz                  | date_trunc               | date_trunc
"2005-07-31 10:46:39.087+02" | "2005-01-01 00:00:00+01" | "2005-07-01 00:00:00+02"


I expected time zone +02 for all values.

Next question. Is this idea correct?

trunc(timestamptz '20010101 10:10:10+8', 'IYYY') --> 20010102 00:00:00+8 ??

or 20010101 18:00:00+02 (24-6) or 20010101 19:00:00+01 or ???

Are there somewhere some rules about the behavior of timestamp with time zone?

Thank You
Pavel Stehule




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] win32 _dosmaperr()

2005-08-04 Thread Qingqing Zhou

""Magnus Hagander"" <[EMAIL PROTECTED]> writes
>
> I suggest you try using Process Explorer from www.sysinternals.com to
> figure out who has the file open. Most of the time it should be able to
> tell you exactly who has locked the file - at least as long as it's done
> from userspace. I'm not 100% sure on how it deals with kernel level
> locks.
>

After running the PG win32 (8.0.1) server for a while and mixing some heavy
operations like checkpoint and vacuum together, I encountered another problem
that should be in the same category. PG reports:

"could not unlink , continuing to try"

at dirmod.c/pgunlink() and deadloops there. Using the PE tool you mentioned,
I found there are only 3 processes holding a handle to the problematic xlog
segment, all of them postgres backends. Using the FileMon tool from the
same website, I found that the bgwriter tried to OPEN the xlog segment with ALL
ACCESS but failed with result DELETE PEND.

That is to say, under some conditions, even if I opened the file with the
SHARED_DELETE flag, I may not be able to remove the file while it is open? I did
some tests, but every time I tried to delete/rename an opened file, I succeeded.
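
To make the question concrete, here is a minimal Win32 sketch of the
delete-pending behaviour (my assumption of the semantics, not tested code):
DeleteFile() on a file opened with FILE_SHARE_DELETE succeeds, but the name
only enters a "delete pending" state until every handle is closed, which
matches the DELETE PEND result FileMon showed:

#include <windows.h>
#include <stdio.h>

int
main(void)
{
    HANDLE  h, h2;

    h = CreateFileA("probe.tmp", GENERIC_WRITE,
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    NULL, CREATE_ALWAYS, 0, NULL);

    DeleteFileA("probe.tmp");   /* succeeds: file becomes delete-pending */

    /* the name cannot be opened again until the last handle is closed */
    h2 = CreateFileA("probe.tmp", GENERIC_READ,
                     FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                     NULL, OPEN_EXISTING, 0, NULL);
    printf("re-open %s\n", h2 == INVALID_HANDLE_VALUE ? "failed" : "succeeded");

    CloseHandle(h);             /* last handle closed: file really goes away */
    return 0;
}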

Things could get worse because the whole database cluster may stop working,
waiting for the buffer the bgwriter is working on, while the bgwriter is
waiting (in the pgunlink deadloop) for those postgres backends to move on (so
that they can close the problematic xlog segment), which is a deadlock.
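
For clarity, the deadloop risk can be sketched in POSIX-flavoured C as
follows; this is NOT the actual dirmod.c code, and the retry budget is
invented, but it shows the exit condition a pgunlink-style loop would need
to avoid hanging forever:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

#define UNLINK_MAX_RETRIES 100      /* invented bound, for illustration */

static int
unlink_with_retries(const char *path)
{
    int     retries = 0;

    while (unlink(path) != 0)
    {
        /* on win32, EACCES usually means another process holds the file */
        if (errno != EACCES || ++retries > UNLINK_MAX_RETRIES)
            return -1;              /* give up instead of looping forever */
        fprintf(stderr, "could not unlink \"%s\", retrying (%d)\n",
                path, retries);
        sleep(1);                   /* wait for other handles to close */
    }
    return 0;
}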

Regards,
Qingqing







---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Ian Burrell
Tom Lane <[EMAIL PROTECTED]> writes:
> 
> I was reminded again today of the problem that once a database has been
> in existence long enough for the OID counter to wrap around, people will
> get occasional errors due to OID collisions, eg
> 
> http://archives.postgresql.org/pgsql-general/2005-08/msg00172.php
> 

I'm a coworker of the original reporter.  I tracked down the cause of the toast
table unique constraint violation to the use of OIDs as chunk_id.  After the OID
wraparound, it tried to use an already used chunk_id.  That table has lots of
toast records which greatly increases the probability of a collision for the
current section of the OID counter.

> Getting rid of OID usage in user tables doesn't really do a darn thing
> to fix this.  It may delay wrap of the OID counter, but it doesn't stop
> it; and what's more, when the problem does happen it will be more
> serious (because the OIDs assigned to persistent objects will form a
> more densely packed set, so that you have a greater chance of collisions
> over a shorter time period).
> 
> We've sort of brushed this problem aside in the past by telling people
> they could just retry their transaction ... but why don't we make the
> database do the retrying?  I'm envisioning something like the attached
> quick-hack, which arranges that the pg_class and pg_type rows for tables
> will never be given OIDs duplicating an existing entry.  It basically
> just keeps generating and discarding OIDs until it finds one not in the
> table.  (This will of course not work for user-table OIDs, since we
> don't necessarily have an OID index on them, but it will work for all
> the system catalogs that have OIDs.)
> 

This will also be needed for toast tables.  They have the necessary index.
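
For illustration, the quoted quick-hack boils down to a loop like this
hedged sketch; GetNewObjectId() is the backend's real OID-counter routine,
while CheckOidExists() is a hypothetical stand-in for a probe of the
catalog's OID index:

static Oid
GetUnusedOid(Relation catalog)
{
    Oid     newOid;

    do
    {
        newOid = GetNewObjectId();              /* advance the shared OID counter */
    } while (CheckOidExists(catalog, newOid));  /* discard collisions */

    return newOid;
}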

 - Ian



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] US Census database (Tiger 2004FE) - 4.4G

2005-08-04 Thread Mark Woodward
I thought about it; it isn't the best program around, but it does work.
My program also reformats numbers, i.e. long/lat become properly
decimal-ed numerics, zips become integers, etc.

The question is...

Do you download the raw data and convert it into a database, or do you
download the pre-formatted database?

I would say the preformatted database is easier to manage. There are
hundreds of individual zip files, each containing 10 or so data
files.



> Mark Woodward wrote:
>> It is 4.4G in space in a gzip package.
>>
>> I'll mail a DVD to two people who promise to host it for Hackers.
>
> Would it be easier to release the program you did to do
> this conversion?
>
>
> I use this pretty short (274 line) C program:
> http://www.forensiclogic.com/tmp/tgr2sql.c
> to convert the raw tiger files
> from http://www.census.gov/geo/www/tiger/index.html
> into SQL statements that can be loaded by postgresql.
>
> The #define SQL line controls whether it emits the data
> as INSERT statements or as COPY input.
>
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
>


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] openbsd, plpython, missing threading symbols

2005-08-04 Thread Marko Kreen
On Thu, Aug 04, 2005 at 08:14:51PM +0200, Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
> >Andrew Dunstan <[EMAIL PROTECTED]> writes:
> >>Tom Lane wrote:
> >>>The alternative is to say that plpython isn't supported on BSDen unless
> >>>you choose to build an unthreaded libpython.
> >>I'm OK with that, but if that's what's done I think we should check for 
> >>it up front at configure time and not let it fail at run time like we do 
> >>now.
> > 
> > If you can create a suitable configure test, it'd be fine with me.
> 
> Not sure if it is of any help but mod_python seems to be using this
> configure.in snippet to detect (and reject) a threaded python installation:

Ok, I converted this to a patch against config/python.m4.

Also made it work with python 2.3 and 2.4, which return 'True', not '1'.

The error is thrown only on BSDs.  As I understand it, threaded
python works fine on other OSes?

The error message may need clarifying.

-- 
marko

Index: config/python.m4
===================================================================
RCS file: /projects/cvsroot/pgsql/config/python.m4,v
retrieving revision 1.11
diff -c -c -r1.11 python.m4
*** config/python.m4	11 Oct 2004 19:32:16 -0000	1.11
--- config/python.m4	4 Aug 2005 19:43:45 -0000
***************
*** 77,80 ****
--- 77,95 ----
  AC_SUBST(python_libdir)[]dnl
  AC_SUBST(python_libspec)[]dnl
  AC_SUBST(python_additional_libs)[]dnl
+ 
+ # threaded python is not supported on bsd's
+ AC_MSG_CHECKING(whether Python is compiled with thread support)
+ pythreads=`${PYTHON} -c "import sys; print int('thread' in sys.builtin_module_names)"`
+ if test "$pythreads" = "1"; then
+   AC_MSG_RESULT(yes)
+   case $host_os in
+   *bsd*)
+ AC_MSG_ERROR([*** Threaded python not supported ***])
+ ;;
+   esac
+ else
+   AC_MSG_RESULT(no)
+ fi
+ 
  ])# PGAC_CHECK_PYTHON_EMBED_SETUP

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Cygwin - make check broken

2005-08-04 Thread Tom Lane
"Rocco Altier" <[EMAIL PROTECTED]> writes:
> It looks like when we changed regress/GNUmakefile to pull rules from
> Makefile.shlib, cygwin got broken in the process.
> ...
> I don't know enough about the rest of the way the cygwin port is put
> together, but it seems that the other platforms all have
> shlib=lib$(NAME)...

Seems to me that defining shlib that way for Cygwin too would be a
reasonable answer, but I'm not sure if there will be any side-effects.
Can someone try it?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Cygwin - make check broken

2005-08-04 Thread Rocco Altier
It looks like when we changed regress/GNUmakefile to pull rules from
Makefile.shlib, cygwin got broken in the process.

The problem is that regress.dll ends up being a symlink back to itself,
because we do a:
$(NAME)$(DLSUFFIX): $(shlib)
rm -f $(NAME)$(DLSUFFIX)
$(LN_S) $(shlib) $(NAME)$(DLSUFFIX)

And from Makefile.shlib (for cygwin)
ifeq ($(PORTNAME), cygwin)
  shlib = $(NAME)$(DLSUFFIX)

Thus regress.dll gets unhappy :-(

I don't know enough about the rest of the way the cygwin port is put
together, but it seems that the other platforms all have
shlib=lib$(NAME)...

-rocco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] US Census database (Tiger 2004FE) - 4.4G

2005-08-04 Thread Ron Mayer

Mark Woodward wrote:

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.


Would it be easier to release the program you did to do
this conversion?


I use this pretty short (274 line) C program:
   http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.

The #define SQL line controls whether it emits the data
as INSERT statements or as COPY input.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] openbsd, plpython, missing threading symbols

2005-08-04 Thread Stefan Kaltenbrunner
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> 
>>Tom Lane wrote:
>>
>>>The alternative is to say that plpython isn't supported on BSDen unless
>>>you choose to build an unthreaded libpython.
> 
> 
>>I'm OK with that, but if that's what's done I think we should check for 
>>it up front at configure time and not let it fail at run time like we do 
>>now.
> 
> 
> If you can create a suitable configure test, it'd be fine with me.


Not sure if it is of any help but mod_python seems to be using this
configure.in snippet to detect (and reject) a threaded python installation:


# check if python is compiled with threads
AC_MSG_CHECKING(whether Python is compiled with thread support)
PyTHREADS=`$PYTHON_BIN -c "import sys; print \"thread\" in sys.builtin_module_names"`
if test "$PyTHREADS" = "1"; then
  AC_MSG_RESULT(yes)
  echo
  echo "  ** WARNING **"
  echo "  Python is compiled with thread support. Apache 1.3 does not use threads."
  echo "  On some systems this will cause problems during compilation, on others"
  echo "  it may result in unpredictable behaviour of your Apache server. Yet on"
  echo "  others it will work just fine. The recommended approach is to compile"
  echo "  Python without thread support in a separate location and specify it with"
  echo "  --with-python option to this ./configure script."
  echo
else
  AC_MSG_RESULT([no threads, good])
fi


Stefan

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread mark
On Thu, Aug 04, 2005 at 12:20:24PM -0400, Tom Lane wrote:
> "Mark Woodward" <[EMAIL PROTECTED]> writes:
> >> I'm too lazy to run an experiment, but I believe it would.  Datum is
> >> involved in almost every function-call API in the backend. In
> >> particular this means that it would affect performance-critical code
> >> paths.
> > I hear you on the "lazy" part, but if OID becomes a structure, then you
> > are still comparing a native type until you get a match, then you make one
> > more comparison to confirm it is the right one, or move on.
> No, you're missing the point entirely: on 32-bit architectures, passing
> a 32-bit integral type to a function is an extremely well optimized
> operation, as is returning a 32-bit integral type.  Passing or
> returning a 64-bit struct is, um, not so well optimized.

I don't think this is necessarily true. For example, instead of passing
the 32-bit integer around, you would instead be passing a 32-bit pointer
to a data structure. This doesn't have to be expensive - although,
depending on the state of the API, it may require extensive changes to
make it inexpensive (or not - I don't know).

From my perspective (new to this list - could be good, or could be bad)
the concept of the OID was too generalized. As a generalization, it
appears to have originally been intended to uniquely identify every
row in the database (system tables and user tables). As a generalization,
32-bits was not enough to represent every row in the database. It was a
mistake.

The work-around for this mistake, was to allow user tables to be
specially defined to not unnecessarily steal range from the OID space.
This work-around proved to be desirable enough, that as of PostgreSQL 8,
tables are no longer created with OIDs by default. It's still a
work-around. What has been purchased with this work-around is time to
properly address this problem. The problem has not been solved.

I see a few ways to solve this:

1) Create OID domains. The system tables could have their own OID
   counter separate from the user table OID counters. Tables that
   have no relationship to each other would be put in their own
   OID domain. It isn't as if you can map from row OID to table
   anyways, so any use of OID assumes knowledge of the table
   relationships. I see this as being relatively cheap to implement,
   with no impact on backwards compatibility, except in unusual cases
   where people have seriously abused the concept of an OID. This
   is another delay tactic, in that a sufficient number of changes
   to the system tables would still cause a wrap-around; however,
   it is equivalent to or better than the suggestion that all user tables
   be created without oids, as this at least allows user tables to
   use oids again.

2) Enlarge the OID to be 64-bit or 128-bit. I don't see this as
   necessarily being a performance problem; however, it might require
   significant changes to the API, which would be expensive. It might
   be argued that enlarging the OID merely delays the problem, and
   doesn't actually address it. Perhaps delaying it by 2^32 is
   effectively indefinitely delaying it, or perhaps not. Those who
   thought 32-bits would be enough, or those who thought 2-digit years
   would be enough, under-estimated the problem. Compatibility can
   be mostly maintained, although the databases would probably need
   to be upgraded, and applications that assumed that the OID could
   fit into a 32-bit integer would break.

3) Leave OIDs as the general database-wide row identifier, and don't
   use OIDs to identify system metadata. Instead, use a UUID (128-bit)
   or similar. System tables are special. Why shouldn't they have a
   non-general means of identifying stored metadata? This has some
   of the benefits of 1, all of the costs of 2, and it additionally
   breaks compatibility for everything.

Based on my suggestions above, I see 1) as the best short and medium
term route. How hard would it be? Instead of a database wide OID
counter, we have several OID counters, with the table having an OID
counter association. Assuming the OID domain is properly defined, all
existing code continues to function properly, and wrap-around of the
OID in one domain, doesn't break the other domains, such as the system
tables.
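
As a purely illustrative sketch of option 1 (all names here are
hypothetical; nothing like this exists in the backend today), assuming a
fixed set of domains:

#include <stdint.h>

typedef enum OidDomain
{
    OID_DOMAIN_SYSTEM,          /* pg_class, pg_type, ... */
    OID_DOMAIN_TOAST,           /* toast chunk_ids */
    OID_DOMAIN_USER,            /* user tables created WITH OIDS */
    NUM_OID_DOMAINS
} OidDomain;

static uint32_t oidCounters[NUM_OID_DOMAINS];   /* would live in shared memory */

static uint32_t
GetNewDomainOid(OidDomain domain)
{
    /* a wraparound in one domain no longer disturbs the others */
    return ++oidCounters[domain];
}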

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] pg_dump -- data and schema only?

2005-08-04 Thread Mark Woodward
> "Mark Woodward" <[EMAIL PROTECTED]> writes:
>> Actually, there isn't a setting to just dump the table definitions and
>> the
>> data. When you dump the schema, it includes all the tablespaces,
>> namespaces, owners, etc.
>
>> Just the table and object declarations and data would be useful.
>
> pg_dump -t table ?

I guess I'm not being very clear.

pg_dump -# mydb
>>
--
-- PostgreSQL database dump
--

SET client_encoding = 'SQL_ASCII';
SET check_function_bodies = false;
SET client_min_messages = warning;

SET default_with_oids = true;

--
-- Name: rtz; Type: TABLE; Schema: public; Owner: markw; Tablespace:
--

CREATE TABLE rtz (
tlid integer,
rtsq integer,
zip4l integer,
zip4r integer
);

--
-- Data for Name: rtz; Type: TABLE DATA; Schema: public; Owner: markw
--

COPY rtz (tlid, rtsq, zip4l, zip4r) FROM stdin;
208014102   0   0   0
208014098   0   0   0
207023736   0   0   0
208014112   0   0   0
207027749   0   0   0

\.

That way, it can go "easily" between various PG versions.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Mark Woodward
> "Mark Woodward" <[EMAIL PROTECTED]> writes:
>>> I'm too lazy to run an experiment, but I believe it would.  Datum is
>>> involved in almost every function-call API in the backend. In
>>> particular this means that it would affect performance-critical code
>>> paths.
>
>> I hear you on the "lazy" part, but if OID becomes a structure, then you
>> are still comparing a native type until you get a match, then you make
>> one
>> more comparison to confirm it is the right one, or move on.
>
> No, you're missing the point entirely: on 32-bit architectures, passing
> a 32-bit integral type to a function is an extremely well optimized
> operation, as is returning a 32-bit integral type.  Passing or
> returning a 64-bit struct is, um, not so well optimized.

That's only if you call by value; call by reference is just as optimized.

>
> There's also the small problem that it really has to fit into Datum.
>

Would it break a lot if you added more to a Datum?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Enhanced containment selectivity function

2005-08-04 Thread Tom Lane
Matteo Beccati <[EMAIL PROTECTED]> writes:
> Moving it into contrib/ltree would be more difficult for me because it
> depends on other functions declared in selfuncs.c
> (get_restriction_variable, etc).

I'd be willing to consider exporting those functions from selfuncs.c.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Enhanced containment selectivity function

2005-08-04 Thread Matteo Beccati

Tom Lane wrote:

After looking at this a little, it doesn't seem like it has much to do
with the ordinary 2-D notion of containment.  In most of the core
geometric types, the "histogram" ordering is based on area, and so
testing the histogram samples against the query doesn't seem like it's
able to give very meaningful containment results --- the items shown
in the histogram could have any locations whatever.

The approach might be sensible for ltree's isparent operator --- I don't
have a very good feeling for the behavior of that operator, but it looks
like it has at least some relationship to the ordering induced by the
ltree < operator.


Actually, this was one of my doubts. The custom function seems to work 
well with ltree, but this could also be dependent on the way my 
dataset is organized.




So my thought is that (assuming Oleg and Teodor agree this is sensible
for ltree) we should put the selectivity function into contrib/ltree,
not directly into the core.  It might be best to call it something like
"parentsel", too, to avoid giving the impression that it has something
to do with 2-D containment.

Also, you should think about using the most-common-values list as well
as the histogram.  I would guess that many ltree applications would have
enough duplicate entries that the MCV list represents a significant
fraction of the total population.  Keep in mind when thinking about this
that the histogram describes the population of data *exclusive of the
MCV entries*.


I also agree that "parentsel" would better fit its purpose.

My patch was originally using MCV without good results, until I realized 
that MCV was empty because the column contains unique values :)

I'll look into adding a MCV check to it.

Moving it into contrib/ltree would be more difficult for me because it 
depends on other functions declared in selfuncs.c 
(get_restriction_variable, etc).


Thank you for your feedback


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] pg_dump -- data and schema only?

2005-08-04 Thread Tom Lane
"Mark Woodward" <[EMAIL PROTECTED]> writes:
> Actually, there isn't a setting to just dump the table definitions and the
> data. When you dump the schema, it includes all the tablespaces,
> namespaces, owners, etc.

> Just the table and object declarations and data would be useful.

pg_dump -t table ?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] US Census database (Tiger 2004FE) - 4.4G

2005-08-04 Thread Gavin M. Roy
You can send it to me, and ehpg will host it.  I'll send you a  
private email with my info.


Gavin

On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:


It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



Gavin M. Roy
800 Pound Gorilla
[EMAIL PROTECTED]



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Tom Lane
"Mark Woodward" <[EMAIL PROTECTED]> writes:
>> I'm too lazy to run an experiment, but I believe it would.  Datum is
>> involved in almost every function-call API in the backend. In
>> particular this means that it would affect performance-critical code
>> paths.

> I hear you on the "lazy" part, but if OID becomes a structure, then you
> are still comparing a native type until you get a match, then you make one
> more comparison to confirm it is the right one, or move on.

No, you're missing the point entirely: on 32-bit architectures, passing
a 32-bit integral type to a function is an extremely well optimized
operation, as is returning a 32-bit integral type.  Passing or
returning a 64-bit struct is, um, not so well optimized.

There's also the small problem that it really has to fit into Datum.
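
A toy illustration of the calling-convention point (purely illustrative;
the names are made up): on a 32-bit ABI the uint32_t argument travels in a
single register, while the 8-byte struct is typically passed and returned
through memory.

#include <stdint.h>

typedef uint32_t Datum32;       /* today's 32-bit Datum */

typedef struct
{
    uint32_t    oid;
    uint32_t    epoch;
} Datum64;                      /* hypothetical widened Datum */

static Datum32 pass_scalar(Datum32 d) { return d; }    /* one register */
static Datum64 pass_struct(Datum64 d) { return d; }    /* stack traffic */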

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Enhanced containment selectivity function

2005-08-04 Thread Tom Lane
Matteo Beccati <[EMAIL PROTECTED]> writes:
> This also made me think: is there a reason why geometric selectivity 
> functions return constant values rather than checking statistics for a 
> better result?

Because no one's ever bothered to work on them.  You should talk to
the PostGIS guys, however, because they *have* been working on real
statistics and real estimation functions for geometric types.  It'd
be nice to get some of that work back-ported into the core database.

http://www.postgis.org/

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_dump -- data and schema only?

2005-08-04 Thread Tino Wildenhain
On Thursday, 2005-08-04 at 11:52 -0400, Mark Woodward wrote:
> > On Thursday, 2005-08-04 at 10:26 -0400, Mark Woodward wrote:
> >> I haven't seen this option, and does anyone think it is a good idea?
> >>
> >> An option to pg_dump and maybe pg_dumpall, that dumps only the table
> >> declarations and the data. No owners, tablespace, nothing.
> >>
> >> This, I think, would allow more generic PostgreSQL data transfers.
> >
> > pg_dump -s maybe?
> >
> > See man pg_dump:
> >
> >-s
> >
> >--schema-only
> >   Dump only the object definitions (schema), not data.
> >
> > Usually one dumps the database with -Fc and then construct
> > SQL for data and DDL via pg_restore from this binary dump.
> > You can then use pg_restore -l, edit (for example via sed)
> > and use it with -L to only generate SQL for these objects.
> >
> 
> Actually, there isn't a setting to just dump the table definitions and the
> data. When you dump the schema, it includes all the tablespaces,
> namespaces, owners, etc.
> 
> Just the table and object declarations and data would be useful.

Reread my comment on the -l / -L option to pg_restore. You can
do all that with it :-)

-- 
Tino Wildenhain <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] buildfarm happenings

2005-08-04 Thread Andrew Dunstan


If you haven't visited the buildfarm recently you might have missed a 
few developments:


. we are now reporting flags used on builds on the main dashboard page - 
I am working on making that look nicer by using icons - see 
http://www.pgbuildfarm.org/cgi-bin/show_status4.pl for progress.


. the latest release provides for uploading all the log files, not just 
the log from the error stage. The details page for each build provides a 
link to each log file if they are present. Many buildfarm members are 
already running a prerelease version of this facility.


. Eric Astor at EnterpriseDB is using a modified client on his member 
"finch" to run the Coverity static source code analysis tool, and the 
results are uploaded along with the rest of the log files - see for 
example 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=finch&dt=2005-08-03%2020:12:42


All this and more is supported by the latest release, fresh today, of 
the client code, available on pgfoundry, the notes for which state:


Feature: upload a zipped tar of all the logs, regardless of success status
Feature: --from-source option lets you test changes in a locally 
modified repo copy

Feature: Cygwin cygrunserv sanity check
Feature: Add integrity checking for CVS conflicts and locally modified 
files.

Fixes: multiroot and keeperror modes fixed.


--from-source is not yet documented, but enables you to use the 
buildfarm client as a testing tool for changes made in a repo copy.



cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Enhanced containment selectivity function

2005-08-04 Thread Tom Lane
Matteo Beccati <[EMAIL PROTECTED]> writes:
> Someone on IRC (AndrewSN if I'm not wrong) pointed out that the 
> restriction selectivity function for <@ is contsel, which returns a 
> constant value of 0.001. So I started digging in the source code trying 
> to understand how the default behaviour could be enhanced, and ended up 
> writing a little patch which adds an alternative containment selectivity 
> function (called "contstatsel") which is able to deliver better results.

After looking at this a little, it doesn't seem like it has much to do
with the ordinary 2-D notion of containment.  In most of the core
geometric types, the "histogram" ordering is based on area, and so
testing the histogram samples against the query doesn't seem like it's
able to give very meaningful containment results --- the items shown
in the histogram could have any locations whatever.

The approach might be sensible for ltree's isparent operator --- I don't
have a very good feeling for the behavior of that operator, but it looks
like it has at least some relationship to the ordering induced by the
ltree < operator.

So my thought is that (assuming Oleg and Teodor agree this is sensible
for ltree) we should put the selectivity function into contrib/ltree,
not directly into the core.  It might be best to call it something like
"parentsel", too, to avoid giving the impression that it has something
to do with 2-D containment.

Also, you should think about using the most-common-values list as well
as the histogram.  I would guess that many ltree applications would have
enough duplicate entries that the MCV list represents a significant
fraction of the total population.  Keep in mind when thinking about this
that the histogram describes the population of data *exclusive of the
MCV entries*.
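
As a hedged sketch of folding the MCV list in (get_attstatsslot,
free_attstatsslot and FunctionCall2 are real backend routines with their
8.x signatures; contproc, constval, values/nvalues and hist_selec are
assumed locals of the surrounding selectivity function):

    float4     *numbers;
    int         nnumbers;
    double      mcv_selec = 0.0;
    double      mcv_freq = 0.0;
    int         i;

    if (get_attstatsslot(vardata.statsTuple, vardata.atttype,
                         vardata.atttypmod,
                         STATISTIC_KIND_MCV, InvalidOid,
                         &values, &nvalues, &numbers, &nnumbers))
    {
        for (i = 0; i < nvalues; i++)
        {
            mcv_freq += numbers[i];         /* total MCV population */
            if (DatumGetBool(FunctionCall2(&contproc,
                                           values[i], constval)))
                mcv_selec += numbers[i];    /* MCVs satisfying the operator */
        }
        free_attstatsslot(vardata.atttype, values, nvalues,
                          numbers, nnumbers);
    }

    /* the histogram covers only the non-MCV rows, so scale it down */
    selec = mcv_selec + (1.0 - mcv_freq) * hist_selec;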

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_dump -- data and schema only?

2005-08-04 Thread Mark Woodward
> On Thursday, 2005-08-04 at 10:26 -0400, Mark Woodward wrote:
>> I haven't seen this option, and does anyone think it is a good idea?
>>
>> An option to pg_dump and maybe pg_dumpall, that dumps only the table
>> declarations and the data. No owners, tablespace, nothing.
>>
>> This, I think, would allow more generic PostgreSQL data transfers.
>
> pg_dump -s maybe?
>
> See man pg_dump:
>
>-s
>
>--schema-only
>   Dump only the object definitions (schema), not data.
>
> Usually one dumps the database with -Fc and then construct
> SQL for data and DDL via pg_restore from this binary dump.
> You can then use pg_restore -l, edit (for example via sed)
> and use it with -L to only generate SQL for these objects.
>

Actually, there isn't a setting to just dump the table definitions and the
data. When you dump the schema, it includes all the tablespaces,
namespaces, owners, etc.

Just the table and object declarations and data would be useful.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump -- data and schema only?

2005-08-04 Thread Tino Wildenhain
On Thursday, 2005-08-04 at 10:26 -0400, Mark Woodward wrote:
> I haven't seen this option, and does anyone think it is a good idea?
> 
> An option to pg_dump and maybe pg_dumpall, that dumps only the table
> declarations and the data. No owners, tablespace, nothing.
> 
> This, I think, would allow more generic PostgreSQL data transfers.

pg_dump -s maybe?

See man pg_dump: 

   -s

   --schema-only
  Dump only the object definitions (schema), not data.

Usually one dumps the database with -Fc and then construct
SQL for data and DDL via pg_restore from this binary dump.
You can then use pg_restore -l, edit (for example via sed)
and use it with -L to only generate SQL for these objects.

-- 
Tino Wildenhain <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] US Census database (Tiger 2004FE) - 4.4G

2005-08-04 Thread Mark Woodward
It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Mark Woodward
> "Mark Woodward" <[EMAIL PROTECTED]> writes:
>>> 2. Performance.  Doing this would require widening Datum to 64 bits,
>>> which is a system-wide performance hit on 32-bit machines.
>
>> Do you really think it would make a measurable difference, more so than
>> your proposed solution? (I'm skeptical it would be measurable at all)
>
> I'm too lazy to run an experiment, but I believe it would.  Datum is
> involved in almost every function-call API in the backend. In
> particular this means that it would affect performance-critical code
> paths.

I hear you on the "lazy" part, but if OID becomes a structure, then you
are still comparing a native type until you get a match, then you make one
more comparison to confirm it is the right one, or move on. I think it is
a small hit that wouldn't even be noticed.

In fact, thinking about it:

typedef struct _pgOID
{
    OLDOID_TYPE oldOID;     /* the existing 32-bit OID */
    time_t      unique;     /* 32-bit date stamp to disambiguate wraps */
} OID;

Everything works as it did before, except that there is a 32-bit date
identifier to prevent wraparound. Just one additional check is needed,
and only if there is a wrap.


> Creation of tables and such isn't performance-critical in most
> applications, so a few percent overhead there doesn't bother me.  A few
> percent across the board is another story.

Compared to all the other things going on, I would bet it isn't even
measurable.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Tom Lane
"Mark Woodward" <[EMAIL PROTECTED]> writes:
>> 2. Performance.  Doing this would require widening Datum to 64 bits,
>> which is a system-wide performance hit on 32-bit machines.

> Do you really think it would make a measurable difference, more so than
> your proposed solution? (I'm skeptical it would be measurable at all)

I'm too lazy to run an experiment, but I believe it would.  Datum is
involved in almost every function-call API in the backend.  In
particular this means that it would affect performance-critical code
paths.  Creation of tables and such isn't performance-critical in most
applications, so a few percent overhead there doesn't bother me.  A few
percent across the board is another story.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Enhanced containment selectivity function

2005-08-04 Thread Matteo Beccati

Hi,

I've recently had problems with slow queries caused by the selectivity 
of the <@ ltree operator, as you may see in my post here:


http://archives.postgresql.org/pgsql-performance/2005-07/msg00473.php


Someone on IRC (AndrewSN if I'm not wrong) pointed out that the 
restriction selectivity function for <@ is contsel, which returns a 
constant value of 0.001. So I started digging in the source code trying 
to understand how the default behaviour could be enhanced, and ended up 
writing a little patch which adds an alternative containment selectivity 
function (called "contstatsel") which is able to deliver better results.


This first version is based on the eqsel function and uses only 
histogram values to calculate the selectivity, falling back to the 
0.001 constant.


This also made me think: is there a reason why geometric selectivity 
functions return constant values rather than checking statistics for a 
better result?


Attached you will find a patch suitable for current CVS HEAD. My C 
skills are a bit rusty and my knowledge of pg internals is very poor, 
so I'm sure it could be improved and modified to better fit the pg 
coding standards.



Here are the results on a slow query:

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING (u_id) WHERE tree <@ '1041' AND t_stamp > '2005-07-01';

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..553.02 rows=8 width=364) (actual time=2.423..19787.259 rows=6785 loops=1)
   ->  Index Scan using gw_users_gisttree_key on gw_users  (cost=0.00..21.63 rows=5 width=156) (actual time=0.882..107.434 rows=4696 loops=1)
         Index Cond: (tree <@ '1041'::ltree)
   ->  Index Scan using gw_batches_t_stamp_u_id_key on gw_batches  (cost=0.00..106.09 rows=15 width=212) (actual time=3.898..4.171 rows=1 loops=4696)
         Index Cond: ((gw_batches.t_stamp > '2005-07-01 00:00:00+02'::timestamp with time zone) AND ("outer".u_id = gw_batches.u_id))
 Total runtime: 19805.447 ms
(6 rows)

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING (u_id) WHERE tree <<@ '1041' AND t_stamp > '2005-07-01';

                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=245.26..1151.80 rows=7671 width=364) (actual time=69.562..176.966 rows=6785 loops=1)
   Hash Cond: ("outer".u_id = "inner".u_id)
   ->  Bitmap Heap Scan on gw_batches  (cost=57.74..764.39 rows=8212 width=212) (actual time=8.330..39.542 rows=7819 loops=1)
         Recheck Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp with time zone)
         ->  Bitmap Index Scan on gw_batches_t_stamp_u_id_key  (cost=0.00..57.74 rows=8212 width=0) (actual time=8.120..8.120 rows=7819 loops=1)
               Index Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp with time zone)
   ->  Hash  (cost=175.79..175.79 rows=4692 width=156) (actual time=61.046..61.046 rows=4696 loops=1)
         ->  Seq Scan on gw_users  (cost=0.00..175.79 rows=4692 width=156) (actual time=0.083..34.200 rows=4696 loops=1)
               Filter: (tree <<@ '1041'::ltree)
 Total runtime: 194.621 ms
(10 rows)

The second query uses a custom <<@ operator I added to test the 
alternative selectivity function:


CREATE FUNCTION contstatsel(internal, oid, internal, integer) RETURNS 
double precision AS 'contstatsel' LANGUAGE internal;

CREATE OPERATOR <<@ (
 LEFTARG = ltree,
 RIGHTARG = ltree,
 PROCEDURE = ltree_risparent,
 COMMUTATOR = '@>',
 RESTRICT = contstatsel,
 JOIN = contjoinsel
);


Of course any comments/feedback are welcome.


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.187
diff -r1.187 selfuncs.c
1309a1310,1433
>  *	contstatsel - Selectivity of containment for any data types.
>  */
> Datum
> contstatsel(PG_FUNCTION_ARGS)
> {
>   PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
>   Oid operator = PG_GETARG_OID(1);
>   List   *args = (List *) PG_GETARG_POINTER(2);
>   int varRelid = PG_GETARG_INT32(3);
>   VariableStatData vardata;
>   Node   *other;
>   boolvaronleft;
>   Datum  *values;
>   int nvalues;
>   double  selec = 0.0;
> 
>   /*
>* If expression is not variable = something or something = variable,
>* then punt and return a default estimate.
>*/
>   if (!get_restriction_variable(root, args, varRelid,
>   

Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Mark Woodward
> "Mark Woodward" <[EMAIL PROTECTED]> writes:
>> Why is there collision? It is because the number range of an OID is
>> currently smaller than the possible usage.
>
> Expanding OIDs to 64 bits is not really an attractive answer, on several
> grounds:
>
> 1. Disk space.

I don't really see this as a problem except in REALLY small installations,
and PostgreSQL doesn't have that reputation. Also, we have WITHOUT OIDS.

>
> 2. Performance.  Doing this would require widening Datum to 64 bits,
> which is a system-wide performance hit on 32-bit machines.

Do you really think it would make a measurable difference, more so than
your proposed solution? (I'm skeptical it would be measurable at all)

>
> 3. Portability.  We still manage to run on machines that have no 64-bit
> int type at all, and I for one am not prepared to give that up until
> it's necessary.

Maybe OID is no longer a number, but is now a structure:

typedef struct _pgOID
{
   time_t date;
   int    id;
} OID;

(Not a serious proposal for the contents of the structure)

>
> Given that we've agreed to deprecate use of OIDs in user tables, I don't
> see any particular upside to making them 64 bits anyway.  None of the
> system catalogs seem likely to ever contain enough entries that a 32-bit
> limit is a problem.
>
> These are all more or less the same arguments as concern 64-bit
> transaction IDs.  The hacks we've indulged in to avoid that are far
> nastier and more invasive than what I'm suggesting for OIDs (vacuuming
> to forestall XID wraparound is certainly pretty ugly, and it's even
> user-visible).
>
> Perhaps at some point there will be a "64-bit build option" to make all
> these data types widen to 64 bits together.  I don't really foresee it
> happening in the near future though (even on 64-bit hardware, I doubt
> the performance tradeoffs are very favorable).  And abandoning support
> for the 32-bit universe altogether is surely a long way away.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] pg_dump -- data and schema only?

2005-08-04 Thread Mark Woodward
I haven't seen this option, and does anyone think it is a good idea?

An option to pg_dump and maybe pg_dumpall, that dumps only the table
declarations and the data. No owners, tablespace, nothing.

This, I think, would allow more generic PostgreSQL data transfers.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Tom Lane
"Mark Woodward" <[EMAIL PROTECTED]> writes:
> Why is there collision? It is because the number range of an OID is
> currently smaller than the possible usage.

Expanding OIDs to 64 bits is not really an attractive answer, on several
grounds:

1. Disk space.

2. Performance.  Doing this would require widening Datum to 64 bits,
which is a system-wide performance hit on 32-bit machines.

3. Portability.  We still manage to run on machines that have no 64-bit
int type at all, and I for one am not prepared to give that up until
it's necessary.

Given that we've agreed to deprecate use of OIDs in user tables, I don't
see any particular upside to making them 64 bits anyway.  None of the
system catalogs seem likely to ever contain enough entries that a 32-bit
limit is a problem.

These are all more or less the same arguments as concern 64-bit
transaction IDs.  The hacks we've indulged in to avoid that are far
nastier and more invasive than what I'm suggesting for OIDs (vacuuming
to forestall XID wraparound is certainly pretty ugly, and it's even
user-visible).

Perhaps at some point there will be a "64-bit build option" to make all
these data types widen to 64 bits together.  I don't really foresee it
happening in the near future though (even on 64-bit hardware, I doubt
the performance tradeoffs are very favorable).  And abandoning support
for the 32-bit universe altogether is surely a long way away.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Christopher Kings-Lynne

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz


| bzip2 > tiger.sql.bz2 :)

Chris

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Mark Woodward
>> It's been running for about an hour now, and it is up to 3.3G.
>>
>> pg_dump tiger | gzip > tiger.pgz
>
> | bzip2 > tiger.sql.bz2 :)
>

I find bzip2 FAR SLOWER than the gain in compression is worth.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Bug introduced by recent ALTER OWNER permissions check change

2005-08-04 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> Does it really?  I don't think so.  If you have owner privileges on the
> schema you can grant create rights to the role, then either ALTER OWNER
> if the patch is kept or just change to the role, create table x as
> select * from y;, etc, and then revoke the create privileges.

Hmm, that would work, but it still leaves me itchy.  If we allow this,
why not even further-removed schemes requiring several SET ROLEs?
For instance, you could argue that ALTER OWNER should be allowed to
anyone who can become the old object owner, even if their current role
doesn't include that privilege.  (That is, the difference between
is_member and has_privs checks.)  Or say that either the old or new
object owner can be owner of the containing schema.  (Which would amount
to disregarding whether a schema owner has revoked his own CREATE
privilege, on the grounds that he could always choose to grant it to
himself again.)  I'm really leery of going down this path without
significant use-cases in its favor.

> Having to special case superusers all over the place is an indication of
> poor design, imho.

Contorting the privilege rules to avoid special-casing superusers is
worse, imho.  At least when you do "if (superuser())" you know you
aren't creating any holes that might be exploitable by non-superusers.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Mark Woodward
> * Mark Woodward ([EMAIL PROTECTED]) wrote:
>> > How big dumped & compressed?  I may be able to host it depending on how
>> > big it ends up being...
>>
>> It's been running for about an hour now, and it is up to 3.3G.
>
> Not too bad.  I had 2003 (iirc) loaded into 7.4 at one point.

Cool.

>
>> pg_dump tiger | gzip > tiger.pgz
>
> What db version are you using, how did you load it (ogr2ogr?), is it in
> postgis form?  Fun questions, all of them. :)

8.0.3, in simple pg_dump form.

I loaded it with a utility I wrote a long time ago for tigerua. It is a
fixed-width-text-file-to-PG utility. It takes a "control" file that
describes the fields: field widths and field names. It creates a SQL
"create table" statement, and also reads all the records described by the
control file into a PostgreSQL COPY command. A control file looks something like:

# Zip+4 codes
# Tiger 2003 Record Conversion File
# Copyright (c) 2004 Mark L. Woodward, Mohawk Software
TABLE RTZ
1:I RT
4:I VERSION
10:TTLID
3:S RTSQ
4:Z ZIP4L
4:Z ZIP4R


The first number is the field width in chars; the second is an optional type
(there are a few: 'I' means ignore, 'Z' means zipcode, etc.). If no type is
given, then varchar is assumed. Last is the column name.
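
As a hypothetical sketch (this is not Mark's actual utility, just an
illustration of the format described above), one such line could be parsed
in C like this:

#include <stdio.h>

typedef struct
{
    int     width;      /* field width in chars */
    char    type;       /* 'I' ignore, 'Z' zipcode, ...; 0 = varchar */
    char    name[64];   /* column name */
} ControlField;

static int
parse_control_line(const char *line, ControlField *f)
{
    char    typebuf[2] = "";

    /* "4:Z ZIP4L" -> width=4, type='Z', name="ZIP4L" */
    if (sscanf(line, "%d:%1[A-Z] %63s", &f->width, typebuf, f->name) == 3)
        f->type = typebuf[0];
    else if (sscanf(line, "%d: %63s", &f->width, f->name) == 2)
        f->type = 0;            /* no type given: varchar assumed */
    else
        return -1;              /* comment or malformed line */
    return 0;
}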


>
>> I'll let you know. Hopefully, it will fit on a DVD.
>
> I guess your upload pipe isn't very big?  snail-mail is slow... :)

Never underestimate the bandwidth of a few DVDs and FedEx. Do the math; it
is embarrassing.

>
>> You know, ... maybe pg_dump needs a progress bar? (How would it do that,
>> I
>> wonder?)
>
> Using the new functions in 8.1 which provide size-on-disk of things,
> hopefully there's also a function to give a tuple-size or similar as
> well.  It'd be a high estimate due to dead tuples but should be
> sufficient for a progress bar.
>
>   Thanks,
>
>   Stephen
>


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Tino Wildenhain
On Thursday, 2005-08-04 at 08:40 -0400, Mark Woodward wrote:
> > * Mark Woodward ([EMAIL PROTECTED]) wrote:
> >> I just finished converting and loading the US census data into
> >> PostgreSQL
> >> would anyone be interested in it for testing purposes?
> >>
> >> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
> >
> > How big dumped & compressed?  I may be able to host it depending on how
> > big it ends up being...
> 
> It's been running for about an hour now, and it is up to 3.3G.
> 
> pg_dump tiger | gzip > tiger.pgz
> 
> I'll let you know. Hopefully, it will fit on a DVD.
> 
> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)
pg_dump -v maybe? ;) *hint hint*

-- 
Tino Wildenhain <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Stephen Frost
* Mark Woodward ([EMAIL PROTECTED]) wrote:
> > How big dumped & compressed?  I may be able to host it depending on how
> > big it ends up being...
> 
> It's been running for about an hour now, and it is up to 3.3G.

Not too bad.  I had 2003 (iirc) loaded into 7.4 at one point.

> pg_dump tiger | gzip > tiger.pgz

What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form?  Fun questions, all of them. :)

> I'll let you know. Hopefully, it will fit on a DVD.

I guess your upload pipe isn't very big?  snail-mail is slow... :)

> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)

Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well.  It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Mark Woodward
> * Mark Woodward ([EMAIL PROTECTED]) wrote:
>> I just finished converting and loading the US census data into
>> PostgreSQL
>> would anyone be interested in it for testing purposes?
>>
>> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>
> How big dumped & compressed?  I may be able to host it depending on how
> big it ends up being...

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz

I'll let you know. Hopefully, it will fit on a DVD.

You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Stephen Frost
* Mark Woodward ([EMAIL PROTECTED]) wrote:
> I just finished converting and loading the US census data into PostgreSQL
> would anyone be interested in it for testing purposes?
> 
> It's a *LOT* of data (about 40+ Gig in PostgreSQL)

How big dumped & compressed?  I may be able to host it depending on how
big it ends up being...

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Solving the OID-collision problem

2005-08-04 Thread Mark Woodward
> I was reminded again today of the problem that once a database has been
> in existence long enough for the OID counter to wrap around, people will
> get occasional errors due to OID collisions, eg
>
> http://archives.postgresql.org/pgsql-general/2005-08/msg00172.php
>
> Getting rid of OID usage in user tables doesn't really do a darn thing
> to fix this.  It may delay wrap of the OID counter, but it doesn't stop
> it; and what's more, when the problem does happen it will be more
> serious (because the OIDs assigned to persistent objects will form a
> more densely packed set, so that you have a greater chance of collisions
> over a shorter time period).
>
> We've sort of brushed this problem aside in the past by telling people
> they could just retry their transaction ... but why don't we make the
> database do the retrying?  I'm envisioning something like the attached
> quick-hack, which arranges that the pg_class and pg_type rows for tables
> will never be given OIDs duplicating an existing entry.  It basically
> just keeps generating and discarding OIDs until it finds one not in the
> table.  (This will of course not work for user-table OIDs, since we
> don't necessarily have an OID index on them, but it will work for all
> the system catalogs that have OIDs.)
>
> I seem to recall having thought of this idea before, and having rejected
> it as being too much overhead to solve a problem that occurs only rarely
> --- but in a quick test involving many repetitions of
>
>   create temp table t1(f1 int, f2 int);
>   drop table t1;
>
> the net penalty was only about a 2% slowdown on one machine, and no
> measurable difference at all on another.  So it seems like it might
> be worth doing.
>
> Comments?
>
>   regards, tom lane
>

I hope I can say this without offense, but the obvious problem is not
"collision," but "uniqueness." The most efficient way of dealing with the
issue is to remove it.

Why is there collision? It is because the number range of an OID is
currently smaller than the possible usage. Maybe it is time to rethink the
OID altogether and create something like a GUID (yes, I hate them too).
I know it is ugly, but I think coming up with strategies to work around a
design limitation is a waste of time; correcting the design limitation is
the best investment.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Bug introduced by recent ALTER OWNER permissions check change

2005-08-04 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > I don't like this approach to solving the problem.  I would rather see
> > the check modified to allow the ownership change provided:
> 
> > the user issueing the command has access to destination role
> > AND
> > (
> > the destination role can create the table
> >   OR the user issuing the command has owner rights on the schema/db
> > )
> > etc
> 
> I don't think so --- this would allow unprivileged users to use ALTER
> OWNER to arrive at states they could not arrive at otherwise; which
> destroys the entire argument that non-superuser ALTER OWNER is not a
> security risk.  Shall we reverse out the patch and require you to
> justify it from scratch?

Does it really?  I don't think so.  If you have owner privileges on the
schema you can grant create rights to the role, then either ALTER OWNER
if the patch is kept, or just change to the role, create table x as
select * from y;, etc., and then revoke the create privileges.  So such
unprivileged users (who are nonetheless owners of the schema in question,
one of the requirements above) could arrive at that state regardless.
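
For concreteness, a rough sketch of the check described above, using real
backend predicates (has_privs_of_role, pg_namespace_aclcheck,
pg_namespace_ownercheck) but hypothetical surrounding variables
(newOwnerId, namespaceId, objectName, schemaName); not the actual patch:

    /* the user issuing the command must have access to the target role */
    if (!has_privs_of_role(GetUserId(), newOwnerId))
        aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_CLASS, objectName);

    /*
     * ... and either the new owner could create the object, or the
     * issuing user owns the containing schema
     */
    if (pg_namespace_aclcheck(namespaceId, newOwnerId,
                              ACL_CREATE) != ACLCHECK_OK &&
        !pg_namespace_ownercheck(namespaceId, GetUserId()))
        aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_NAMESPACE, schemaName);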

> Superusers should be allowed to do whatever they want, but that doesn't
> mean that we should weaken the rules applied to ordinary users.

Having to special case superusers all over the place is an indication of
poor design, imho.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] US Census database (Tiger 2004FE)

2005-08-04 Thread Mark Woodward
Wow! A lot of people seem to want it!
I am dumping it out with pg_dump right now; it may take a few hours.
It is in PostgreSQL 8.0.3.
Does anyone have access to a high bandwidth server? I could mail it on a
DVD to someone who would host it.

>
> David Fetter wrote:
>
>>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
>>
>>
>>>I just finished converting and loading the US census data into
>>> PostgreSQL
>>>would anyone be interested in it for testing purposes?
>>>
>>>It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>>>
>>>
>>
>>Sure.  Got a torrent?
>>
>>
>>
>
>
> How big is it when dumped and compressed?
>
> cheers
>
> andrew
>

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Fundamental error in "no WAL log" index/file

2005-08-04 Thread Qingqing Zhou

"Simon Riggs" <[EMAIL PROTECTED]> writes
>
> I have learnt that Tom means: read the code. :-)
>
> CREATE INDEX doesn't produce xlog records *except* when you use PITR, so
> PITR does work correctly.
>

wstate.btws_use_wal = XLogArchivingActive() && !wstate.index->rd_istemp;

Ah-oh, that's true ;-)

Regards,
Qingqing




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] dbt3 data with 10GB scale factor

2005-08-04 Thread Simon Riggs
On Tue, 2005-08-02 at 14:22 -0700, Mark Wong wrote:
> I've started scaling dbt3 up to the 10GB scale factor against CVS and
> the fast COPY patch:
> 
> http://www.testing.osdl.org/projects/dbt3testing/results/dev4-010/53/

Try "www1" if this URL fails for you.

> I'm sure there are some better database parameters I should use, so
> please let me know what to try. ;)  What I've found interesting is the
> difference in the time it takes for Q9 to run in the power test versus
> the throughput test:
> 
>  Power Test           00:56:27
>  Throughput Stream 1  00:38:13
>  Throughput Stream 2  00:41:33
>  Throughput Stream 3  00:20:16
>  Throughput Stream 4  00:18:11
> 
> Diffing the query plans between the Power Test and the individual
> streams in the Throughput Test, I only see a Materialize and Seq Scan on
> the nation table versus just a Seq Scan on the nation table between two
> of the plans.  But this doesn't appear to account for the execution time
> difference as the query with the same plan executes just as fast during
> the Throughput Test.  Here are the plans in full:

Very interesting. Excellent work on the test results.

Initial reaction was shock, but: most of these plans use nested joins,
so there's lots of shared_buffers work going on. It feels like this
would allow the queries to share a certain percentage of blocks and
allow the multiple backends to parallelise the I/O, which in the Power
test would all be single streamed.

Difference in plans? Well, they are actually all different queries, just
very similar. I note that the Power Test query actually retrieved the
fewest rows of any of the queries, so the time difference might well have
been much greater.

I'll look more into the plan differences.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match