subject:"Re\: \[HACKERS\] pg_migrator and an 8.3\-compatible tsvector data type"

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-06-01 Thread Robert Haas

On Sun, May 31, 2009 at 11:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 (At the risk of beating a dead horse, note if we were upgrading the
 catalog tables directly via SQL, this type of scenario could be
 handled cleanly without hacking pg_dump; I repeat my earlier critique
 that the pg_migrator approach consigns us to a never-ending series of
 pg_dump hacks, that or having it not work very well.)

 Updating the catalog tables directly via SQL?  Good luck making that
 work.  If you ever get it to work at all, it'll require a pile of hacks
 that will make pg_migrator look like paradise.

For clarity, I really mean from a standalone backend, but ideally
I'd like it to be SQL.

 (Which is not to say that pg_migrator isn't a hack; it surely is.
 But it looks like the least painful approach available.)

Maybe.  It seems that we don't have a good way of handling datatype
conversions.  The approaches that have been proposed for tsvector
wouldn't work at all but for the fact that the new output function can
handle the old internal representation, which is not something that we
can guarantee in every case.  And, even so, they leave the database in
a broken state until the post-migration scripts have been run.  The
good news is that tsvector is not a datatype that everyone uses, and
those who do probably don't use it in every table, but what happens
when we want to change numeric incompatibly?

We really need to figure out an approach that lets us keep the old
datatypes around under a different name while making the original name
be the new version of the datatype.  That way people can migrate and
be up, and deal with the need to rewrite their tables at a later time.
 Having the database come up quickly but in a state where queries may
return wrong answers is a fairly poor excuse for in-place upgrade.
This is not to say that I don't appreciate the work Bruce is putting
into it, but I fear that without some more engineering it's just going
to be a hack used only by the desperate.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-06-01 Thread Greg Stark

On Mon, Jun 1, 2009 at 4:03 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, May 31, 2009 at 11:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Updating the catalog tables directly via SQL?  Good luck making that
 work.  If you ever get it to work at all, it'll require a pile of hacks
 that will make pg_migrator look like paradise.

 For clarity, I really mean from a standalone backend, but ideally
 I'd like it to be SQL.

Keep in mind that you have catalogs in all the databases, and even in
standalone mode you need those catalogs to find the, er, catalogs.
There's a reason bootstrap mode is so limited and even then it needs
some of the catalogs to already be in place.

 (Which is not to say that pg_migrator isn't a hack; it surely is.
 But it looks like the least painful approach available.)

 Maybe.  It seems that we don't have a good way of handling datatype
 conversions.  The approaches that have been proposed for tsvector
 wouldn't work at all but for the fact that the new output function can
 handle the old internal representation, which is not something that we
 can guarantee in every case.

Well I think all we need for that is to have pg_migrator provide the
old output function wrapped up in a migrate_foo() C function.


 And, even so, they leave the database in
 a broken state until the post-migration scripts have been run.  The
 good news is that tsvector is not a datatype that everyone uses, and
 those who do probably don't use it in every table, but what happens
 when we want to change numeric incompatibly?

Or, say, timestamp...

 We really need to figure out an approach that lets us keep the old
 datatypes around under a different name while making the original name
 be the new version of the datatype.  That way people can migrate and
 be up, and deal with the need to rewrite their tables at a later time.

I do agree that having to rewrite the whole table isn't really
upgrade-in-place.

But the work to support multiple versions of data types is more than
you're describing. You need to be concerned about things like joins
between tables when some columns are the old data type and some the
new, etc.

Ultimately people will have to convert the data types sometime.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-06-01 Thread Tom Lane

Greg Stark st...@enterprisedb.com writes:
 On Mon, Jun 1, 2009 at 4:03 PM, Robert Haas robertmh...@gmail.com wrote:
 We really need to figure out an approach that lets us keep the old
 datatypes around under a different name while making the original name
 be the new version of the datatype.  That way people can migrate and
 be up, and deal with the need to rewrite their tables at a later time.

 I do agree that having to rewrite the whole table isn't really
 upgrade-in-place.

It's certainly the case that there is a lot more work to do before
pg_migrator could support everything that we reasonably want to be
able to do in a version update.  As I see it, the reason it's getting
revived now is that 8.3-8.4 happens to be an update where most of what
it can't (yet) do isn't necessary.  That means we can get it out there,
get the bugs out of the functionality it does have, and most importantly
try to set an expectation that future updates will also have some degree
of update-in-place capability.  If we wait till it's perfect then
nothing will ever happen at all in this space.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-06-01 Thread Robert Haas

On Mon, Jun 1, 2009 at 11:10 AM, Greg Stark st...@enterprisedb.com wrote:
 On Mon, Jun 1, 2009 at 4:03 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, May 31, 2009 at 11:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Updating the catalog tables directly via SQL?  Good luck making that
 work.  If you ever get it to work at all, it'll require a pile of hacks
 that will make pg_migrator look like paradise.

 For clarity, I really mean from a standalone backend, but ideally
 I'd like it to be SQL.

 Keep in mind that you have catalogs in all the databases, and even in
 standalone mode you need those catalogs to find the, er, catalogs.
 There's a reason bootstrap mode is so limited and even then it needs
 some of the catalogs to already be in place.

Yep.  Changes to the schema of the bootstrap tables are definitely the
toughest nut to crack.  I haven't been convinced that it's impossible,
but given my relative level of knowledge of the code compared to Tom,
that could well be an indication that I'm overly optimistic.

 (Which is not to say that pg_migrator isn't a hack; it surely is.
 But it looks like the least painful approach available.)

 Maybe.  It seems that we don't have a good way of handling datatype
 conversions.  The approaches that have been proposed for tsvector
 wouldn't work at all but for the fact that the new output function can
 handle the old internal representation, which is not something that we
 can guarantee in every case.

 Well I think all we need for that is to have pg_migrator provide the
 old output function wrapped up in a migrate_foo() C function

Well that would be better, but it still leaves the database temporarily broken.

 And, even so, they leave the database in
 a broken state until the post-migration scripts have been run.  The
 good news is that tsvector is not a datatype that everyone uses, and
 those who do probably don't use it in every table, but what happens
 when we want to change numeric incompatibly?

 Or, say, timestamp...

Yeah.

 We really need to figure out an approach that lets us keep the old
 datatypes around under a different name while making the original name
 be the new version of the datatype.  That way people can migrate and
 be up, and deal with the need to rewrite their tables at a later time.

 I do agree that having to rewrite the whole table isn't really
 upgrade-in-place.

 But the work to support multiple versions of data types is more than
 you're describing. You need to be concerned about things like joins
 between tables when some columns are the old data type and some the
 new, etc.

True.

 Ultimately people will have to convert the data types sometime.

Yes they will, but not having to do it as part of the upgrade is
important.  What particularly bothers me is the possibility that the
database comes on line and starts letting clients in (who don't know
that it's broken) while the breakage is still present.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-06-01 Thread Robert Haas

On Mon, Jun 1, 2009 at 11:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@enterprisedb.com writes:
 On Mon, Jun 1, 2009 at 4:03 PM, Robert Haas robertmh...@gmail.com wrote:
 We really need to figure out an approach that lets us keep the old
 datatypes around under a different name while making the original name
 be the new version of the datatype.  That way people can migrate and
 be up, and deal with the need to rewrite their tables at a later time.

 I do agree that having to rewrite the whole table isn't really
 upgrade-in-place.

 It's certainly the case that there is a lot more work to do before
 pg_migrator could support everything that we reasonably want to be
 able to do in a version update.  As I see it, the reason it's getting
 revived now is that 8.3-8.4 happens to be an update where most of what
 it can't (yet) do isn't necessary.  That means we can get it out there,
 get the bugs out of the functionality it does have, and most importantly
 try to set an expectation that future updates will also have some degree
 of update-in-place capability.  If we wait till it's perfect then
 nothing will ever happen at all in this space.

I agree.  I remain doubtful that dumping and reloading the schema is
the best way to go, but it's certainly a worthwhile experiment,
because (a) I might easily be wrong and (b) we'll hopefully learn some
things that will be useful going forward.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-06-01 Thread Bruce Momjian

Robert Haas wrote:
  It's certainly the case that there is a lot more work to do before
  pg_migrator could support everything that we reasonably want to be
  able to do in a version update. ?As I see it, the reason it's getting
  revived now is that 8.3-8.4 happens to be an update where most of what
  it can't (yet) do isn't necessary. ?That means we can get it out there,
  get the bugs out of the functionality it does have, and most importantly
  try to set an expectation that future updates will also have some degree
  of update-in-place capability. ?If we wait till it's perfect then
  nothing will ever happen at all in this space.
 
 I agree.  I remain doubtful that dumping and reloading the schema is
 the best way to go, but it's certainly a worthwhile experiment,
 because (a) I might easily be wrong and (b) we'll hopefully learn some
 things that will be useful going forward.

The amount of work to do an upgrade without dump/reloading the schema is
overly complex and would have to be adjusted for every major release,
with the high likelihood it would not be done or fail in random ways.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-06-01 Thread Bruce Momjian

Greg Stark wrote:
 On Mon, Jun 1, 2009 at 4:03 PM, Robert Haas robertmh...@gmail.com wrote:
  On Sun, May 31, 2009 at 11:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Updating the catalog tables directly via SQL? ?Good luck making that
  work. ?If you ever get it to work at all, it'll require a pile of hacks
  that will make pg_migrator look like paradise.
 
  For clarity, I really mean from a standalone backend, but ideally
  I'd like it to be SQL.
 
 Keep in mind that you have catalogs in all the databases, and even in
 standalone mode you need those catalogs to find the, er, catalogs.
 There's a reason bootstrap mode is so limited and even then it needs
 some of the catalogs to already be in place.
 
  (Which is not to say that pg_migrator isn't a hack; it surely is.
  But it looks like the least painful approach available.)
 
  Maybe. ?It seems that we don't have a good way of handling datatype
  conversions. ?The approaches that have been proposed for tsvector
  wouldn't work at all but for the fact that the new output function can
  handle the old internal representation, which is not something that we
  can guarantee in every case.
 
 Well I think all we need for that is to have pg_migrator provide the
 old output function wrapped up in a migrate_foo() C function.

Yep, it is just a matter of using a custom function call rather than a
cast.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-31 Thread Bruce Momjian

Greg Stark wrote:
 On Sat, May 30, 2009 at 1:11 PM, Bruce Momjian br...@momjian.us wrote:
  I have discovered a simpler solution using ALTER TABLE and calling a
  conversion function:
 
  ? ? ? ?test= CREATE TABLE tsvector_test(x tsvector);
  ? ? ? ?CREATE TABLE
  ? ? ? ?test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector
  ? ? ? ?test- USING conversion_func(x);
  ? ? ? ?ALTER TABLE
 
  No need for a fake data type and the required index infrastructure.
 
 I assume you're putting this in the list of commands to run
 post-migration along with any reindex commands etc? Because it will
 take a while (still faster than dump/reload i think).
 
 For this case, assuming the new tsvector's output function doesn't get
 confused by the old ordering, I think you can just use USING
 x::text::tsvector as your conversion expression. For more complex
 cases you might need to package up the old output function.
 
 Also note that you'll want to do any other conversions in the same
 table at the same time rather than doing multiple conversions.

I have implemented Greg's ideas using ::text::tsvector casting and it
worked:

$ psql test
psql (8.4beta2)
Type help for help.

test= select * from tsvector_test;
   x

 'c' 'bb' 'aaa'
(1 row)

test=
test= \q

$ psql test  /u/postgres/rebuilt_tsvector_tables.txt
ALTER TABLE
ALTER TABLE
ALTER TABLE
$ psql test
psql (8.4beta2)
Type help for help.

test= select * from tsvector_test;
   x

 'aaa' 'bb' 'c'
(1 row)

Here is the script that groups multiple tsvector columns into a single
ALTER TABLE:

ALTER TABLE public.tsvector_test
  ALTER COLUMN x TYPE tsvector USING x::text::tsvector;
ALTER TABLE public.tsvector_test2
  ALTER COLUMN x TYPE tsvector USING x::text::tsvector,
  ALTER COLUMN y TYPE tsvector USING y::text::tsvector;
ALTER TABLE public.tsvector_test3
  ALTER COLUMN x TYPE tsvector USING x::text::tsvector,
  ALTER COLUMN y TYPE tsvector USING y::text::tsvector,
  ALTER COLUMN z TYPE tsvector USING z::text::tsvector;

I knew I was going to have to deal with this optimization but didn't
realize ALTER TABLE already supported multiple TYPE changes.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-31 Thread Bruce Momjian

Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   I have discovered a simpler solution using ALTER TABLE and calling a
   conversion function:
  
 test= CREATE TABLE tsvector_test(x tsvector);
 CREATE TABLE
 test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector 
 test- USING conversion_func(x);
 ALTER TABLE
  
   No need for a fake data type and the required index infrastructure.
  
  I think this is basically a large-caliber foot gun.  You're going to
  pretend that invalid data is valid, until the user gets around to fixing
  it?
 
 What choice do we have?  While we can mark indexes as invalid (which we
 do), how do we mark a table's contents as invalid?  Should we create
 rules so no one can see the data and then have the ALTER TABLE script
 remove the rules after it is rebuilt?

OK, what ideas do people have to prevent access to tsvector columns?  I
am thinking of renaming the tables or something.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-31 Thread Greg Stark

On Sun, May 31, 2009 at 3:04 PM, Bruce Momjian br...@momjian.us wrote:
  I think this is basically a large-caliber foot gun.  You're going to
  pretend that invalid data is valid, until the user gets around to fixing
  it?

 What choice do we have?

I think in this case the caliber is pretty small and this might be
sufficient. It might be nice if we had a check somewhere in the
tsvector data types so people get informative errors if their
tsvectors are old-style rather than random incorrect results, but
that's mostly gilding.

In the general case of data type representation changes I think we
need something like:
  While we can mark indexes as invalid (which we
 do), how do we mark a table's contents as invalid?  Should we create
 rules so no one can see the data and then have the ALTER TABLE script
 remove the rules after it is rebuilt?

 OK, what ideas do people have to prevent access to tsvector columns?  I
 am thinking of renaming the tables or something.

1 Change the catalog so all the tsvector colums are bytea.

2 Include a c function like migrate_tsvector(bytea) which contains a
copy of the old data type's output function and calls the new  data
type's input function on the result.

3 Include an ALTER TABLE command which calls the c function.

The gotchas I can see with this is:

1) It only works for varlenas -- There isn't a universal fixed length
data  type. You would probably have to invent one.

2) I'm not sure what will happen to rules and triggers which call
functions on the old data type. If you restore the schema unchanged
and modify the catalog directly then they will still be there but have
mismatched types. Will users get errors? Will those errors be sensible
errors or nonsensical ones? Will the conversion still go ahead or will
it complain that there are things which depend on the column?

If the problems in (2) prove surmountable then this provides a general
solution for any varlena data type representation change. However it
will still be a O(n) conversion plus an index rebuild. That's
unfortunate but unless we plan to ship the full set of operators,
opclasses, opfamilies, cross-data-type operators, etc for the old data
type I see no way around it.

I haven't heard anyone suggest we should roll back the tsvector
changes and give up the features the changes  provide -- and that's
just a performance feature. If that's all it took to convince us to
give up in-place-upgrade for this data type then imagine how easy it
will be to justify for actual functional features.

(Personally I think we're fooling ourselves to think Postgres is
mature enough that we won't come up with any new improvements which
will justify a data format change. I would rather hope we'll keep
coming up with massive improvements which require major changes in
every release.)

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-31 Thread Bruce Momjian

Bruce Momjian wrote:
 Bruce Momjian wrote:
  Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
I have discovered a simpler solution using ALTER TABLE and calling a
conversion function:
   
test= CREATE TABLE tsvector_test(x tsvector);
CREATE TABLE
test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector 
test- USING conversion_func(x);
ALTER TABLE
   
No need for a fake data type and the required index infrastructure.
   
   I think this is basically a large-caliber foot gun.  You're going to
   pretend that invalid data is valid, until the user gets around to fixing
   it?
  
  What choice do we have?  While we can mark indexes as invalid (which we
  do), how do we mark a table's contents as invalid?  Should we create
  rules so no one can see the data and then have the ALTER TABLE script
  remove the rules after it is rebuilt?
 
 OK, what ideas do people have to prevent access to tsvector columns?  I
 am thinking of renaming the tables or something.

Another idea would be to rename just the _column_ (not the table), but
that doesn't block SELECT * from showing the column.  Another idea would
be to replace tsvector's out function, but that would affect all tables
meaning it would be hard to incrementally rebuild tables using tsvector
columns.  Another idea would be to use column-level permissions to
prevent column access, but that doesn't affect super-users.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-31 Thread Robert Haas

On Sat, May 30, 2009 at 11:58 PM, Bruce Momjian br...@momjian.us wrote:
 The way the restore works, you would have to call the data type
 'tsvector', and then that oid is linked around in all sort of odd places
 and you need to change them all, hence confusion.

In theory, I think it would be possible to hack around this problem by
doing the following:

1. Change the OID of 8.4's tsvector type to an OID that was not in use in 8.3.
2. Add a type called tsvector83 to 8.4 using the same OID that the
tsvector type used in 8.3.
3. Hack pg_dump so that when dumping in upgrade mode, tsvector
references from 8.3 are emitted as tsvector83.

This would mean that you'd be on line immediately following the
upgrade, because the tsvector83 type would still be usable; it just
wouldn't have the improved performance of the new tsvector.
Unfortunately, it would also require a catversion bump (and some
programming work besides, not sure how much).  Given that we are 7
months after the start of the last commitfest, which was theoretically
our final feature freeze, this is probably not realistic, but if it's
not too ugly it might be considered if a similar situation comes up
during 8.5 development.

(At the risk of beating a dead horse, note if we were upgrading the
catalog tables directly via SQL, this type of scenario could be
handled cleanly without hacking pg_dump; I repeat my earlier critique
that the pg_migrator approach consigns us to a never-ending series of
pg_dump hacks, that or having it not work very well.)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-31 Thread Bruce Momjian

Robert Haas wrote:
 On Sat, May 30, 2009 at 11:58 PM, Bruce Momjian br...@momjian.us wrote:
  The way the restore works, you would have to call the data type
  'tsvector', and then that oid is linked around in all sort of odd places
  and you need to change them all, hence confusion.
 
 In theory, I think it would be possible to hack around this problem by
 doing the following:
 
 1. Change the OID of 8.4's tsvector type to an OID that was not in use in 
 8.3.
 2. Add a type called tsvector83 to 8.4 using the same OID that the
 tsvector type used in 8.3.
 3. Hack pg_dump so that when dumping in upgrade mode, tsvector
 references from 8.3 are emitted as tsvector83.

The pg_type oids are never transfered by pg_migrator; all schema
information is transfered as pg_dump --schema stuff, i.e. ASCII.

 This would mean that you'd be on line immediately following the
 upgrade, because the tsvector83 type would still be usable; it just
 wouldn't have the improved performance of the new tsvector.
 Unfortunately, it would also require a catversion bump (and some
 programming work besides, not sure how much).  Given that we are 7
 months after the start of the last commitfest, which was theoretically
 our final feature freeze, this is probably not realistic, but if it's
 not too ugly it might be considered if a similar situation comes up
 during 8.5 development.
 
 (At the risk of beating a dead horse, note if we were upgrading the
 catalog tables directly via SQL, this type of scenario could be
 handled cleanly without hacking pg_dump; I repeat my earlier critique
 that the pg_migrator approach consigns us to a never-ending series of
 pg_dump hacks, that or having it not work very well.)

We could create infrastructure to handle the 8.3 storage format but my
guess is that most people would just rather rebuild the tables.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-31 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 (At the risk of beating a dead horse, note if we were upgrading the
 catalog tables directly via SQL, this type of scenario could be
 handled cleanly without hacking pg_dump; I repeat my earlier critique
 that the pg_migrator approach consigns us to a never-ending series of
 pg_dump hacks, that or having it not work very well.)

Updating the catalog tables directly via SQL?  Good luck making that
work.  If you ever get it to work at all, it'll require a pile of hacks
that will make pg_migrator look like paradise.

(Which is not to say that pg_migrator isn't a hack; it surely is.
But it looks like the least painful approach available.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Bruce Momjian wrote:
 Tom Lane wrote:
  Josh Berkus j...@agliodbs.com writes:
   It would be nice to have pg_migrator handle this, especially if we could 
   do it in parallel.  Then we just have to warn users that migrating a 
   database with tsvector columns takes significantly longer.  That is,
  
   1) do rest of catalog swap and link/copy of objects.
   2) mark all tsvector columns as 83_tsvector and add new tsvector type
   (these columns will be unusable for queries)
   3) bring up database
   4) search for all 83_tsvector columns
   5) do ALTER TABLE on each of these columns, in parallel, up to a 
   configuration setting (default 3).
  
  pg_migrator is already emitting a script that is intended to be run
  after conversion, to handle REINDEXing of incompatible indexes.  That
  could easily be made to do ALTER TYPE on old tsvector columns too, no?
 
 Hmmm.  Well, the problem right now with v8_3_tsvector is that it fails
 on index create, even when the index is empty, because I can't figure
 out how to simply set up the proper index catalog entries.  Once that is
 fixed and I can bind tsvector to v8_3_tsvector on schema creation, I can
 easily emit ALTER TABLE to fix the issue.  And, at that point the
 tsvector GIN indexes would be automatically created so I can skip that
 part.

I have discovered a simpler solution using ALTER TABLE and calling a
conversion function:

test= CREATE TABLE tsvector_test(x tsvector);
CREATE TABLE
test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector 
test- USING conversion_func(x);
ALTER TABLE

No need for a fake data type and the required index infrastructure.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Greg Stark

On Sat, May 30, 2009 at 1:11 PM, Bruce Momjian br...@momjian.us wrote:
 I have discovered a simpler solution using ALTER TABLE and calling a
 conversion function:

        test= CREATE TABLE tsvector_test(x tsvector);
        CREATE TABLE
        test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector
        test- USING conversion_func(x);
        ALTER TABLE

 No need for a fake data type and the required index infrastructure.

I assume you're putting this in the list of commands to run
post-migration along with any reindex commands etc? Because it will
take a while (still faster than dump/reload i think).

For this case, assuming the new tsvector's output function doesn't get
confused by the old ordering, I think you can just use USING
x::text::tsvector as your conversion expression. For more complex
cases you might need to package up the old output function.

Also note that you'll want to do any other conversions in the same
table at the same time rather than doing multiple conversions.

Also, one gotcha to note is that tsvector data can appear inside
composite data types or arrays. I don't think that's common so perhaps
just a warning in the readme would suffice, but it's something to note
at least.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Dimitri Fontaine


Hi,

Le 30 mai 09 à 16:02, Greg Stark a écrit :
On Sat, May 30, 2009 at 1:11 PM, Bruce Momjian br...@momjian.us  
wrote:

I have discovered a simpler solution using ALTER TABLE and calling a
conversion function:

   test= CREATE TABLE tsvector_test(x tsvector);
   CREATE TABLE
   test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector
   test- USING conversion_func(x);
   ALTER TABLE

No need for a fake data type and the required index infrastructure.


I assume you're putting this in the list of commands to run
post-migration along with any reindex commands etc? Because it will
take a while (still faster than dump/reload i think).


Just thinking some more about the idea to get all those post- 
processing steps running in parallel, it's occurring to me that we  
have all we need already: would it be possible for pg_migrator to  
issue a schema only script with a catalog, in the custom archive format?
Then we could use pg_restore -j whatever post_migrator.script to run  
the last migration step.


Of course, people will want the custom script output of pg_migrator to  
be optional, I guess.


Regards,
--
dim
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Tom Lane

Bruce Momjian br...@momjian.us writes:
 I have discovered a simpler solution using ALTER TABLE and calling a
 conversion function:

   test= CREATE TABLE tsvector_test(x tsvector);
   CREATE TABLE
   test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector 
   test- USING conversion_func(x);
   ALTER TABLE

 No need for a fake data type and the required index infrastructure.

I think this is basically a large-caliber foot gun.  You're going to
pretend that invalid data is valid, until the user gets around to fixing
it?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Dimitri Fontaine wrote:
 Hi,
 
 Le 30 mai 09 ? 16:02, Greg Stark a ?crit :
  On Sat, May 30, 2009 at 1:11 PM, Bruce Momjian br...@momjian.us  
  wrote:
  I have discovered a simpler solution using ALTER TABLE and calling a
  conversion function:
 
 test= CREATE TABLE tsvector_test(x tsvector);
 CREATE TABLE
 test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector
 test- USING conversion_func(x);
 ALTER TABLE
 
  No need for a fake data type and the required index infrastructure.
 
  I assume you're putting this in the list of commands to run
  post-migration along with any reindex commands etc? Because it will
  take a while (still faster than dump/reload i think).

Yes.

 Just thinking some more about the idea to get all those post- 
 processing steps running in parallel, it's occurring to me that we  
 have all we need already: would it be possible for pg_migrator to  
 issue a schema only script with a catalog, in the custom archive format?
 Then we could use pg_restore -j whatever post_migrator.script to run  
 the last migration step.
 
 Of course, people will want the custom script output of pg_migrator to  
 be optional, I guess.

Yes, eventually that is possible.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  I have discovered a simpler solution using ALTER TABLE and calling a
  conversion function:
 
  test= CREATE TABLE tsvector_test(x tsvector);
  CREATE TABLE
  test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector 
  test- USING conversion_func(x);
  ALTER TABLE
 
  No need for a fake data type and the required index infrastructure.
 
 I think this is basically a large-caliber foot gun.  You're going to
 pretend that invalid data is valid, until the user gets around to fixing
 it?

What choice do we have?  While we can mark indexes as invalid (which we
do), how do we mark a table's contents as invalid?  Should we create
rules so no one can see the data and then have the ALTER TABLE script
remove the rules after it is rebuilt?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Bruce Momjian wrote:
  Just thinking some more about the idea to get all those post- 
  processing steps running in parallel, it's occurring to me that we  
  have all we need already: would it be possible for pg_migrator to  
  issue a schema only script with a catalog, in the custom archive format?
  Then we could use pg_restore -j whatever post_migrator.script to run  
  the last migration step.
  
  Of course, people will want the custom script output of pg_migrator to  
  be optional, I guess.
 
 Yes, eventually that is possible.

I have added that as a pg_migrator TODO item.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   I have discovered a simpler solution using ALTER TABLE and calling a
   conversion function:
  
 test= CREATE TABLE tsvector_test(x tsvector);
 CREATE TABLE
 test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector 
 test- USING conversion_func(x);
 ALTER TABLE
  
   No need for a fake data type and the required index infrastructure.
  
  I think this is basically a large-caliber foot gun.  You're going to

Uh, isn't pg_migrator itself a large-caliber foot gun?  No need to call
just part of it a foot gun.  ;-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Tom Lane

Bruce Momjian br...@momjian.us writes:
 Tom Lane wrote:
 I think this is basically a large-caliber foot gun.  You're going to
 pretend that invalid data is valid, until the user gets around to fixing
 it?

 What choice do we have?

Create a fake data type, just as you said before.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Greg Stark

On Sat, May 30, 2009 at 6:23 PM, Bruce Momjian br...@momjian.us wrote:
 I think this is basically a large-caliber foot gun.  You're going to
 pretend that invalid data is valid, until the user gets around to fixing
 it?

 What choice do we have?

Well you can store the data in a new fake data type (or even just mark
the column as a bytea -- since any varlena is as good as any other).
Then provide this conversion function to create the new data.

I suppose that means you should drop the indexes since if you leave
them things could get weird. But doing the conversion would have to
rebuild indexes anyways so CREATE INDEX should run in the same time as
that step of the conversion would have taken. It would be nice if you
could leave them around so the conversion would rebuild them
automatically, but that would require creating operators and an
opclass for the fake data type which would be more of a pain than just
marking the column as a bytea or a data type with no operators.



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Tom Lane wrote:
  I think this is basically a large-caliber foot gun.  You're going to
  pretend that invalid data is valid, until the user gets around to fixing
  it?
 
  What choice do we have?
 
 Create a fake data type, just as you said before.

But I have to create fake index infastructure, and I have to then modify
every reference to that data time at the end, even references in rules
and expression indexes --- that seems too complex.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Greg Stark wrote:
 On Sat, May 30, 2009 at 1:11 PM, Bruce Momjian br...@momjian.us wrote:
  I have discovered a simpler solution using ALTER TABLE and calling a
  conversion function:
 
  ? ? ? ?test= CREATE TABLE tsvector_test(x tsvector);
  ? ? ? ?CREATE TABLE
  ? ? ? ?test= ALTER TABLE tsvector_test ALTER COLUMN x TYPE tsvector
  ? ? ? ?test- USING conversion_func(x);
  ? ? ? ?ALTER TABLE
 
  No need for a fake data type and the required index infrastructure.
 
 I assume you're putting this in the list of commands to run
 post-migration along with any reindex commands etc? Because it will
 take a while (still faster than dump/reload i think).

Yes, current CVS now does this.  I added logic so if you rebuilt the
table, the index will not be rebuilt again because it is a GIN index.

 For this case, assuming the new tsvector's output function doesn't get
 confused by the old ordering, I think you can just use USING
 x::text::tsvector as your conversion expression. For more complex
 cases you might need to package up the old output function.

Wow, cool idea.  I will test that.

 Also note that you'll want to do any other conversions in the same
 table at the same time rather than doing multiple conversions.

Wow, I didn't know we supported that too:

 ALTER TABLE t ALTER COLUMN x TYPE tsvector USING x,  ALTER
COLUMN y TYPE tsvector USING y;

I will work on that also.

 Also, one gotcha to note is that tsvector data can appear inside
 composite data types or arrays. I don't think that's common so perhaps
 just a warning in the readme would suffice, but it's something to note
 at least.

Wow, not sure how I would even find that.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-30 Thread Bruce Momjian

Greg Stark wrote:
 On Sat, May 30, 2009 at 6:23 PM, Bruce Momjian br...@momjian.us wrote:
  I think this is basically a large-caliber foot gun. ?You're going to
  pretend that invalid data is valid, until the user gets around to fixing
  it?
 
  What choice do we have?
 
 Well you can store the data in a new fake data type (or even just mark
 the column as a bytea -- since any varlena is as good as any other).
 Then provide this conversion function to create the new data.
 
 I suppose that means you should drop the indexes since if you leave
 them things could get weird. But doing the conversion would have to
 rebuild indexes anyways so CREATE INDEX should run in the same time as
 that step of the conversion would have taken. It would be nice if you
 could leave them around so the conversion would rebuild them
 automatically, but that would require creating operators and an
 opclass for the fake data type which would be more of a pain than just
 marking the column as a bytea or a data type with no operators.

The way the restore works, you would have to call the data type
'tsvector', and then that oid is linked around in all sort of odd places
and you need to change them all, hence confusion.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Stefan Kaltenbrunner


Alvaro Herrera wrote:

Bruce Momjian wrote:

Alvaro Herrera wrote:



There are so many caveats on pg_migrator (and things that need to be
done after the migration is complete) that one starts to wonder if
people is not better off just using parallel pg_restore.  From Stefan's
reported timings I'm not sure that pg_migrator is that much of a benefit
in the first place ... unless copy mode can be made much faster.  (On
link mode it is so much faster that it's worth it, but then you don't
have an escape hatch).

That is accurate.  I doubt copy mode speed can be improved.


Why not?  Right now it's single-threaded.  Would it be faster if it ran
several copies in parallel?


I guess it would be much faster on powerful hardware - we also have to 
consider that copy mode now is a no-op really.
If it had to do any actual page conversation too it seems entirely 
possible that a parallel restore might be even faster that a single 
threaded pg_migrator in copy mode.



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Zdenek Kotala


Bruce Momjian píše v čt 28. 05. 2009 v 17:42 -0400:
 Josh Berkus wrote:
  On 5/28/09 2:30 PM, Bruce Momjian wrote:
   Because no one has responded, I am going to prevent pg_migrator from
   working with a cluster that uses tsvector.  I realize this limits
   pg_migrator's usefulness, but I have to move on.
  
  I don't know how to fix the issue, certainly.
  
  Why was there a format change to tsvector anyway?  Was there an 
  important reason for this?
 
 The ordering of the lexems was changed:

The biggest problem is dictionary change. I'm not sure if it happened
but IIRC Teodor mentioned it in Ottawa. If it happened It hits down
tsvector compatibility at all.  

Zdenek


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 Bruce Momjian wrote:
 Alvaro Herrera wrote:
 Why not?  Right now it's single-threaded.  Would it be faster if it ran
 several copies in parallel?
 
 Sure, but that assumes you have parallel I/O channels;  I assume right
 now it is I/O limited.

 But so does parallel pg_restore, no?

The point of parallel pg_restore is that COPY is frequently CPU-bound to
some extent, and so you can put multiple CPUs to work by parallelizing.
I find it much less probable that multiple cp operations can be
parallelized, unless the DB is spread across multiple tablespaces and
the code is smart enough to interleave by tablespace.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Tom Lane

Zdenek Kotala zdenek.kot...@sun.com writes:
 The biggest problem is dictionary change. I'm not sure if it happened
 but IIRC Teodor mentioned it in Ottawa. If it happened It hits down
 tsvector compatibility at all.  

No more than changing dictionary behavior in an existing installation.
What was stated when the issue came up during 8.3 development is that
you don't normally need to worry about small changes in dictionary
behavior because overall text search behavior will still be close
enough.  (I seem to recall that I'd complained that any change in
dictionary behavior would invalidate indexes based on the dictionary,
and this was the answer.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Zdenek Kotala


Tom Lane píše v pá 29. 05. 2009 v 11:28 -0400:
 Zdenek Kotala zdenek.kot...@sun.com writes:
  The biggest problem is dictionary change. I'm not sure if it happened
  but IIRC Teodor mentioned it in Ottawa. If it happened It hits down
  tsvector compatibility at all.  
 
 No more than changing dictionary behavior in an existing installation.
 What was stated when the issue came up during 8.3 development is that
 you don't normally need to worry about small changes in dictionary
 behavior because overall text search behavior will still be close
 enough.  (I seem to recall that I'd complained that any change in
 dictionary behavior would invalidate indexes based on the dictionary,
 and this was the answer.)

It seems to be OK then. However, pg_migrator should inform about it.

Zdenek


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Bruce Momjian

Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  Bruce,
  The ordering of the lexems was changed:
 
  What does that get us in terms of performance etc.?
 
 It was changed to support partial-match tsvector queries.  Without it,
 a partial match query would have to scan entire tsvectors instead
 of applying binary search.  I don't know if Oleg and Teodor did any
 actual performance tests on the size of the hit, but it seems like
 it could be pretty awful for large documents.

I started thinking about the performance issues of the tsvector changes.
Teodor gave me this code for conversion that basically does:

qsort_arg((void *) ARRPTR(t), t-size, sizeof(WordEntry), cmpLexeme, 
(void*) t);

So, basically, every time there is a cast we have to do a sort, which
for a large document would yield poor performance, and because we are
not storing the sorted result, it happens for every access;  this might
be an unacceptable performance burden.

So, one idea would be, instead of a cast, have pg_migrator rebuild the
tsvector columns with ALTER TABLE, so then the 8.4 index code could be
used.  But then we might as well just tell the users to migrate the
tsvector tables themselves, which is how pg_migrator behaves now.

Obviously we are still trying to figure out the best way to handle data
type changes;  I think as soon as we figure out a plan for tsvector we
can use that method for future changes.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Josh Berkus


Bruce,


So, one idea would be, instead of a cast, have pg_migrator rebuild the
tsvector columns with ALTER TABLE, so then the 8.4 index code could be
used.  But then we might as well just tell the users to migrate the
tsvector tables themselves, which is how pg_migrator behaves now.


It would be nice to have pg_migrator handle this, especially if we could 
do it in parallel.  Then we just have to warn users that migrating a 
database with tsvector columns takes significantly longer.  That is,


1) do rest of catalog swap and link/copy of objects.
2) mark all tsvector columns as 83_tsvector and add new tsvector type
   (these columns will be unusable for queries)
3) bring up database
4) search for all 83_tsvector columns
5) do ALTER TABLE on each of these columns, in parallel, up to a 
configuration setting (default 3).


However, I can also understand not having time to implement the above 
before 8.4 release.



--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Tom Lane

Josh Berkus j...@agliodbs.com writes:
 It would be nice to have pg_migrator handle this, especially if we could 
 do it in parallel.  Then we just have to warn users that migrating a 
 database with tsvector columns takes significantly longer.  That is,

 1) do rest of catalog swap and link/copy of objects.
 2) mark all tsvector columns as 83_tsvector and add new tsvector type
 (these columns will be unusable for queries)
 3) bring up database
 4) search for all 83_tsvector columns
 5) do ALTER TABLE on each of these columns, in parallel, up to a 
 configuration setting (default 3).

pg_migrator is already emitting a script that is intended to be run
after conversion, to handle REINDEXing of incompatible indexes.  That
could easily be made to do ALTER TYPE on old tsvector columns too, no?

The parallel bit is pie in the sky and should not be considered even
for a millisecond during this release cycle.  Save it for 8.5, or
suggest to people that they manually cut the script apart if they're
desperate to have that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-29 Thread Bruce Momjian

Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  It would be nice to have pg_migrator handle this, especially if we could 
  do it in parallel.  Then we just have to warn users that migrating a 
  database with tsvector columns takes significantly longer.  That is,
 
  1) do rest of catalog swap and link/copy of objects.
  2) mark all tsvector columns as 83_tsvector and add new tsvector type
  (these columns will be unusable for queries)
  3) bring up database
  4) search for all 83_tsvector columns
  5) do ALTER TABLE on each of these columns, in parallel, up to a 
  configuration setting (default 3).
 
 pg_migrator is already emitting a script that is intended to be run
 after conversion, to handle REINDEXing of incompatible indexes.  That
 could easily be made to do ALTER TYPE on old tsvector columns too, no?

Hmmm.  Well, the problem right now with v8_3_tsvector is that it fails
on index create, even when the index is empty, because I can't figure
out how to simply set up the proper index catalog entries.  Once that is
fixed and I can bind tsvector to v8_3_tsvector on schema creation, I can
easily emit ALTER TABLE to fix the issue.  And, at that point the
tsvector GIN indexes would be automatically created so I can skip that
part.

 The parallel bit is pie in the sky and should not be considered even
 for a millisecond during this release cycle.  Save it for 8.5, or
 suggest to people that they manually cut the script apart if they're
 desperate to have that.

Agreed.

FYI, this is 1% as hard as the Win32 port, so I am not discouraged.  ;-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian


Because no one has responed

---

Bruce Momjian wrote:
 I found out at PGCon that the internal format of tsvector changed
 slightly from 8.3 to 8.4.  Teodor gave me a conversion function and I
 have adjusted pg_migrator to install a v8_3_tsvector data type to be
 used during the load so the old user tables use that data type.  You can
 see the code here at the bottom of this file:
 
   
 http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pg-migrator/pg_migrator/src/pg_migrator.c?rev=1.36content-type=text/x-cvsweb-markup
 
 I am not done yet because eventually v8_3_tsvector will be moved into a
 separate schema and the tsvector data type restored.
 
 The problem I have is that while I have created v8_3_tsvector (with
 casts), I have not created index routines for it, so I get this error
 when restoring a database that uses a GIN index on tsvector:
 
   Restoring database schema  
   psql:/u/postgres/pg_migrator_dump_db.sql:7006: ERROR:  data type
   tsvector has no default operator class for access method gin
   HINT:  You must specify an operator class for the index or define a
   default operator class for the data type.
 
 So, how do I create a compatible set of hooks for v8_3_tsvector? 
 Because tsvector is a native data type I can't figure out how to set up
 a similar type.  Also, will the indexes be OK if they are created from
 types that don't have the 8.3 format?  What about new indexes created
 after the migration is done?
 
 Ideally the cast would just be called during index activity and the 8.4
 tsvector index routines used.  I am told by Teodor that GiST indexes
 would be fine with the 8.3 data format, and we are forcing the
 reindexing of GIN indexes anyway in 8.4.
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + If your life is a hard drive, Christ can be your backup. +
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian


Because no one has responded, I am going to prevent pg_migrator from
working with a cluster that uses tsvector.  I realize this limits
pg_migrator's usefulness, but I have to move on.

---

Bruce Momjian wrote:
 I found out at PGCon that the internal format of tsvector changed
 slightly from 8.3 to 8.4.  Teodor gave me a conversion function and I
 have adjusted pg_migrator to install a v8_3_tsvector data type to be
 used during the load so the old user tables use that data type.  You can
 see the code here at the bottom of this file:
 
   
 http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pg-migrator/pg_migrator/src/pg_migrator.c?rev=1.36content-type=text/x-cvsweb-markup
 
 I am not done yet because eventually v8_3_tsvector will be moved into a
 separate schema and the tsvector data type restored.
 
 The problem I have is that while I have created v8_3_tsvector (with
 casts), I have not created index routines for it, so I get this error
 when restoring a database that uses a GIN index on tsvector:
 
   Restoring database schema  
   psql:/u/postgres/pg_migrator_dump_db.sql:7006: ERROR:  data type
   tsvector has no default operator class for access method gin
   HINT:  You must specify an operator class for the index or define a
   default operator class for the data type.
 
 So, how do I create a compatible set of hooks for v8_3_tsvector? 
 Because tsvector is a native data type I can't figure out how to set up
 a similar type.  Also, will the indexes be OK if they are created from
 types that don't have the 8.3 format?  What about new indexes created
 after the migration is done?
 
 Ideally the cast would just be called during index activity and the 8.4
 tsvector index routines used.  I am told by Teodor that GiST indexes
 would be fine with the 8.3 data format, and we are forcing the
 reindexing of GIN indexes anyway in 8.4.
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + If your life is a hard drive, Christ can be your backup. +
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Josh Berkus


On 5/28/09 2:30 PM, Bruce Momjian wrote:

Because no one has responded, I am going to prevent pg_migrator from
working with a cluster that uses tsvector.  I realize this limits
pg_migrator's usefulness, but I have to move on.


I don't know how to fix the issue, certainly.

Why was there a format change to tsvector anyway?  Was there an 
important reason for this?


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian

Josh Berkus wrote:
 On 5/28/09 2:30 PM, Bruce Momjian wrote:
  Because no one has responded, I am going to prevent pg_migrator from
  working with a cluster that uses tsvector.  I realize this limits
  pg_migrator's usefulness, but I have to move on.
 
 I don't know how to fix the issue, certainly.
 
 Why was there a format change to tsvector anyway?  Was there an 
 important reason for this?

The ordering of the lexems was changed:

/*
 * Convert 8.3 tsvector to 8.4
 *
 * 8.3 sorts lexemes by its length and if lengths are the same then it 
uses
 * alphabetic order;  8.4 sorts lexemes in lexicographical order, e.g.
 *
 * = SELECT 'c bb aaa'::tsvector;
 * tsvector
 * 
 *  'aaa' 'bb' 'c' -- 8.4
 *  'c' 'bb' 'aaa' -- 8.3
 */

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Josh Berkus


Bruce,


The ordering of the lexems was changed:

/*
 * Convert 8.3 tsvector to 8.4
 *
 * 8.3 sorts lexemes by its length and if lengths are the same then it 
uses
 * alphabetic order;  8.4 sorts lexemes in lexicographical order, e.g.
 *
 * =  SELECT 'c bb aaa'::tsvector;
 * tsvector
 * 
 *  'aaa' 'bb' 'c' -- 8.4
 *  'c' 'bb' 'aaa' -- 8.3
 */


What does that get us in terms of performance etc.?

Given that it's going to prevent users of Tsearch from 
upgrading-in-place, the format change ought to be giving us some serious 
gains.  Otherwise we should put it off until we need to make other 
datatype changes.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian

Josh Berkus wrote:
 Bruce,
 
  The ordering of the lexems was changed:
 
  /*
   * Convert 8.3 tsvector to 8.4
   *
   * 8.3 sorts lexemes by its length and if lengths are the same then it 
  uses
   * alphabetic order;  8.4 sorts lexemes in lexicographical order, e.g.
   *
   * =  SELECT 'c bb aaa'::tsvector;
   * tsvector
   * 
   *  'aaa' 'bb' 'c' -- 8.4
   *  'c' 'bb' 'aaa' -- 8.3
   */
 
 What does that get us in terms of performance etc.?
 
 Given that it's going to prevent users of Tsearch from 
 upgrading-in-place, the format change ought to be giving us some serious 
 gains.  Otherwise we should put it off until we need to make other 
 datatype changes.

No idea but now that we are in beta we would then be invalidating beta
tester data.  The area I got stuck on is that there is no CAST behavior
when creating an index.  We are already invalidating GIN indexes, but I
can't even create an index to support the old data type.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Tom Lane

Josh Berkus j...@agliodbs.com writes:
 Bruce,
 The ordering of the lexems was changed:

 What does that get us in terms of performance etc.?

It was changed to support partial-match tsvector queries.  Without it,
a partial match query would have to scan entire tsvectors instead
of applying binary search.  I don't know if Oleg and Teodor did any
actual performance tests on the size of the hit, but it seems like
it could be pretty awful for large documents.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Tom Lane

Bruce Momjian br...@momjian.us writes:
 Josh Berkus wrote:
 Given that it's going to prevent users of Tsearch from 
 upgrading-in-place, the format change ought to be giving us some serious 
 gains.  Otherwise we should put it off until we need to make other 
 datatype changes.

 No idea but now that we are in beta we would then be invalidating beta
 tester data.  The area I got stuck on is that there is no CAST behavior
 when creating an index.  We are already invalidating GIN indexes, but I
 can't even create an index to support the old data type.

It's certainly doable.  Bruce is just applying the strategy he mentioned
in our talk ;-)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Josh Berkus


Tom,


It was changed to support partial-match tsvector queries.  Without it,
a partial match query would have to scan entire tsvectors instead
of applying binary search.  I don't know if Oleg and Teodor did any
actual performance tests on the size of the hit, but it seems like
it could be pretty awful for large documents.


Ah, ok, that's pretty significant.  It makes a huge difference for some 
users, one of whom will be quoted on the press release.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Josh Berkus wrote:
  Given that it's going to prevent users of Tsearch from 
  upgrading-in-place, the format change ought to be giving us some serious 
  gains.  Otherwise we should put it off until we need to make other 
  datatype changes.
 
  No idea but now that we are in beta we would then be invalidating beta
  tester data.  The area I got stuck on is that there is no CAST behavior
  when creating an index.  We are already invalidating GIN indexes, but I
  can't even create an index to support the old data type.
 
 It's certainly doable.  Bruce is just applying the strategy he mentioned
 in our talk ;-)

Kind of --- I am stuck because we don't auto-cast for index usage like
we do for function calls and operators, so I will just keep going and
disable tsvector.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Greg Stark

On Thu, May 28, 2009 at 11:26 PM, Bruce Momjian br...@momjian.us wrote:
 Kind of --- I am stuck because we don't auto-cast for index usage like
 we do for function calls and operators, so I will just keep going and
 disable tsvector.

I was pretty stumped by what you meant by auto-cast but now I'm
beginning to think what you're looking for are cross-type comparison
operators? You could define a bunch of such operators which manually
call the cast on one side and add them to the operator class. I'm not
sure how to set up the operator class entries for GIN though.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Tom Lane

Greg Stark st...@enterprisedb.com writes:
 On Thu, May 28, 2009 at 11:26 PM, Bruce Momjian br...@momjian.us wrote:
 Kind of --- I am stuck because we don't auto-cast for index usage like
 we do for function calls and operators, so I will just keep going and
 disable tsvector.

 I was pretty stumped by what you meant by auto-cast but now I'm
 beginning to think what you're looking for are cross-type comparison
 operators? You could define a bunch of such operators which manually
 call the cast on one side and add them to the operator class. I'm not
 sure how to set up the operator class entries for GIN though.

No, what we need is a GIN (and probably GIST) opclass to support
old-style tsvectors.  The user-level operators aren't really an issue
because an implicit cast will get inserted in front of them when
necessary, but we don't support that for indexes.

People who want decent performance on partial match queries (or maybe
even *any* TS queries) are going to have to convert to the new format
anyway, though, so maybe this isn't something worth putting a whole lot
of work into.  It might be sufficient if the database conversion
succeeds but leaves the tsvector columns unusable until they're
converted.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Alvaro Herrera

Tom Lane wrote:

 People who want decent performance on partial match queries (or maybe
 even *any* TS queries) are going to have to convert to the new format
 anyway, though, so maybe this isn't something worth putting a whole lot
 of work into.  It might be sufficient if the database conversion
 succeeds but leaves the tsvector columns unusable until they're
 converted.

There are so many caveats on pg_migrator (and things that need to be
done after the migration is complete) that one starts to wonder if
people is not better off just using parallel pg_restore.  From Stefan's
reported timings I'm not sure that pg_migrator is that much of a benefit
in the first place ... unless copy mode can be made much faster.  (On
link mode it is so much faster that it's worth it, but then you don't
have an escape hatch).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian

Alvaro Herrera wrote:
 Tom Lane wrote:
 
  People who want decent performance on partial match queries (or maybe
  even *any* TS queries) are going to have to convert to the new format
  anyway, though, so maybe this isn't something worth putting a whole lot
  of work into.  It might be sufficient if the database conversion
  succeeds but leaves the tsvector columns unusable until they're
  converted.
 
 There are so many caveats on pg_migrator (and things that need to be
 done after the migration is complete) that one starts to wonder if
 people is not better off just using parallel pg_restore.  From Stefan's
 reported timings I'm not sure that pg_migrator is that much of a benefit
 in the first place ... unless copy mode can be made much faster.  (On
 link mode it is so much faster that it's worth it, but then you don't
 have an escape hatch).

That is accurate.  I doubt copy mode speed can be improved.  I think
doing a backup, then using --link mode will be the most common usage
pattern.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian

Greg Stark wrote:
 On Fri, May 29, 2009 at 12:53 AM, Bruce Momjian br...@momjian.us wrote:
  That is accurate. ?I doubt copy mode speed can be improved. ?I think
  doing a backup, then using --link mode will be the most common usage
  pattern.
 
 Well doing a backup will take more or less the same time as copy mode, no?
 
 Actually there's a big win for people who can take a snapshot using
 their SAN or filesystem, but that's not everyone.

Well, you are up during the backup.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Greg Stark

On Fri, May 29, 2009 at 12:53 AM, Bruce Momjian br...@momjian.us wrote:
 That is accurate.  I doubt copy mode speed can be improved.  I think
 doing a backup, then using --link mode will be the most common usage
 pattern.

Well doing a backup will take more or less the same time as copy mode, no?

Actually there's a big win for people who can take a snapshot using
their SAN or filesystem, but that's not everyone.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Alvaro Herrera

Bruce Momjian wrote:
 Alvaro Herrera wrote:

  There are so many caveats on pg_migrator (and things that need to be
  done after the migration is complete) that one starts to wonder if
  people is not better off just using parallel pg_restore.  From Stefan's
  reported timings I'm not sure that pg_migrator is that much of a benefit
  in the first place ... unless copy mode can be made much faster.  (On
  link mode it is so much faster that it's worth it, but then you don't
  have an escape hatch).
 
 That is accurate.  I doubt copy mode speed can be improved.

Why not?  Right now it's single-threaded.  Would it be faster if it ran
several copies in parallel?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Bruce Momjian

Alvaro Herrera wrote:
 Bruce Momjian wrote:
  Alvaro Herrera wrote:
 
   There are so many caveats on pg_migrator (and things that need to be
   done after the migration is complete) that one starts to wonder if
   people is not better off just using parallel pg_restore.  From Stefan's
   reported timings I'm not sure that pg_migrator is that much of a benefit
   in the first place ... unless copy mode can be made much faster.  (On
   link mode it is so much faster that it's worth it, but then you don't
   have an escape hatch).
  
  That is accurate.  I doubt copy mode speed can be improved.
 
 Why not?  Right now it's single-threaded.  Would it be faster if it ran
 several copies in parallel?

Sure, but that assumes you have parallel I/O channels;  I assume right
now it is I/O limited.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Alvaro Herrera

Bruce Momjian wrote:
 Alvaro Herrera wrote:

  Why not?  Right now it's single-threaded.  Would it be faster if it ran
  several copies in parallel?
 
 Sure, but that assumes you have parallel I/O channels;  I assume right
 now it is I/O limited.

But so does parallel pg_restore, no?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Josh Berkus


Bruce,


Sure, but that assumes you have parallel I/O channels;  I assume right
now it is I/O limited.


Even a simple cp will be speeded up between 50% and 150% (depending on 
your storage) by doing several files in parallel.  However, I'm not sure 
that's what you should be spending your time on.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Andrew Dunstan




Alvaro Herrera wrote:

Bruce Momjian wrote:
  

Alvaro Herrera wrote:



  

Why not?  Right now it's single-threaded.  Would it be faster if it ran
several copies in parallel?
  

Sure, but that assumes you have parallel I/O channels;  I assume right
now it is I/O limited.



But so does parallel pg_restore, no?

  


Well, no, it assumes you're not saturating your I/O bandwidth, which 
turns out to be the case more often than you might expect, even without 
multiple I/O channels.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

2009-05-28 Thread Greg Smith


On Thu, 28 May 2009, Alvaro Herrera wrote:


There are so many caveats on pg_migrator (and things that need to be
done after the migration is complete) that one starts to wonder if
people is not better off just using parallel pg_restore.


If you've got something close to a 24x7 shop, where planned downtime can 
at worst in the small number of hours, the landscape breaks down something 
like this:


100GB:  Current dump/reload is probably fine
100GB-1TB:  Aggressive pg_dump and parallel pg_restore might work

1TB:  Nothing but pg_migrator will save you


If you're doing a version migration, you need to block writer activity 
from the time you start dumping to get a perfectly consistant snapshot, 
and large databases can't be made available to users until their indexes 
are rebuilt lest every client doing a select * from x where... kicks off 
a giant sequential scan.  Parallel restore only improves one section of 
that work.


There is a significant difference between something being difficult or 
having caveats and being it impossible, and right now there's no good 
answer for TB scale upgrades unless you've lucky enough to have an app 
that Slony can usefully replicate.  If there's somebody out there who has 
a bunch of tsvector data they need to get upgraded in-place, they should 
step up to help out here.  I think Bruce is right to move on to more 
popular issues until that happens.  This issue can all get sorted out in 
user space at some later time, right?


From Stefan's reported timings I'm not sure that pg_migrator is that 
much of a benefit in the first place ... unless copy mode can be made 
much faster.


Systems administrators have all sorts of techniques to speed up this sort 
of thing, it's a well solved problem.  The people who need in-place 
upgrade will figure out how to either make copy mode fast enough, or how 
to make link mode usable by filesystem tricks.  We don't need to worry 
about that for them.  I think it's premature and distracting to start 
talking about optimizing pg_migrator anyway, it's plenty fast for its 
intended audience.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

58 matches

Mail list logo