Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-13 Thread Gokulakannan Somasundaram
On 10/12/07, Tom Lane [EMAIL PROTECTED] wrote:

 Andreas Joseph Krogh [EMAIL PROTECTED] writes:
  Will $SUBJECT make it possible for count(*) to use index? This is a much
  wanted feature.

 If you mean count(*) will become instantaneous, no it won't.  It would
 get faster, but probably not by more than a factor of 10 or so,
 corresponding to the size ratio between index and table.  Remember that
 the proposal is going to bloat indexes pretty badly: a simple index on
 an int4 column will double in size, more or less.  So that ratio will
 be much less favorable than it is today.


I accept that the indexes will be bigger with this approach. You might need
more disk space and more memory to accommodate the same amount of
information. But I think disk and memory costs have come down a lot; people
can afford to buy more of both. The one fact that remains true is that the
disk, the last mechanical device, is slow at random I/O. So this proposal
aims to spend more memory and disk space to reduce random I/Os. Say today's
index holds 200 tuples per index page (for a typical configuration); in the
worst case you describe (a single int4 index column), we would hold 100
tuples per page. But we would eliminate perhaps 100 random I/Os (say 75 with
a bitmap scan). 1GB of memory is around $100 and 1GB of disk is around $1,
while one hour of a performance tuner costs around $200 :)). So that's the
trade-off for enterprises, if they want to choose between the two kinds of
index.
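
(A back-of-envelope illustration of the figures above, added for clarity and
not part of the original mail. The 40 and 80 bytes per entry are round
numbers chosen only to reproduce the 200/100 estimate; real densities depend
on key width and fillfactor.)

-- assumed sizes: ~40 bytes per index entry today, ~80 bytes if each entry
-- also carried snapshot/visibility data, on an 8 kB page
SELECT 8192 / 40 AS entries_per_page_today,         -- ~200
       8192 / 80 AS entries_per_page_with_snapshot; -- ~100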

 Personally I think the bloat problem will doom this entire approach.
 The distributed costs of that will outweigh the speedups that can be
 achieved for specific queries.  The OP is free to try to prove this
 fear wrong, but no amount of discussion will constitute such a proof;
 only a testable implementation.

 regards, tom lane




Re: [HACKERS] Another Idea: Try Including snapshot with TOAST (was: Including Snapshot Info with Indexes)

2007-10-13 Thread Simon Riggs
On Sat, 2007-10-13 at 11:12 +0530, Gokulakannan Somasundaram wrote:

 b) But I don't understand how storing the visibility info in the TOAST
 table would help. In that case you may need to update it for every
 delete/update happening in the main table. The only thing it would help
 with is a full table scan on the TOAST table alone; maybe vacuum of TOAST
 tables could then be done independently. But do you think it is worth the
 loss of performance on update/delete?

No

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [HACKERS] Seems we need a post-beta1 initdb already

2007-10-13 Thread Magnus Hagander
 It seems that we are faced with a choice of two evils:
 
 1.  Accept that there's an ABI break and increment libpq.so's major
 version number for 8.3.  This will be a PITA for packagers, who will
 have to carry a compatibility package to provide 8.2 libpq.so.
 
 2.  Renumber 8.3's encoding IDs to preserve compatibility with the
 8.2 values.  It turns out that we can do that, but we will have to
 force initdb because the contents of pg_database.encoding will change.
 
 I'm of the opinion that #2 is the lesser evil, but maybe I'm overly
 influenced by my Red Hat packaging responsibilities --- I'll personally
 have to spend time on a compatibility package if we go with #1.
 Other opinions out there?

#2 seems like a much better choice. A small inconvenience during beta is much 
better than one in the actual release.

People running the beta expect us to try not to force an initdb, but also
to do it if we have to.

Might be worthwhile to try to get beta2 out as quickly as we can after the 
changes are in, to minimize the number of people who will need it?
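
(For context, an illustrative query rather than anything from the thread: the
encoding ID is stored as a plain integer in pg_database, which is why
renumbering the IDs would invalidate existing clusters and force an initdb.)

-- shows the numeric encoding ID each database was created with
SELECT datname, encoding, pg_encoding_to_char(encoding) FROM pg_database;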
 
 Also, if we do #2 it means that we have the option to resolve the
 contrib/txid mess by pushing txid into the core backend before beta2.
 Any votes pro or con on that?

Absolutely pro.

/Magnus 



Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-13 Thread Gregory Stark

Gokulakannan Somasundaram [EMAIL PROTECTED] writes:

 I accept that the indexes will be bigger with this approach. You might
 need more disk space and more memory to accommodate the same amount of
 information. But I think disk and memory costs have come down a lot;
 people can afford to buy more of both.

That's not how it works. We're not generally worried about people running out
of disk or memory resources. But no matter how cheap they get, people will
only have what they have. We have to worry about running as fast as possible
for a *given* amount of RAM or disk.

Generally, raising disk space usage brings a corresponding increase in run
time: an index that takes twice as much space on disk will take roughly twice
as long to consult. You need to save enough time elsewhere to make that up,
and then some, for it to be worthwhile.

I think we are pretty set on having the DSM for vacuuming purposes, so you'll
also have to argue that this approach covers enough additional cases, or is
better in some other way than the DSM, to be a win.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-13 Thread Gokulakannan Somasundaram
Hi,
  I went through this article and found it good. Please have a look at it.

http://www.databasecolumn.com/2007/09/one-size-fits-all.html

It was written by Michael Stonebraker, considered the founder of our
database. He points out that DBMSs designed in the 1970s haven't changed
along with the hardware landscape. The Vertica database (MonetDB is an
open-source database built on the same principle) exploits exactly this: use
more disk space, since it is cheap, and optimize for data warehousing.

In any case, we are proposing indexes with snapshots as an option, not as a
replacement for the current index scheme. If someone feels his database
should run in less disk space, he can create a normal index; if he can afford
the extra, redundant disk space, he can create an index with snapshots. We
are reducing random I/Os at the cost of extra disk space, which is a
reasonable trade. But, as Tom has pointed out, tech folks like us decide such
things better by experiment, so let's see whether indexes with snapshots are
worth the trade-off in space.

Thanks,
Gokul.

On 10/13/07, Gregory Stark [EMAIL PROTECTED] wrote:


 Gokulakannan Somasundaram [EMAIL PROTECTED] writes:

  I accept that the indexes will be bigger with this approach. You might
  need more disk space and more memory to accommodate the same amount of
  information. But I think disk and memory costs have come down a lot;
  people can afford to buy more of both.

 That's not how it works. We're not generally worried about people running
 out of disk or memory resources. But no matter how cheap they get, people
 will only have what they have. We have to worry about running as fast as
 possible for a *given* amount of RAM or disk.

 Generally, raising disk space usage brings a corresponding increase in run
 time: an index that takes twice as much space on disk will take roughly
 twice as long to consult. You need to save enough time elsewhere to make
 that up, and then some, for it to be worthwhile.

 I think we are pretty set on having the DSM for vacuuming purposes, so
 you'll also have to argue that this approach covers enough additional
 cases, or is better in some other way than the DSM, to be a win.

 --
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-13 Thread Jonah H. Harris
On 10/13/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote:
 I accept that the indexes will be bigger with this approach. You might need
 more disk space and more memory to accommodate the same amount of
 information. But I think disk and memory costs have come down a lot; people
 can afford to buy more of both. The one fact that remains true is that the
 disk, the last mechanical device, is slow at random I/O. So this proposal
 aims to spend more memory and disk space to reduce random I/Os. Say today's
 index holds 200 tuples per index page (for a typical configuration); in the
 worst case you describe (a single int4 index column), we would hold 100
 tuples per page. But we would eliminate perhaps 100 random I/Os (say 75 with
 a bitmap scan). 1GB of memory is around $100 and 1GB of disk is around $1,
 while one hour of a performance tuner costs around $200 :)). So that's the
 trade-off for enterprises, if they want to choose between the two kinds of
 index.

I disagree.  Even with the latest on-disk size enhancements, Postgres
still has a substantially larger on-disk footprint than pretty much
every other database.  Like it or not, additional I/O costs are not
something that should just be dismissed.

Disregarding fundamental database issues (like increased I/O) leads me
to believe that you don't have much experience tuning databases.  As I
have a bit of experience adding visibility to the indexes, I stand
behind DSM.  From an analytical standpoint, and given Postgres' MVCC
design, it would be hard to beat a properly designed DSM in this area.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/



Re: [HACKERS] Including Snapshot Info with Indexes

2007-10-13 Thread Jonah H. Harris
On 10/13/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote:
 Hi,
   I went through this article and found it good. Please have a look at it.

 http://www.databasecolumn.com/2007/09/one-size-fits-all.html

 It was written by Michael Stonebraker, considered the founder of our
 database. He points out that DBMSs designed in the 1970s haven't changed
 along with the hardware landscape. The Vertica database (MonetDB is an
 open-source database built on the same principle) exploits exactly this:
 use more disk space, since it is cheap, and optimize for data warehousing.

Sorry, but quoting Stonebraker (who has a *financial* interest in his
statement) is pointless.  Similarly, you can't directly compare his
concepts to your case.

IMHO, you're trying to get buy-in.  In this group, unless you have a
patch that proves something, you're generally out of luck.  My
suggestion is to start coding.  You will find, as I did, that DSM is a
better solution.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/



[HACKERS] Postgresql partitioning problems and suggested solutions

2007-10-13 Thread SHARMILA JOTHIRAJAH
We are presently investigating migrating large (10 TB) databases from Oracle
to PostgreSQL. We need table partitioning, and the support for it in PgSQL is
not good. We think the problem might be important enough to reach out to
someone who might help us. Our idea is that a dedicated partitioning facility
would not be necessary if:

- foreign keys could be defined to reference views (which in theory they
  should, according to C.J. Date);

- the query optimizer could eliminate UNION branches from SELECT, UPDATE and
  INSERT statements based on the partitioning key;

- an index could be built on a view (to make a global index across
  partitions).

With these three requirements met, I think all we would need for a
partitioned table would be:

CREATE VIEW Foo (a, b, c) AS
  SELECT a, b, c FROM Foo_1
  UNION ALL
  SELECT a, b, c FROM Foo_2
  UNION ALL
  SELECT a, b, c FROM Foo_3
;

Say (a, b) is the primary key of Foo, (a) is the primary key of each
partition, and c is some other column we would like to index; then we could:

CREATE INDEX Foo_c_idx ON Foo (c);

Now, for

SELECT * FROM Foo WHERE b = '2';

the planner should know to access only Foo_2. I suppose it could be done with
a rule, but it should work even when b = '2' is only implied, not just when
b = constant is stated explicitly.

Do you think that can be done without too much disturbance in the pgsql
sources?

For another thing, how difficult would it be to provide complete
insert/update ability on views, basically making the difference between a
table and a view completely transparent?

There is another feature we have often wanted, and we think it can be done
with such fully transparent views: the ability to define virtual fields. One
could then avoid storing the partition key b (in the example above) entirely
by:

CREATE VIEW Foo (a, b, c) AS
  SELECT a, '1' AS b, c FROM Foo_1
  UNION ALL
  SELECT a, '2' AS b, c FROM Foo_2
  UNION ALL
  SELECT a, '3' AS b, c FROM Foo_3
;

We have often wanted to put long constant identifiers into such virtual
attributes, stored only in the metadata and not held redundantly on disk.

Thanks
Sharmila


Re: [HACKERS] Postgresql partitioning problems and suggested solutions

2007-10-13 Thread Gregory Stark
SHARMILA JOTHIRAJAH [EMAIL PROTECTED] writes:

 Now, for
 SELECT * FROM Foo WHERE b = '2';

 the planner should know to access only Foo_2. I suppose it could be done
 with a rule, but it should work even when b = '2' is only implied, not just
 when b = constant is stated explicitly.

This much already exists in Postgres. If you define constraints on the
partitions with an expression like (b = 2) and turn on the
constraint_exclusion GUC variable, Postgres will do this.
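
A minimal sketch of that setup (the table and column names are made up for
illustration, mirroring the view example above):

-- one child table per partition, each with a CHECK constraint on the key
CREATE TABLE foo   (a int, b int, c text);
CREATE TABLE foo_1 (CHECK (b = 1)) INHERITS (foo);
CREATE TABLE foo_2 (CHECK (b = 2)) INHERITS (foo);
CREATE TABLE foo_3 (CHECK (b = 3)) INHERITS (foo);

SET constraint_exclusion = on;
-- the resulting plan should scan only foo and foo_2
EXPLAIN SELECT * FROM foo WHERE b = 2;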

The rest of your message lists a number of other interesting avenues, but
I'm not sure they are the only way to accomplish what you need. I would be
interested to know more generally what problems you anticipate with your
application and what your needs are at a high level. The solutions you
propose are quite specific, and other techniques proposed in the past may be
more attractive.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[HACKERS] 8.3 full text search docs

2007-10-13 Thread andy
I have two doc updates I'd like to offer.  I see we have two example
sections: creating rule-based dictionaries and creating parsers.  When I was
starting out, I would have liked to see an example of basic usage.


I'd like to offer two new sections: Example Usage and Upgrading.
This is my first draft; if anyone has suggestions I'd be interested in
hearing them.  Also, I'm not sure where or to whom to send this, so I hope
-hackers is ok.


- Example Usage -
Starting a new project with Full Text Searching is easy.  There is nothing
to install anymore; it's all built in (in fact, don't install the contrib
module tsearch2, because it will conflict with the tsearch2 built into the
core).


First we need to add a column of type tsvector to the table you'd like to
search.  In this example we'll use a table called notes.  If your table
already exists, use:


alter table notes add searchvec tsvector;

If not use:

create table notes (
rowid integer,
note text,
searchvec tsvector
);

The searchvec column is what we will use for searching, so you probably 
want to create an index on it... from another place in the manual:


(http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)

GiST indexes are very good for dynamic data and fast if the number of 
unique words (lexemes) is under 100,000,

GIN indexes are best for static data with +100,000 lexemes.

For our example, I'll create a gist index:

create index notesvec on notes using gist(searchvec);
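
(An aside, not part of the draft: if the data were mostly static, the GIN
equivalent would be built the same way, just with a different access method.)

create index notesvec_gin on notes using gin(searchvec);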

If you have existing data, we need to fill the searchvec column:

update notes set searchvec = to_tsvector(note);

After the update, rows that are inserted or updated will not have their
searchvec column set automatically; for that we need to create a trigger:


create trigger notevecupdate
before insert or update on notes
for each row
execute procedure
  tsvector_update_trigger(searchvec, 'pg_catalog.english', note);


Some data:
insert into notes(rowid, note) values(1, 'this is a test');

insert into notes(rowid, note)
values(2, 'I do not like green eggs and ham');

insert into notes(rowid, note) values(3, 'the cat in the hat');

insert into notes(rowid, note)
values(4, 'rage against the dying of the light');

And now we can query it:

select * from notes where searchvec @@ to_tsquery('light');

or

select * from notes, to_tsquery('test') as q where searchvec @@ q;

writing it this way lets you reuse the tsquery q like this:

select note, ts_rank(searchvec, q)
from notes, to_tsquery('test') as q
where searchvec @@ q
order by ts_rank(searchvec, q);
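
(One more illustration beyond the draft: ts_headline can show the matching
fragments alongside the rank.)

select ts_headline(note, q), ts_rank(searchvec, q)
from notes, to_tsquery('test') as q
where searchvec @@ q
order by ts_rank(searchvec, q) desc;
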
- Example Usage -



- Upgrade from prior versions -

When tsearch2 was put into core, some functions and types were renamed, 
among other things.  A simple backup and restore will not work to 
migrate your database from versions below 8.3 to 8.3.


In general, the way to do it is to back up all your data without the
tsearch2 stuff, restore just the data, and then recreate the tsearch2
objects by hand.


(It's easier to think of this as an upgrade from tsearch2 to tsearch3,
but without the whole renaming it to tsearch3 thing.)


To make it a little easier, you can use pg_restore to selectively restore
everything that is not part of the old tsearch2.


First you must use pg_dump -Fc to back up your existing database.

Then we will create an item list of things we want to restore using this 
perl script.  It will strip out all the things that look like tsearch2, 
and return (to stdout) a list of things you should restore.


For example:

pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
perl maketoc.pl ubberbase.bak > toc
# now restore just the items in toc
pg_restore -Fc --create -d postgres --use-list=toc ubberbase.bak

There is one thing that will fail: the trigger you used to update the
tsvector column.  That's because the trigger function tsearch2 was renamed
to tsvector_update_trigger.  You'll need to recreate the trigger by hand.
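
A sketch of that step using the notes example above (notevecupdate is just
the example's trigger name; use whatever your schema actually had):

drop trigger if exists notevecupdate on notes;
create trigger notevecupdate
before insert or update on notes
for each row
execute procedure
  tsvector_update_trigger(searchvec, 'pg_catalog.english', note);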


Now that the structures and data are restored, you'll need to go through 
and redo the tsearch2 stuff by hand.


After you get the database fixed up, you'll also need to update your
client programs (php, perl, etc).  For the most part, just renaming
things (like rank to ts_rank) should be all that's required.



Oleg Bartunov has an incomplete list of items that have been renamed:

http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes

- Upgrade from prior versions -

#!/usr/bin/perl

# pg_restore must be in the path
#
# we use pg_restore to get the toc from a backup (requires the backup be
# made with pg_dump -Fc)
# create a newtoc with all the things that look like tsearch2 stuff stripped out
# so you can pg_restore -Fc -L newtoc ...

# syntax:  xtract dumpfile.bak > newtoc

use strict;

my %list = ('gtsq' => 1,
'gtsq_in' => 1,
'gtsq_out' => 1,
'gtsvector' => 1,
'gtsvector_in' => 1,
'gtsvector_out' => 1,
'tsquery' => 1,
'tsquery_in' => 1,
'tsquery_out' => 1,
'tsvector' => 1,
'tsvector_in' => 1,
'tsvector_out' => 1,
'statinfo' => 1,
'tokenout' => 1,
'tokentype' => 1,
'tsdebug' => 1,

[HACKERS] ABIs are hard

2007-10-13 Thread Gregory Stark

Hm, I suppose this is expected. I always hated the idea that libraries could
introduce new symbols without an soname bump but obviously nobody's going to
be interested in an soname bump today...

! 
/home/stark/src/local-HEAD/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/psql:
 symbol lookup error: 
/home/stark/src/local-HEAD/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/psql:
 undefined symbol: pg_valid_server_encoding_id

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] 8.3 full text search docs

2007-10-13 Thread Albert Cervera i Areny
Andy,
note that documentation is discussed on the pgsql-docs list, and patches
are usually submitted to the pgsql-patches list. Nice to see both new
sections, by the way.

On Sunday 14 October 2007, andy wrote:
 I have two doc updates I'd like to offer.  I see we have two example
 sections: creating rule-based dictionaries and creating parsers.  When I
 was starting out, I would have liked to see an example of basic usage.

 I'd like to offer two new sections: Example Usage and Upgrading.
 This is my first draft; if anyone has suggestions I'd be interested in
 hearing them.  Also, I'm not sure where or to whom to send this, so I
 hope -hackers is ok.

 - Example Usage -
 Starting a new project with Full Text Searching is easy.  There is
 nothing to install anymore; it's all built in (in fact, don't install the
 contrib module tsearch2, because it will conflict with the tsearch2 built
 into the core).

 First we need to add a column of type tsvector to the table you'd like
 to search.  In this example we'll use a table called notes.  If your
 table already exists, use:

 alter table notes add searchvec tsvector;

 If not use:

 create table notes (
   rowid integer,
   note text,
   searchvec tsvector
 );

 The searchvec column is what we will use for searching, so you probably
 want to create an index on it... from another place in the manual:

 (http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)

 GiST indexes are very good for dynamic data and fast if the number of
 unique words (lexemes) is under 100,000,
 GIN indexes are best for static data with +100,000 lexemes.

 For our example, I'll create a gist index:

 create index notesvec on notes using gist(searchvec);

 If you have existing data, we need to fill the searchvec column:

 update notes set searchvec = to_tsvector(note);

 After the update, rows that are inserted or updated will not have their
 searchvec column set automatically; for that we need to create a trigger:

 create trigger notevecupdate
 before insert or update on notes
 for each row
 execute procedure
tsvector_update_trigger(searchvec, 'pg_catalog.english', note);


 Some data:
 insert into notes(rowid, note) values(1, 'this is a test');

 insert into notes(rowid, note)
 values(2, 'I do not like green eggs and ham');

 insert into notes(rowid, note) values(3, 'the cat in the hat');

 insert into notes(rowid, note)
 values(4, 'rage against the dying of the light');

 And now we can query it:

 select * from notes where searchvec @@ to_tsquery('light');

 or

 select * from notes, to_tsquery('test') as q where searchvec @@ q;

 writing it this way lets you reuse the tsquery q like this:

 select note, ts_rank(searchvec, q)
 from notes, to_tsquery('test') as q
 where searchvec @@ q
 order by ts_rank(searchvec, q);
 - Example Usage -



 - Upgrade from prior versions -

 When tsearch2 was put into core, some functions and types were renamed,
 among other things.  A simple backup and restore will not work to
 migrate your database from versions below 8.3 to 8.3.

 In general, the way to do it is to back up all your data without the
 tsearch2 stuff, restore just the data, and then recreate the tsearch2
 objects by hand.

 (It's easier to think of this as an upgrade from tsearch2 to tsearch3,
 but without the whole renaming it to tsearch3 thing.)

 To make it a little easier, you can use pg_restore to selectively
 restore everything that is not part of the old tsearch2.

 First you must use pg_dump -Fc to back up your existing database.

 Then we will create an item list of things we want to restore using this
 perl script.  It will strip out all the things that look like tsearch2,
 and return (to stdout) a list of things you should restore.

 For example:

 pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
 perl maketoc.pl ubberbase.bak > toc
 # now restore just the items in toc
 pg_restore -Fc --create -d postgres --use-list=toc ubberbase.bak

 There is one thing that will fail: the trigger you used to update the
 tsvector column.  That's because the trigger function tsearch2 was
 renamed to tsvector_update_trigger.  You'll need to recreate the trigger
 by hand.

 Now that the structures and data are restored, you'll need to go through
 and redo the tsearch2 stuff by hand.

 After you get the database fixed up, you'll also need to update your
 client programs (php, perl, etc).  For the most part, just renaming
 things (like rank to ts_rank) should be all that's required.


 Oleg Bartunov has an incomplete list of items that have been renamed:

 http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes

 - Upgrade from prior versions -



-- 
Albert Cervera i Areny
http://www.NaN-tic.com



Re: [HACKERS] Seems we need a post-beta1 initdb already

2007-10-13 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Might be worthwhile to try to get beta2 out as quickly as we can after the 
 changes are in, to minimize the number of people who will need it?

I'd like to get the locale/encoding issues straightened out, and also
get the contrib-tsearch-examples stuff in, before we think about beta2.
But hopefully that won't take very long.

regards, tom lane
