[PERFORM] \d output to a file

2004-12-15 Thread sarlav kumar
Hi All,

I would like to write the output of the \d command on all tables in a database to an output file. There are more than 200 tables in the database. I am aware of the \o command to write the output to a file. But it will be tough to do the \d for each table manually and write the output to a file. Is there a command/way in which I can achieve this without having to do it for each table?
Any help in this regard would be really appreciated.

Thanks,
Saranya

Re: [PERFORM] \d output to a file

2004-12-15 Thread Grega Bremec
...and on Wed, Dec 15, 2004 at 06:38:22AM -0800, sarlav kumar used the keyboard:
> Hi All,
>
> I would like to write the output of the \d command on all tables in a
> database to an output file. There are more than 200 tables in the database. I
> am aware of \o command to write the output to a file. But, it will be tough
> to do the \d for each table manually and write the output to a file. Is there
> a command/ way in which I can achieve this without having to do it for each
> table?
> Any help in this regard would be really appreciated.

Hello Sarlav.

You don't say which platform you're doing this on. If it's Windows, someone
else will have to advise you; if it's a UNIX-like platform though, the
following simple shell script should be helpful in achieving what you want:

---CUT-HERE---
#!/bin/bash
if [ -z "$1" ]; then
    echo "Please specify a database to query." >&2
    exit 1
fi
DATABASE="$1"

# \t = tuples only, \a = unaligned output, \dt = list user tables;
# the table name is the second '|'-separated field.
MYTABLES=`echo '\t\a\dt' | psql -q "${DATABASE}" | cut -f 2 -d '|'`

for table in ${MYTABLES}; do
    echo '\d '"${table}"
done | psql "${DATABASE}"
---CUT-HERE---

You can store this script into a file called, for example, describe.sh and
invoke it like so:

$ ./describe.sh mydatabase > description.txt

It should then do what you want.

Should you have additional arguments to specify to psql, such as a host,
a username, a password and so on, it is easy to modify the script to do
that. Just supply those arguments in places where the psql command is
used.
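
For example, to point both psql invocations at a remote server (the host
and user names here are purely illustrative):

    psql -q -h dbhost -U dbuser "${DATABASE}"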

Hope this helped,
-- 
Grega Bremec
gregab at p0f dot net




Re: [PERFORM] [NOVICE] \d output to a file

2004-12-15 Thread Tom Lane
Geoffrey [EMAIL PROTECTED] writes:
> sarlav kumar wrote:
>> I would like to write the output of the \d command on all tables in a
>> database to an output file.

> What is the OS?  On any UNIX variant you can do:
> echo '\d' | psql > outputfile

Or use \o:

regression=# \o zzz1
regression=# \d
regression=# \o
regression=# \d
        List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | pg_ts_cfg     | table | postgres
 public | pg_ts_cfgmap  | table | postgres
 public | pg_ts_dict    | table | postgres
 public | pg_ts_parser  | table | postgres
 public | t_test        | table | postgres
 public | test_tsvector | table | postgres
(6 rows)

regression=# \q
$ cat zzz1
        List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | pg_ts_cfg     | table | postgres
 public | pg_ts_cfgmap  | table | postgres
 public | pg_ts_dict    | table | postgres
 public | pg_ts_parser  | table | postgres
 public | t_test        | table | postgres
 public | test_tsvector | table | postgres
(6 rows)

$

regards, tom lane



Re: [PERFORM] Query Optimization

2004-12-15 Thread Andrew Lazarus
sarlav kumar wrote:
> Hi all,
>
> Can someone please help me optimize this query? Is there a better way to
> write this query? I am generating a report of transactions ordered by
> time and with details of the sender and receiver etc.
>
> SELECT DISTINCT a.time::date || '<br>' || substring(a.time::time::text,1,8) as Time,
>     CASE WHEN a.what = 0  THEN 'Money Transfer'
>          WHEN a.what = 15 THEN 'Purchase'
>          WHEN a.what = 26 THEN 'Merchant Streamline'
>          WHEN a.what = 13 THEN 'Reversal'
>     END as Transaction_Type,
>     c1.account_no as SenderAccount, c2.account_no as ReceiverAccount,
>     b.country as SenderCountry, d.country as ReceiverCountry,
>     b.firstname as SenderFirstName, b.lastname as SenderLastName,
>     d.firstname as ReceiverFirstName, d.lastname as ReceiverLastName,
>     a.status as status,
>     (select sum(td.amount * 0.01) from transaction_data td
>       where td.data_id = a2.id and td.dir = 1 and td.uid = a.target_uid) as ReversedAmount,
>     (select sum(td.amount * 0.01) from transaction_data td
>       where td.data_id = a2.id and td.dir = 0 and td.uid = a.uid) as DepositedAmount,
>     a.flags,
>     (a.amount * 0.01) as Amount,
>     (a.fee * 0.01) as Fee
> FROM data a, customerdata b, customerdata d, customer c1, customer c2,
>     participant p, data a2
> WHERE p.id = a.partner_id AND a.uid = c1.id AND a.target_uid = c2.id
>     AND c1.id = b.uid AND c2.id = d.uid
>     AND a.confirmation IS NOT NULL AND a2.ref_id = a.id
>     AND (a2.what = 13 OR a2.what = 17)
> ORDER BY time DESC;
> (query plan followed)
The expensive operation is the Unique step that implements the DISTINCT.
Are you sure, in terms of business logic, that it is necessary? Is it
actually possible to have duplicate transactions at the exact same time,
and if so, would you really want to eliminate them?

As an aside, I prefer to have numeric constants like the 'what' field in 
a small lookup table of two columns (what_code, what_description); it's 
easier to extend and to document.
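
A minimal sketch of such a lookup table (all names here are assumed for
illustration, not taken from the original schema):

    CREATE TABLE transaction_type (
        what_code        integer PRIMARY KEY,
        what_description text NOT NULL
    );
    INSERT INTO transaction_type VALUES (0,  'Money Transfer');
    INSERT INTO transaction_type VALUES (15, 'Purchase');
    INSERT INTO transaction_type VALUES (26, 'Merchant Streamline');
    INSERT INTO transaction_type VALUES (13, 'Reversal');

The CASE expression in the query then becomes a plain join against
transaction_type.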




Re: [PERFORM] [NOVICE] \d output to a file

2004-12-15 Thread Ragnar Hafstað
On Wed, 2004-12-15 at 11:50 -0500, Tom Lane wrote:
 Geoffrey [EMAIL PROTECTED] writes:
  sarlav kumar wrote:
  I would like to write the output of the \d command on all tables in a
  database to an output file.
 
  What is the OS?  On any UNIX variant you can do:
  echo '\d' | psql  outputfile
 
 Or use \o:
 
 regression=# \o zzz1
 regression=# \d
or:
=# \d *
to get all tables as th OP wanted

 regression=# \o

gnari



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Josh Berkus
Stacy,

> Thanks again for the reply.  So it sounds like the answer to my original
> question is that it's expected that the pseudo-partitioning would introduce
> a fairly significant amount of overhead.  Correct?

Correct.   For that matter, Oracle table partitioning introduces significant 
overhead, from what I've seen.  I don't think there's a way not to.

Generally, I counsel people that they only want to consider 
pseudo-partitioning if they have one axis on the table which is used in 90% 
or more of the queries against that table.
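
For concreteness, a minimal sketch of the inheritance-based
pseudo-partitioning being discussed (all names are illustrative):

    -- Parent table that applications query.
    CREATE TABLE measurement (
        logdate date NOT NULL,
        value   integer
    );

    -- One child table per partition; the CHECK constraint documents its range.
    CREATE TABLE measurement_2004q4 (
        CHECK (logdate >= '2004-10-01' AND logdate < '2005-01-01')
    ) INHERITS (measurement);

    -- A query on the parent becomes an Append over the parent and all
    -- children, which is where the overhead comes from.
    SELECT count(*) FROM measurement WHERE logdate >= '2004-12-01';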

What would improve the situation significantly, and the utility of 
pseudo-partitioning, is the ability to have a single index span multiple 
partitions.   This would allow you to have a segmented index for the 
partitioned axis, yet still use an unsegmented index for the other columns.  
However, there's a *lot* of work to do to make that happen.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Josh Berkus
Greg,

> Well Oracle has lots of partitioning intelligence pushed up to the planner
> to avoid overhead.
>
> If you have a query with something like WHERE date = '2004-01-01' and
> date is your partition key (even if it's a range) then Oracle will figure
> out which partition it will need at planning time.

Hmmm ... well, we're looking at making a spec for Postgres Table Partitioning.
Maybe you could help?

> But I'm a bit puzzled. Why would Append have any significant cost? It's
> just taking the tuples from one plan node and returning them until they run
> out, then taking the tuples from another plan node. It should have no i/o
> cost and hardly any cpu cost. Where is the time going?

Beats me.  Tom?

> In my experience global indexes defeat the whole purpose of having the
> partitions. They make dropping and adding partitions expensive which was
> always the reason we wanted to partition something anyways.

Hmmm ... possibly. I was just thinking about the cost to partitioned tables
when you do a selection *not* on the partitioned axis. Also, currently
we can't enforce UNIQUE constraints across partitions.

But maybe reducing the cost of Append is the answer to this.

> It is handy having a higher level interface to deal with partitioned
> tables. You can create a single local or segmented index and not have
> to manually deal with all the partitions as separate tables. But that's
> just syntactic sugar.

Right, and the easy part.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Greg Stark
Josh Berkus [EMAIL PROTECTED] writes:

> Stacy,
>
>> Thanks again for the reply.  So it sounds like the answer to my original
>> question is that it's expected that the pseudo-partitioning would introduce
>> a fairly significant amount of overhead.  Correct?
>
> Correct.  For that matter, Oracle table partitioning introduces significant
> overhead, from what I've seen.  I don't think there's a way not to.

Well Oracle has lots of partitioning intelligence pushed up to the planner to
avoid overhead.

If you have a query with something like WHERE date = '2004-01-01' and date
is your partition key (even if it's a range) then Oracle will figure out which
partition it will need at planning time.

Even if your query is something like WHERE date = ? then Oracle will still
recognize that it will only need a single partition at planning time, though
it has to decide which partition at execution time.

We didn't notice any run-time performance degradation when we went to
partitioned tables. Maybe we were so blinded by the joy they brought us on the
maintenance side though. I don't think we specifically checked for run-time
consequences.

But I'm a bit puzzled. Why would Append have any significant cost? It's just
taking the tuples from one plan node and returning them until they run out,
then taking the tuples from another plan node. It should have no i/o cost and
hardly any cpu cost. Where is the time going?

> What would improve the situation significantly, and the utility of
> pseudo-partitioning, is the ability to have a single index span multiple
> partitions.  This would allow you to have a segmented index for the
> partitioned axis, yet still use an unsegmented index for the other columns.
> However, there's a *lot* of work to do to make that happen.

In my experience global indexes defeat the whole purpose of having the
partitions. They make dropping and adding partitions expensive which was
always the reason we wanted to partition something anyways.

It is handy having a higher level interface to deal with partitioned tables.
You can create a single local or segmented index and not have to manually
deal with all the partitions as separate tables. But that's just syntactic
sugar.

-- 
greg




Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Greg Stark

Josh Berkus [EMAIL PROTECTED] writes:

>> But I'm a bit puzzled. Why would Append have any significant cost? It's
>> just taking the tuples from one plan node and returning them until they run
>> out, then taking the tuples from another plan node. It should have no i/o
>> cost and hardly any cpu cost. Where is the time going?
>
> Beats me.  Tom?
>
>> In my experience global indexes defeat the whole purpose of having the
>> partitions. They make dropping and adding partitions expensive which was
>> always the reason we wanted to partition something anyways.
>
> Hmmm ... possibly. I was just thinking about the cost to partitioned tables
> when you do a selection *not* on the partitioned axis. Also, currently
> we can't enforce UNIQUE constraints across partitions.

Like I said though, we found global indexes defeated the whole purpose. That
meant no global UNIQUE constraints for us when we went to partitioned tables.
It gave the DBAs the willies but it really wasn't a big deal.

You can still do unique local indexes on a specific partition. So as long as
your partition key is in the primary key you can have a trustworthy primary
key.

And even if not, you usually find you're loading data into only one
partition. In most applications it's pretty hard to get a record from two
different partitions with conflicting IDs, and it's not hard to check for.
You could easily put a constraint saying that all PO numbers in the new
fiscal year have to be greater than the last PO number from last year, for
example, as sketched below.
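
For example (table and column names here are hypothetical):

    -- po_2005 holds the new fiscal year's orders; 100000 stands in for
    -- the last PO number issued in fiscal 2004.
    ALTER TABLE po_2005
        ADD CONSTRAINT po_no_after_fy2004 CHECK (po_number > 100000);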

> But maybe reducing the cost of Append is the answer to this.

The problem with global indexes is that adding or removing an entire partition
becomes a large job. [Actually with Postgres MVCC I suppose removing might
not. But cleaning up would eventually be a large job, and the point remains
for adding a partition.]

Ideally adding and removing a partition should be a O(1) operation. No data
modification at all, purely catalog changes.

>> It is handy having a higher level interface to deal with partitioned
>> tables. You can create a single local or segmented index and not have
>> to manually deal with all the partitions as separate tables. But that's
>> just syntactic sugar.
>
> Right, and the easy part.

I think the hard part lies in the optimizer actually. The semantics of the
operations to manipulate partitions might be tricky to get right but the
coding should be straightforward. Having the optimizer be able to recognize
when it can prune partitions will be a lot of work.

-- 
greg




Re: [PERFORM] identifying the database in a Postgres log file.

2004-12-15 Thread Bruce Momjian
Theo Galanakis wrote:
> I have written a program that parses a syslog file, reading all the postgres
> transactions. I would like to know if there is a way for postgres to log
> also the specific database the sql statement originated from.
>
> The only options available in the postgresql.conf are:
> #log_connections = false
> #log_duration = false
> #log_pid = false
> #log_statement = false
> #log_timestamp = false
> #log_hostname = false
> #log_source_port = false
>
> Is this possible? Or is there a smart work around.

In pre-8.0 the only way to do it is to log connections, grab the
database from there, and add the pid to join all log rows back to the
server row.  In 8.0 we have log_line_prefix that can display all
information.
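
For example, a minimal sketch (settings illustrative):

    # postgresql.conf, 8.0: tag every log line with database name and PID.
    log_line_prefix = '%d %p '     # %d = database name, %p = process ID

    # Pre-8.0 workaround: log connections (which name the database) and PIDs,
    # then join statement lines back to their connection line by PID.
    log_connections = true
    log_pid = true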

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
> But I'm a bit puzzled. Why would Append have any significant cost? It's just
> taking the tuples from one plan node and returning them until they run out,
> then taking the tuples from another plan node. It should have no i/o cost and
> hardly any cpu cost. Where is the time going?

As best I can tell by profiling, the cost of the Append node per se is
indeed negligible --- no more than a couple percent of the runtime in
CVS tip for a test case similar to Stacy White's example.

It looks bad in EXPLAIN ANALYZE, but you have to realize that passing
the tuples up through the Append node doubles the instrumentation
overhead of EXPLAIN ANALYZE, which is pretty sizable already.  (If you
turn on \timing in psql and try the query itself vs. EXPLAIN ANALYZE,
the actual elapsed time is about double, at least for me.)
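
For instance, the comparison just described looks like this in psql (a
sketch; output elided, and timings will of course vary):

    regression=# \timing
    Timing is on.
    regression=# SELECT COUNT(*), MAX(bar1) FROM super_foo;
    [...]
    regression=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM super_foo;
    [...]

With \timing on, psql prints an elapsed Time: line after each query;
comparing the two shows the instrumentation overhead.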

The other effect, which I hadn't expected, is that the seqscans
themselves actually slow down.  I get

regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM super_foo ;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16414.32..16414.32 rows=1 width=4) (actual time=32313.980..32313.988 rows=1 loops=1)
   ->  Append  (cost=0.00..13631.54 rows=556555 width=4) (actual time=0.232..21848.401 rows=524289 loops=1)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1)
         ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.187..6926.395 rows=262144 loops=1)
         ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.168..7026.953 rows=262145 loops=1)
 Total runtime: 32314.993 ms
(6 rows)

regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM sub_foo1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8207.16..8207.16 rows=1 width=4) (actual time=9850.420..9850.428 rows=1 loops=1)
   ->  Seq Scan on sub_foo1  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.202..4642.401 rows=262144 loops=1)
 Total runtime: 9851.423 ms
(3 rows)

Notice the actual times for the sub_foo1 seqscans.  That increase (when
counted for both input tables) almost exactly accounts for the
difference in non-EXPLAIN ANALYZE runtime.

After digging around, I find that the reason for the difference is that
the optimization to avoid a projection step (ExecProject) isn't applied
for scans of inheritance unions:

    /*
     * Can't do it with inheritance cases either (mainly because Append
     * doesn't project).
     */
    if (rel->reloptkind != RELOPT_BASEREL)
        return false;

So if you were to try the example in a pre-7.4 PG, which didn't have
that optimization, you'd probably find that the speeds were just about
the same.  (I'm too lazy to verify this though.)

I looked briefly at what it would take to cover this case, and decided
that it's a nontrivial change, so it's too late to do something about it
for 8.0.  I think it's probably possible to fix it though, at least for
cases where the child tables have rowtypes identical to the parent.

regards, tom lane



Re: [PERFORM] [NOVICE] \d output to a file

2004-12-15 Thread Geoffrey
sarlav kumar wrote:
> Hi All,
> I would like to write the output of the \d command on all tables in a
> database to an output file. There are more than 200 tables in the
> database. I am aware of \o command to write the output to a file.
> But, it will be tough to do the \d for each table manually and write
> the output to a file. Is there a command/ way in which I can achieve
> this without having to do it for each table? Any help in this regard
> would be really appreciated.
What is the OS?  On any UNIX variant you can do:

echo '\d' | psql > outputfile

But this will get you the system tables as well, I think.
Alternately you could do something like:

for table in $(listoftables); do
    echo "\d ${table}" | psql
done > outputfile
--
Until later, Geoffrey


[PERFORM] identifying the database in a Postgres log file.

2004-12-15 Thread Theo Galanakis
I have written a program that parses a syslog file, reading all the Postgres transactions. I would like to know if there is a way for Postgres to also log the specific database each SQL statement originated from.

The only options available in the postgresql.conf are:
#log_connections = false
#log_duration = false
#log_pid = false
#log_statement = false
#log_timestamp = false
#log_hostname = false
#log_source_port = false

Is this possible? Or is there a smart workaround?


Regards,
 Theo


Re: [PERFORM] Trying to create multi db query in one large queries

2004-12-15 Thread Christopher Browne
The world rejoiced as [EMAIL PROTECTED] (Josh Berkus) wrote:
> Hasnul,
>
>> My question is if there is a query design that would query multiple
>> servers simultaneously.. would that improve the performance?
>
> Not without vast amounts of infrastructure coding.  You're
> basically talking about what Oracle has spent the last 3 years and
> $100 million working on.

I recall a presentation from folks from Empress Software
http://www.empress.com/ back in about '94 or '95 offering this very
feature as part of the base functionality of their product.

I'm not sure it's quite fair to assess things as more or less
preposterous simply because they prove ludicrously expensive to
develop on a particular platform that happens to be targeted by even
more ludicrous quantities of development dollars...

On the other hand, it seems unlikely that improved performance would
be one of the merits of this approach...
-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://www3.sympatico.ca/cbbrowne/
Rules of the  Evil Overlord #92.  If I  ever talk to  the hero on the
phone, I  will  not taunt him.  Instead  I  will  say that  his dogged
perseverance has given me new insight on the futility  of my evil ways
and that if he leaves me alone for a few months of quiet contemplation
I will   likely  return to  the  path  of righteousness.   (Heroes are
incredibly gullible in this regard.) http://www.eviloverlord.com/
