[GENERAL] datomic db as foreign data wrapper?

2014-03-15 Thread Reece Hart
Does anyone have experience accessing a datomic database using a foreign
data wrapper?

I'd be quite content with read-only and discarding the transaction data.

The real goal is to explore data in datomic using SQL.

Thanks,
Reece


[GENERAL] plans for plpython in RDS?

2014-02-18 Thread Reece Hart
Does anyone know if there are plans to support plpython in Amazon's RDS? I
(approximately) understand the issue, but I don't know whether there's any
effort to remedy the problem or whether I shouldn't bother hoping.

Thanks,
Reece


Re: [GENERAL] automatically refresh all materialized views?

2014-02-18 Thread Reece Hart
On Fri, Feb 14, 2014 at 10:15 AM, Merlin Moncure  wrote:

> yeah -- you could do this with some gymnastics and some dynamic SQL.
> If I were lazy (check), I would just encode the order in the name of
> the view somehow.
>

Thanks. That's exactly what I do already. Apparently, I'm even lazier than
you!

In case anyone out there is looking for a convention, here's one that I
find useful:  _v for views intended for interactive queries, _dv (defining
views) for views that are too slow to use interactively, and _mv for the
materialized _dv views.
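
In that scheme, a hypothetical orders table might carry views like these
(names illustrative only, not from any real schema):

```sql
-- _dv: defining view; correct but too slow for interactive use
CREATE VIEW order_totals_dv AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

-- _mv: materialization of the _dv view
CREATE MATERIALIZED VIEW order_totals_mv AS
    SELECT * FROM order_totals_dv;

-- _v: what users actually query interactively
CREATE VIEW order_totals_v AS
    SELECT * FROM order_totals_mv;
```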


[GENERAL] automatically refresh all materialized views?

2014-02-14 Thread Reece Hart
Is there a way to automatically refresh all materialized views,
preferably in order of dependency?

I should be able to chase pg_depend entries to create this ordering, right?
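
An untested sketch of that pg_depend walk: a materialized view's defining
query hangs off a pg_rewrite rule, whose pg_depend entries point at the
relations it references. This generates REFRESH statements in dependency
order; feed the output back through psql.

```sql
WITH RECURSIVE deps AS (
    -- matview-on-matview dependency pairs
    SELECT DISTINCT c.oid AS mv, refc.oid AS needs
    FROM pg_class c
    JOIN pg_rewrite r  ON r.ev_class = c.oid
    JOIN pg_depend d   ON d.objid = r.oid
                      AND d.classid = 'pg_rewrite'::regclass
    JOIN pg_class refc ON refc.oid = d.refobjid
    WHERE c.relkind = 'm' AND refc.relkind = 'm' AND c.oid <> refc.oid
),
ordered AS (
    -- level 0: matviews that depend on no other matview
    SELECT oid AS mv, 0 AS lvl
    FROM pg_class
    WHERE relkind = 'm' AND oid NOT IN (SELECT mv FROM deps)
    UNION ALL
    SELECT d.mv, o.lvl + 1
    FROM deps d JOIN ordered o ON o.mv = d.needs
)
SELECT format('REFRESH MATERIALIZED VIEW %s;', mv::regclass)
FROM (SELECT mv, max(lvl) AS lvl FROM ordered GROUP BY mv) x
ORDER BY lvl;
```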

Thanks,
Reece


Re: [GENERAL] password-less access, without using pg_hba

2014-02-08 Thread Reece Hart
On Fri, Feb 7, 2014 at 8:27 AM, Steve Crawford <
scrawf...@pinpointresearch.com> wrote:

> Ignoring the scary security issues
>

One of the niceties of an RDS deployment is that I don't care much about
the security issues: The machine is not in our VPC, there's only public
data on it, and I presume that AWS has isolated the instance to their
satisfaction. From my point of view, it's an ideal way to make data public
and way better than running it ourselves.

If you can't access pg_hba.conf how about just sticking pgbouncer or
> similar in the middle and have your users connect through that?
>

I like the pgbouncer idea in principle, but it means more work than I'm
willing to take on for this use.

Thanks everyone for the input. I'll stick with an advertised password.

-Reece


Re: [GENERAL] password-less access, without using pg_hba

2014-02-06 Thread Reece Hart
On Thu, Feb 6, 2014 at 6:37 PM, David Johnston  wrote:

> Doubtful.
>

Yeah, that's what I had assumed too.

The question is motivated entirely by what I think would make it easier for
users. In principle it's not difficult to give people a password (as I do
now), but in practice it's a barrier that I'd like to eliminate.

-Reece


[GENERAL] password-less access, without using pg_hba

2014-02-06 Thread Reece Hart
I'd like to provide public access, without a password, to a database hosted
on Amazon RDS.

I'm familiar with using pg_hba.conf to enable trust (no) authentication for
a user. pg_hba.conf is not available to DBAs on RDS.

Is there any other way to achieve password-less login in postgresql? I
tried alter user password NULL.

Thanks,
Reece


Re: [GENERAL] equivalent of mysql's SET type?

2011-03-10 Thread Reece Hart
On Wed, Mar 9, 2011 at 9:16 AM, Merlin Moncure  wrote:

> create type validation_flags as
> (
>  cluster bool,
>  freq bool
> );
>

Wow. That solution is nearly sexy, and far and away better than any solution
that I would have come up with. Thanks, Merlin!
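
For anyone following the thread, a sketch of how that composite type
might be used (table and column names hypothetical):

```sql
CREATE TYPE validation_flags AS (
    cluster bool,
    freq    bool
);

CREATE TABLE variation (
    id     serial PRIMARY KEY,
    status validation_flags
);

-- ROW(...) builds a composite value
INSERT INTO variation (status) VALUES (ROW(true, false));

-- Parenthesize the column to access a field of the composite
SELECT id FROM variation WHERE (status).cluster;
```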

-Reece


Re: [GENERAL] equivalent of mysql's SET type?

2011-03-09 Thread Reece Hart
On Tue, Mar 8, 2011 at 9:41 PM, John R Pierce  wrote:

> why not just have a set of booleans in the table for these individual
> on/off attributes?   wouldn't that be simplest?


I like that approach, but I think it's unlikely to fly in this specific case
for a couple reasons.

First, there are actually 8 factors (I edited for clarity... sorry about
that).

The original database is actively developed (released approximately
quarterly). I will need an approach that minimizes my burden when they edit
the set's factors.

And, I'd like to be compatible with mysql syntax and semantics for sets. If
you hold your nose for a moment, you'll be able to read the following
without becoming ill: mysql uses comma delimited strings to assign and query
set types (but stored internally as bit vectors). So, one does
validation_status = 'cluster,freq' to set those bits or validation_status
like '%freq%' to query. Much to my chagrin, emulating this interface will
make migration easier. However, implementing this string interface to
set/get boolean columns is just too offensive to whatever modest design
sensibilities I have. (For more pleasure reading, see
http://dev.mysql.com/doc/refman/5.0/en/set.html. I particularly like the
*warning* issued when one tries to add a value that's not part of the set.)

-Reece


Re: [GENERAL] equivalent of mysql's SET type?

2011-03-08 Thread Reece Hart
On Tue, Mar 8, 2011 at 5:51 PM, Steve Atkins  wrote:

> tsvector or intarray might be other options, depending on how you're going
> to query it and whether those are really strings or more enum-ish.
>
> The more SQLy way of doing it would be to have that stored in a separate
> table, with a foreign key to this one.
>

Using a separate table is what I've got now in an experimental conversion.
This works well enough and might save me the bother of an enum update.
Perhaps I should just move on.

Thanks Steve.

-Reece


Re: [GENERAL] equivalent of mysql's SET type?

2011-03-08 Thread Reece Hart
On Tue, Mar 8, 2011 at 5:52 PM, Darren Duncan wrote:

> Try starting with an enum type to define the possible values:
>
>  CREATE TYPE Consequence_Type
>AS ENUM ('ESSENTIAL_SPLICE_SITE','STOP_GAINED',
>  'STOP_LOST','COMPLEX_INDEL','SPLICE_SITE');
>
> ... and then you could try using "ARRAY OF Consequence_Type" or some such.
>

Arrays occurred to me, but they don't give a set (i.e., a consequence type
can appear more than once) unless I write the code to dedupe the array.
However, if I were going to put that much effort into it, I might as well
represent the set directly and obviate the dedupe. Or, am I missing
something from your suggestion?
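
For reference, deduplicating an array is a one-liner in PostgreSQL (8.4+),
though it doesn't preserve element order, which is fine for set semantics:

```sql
-- Unnest, take distinct elements, and re-aggregate into an array
SELECT ARRAY(SELECT DISTINCT
             unnest(ARRAY['STOP_LOST','STOP_LOST','SPLICE_SITE']));
```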

-Reece


[GENERAL] equivalent of mysql's SET type?

2011-03-08 Thread Reece Hart
I'm considering porting a MySQL database to PostgreSQL. That database uses
MySQL's SET type. Does anyone have advice about representing this type in
PostgreSQL?

MySQL DDL excerpt:
CREATE TABLE `transcript_variation` (
  `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `transcript_stable_id` varchar(128) NOT NULL,
   ...
  `consequence_type`
set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;


I'm considering implementing this as a new type based on a bit vector, but I
wonder if anyone has a more flexible general solution.
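
A minimal bit-vector sketch using the built-in bit type, rather than a new
type. The bit-position mapping is an assumption of this example, and nothing
here is MySQL-compatible:

```sql
-- Assumed mapping, leftmost bit first:
-- ESSENTIAL_SPLICE_SITE, STOP_GAINED, STOP_LOST, COMPLEX_INDEL, SPLICE_SITE
CREATE TABLE transcript_variation (
    transcript_variation_id serial PRIMARY KEY,
    transcript_stable_id    varchar(128) NOT NULL,
    consequence_type        bit(5) NOT NULL DEFAULT B'00000'
);

-- Set STOP_GAINED and SPLICE_SITE
INSERT INTO transcript_variation (transcript_stable_id, consequence_type)
VALUES ('ENST000001', B'01001');

-- Test for STOP_GAINED with a bitwise AND
SELECT transcript_stable_id
FROM transcript_variation
WHERE consequence_type & B'01000' <> B'00000';
```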

Thanks,
Reece


Re: [GENERAL] pg_dump issues

2010-02-02 Thread Reece Hart

On 02/02/2010 03:41 PM, DM wrote:

You were right, after zipping the dump file it came out to  6.9G


Also check out the "custom" pg_dump format (pg_dump -Fc ...), which is 
compressed. There are caveats regarding portability across PG versions 
with the custom format, which you may want to research.


-Reece


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deleting 100 rows which meets certain criteria

2009-12-30 Thread Reece Hart
On Wed, 2009-12-30 at 09:11 -0800, shulkae wrote:
> I want to delete all rows exceeding 400 hours (10 days) and I want to
> limit deletion of only 100 records at a time.

If your table has a primary key or you can contrive one, then the 
DELETE FROM tbl WHERE pk in (SELECT pk FROM tbl WHERE hours>400)
construct sketched by others will work fine.
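
To get the "only 100 at a time" part, add a LIMIT inside the subquery
(a sketch; column names follow the example below):

```sql
-- Delete at most 100 qualifying rows per statement;
-- rerun until it reports DELETE 0
DELETE FROM tbl
WHERE pk IN (SELECT pk FROM tbl WHERE hours > 400 LIMIT 100);
```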

The much harder case is where you have a table that doesn't have a pk.
For instance, if you have a table like

name | hours
-+--
bob  | 400
bob  | 20

and you naively do
DELETE FROM tbl WHERE name in (SELECT name FROM tbl WHERE hours>400)
then you'll delete *all* bob rows, which I suspect is not what you want.

In cases without a pk, try something like
BEGIN;
CREATE TABLE tbl2 AS SELECT * FROM tbl WHERE hours>400;
TRUNCATE tbl;
INSERT INTO tbl SELECT * FROM tbl2;
COMMIT;

The effect is to select the rows you want and replace the contents of
tbl.  The above will work if you have indexes or views that depend on
tbl, but there are simpler variations on this theme if that is not a
concern.


And for a completely different approach, consider creating a view:
CREATE VIEW tbl_le_400 AS
SELECT * FROM tbl WHERE hours<=400;

Then use tbl_le_400 in lieu of tbl for your selects.


-Reece





Re: [GENERAL] Q: data modeling with inheritance

2009-07-03 Thread Reece Hart
On Fri, 2009-07-03 at 11:29 -0700, David Fetter wrote:

> I'm missing what you're doing here that foreign keys don't cover.
> Could you send along your DDL?

No DDL yet... I'm just in the thinking stages. FKs technically would do
it, but would become unwieldy. The intention was to have subclasses of
each of the variant, association, and phenotype tables. That leads to
the polymorphic key problem.


> Just generally, I've only found table inheritance useful for
> partitioning.  "Polymorphic" foreign key constraints can be handled
> other ways such as the one sketched out below.

That answers the question -- I do want polymorphic foreign keys. Dang.

Thanks,
Reece


Re: [GENERAL] Q: data modeling with inheritance

2009-07-03 Thread Reece Hart
On Thu, 2009-07-02 at 19:19 -0700, Nathan Boley wrote:

> Is an association, for example, an experiment that establishes a
> dependent relationship? So could there be multiple associations
> between variant and phenotype?

Exactly. You might have one group say that allele X "causes" some trait,
whereas another group might report a more precise increase in odds ratio
(for example) for the same genotype/phenotype.


> Is your concern that the number of joins will grow exponentially in
> the number of variants and phenotypes?

Not the number of joins, but the number of association subclasses. If I
have Nv variant subclasses and Np phenotype subclasses, I'd need Nv * Np
association subclasses, multiplied again by the number of association
types.


> So all variants would be stored in the variants table, all phenotypes are in
> the phenotypes table, and you join through association.

Thanks. I had considered that too and that's probably what I'll end up
using.


-Reece


[GENERAL] Q: data modeling with inheritance

2009-07-02 Thread Reece Hart
This is a question about data modeling with inheritance and a way to
circumvent the limitation that primary keys are not inherited.

I'm considering a project to model genomic variants and their associated
phenotypes. (Phenotype is a description of the observable trait, such as
disease or hair color.) There are many types of variation, many types of
phenotypes, and many types of association. By "type", I mean that they
have distinct structure (column names and inter-row dependencies).  The
abstract relations might look like this:

  variant          association              phenotype
  -------          -----------              ---------
  variant_id ----- variant_id           +-- phenotype_id
  genome_id        phenotype_id --------+   short_descr
  strand           origin_id (i.e., who)    long_descr
  start_coord      ts (timestamp)
  stop_coord


There are several types of variants, such as insertions, deletions,
inversions, copy-number variants, single nucleotide polymorphisms,
translocations, and unknowable future genomic shenanigans.

Phenotypes might come from ontologies or controlled vocabularies that
need a graph structure; other domains might be free text.  Each is
probably best served by a subclass table.

Associations might be quantitative or qualitative, and would come from
multiple origins.

The problem that arises is the combinatorial nature of the schema design
coupled with the lack of inherited primary keys.  In the current state
of PG, one must (I think) make joining tables (association subclasses)
for every combination of referenced foreign keys (variant and phenotype
subclasses).

So, how would you model this data?  Do I ditch inheritance?

Thanks,
Reece







Re: [GENERAL] Providing an alternative result when there is no result

2009-05-18 Thread Reece Hart
On Mon, 2009-05-18 at 15:13 -0400, Joshua Berry wrote:

> Is there an easy and efficient way to return a boolean false for a
> query that returns no result, and true for one that does return a
> result?


Presuming that you're not using the values in temp_table, I think you
should be using PERFORM * WHERE ...; and then IF FOUND ... ELSE ... END
IF;

See here:
http://www.postgresql.org/docs/8.3/static/plpgsql-statements.html
and also follow link to 38.5.5 .
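
Concretely, that pattern looks something like this in PL/pgSQL (table and
predicate are hypothetical):

```sql
CREATE OR REPLACE FUNCTION row_exists() RETURNS boolean AS $$
BEGIN
    -- PERFORM runs the query and discards the result,
    -- setting the special FOUND variable as a side effect
    PERFORM * FROM temp_table WHERE some_col = 42;
    IF FOUND THEN
        RETURN true;
    ELSE
        RETURN false;
    END IF;
END;
$$ LANGUAGE plpgsql;
```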

-Reece


Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
On Thu, 2009-03-12 at 17:32 -0700, Steve Atkins wrote:

> If A=B then lower(A) = lower(B), and if A like B then lower(A) like  
> lower(B).
> 
> So, if nothing else, you could rewrite "where alias = 'Foo'" as
> "where lower(alias) = lower('Foo') and alias='Foo'" and take advantage
> of the lower() functional index.


Good idea. Thanks. The niggling remaining problem is that the DB is open
to a SQL-savvy audience and it'd be nice to avoid telling them to
casefold their predicates.

For regexps, lower(alias) ~* lower(regexp) won't work because extended
regexps might contain character classes (e.g., \S != \s). And, I guess
that alias ~* regexp requires a seqscan because the index isn't ordered
over ~* (right?). How about lower(alias) ~* regexp ? Is PG smart enough
to know that that ordering is well defined? Is my head on straight
there?

Thanks again,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
On Thu, 2009-03-12 at 18:02 -0700, Jeff Davis wrote:
> It looks like an index using text_pattern_ops can be used for equality
> (see my test case below).

Odd. I can't reproduce your test case. I noticed that I edited out the
version and platform from my OP. (A: 8.3.6, x86_64 linux). You're on
8.3.6, or do you happen to be testing on the 8.4 branch?

I see this:

r...@rkh=> \i tpo-test.sql
version   

 PostgreSQL 8.3.6 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.1.0 (SUSE Linux)

select name,setting from pg_settings where name~'locale|encoding';
  name   | setting 
-+-
 client_encoding | UTF8
 server_encoding | UTF8
(2 rows)

\!echo $LANG
en_US.UTF-8

create temp table a(t text);
CREATE TABLE

insert into a values('foo');
INSERT 0 1

set enable_seqscan='f';
SET

create index a_t_tpo on a (t text_pattern_ops);
CREATE INDEX

analyze a;
ANALYZE

explain analyze select * from a where t = 'foo';
QUERY
PLAN 
---
 Seq Scan on a  (cost=1.00..10001.01 rows=1 width=4) (actual
time=0.014..0.016 rows=1 loops=1)
   Filter: (t = 'foo'::text)
 Total runtime: 0.047 ms
(3 rows)

create index a_t on a (t);
CREATE INDEX

analyze a;
ANALYZE

explain analyze select * from a where t = 'foo';
  QUERY
PLAN   
---
 Index Scan using a_t on a  (cost=0.00..8.27 rows=1 width=4) (actual
time=0.061..0.062 rows=1 loops=1)
   Index Cond: (t = 'foo'::text)
 Total runtime: 0.099 ms
(3 rows)


script at http://harts.net/reece/tpo-test.sql


-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




[GENERAL] text column indexing in UTF-8 database

2009-03-12 Thread Reece Hart
Do I really need 4 indexes per column to handle the 4 combinations of
{equality,like/regexp} x {unfolded,casefolded} in a UTF-8 encoded
database?

I have a column that I'd like to be able to search with equality and
regexp (or like), optionally casefolded. The database is UTF-8 encoded.
The table and index defs are below.

Jeff Davis gave me a tip to use text_pattern_ops on indexes to speed up
regexp and like; that worked beautifully. But I discovered a caveat:
text_pattern_ops apparently doesn't handle equality. Thus, I think I need
distinct indexes for the 4 cases above. Right?

Thanks,
Reece


r...@csb-dev=> \d pannotation
   Table "unison.pannotation"
 Column |   Type   | Modifiers  
+--+
 pannotation_id | integer  | not null default 
 origin_id  | integer  | not null
 alias  | text | not null
 descr  | text | 
 tax_id | integer  | 
 added  | timestamp with time zone | not null default timenow()
Indexes:
...
"pannotation_alias" btree (alias)
"pannotation_alias_cf" btree (lower(alias))
"pannotation_alias_cf_tpo" btree (lower(alias) text_pattern_ops)
"pannotation_alias_tpo" btree (alias text_pattern_ops)
...

where those indexes are defined as:
r...@csb-dev=> \x
r...@csb-dev=> select indexname,indexdef from pg_indexes
where indexname~'^pannotation_alias';
-[ RECORD 1 ]
indexname | pannotation_alias_cf_tpo
indexdef  | CREATE INDEX pannotation_alias_cf_tpo ON pannotation USING btree 
(lower(alias) text_pattern_ops)
-[ RECORD 2 ]-
indexname | pannotation_alias_tpo
indexdef  | CREATE INDEX pannotation_alias_tpo ON pannotation USING btree 
(alias text_pattern_ops)
-[ RECORD 3 ]-
indexname | pannotation_alias
indexdef  | CREATE INDEX pannotation_alias ON pannotation USING btree (alias)
-[ RECORD 4 ]-
indexname | pannotation_alias_cf
indexdef  | CREATE INDEX pannotation_alias_cf ON pannotation USING btree
(lower(alias))


-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Pet Peeves?

2009-01-31 Thread Reece Hart
My two:

* lack of PK/unique indexes on inherited tables (workarounds possible
but annoying)

* lack of auto-tuning or tuning tools (or perhaps my lack of awareness
of them?)

-Reece

-- 

Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


[GENERAL] Unison Protein Database manuscript

2009-01-02 Thread Reece Hart
Dear PostgreSQL friends-

I spoke at OSCON 2005 on the Unison Protein Database, which is built on
PostgreSQL. Unison's grown a lot in complexity, size, and (internal)
user base since then. The whole thing -- database, most content, web
interface, and command line tools -- are Open Source.

A paper on Unison was accepted to the Pacific Symposium on Biocomputing
and will appear shortly. I'll also be giving a talk on it at PSB on Jan
9.

Links:
http://unison-db.org/ -- the web site, docs, downloads
http://harts.net/reece/pubs/ -- the manuscript
http://psb.stanford.edu/ -- the PSB conference site

I'm open to all feedback and problem reports.

Cheers,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Stroring html form settings

2008-09-26 Thread Reece Hart
On Fri, 2008-09-26 at 09:23 -0700, Dianne Yumul wrote:

> If the structure of each report did not differ this would work great  
> but unfortunately it does.


Dianne-

Would an EAV model work? I'm thinking something like (form, field, value)
tuples. For example:
formA,ckbox1,true
formA,input1,initial value
formB,textarea1,enter your long comment here

You could easily extend this with additional columns, or normalize it
as you see fit.

Another possibility is to use hstore hashes for form fields/values. See
http://www.postgresql.org/docs/8.3/static/hstore.html
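
A small hstore sketch of that idea (form and field names hypothetical;
hstore is a contrib module that must be installed first):

```sql
-- Store each form's field settings as a single hstore value
CREATE TABLE form_settings (
    form_name text PRIMARY KEY,
    fields    hstore
);

INSERT INTO form_settings VALUES
    ('formA', 'ckbox1 => true, input1 => "initial value"');

-- Look up one field's setting with the -> operator
SELECT fields -> 'ckbox1' FROM form_settings WHERE form_name = 'formA';
```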

Do either of those suffice?

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] PostgreSQL process architecture question.

2008-09-10 Thread Reece Hart
On Wed, 2008-09-10 at 00:02 -0600, Scott Marlowe wrote:

> Unless you have either a small data set or a very powerful RAID array,
> most the time you won't be CPU bound anyway.  But it would be nice to
> see some work come out to parallelize some of the work done in the
> back end.


I would have agreed with this several years ago, but many folks now buy
enough RAM to reduce the impact of IO. We're routinely CPU-bound on
small queries, and even on some large ones, on a 32GB / 16-core Opteron
box that serves a ~200GB database (on disk tables+indexes).

Does anyone know of research/references on query optimizers that include
parallelization as part of the cost estimate? I can envision how
PostgreSQL might parallelize a query plan that was optimized with an
assumption of one core. However, I wonder whether cpu and io costs are
sufficient for efficient parallel query optimization -- presumably
contention for memory (for parallel sorts, say) becomes critical.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] partitioned table insert triggers

2008-06-22 Thread Reece Hart
On Mon, 2008-06-23 at 10:18 +0900, ITAGAKI Takahiro wrote:

> You need to do so in 8.3:
> EXECUTE 'INSERT INTO ' || ctable ||
> ' VALUES( (' || quote_literal(new) || '::' || ptable ||
> ').*)';
> 
> In 8.4, the next release, you will be able to use a better way:
> EXECUTE 'INSERT INTO ' || ctable || ' VALUES( ($1).* )' USING new;


Domo arigato, Itagaki-san. (I think I got that right.)

The 8.3 method works like a charm.
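
For the archive, plugging the quoted 8.3 recipe into the original trigger
gives something like this (untested sketch):

```sql
CREATE OR REPLACE FUNCTION postgres_log_insert_trigger()
RETURNS TRIGGER AS $$
DECLARE
    ptable text := 'postgres_log';
    ctable text := ptable || '_' ||
                   to_char(NEW.session_start_time, 'YYYY_MM');
BEGIN
    -- Serialize NEW, cast it to the parent table's row type,
    -- and expand it into the child table's column list
    EXECUTE 'INSERT INTO ' || ctable ||
            ' VALUES( (' || quote_literal(new) || '::' || ptable || ').*)';
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```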

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


[GENERAL] partitioned table insert triggers

2008-06-21 Thread Reece Hart
I'm trying to create a date-partitioned version of the postgres_log
table that is described here:
http://www.postgresql.org/docs/current/static/runtime-config-logging.html


In lieu of hardcoding the partition table names in the trigger, as shown
in the table partitioning docs, I wanted to dynamically compute the
table name. I tried:

CREATE OR REPLACE FUNCTION postgres_log_insert_trigger()
RETURNS TRIGGER AS $$
DECLARE
c text;
ptable text := 'postgres_log';
ctable text := ptable || '_' ||
to_char(NEW.session_start_time, 'YYYY_MM');
BEGIN
c := 'INSERT INTO ' || ctable || ' VALUES (NEW.*)';
EXECUTE c;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

I get:
ERROR:  NEW used in query that is not in a rule
CONTEXT:  SQL statement "INSERT INTO
measurement_postgres_log_2008_05 VALUES (NEW.*)"
PL/pgSQL function "postgres_log_insert_trigger" line 7 at
EXECUTE statement

I didn't really think that would work exactly for the reason given, but
I didn't know what else to try.

Now that my only idea is gone, the question for y'all is: how?

(Okay, I actually do have one more idea: construct the insert values
list with quote_literal. I'm hoping there's a better way.) 

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Annoying messages when copy sql code to psql terminal

2008-06-06 Thread Reece Hart
On Thu, 2008-06-05 at 22:28 -0400, Merlin Moncure wrote:

> As others have noted, you have tabs in your sql source.  I'd advise if
> possible, not to use the tab character in sql, for this and other
> reasons.


Tabs in SQL are a problem only if you copy-paste. If your editor and
psql can see the same files (i.e., are running on the same machine or
you're saving to a network fs), it's more reliable to save the file and
read it in psql with \i or with psql -f. (\i and -f won't try to do tab
completion, of course.)

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Copying data from a table to another database

2008-05-25 Thread Reece Hart
On Sun, 2008-05-25 at 12:24 +0100, Pedro Doria Meunier wrote:

> Now I've migrated the entire 'shebang' to another server and need to
> update the history table on the new server with data from the old
> server
> for every different record.


I'm not sure I understand this completely. Would COPY work? For example:

$ psql -h <oldhost> -d <olddb> -c 'copy <schema>.<table> to stdout' |
psql -h <newhost> -d <newdb> -c 'copy <schema>.<table> from stdin'

This presumes that a table with the same structure already exists in the
new database.

If you have made schema changes, or you need only a subset of rows, you
can specify an appropriate select statement to the copy command on "old"
database. See documentation for COPY.

Also consider a statement like this:
=> INSERT INTO newtable SELECT * FROM oldtable EXCEPT SELECT * FROM
newtable;
I'm assuming that you populate a temporary oldtable in the new db
(perhaps using the COPY method above). This won't work if there are
intentional identical rows in your table.

The pipe assumes a Unix-ish box.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Short-circuiting FK check for a newly-added field

2008-05-22 Thread Reece Hart
On Tue, May 20, 2008 at 02:25:15PM -0400, Decibel! wrote:
> I need to add a field to a fairly large table. In the same alter statement 
> I'd like to add a FK constraint on that new field. Is there any way to 
> avoid the check of the table that the database is doing right now? The 
> check is pointless because the newly added field is nothing but NULLs.
>   
I don't see the problem. FK constraints don't fire on NULL values. I
think you might be imagining that a problem exists when it doesn't.

If the FK column is created as NOT NULL, there is a problem. The best
way to handle this case is to add the column (allowing nulls), populate
the columns, then alter the column to make it NOT NULL. You can wrap
that all in a transaction if you like.
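
That workflow, sketched with hypothetical table and column names:

```sql
BEGIN;
-- Adding a nullable FK column is cheap; the check is trivial on NULLs
ALTER TABLE big_table
    ADD COLUMN other_id integer REFERENCES other_table (id);

-- Backfill the new column (hypothetical join condition)
UPDATE big_table b
   SET other_id = o.id
  FROM other_table o
 WHERE o.some_key = b.some_key;

-- Only now enforce NOT NULL
ALTER TABLE big_table ALTER COLUMN other_id SET NOT NULL;
COMMIT;
```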

> This is version 8.1.mumble.
>   
You can get the version using select version().


-Reece



Re: [GENERAL] how to modify a view

2008-05-20 Thread Reece Hart
On Wed, 2008-05-21 at 02:43 +0930, Shane Ambler wrote:

> Yes - The only way to do it is DROP VIEW... then CREATE VIEW... - the
> catch is any other views that depend on the view you wish to drop.


Tip: It's sometimes useful to rename an existing view or table to
deprecate it. (I use names like _someview_v_20080520.) When you rename a
table or view, the definition of any views which depend on them will be
updated appropriately. This is particularly handy when you have several
layers of views and don't want to break current behavior.
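
The rename itself is a one-liner; dependent views follow automatically
because they reference the relation by OID, not by name. (ALTER TABLE
works on views here, which matters on releases predating ALTER VIEW.)

```sql
ALTER TABLE someview RENAME TO _someview_v_20080520;
```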

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Link tables from different databases

2008-05-18 Thread Reece Hart
> I want it so that every operation on db1.table1 (ie. INSERT, UPDATE, DELETE) 
> also happens to db2.table2, but similarly I want it that application can do 
> those operations on db2.table2 but actually it does it on db1.table1 behind 
> the scene.

If the tables could be in the same database but in different schemas,
you should be able to arrange a view in schema2 (called "table2") with
rules that operate on table1 in schema1. See the manual for a discussion
of rules and views.

If you really want these tables in separate *databases* (same machine or
not), then you probably want dblink, a stiff drink, and a long, hard
think about the prudence of such a design.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Unloading a table consistently

2008-05-13 Thread Reece Hart
On Sat, 2008-05-03 at 09:11 -0700, Christophe wrote:

> I will have a log table which, once a day or so, is copied to a file  
> (for movement to a data warehouse), and the log table emptied.

...

>  Is there a better way to handle this kind of thing that I'm  
> missing?


Anyone care to comment about whether a table partition might be better
here? Could an insert rule on a table partition automatically create the
inherited table, if needed, and insert there?

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Scripting function definitions as SQL?

2008-05-11 Thread Reece Hart
On Sun, 2008-05-11 at 06:12 -0700, Postgres User wrote:
> Has anyone written a function that scripts out all the functions in a
> database as full SQL statements (Create Function.)

You could pg_dump the schema in the "custom" format (-Fc), then call
pg_restore with -l to get the TOC, grep the TOC for functions, and feed
that back into pg_restore with -L. It sounds like a lot, but it's pretty
easy in practice, like so:

$ sudo -u postgres pg_dump -Fc -s mydb >mydb.pgdfc
$ pg_restore -l mydb.pgdfc >mydb.toc
$ grep -E '^[0-9]+; [0-9]+ [0-9]+ FUNCTION' mydb.toc >mydb-fx.toc
$ pg_restore -L mydb-fx.toc mydb.pgdfc 

The output of pg_restore is sql.

This technique is extremely useful for other kinds of schema elements as
well.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] tsearch2 and hyphenated terms

2008-04-11 Thread Reece Hart
On Fri, 2008-04-11 at 22:07 +0400, Oleg Bartunov wrote:
> We have the same problem with names in astronomy, so we implemented
> dict_regex  http://vo.astronet.ru/arxiv/dict_regex.html
> Check it out !

Oleg-

This gets me a lot closer. Thank you.  I have two remaining problems.


The first problem is that 'bcl-w' and 'bcl-2' are parsed differently,
like so:

[EMAIL PROTECTED]> select * from ts_debug('english','bcl-w');
  alias  |   description   | token |  dictionaries  |  
dictionary  | lexemes 
-+-+---++--+-
 asciihword  | Hyphenated word, all ASCII  | bcl-w | {english_stem} | 
english_stem | {bcl-w}
 hword_asciipart | Hyphenated word part, all ASCII | bcl   | {english_stem} | 
english_stem | {bcl}
 blank   | Space symbols   | - | {} |   
   | 
 hword_asciipart | Hyphenated word part, all ASCII | w | {english_stem} | 
english_stem | {w}

[EMAIL PROTECTED]> select * from ts_debug('english','bcl-2');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes 
---+-+---++--+-
 asciiword | Word, all ASCII | bcl   | {english_stem} | english_stem | {bcl}
 int   | Signed integer  | -2| {simple}   | simple   | {-2}

One option would be to write a new parser, or modify wparser_def.c to make
the InHyphenWordFirst state accept p_isdigit or p_isalnum on the first
character (I think I got this right).  This would achieve Tom's initial
inkling that Bcl-2 might be parsed as a numhword and (to me) it seems
more congruent with the asciihword class.

Perhaps a more broadly useful modification is for the lexer to also emit
whitespace-delimited tokens (period). asciihword almost does the trick,
but it too requires a post-hyphen alphabetic character.



The second problem is with quantifiers on PCRE's regexps.  I initially
implemented a dict_regex with a conf line like
(\w+)-(\w{1,2}) $1$2
I can make simpler expressions work (eg., (bcl)-(\w)). I think it must
be related to the README caveat regarding PCRE partial matching mode,
which I didn't understand initially.

However, I don't see that it's possible to write a general regexp like
the one I initially tried. Do you have any suggestions?


Thanks again. I'm very impressed with tsearch2.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




[GENERAL] tsearch2 and hyphenated terms

2008-04-11 Thread Reece Hart
I'd like to use tsearch2 to index protein and gene names. Unfortunately,
such names are written inconsistently and sometimes with hyphens. For
example, MCL-1 and MCL1 are semantically equivalent but with the default
parser and to_tsvector, I see this:

[EMAIL PROTECTED]> select to_tsvector('MCL1 MCL-1');
   to_tsvector   
-
 '-1':3 'mcl':2 'mcl1':1

For the purposes of indexing these names, I suspect I'd get the majority
of cases by removing a hyphen when it's followed by 1 or 2 chars from
[a-zA-Z0-9]. Does that require a custom parser?
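
If a custom parser proves too heavy, one workaround (an untested sketch;
the regex may need tuning for your data) is to normalize the text before
indexing, stripping a hyphen that precedes a short alphanumeric suffix:

```sql
-- Sketch: drop a hyphen followed by 1-2 word characters at a word
-- boundary, so MCL-1 and MCL1 index identically.
SELECT to_tsvector('english',
       regexp_replace('MCL1 MCL-1', E'-(\\w{1,2})\\M', E'\\1', 'g'));
-- both tokens should now index as 'mcl1'
```

The same expression could live in a trigger or an expression index so
that documents and queries are normalized consistently.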

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)

2008-03-29 Thread Reece Hart
Although I, too, am not fond of the current command names, I'm not aware
of a naming conflict that is serious enough to warrant renaming. Do we
have even one example of one significant naming conflict? 

Renaming executables seems likely to create much more confusion than it
will solve. I loathe the idea of confounding years of newsgroup/mailing
list wisdom, especially for newcomers. The specter of having to ask
questions like "Which executable naming scheme are you using?" is also
unpalatable. (Remember, this is the group that will engage in
multi-message threads about postgres, postgresql, PostgreSQL, PG,
postgre, etc. We'll need a new list for the fallout of renaming
executables.)

Perhaps one way to sidestep this issue is for you (Zdeněk) to create a
pgFoundry project that provides the command line interface you propose
by wrapping the existing postgresql binaries. This would provide the
more rational names you seek. Furthermore, it could be used to provide
the *sole* interface to postgresql if postgresql is configured with a
--prefix that sequesters it from system paths. In the meritocracy of
open source, your interface might become the official interface later
on.

I completely "get" your motivation and I appreciate your effort to tidy
up. However, renaming is going to create problems, not solve them. (And,
it's not even clear that there is a problem.)

-Reece


P.S. My responses:
1. a (do nothing)  (or b if there really must be a change)
2. a (multiple times daily)
3. consistent with 1
4. c
Blue. No, red. Aaah...

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Postgresql partitioning

2008-03-22 Thread Reece Hart
On Fri, 2008-03-21 at 23:00 -0400, Ram Ravichandran wrote:
> I assume that primary key
> uniqueness is not tested across tables. Right?

That's correct.  It's on the TODOs:
Inheritance 
  * Allow inherited tables to inherit indexes, UNIQUE constraints,
and primary/foreign keys
(at http://www.postgresql.org/docs/faqs.TODO.html )


I wonder whether you might be able to achieve the benefits of
partitioning and the simplicity of single-table updates by using a
view with an update rule. This would let you embed, in the database,
the logic for moving rows between partitions when the partition
criterion changes. I've not done this myself, so I'm, um, not speaking
from experience.
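
For what it's worth, here's a rough, untested sketch of that idea; the
table names, partition key, and rule condition are all hypothetical:

```sql
-- Two hypothetical partitions unified by a view:
CREATE VIEW measurements AS
    SELECT * FROM measurements_2007
    UNION ALL
    SELECT * FROM measurements_2008;

-- An UPDATE rule that moves a row between partitions when its
-- partition key (the year of ts) changes. Untested sketch only;
-- a complete solution also needs INSERT/DELETE rules and the
-- non-moving UPDATE case.
CREATE RULE measurements_move AS ON UPDATE TO measurements
    WHERE date_part('year', NEW.ts) <> date_part('year', OLD.ts)
    DO INSTEAD (
        DELETE FROM measurements_2007 WHERE id = OLD.id;
        INSERT INTO measurements_2008 VALUES (NEW.*);
    );
```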

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] postgre vs MySQL

2008-03-12 Thread Reece Hart
On Tue, 2008-03-11 at 06:47 -0700, rrahul wrote:

> Any major clients of the two.
> You can add you own points too.


Perhaps someone can comment on current MySQL backup procedures. I
believe that MySQL used to (still does?) require a shutdown to be backed
up. I don't know whether this was true for all engines or whether it
might have been fixed. Having to shut down a database to make a backup
is a non-starter for anything other than a toy (or read-only) database.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


[GENERAL] executing query results withing psql

2008-03-05 Thread Reece Hart
Is there a way to generate and directly execute query results within
psql? Essentially, I want eval within psql. 

For example, a lot of us generate DDL commands using queries, like this:
=> SELECT 'GRANT SELECT ON VIEW "'||table_schema||'.'||
table_name||'" TO budget_view;'
FROM information_schema.views
WHERE table_schema='budget';

To execute such results, I do one these:
* in a shell, pipe between psqls (ie, psql -Atc 'select ...' | psql)
* within psql, execute and copy-paste
* within psql, use \g file, then \i file
* within psql, use \g |psql

I like the \g + \i option most for consistency and robustness (don't
have to specify the db connection or worry about connecting to the wrong
db), but what I really want is a hypothetical \eval to execute the query
and then execute the results.

Any such thing?

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] dynamic crosstab

2008-02-13 Thread Reece Hart
On Wed, 2008-02-13 at 14:04 +0100, Tino Wildenhain wrote:
> Well after all you want a CSV not a table. You could shortcut this
> with a generic query which creates array out of your "columns"
> and join them to a CSV line. This would just be outputted as
> one single column from database.

Depending on your use case, this may be a better way:

In psql:
=> \copy (select col1,col2,col3 from data) TO data.csv CSV HEADER

or on the command line:
$ psql -c '\copy (select col1,col2,col3 from data) TO data.csv CSV HEADER'

Strictly speaking, the CSV formatting isn't being done in the database
but rather by psql.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




[GENERAL] Good F/OSS license summary

2008-02-01 Thread Reece Hart
Folks-

A recent thread
( http://archives.postgresql.org/pgsql-general/2008-01/msg01595.php )
suggests that there's a lot of interest and knowledge about F/OSS
licenses on this list. So...

Does anyone know of an authoritative, complete, and current license
comparison? After a lot of trawling, I've been able to satisfy one or
two of those criteria, but not all.


-*-mode:opinion-*-
It's disappointing that we as a community have yet to converge on just a
few licenses that differ only by the few characteristics that *really*
matter (e.g., virality, commercializability, code
reciprocation/share-alike, etc). The glut of license options and license
compatibility issues create enormous confusion and even some unfortunate
internecine FUD -- never mind the real uncertainty caused by the lack of
case law in this area. I long for a Creative Commons-like family of
licenses for code. 


Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Oracle Analytical Functions

2008-01-30 Thread Reece Hart
create table data (
client_id integer,
datetime timestamp not null
);
create index data_client_id on data(client_id);

copy data from STDIN DELIMITER ',';
122,2007-05-01 12:00:00
122,2007-05-01 12:01:00
455,2007-05-01 12:02:00
455,2007-05-01 12:03:00
455,2007-05-01 12:08:00
299,2007-05-01 12:10:00
299,2007-05-01 12:34:00
\.

CREATE OR REPLACE FUNCTION visits (
OUT client_id INTEGER,
OUT datetime_1 TIMESTAMP,
OUT datetime_2 TIMESTAMP,
OUT dur INTERVAL )
RETURNS SETOF RECORD
LANGUAGE plpgsql
AS $_$
DECLARE
    rp data%ROWTYPE;   -- previous data table record
    r  data%ROWTYPE;   -- data table record, more recent than rp
BEGIN
    rp = (NULL,NULL);
    FOR r IN SELECT * FROM data ORDER BY client_id, datetime LOOP
        IF rp.client_id = r.client_id THEN
            client_id  = r.client_id;
            datetime_1 = r.datetime;
            datetime_2 = rp.datetime;
            dur        = r.datetime - rp.datetime;
            RETURN NEXT;
        END IF;
        rp = r;
    END LOOP;
    RETURN;
END;
$_$;


[EMAIL PROTECTED]> select * from visits() order by client_id,datetime_1;
 client_id | datetime_1  | datetime_2  |   dur
---+-+-+--
   122 | 2007-05-01 12:01:00 | 2007-05-01 12:00:00 | 00:01:00
   299 | 2007-05-01 12:34:00 | 2007-05-01 12:10:00 | 00:24:00
   455 | 2007-05-01 12:03:00 | 2007-05-01 12:02:00 | 00:01:00
   455 | 2007-05-01 12:08:00 | 2007-05-01 12:03:00 | 00:05:00
(4 rows)


-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0



Re: [GENERAL] OT - pg perl DBI question

2008-01-29 Thread Reece Hart
On Tue, 2008-01-29 at 13:28 -0500, Geoffrey wrote:

> Are you speaking from personal experience, or just of the lack of 
> maintenance?


I'm guessing lack of maintenance... the SFPUG hasn't maintained David
for a very long time. We think he lives at Casa Donde, but no one is
really sure where that is.

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] (un)grouping question

2008-01-21 Thread Reece Hart
On Mon, 2008-01-21 at 22:05 +0200, Andrei Kovalevski wrote:
> How can you garantee that "somevalue + random()" won't duplicate other
> unique values in this column? ;)

Like the inane memo from the boss says, "expect the unexpected" and
"tell me all the unpredictable issues that will happen".

If uid is unique then something like
=> select uid,somevalue,somevalue||'-'||uid from mytable
is much better. 

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] (un)grouping question

2008-01-21 Thread Reece Hart
On Mon, 2008-01-21 at 12:36 -0500, Rhys Stewart wrote:

> uid|somevalue
> 
> 1|11
> 2|44
> 3|31
> 4|44
> 5|71
> 6|33
> 7|33
> 8|44
> 9|14
> 
> would like to remove the duplicate values in the column somevalue.
> doing this by just adding a random number  is perfectly fine, however
> i want to retain at least one of the original values of somevalue. Any
> ideas how to do this in in a query? 


There's certainly no need for a random number hack. Instead, use a query
like 'select distinct on (somevalue) * from mytable;' .
Although DISTINCT is standard, the DISTINCT ON (cols) variant is a
PostgreSQL-ism.
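
Applied to the sample data above, a sketch might look like:

```sql
-- Keep one row per somevalue; the ORDER BY decides which row
-- survives (here, the one with the lowest uid).
SELECT DISTINCT ON (somevalue) uid, somevalue
FROM mytable
ORDER BY somevalue, uid;
```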

See
http://www.postgresql.org/docs/8.2/static/queries-select-lists.html#QUERIES-DISTINCT .


-Reece, your heterographic brother

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Deadlock in Postgres 8.2

2008-01-21 Thread Reece Hart
On Sun, 2008-01-20 at 16:54 -0500, Greg Smith wrote:
> You can look at who has locks on what using pg_locks, see 
> http://www.postgresql.org/docs/8.2/static/view-pg-locks.html

I use the following view to ferret out locking problems. Briefly, look
for rows with state='WAIT', then find RUN rows for the same <relation>
that have an exclusive lock.

-Reece


CREATE OR REPLACE VIEW pgutils.locks AS
 SELECT l.pid, d.datname AS "database", n.nspname AS "schema",
        c.relname AS relation, l.locktype, l."mode",
        CASE l."granted"
            WHEN true THEN 'RUN'::text
            ELSE 'WAIT'::text
        END AS state,
        a.usename, a.current_query,
        to_char(now() - a.query_start, 'HH24:MI:SS'::text) AS duration
   FROM pg_locks l
   JOIN pg_database d ON l."database" = d.oid
   JOIN pg_class c ON l.relation = c.oid
   JOIN pg_namespace n ON c.relnamespace = n.oid
   JOIN pg_stat_activity a ON l.pid = a.procpid
  ORDER BY l.pid, d.datname, n.nspname, c.relname, l."granted";

eg=> select * from pgutils.locks ;
  pid  | database |   schema   |          relation          | locktype |      mode       | state | usename |         current_query         | duration 
-------+----------+------------+----------------------------+----------+-----------------+-------+---------+-------------------------------+----------
 15716 | csb-dev  | pg_catalog | pg_class                   | relation | AccessShareLock | RUN   | rkh     | select * from pgutils.locks ; | 00:00:00
 15716 | csb-dev  | pg_catalog | pg_class_oid_index         | relation | AccessShareLock | RUN   | rkh     | select * from pgutils.locks ; | 00:00:00
 15716 | csb-dev  | pg_catalog | pg_class_relname_nsp_index | relation | AccessShareLock | RUN   | rkh     | select * from pgutils.locks ; | 00:00:00
...


-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] double free corruption?

2007-12-31 Thread Reece Hart
On Fri, 2007-12-28 at 12:33 -0300, marcelo Cortez wrote:

>  i received the follow message from backend ,it's this
> a bug?

...

> *** glibc detected *** postgres: postgres richelet
> 201.235.11.133(2504) SELECT: double free or corruption
> (!prev): 0x0845d7e8 ***
> === Backtrace: =


Does this happen to be on Novell SLES/SLED 10? I saw this bug
sporadically and in several applications 6-9 months ago. I don't think I
ever knew the cause or even what specifically tickled this problem, but
my recollection is that a Novell patch set fixed it.

I remember that I saw a lot of these in /var/log/messages (assuming
you're logging locally). Consider looking there to see if you have other
instances of this bug with other applications.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] comparing rows

2007-12-10 Thread Reece Hart

On Mon, 2007-12-10 at 13:31 -0800, hjenkins wrote:

> I would like to take a timeseries of data and extract the rows of data
> flanking the gaps in it. So I need to compare timestamps from two
> adjacent
> rows, and determine if the interval is greater than the standard
> sampling
> interval.


It often helps for us to have a snippet of a table definition to frame
replies.  I'll assume that you have a "data" table with a timestamp
column called "ts". I suspect you could use a subquery, like this:

=> select D1.ts as ts1, (select ts from data D2 where D2.ts > D1.ts
   order by ts limit 1) as ts2 from data D1;

I'm uncertain about the performance of this subquery in modern PGs.  If
this query works for you, then you can wrap the whole thing in a view or
another subquery in order to compute ts2-ts1, like this:

=> select ts1, ts2, ts2-ts1 as delta from (<previous query>) X;


This will get you only the timestamps of adjacent rows with large
deltas. The easiest way to get the associated data is to join on the
original data table where ts1=ts or ts2=ts.
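
Putting it together, a sketch under the same assumptions (the "data"
table with a "ts" column, and a hypothetical one-minute sampling
interval):

```sql
-- Rows flanking gaps larger than the sampling interval.
SELECT ts1, ts2, ts2 - ts1 AS delta
FROM (SELECT D1.ts AS ts1,
             (SELECT ts FROM data D2
               WHERE D2.ts > D1.ts
               ORDER BY ts LIMIT 1) AS ts2
        FROM data D1) X
WHERE ts2 - ts1 > interval '1 minute';
```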


-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] any way for ORDER BY x to imply NULLS FIRST in 8.3?

2007-11-06 Thread Reece Hart

On Tue, 2007-11-06 at 11:38 +0400, rihad wrote:
> Is there any way to "hardcode" the NULL handling in an index (as  per 
> http://www.postgresql.org/docs/8.3/static/indexes-ordering.html) so
> that 
> SELECT * FROM t ORDER BY foo automatically implies NULLS FIRST (and, 
> similarly so that SELECT * FROM t ORDER BY foo DESC automatically 
> implies NULLS LAST)? Thing is, I'm using PHP Symfony/Propel to
> generate 
> their SQL and have no easy way to influence how they do so.

As Tom already stated, the ordering of NULLs with respect to non-NULLs
is defined by the implementation.

However, it's not clear that you've considered a clause like 'ORDER BY
(foo IS NULL), foo', which I believe is not implementation dependent.
(In SQL2003 draft, true is defined to sort before false. I can't find a
similar statement in SQL92 or SQL99.)

Bear in mind that the ordering of rows with the same value (incl. NULL)
of foo is arbitrary. And, I guess that the equivalence of this query on
two systems will depend on the collating locale also (but I'm a neophyte
in this area).
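
Concretely for PostgreSQL, where FALSE sorts before TRUE under ASC, a
sketch would be:

```sql
-- Emulate NULLS FIRST on an ascending sort:
SELECT * FROM t ORDER BY (foo IS NULL) DESC, foo;

-- Emulate NULLS LAST on a descending sort:
SELECT * FROM t ORDER BY (foo IS NULL) ASC, foo DESC;
```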

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] getting list of tables from command line

2007-10-31 Thread Reece Hart
On Wed, 2007-10-31 at 08:01 -0700, Craig White wrote:
> I wanted a way to
> query a database and get a text file with just the table names and
> cannot figure out a way to do that.

This should do the trick:

$ psql -Atc "select table_schema||'.'||table_name from
information_schema.tables"

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Multiple versions on same box

2007-10-08 Thread Reece Hart
On Mon, 2007-10-08 at 17:34 -0700, Ralph Smith wrote:

> What is the best way to upgrade?  Use pg_dumpall (using 8.2's
> program), afterwards redirect that file into v8.2 via psql v8.2? 


There are lots of ways to do this. My favorite is to bring a new cluster
on port 5433 and then pipe data between them, roughly like this:

$ /path/to/7.4/pg_dumpall -p5432 ... | /path/to/8.2/psql -p5433 -qaf-
>restore.log 2>&1 

This obviates the intermediate dump file, which can be large.

As a precaution, I remove the PATH elements that locate pg (thus
requiring that I specify the full path) and I unset PG* variables.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Why the ERROR: duplicate key violates uniqueconstraint "master_pkey" is raised? - Is this a Bug?

2007-09-24 Thread Reece Hart

On Tue, 2007-09-25 at 10:06 +0530, Anoo Sivadasan Pillai wrote:

> But since you can't control the order, it is likely that the operation
> will sometimes fail and sometimes succeed, with the same data. Weird,
> I feel.

Not so weird (IMO). Many databases will return rows unordered unless you
specify otherwise. Once one understands that uniqueness constraints are
evaluated immediately and that rows are visited unordered, it's not so
weird.


> The workaround given by you will work. Does it mean that whenever we
> are updating a unique key we have to take this precaution?

Probably. I suspect that you're using unique or primary keys to imply
some meaning that would be better stored elsewhere, and definitely NOT
in a primary key. Perhaps you should consider refactoring your design.
You're doing something wrong if you find that you need to update primary
keys routinely. 
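
One common refactoring (a sketch; the names are hypothetical) is to
introduce an immutable surrogate key and demote the mutable value to an
ordinary unique column:

```sql
CREATE TABLE master (
    id serial PRIMARY KEY,   -- surrogate key; never updated, FK target
    m1 int UNIQUE,           -- business value; safe to renumber
    m2 varchar(100)
);
```

Updates like "m1 = m1 + 1" still hit the immediate unique check on m1,
but no foreign key or application reference needs to follow the change.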

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Why the ERROR: duplicate key violates unique constraint "master_pkey" is raised? - Is this a Bug?

2007-09-24 Thread Reece Hart
On Mon, 2007-09-24 at 12:50 +0530, Anoo Sivadasan Pillai wrote:

> CREATE TABLE master ( m1 INT primary key , m2 VARCHAR(100)) ; 
> INSERT  INTO master VALUES  ( 1, 'm1' ) ;
> INSERT  INTO master VALUES  ( 2, 'm2' ) ;
> UPDATE  master SET m1 = m1 + 1; 
> Update fails with the message - ERROR: duplicate key violates unique
> constraint "master_pkey"


Primary key constraints are not deferred and are not deferrable (try:
\x, then select * from pg_constraint where conname~'master_pkey'). This
means that the constraint is checked immediately for each row updated.
Success will depend on the order in which postgresql decides to visit
rows: if it visits all rows in descending order, you'll be fine, but
that's unlikely and uncontrollable (AFAIK).

Here's a sketch of an easy workaround. You might have to modify it for
your particular range of m1.

begin;
update master set m1=-m1;
update master set m1=-m1+1;
commit;


You could just as easily add N to m1, then subtract (N-1) from m1.
You'll need N>max(m1).

Good luck,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Seq Scan

2007-06-01 Thread Reece Hart
On Fri, 2007-06-01 at 18:24 +0100, Tyler Durden wrote:

> It uses Index Scan for id>20 and Seq Scan for id>10?!


Based on the statistics pg has for your table, and on the cost of using
the index, the cost based optimizer decided that it's more efficient to
seq scan all of the rows than to incur the index overhead. These
decisions are not always correct, but they're usually quite good.

If you don't believe it, try:
# set enable_seqscan=off;
# explain analyze ...
# set enable_seqscan=on;
# explain analyze ...

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


Re: [GENERAL] Make for PgSQL?

2007-05-31 Thread Reece Hart
On Thu, 2007-05-31 at 09:01 +0200, Vincenzo Romano wrote:
> Scripts have been named so that the lexicographical order of filenames
> brings
> the information about dependencies.
> 
> I've been playing with the GNU Make itself but it's quite hard to keep
> track
> of changes and to re-load a single SQL file that changed.

Expressing the dependencies is the hardest part. Once you have the
dependencies, the make part ought to be straightforward (except for
learning the bugaboos of make). "Keeping track of the changes" might
look something like this make snippet:

.DELETE_ON_ERROR:
.SUFFIXES:
%.log: %.sql
	psql ... -f $< >$@

(Bless the pg folks with mountains of chocolate and mountain dew for
returning meaningful exit codes even for DDL changes, as opposed to,
say, sqlplus.)

Then you need to express your dependencies in a way that make can
understand. The most general way to do this is with a list like:

common.log: utils.log
fx1.log: common.log utils.log
fx2.log: fx1.log
etc.

Finally, you'll want a list of all log targets so that you can type
something like "make update" or whatever to reload as needed. You can
get that with, for example:

SQL_FILES:=$(wildcard sqldir/*.sql)
TARGETS:=$(SQL_FILES:.sql=.log)
.PHONY: update
update: ${TARGETS}


> Is there any hint about "best prectices", software tools and the
> likes?

I don't know anything about best practices (ahem). However, it occurs to
me that it wouldn't be hard to move your dependency encoding into the
SQL itself, such as

-- requires: utils.sql common.sql
create or replace function ...

Then you'd automatically generate a file of sql dependencies using a
perl one-liner (or whatever).

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Command line export or copy utility?

2007-05-22 Thread Reece Hart
On Tue, 2007-05-22 at 18:07 -0400, Francisco Reyes wrote:
> Does anyone know of any export or copy utility that runs on FreeBSD?
> I basically need a program that will connect to one database, do a
> select and copy the result to a second database. 

Two options:

1) if you want a whole table or schema, a pipe works nicely:
eg$ pg_dump -t <table> | psql


2) As of 8.2, you can formulate COPY commands with subqueries. For
example:
eg$ psql -c 'COPY (SELECT origin_id,origin FROM origin
WHERE is_public order by 1) TO STDOUT'

eg$ psql -c 'COPY (SELECT x FROM a WHERE x%2=1) TO STDOUT' \
| psql -c 'COPY a FROM STDIN;'


The only wrinkle is what to do when you need the DDL for the table
itself (say, when you want to create the same table with a subset of the
rows). The way I do this is to pg_dump the schema (-s) in the custom
format (-Fc). Then, I generate a table of contents with pg_restore -l,
edit the TOC to include only the entries I want, and then rerun
pg_restore with -L.

Good luck,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Random Sample

2007-05-18 Thread Reece Hart
On Fri, 2007-05-18 at 15:36 -0500, [EMAIL PROTECTED] wrote:
> How do I pull a random sample of either 100 records or 5% of the
> population of a table? 

If you can be a little flexible about the number of samples, you can try

select * from table where random()<=0.05;

Of course, there's nothing that guarantees that you'll get 5% and this
only works reasonably for large N. On the other hand, if N were small,
you probably wouldn't be asking for a random sample.

You could also try

select * from table order by random() limit 100;

That'll be expensive, but get you exactly 100 (if your table has >= 100
rows, of course).

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Vacuum non-clustered tables only

2007-05-17 Thread Reece Hart
On Wed, 2007-05-16 at 15:40 -0700, Glen Parker wrote:
> Is there a semi-easy way vacuum all tables in a database *except*
> those that are clustered? 

Yes, it can be done "semi-easily".  Here's an example:

select N.nspname,relname
from pg_class C
join pg_namespace N on C.relnamespace=N.oid
where relkind='r' and not exists
  (select * from pg_index I
   where C.oid=I.indrelid and I.indisclustered);


Many of us build statements in SQL itself. For instance, replace the
first line above with

select 'vacuum '||N.nspname||'.'||C.relname||';' as vacuum_cmd

and now the query returns executable statements.  You probably want to
restrict by namespace/schema name as well.  If you put all of that into
a view, it's then easy to do something like

$ psql -c 'select vacuum_cmd from view' | psql -aX

which uses one connection to select the script, and another to execute
it.


Good luck,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Views- Advantages and Disadvantages

2007-05-09 Thread Reece Hart
On Wed, 2007-05-09 at 12:02 -0700, Dann Corbit wrote:
> Views can hide important information from the optimizer (especially
> index information). 

I believe that you're mistaken, and you can see it rather easily by
explaining a select on a view (or even a view of views). For example:

[EMAIL PROTECTED]> \d palias
View "unison.palias"
   Column|   Type   | Modifiers 
-+--+---
 palias_id   | integer  | 
 pseq_id | integer  | 
 origin_id   | integer  | 
 alias   | text | 
 descr   | text | 
 tax_id  | integer  | 
 ref_pseq_id | integer  | 
 added   | timestamp with time zone | 
View definition:
 SELECT pa.palias_id, pv.pseq_id, pa.origin_id, pa.alias,
pa.descr, pa.tax_id, pv.ref_pseq_id, pv.added
   FROM paliasorigin pa, pseqalias pv
  WHERE pv.palias_id = pa.palias_id AND pv.is_current = true;


[EMAIL PROTECTED]> explain select * from palias where tax_id=9606;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Hash Join  (cost=701397.95..1634572.27 rows=1293890 width=137)
   Hash Cond: (pv.palias_id = pa.palias_id)
   ->  Seq Scan on pseqalias pv  (cost=0.00..474670.85 rows=20706650 width=20)
         Filter: is_current
   ->  Hash  (cost=647199.80..647199.80 rows=1692012 width=121)
         ->  Bitmap Heap Scan on paliasorigin pa  (cost=33808.65..647199.80 rows=1692012 width=121)
               Recheck Cond: (tax_id = 9606)
               ->  Bitmap Index Scan on paliasorigin_tax_id_idx  (cost=0.00..33385.65 rows=1692012 width=0)
                     Index Cond: (tax_id = 9606)
(9 rows)


Long ago I compared a few views with their inlined counterparts and the
upshot is that there is exactly or practically zero difference.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] IF function?

2007-05-01 Thread Reece Hart
On Tue, 2007-05-01 at 07:38 -0700, novnov wrote:
> I didn't express myself clearly. Is there an IIF type function one can
> use
> inline in a view? 
> 
> SELECT UserName, IIF(mybooleanfield = true, "It's true", "It's not
> true") as
> SayIt FROM tblUser 

I think people understood you. The part that may not be obvious to you
is that case can be used in the select list, as in:
SELECT UserName, CASE mybool WHEN TRUE THEN $$It's true$$ ELSE $$It's
not true$$ END AS veracity FROM sometable;

or even
SELECT UserName, $$It's$$ || CASE mybool WHEN TRUE THEN '' ELSE ' not' 
  END || ' true' AS veracity FROM sometable;

Nearly anything you can express in a select statement can also be turned
into a view. (The only exception that comes to mind is that select
allows cols with the same name, but views don't.)

If you really want a function, that's not hard to write for the case of
consistent types:
[EMAIL PROTECTED]> create or replace function iif(boolean,text,text)
returns text language sql as 'select case $1 when true then $2
else $3 end';
CREATE FUNCTION
Time: 71.242 ms
[EMAIL PROTECTED]> select iif(true,'yep','nope');
 iif 
-
 yep
(1 row)

Time: 1.468 ms
[EMAIL PROTECTED]> select iif(false,'yep','nope');
 iif  
--
 nope
(1 row)

[EMAIL PROTECTED]> select $$It's$$ || iif(true,'',' not') || ' true';
 ?column?  
---
 It's true
(1 row)


The function is NOT declared strict: it can return null on null input. I
think that is the right behavior for the boolean, but not for the return
strings.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




Re: [GENERAL] Schema as versioning strategy

2007-04-25 Thread Reece Hart
On Wed, 2007-04-25 at 12:47 -0400, Owen Hartnett wrote:
> I want to "freeze" a snapshot of the database every year (think of 
> end of year tax records).  However, I want this frozen version (and 
> all the previous frozen versions) available to the database user as 
> read-only. 

First, I'd rename the current-year schema to a more meaningful name (eg,
taxes2006).  Each year you could do a schema-only dump of the current
year, tweak the schema name in the dump to reflect the new year, and
restore just the schema into the same database. The benefit of this
approach is that the data stay in place (ie, you don't dump public and
restore into a new schema). Conceptually, something as simple as the
following pipe might suffice to dump, rename, and restore into a new
schema:
   $ pg_dump -s -n taxes2006 | sed -e 's/taxes2006/taxes2007/g' | psql -qa
(This is a little dangerous because I've assumed that the string
'taxes2006' occurs only as a schema name.  I've also assumed Unix/Linux
and I have no idea what you'd do on a windows box.)

PostgreSQL doesn't have a read-only mode per se. The closest you can get
is to write a script to revoke insert/update/delete on all tables in an
archived schema; that's also pretty easy:
  $ psql -Atc "select 'REVOKE INSERT,UPDATE,DELETE ON '||nspname||'.'||
relname||' FROM someuser;' from pg_class C join pg_namespace N on
C.relnamespace=N.oid and N.nspname='taxes2006' WHERE C.relkind='r'" \
| psql -qa
(In general, you should reduce everything to a one-liner.)

One of the advantages of having archived schemas in a single database is
that you'll be able to write queries that involve multiple years.  You
wouldn't be able to do that (easily*) if you archived the full database.

-Reece


* This is where David Fetter will mention dblink.

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0




[GENERAL] digest data types?

2007-04-12 Thread Reece Hart
Does anyone have postgresql types for message digests, especially md5
and sha1?

Obviously I could store these as text (as I currently do), but I'm
particularly interested in custom types that store digests as binary
blobs and provide conversion to/from text.

Am I correct in assuming that the space saved by storing digests as
binary (half the size of hex) will substantially impact index ins/upd/del
performance, especially when the digest itself is a large fraction of the
total row size?

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


[GENERAL] storing checksums in a database

2007-04-10 Thread Reece Hart
I'd like to store several kinds of checksums in a database and would
appreciate some advice about the best way to do this. I will use these
checksums for uniqueness constraints and for searching.

I currently store md5s as text and I could store other checksums as text
also, but it occurs to me that there are at least two other options.

One option is to store binary blobs instead. This would save 2x on
storage over hex. For example, sha1 is a 160 bit hash; that's 20 bytes
as binary or 40 (8-bit) chars as hex. As longer hashes are used, I
imagine that the space savings might have significant impact on index
building and query optimization decisions. The downside is that one
would need to encode/decode for lookups based on the hash.
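The 2x claim is easy to check outside the database; here's a quick Python
sketch (the sequence data is hypothetical):

```python
import hashlib

# Stand-in for a ~350-residue protein sequence (hypothetical data).
seq = b"MKVLAAGLLTAVPHE" * 23

md5, sha1 = hashlib.md5(seq), hashlib.sha1(seq)

print(len(md5.digest()), len(sha1.digest()))        # raw: 16 and 20 bytes
print(len(md5.hexdigest()), len(sha1.hexdigest()))  # hex: 32 and 40 chars

# Text <-> binary round-trips trivially, which is all a custom digest
# type would need to do for display:
assert bytes.fromhex(sha1.hexdigest()) == sha1.digest()
```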

Another option is to create a custom type for hex digests that would
provide type conversion as necessary. These would provide the space
benefit of storing as a binary and the usability of having postgresql Do
The Right Thing depending on the type of the representations being
compared. I can't think of any downsides.

So, the questions:

1) Am I worrying about nothing, or is the size of a hash significant? In
my case, the rows consist of a protein sequence (~350 chars, as text), a
timestamp, an integer PK, and an integer length. The text md5 is 32 text
chars, of course, to which I might add sha1 (40 text chars). Thus, the
size of hashes is a sizable fraction of the rest of the row.

2) Does anyone know of postgresql digest data types that provide these
conversions? If not, I'll do it myself... but I'd rather steal of
course.

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] Dealing with table names in functions

2007-03-22 Thread Reece Hart
On Thu, 2007-03-22 at 09:40 -0400, Alvaro Herrera wrote:
> Pass the optionally qualified name and cast it to regclass.  It will
> work correctly when the name is not qualified, applying search_path,
> and it will also work when the name is qualified. 

Is there a way to get names that are always qualified, irrespective of
the search_path? Once one has the oid, it's an easy function to write,
but I suspect this machinery already exists.

For example, I'd like the hypothetical cast:
[EMAIL PROTECTED]> set search_path = 'unison';
SET

[EMAIL PROTECTED]> select 'pseq'::regclass::oid::FQregclass;
 fqregclass
-------------
 unison.pseq
(1 row)


-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] deleting a foreign key that has no references

2007-03-19 Thread Reece Hart
On Mon, 2007-03-19 at 13:03 -0700, Reece Hart wrote:

> The harder and more general case is to build such a query dynamically
> from pg_depends

...

> See the pg_depends documentation at


Apologies. I intended to write pg_constraint and the documentation at
http://www.postgresql.org/docs/8.2/interactive/catalog-pg-constraint.html

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] deleting a foreign key that has no references

2007-03-19 Thread Reece Hart
On Mon, 2007-03-19 at 11:12 -0500, Glen W. Mabey wrote:
> What I'm hoping to find out is whether there is some way to directly
> find out how many (using a SELECT query) references there are to a
> key. 

In the easy case when your schema doesn't change often, you can just
hard code a query of the FK tables and add up the row counts. I bet
something like 'select count(*) from (select * from FKtable1 UNION ALL
select * from FKtable2 ... ) as t' will work (and I'm guessing that the
UNION ALL will optimize well). Obviously, you want indexes on the FKs.

The harder and more general case is to build such a query dynamically
from pg_depends. A good start would be to write a function that returns
an sql query like the above to count the referents of PKtable(PKcolumn).
If you can declare this function stable or immutable (I'm not sure of
this), then it might not be too painful to generate the query within the
trigger itself.  Otherwise, you might have to store/update these queries
in a separate table after every DDL change.
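As a sketch of the "build such a query dynamically" idea, here's Python
that assembles the counting query from a list of (table, column) pairs;
the table and column names below are hypothetical, and in practice the
pairs would come from pg_constraint:

```python
def fk_count_query(placeholder, fk_cols):
    """Build a query counting all rows that reference a given PK value.

    fk_cols: (table, column) pairs, in practice discovered from
    pg_constraint; the names used here are hypothetical.
    """
    parts = [
        f"SELECT 1 FROM {tbl} WHERE {col} = {placeholder}"
        for tbl, col in fk_cols
    ]
    return "SELECT count(*) FROM (" + " UNION ALL ".join(parts) + ") AS refs"

sql = fk_count_query("$1", [("fktable1", "pk_id"), ("fktable2", "pk_id")])
print(sql)
```

Selecting a constant 1 per referencing row (rather than *) keeps the
UNION ALL branches cheap; count(*) over the alias then totals them.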

See the pg_depends documentation at
http://www.postgresql.org/docs/8.2/interactive/catalog-pg-depend.html

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Postgres Database size

2007-03-17 Thread Reece Hart
On Sun, 2007-03-18 at 00:40 +0530, Mageshwaran wrote:
> how to find the size of a particular database in postgres...

The old way was to use du or similar. Recent versions (I believe >=8.1,
but check the release notes to be sure) provide several useful functions
for this:
 pg_column_size
 pg_database_size
 pg_relation_size
 pg_size_pretty
 pg_tablespace_size
 pg_total_relation_size


For example:

[EMAIL PROTECTED]> select datname,pg_size_pretty(pg_database_size(oid)) from
pg_database ;
   datname   | pg_size_pretty 
-+
 postgres| 3905 kB
 csb | 113 GB
 template0   | 3840 kB
 csb-dev | 124 GB
 csb-dev-snapshot-2007-03-08 | 123 GB
 csb_02_02_2007  | 121 GB
 template1   | 3840 kB


-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Oracle-PostgreSQL Coexistence/Interoperability (segue from PostgreSQL to Oracle)

2007-03-16 Thread Reece Hart
On Fri, 2007-03-16 at 14:20 -0400, Jonah H. Harris wrote:

> As Ezequias asked about migrating an application, I'm not quite sure
> why we're discussing this.  Using HSODBC to move data permanently is
> quite good assuming you have no data type issues.


I missed the OP and therefore the link to app migration. However, HSODBC
and its limitations were mentioned and therefore this discussion seems
germane. No matter... I've posted this as a new thread.

Perhaps you've implicitly answered my question -- you apparently think
HSODBC's primary utility is to "move data permanently". That's fine, but
far short of my desire to have an efficient way to expose schema and data
in a postgresql database to oracle's clients -- in real time, with pg
functions, sane type conversions, and reasonable join performance. I
appreciate the challenges of the latter for an optimizer, but it's
technically possible (and, psst, could be very lucrative).

I'm not trying to permanently migrate anything, but rather I'm trying to
coexist seamlessly with Oracle and lessen the barriers to transitioning
to postgresql (or v/v). Interoperability is really the only option when
one has to contend with the integration of multiple database apps/schema
from multiple developers and with Oracle-specific dependencies, such as
I do. Pragmatically speaking, you can't just migrate such a tangle
wholesale -- if there's no other route, it'll never happen (and,
sniffle, I'll be inextricably bound to Oracle forever).

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] [SQL] PostgreSQL to Oracle

2007-03-16 Thread Reece Hart
On Thu, 2007-03-15 at 16:20 -0400, Robert Treat wrote:
> Do you find [HSODBC] works well?  I've used it from some older Oracle
> instances connecting back into PostgreSQL and the results I had have
> been flakey at best.

For us too. We tried in early 2006 with Oracle 8 via hsodbc to
postgresql 7.4.6 and had several problems. From what I remember:

- Oracle complained about SQLError when using unixODBC 3.0. This
necessitated using 2.5. 
http://www.ianywhere.com/developer/product_manuals/sqlanywhere/0902/en/html/dbpgen9/0291.htm

- We had type conversion problems. I can't remember many details of
this, but I *think* they may have been text/varchar and timestamp/date.
We don't use blobs.

- It took our admins a fair bit of time to get working. It probably
didn't help that Oracle was running on AIX.

- The real nail in the coffin seemed to be that there was no way to push
predicates to the remote side. This meant that a remote relation was
materialized locally in order to join with local data and that we'd
never get reasonable performance for cross-database joins (which,
perhaps, it's not designed to do).


I'm particularly curious about this last problem.  Without pushing
predicates to the remote side, it's hard for me to imagine that HSODBC
would be terribly useful for anything but building a local warehouse.
This seems so odd that I think I must be wrong, but I couldn't find any
documentation that to support or discourage cross-database joins.  Can
someone out there enlighten me?  What's the intent for HSODBC and what
are its limitations?  Can one access remote server side functions via
HSODBC (eg, sqlplus> select col1,plpgsqlfx(col1) from remotetable)?


Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] orphaned PGDATA/base/ subdirectories

2007-03-13 Thread Reece Hart
On Tue, 2007-03-13 at 18:16 -0400, Alvaro Herrera wrote:

> On the other hand it would be scary to have the elevator "change its
> mind" on the middle of carrying you to 20th and suddenly opening the
> floor for dumping you to the basement instead.  Which one would think
> is
> pretty much what must have happened here ... 


Actually, I got off at the floor 7-and-a-half and I discovered a great
view through Tom Lane's eyeballs. Here's what it looks like:

Jeez, I have the postgresql source memorized (all versions)... my hands
move rapidly between 5 keyboards, I can watch 5 monitors at the same
time, and I can respond to 15 emails while coding on 3 branches and
running regression tests... this is great!

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] orphaned PGDATA/base/ subdirectories

2007-03-13 Thread Reece Hart
On Tue, 2007-03-13 at 17:44 -0400, Tom Lane wrote:

> are you in the habit of
> banging on ^C repeatedly?  I couldn't reproduce such a problem in a
> small amount of testing though. 


You mean C-c doesn't work like elevator buttons? 

It wouldn't surprise me that I hit repeatedly. I certainly remember at
least one case where it seemed like I was being ignored and hit C-c
repeatedly. It's also quite possible that I logged out from ssh (with
~.), and I have no idea what signal (signals?) that ultimately sends.
And, last night I sent KILL when I was convinced that I was being
ignored and was about to restart the cluster anyway, for which I'd
expect remnants. However, I'm certain of the observation that disk full
led to remnants of a partial copy. That job was running in the
background when I wasn't logged in.

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


[GENERAL] orphaned PGDATA/base/ subdirectories

2007-03-13 Thread Reece Hart
setup: postgresql 8.1.8, x86_64, gnu/linux

I recently used createdb -T to copy a large database and the process
failed due to disk full, but the space wasn't freed after that failure.
That led me to wonder how much of my disk usage was from failed or
interrupted operations.

In the 8.1.8 manual, Chapter 50, I read: 
For each database in the cluster there is a subdirectory within
PGDATA/base, named after the database's OID in pg_database. This
subdirectory is the default location for the database's files;
in particular, its system catalogs are stored there.

So, in pg_database I see:
[EMAIL PROTECTED]> select oid,datname from pg_database order by 1;
   oid   |   datname   
-+-
   1 | template1
   10792 | template0
   10793 | postgres
 1235174 | csb_02_02_2007
 1379984 | csb-dev
 2121234 | csb
 2296808 | csb-dev-snapshot-2007-03-07
 2422198 | csb-dev-snapshot-2007-03-08
 2558026 | csb-dev-snapshot-2007-03-13
(9 rows)

And in PGDATA/base:
[EMAIL PROTECTED]:/srv/postgresql/base> ls -l
total 207
drwx------  3 postgres postgres  2640 2007-03-12 22:07 1
drwx------  2 postgres postgres  2576 2007-02-16 17:21 10792
drwx------  3 postgres postgres  2640 2007-03-12 20:48 10793
drwx------  3 postgres Bioinfo  21472 2007-03-12 20:47 1235174
drwx------  2 postgres Bioinfo   3872 2006-10-19 21:25 1363637 ?
drwx------  2 postgres Bioinfo   1888 2006-10-20 08:32 1371793 ?
drwx------  3 postgres Bioinfo  31336 2007-03-12 20:48 1379984
drwx------  3 postgres Bioinfo  21336 2007-03-12 20:51 2121234
drwx------  3 postgres Bioinfo  30720 2007-03-13 08:57 2296808
drwx------  3 postgres Bioinfo  30680 2007-03-12 20:47 2422198
drwx------  2 postgres Bioinfo  21304 2007-03-12 18:50 2549963 ?
drwx------  2 postgres Bioinfo  20504 2007-03-12 20:47 2549977 ?
drwx------  2 postgres Bioinfo   4320 2007-03-12 20:47 2550014 ?
drwx------  2 postgres Bioinfo  18448 2007-03-13 03:57 2558026

I've put ? after the base/ subdirs that have no correspondent in
pg_database.

The manual clearly says that each row in pg_database will have a
corresponding subdirectory in base/.  Is the converse also true, ie,
should every subdir in base/ also have a row in pg_database?  If so, am
I safe to rm the unreferenced subdirectories?
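A minimal sketch of that cross-check in Python, assuming you feed it the
oids from `select oid from pg_database` (the paths and oids here are
hypothetical):

```python
import os

def orphaned_subdirs(base_path, known_oids):
    """Return PGDATA/base subdirectories with no pg_database row.

    known_oids: oids (as strings) from `SELECT oid FROM pg_database`.
    Only all-digit names are considered, since database subdirectories
    are named by oid.
    """
    on_disk = {d for d in os.listdir(base_path) if d.isdigit()}
    return sorted(on_disk - set(known_oids), key=int)

# e.g. orphaned_subdirs('/srv/postgresql/base', ['1', '10792', ...])
```

Whether the leftovers are actually safe to remove is the question posed
above; this only identifies the candidates.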

FWIW, I know that I've interrupted (^C) a few createdb -T operations
during the course of this installation, and I recently had the disk full
error. My best guess is that these orphaned directories result from
those interruptions and failure.

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] pg_restore and database comments

2007-03-07 Thread Reece Hart
I periodically take snapshots of my current development database to
facilitate recovery from screw-ups. I do so roughly like this:
createdb -EUTF-8 db-snapshot
pg_dump -Fc db | pg_restore -d db-snapshot

For which I get the fairly innocuous error:
pg_restore: WARNING:  database comments may only be applied to
the current database

I'm not aware of any other part of a database dump that references the
source database by name (nor should it, since this would cause problems
when restoring to a different database name).

Is there a better way to snapshot a database? createdb -T db db-snapshot
seemed like a good idea, but that requires the template db to not be in
use (ie, no logged in users).

I'm running 8.1.8 and will move to 8.2.3 shortly.

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Database deadlock/hanging

2007-03-07 Thread Reece Hart
On Wed, 2007-03-07 at 10:08 -0600, John Gateley wrote:
> I'm guessing it's something related to table locks.
...
> Any pointers on what I should be looking for to prevent this from
> happening again? What information I should be tracking to figure
> out what is exactly happening? 

Your inserts almost certainly have a table or index exclusively locked
and thereby causing a backlog of selects.

You can fish current and waiting locks out of pg_locks, but those use
internal identifiers rather than names.  Here's a view that will make
pg_locks more readable:

[EMAIL PROTECTED]> CREATE OR REPLACE VIEW pgutils.locks AS
 SELECT l.pid, d.datname AS "database", n.nspname AS "schema",
        c.relname AS relation, l.locktype, l."mode",
        CASE l."granted" WHEN true THEN 'RUN'::text
                         ELSE 'WAIT'::text END AS state,
        a.usename, a.current_query,
        to_char(now() - a.query_start, 'HH24:MI:SS'::text) AS duration
   FROM pg_locks l
   JOIN pg_database d ON l."database" = d.oid
   JOIN pg_class c ON l.relation = c.oid
   JOIN pg_namespace n ON c.relnamespace = n.oid
   JOIN pg_stat_activity a ON l.pid = a.procpid
  ORDER BY l.pid, d.datname, n.nspname, c.relname, l."granted";

[EMAIL PROTECTED]> select * from pgutils.locks ;
  pid  | database |   schema   |      relation      | locktype |      mode       | state | usename | current_query | duration
-------+----------+------------+--------------------+----------+-----------------+-------+---------+---------------+----------
 28434 | csb-dev  | pg_catalog | pg_class           | relation | AccessShareLock | RUN   | rkh     |               | 00:00:21
 28434 | csb-dev  | pg_catalog | pg_class_oid_index | relation | AccessShareLock | RUN   | rkh     |               | 00:00:21
 28434 | csb-dev  | pg_catalog | pg_locks           | relation | AccessShareLock | RUN   | rkh     |               | 00:00:21
 28434 | csb-dev  | pg_catalog | pg_namespace       | relation | AccessShareLock | RUN   | rkh     |               | 00:00:21
 28434 | csb-dev  | pg_catalog | pg_stat_activity   | relation | AccessShareLock | RUN   | rkh     |               | 00:00:21
 28434 | csb-dev  | pgutils    | locks              | relation | AccessShareLock | RUN   | rkh     |               | 00:00:21
(6 rows)

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Union Query Improvement

2007-02-14 Thread Reece Hart
On Tue, 2007-02-13 at 05:46 -0800, Ray Bannon wrote:

> Select ID, plan_name from table/view
> Where plan_name = 'A'
> And rownum = 1
> UNION

...

> 
> Ad infinitum for about 100 iterations.
> 
> Any way to write this more efficiently?


I assume that "table/view" in your OP really refers to different tables
or views; otherwise, this is a trivial query that can be collapsed to
"select ... where plan_name in ('A','B','C') ..." or, perhaps just
"select ... where rownum=1".

As Tom said, UNION ALL may be more appropriate for your needs. See 
http://www.postgresql.org/docs/8.1/interactive/queries-union.html

You should consider inherited tables and table partitioning. See
http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Partitioning offers two administrative benefits: consistency of child
tables and an implied UNION ALL of children when one selects on the
parent.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] Foreign Key Identification

2007-01-10 Thread Reece Hart
From: Ashish Karalkar 
  * To: pgsql-general ( at ) postgresql ( dot ) org
  * Subject: Foreign Key Identification
  * Date: Wed, 10 Jan 2007 08:23:28 -0800 (PST)


Hello All,
Is there any means to list out the foreign key tables
which are linked with a primary key?
What i want to do is something as follows:
...
select tablename from ? where Foreign key = TableAPK
and this should list me


I needed to do something similar. I wrote some views to do this and
packaged 'em up as something I call pgutils. See:

http://archives.postgresql.org/pgsql-general/2006-11/msg00752.php

http://harts.net/reece/pgutils/


-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] select union with table name

2006-12-28 Thread Reece Hart
On Thu, 2006-12-28 at 19:09 +0700, Akbar wrote:
> select blue.name from blue union select red.name from red
>  give me this:
> name
> 'blabla'
> 'bubu'
> 'haha'
> 'kkk'
> 
> I want this:
> nametable_name
> 'blabla'   blue
> 'bubu'blue
> 'haha'red
> 'kkk'  red
> 
> Could I? 

select name,'blue' as "table_name" from blue union all select name,'red'
as "table_name" from red;

Note the 'all' after union... I suspect you'll want that or should at
least consider it.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] (bash the search engine) Don't split on underscore

2006-12-19 Thread Reece Hart
On Tue, 2006-12-19 at 13:25 +0100, Hannes Dorbath wrote:

> I think it would be useful to adjust the parser to not split on underscores:
> 
> In case I'd like to lookup PG's to_number() function I won't get 
> anything useful. Number is contained nearly everywhere and to is 
> configured as stop word. Same with most other functions.


That would be useful and almost certainly result in better specificity.
A counter example is searching for "information schema", for which you'd
probably want hits to "information_schema" as well.
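A toy illustration of the trade-off (this is not the site's actual
parser, just a sketch of the two tokenization policies):

```python
import re

def tokens(text, split_underscore):
    """Tokenize roughly the two ways a search-engine parser might."""
    pat = r"[a-z0-9]+" if split_underscore else r"[a-z0-9_]+"
    return re.findall(pat, text.lower())

doc = "see the information_schema views and the to_number() function"

# Splitting on '_' lets a query for "information schema" match...
print(tokens(doc, split_underscore=True))
# ...but dissolves to_number() into the stopword "to" and the ubiquitous
# "number". Keeping '_' preserves the precise token:
print(tokens(doc, split_underscore=False))
```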

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] Anyone? Best way to authenticate postgres against

2006-12-18 Thread Reece Hart
On Mon, 2006-12-18 at 20:23 -0600, Derrick wrote:
>   Subject: 
> [GENERAL] Anyone?  Best way to
> authenticate postgres against
> active directory?

I know very little about AD, its kerberos implementation, and
interoperability limitations. I have been using kerberos with postgresql
for years and think it's nearly the holy grail of authentication. I know
that Linux can authenticate against AD domains, so I presume that one
could also get postgresql to do so.

Here are two bookmarks that I've saved and that you might find useful:
http://chrisp.de/en/rsrc/kerberos.html
http://www.windowsnetworking.com/articles_tutorials/Authenticating-Linux-Active-Directory.html

If you can make that work, the community might appreciate a summary of
how you did it.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Let's play bash the search engine

2006-12-18 Thread Reece Hart
On Mon, 2006-12-18 at 15:47 -0800, Joshua D. Drake wrote:

> Take a look at let us know what you think and how it performs for you.


Terrific. Fast and meaningful.

I echo Thomas' request to have docs limited to a version (or, better,
most recent). Perhaps archived docs should be searched via a separate
page entirely.

Most of the queries I tried hit what I expected, except that the docs were
for old versions. (In fact, I don't think 8.2 docs ever showed up
first.)

I tried "defer constraints" and got a few not-too-useful hits. However,
"deferred constraints" returned meaningful links. Is that a stemmer
problem?

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] MySQL drops support for most distributions

2006-12-13 Thread Reece Hart
On Wed, 2006-12-13 at 14:21 -0500, Tom Lane wrote:

> I think the real criterion for big companies is not so much whether
> you're supporting your "own" product as whether you're big enough to
> be worth suing if things go wrong. 


This is a common misunderstanding and it is incorrect, at least in my
experience. I work at a company with >10K people. I oversee computer
architecture and operations for Research (~800 people) and I work very
closely with our large IT group.

In order to understand how we purchase hardware, software, or support,
you have to understand what's important to us. A successful company must
focus on its products, not on irrelevant details about how they get
produced and delivered. Employees may personally care about the details
of production, but successful companies and their managers -- and,
ultimately, customers and stockholders -- do not.

The major concerns for our purchases include: 1) Does it meet our
functional requirements? 2) Does it integrate with our existing
infrastructure? 3) Can we identify a support channel? and 4) What's the
risk relative to other options? These days, OSS packages frequently
exceed functional requirements over proprietary alternatives. Apache is
an irrefutable example. Big vendors often have proven track records for
(2) and (3), but it's not the bigness per se that appeals. We choose
small vendors when that's appropriate for a need. Whom we sue when
things go wrong is almost never a consideration during purchasing. If a
relationship goes south, a suit is unlikely to address our primary goal,
the product.

Now, lest you think I'm a corporate troll on the pg lists, I should tell
you that I'm probably among the most visible and vocal open source
supporters here. I've long railed against proprietary software -- not
because of support issues but because I view *some* proprietary software
as a real threat to our long-term success. What's important is that our
data are usable in ways we see fit, without encumbrance from vendors.
This is not the goal of big vendors, who depend on lock-in for growth.

The EnterpriseDB folks have the right strategy. Nobody wants Oracle
itself, but rather they want database services that behave like Oracle
(er, except the parts that annoy). If I can't tell that I'm not talking
to Oracle but getting the "right" answers, why should I care? Cheaper
too? Even better. Oracle should be scared because it seems inevitable
that their database business will be commoditized out of existence.

Concern for risk is perhaps the most elusive problem for OSS providers
and supporters. Companies don't like risk, and *any* change to a working
process is a risk. Much to my chagrin, this risk makes it difficult to
unseat even mediocre products. We should all cheer EnterpriseDB's
success in booking some big name companies. Their successes will
establish PostgreSQL as a reliable, cost-effective, and empowering
alternative to proprietary databases and therefore decrease the risk
concerns.

The only reason I spent this much time weighing in is because I'm
thrilled with PostgreSQL (er, sorry Tom, Postgres) and appreciate and
respect the terrific work done in this community. Thank you.

Cheers,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.


Re: [GENERAL] Copy command to load data into a PostgreSQL DB

2006-11-17 Thread Reece Hart
On Fri, 2006-11-17 at 15:46 -0500, Wm.A.Stafford wrote:

> If we use '\x05' as suggested by the  
> Fastreader User guide.  PostgreSQL complains "ERROR:  COPY delimiter 
> must be a single character".
> 
> Any help or suggestions would be appreciated. 


\x05, aka control-e, ctl-e, ^e, C-e, etc, is merely one way to represent
the *single* ASCII character 5. When a manual says type \x05, it almost
surely means that some program will interpret those four printable chars
as a single non-printable character rather than the 4-char string
literally. Different programs use different representations of control
characters and \x is very common, but the problem is that postgres's
copy command doesn't understand this syntax for non-printable
characters. No matter, copy will work for you, and in fact you can use
TWO ways to represent control-e. 

Option 1: Instead of '\x05', type E'\x05', that is:
- letter E
- single quote
- the 4-char string \x05
- single quote

E is postgresql's way of indicating that the string will be interpreted
in way that does not conform to SQL spec. This results from the pg
developers being pedantic about conformance and refusing to extend the
standard carelessly (I love 'em for this attitude).


Option 2: Instead of '\x05', type 'C-vC-e' . By this I mean:
  - single quote
  - control-v
  - control-e
  - single quote.
The magic here is that control-v means "take the next character
verbatim". In effect, you're typing ASCII char 5 (a single character)
literally into the single quotes rather than a printable representation
of it (like the 4-char string \x05). I'm pretty sure that readline is
responsible for this interaction, and therefore this probably doesn't
work on readline-less installations.
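A third option, not mentioned above: generate the COPY input (and the
command) from a script, where '\x05' really is interpreted as the single
ASCII character 5. A small Python sketch (table and file names are
hypothetical):

```python
# '\x05' in Python source *is* the single ASCII character 5, so a script
# can build control-character-delimited COPY input directly.
DELIM = "\x05"

rows = [("1", "alpha"), ("2", "beta")]
data = "\n".join(DELIM.join(r) for r in rows) + "\n"

assert len(DELIM) == 1 and ord(DELIM) == 5

# The load itself (hypothetical table/file names) would then be, e.g.:
#   psql -c "COPY mytable FROM STDIN WITH DELIMITER E'\x05'" < data.txt
print(repr(data))
```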


-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0



Re: [GENERAL] R and postgres

2006-11-07 Thread Reece Hart




On Tue, 2006-11-07 at 00:22 -0800, Joe Conway wrote:

I think you want this:
http://bioconductor.org/packages/1.9/bioc/html/RdbiPgSQL.html

The bioconductor project is now maintaining RdbiPgSQL. I think many 
people also use RODBC to connect R with Postgres.


Joe-

I almost sent you an off-list email before I posted because I just knew you'd know the answer.

Thank you.

-Reece    





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









Re: [GENERAL] database name aliases?

2006-11-07 Thread Reece Hart




Woody and Albe-

I wasn't aware of pg_service -- that does solve my original problem.

Thanks for the replies.

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0








Re: [GENERAL] database name aliases?

2006-11-06 Thread Reece Hart




On Mon, 2006-11-06 at 16:43 -0800, Jeff Davis wrote:

You can use "ALTER DATABASE name RENAME TO newname;". Does that help?


This is what I do now to evolve from development to staging to production, as well as to deprecate versions. That indeed solves most of the problem.

Aliases might solve two problems. The first is to address the oft-recurring problem of wanting to be able to refer simultaneously to an instance and more generally to a concept (e.g., HEAD in cvs, or /etc/alternatives/ for system executables, etc). That is, one could refer to a specific db version/instance as well as a name for the "most recent" version (or dev, stage, prod, or whatever).

The second goal is more practical: postgres doesn't allow a database to be renamed while it's in use and that prohibition causes minor scheduling problems when rotating instances. I imagine that db aliases would affect only new connections.

Thanks,
Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









Re: [GENERAL] R and postgres

2006-11-06 Thread Reece Hart




On Mon, 2006-11-06 at 19:13 -0800, Adrian Klaver wrote:

You might want to take a look at PL/R a procedural language for R in Postgres.
http://www.joeconway.com/plr/ 


PL/R is indeed neat, but I want to go the other way: within R, fetch data from postgres and manipulate it (plot, histogram, etc). I appreciate that I could expose this through postgres, but that's a little awkward for prototyping and I'm not keen to add load to my postgres box.

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









[GENERAL] R and postgres

2006-11-06 Thread Reece Hart




I'd like to get R to talk to postgresql, but my head's spinning among a web of broken links, way outdated web pages, project deprecation announcements and a terrible signal:link ratio.

Rdbi and RdbiPgSQL seem to be the answer, despite both being apparently defunct projects.

What is the Right Thing for a guy who wants R to talk to postgresql? 

Thanks, and send aspirin,
Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









Re: [GENERAL] database name aliases?

2006-11-06 Thread Reece Hart




On Tue, 2006-11-07 at 04:54 +0500, Talha Khan wrote:

You can do this by using a script. I mean whenever you are creating a new version of the database. you can do that by running a script that creates a database  and then stores the name of that database in an environmental variable . Now you can run psql by connecting to a database through that variable.


I think I understand the gist of your proposal, but psql was just one client example. In principle, I'd want the same alias to be exposed to psql, perl DBI, odbc, jdbc, etc. I really think this should be done in the database itself to ensure consistency. 

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









[GENERAL] database name aliases?

2006-11-06 Thread Reece Hart




I'd like to be able to have several versions of a database available concurrently and one database alias that refers to the most recent of these.  For example:

dbname_1-1
dbname_1-2
dbname_1-3
dbname -> dbname_1-3

and
$ psql -d dbname
would connect to dbname_1-3. 

Any ideas?

Thanks,
Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









Re: [GENERAL] not null across multiple columns

2006-11-02 Thread Reece Hart
On Thu, 2006-11-02 at 19:10 -0500, Matthew Terenzio wrote:
> suppose you wanted to be certain that either one of two or more
> columns were present
> 
> Like in a user table, either a username or an email need to be present
> to create a row.
> 
> You  can't use not null because it's an either or situation.
> 
> what's the best method to accomplish this sort of constraint?

See check constraints:
http://www.postgresql.org/docs/8.1/interactive/ddl-constraints.html#AEN1954

I suppose you seek something like this:
create table people (
name text,
email text,
constraint valid_name_or_email 
check (name is not null or email is not null)
    );

Cheers,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] DELETE performance issues

2006-11-02 Thread Reece Hart




On Thu, 2006-11-02 at 11:27 -0500, Tom Lane wrote:

Are you absolutely sure about that last?  Unindexed foreign key
references are the explanation nine times out of ten when someone
complains about deletes being slow. 


This was certainly the major cause when I had slow deletes. (The other problem was chaining of cascading deletes.) In order to help correct such problems, I wrote some views to identify unindexed, cascading foreign keys. An example:



[EMAIL PROTECTED]> select * from pgutils.foreign_keys_missing_indexes ;
 fk_namespace | fk_relation  |  fk_column  | fk_indexed | pk_namespace | pk_relation |  pk_column  | pk_indexed | ud ...
--+--+-++--+-+-++--- ...
 gong | node | alias_id    | f  | gong | alias   | alias_id    | t  | cn ...
 taxonomy | node | division_id | f  | taxonomy | division    | division_id | t  | cc ...
 gong | alias    | go_id   | f  | gong | node    | go_id   | t  | cc ...

etc...


ud is an abbreviation for update and delete constraint type (cascade, set null, restrict, etc). 

In this view, "indexed" means that the column is the first or only column in some index, i.e., pg_index.indkey[0] = pg_attribute.attnum. I suppose that one might want to distinguish the indexing cases more precisely as unindexed, sole-column index, first col of mult-col index, second col of multi-col index, etc, but I didn't do so. The views were originally written for 7.4 and I don't know what's appropriate for current multicolumn index behavior.

The code is in http://harts.net/reece/pgutils/ .

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0








Re: [GENERAL] Deleting Problem

2006-10-30 Thread Reece Hart




On Tue, 2006-10-31 at 10:24 +1100, Jamie Deppeler wrote:

Here is my problem i have delete triggers on table5 and table6 which 
update summary information on table1,table2,table3,table4 if table5 or 6 
gets delete, problem i am faced with is when eg table1 record gets 
deleted i get the following error Error insert or update on table5 
violates foreign key constraint "table4". They are all link with a 
cascade delete. 


As Stephen said, it's difficult to help without more details about the triggers and the structure of your transaction. However, if you expect that the constraints are consistent at the end of the transaction, perhaps the problem is merely the order in which constraints are checked. See

http://www.postgresql.org/docs/8.1/interactive/sql-set-constraints.html

and consider something like SET CONSTRAINTS ALL DEFERRED . 

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









Re: [GENERAL] creating a dumpfile from a view

2006-10-30 Thread Reece Hart




On Mon, 2006-10-30 at 16:02 -0800, Richard Yen wrote:


Hi, I'm trying to create a dumpfile for a client.  The data is  
gathered from about 7 tables, and I need to output all the columns as  
the client wishes.


[snip]


Would anyone know how to dump the data from the view?  I tried the  
following, but it doesn't work:




I often export data with psql and feed it to a spreadsheet, like this:

$ psql -F'^E' -Ac 'select col1,col2 from aview' >aview.csv
$ OOo aview.csv

Perhaps that will work for you. The ^E between the quotes is generated by typing ctrl-v, then ctrl-e; ctrl-v inserts the next char literally. I dunno about the ill begotten csh-derivatives. Fortunately, I've never had conflicts with the delimiter choice and the data. You may not be so lucky.

If you're truly looking for the view data in pg_dump format, the only thing I can think of is to materialize the view and dump that.

A similar effect could be obtained by synthesizing a create table statement appropriate for the psql data export as above and using the copy command to load data. That could all be wrapped into a single file to be passed to psql for loading.

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









Re: [GENERAL] Postgresql 6.13

2006-10-14 Thread Reece Hart




On Fri, 2006-10-13 at 14:54 +0100, Richard Huxton wrote:

Greg's mostly right, but the Vic-20 port is available as a type-in 
program listing in issue 3 of "Your Computer". 


Don't bother. I already typed it in and have it on cassette tape here somewhere. I'll send it via parcel post with two 13 cent stamps as soon as I finish watching War Games on my betamax.

Is Carter still president?

-Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0









Re: [GENERAL] using schema's for data separation

2006-09-29 Thread Reece Hart
On Fri, 2006-09-29 at 09:39 -0500, Erik Jones wrote:
> What we've done is to use table inheritance.  So, 
> we have a group of base account data tables and whenever a new account
> is added they get a set of tables that inherit from these base tables.
> This works well in that whenever we need a global schema change to any
> of these tables we can just alter that pertinent base table and the
> change will cascade down to the child tables. 

Many DDL commands do not propagate to the child tables -- foreign keys,
for example, are not inherited.  As a consequence, using inheritance for
Chris' purpose has the important caveat that referential integrity will
be difficult to ensure with standard FK constraints.  I believe that
other types of constraints and rules are also not propagated to child
tables.  Erik- do you have a clever solution to this aspect of using
inheritance?


On Thu, 2006-09-28 at 22:59 -0700, snacktime wrote:
> The connections to the database are from an unprivileged user, and
> everything goes through functions that run at the necessary
> privileges. 

Someone out there must have implemented row-level security for
postgresql (EnterpriseDB?).  You could implement this yourself by adding
a usesysid column to each table and using a combination of column
defaults, views, triggers, or rules to set usesysid on insert and update
and to require that the usesysid column matches the current user on
select and delete.  This probably improves the consistency of your
security policy (over having the policy in functions).

On the other hand, I could be blowing smoke -- it seems like this outta
work, but I haven't actually done this myself.  I don't actually do any
real work myself.


-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] text formatting in a query, a la sprintf

2006-08-19 Thread Reece Hart




I often desire text and number formatting that's not available with functions like to_char. I'd really like to have functionality akin to sprintf, such as:

$ select no_hits,term,sprintf('http://google.com?q=%s',replace(queryterm,' ','+')) as url from queries;
$ select event,sprintf("%.2g",probability) as prob from event_probs;
$ select sprintf('<td bgcolor="%s">%s</td>',color,content) as td_cell from cells;

I considered a plperl function to do this, but plperl can't take anyarray and that means that I'd have to resort to ugliness like coding for specific numbers of args or encoding them within a single string... ick!

Is there a way to get more flexible formatting as in these examples within a backend query, even if not with sprintf per se?

Thanks,
Reece





-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0








Re: [GENERAL] text datum VARDATA and strings

2006-08-14 Thread Reece Hart




On Mon, 2006-08-14 at 15:51 -0400, Tom Lane wrote:


Whose 7.x manual?  This stuff has been there since we invented the
"version 1" function call convention, which was 7.3 or before.  There
is some documentation in the SGML docs, but really we kind of expect you
to look at the standard built-in functions to see how things are done...



The PostgreSQL manual. I wrote these functions early in the 7.x series and I don't know which manual version exactly.  For example, sec 9.5.4 of
http://www.postgresql.org/docs/7.3/static/xfunc-c.html shows code for concat_text, which I remember was the basis for what I wrote. I now see and understand the text regarding detoasting and the 'DatumGetXXX' macros; the relevance of these was not obvious to me at the time.



So it's best not to assume that you know what is inside a
text datum, if possible,



Okay. Does that mean that code in 9.5.4 should have a warning to that effect?




> 3) Is there any reason to believe that the code below is problematic?

The only thing I'd suggest is that checking for a null return from
palloc is a waste of effort.  It doesn't return to you if it runs
out of memory.



Okay.  Thanks for the advice, Tom.

-Reece






-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0








[GENERAL] text datum VARDATA and strings

2006-08-14 Thread Reece Hart
Michael Enke recently asked in pgsql-bugs about VARDATA and C strings
(BUG #2574: C function: arg TEXT data corrupt).  Since that's not a bug,
I've moved this follow-up to pgsql-general.


On Mon, 2006-08-14 at 11:27 -0400, Tom Lane wrote:
> The usual way to get a C string from a TEXT datum is to call textout,
> eg
> str = DatumGetCString(DirectFunctionCall1(textout, datumval));


Yikes!  I've been accessing VARDATA text data like Michael for years
(code below).  I account for length and don't expect null-termination,
but I don't use anything like Tom's suggestion above.  (I always try to
do what Tom says because that usually hurts less.)


I have three questions:

1) I based everything I did on examples lifted nearly verbatim from a
7.x manual, and I bet Michael did similarly.  I've never heard of
DatumGetCString, DirectFunctionCall1, or textout.  Are these and other
treasures documented somewhere?  

2) Does DatumGetCString(DirectFunctionCall1(textout, datumval)) do
something other than null terminate a string?  All of the strings are
from [-A-Z0-1*]; server_encoding has been either SQL_ASCII or UTF8 in
case that's relevant.

3) Is there any reason to believe that the code below is problematic?

Thanks,
Reece




#include "postgres.h"
#include "fmgr.h"
#include <ctype.h>
#include <string.h>

static char* clean_sequence(const char* in, int32 n);

PG_FUNCTION_INFO_V1(pg_clean_sequence);
Datum pg_clean_sequence(PG_FUNCTION_ARGS)
  {
  text* t0; /* in */
  text* t1; /* out */
  char* tmp;
  int32 tmpl;

  if ( PG_ARGISNULL(0) )
{ PG_RETURN_NULL(); }

  t0 = PG_GETARG_TEXT_P(0);

  tmp = clean_sequence( VARDATA(t0), VARSIZE(t0)-VARHDRSZ );
  tmpl = (int32) strlen(tmp);

  /* copy temp sequence into new pg variable */
  t1 = (text*) palloc( tmpl + VARHDRSZ );
  if (!t1)
{ elog( ERROR, "couldn't palloc (%d bytes)", tmpl+VARHDRSZ ); }
  memcpy(VARDATA(t1),tmp,tmpl);
  VARATT_SIZEP(t1) = tmpl + VARHDRSZ;

  pfree(tmp);

  PG_RETURN_TEXT_P(t1);
  }



/* clean_sequence -- strip non-IUPAC symbols
   The intent is to strip non-sequence data which might result from
   copy-pasting a fasta file or some such.

   in: char*, length
   out: char*, |out|<=length, NULL-TERMINATED
   out is palloc'd memory; caller must free

   allow chars from IUPAC std 20
   + selenocysteine (U) + ambiguity (BZX) + gap (-) + stop (*)
*/

#define isseq(c) ( ((c)>='A' && (c)<='Z' && (c)!='J' && (c)!='O') \
   || ((c)=='-') \
   || ((c)=='*') )

char* clean_sequence(const char* in, int32 n) {
  char* out;
  char* oi;
  int32 i;

  out = palloc( n + 1 );/* w/null */
  if (!out)
{ elog( ERROR, "couldn't palloc (%d bytes)", n+1 ); }
  
  for( i=0, oi=out; i<=n-1; i++ ) {
char c = toupper(in[i]);
if ( isseq(c) ) {
  *oi++ = c; 
}
  }
  *oi = '\0';
  return(out);
}


-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0



