Re: [SQL] How to count from a second table in an aggregate query?
Date: Wed, 15 Apr 2009 21:23:04 -0700
From: Steve Midgley scie...@misuse.org
To: Erik Jones ejo...@engineyard.com
Subject: Re: How to count from a second table in an aggregate query?
Message-ID: 49e6b2a8.5040...@misuse.org

Erik Jones wrote:

On Apr 15, 2009, at 1:15 PM, Steve Midgley wrote:

I want to generate an analysis report that counts the values in two separate tables. I've been able to accomplish what I want with two separate queries that I then merge together in Excel. Essentially what I need is a horizontal UNION statement (or something like that): get a FK id and count of a certain column in one table, based on some criteria; for each FK id, get the count of a different column in a different table; display the counts from both queries side-by-side along with the FK id's in a single result set.

Joining against a subquery for the second count does the trick:

select src_contact_id, count(log_type), cp.count
from contact_log,
     (select contact_id, count(property_id)
      from contact_property
      group by contact_id) as cp
where src_contact_id = cp.contact_id
  and log_type in ('web', 'detail')
group by src_contact_id, cp.count
order by src_contact_id;

 src_contact_id | count | count
----------------+-------+-------
              1 |     5 |     4
              2 |     3 |     2

A friend of mine off-list provided an alternative SQL version which I thought the list might have interest in:

select src_contact_id,
       count(distinct contact_log.id),
       count(distinct contact_property.id)
from contact_log, contact_property
where contact_log.src_contact_id = contact_property.contact_id
  and contact_log.log_type in ('web', 'detail')
group by src_contact_id;

Credit to Matt Gainsborough for that one. Makes perfect sense as I look at it. It's nice to see two alternate paths to the same solution. ANSI-92 joins work just as well as his ANSI-89 join syntax for this (i.e. using the JOIN keyword to set the relation).

Steve
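For reference, here is a sketch (not from the thread) of Erik's query rewritten with the explicit JOIN keyword, per Steve's ANSI-92 comment; it should return the same result set:

select src_contact_id, count(log_type), cp.count
from contact_log
join (select contact_id, count(property_id)
      from contact_property
      group by contact_id) as cp
  on src_contact_id = cp.contact_id
where log_type in ('web', 'detail')
group by src_contact_id, cp.count
order by src_contact_id;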
[SQL] How to count from a second table in an aggregate query?
Hi, I'm trying to figure out how to do something which I'd guess is easy for a SQL whiz but has me stumped. I would greatly appreciate any help on this - it's a form of SQL query that I've never figured out, but have wanted to use many times over the years. I want to generate an analysis report that counts the values in two separate tables. I've been able to accomplish what I want with two separate queries that I then merge together in Excel. Essentially what I need is a horizontal UNION statement (or something like that). I've included some DDL and sample SQL queries that explain what I want better than I can in English, but the general idea is: get a FK id and count of a certain column in one table, based on some criteria; for each FK id, get the count of a different column in a different table; display the counts from both queries side-by-side along with the FK id's in a single result set.

Thanks for any assistance on this!

Steve

/*SQL STARTS*/
drop table if exists contact_log;
drop table if exists contact_property;

create table contact_log(
  id serial NOT null,
  src_contact_id integer,
  log_type character varying(63),
  CONSTRAINT contact_log_pkey PRIMARY KEY (id));

create table contact_property(
  id serial NOT null,
  contact_id integer,
  property_id integer,
  CONSTRAINT contact_property_pkey PRIMARY KEY (id),
  CONSTRAINT contact_property_cid_pid UNIQUE (contact_id, property_id));

insert into contact_log (src_contact_id, log_type) values(1, 'web');
insert into contact_log (src_contact_id, log_type) values(1, 'web');
insert into contact_log (src_contact_id, log_type) values(1, 'web');
insert into contact_log (src_contact_id, log_type) values(1, 'detail');
insert into contact_log (src_contact_id, log_type) values(1, 'detail');
insert into contact_log (src_contact_id, log_type) values(2, 'detail');
insert into contact_log (src_contact_id, log_type) values(2, 'detail');
insert into contact_log (src_contact_id, log_type) values(2, 'web');
insert into contact_log (src_contact_id, log_type) values(2, 'foobar');
insert into contact_log (src_contact_id, log_type) values(3, 'foobar');
insert into contact_log (src_contact_id, log_type) values(4, 'web');

insert into contact_property (contact_id, property_id) values(1, 20);
insert into contact_property (contact_id, property_id) values(1, 21);
insert into contact_property (contact_id, property_id) values(1, 22);
insert into contact_property (contact_id, property_id) values(2, 23);
insert into contact_property (contact_id, property_id) values(2, 24);
insert into contact_property (contact_id, property_id) values(1, 50);
insert into contact_property (contact_id, property_id) values(3, 51);
insert into contact_property (contact_id, property_id) values(5, 52);

-- This gets what I want from contact_log
select src_contact_id, count(log_type)
from contact_log
where contact_log.src_contact_id in (select contact_id from contact_property)
  and log_type in ('web', 'detail')
  and src_contact_id in (select contact_id from contact_property)
group by src_contact_id
order by src_contact_id;
-- correct output is: 1|5, 2|3

-- This gets what I want from contact_property
select contact_id, count(property_id)
from contact_property
where contact_id in (select src_contact_id from contact_log
                     where log_type in ('web', 'detail'))
group by contact_id
order by contact_id;
-- correct output is: 1|4, 2|2

-- THIS DOESN'T WORK (of course - but what would?)
select src_contact_id, count(log_type), count(property_id)
from contact_log
join contact_property cp on cp.contact_id = contact_log.src_contact_id
where contact_log.src_contact_id in (select contact_id from contact_property)
  and log_type in ('web', 'detail')
group by src_contact_id
order by src_contact_id;
-- correct output *should be* : 1|5|4, 2|3|2
/*SQL ENDS*/

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] How to count from a second table in an aggregate query?
Erik Jones wrote:

On Apr 15, 2009, at 1:15 PM, Steve Midgley wrote:

I want to generate an analysis report that counts the values in two separate tables. I've been able to accomplish what I want with two separate queries that I then merge together in Excel. Essentially what I need is a horizontal UNION statement (or something like that): get a FK id and count of a certain column in one table, based on some criteria; for each FK id, get the count of a different column in a different table; display the counts from both queries side-by-side along with the FK id's in a single result set.

Joining against a subquery for the second count does the trick:

select src_contact_id, count(log_type), cp.count
from contact_log,
     (select contact_id, count(property_id)
      from contact_property
      group by contact_id) as cp
where src_contact_id = cp.contact_id
  and log_type in ('web', 'detail')
group by src_contact_id, cp.count
order by src_contact_id;

 src_contact_id | count | count
----------------+-------+-------
              1 |     5 |     4
              2 |     3 |     2

Thanks Erik! This is perfect. Oliveiros showed me another neat solution a while back that involved a select statement in the from clause, but I kind of filed that solution mentally as a cool parlor trick. Now I see that I'm going to have to learn and study this form of SQL more closely, as it's extremely flexible and powerful. Thanks for the very complete and patiently instructive response - it makes perfect sense. I'll work to share this along as I go.

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] changing multiple pk's in one update
Date: Mon, 13 Apr 2009 17:09:49 -0400
From: Glenn Maynard glennfmayn...@gmail.com
To: pgsql-sql@postgresql.org
Subject: Re: changing multiple pk's in one update
Message-ID: d18085b50904131409g10d43d6cs35dd14ede13b...@mail.gmail.com

(JMdict? I was playing with importing that into a DB a while back, but the attributes in that XML are such a pain--and then my email died while I was trying to get those changed, and I never picked it up again.)

On Mon, Apr 13, 2009 at 1:20 PM, Stuart McGraw smcg2...@frii.com wrote:

1 to the number of sentences in the entry) and the sentence text. The pk is of course the entry id and the sense number. There are other tables that have fk's to the senses.

Your PK is a composite of (entry, order)? Won't your foreign keys elsewhere all break when you shift the order around?

I guess I could add an order[1] column and use the sense number as a surrogate partial key to avoid the need for key renumbering, but all the api's (and the normal human way of thinking) are based on sense number 1 of entry x, sense number 2 of entry y, so one would need to maintain order as a gapless sequence (or add a new mapping layer to map from/to an arbitrary monotonic sequence to a 1,2,3,... sequence) -- the gain doesn't seem that big.

Why not do this in the straightforward way: three separate fields: a regular, sequential PK; an FK to the entry; and an order number. Add an index on (entry_key, order_number). It's a little more expensive since you have a new column and index (the PK), but in a table with a lot of plain text that's probably insignificant. Now you can use the plain PK for your FK's.

I'd agree with this approach. I have a number of tables which are sensitive to arbitrary ordering and they sound roughly similar to your use-case (though my tables are probably smaller). My approach is to create a string column in the table which permits defining arbitrary ordering. I use a string field b/c it's easier for me to stuff (by hand) new ordered records in between other existing records. But an integer would work just as well, so long as you make sure you keep enough space between the integers (e.g. 1000, 2000, 3000). Also, if your ordered list is truly ordinal (each record is either 1st, 2nd, 3rd, etc in a single list) you could just use 1,2,3,4 for the ordering, but then you have to mess with two records in order to swap the positions of (say) item 2 and 3. Of course you can do this pretty easily inside a transaction, and you don't have to worry about the mess of moving PK's.

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
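A minimal sketch of the integer-with-gaps approach Steve describes (table and column names here are hypothetical, not from the thread):

create table sense (
  id serial primary key,          -- surrogate key; other tables FK to this
  entry_id integer not null,
  order_number integer not null,  -- gapped: 1000, 2000, 3000, ...
  sense_text text
);
create index idx_sense_entry_order on sense (entry_id, order_number);

-- Insert between positions 1000 and 2000 without renumbering anything:
insert into sense (entry_id, order_number, sense_text)
  values (1, 1500, 'a new sense');

-- With a gapless 1,2,3 ordering, swapping items 2 and 3 touches two rows,
-- so wrap it in a transaction (a temporary third value avoids collisions):
begin;
update sense set order_number = 0 where entry_id = 1 and order_number = 2;
update sense set order_number = 2 where entry_id = 1 and order_number = 3;
update sense set order_number = 3 where entry_id = 1 and order_number = 0;
commit;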
Re: [SQL] Nested selects
pgsql-sql-ow...@postgresql.org wrote: Date: Tue, 7 Apr 2009 22:34:38 -0400 From: Glenn Maynard glennfmayn...@gmail.com To: pgsql-sql@postgresql.org Subject: Nested selects Message-ID: d18085b50904071934g7ad206f1i14ac05f7bd29f...@mail.gmail.com I'm deriving high scores from two tables: one containing data for each time a user played (rounds), and one containing a list of stages: CREATE TABLE stage (id SERIAL NOT NULL PRIMARY KEY, name VARCHAR); CREATE TABLE round (id SERIAL NOT NULL PRIMARY KEY, score REAL, stage_id INTEGER REFERENCES stage (id)); INSERT INTO stage (name) VALUES ('stage 1'), ('stage 2'), ('stage 3'); INSERT INTO round (stage_id, score) VALUES (1, 100), (1, 150), (1, 175), (2, 250), (2, 275), (2, 220), (3, 350), (3, 380), (3, 322); SELECT r.* FROM round r WHERE r.id IN ( -- Get the high scoring round ID for each stage: SELECT ( -- Get the high score for stage s: SELECT r.id FROM round r WHERE r.stage_id = s.id ORDER BY r.score DESC LIMIT 1 ) FROM stage s ); This works fine, and with a (stage_id, score DESC) index, is reasonably fast with around 1000 stages. round may expand to millions of rows. Unfortunately, it doesn't generalize to getting the top N scores for each stage; LIMIT 2 isn't valid (more than one row returned by a subquery used as an expression). I fiddled with putting the inner results in an array, without much luck, and I'm not sure how well that'd optimize. Having the results in any particular order isn't important. (In practice, the inner select will often be more specific--high scores on the west coast, high scores this month, and so on.) This seems embarrassingly simple: return the top rounds for each stage--but I'm banging my head on it for some reason. How about this: select round.*, stage.name from round left join stage on stage.id = round.stage_id ORDER BY round.score DESC; Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
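The remaining question in the thread is how to generalize to the top N rounds per stage. One pre-window-function way to do that on 8.x (an editorial sketch, not from the thread) is a correlated count: keep a round if fewer than N rounds in the same stage beat its score. Note that score ties can return extra rows:

select r.*
from round r
where (select count(*)
       from round r2
       where r2.stage_id = r.stage_id
         and r2.score > r.score) < 2   -- N = 2: top two per stage
order by r.stage_id, r.score desc;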
Re: [SQL] Can we load all database objects in memory?
At 09:20 AM 3/26/2009, pgsql-sql-ow...@postgresql.org wrote:

Message-Id: 587e5df3-5859-48de-93f9-f7b05c37e...@rvt.dds.nl
From: ries van Twisk p...@rvt.dds.nl
To: DM dm.a...@gmail.com
In-Reply-To: eae6a62a0903251220p2edd379en50d17541edef0...@mail.gmail.com
Subject: Re: Can we load all database objects in memory?
Date: Wed, 25 Mar 2009 15:07:21 -0500
References: eae6a62a0903251220p2edd379en50d17541edef0...@mail.gmail.com
X-Archive-Number: 200903/89
X-Sequence-Number: 32332

The short answer is no, you cannot force PostgreSQL to load all objects into memory. However, when you properly configure PostgreSQL, most if not all of your data will be cached by the OS and/or PostgreSQL shared memory system.

On Mar 25, 2009, at 2:20 PM, DM wrote:

I have a database of 10GB. My Database Server has a RAM of 16GB. Is there a way that I can load all the database objects to memory?

Since you have such an abundance of RAM, I'd think you could simulate this feature by setting up some kind of psql script to pull down all 10GB of data through Postgres when the server first boots - possibly just a bunch of "select * from [table];" commands for each table in your catalog. All/most of the data should be cached into RAM by the OS at that point. Also, I don't know what the state of the art is regarding RAM disks these days, but for a read-only database, that seems like an option (10GB of RAM disk for your read-only data and 6GB of RAM for OS and Pg).

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
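One way to generate the warm-up script Steve suggests - a sketch, assuming the data lives in ordinary user schemas (the OS can of course still evict pages later under memory pressure):

select 'select count(*) from '
       || quote_ident(schemaname) || '.' || quote_ident(tablename) || ';'
from pg_tables
where schemaname not in ('pg_catalog', 'information_schema');

-- Feed the generated statements back through psql; count(*) forces a full
-- heap scan of each table, pulling its pages through the cache.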
Re: [SQL] Alter Table/Indexing
At 02:20 AM 3/25/2009, pgsql-sql-ow...@postgresql.org wrote:

To: Zdravko Balorda zdravko.balo...@siix.com
cc: pgsql-sql@postgresql.org
Subject: Re: Alter Table/Indexing
In-reply-to: 49c89fea.8060...@siix.com
References: 49c89fea.8060...@siix.com
Comments: In-reply-to Zdravko Balorda zdravko.balo...@siix.com message dated Tue, 24 Mar 2009 09:55:06 +0100
Date: Tue, 24 Mar 2009 10:35:31 -0400
Message-ID: 27189.1237905...@sss.pgh.pa.us
From: Tom Lane t...@sss.pgh.pa.us
X-Archive-Number: 200903/84
X-Sequence-Number: 32327

Zdravko Balorda zdravko.balo...@siix.com writes:

I wonder: does ALTER TABLE TYPE, SET, depend on indexes the way INSERT does, in the sense that it may be faster to drop and recreate an index than to sort after every row inserted?

ALTER TABLE TYPE already rebuilds the indexes; you won't make the overall process any faster by doing that by hand.

regards, tom lane

I had a case (a long time ago) where I was on MS SQL in a production environment. We had a number of indices which were system related - meaning they were used infrequently to speed up certain administrative functions. When doing a bulk load we found that if we dropped these indices (but kept the ones that were crucial for production) we could significantly shorten the effective downtime of the system b/c any DDL statement executed faster. We would then schedule these indices to be re-created at later dates, spreading out the load (b/c the system was in production at that point). I wonder if Postgres functions similarly for such a use case? As Tom says, the total processing time is fixed: you have to upload the data and rebuild all the indices, but if there are non-critical indices, you can go from zero to data loaded faster by dropping them and rebuilding them manually later?

Thanks for any insight on that (and I hope my question helps the OP as well - if this seems off topic let me know),

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
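Postgres does behave this way for plain data loads (fewer indexes means less per-row maintenance, and a bulk index build is faster than incremental upkeep). A sketch of the pattern Steve describes; the index, table, and file names are made up:

-- Before the bulk load: drop only the non-critical indexes
drop index if exists idx_orders_report_month;

-- The load now maintains one less index per inserted row
copy orders from '/tmp/orders.dat';

-- Later, during a quiet period, rebuild the dropped index
create index idx_orders_report_month on orders (report_month);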
Re: [SQL] Exclude fields from SELECT command
At 05:20 PM 3/16/2009, pgsql-sql-ow...@postgresql.org wrote:

In-Reply-To: 1992170861895942...@unknownmsgid
References: 1992170861895942...@unknownmsgid
Date: Mon, 16 Mar 2009 22:45:54 +0100
Message-ID: 162867790903161445i78127316s1c0deb3bec0e1...@mail.gmail.com
Subject: Re: Exclude fields from SELECT command
From: Pavel Stehule pavel.steh...@gmail.com
To: Charles Tam c@osm.net.au
Cc: pgsql-sql@postgresql.org

2009/3/16 Charles Tam c@osm.net.au:

Hi Everybody, I've a table with 35 fields and would like to perform a SELECT command without specifying every single field. As such, I've used the SELECT * command. Is there an approach to exclude 5 fields from being returned?

hello, no, there is no way

regards Pavel Stehule

I think Pavel is right for 99% of the cases. But there is a cure that is worse than the disease. You could select all columns from a bunch of tables without knowing what the column names were, excepting N columns, by iterating through the info schema data and building a SQL select appropriately (SQL meta-programming, I guess you would call it). But it's a real chore to do manually. If you have this need for some programmatic purpose (where some initial investment in effort will pay future dividends), then check out the info schema options:

http://developer.postgresql.org/pgdocs/postgres/infoschema-columns.html

In your case, I think you'd be looking for five values of table_name and then looking at all the column_name fields, building your column list, excepting the column_names you wish to exclude.

Best, Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
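A sketch of the meta-programming Steve describes, building the 30-column select list by excluding five columns (the table and column names here are hypothetical):

select 'select '
       || array_to_string(array(
            select column_name::text
            from information_schema.columns
            where table_name = 'mytable'
              and column_name not in ('col_a','col_b','col_c','col_d','col_e')
            order by ordinal_position), ', ')
       || ' from mytable;';

-- The output is itself a SQL statement, e.g.
--   select col_f, col_g, ... from mytable;
-- which you can then run directly or embed in application code.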
[SQL] Re: select count of all overlapping geometries and return 0 if none.
At 06:20 AM 3/13/2009, pgsql-sql-ow...@postgresql.org wrote:

Message-ID: 457532.70947...@web45913.mail.sp1.yahoo.com
Date: Thu, 12 Mar 2009 10:28:19 -0700 (PDT)
From: Duffer Do dufferd...@yahoo.com
Subject: select count of all overlapping geometries and return 0 if none.
To: pgsql-sql@postgresql.org
X-Archive-Number: 200903/24
X-Sequence-Number: 32267

Hello all, I have 2 tables, locations and user_tracker.

locations has 2 columns:
location_name
location_geometry

user_tracker has 3 columns:
user_name
user_geometry
user_timestamp

The locations table is coordinates and names of areas of interest. user_tracker basically is an archive of a user's movements as he pans his map. I have a need to assign a ranking of locations based on how many times users have intersected this location. The problem I am having is that my query only returns locations that have been intersected by a user. I need it to return ALL locations, and a zero if a location has not been intersected. As an example:

LOCATIONS
1: Talahassee, FL | talahassee's bounding box
2: Manhattan, NY | Manhattan's bounding box
3: Frankfurt, GE | Frankfurt's bounding box

USER_TRACKER
john doe | geometry that overlaps Frankfurt | today
john doe | geometry that overlaps Frankfurt | today
john doe | geometry that overlaps Frankfurt | today
john doe | geometry that overlaps Frankfurt | yesterday
john doe | geometry that overlaps Frankfurt | Monday
john doe | geometry that overlaps Frankfurt | Sunday
Mary Jane | geometry that overlaps Manhattan | today
Rob Roy | geometry that overlaps Manhattan | today
Rob Roy | geometry that overlaps Manhattan | today

I want to return the following:

locations | number_visits
Frankfurt | 6
Manhattan | 3
Talahassee | 0

My query only returns:

Frankfurt | 6
Manhattan | 3

Now I have really simplified this example for readability; my actual tables are more complex. How can I accomplish this? My query:

SELECT count(user_name) as number_visits, location_name
from locations, user_tracker
WHERE user_geometry && location_geometry
GROUP BY location_name

Thanks in advance

Hi, I am stuck at an airport right now, and had the time to hack out your solution. I hope it helps. In the future, it would be helpful to the list (and more likely to yield responses) if you include the create/insert statements such as what I've put together below. It makes it much easier for people to quickly get you an answer - in this case I had a spare 20 minutes so I did it for you. Also, I put a gist index on your locations table, but that assumes your areas are flat, which isn't right for spherical coordinates on the earth. PostGIS should be able to get you closer, if that kind of accuracy matters.

Anyway, here are the table setups and what I believe is the solution query:

-- START SQL SCRIPT --
drop table if exists locations;
create table locations (id serial, name varchar(255), area circle);
insert into locations (name, area) values
  ('Talahassee, FL','((0,0),1)'),
  ('Manhattan, NY','((2,0),1)'),
  ('Frankfurt, GE','((4,0),1)');
CREATE INDEX idx_locations_area ON locations USING gist (area);

drop table if exists user_tracker;
create table user_tracker (id serial primary key, name varchar(255), location point, time varchar(255));
insert into user_tracker (name, location, time) values
  ('john doe', '(4,0)', 'today'),
  ('john doe', '(4,0)', 'today'),
  ('john doe', '(4,0)', 'today'),
  ('john doe', '(4,0)', 'yesterday'),
  ('john doe', '(4,0)', 'Monday'),
  ('john doe', '(4,0)', 'Sunday'),
  ('Mary Jane', '(2,0)', 'today'),
  ('Rob Roy', '(2,0)', 'today'),
  ('Rob Roy', '(2,0)', 'today');

-- SOLUTION QUERY HERE --
select count(user_tracker.id), locations.name
from user_tracker
right outer join locations on user_tracker.location @ locations.area
group by locations.name;
-- END SQL SCRIPT --

OUTPUT:
3;Manhattan, NY
0;Talahassee, FL
6;Frankfurt, GE

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Best practices for geo-spatial city name searches?
At 08:20 AM 2/25/2009, pgsql-sql-ow...@postgresql.org wrote:

To: pgsql-sql@postgresql.org
From: Mark Stosberg m...@summersault.com
Subject: Best practices for geo-spatial city name searches?
Date: Tue, 24 Feb 2009 11:19:56 -0500
Message-ID: 20090224111956.5b7a4...@summersault.com
X-Archive-Number: 200902/94
X-Sequence-Number: 32231

Hello, I use PostgreSQL and the cube type to perform geo-spatial zipcode proximity searches. I'm wondering about the best practices for also supporting a geo-spatial distance search based on a city name rather than zipcode. In our original data model, we used a 'zipcodes' table, with the zipcode as the primary key. This can of course contain a City Name column, but there is a problem with this, illustrated by a Nome, Alaska case. Nome's zipcode is 99762. It maps to multiple cities including Diomede, Alaska and Nome, Alaska. In the data model described, only the Diomede row is imported, and the other rows, including the Nome, Alaska row, are dropped. So if you try to search for Nome, Alaska, you won't find anything. One solution would be to have a cities table, with the city/state as the primary key, and a zipcode as an additional column. Then, by joining on the zipcodes table, the coordinates for a city could be found. Is there any other way I should be considering data modelling to support searches on zipcodes and cities?

Hi Mark, I built a very similar system for www.hutz.com. It uses a complete postcode database, without dropping nearby/overlapping cities. It also includes the postcode alias values, which are names that the post office uses as equivalent to the official names. Within the city table, I created a series of self-joining id's:

id | alias_city_id | post_code_city_id | muni_city_id

So a city record can be an alias, postcode or muni record. A muni record is the definitive record for a city (and is defined by the postcode record closest to the city center as defined by the USGS). A postcode record represents a postcode (zipcode) region within a city. An alias represents an alternate name that refers either to a muni record or a postcode record (and is defined as alias_city_id IS NOT NULL). So if I want to search the table for only muni city records, the query looks like:

select * from city where id = muni_city_id

I also included lat/long coordinates for every record, making it easy to calculate distances and find all city records within a certain range, etc. (I used the point and circle operators for this along with a GiST index - it's not perfect for long distances - it assumes the earth is flat, but it works great for small distances and is very fast). I hope this helps. Feel free to contact me on-list or off, if you want to discuss more.

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
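A DDL sketch of the self-joining city table Steve describes (the column types are guesses; the thread gives only the column names):

create table city (
  id serial primary key,
  name varchar(255) not null,
  alias_city_id integer references city (id),      -- set when this row is an alias
  post_code_city_id integer references city (id),  -- this row's postcode record
  muni_city_id integer references city (id),       -- the city's definitive record
  location point                                   -- lat/long for distance searches
);

-- Only the definitive (muni) record per city:
select * from city where id = muni_city_id;

-- Only the alias records:
select * from city where alias_city_id is not null;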
Re: [SQL] problem using twice custom comparision operator
At 01:20 PM 1/24/2009, pgsql-sql-ow...@postgresql.org wrote:

From: Marek Florianczyk fra...@adm.tp.pl
Organization: TP SA
To: pgsql-sql@postgresql.org
Subject: problem using twice custom comparision operator
Date: Fri, 23 Jan 2009 21:42:44 +0100
Message-Id: 200901232142.44102.fra...@adm.tp.pl

Hi all, I wanted to make a custom operator to sort data like this: 1,2,10,1a,1b,10a into the order: 1,1a,1b,2,10,10a

Hi Marek, The following idea may be too different to fit your needs, but I got some help from this list a while back on how to force sorts for a specific query. It sounds like you want to override sorting for all queries, so this may not be appropriate. Anyway, here's an example of a solution that sorts things in arbitrary order for any given query:

SELECT *
FROM foobar
ORDER BY CASE field
  WHEN 555 THEN 1
  WHEN 342 THEN 2
  WHEN 111 THEN 3
  ELSE 4
END

This sorts 555, then 342, then 111, then everything else. Obviously this is oversimplified for your case, but you could write some comparisons in place of the static numbers (e.g. 555) that follow the same rules as the function you're writing. I don't know if performance would be anything comparable either (I'd guess that using the regex operators, like ~*, would be the way to go). I thought I'd mention this other approach in case it was of interest and you haven't run across it before.

Sincerely, Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
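Along the lines of Steve's regex suggestion, here is a sketch (not from the thread) that produces Marek's desired 1,1a,1b,2,10,10a order without a custom operator, assuming every value starts with digits; the table and column are hypothetical:

select val
from t
order by (substring(val from '^([0-9]+)'))::integer,  -- numeric prefix, sorted numerically
         substring(val from '^[0-9]+(.*)$');          -- trailing letters, sorted as text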
[SQL] Re: some howto/theory book/tutorial on practical problem solving in SQL
At 09:20 AM 1/14/2009, pgsql-sql-ow...@postgresql.org wrote:

Date: Wed, 14 Jan 2009 12:05:29 +0100
From: Ivan Sergio Borgonovo m...@webthatworks.it
To: pgsql-sql@postgresql.org
Subject: Re: some howto/theory book/tutorial on practical problem solving in SQL
Message-ID: 20090114120529.0ab11...@dawn.webthatworks.it
In-Reply-To: 375b6e92-443f-4bd3-bd03-908925639...@engineyard.com
References: 2009073227.159ab...@dawn.webthatworks.it 375b6e92-443f-4bd3-bd03-908925639...@engineyard.com

O'Reilly's SQL Hacks is a good one that fits the bill you describe.

I think it is complementary to Celko's SQL Puzzles. O'Reilly's book looks like techniques from the field; Celko's book looks like design from the field. Meanwhile I downloaded all the ecommerce, billing and ERP open source programs I know (over 60) and started to examine the code to see how they manage discounts in SQL. The ones that seem most promising to learn from are: ofbiz, webERP, promogest. I'll come back asking for comments on a possible design to solve my problem later.

Hi Ivan, You're studying something very interesting, and I hope you will post your progress and solutions, questions and ideas back to this list as you go. I've forwarded this thread already to several people who work with related issues, and they're very interested in some solutions as well. So stay in touch as you work on this, please.

Sincerely, Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] How to excute dynamically a generated SQL command?
At 10:20 AM 1/4/2009, pgsql-sql-ow...@postgresql.org wrote:

Message-ID: 618950b80901031757l15109658kdae1cdb0814d3...@mail.gmail.com
Date: Sat, 3 Jan 2009 17:57:32 -0800
From: John Zhang johnzhan...@gmail.com
To: postgis-us...@postgis.refractions.net
Subject: How to excute dynamically a generated SQL command?
X-Archive-Number: 200901/2
X-Sequence-Number: 32084

Hi the list, Referring to the PostgreSQL 8.3 documentation, 38.5.4 Executing Dynamic Commands, the command for executing a dynamic command is:

EXECUTE command-string [ INTO [STRICT] target ];

I am to execute an SQL statement created dynamically, which is represented in a variable sSql. Here is an example:

sSql='INSERT INTO hm_raster.hm_airphotos( file_ext, airphoto) VALUES ('.tif', lo_import( E''C:\\HM\\Data\\Flightmap.tif'');';
EXECUTE sSql;

It raises the error as:

ERROR: syntax error at end of input
LINE 1: ...E'C:\\HM\\Data\\Flightmap.tif') ^

I would appreciate a lot if you offer your input. Thanks a lot. John

John: You're not escaping all your strings. That error message is a tip-off, I think. Try this line:

sSql='INSERT INTO hm_raster.hm_airphotos( file_ext, airphoto) VALUES (''.tif'', lo_import( E''C:\\HM\\Data\\Flightmap.tif''));';

The part I changed was: ''.tif''

I'm not sure what language you're working in, but it's remotely possible (depending on the execution stack) that you have to doubly escape your backslashes also, in which case:

sSql='INSERT INTO hm_raster.hm_airphotos( file_ext, airphoto) VALUES (''.tif'', lo_import( E''C:\\\\HM\\\\Data\\\\Flightmap.tif''));';

I suffer on Windows wishing we could have / path separators by default. Note that these days Windows generally does support / instead of \ for paths if you're careful. If you put them in quotes, it works even on the command line, which is helpful. You can type this directly into the CMD prompt now:

dir c:/temp

All new programs I write on Windows (in Ruby) use forward slashes for paths, and it works just fine. Not sure about VB or C#, but I'd guess you can make it work. Might be simpler than all the escaping work.

Best, Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
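If the dynamic SQL is being built inside a plpgsql function, quote_literal() sidesteps most of this hand escaping. A sketch - the function name and signature are made up for illustration:

create or replace function load_airphoto(p_path text) returns void as $$
declare
  sSql text;
begin
  -- quote_literal() wraps and escapes the value correctly, so the
  -- backslashes in p_path survive into the generated statement
  sSql := 'INSERT INTO hm_raster.hm_airphotos (file_ext, airphoto) VALUES ('
          || quote_literal('.tif') || ', lo_import('
          || quote_literal(p_path) || '))';
  execute sSql;
end;
$$ language plpgsql;

-- usage:
-- select load_airphoto(E'C:\\HM\\Data\\Flightmap.tif');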
Re: [SQL] Question on Escape-string
At 05:20 AM 1/1/2009, pgsql-sql-ow...@postgresql.org wrote:

To: pgsql-sql@postgresql.org
Subject: Question on Escape-string
X-Archive-Number: 200812/132
X-Sequence-Number: 32082

Dear all, I am using pl/pgsql to develop a function to implement some logic to load BLOB data, like a .tif file, to a Postgres DB. The issue I am facing is that the file name MUST be written with double backslashes \\ in order for pgsql to process the string properly. However, when the string is escaped in my function, how can I pass it in to the lo_import() function? Is there any function to double backslash a string? Or how can we preserve a string as RAW?

ISSUE:
-- use E'C:\\tmp\\tst.tif' for the full file name for the IN parameter of the load_blob function.
-- however, when the string is escaped it becomes 'C:\tmp\tst.tif' as expected
-- the file name then needs to be passed in to the lo_import() function again, now without double \\
-- when it is passed in and escaped, the \ is gone and the filename becomes meaningless

Any input would be much appreciated! Thanks a lot. John

Hi John, If I understand you, you want to put double backslashes back into a string that has been stored in a Postgres field with single backslashes? Here's some SQL I cooked up to demonstrate what I think is a solution. Note the use of \\ and doubly-escaped backslashes in the replace parameters - that's the key.

DROP TABLE IF EXISTS test;
CREATE TABLE test (
  filename character varying(255) NOT NULL,
  data bytea
);
insert into test (filename, data) values (E'c:\\tmp\\tst.tif', '1234');

select replace(filename, E'\\', E'\\\\'), data from test;

Does this do it?

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Best way to and from a one-to-many joined table?
At 11:20 AM 12/6/2008, [EMAIL PROTECTED] wrote:

Message-ID: [EMAIL PROTECTED]
From: Oliveiros Cristina [EMAIL PROTECTED]
To: Bryce Nesbitt [EMAIL PROTECTED], sql pgsql pgsql-sql@postgresql.org
References: [EMAIL PROTECTED]
Subject: Re: Best way to and from a one-to-many joined table?
Date: Fri, 5 Dec 2008 19:23:25 -

Howdy, Bryce. Could you please try this out and tell me if it gives what you want.

Best, Oliveiros

SELECT person_name
FROM test_people p
JOIN test_attributes a
  ON ((a.people_id = p.people_id) AND (a.attribute = @firstAttr))
JOIN test_attributes b
  ON ((b.people_id = p.people_id) AND (b.attribute = @secondAttr));

Hi, I saw a few people post answers to this question and it raised another related question for me. What are the differences between the above query and this one? Are they semantically/functionally identical but might differ in performance? Or would they be optimized down to an identical query? Or am I misreading them and they are actually different?

SELECT person_name
FROM test_people p
JOIN test_attributes a ON (a.people_id = p.people_id)
JOIN test_attributes b ON (b.people_id = p.people_id)
WHERE (a.attribute = @firstAttr)
  AND (b.attribute = @secondAttr);

Also, any suggestions about how to figure this out on my own without bugging the list in the future would be great. Thanks for any insight!

Steve

p.s. I'm posting in the same thread, but if you think I should have started a new thread let me know for the future.

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
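One self-serve way to answer this kind of question is to compare the plans Postgres chooses for the two forms; for inner joins, a condition in the ON clause and an equivalent condition in the WHERE clause are treated the same by the planner. A sketch, with the @-style parameters from the quoted code replaced by literal values:

explain analyze
SELECT person_name
FROM test_people p
JOIN test_attributes a ON (a.people_id = p.people_id)
JOIN test_attributes b ON (b.people_id = p.people_id)
WHERE a.attribute = 'first' AND b.attribute = 'second';

-- Run the same EXPLAIN ANALYZE on the ON-clause form; identical plans
-- mean the two spellings are being optimized identically.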
Re: [SQL] Sequence and nextval problem
At 11:20 PM 11/24/2008, [EMAIL PROTECTED] wrote:

Message-Id: [EMAIL PROTECTED]
From: ries van Twisk [EMAIL PROTECTED]
To: Tk421 [EMAIL PROTECTED]
In-Reply-To: [EMAIL PROTECTED]
Subject: Re: Sequence and nextval problem
Date: Mon, 24 Nov 2008 16:21:40 -0500
References: [EMAIL PROTECTED]
X-Archive-Number: 200811/144
X-Sequence-Number: 31928

On Nov 24, 2008, at 2:12 PM, Tk421 wrote:

The conversion from the Access database to Postgres worked fine. Everything is OK. But now, when I use my database, I've found a problem with sequences. In the conversion, the autonumeric fields from Access were converted to sequences; everything looked OK at first view. The problem comes because the autonumeric fields in Access always return the last value of the table +1, but Postgres does not. Postgres returns lost (I don't know what to call them) values. An example. [snip] In Access, if I execute INSERT INTO table (description) VALUES ('desc 8'), the result row is:

8 | desc 8

But in Postgres the same query's result row is:

3 | desc 8

My question is: can I do something to make ANY sequence take the last value from its associated table, and not a lost value?

This sounds as if the start of the sequence is set incorrectly. Try this:

SELECT setval('NAME OF SEQUENCE', SOME_INTEGER, true);

BTW, you should also not expect a specific value from the sequence, except that you will always get the next value from the sequence. It's also generally a bad idea to do select max(someid)+1 from table. The whole concept of a sequence is thus much better.

I think this is sound general advice for a production database. However, if you control the database such that you can prevent access to it while you are updating it, you can run something like:

SELECT setval('NAME OF SEQUENCE', (select max(id)+1 from table_of_sequence), true);

Where table_of_sequence is the name of the table which the sequence is attached to. The reason you don't normally use that syntax is that it's not multi-user safe. But if you know there are no other users running changes to that sequence when you run your updates, then you're good to go. It's a very fast way to update all your tables to make sure the sequence #'s are all valid, without having to look up the max value on each one (which would also require that you shut off access to the table and for a much longer time).

Hope that helps, Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
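A variant of Steve's statement that also avoids hardcoding the sequence name, using pg_get_serial_sequence(); this is a sketch where "mytable" and its serial column "id" are hypothetical, and the table is assumed non-empty:

SELECT setval(pg_get_serial_sequence('mytable', 'id'),
              (SELECT max(id) FROM mytable),
              true);

-- With the third argument true, the next nextval() call returns max(id)+1,
-- so newly inserted rows continue directly after the existing ids.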
Re: [SQL] grouping/clustering query
At 10:20 PM 10/22/2008, you wrote:

Message-ID: [EMAIL PROTECTED]
Date: Wed, 22 Oct 2008 12:14:49 +0700
From: David Garamond [EMAIL PROTECTED]
To: pgsql-sql@postgresql.org
Subject: grouping/clustering query
X-Archive-Number: 200810/89
X-Sequence-Number: 31731

Dear all, I have an invoices (inv) table and a bank transaction (tx) table. There's also the payment table, which is a many-to-many relation between the former two tables, because each invoice can be paid by one or more bank transactions, and each bank transaction can pay for one or more invoices. Example:

# (invoiceid, txid)
(A, 1)
(A, 3)
(B, 1)
(B, 2)
(C, 5)
(D, 6)
(D, 7)
(E, 8)
(F, 8)

For journalling, I need to group/cluster this together. Is there a SQL query that can generate this output:

# (journal: invoiceids, txids)
[A,B], [1,2,3]
[C], [5]
[D], [6,7]
[E,F], [8]

Hi Dave, I'm not following the logic here. A has 1,3 and B has 1,2. So why does the first line print:

[A,B], [1,2,3]

What's the rule that tells the query to output this way? Is it that all of B's values are between A's values? Also, in your output you've indicated [A,B] - does this mean you want two columns of output, each column being a pg array? I may not be the best person to answer the actual SQL question, but I thought I'd clarify your requirements so the list members can have the best chance of answering.

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] many-to-many relationship
At 05:20 PM 10/8/2008, [EMAIL PROTECTED] wrote:

Date: Wed, 8 Oct 2008 11:25:10 +0200
From: Louis-David Mitterrand [EMAIL PROTECTED]
To: pgsql-sql@postgresql.org
Subject: Re: many-to-many relationship
Message-ID: [EMAIL PROTECTED]
Mail-Followup-To: pgsql-sql@postgresql.org
References: [EMAIL PROTECTED] [EMAIL PROTECTED]
In-Reply-To: [EMAIL PROTECTED]
X-Archive-Number: 200810/23
X-Sequence-Number: 31665

|id|image_url|f_table|f_key|
|1 |url..    |person |1234 |
|2 |url2.    |event  |5678 |

I think this is called a polymorphic join but I could be wrong about that. I'd guess you could construct a rule or trigger to validate the foreign key data on insert/update but that's out of my skill area.

Hi Steve, So in your solution the f_table column is just text which needs to be validated by a custom trigger?

Hi, Yup - that's exactly what I'm suggesting. Storing the text value of the related tables right in the table in question. It might seem insane, but in my experience it works out reasonably well. Ruby on Rails has popularized the approach, using it both in the data backend and in the OO frontend (so the Ruby object class to be instantiated is chosen by the text value of f_table for a given row - hence the polymorphism).

http://wiki.rubyonrails.org/rails/pages/UnderstandingPolymorphicAssociations

There are some situations where this approach could create problems, but if in general all you're doing is select statements along these lines:

select * from images where f_table = 'person' and f_key = '1234'

There's not much to go wrong. (Famous last words.) And regarding the custom validation by trigger, I'd think that would work just fine. I'm not an expert on triggers, rules and constraints in Pg though. (I do all my validation in the middleware, which might give some people here high blood pressure). :) Keep us posted on which solution you choose and how it works out for you!

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
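A sketch of the validating trigger discussed above (assuming integer ids, an "images" table as in Steve's example, and parent tables "person" and "event"; none of this code is from the thread):

create or replace function check_image_fkey() returns trigger as $$
begin
  if new.f_table = 'person' then
    perform 1 from person where id = new.f_key;
  elsif new.f_table = 'event' then
    perform 1 from event where id = new.f_key;
  else
    raise exception 'unknown f_table value: %', new.f_table;
  end if;
  if not found then
    raise exception 'no % row with id %', new.f_table, new.f_key;
  end if;
  return new;
end;
$$ language plpgsql;

create trigger images_fkey_check
  before insert or update on images
  for each row execute procedure check_image_fkey();

-- Caveat: unlike a real foreign key, this does not stop someone from later
-- deleting the person/event row and orphaning the image.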
Re: [SQL] many-to-many relationship
At 06:20 AM 10/7/2008, [EMAIL PROTECTED] wrote:

Date: Mon, 6 Oct 2008 15:08:02 +0200
From: Louis-David Mitterrand [EMAIL PROTECTED]
To: pgsql-sql@postgresql.org
Subject: many-to-many relationship
Message-ID: [EMAIL PROTECTED]
X-Archive-Number: 200810/13
X-Sequence-Number: 31655

Hi, Say you have several objects (tables): person, location, event, etc. all of which can have several images attached. What is the best way to manage relations between a single 'image' table and these different objects? For now each 'image' row has pointers to id_person, id_location, id_event, etc. (only one of which is used for any given row). Is there a better, more elegant way to do it, without using redundant id_* pointers on each row and yet still enforce foreign keys? Thanks,

Hi, I think the relationship-tables method works pretty well, but I have another suggestion. You could store the foreign table name in the image table, along with the foreign key:

|id|image_url|f_table|f_key|
|1 |url..    |person |1234 |
|2 |url2.    |event  |5678 |

I think this is called a polymorphic join but I could be wrong about that. I'd guess you could construct a rule or trigger to validate the foreign key data on insert/update but that's out of my skill area. Hope that helps a little,

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Finding sequential records
At 09:50 PM 9/29/2008, Richard Broersma wrote:

On Mon, Sep 29, 2008 at 7:48 PM, Steve Midgley [EMAIL PROTECTED] wrote:

In my specific case it turns out I only had duplicates, but there could have been n-plicates, so your code is still correct for my use-case (though I didn't say that in my OP).

Ya, there are a lot of neat queries that you can construct. If you have a good background in math and set theory (which I don't have) you can develop all sorts of powerful analysis queries. On a side note, I thought that I should mention that unwanted duplicates are an example where some ~have gotten bitten~ with a purely surrogate key approach. To make matters worse is when some users update part of one duplicate and others update a different duplicate on another field (or fields). Then once the designer discovers the duplicate problem, she/he has to figure out some way of merging these non-exact duplicates. So even if the designer has no intention of implementing natural primary/foreign keys, he/she will still benefit from a natural key consideration in that a strategy can be designed to prevent getting bitten by duplicated data. I only mention this because db designers get bitten by this all the time. Well, at least the ones that subscribe to www.utteraccess.com get bitten. From what I've seen not one day has gone by without someone posting a question to this site about how to both find and remove all but one of the duplicates.

Truly. I have worked with some school districts around the US and this duplicate record problem is more than theoretical. Some of the gnarliest, dirtiest, n-plicate data I've ever seen comes out of the US public education system. More generally, where I have seen a need for natural keys, I've always taken the best-of-both-worlds approach. So I always stick an integer/serial PK into any table - why not - they're cheap and sometimes handy. And then for tables along the lines of your description, I add a compound unique index which serves the business rule of "no dupes" along these lines. Am I following your point? Any reason why using serial PK's with compound natural unique indices is better/worse than just using natural PK's?

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
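A DDL sketch of the best-of-both-worlds pattern Steve describes; the table and its natural-key columns are made up for illustration:

create table enrollment (
  id serial primary key,            -- cheap surrogate key; FKs point here
  student_id integer not null,
  school_id integer not null,
  school_year integer not null,
  unique (student_id, school_id, school_year)  -- the natural key, enforced
);

-- The unique constraint rejects the duplicates Richard warns about, while
-- joins and foreign keys still use the simple serial id.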
Re: [SQL] Finding sequential records
At 05:38 PM 9/26/2008, Oliveiros Cristina wrote:

In-Reply-To: [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]

Howdy, Steve.

SELECT id
FROM dummy a
NATURAL JOIN (
  SELECT fkey_id, name
  FROM dummy
  GROUP BY fkey_id, name
  HAVING COUNT(*) > 1
     AND SUM(id) = (MAX(id) + MIN(id)) * (MAX(id) - MIN(id) + 1) / 2
) b
ORDER BY id;

In your table you just have duplicates? Or you may have triplicates? And quadruplicates? And in general n-uplicates? At the time, I thought you might have n-uplicates, so I designed the query to be as general as possible to handle all those cases, of which duplicates are a particular case, but now I am wondering if you don't have more than duplicates.

In my specific case it turns out I only had duplicates, but there could have been n-plicates, so your code is still correct for my use-case (though I didn't say that in my OP).

Well, anyway, the idea is as follows. The sum of a sequence is given by (first + last) / 2 * n, with n = last - first + 1, OK?

I *love* your application of that formula. It's rare for me to be able to use real math in SQL, so this was a pleasure to read (and understand!) Thanks again to Richard and Oliveiros for a truly educating experience! I hope some others were similarly enlightened.

With gratitude, Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
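A worked check of that formula against the ids from the dummy table (my arithmetic, not from the thread): for the consecutive pair 502163 and 502164, MIN = 502163 and MAX = 502164, so the formula gives (502163 + 502164) * (502164 - 502163 + 1) / 2 = 1004327, which equals SUM(id), so the group survives the HAVING clause. For the non-consecutive 'Lazy Bear Lodge' pair 502169 and 502199, the formula gives (502169 + 502199) * 31 / 2 = 15567704, while SUM(id) is only 1004368, so that group is filtered out. The formula equals SUM(id) exactly when the group's ids form an unbroken run from MIN to MAX.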
[SQL] Finding sequential records
Hi, I've been kicking this around today and I can't think of a way to solve my problem in pure SQL (i.e. I can only do it with a looping/cursor-type solution and some variables). Given a table with this DDL/data script:

drop table if exists dummy;
create table dummy (
  id integer primary key,
  name varchar(255),
  fkey_id integer
);

insert into dummy (id, name, fkey_id) values (502163,'3Br/3Ba Pool Villa in Westin St. John, USVI- Summer 2008',500100);
insert into dummy (id, name, fkey_id) values (502164,'3Br/3Ba Pool Villa in Westin St. John, USVI- Summer 2008',500100);
insert into dummy (id, name, fkey_id) values (502169,'Lazy Bear Lodge',105);
-- not sequential id to previous
insert into dummy (id, name, fkey_id) values (502199,'Lazy Bear Lodge',105);
insert into dummy (id, name, fkey_id) values (502170,'3 Bed, 1 Bath Cottage Less Than a Mile from West Dennis Beach',500089);
insert into dummy (id, name, fkey_id) values (502171,'3 Bed, 1 Bath Cottage Less Than a Mile from West Dennis Beach',500089);
-- not sequential id nor duplicate fkey_id to previous
insert into dummy (id, name, fkey_id) values (502175,'3 Bed, 1 Bath Cottage Less Than a Mile from West Dennis Beach',500102);
insert into dummy (id, name, fkey_id) values (502213,'Sea Watch',500128);
-- not duplicate fkey_id to previous
insert into dummy (id, name, fkey_id) values (502214,'Sea Watch',500130);

Find all instances where:
* name is duplicated
* fkey_id is the same (for any set of duplicated name fields)
* id is sequential (for any set of duplicated name fields)

The system should return:

502163
502164
502170
502171

Here's as far as I got:

select id from dummy
where name in (
  select name from dummy group by name having count(name) > 1
)
order by id

I can't figure out how to test for duplicate fkey_id when name is the same, nor to test for sequential id's when name is the same. Having a method for either would be great, and both would be a bonus! It seems like there's a clever way to do this without cursors but I can't figure it out! Thanks for any help!

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Finding sequential records
Wow. Thanks to both Richard and Oliveiros. Out of the box, Oliveiros' solution does what I want but I don't understand why!

SELECT id
FROM dummy a
NATURAL JOIN (
  SELECT fkey_id, name
  FROM dummy
  GROUP BY fkey_id, name
  HAVING COUNT(*) > 1
     AND SUM(id) = (MAX(id) + MIN(id)) * (MAX(id) - MIN(id) + 1) / 2
) b
ORDER BY id;

What's going on here with the sum(id) equaling the average product of the min and max? I gather that's to match id's with id's that are one bigger than itself? Can anyone clarify how that is working?

Richard's SQL is very interesting to me in concept - but it's not getting me the results correctly:

SELECT A.*
FROM (
  SELECT ID
  FROM Dummy
  GROUP BY name, fkey_id
) AS A
INNER JOIN Dummy AS D
  ON A.id - 1 = D.id OR A.id + 1 = D.id;

This returns an error:

ERROR: column dummy.id must appear in the GROUP BY clause or be used in an aggregate function
SQL state: 42803

I'm not sure how to set up that from-select to produce id's without adding id to the group by (which would cause the query to return too many rows). Perhaps a natural join like in Oliveiros' SQL would do the job? Thanks for any advice on either of these solutions. I'm going to learn a lot here if someone can pound it into my head.

Thanks, Steve

It seems to be returning any records that have sequential id's, regardless.

At 11:02 AM 9/26/2008, Richard Broersma wrote:

On Fri, Sep 26, 2008 at 10:39 AM, Steve Midgley [EMAIL PROTECTED] wrote:

drop table if exists dummy;
create table dummy (
  id integer primary key,
  name varchar(255),
  fkey_id integer
);

The system should return:

502163
502164
502170
502171

--first get all of the duplicated ids
SELECT id FROM Dummy GROUP BY name, fkey_id

--Next, from this list, check to see if there are any siblings immediately above or below it.
SELECT A.*
FROM (
  SELECT ID
  FROM Dummy
  GROUP BY name, fkey_id
) AS A
INNER JOIN Dummy AS D
  ON A.id - 1 = D.id OR A.id + 1 = D.id;

--
Regards, Richard Broersma Jr.
Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] surrogate vs natural primary keys
At 08:20 AM 9/18/2008, [EMAIL PROTECTED] wrote:

Message-ID: [EMAIL PROTECTED]
Date: Wed, 17 Sep 2008 09:20:44 -0700
From: Richard Broersma [EMAIL PROTECTED]
To: Scott Marlowe [EMAIL PROTECTED]
Subject: Re: surrogate vs natural primary keys
In-Reply-To: [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
X-Archive-Number: 200809/124
X-Sequence-Number: 31576

My opinion is that the database constraints are the last line of defense to ensure business rules and data integrity are not violated. Since I highly value the ability to enforce business rules using ordinary table DDL, I try to use natural keys as often as I can.

Hi Richard, I often find your comments insightful and right on the money. This is another one of those cases. Your comments above are a great example of when natural keys make sense: I hadn't looked at it from this perspective! I'm a middleware developer (the bane of DBA's!) -- and so I generally solve these sorts of business-rule constraints in the middleware code, which of course is prone to all kinds of different problems (like lazy developers who code around the OO validation checkers!). Thanks for giving such a great explanation as to the value of natural keys! You haven't won me over, but you did teach me something - which I appreciate.

Best, Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] surrogate vs natural primary keys
To: pgsql-sql@postgresql.org
From: Seb [EMAIL PROTECTED]
Subject: Re: surrogate vs natural primary keys
Date: Mon, 15 Sep 2008 17:56:31 -0500
Organization: Church of Emacs
Lines: 20
Message-ID: [EMAIL PROTECTED]
References: [EMAIL PROTECTED] [EMAIL PROTECTED]
X-Archive-Number: 200809/101
X-Sequence-Number: 31553

On Mon, 15 Sep 2008 16:45:08 -0600, Scott Marlowe [EMAIL PROTECTED] wrote:

[...] I think this question is a lot like "how large should I set shared_buffers?" There's lots of different answers based on how you are using your data.

Yes, this is precisely what I'm after: *criteria* to help me decide which approach to take for different scenarios. Such guidance is what seems to be lacking from most of the discussions I've seen on the subject. It's hard to distill this information when most of the discussion is centered on advocating one or the other approach.

I think Scott and others have laid out the main ideas in a very cool-headed way already, but here's my follow-on input: I agree with Andrew Sullivan that using industry standard id's as your primary key can be problematic. But I do sometimes apply unique indices to such industry standard columns to ensure they are in fact unique and can be a surrogate for the real integer/serial primary key. As a rule, I have decided to stay away from meaningful (natural) primary keys for these reasons:

1) They sometimes change b/c of business rule changes, forcing technical changes to the relationship model, when only internal table schema changes should be required to support the new business requirements.

2) Generating arbitrary/surrogate keys is easier b/c you can use sequence generators. (When creating a new record, I have to figure out the value of a meaningful column before saving the record, which sometimes I don't want to do!)

3) Surrogate keys are guaranteed unique regardless of the semantic content of the table.

4) All tables can join to each other in the same ways: property.id holds the same data type as contact.id. All id fields are the same in type/format.

I think there's even a reasonable argument for globally unique surrogate keys: all keys for any table use the same sequence of id's. I implemented a system in the 90's that used globally unique id's and it opened up some interesting solutions that I wouldn't have thought of when I started the project (self joins were the same as foreign joins since the id's in both entities were guaranteed unique). I've heard some people argue the use of GUID's for id's but I've been too scared to try that in a real system.

Sequential, arbitrary primary keys (as surrogate keys) are predictable though. So if you share those keys with the public (via URL's for example), then competitors can learn information about your business (how fast keys are generated for a certain table, for example). That's an argument for random, arbitrary primary keys though, not for compound/meaningful keys.

I think natural or compound keys make more sense to DBA's and let you implement some kinds of database solutions more quickly. All in all, I don't really understand the merits of natural keys outside of data warehouse applications. In data warehouses, in my experience, compound natural keys just end up turning into fact tables! :)

In summary: I've never heard someone say they've been bitten by using an arbitrary surrogate key system, but I myself have been bitten and have heard lots of stories of problems when using natural keys.
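A sketch of the globally-unique-ids idea above - a single shared sequence feeding every table's surrogate key (the table names are illustrative):

create sequence global_id_seq;

create table property (
  id integer primary key default nextval('global_id_seq'),
  name varchar(255)
);

create table contact (
  id integer primary key default nextval('global_id_seq'),
  name varchar(255)
);

-- Every id across both tables is drawn from one sequence, so a given id
-- value can only ever belong to one entity.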
I hope this helps some, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Join question
At 12:20 PM 8/15/2008, [EMAIL PROTECTED] wrote:

Date: Fri, 15 Aug 2008 13:46:14 -0400
From: Edward W. Rouse [EMAIL PROTECTED]
To: pgsql-sql@postgresql.org
Subject: Re: Join question
Message-ID: [EMAIL PROTECTED]

I did try that, but I can't get both the values from table a with no entries in table b and the values from table b with null entries to show up. It's either one or the other.

Edward W. Rouse

You might have luck applying some additional WHERE clause criteria to your full outer join. So if you don't want certain types of NULLs from table b, restrict against them in the WHERE clause? I could be misunderstanding the whole thing though.

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] DELETE with JOIN
At 10:05 AM 8/7/2008, [EMAIL PROTECTED] wrote:

Date: Thu, 7 Aug 2008 09:14:49 -0700
From: [EMAIL PROTECTED]
To: pgsql-sql@postgresql.org
Subject: DELETE with JOIN
Message-ID: [EMAIL PROTECTED]

I want to delete with a join condition. Google shows this is a common problem, but the only solutions are either for MySQL or they don't work in my situation because there are too many rows selected. I also have to make this work on several databases, including, grrr, Oracle, so non-standard MySQL solutions are doubly aggravating.

DELETE FROM a WHERE a.b_id = b.id AND b.second_id = ?

I have tried to do this before and always found a way, usually

DELETE FROM a WHERE a.b_id IN (SELECT id FROM b WHERE second_id = ?)

but I have too many rows, millions, in the IN crowd, ha ha, and it barfs. EXISTS is no better. At least Oracle barfs, and I haven't got to the others yet. I figured I would go with the worst offender first, and let me tell you, it is offensive. Dang, I wish it were PostgreSQL only! I could write a Dumb Little Test Program (tm) to read in all those IN ids and execute a zillion individual DELETE statements, but it would be slow as puke, and this little delete is going to come up quite often now that I have a test program which needs to generate the junky data and play with it for several days before deleting it and starting over again.

Hi, Have you tried something where you read in all those IN id's and then group them into blocks (of say 1,000 or 10,000 or whatever number works best)? Then execute:

DELETE FROM a WHERE a.b_id in ([static_list_of_ids])

Replacing [static_list_of_ids] in a loop with each block of 1000 id's in a comma-delimited string? I use this technique sometimes in middleware and it works pretty well. There's probably a pure-SQL solution in Pg as well, but this method should work across any SQL platform, which seems like one of your requirements.

Steve

-- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
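To make the batching concrete, a sketch of what the application-generated statements would look like (the id values and the 42 parameter are placeholders):

-- First, the application reads the target ids once:
SELECT id FROM b WHERE second_id = 42;

-- Then it issues one DELETE per block of ~1000 ids:
DELETE FROM a WHERE a.b_id IN (1001, 1002, 1003 /* ... ~1000 ids ... */);
DELETE FROM a WHERE a.b_id IN (2001, 2002, 2004 /* ... next block ... */);

-- Each statement stays small enough to avoid the limits the OP is
-- hitting, and the pattern is portable across Postgres, Oracle, etc.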
Re: [SQL] Problem with ORDER BY and DISTINCT ON
At 03:51 PM 7/31/2008, Tom Lane wrote: Steve Midgley [EMAIL PROTECTED] writes: At 07:29 AM 7/16/2008, Tom Lane wrote: I think what is happening is that ORDER BY knows that and gets rid of the duplicate entries while DISTINCT ON fails to do so. Of course removing the duplicate from both areas is the correct solution and I broke down and hacked that into the auto-sql-writing code and so my immediate problem is solved. I'm happy to file this as a ticket for Pg (please point me to your ticket tool as I've never used it). This is not a very big deal but Pg has such a high compliance with wacky-but-valid SQL it does seem like it should be fixed just because. I've applied a patch for this to CVS HEAD. I doubt we'll try to fix it in the back branches, though --- it's too much of a corner case to be worth taking any risk of breaking other stuff. regards, tom lane Hey Tom, That's really great - thanks. I'm impressed how quickly you are fixing this obscure issue. I came from MS SQL and it would be hard for me to put into words how much of a better job you all are doing on Pg. Best, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] index for group by
At 09:20 AM 7/22/2008, [EMAIL PROTECTED] wrote: Date: Tue, 22 Jul 2008 13:27:24 +0200 From: A. Kretschmer [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: Re: index for group by Message-ID: [EMAIL PROTECTED]

On Tue, 22 Jul 2008 at 13:18:30 +0200, Patrick Scharrenberg wrote:

Hi, is there a way to speed up GROUP BY queries with an index? In particular if I have a table like this:

  CREATE TABLE data (
    id1 integer,
    id2 integer,
    somedata character varying,
    ts timestamp with time zone
  );

where data is continuously logged about id1 and id2 into somedata, together with the timestamp when it was logged. So I have multiple rows with the same id1 and id2 but different timestamps (and data, maybe). At the moment I have ~40,000,000 rows in that table, so without a WHERE clause, doing

  SELECT id1, id2 FROM data GROUP BY id1, id2;

forces a seq-scan on every select.

First, why are you doing a GROUP BY when you aren't doing an aggregation (like COUNT, SUM, etc)? It seems like you can get way better performance by doing this:

  SELECT DISTINCT ON (id1, id2) id1, id2 FROM data ORDER BY id1, id2

(assuming your compound index is in id1, id2 order). Am I missing something?

A different, more cumbersome idea I have for you (if you really do need a GROUP BY) is to build a warehouse table that precalculates the data you want (see the sketch below). You can build some recurring process that runs every NN minutes or hours and fires off a stored procedure which grabs all the data from this data table, aggregates it and saves it to a warehouse table. You could aggregate against your datetime stamp by N hours or days as well. If this idea is of interest you can write back to the list or off-list to me for more info. Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
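To make both suggestions concrete (the index name and summary table are invented for the sketch):

  -- compound index to support the DISTINCT ON scan
  CREATE INDEX data_id1_id2_idx ON data (id1, id2);

  SELECT DISTINCT ON (id1, id2) id1, id2
  FROM data
  ORDER BY id1, id2;

  -- warehouse-style precalculation, refreshed by a recurring job
  CREATE TABLE data_summary AS
    SELECT id1, id2, count(*) AS row_count, max(ts) AS last_seen
    FROM data
    GROUP BY id1, id2;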
[SQL] Problem with ORDER BY and DISTINCT ON
Hi, I'm a little baffled. I'm trying to generate a SQL statement that issues a DISTINCT ON using the same values as my ORDER BY statement. I'm using a somewhat complex CASE statement in my ORDER BY clause. I'm on Pg 8.2. Here is some SQL to get you started at seeing my problem:

  -- drop table if exists property;
  create table property (
    id serial,
    state varchar(255),
    search_rate_max decimal(8,2),
    data_priority_code varchar(255)
  );

  SELECT DISTINCT ON (property.state,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    property.id)
  property.id
  FROM property
  WHERE ((property.data_priority_code IS NOT NULL))
  ORDER BY property.state,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    property.id
  LIMIT 10 OFFSET 0

  RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions SQL state: 42P10

Now if you run this statement it works:

  SELECT DISTINCT ON (property.state, property.id)
  property.id
  FROM property
  WHERE ((property.data_priority_code IS NOT NULL))
  ORDER BY property.state, property.id
  LIMIT 10 OFFSET 0

However if you run this statement it ALSO fails, which tells me it's not the combination of my CASE statements with the primary key that is messing things up (note: in this example I just removed the primary key property.id from the ORDER BY and DISTINCT ON clauses):

  SELECT DISTINCT ON (property.state,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max)
  property.id
  FROM property
  WHERE ((property.data_priority_code IS NOT NULL))
  ORDER BY property.state,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max
  LIMIT 10 OFFSET 0

  RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions SQL state: 42P10

Finally, if you run this statement it works fine (removing one of the duplicate search_rate_max statements):

  SELECT DISTINCT ON (property.state,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    property.id)
  property.id
  FROM property
  WHERE ((property.data_priority_code IS NOT NULL))
  ORDER BY property.state,
    CASE WHEN (search_rate_max IS NOT NULL) THEN 1 ELSE 2 END, search_rate_max,
    property.id
  LIMIT 10 OFFSET 0

What's going on here? Am I doing something that isn't legitimate SQL? I can't see why having a duplicate CASE statement should foul things up like this. It's pretty clear (from additional testing not included in this email) that the duplicate search_rate_max CASE is causing the problem. Thanks for any advice or suggestions on how to get this to run correctly. Is this a bug?

Basically I'm doing this as an optimization - I can get much better performance running the DISTINCT ON in some circumstances than using DISTINCT, but the edge case above is breaking my tests and preventing me from implementing the idea. The code is generated by an application layer which is not really paying attention to whether or not the two CASE statements apply to the same field or not (sometimes they do, sometimes they don't).. Thanks! Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Copyright and Paper walls (was: Rollback in Postgres)
At 11:59 AM 7/12/2008, [EMAIL PROTECTED] wrote: Date: Sat, 12 Jul 2008 10:20:37 +0100 From: Simon Riggs [EMAIL PROTECTED] To: Dave Page [EMAIL PROTECTED] Cc: Lewis Cunningham [EMAIL PROTECTED], Scott Marlowe [EMAIL PROTECTED], samantha mahindrakar [EMAIL PROTECTED], pgsql-sql@postgresql.org Subject: Re: Rollback in Postgres Message-ID: [EMAIL PROTECTED] On Sat, 2008-07-12 at 09:40 +0100, Dave Page wrote: On Sat, Jul 12, 2008 at 8:56 AM, Simon Riggs [EMAIL PROTECTED] wrote: Please don't put links to copyrighted material on our lists. That's an odd thing to say, given that virtually every link on our lists probably points to material copyrighted in some way. Prudence is all I ask for. We don't need to provide additional advertising for others, nor do we wish to embroil ourselves in accusations over copyright violations. I don't want to pile more wood on the fire, but I think I can see both sides to this. I believe this is not so much copyright violation concern, but if the Pg team releases some cool feature relating to rollbacks down-the-road that is vaguely similar to Oracle's system, reducing the amount of discussion about Oracle's features on this list would reduce Oracle's ability to claim that the feature was a direct appropriation. That said (and IANAL), I think posting links to for-profit and/or copyrighted websites is really important in general for the list. There's a lot of good information out there and I think it's not so great if this list were to limit itself only to public domain and open copyright documentation for consideration. Just two more cents from the peanut gallery on a Saturday afternoon, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] PERSISTANT PREPARE (another point of view)
At 04:31 PM 7/11/2008, [EMAIL PROTECTED] wrote: Date: Fri, 11 Jul 2008 23:31:03 + From: Milan Oparnica [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: PERSISTANT PREPARE (another point of view) Message-ID: [EMAIL PROTECTED] [snip] What could we gain by introducing a kind of global prepared statement area, is SIMPLICITY of DB DEVELOPMENT AND MAINTENANCE. Here is our point of view: [snip] Now, instead of preparing statements on each connection request (and we use around 900 prepared statements), why couldn't we simply prepare these statements ONCE and keep them in some global storage for future everyday usage. Hi, What's wrong with using complex views, stored procedures, functions and maybe even custom data types to accomplish what you want here? It seems like you could build a lot of prepared statements using these tools, providing your application layer developers with a consistent set of interfaces to obtain data that are not tied to the data tables themselves. And allowing them to insert/update/manage tables via structured interfaces as well. Am I missing something? Best, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
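For instance (hypothetical table and function names), a server-side SQL function already gives you a named, centrally stored statement that every connection can call without re-preparing it per session:

  CREATE OR REPLACE FUNCTION active_orders(integer)
  RETURNS SETOF orders AS $$
    SELECT * FROM orders WHERE customer_id = $1 AND status = 'active';
  $$ LANGUAGE sql STABLE;

  -- every application connection can simply run:
  SELECT * FROM active_orders(42);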
Re: [SQL] ANSI Standard
At 02:20 AM 6/25/2008, [EMAIL PROTECTED] wrote: Date: Tue, 24 Jun 2008 17:33:11 +0300 From: Pascal Tufenkji [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: ANSI Standard Message-ID: [EMAIL PROTECTED]

Hi, How do I know if a function (or a certain sql syntax) in Postgres is a SQL ANSI Standard, hence it works on all databases such as MySQL, SQL Server, Oracle?

In general, I find that the Pg docs pretty clearly state what is ANSI standard and what isn't within Pg. You can also view the ANSI-92 standard here: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

In terms of making sure you're cross platform compatible, I'd say you have to designate a series of platforms (e.g. MySQL 5, Pg 8.3, Oracle X, MS SQL X, etc.) which you will test against and explicitly support. You will find that no matter how tightly you attempt to build your platform against ANSI-92 (or any other std), if you do not regularly test against a set of platforms, your solution will converge on supporting only the platforms you do regularly test against. I hope that helps, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Tsearch
At 12:20 PM 6/12/2008, [EMAIL PROTECTED] wrote: Date: Thu, 12 Jun 2008 08:47:44 -0400 From: PostgreSQL Admin [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: Tsearch Message-ID: [EMAIL PROTECTED]

This is a small sample of the data:

  short_desc                                              | long_desc
  --------------------------------------------------------+------------------------------------------------------------------------
  CHICKEN,BROILERS OR FRYERS,LEG,MEATSKN,CKD,FRIED,BATTER | Chicken, broilers or fryers, leg, meat and skin, cooked, fried, batter

Is the best method of searching through this data full text search via tsearch, or some other method? I'm running version 8.3. Say I want to search for chicken skin? Thanks for the advice, J

Having previously established myself as a non-expert on TSearch2, I hesitate to post, but I think your issue is data semantics more than search tool. How do you want your system to handle this pseudo-data:

  Chicken broilers, pork skins

Should that return a match for chicken skins or not? If your data are semantically stored such that any record that matches chicken and also matches skin refers to the skins of chickens (the actual meaning you are looking for), then you're going to have a relatively easy time of it. If not, you'll have to consider ways to code your data so that searches behave correctly. You could develop search / coding schemes that say things like find all records with chicken and skin, where the following words do not appear between the words chicken and skin: beef, pork, cow, pig, etc. (see the sketch below). Just some thoughts for you there. Best, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
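As a quick illustration of the semantics problem (using the tsearch functions built into 8.3): plain AND matching happily matches across unrelated list items:

  SELECT to_tsvector('english', 'Chicken broilers, pork skins')
         @@ to_tsquery('english', 'chicken & skin');
  -- returns true, even though these are pork skins, not chicken skins

So a hit on 'chicken & skin' says nothing about which words belong together in the description - which is why the data coding matters more than the search tool here.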
Re: [SQL] Conceptual Design Question
At 10:52 AM 6/10/2008, [EMAIL PROTECTED] wrote: Date: Tue, 10 Jun 2008 05:05:24 -0700 From: Bryan Emrys [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: Conceptual Design Question Message-ID: [EMAIL PROTECTED]

Hello Everyone, In a text-heavy database, I'm trying to make an initial design decision in the following context. There is a lot of long text that I could break down into three different categories: [snip] The conceptual question is what are the trade-offs between having one textual table compared with multiple text tables? Any help on pointing out practical considerations would be appreciated. Thanks. Bryan

Hi Bryan, Firstly, I might investigate the GiST index and TSearch2 in this regard. I'm not an expert on them, and it may be putting the cart before the horse, but if those tools are applicable and are easier to implement/maintain with one design approach or the other, I might use their design preferences as my guide for picking the right relationships.

Beyond that advice, it does seem to me that a polymorphic relationship (where one table holds multiple entities) *could* describe laws and treaties, though they are kind of different in their relations. Commentaries seem pretty distinct from these two things.

My overall opinion would also depend on the architecture. Will you have a unified middleware/ORM layer that can manage the business rules for the polymorphic data retrieval? Or will developers be going directly into the database to pull items directly? If you have a unified ORM that stores the business rules, you can be more aggressive about using polymorphism, b/c the complexity can be hidden from most developers.

All in all, I think your model is really describing three distinct data entities, and they should be stored in three separate tables, but that's a very high level and uninformed opinion! I'd let TSearch2 drive your design if that's a relevant consideration. Of course TSearch2 is very flexible so it might not really care much about this. :) In general, I find that a data model that looks like the real data is the one that I'm happiest with - the systems I've seen with too much UML optimization and collapsing of sets of data into single tables tend to be harder to maintain, etc.

Just some random opinions for you there. I'm sure others have different perspectives which are equally or more valid! Best, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] design resource
At 11:20 PM 6/5/2008, [EMAIL PROTECTED] wrote: Date: Thu, 5 Jun 2008 10:14:04 -0400 From: Edward W. Rouse [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: design resource Message-ID: [EMAIL PROTECTED]

I was wondering if there were any resources that have some table designs for common problems. Since that isn't very clear I will give an example. We have an internal app from years back that needs to be updated. One of the problems is that when it was originally created, the company only had US customers. We now have international customers and need to support international addresses and phone numbers. For the phone numbers that means adding a new column for international code or expanding the data field so that it's big enough to hold the international prefix (still not sure which approach is best). But I haven't a clue as to how to set up for international addresses. So I was hoping there would be a resource that I could check where these kinds of data sets have been 'solved' to ease the effort. I have several books on design patterns for programming but I've not seen a design patterns book for common database problems. Thanks.

Hi, In addition to Craig's excellent answer, I'll give an additional nuance. I think that free-form and flexible/re-usable fields are the way to go for handling addresses. However, normalizing country is generally pretty smart (as is normalizing state/admin region within countries where you do a lot of business). This can be generally handled on the front-end with a pull-down menu of choices, but you would probably be happiest enforcing this on the back-end as well - possibly by having a country lookup table (see the DDL sketch below):

  country_id|iso2|iso3|full_name|short_name|full_accents|short_accents...etc

I keep the country names with and without accents to make searching easier across keyboards/locales. I hope this helps too -- I think Craig has given you the lion's share of good advice for sure - and I definitely follow the practices more or less as he laid them out as well. Sincerely, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
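A sketch of that lookup table as DDL (the column list follows the sample above; the types and lengths are guesses):

  CREATE TABLE country (
    country_id serial PRIMARY KEY,
    iso2 char(2) NOT NULL UNIQUE,   -- ISO 3166-1 alpha-2 code
    iso3 char(3) NOT NULL UNIQUE,   -- ISO 3166-1 alpha-3 code
    full_name text NOT NULL,
    short_name text,
    full_accents text,              -- accented variants kept for cross-locale searching
    short_accents text
  );

The address table then carries a country_id foreign key, while street, city and postal code stay free-form.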
Re: [SQL] Extremely Low performance with ODBC
At 09:20 PM 5/27/2008, [EMAIL PROTECTED] wrote: Date: Tue, 27 May 2008 09:29:56 -0700 From: Richard Broersma [EMAIL PROTECTED] To: Sebastian Rychter [EMAIL PROTECTED] Cc: pgsql-sql@postgresql.org Subject: Re: Extremely Low performance with ODBC Message-ID: [EMAIL PROTECTED]

On Mon, May 26, 2008 at 9:34 PM, Sebastian Rychter [EMAIL PROTECTED] wrote: I'll keep on looking for any other differences on different logs.. I think I already searched all the Postgresql forums I know and no one ever reported something like that before.

I've been following this discussion with interest - years ago I was responsible for a high performance web system that used ODBC in the data layer (not quite an oxymoron there). But I haven't touched ODBC since 2001! (Be warned)..

With performance times that slow it seems like there are a couple of possibilities that I can think of. To narrow down the choices: how many rows does your query return? If it's returning a lot, try running the exact same query but with a limit 10 on the end - does that take just as long to run or does it come back quickly?

Here are some other ideas:

1) The Pg ODBC driver is broken and is looping internally in some wrong way when constructing your result set. This seems a little unlikely but totally possible. The groups that Richard recommends can probably help on this end.

2) Your ODBC DSN is misconfigured. This seems more likely to me. You may already be an expert with ODBC and have it correctly configured, in which case disregard this. But the Pg ODBC interface has a lot of unique options and any one or combination could be spiking your results speed. Have you played with the various config options in the advanced window of odbcad32.exe when creating a data source with a Pg ODBC driver? For example, try toggling Keyset query optimization or Use Declare/Fetch - also, does the cache size setting impact your query speed at all? There are a number of checkboxes here that seem like they might affect your situation. Be sure all the logging is turned off when testing performance. Also, updatable cursors is turned on by default (on my copy of Pg ODBC) - that seems wrong, and you might try your search with it turned off. Also, server side prepare is turned off on my copy, and this also seems wrong - any diff with it turned on?

Another idea: when you run the query and it takes a long time, is the CPU spiked on your client machine, on the SQL server, or neither (if they're on the same machine, look in Task Manager to see which process is eating CPU)? If it's not spiked anywhere, then it's possible that your problem is with a network socket timeout or something similar. Maybe use Wireshark or something to watch your network traffic to dig deeper if this seems possible.

Others may have a more clear idea as to what these various ODBC options are for, but in my distant memories, fiddling with ODBC settings when doing performance tuning can make a big, big difference in how an application performs. My whole goal in this regard was to find the settings in ODBC that caused ODBC to do as absolutely little as possible: just receive raw SQL, pipe it to my server and hand it back to me in a memory data structure. All the data structure parsing was done by my middleware connection wrapper to ODBC and none was done by ODBC itself. Once I got to that point, I was able to get some decent performance out of ODBC. I hope some of these ideas help! Feel free to write back on or off list.
Sincerely, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Extremely Low performance with ODBC
At 09:20 PM 5/27/2008, [EMAIL PROTECTED] wrote: Date: Tue, 27 May 2008 09:29:56 -0700 From: Richard Broersma [EMAIL PROTECTED] To: Sebastian Rychter [EMAIL PROTECTED] Cc: pgsql-sql@postgresql.org Subject: Re: Extremely Low performance with ODBC Message-ID: [EMAIL PROTECTED] On Mon, May 26, 2008 at 9:34 PM, Sebastian Rychter [EMAIL PROTECTED] wrote: I'll keep on looking for any other differences on different logs.. I think I already searched all the Postgresql forums I know and no one ever reported something like that before. Apologies for the double post on this - I wonder also if you have tried alternative Pg ODBC drivers? I found this one while googling around: http://uda.openlinksw.com/odbc/st/odbc-progress-st/ But I'd guess there are other alternatives. Also, you might check out their ADO driver and see if it gives you the same problems.. Just some more grist for the solution mill, hopefully! Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] export CSV file through Java JDBC
At 07:20 AM 4/15/2008, you wrote: Date: Mon, 14 Apr 2008 09:41:41 -0400 From: Emi Lu [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: export CSV file through Java JDBC Message-ID: [EMAIL PROTECTED]

Good morning, Running the following command from the command line is ok, but I cannot export a table into a csv file through Java JDBC code. Please help!

JAVA code:
===
  public static void exec(String command) {
    try {
      Process p = Runtime.getRuntime().exec(command);
      p.waitFor();
      p.destroy();
    } catch(Exception e) {
      System.err.println("exec command Error: " + e.getMessage());
    }
  }

SQL Command:
=
  psql -U username -d dbName -c "\copy tableName to 'result.csv' with CSV"

When exec(command) is called, nothing happens - result.csv was not created at all? Thanks a lot!

A couple of thoughts. First, you aren't passing the password in, so that seems like a problem. Of course, psql won't let you specify a password on the command line, but last I looked you can set an ENV var before running psql: PGPASSWORD=[your password here]

Second, you don't specify a server/port, which means your Pg server is localhost:5432?

Third, you are not specifying a path to psql, so you have to be sure that it can be found in the path. Now this can be tricky: your Java application may be running in a context DIFFERENT from your command prompt. The user/env your Java app is running in will determine what path vars are available to it - it may not be able to find psql. Try running

  which psql > /tmp/which.txt

in your code above and see what happens (assuming you're on a box with which installed).

Fourth (minor), you don't specify column names in your export, which could result in variable results depending on the create statement - it's better to specify, to guarantee the same results every time.

Fifth, try capturing STDERR and STDOUT, so that if psql or the command shell generate errors you'll know what they are. Maybe Java gives you that in e.getMessage or maybe you need to put it in your psql command line.

I'm doing exactly the same thing you are doing but in Ruby/ActiveRecord, so I know this works. It works for me on Windows and Linux, fwiw. I don't know enough Java to know if the command you are running is the standard shell execute command in Java. If it's not, that's what you want, so change your code that way. You just want Java to shell out to the OS command processor. Be sure when you set your command shell env var that this env var persists long enough so that when you run your psql command it's still in effect. For example, this pseudo code might not work b/c two different child shells are run:

  system.exec("export PGPASSWORD=pass1234");
  system.exec("psql my command here");

I think you want something more like this pseudo code:

  system.set_environment("PGPASSWORD") = "pass1234";
  system.exec("psql my command here");

I hope this helps, Steve
Re: [SQL] Create on insert a unique random number
At 06:47 AM 3/19/2008, D'Arcy J.M. Cain wrote: But your suggestion was to base this key on the serial primary key so where is your index collision protection? You are going to get collisions on both the serial key and, to a lesser extent, your generated one. Besides, has anyone ever demonstrated a real issue with lookups using serial primary keys? I think you are trying to second guess the database engine with this and I don't think that that is a great idea.

Hi D'Arcy, I'm not following this line. Maybe we're talking about two different things here.. I don't know if Lance is using CRUD methodology per se, but that's a well accepted web approach and uses (generally) serial primary keys in the URL structure as (where numbers are serial pk's):

  [website]/contact/12345
  [website]/property/45678

[and the client sends GET, POST, PUT, DELETE http requests, or mimics, to activate various functions]

Whether CRUD or otherwise, in the model I was promoting, there would be two index columns in the table along with other data, a public index and a serial primary key. The public index is based on the primary key:

  pk | public_pk
  1  | md5(1 + fixed salt)
  2  | md5(2 + fixed salt)
  ...

For practical purposes, an MD5 hash generates a unique output for every unique input (collisions are theoretically possible but vanishingly unlikely for inputs like these), so the serial key and fixed salt would effectively guarantee no hash index collisions on the MD5 output. Of course if a competitor knows you're using MD5 and they know your salt, they could calculate all the md5 integer hashes and see which ones exist.. But I could care less if he uses md5 or sha-1 or GUIDs! (I just picked MD5 because another poster recommended it and it's very easy to implement in Pg). The point I care about is that there would be a public_pk that associates to one-and-only-one serial pk. Also that public_pk should be 1) not easily guessable, 2) non-clustering (and therefore non-serial). Then his url's would look something like:

  [website]/contact/c4ca4238a0b923820dcc509a6f75849b
  [website]/property/c81e728d9d4c2f636f067f89cc14862c

Earlier I wrote: The issue is about creating an index into a sparse hash so that each record is somewhat randomly located in a sparse hash index space. (One valid reason to do this would be if you wanted to hide the total number of records in your table from competitors or customers.) (Just for reference of my view on the problem: http://en.wikipedia.org/wiki/Hash_table)

To which D'Arcy replied: If that is your goal then start your serial at something other than 1. Start at 1,000,000 for example and your first user will think that you already have one million clients. Actually, he will think that you started elsewhere than 1 but he won't know where.

The original post did not want users to be able to type in random integers like:

  /contact/343

And find out if that record #343 exists or not (regardless of whether they can get access to the record - the error generated on no-authorization may be different from record-not-found). So starting at a million does not fix the OP's issue. From my perspective, wherever you start your serial index, competitors can watch it grow over time, if it's a numeric serial. That could be more valuable in many businesses than knowing the initial size of the table.

Anyway, I hope that clears up what I was recommending! I didn't anticipate it would stir up this much analysis and I hope the OP finds your input and mine useful in coming up with a final answer to his issue. Thanks for taking the time to consider the issue and I'll look forward to any additional ideas or comments you have on this too!
Sincerely, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Create on insert a unique random number
At 11:58 AM 3/18/2008, [EMAIL PROTECTED] wrote: Date: Tue, 18 Mar 2008 13:40:42 -0500 From: Campbell, Lance [EMAIL PROTECTED] To: Vivek Khera [EMAIL PROTECTED], pgsql-sql@postgresql.org Subject: Re: Create on insert a unique random number Message-ID: [EMAIL PROTECTED]

Thanks for all of your input. It appears that the best way to do this is to create a default random number in the primary id field in the table definition and then return that value after insert. If an exception occurs because of duplicates I will simply perform the same insert statement again. I doubt there would be many duplicate hits if I use a really large number. [snip] I built a web application called the Form Builder. It allows individuals to create web forms. After a user is done building their web form the tool provides a URL for the user to access the form. Obviously the URL has the random ID of the form in it. Most of the forms created with this tool can be accessed and filled out by the general public. [snip]

Hi Lance, I think I get you as a fellow web systems (aka middleware) guy. My opinion is that the use of a sparse index is totally reasonable for the purpose you describe. But I would argue that you could take it a little further in implementation that might keep your db design sane while still giving you the sparse index function on the front-end.

1) Create a second field (as someone recommended on this list) that is an MD5 of your primary key. Use that as your accessor index from the web application, but keep the primary key as an integer serial, so that it works as expected and you can build relations normally. I think in the end you'll be happier with this method than messing around with a custom primary key system.. You can build a trigger that generates the MD5 hash every time a record is created (or you can do it in your ORM layer in the web app). A sketch of such a trigger follows below.

2) Also (but OT), put a monitor on your weblogs to look for 404 errors (page not found, for the sql-only people here). This will supplement your sparse index by detecting people who are scanning your sparse index space and generating lots of misses.

Hope that helps, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
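A sketch of that trigger approach (the table name, public_pk column and salt literal are all placeholders):

  CREATE OR REPLACE FUNCTION set_public_pk() RETURNS trigger AS $$
  BEGIN
    -- NEW.id is already assigned from the serial default by the time
    -- a BEFORE INSERT row trigger fires, so we can hash it here
    NEW.public_pk := md5(NEW.id::text || 'some-fixed-salt');
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER forms_set_public_pk
    BEFORE INSERT ON forms
    FOR EACH ROW EXECUTE PROCEDURE set_public_pk();

The web application then looks records up by public_pk (indexed, of course), while all internal joins keep using the plain serial id.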
Re: [SQL] Create on insert a unique random number
At 12:36 PM 3/18/2008, D'Arcy J.M. Cain wrote: On Tue, 18 Mar 2008 12:23:35 -0700 Steve Midgley [EMAIL PROTECTED] wrote: 1) Create a second field (as someone recommended on this list) that is an MD5 of your primary key. Use that as your accessor index from the web

I strongly disagree for three reasons. First, if you are going to generate a key then don't store it. Just generate it every time. Second, don't generate it based on a known field. You may think that it is secure but what if your private key is compromised? Do you then change everyone's security code? Third, what if one person's code is compromised? If it is based on a calculation then you can't change that one person's security code. Generate a random number and store that. You will be much happier when something goes wrong and something always goes wrong. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves

Hi D'Arcy, I'm not clear on your concern here - an MD5 hash doesn't have a private key that can be compromised, afaik. It's a one-way hash. I don't see much difference between making an MD5 of the primary key and generating a random number for the public primary key, except that you shouldn't get index collisions with the MD5 method (whereas eventually you will with a random number, though of course using a GUID would eliminate that concern for practical purposes).

The issue raised by the OP, I believe, is not about security of the primary key # itself or its ability to provide unauthorized access to the underlying records. The system in question protects its records from unauthorized access already. The issue is about creating an index into a sparse hash so that each record is somewhat randomly located in a sparse hash index space. (One valid reason to do this would be if you wanted to hide the total number of records in your table from competitors or customers). (Just for reference of my view on the problem: http://en.wikipedia.org/wiki/Hash_table)

Whether SHA-1 or MD5, I think the point is that if you don't care about speed in generating the hash index (which the OP doesn't, apparently), hash indexing via a cryptographic hash function will ensure that the hash index is relatively free of clustering - which, as I understand it, is the point of this exercise. A cryptographic hash as a hash index generator is imperfect for sure, as the Wikipedia article goes at length to discuss, but from my perspective it does the job - at least as far as the OP describes it (or I understood it!). [smile] I may be way off here of course, and I appreciate the input - any thoughts? Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Insert problem
At 09:20 AM 3/10/2008, [EMAIL PROTECTED] wrote: Date: Mon, 10 Mar 2008 00:14:12 + From: Jamie Tufnell [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: Re: Insert problem Message-ID: [EMAIL PROTECTED] [snip] table defination create sequence schItem_item_seq create table schItem (scid int NOT NULL references schedule ON DELETE CASCADE, item int NOT NULL default nextval('schItem_item_seq'), [snip] It looks like there's already a row where scid=2072 and item=nextval('schItem_item_seq'). Try: SELECT setval('schItem_item_seq', (SELECT max(item)+1 FROM schItem)); And then run your query again. Cheers, J. A friendly amendment to Jamie's (correct) advice. Be sure that no other tools are obtaining id's from the sequence or inserting rows into schItem when you run this.. (Safest way is to have the db offline when running this). I got some good info from the experts here about this while back and wrote up a little blog article detailing the issue (in short it's a big pain to do it while the db is online, with no obvious sure-fire solution): http://www.misuse.org/science/2007/08/07/obtaining-a-block-of-ids-from-a-sequence-in-postgresql/ I hope this helps a little! Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
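One small nuance on the fix above (the advice is right; this is just a footnote): setval(seq, n) makes the next nextval() return n+1, so the max(item)+1 form leaves a one-value gap. Harmless, but the minimal version would be:

  SELECT setval('schItem_item_seq', (SELECT max(item) FROM schItem));
  -- the next nextval() now returns max(item) + 1

If the table could be empty, max(item) is NULL and setval will complain; in that case fall back to the three-argument form setval('schItem_item_seq', 1, false), which makes the first nextval() return 1.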
Re: [SQL] works but ...
At 02:49 PM 3/10/2008, A. R. Van Hook wrote: The following code seems to work but it leads to the following question(s): Is there a sequence for each scid,item or is there one sequence that must be reset when changing scid?

  $cmd = "select setval('schItem_item_seq',
            (select max(item)+1 from schItem where scid=$newScid))";
  $result = $conn->exec($cmd);
  $OK += cmp_eq($conn, PGRES_TUPLES_OK, $result->resultStatus);

  $cmd = "insert into schItem (scid, value, iflag, outalts, sidate, istid)
          select $newScid, i.value, i.iflag, i.outalts, i.sidate, i.istid
          from schItem i, schItem s
          where (i.scid=$wrongScid and i.item = $searchItem)
            and (s.scid=$newScid)
          group by i.value, i.iflag, i.outalts, i.sidate, i.istid, i.scid";
  $result = $conn->exec($cmd);
  $OK += cmp_eq($conn, PGRES_COMMAND_OK, $result->resultStatus);

  $cmd = "delete from schItem where scid = $wrongScid and item = $searchItem";

ps the full perl script is attached. thanks again, art -- Arthur R. Van Hook

Hi Arthur, I'm not totally clear on your question, but generally speaking there is one sequence per serial (or nextval-defaulted) column - so here a single sequence for the whole schItem table, not one per scid. If you reset it, the table will start issuing item values from that new number for every scid. Another way to be safer is to add a margin (say +5) when you reset the sequence, so that even if a few inserts slip in, you're still ahead of the game.. Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] using copy from in function
At 03:20 AM 3/5/2008, [EMAIL PROTECTED] wrote: Date: Wed, 5 Mar 2008 01:51:19 +0300 From: Yura Gal [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: using copy from in function Message-ID: [EMAIL PROTECTED]

I'm trying to automate importing data using COPY FROM. For this purpose I wrote a plpgsql function. As my postgres works on Windows Vista I need to use E'' syntax for the path-to-file. This syntax works fine in SQL queries like: COPY table FROM E'path_to_file_with_double_backslashes'; [snip] _file := $$c:\folder1\folder2\$$ || _chrom || '.txt';

Hi, I'm not sure if this is related, but I have had terrible trouble using \ marks for paths in WinXP.. I have found, surprisingly, that / works and doesn't cause any parsing problems. Also, I believe that if you use syntax like:

  '/folder1/folder2/' || _chrom || '.txt'

(i.e. leaving off the c: part too), you may find that everything just works a little cleaner, with fewer unexpected surprises (see the sketch below). Like I said, I don't know if this is your issue (and Vista), but it's been my experience with WinXP and file paths in Postgresql. Best, Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-sql
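For example, this style has worked cleanly for me on WinXP (the path and table names are made up):

  COPY mytable FROM 'c:/folder1/folder2/chrom1.txt';

  -- or building the path inside plpgsql, with forward slashes and no escapes needed:
  _file := '/folder1/folder2/' || _chrom || '.txt';
  EXECUTE 'COPY mytable FROM ' || quote_literal(_file);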
Re: [SQL] Usage of UUID with 8.3 (Windows)
At 09:20 AM 2/12/2008, [EMAIL PROTECTED] wrote: Date: Mon, 11 Feb 2008 11:56:33 -0500 From: Tom Lane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: pgsql-sql@postgresql.org Subject: Re: Usage of UUID with 8.3 (Windows) Message-ID: [EMAIL PROTECTED] [EMAIL PROTECTED] writes: Now, what do I have to do in order to generate a valid UUID (or the 5 different versions as implemented by the RFC) under Windows? Figure out how to build ossp-uuid on Windows ... regards, tom lane You can review this thread too. MS provides native WinAPI tools and language wrappers for this job: http://answers.google.com/answers/threadview?id=553194 Here's the specific API call, which could possibly be wrapped inside PG - however that raises the problem Tom notes about boatloads of platform specific code -- I'd guess that a custom written stored procedure is the way to go? Not sure how to make WinAPI calls from PG - perhaps someone on list has experience or references for that: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/rpc/rpc/uuidcreate.asp HTH, Steve ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[SQL] Columns view? (Finding column names for a table)
Hi, I see this documentation item but can't figure out how to use it: http://www.postgresql.org/docs/8.2/interactive/infoschema-columns.html "The view columns contains information about all table columns (or view columns) in the database." However, if I execute select columns; I get a not found error. I'm sure there's some simple explanation - I'm interested specifically in listing the column names of a specific table (in Pg 8.2). The manual SQL I've developed is (where [table_name] is the table I want columns for):

  select pg_attribute.attname, *
  from pg_attribute
  join pg_class on pg_class.oid = pg_attribute.attrelid
  where pg_class.relname = '[table_name]'
    and attnum > 0
    and atttypid > 0

Not pretty but seems to work. Of course if there were a view that encapsulated this and future-proofed it, that'd be much nicer. Any assistance is appreciated! Thanks, Steve ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
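The answer, for the archives: the columns view lives in the information_schema schema, so it needs to be schema-qualified (or information_schema added to the search_path). For example (table name substituted for your own):

  SELECT column_name, data_type
  FROM information_schema.columns
  WHERE table_name = 'my_table'
  ORDER BY ordinal_position;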
Re: [SQL] accounting schema
At 05:09 PM 2/6/2008, [EMAIL PROTECTED] wrote: Date: Wed, 6 Feb 2008 17:08:54 -0800 From: Medi Montaseri [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: accounting schema Message-ID: [EMAIL PROTECTED]

Hi, I am learning my way into Accounting and was wondering how Accounting applications are designed. Perhaps you could point the way. On one hand, accountants talk about a sacred equation A = L + OE (Asset = Liability + Owner Equity), and then under each category there are one or many accounts. On the other hand a DBA thinks in terms of tables and relations. Instead of getting theoretical, allow me to set up an example

Hi Medi, You might read some source code and docs for open source accounting software and see how it's done. Here's one example that might be interesting and simple enough to follow: http://www.gnucash.org/

In general, I think there are many different accounting methods, so you have to get clear about which one you're using. Double-entry accounting is common. Cash vs. accrual posting methods matter (i.e. when does an expense or receivable charge against the assets balance?)

My most basic understanding is that in general you track assets as they come in, to an Accounts Receivable ledger (i.e. table), and liabilities to an Accounts Payable ledger. Then you reconcile these two books into a General Ledger table which gives you something like an audit trail of all activity (and a running balance). I'm sure Wikipedia will define these three terms and lots more with greater clarity. But my (limited) experience with accounting schemas is that they often involve these three tables (AR/AP/GL) at their core. As you add bank accounts, complex investment instruments, depreciation etc, things get considerably more complex of course.

I'll readily admit my limited experience, and I'm sure others on this list have far better information. I hope this gets you started anyway. Sincerely, Steve ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [SQL] accounting schema
At 06:54 PM 2/6/2008, Medi Montaseri wrote: Thanks Steve... And finally you mentioned that bank accounts are tricky...can you expand on this please. After all I am under the impression that bank accounts are a corner stone of this whole book keeping...I mean...bank accounts have debits and credits just like any account...eg interest earned is a credit and bank fees are debits...what worries you about bank accounts... Thanks guys...this is very very nice Medi Hi Medi, We may be well off-topic for Pgsql but it is a modeling issue so I'll reply on-list.. Consider this scenario: You want to track your AP/AR accounts by business unit - so you have a Marketing account, Sales account, Engineering account, etc. But let's say you want to keep all the money for all the units in only two bank accounts depending on which region the expenses will be paid out from (to reduce processing fees). So you might pay for things accrued by Engineering and Marketing on the West Coast from one bank account and Engineering and Marketing expenses on the East Coast from the other (a US-centric example). So the bank accounts where cash is actually deposited/withdrawn is different from where the money is received or spent from a logical perspective in the AR/AP ledgers. This is a simple example and real-world issues become truly horrendous. (For example, try to account for investment instruments like long term property holdings with lines of credit against the real estate). I've always thought that if DBA's existed when Accounting was invented, things would look very, very different. :) Good luck, Steve ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[SQL] Negative numbers for PK/ID's?
Hi, A while ago on a different SQL platform, I had the idea to use negative numbers as id's for certain system records that I didn't prefer to have interspersed with other records in a table. (For example, we had template records which we used to spawn new records, and rather than store them in a different table we just had records id=-1,-2 etc and did an insert/select to create new live rows based on the templates). Preventing them from displaying in the results was as simple as ensuring that all select statements had an id > 0 condition in them (a sketch of the pattern follows below)..

I'm wondering if there are any Bad Things that happen if I use negative integers for primary key values in Postgres (v8.2)? My primary keys are all bigserial type. Any comments or advice? Thanks! Steve ---(end of broadcast)--- TIP 6: explain analyze is your friend
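To illustrate the pattern in question (the table and columns are hypothetical):

  -- template row parked at a negative id
  INSERT INTO widgets (id, name, config) VALUES (-1, 'template', 'defaults...');

  -- spawn a live row from the template; the bigserial default assigns a positive id
  INSERT INTO widgets (name, config)
    SELECT name, config FROM widgets WHERE id = -1;

  -- normal queries exclude the templates
  SELECT * FROM widgets WHERE id > 0;

As far as I know, nothing in Postgres objects to negative values in a bigserial column - the column is just a bigint with a nextval() default, and the sequence itself only hands out positive numbers unless told otherwise.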
[SQL] Re: Proposed archival read only trigger on rows - prevent history modification
At 07:50 PM 1/29/2008, [EMAIL PROTECTED] wrote: Date: Mon, 28 Jan 2008 20:16:35 -0800 From: Bryce Nesbitt [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: Proposed archival read only trigger on rows - prevent history modification [snip] I'm considering building a protective mechanism, and am seeking feedback on the idea. The approach would be to add a new column named ro to each table at invoice level and below. Then have a trigger on 'ro'==true deny the write, and probably raise a huge stink. As invoice are mailed each month, all the supporting data would be set to ro true. [snip] Hi Bryce, I have a similar situation but a little in reverse. I have many sets of *incoming* records, which I want to preserve, though at any one time there is only one live version of the incoming records. Sometimes I have to read and compare versions of the records, live or otherwise. The logical records I'm talking about occupy a number of tables that are joined together in the database itself. My solution, which required a little middleware engineering, was to create two tables for each table that had multiple versions of records. So, I have a property table and a property_versions table that have identical table structures. I have a column common to every such versioned set of tables called import_group_id. For live tables, this just tells me which version from the *_versions table is currently being used for that row. The live tables have primary keys just like normal (id as a serial int field). The versions tables' primary keys are different, compound keyed off id and import_group_id. This permits normalization but also allows multiple versions of the same records. In your case, I'd say you could archive your data table to a data_versions table. You might archive periodically and leave the records on the live table (but knowing that audit versions are safely tucked away and easily accessible/comparable), or you might migrate the records off the live table onto the versions table (insert followed by a delete in a transaction). If you adopted the latter method, you could union the two tables to get a complete set of rows. (Via a view even? Not sure if you can create a view on a union but it seems likely you can..) You could also choose (like I did) to store multiple versions of the records, if your data are slowly changing rather than completely unchanging. However it sounds like your situation is such that you want to ensure old records are not modified ever. If you set up triggers and/or permission on the data_versions table, no one can ever delete or modify anything there. This might work perfectly for your purposes. Data in the live table can be edited as you like but changes to the data_versions table is simply not permitted. Inserts are allowed to data_versions but nothing else.. I hope this idea is useful. If I haven't explained it well, drop me a line and I'll try to clarify. Good luck with the project! Steve ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
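A sketch of that union idea as a view (table and column names invented to match the description - and yes, a view over a UNION works fine):

  CREATE VIEW all_data AS
    SELECT id, import_group_id, payload FROM data            -- live rows
    UNION ALL
    SELECT id, import_group_id, payload FROM data_versions;  -- archived rows

In the migrate-then-delete variant the two sets are disjoint by construction, so UNION ALL is the right choice: skipping the duplicate-elimination step keeps the view cheap.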
Re: [SQL] improvements to query with hierarchical elements
At 07:24 PM 1/22/2008, you wrote: Hi all, I have created a little test database to help illustrate my situation. CREATE TABLE categories ( id integer NOT NULL, name character varying(255) NOT NULL, description character varying(255), vocabulary_id integer, derived boolean ); CREATE TABLE category_descendants ( id integer NOT NULL, ancestor_id integer, descendant_id integer, distance integer, derived boolean ); CREATE TABLE category_links ( id integer NOT NULL, parent_id integer, child_id integer, derived boolean ); [snip..] As stated in my last post, any help you can give on how to improve queries of this type would be very much appreciated. Thanks! Ryan Hi Ryan, I've been toying with your sample data for a bit and I apologize but your query has me baffled. Not that it's wrong - it actually looks very sophisticated, but it seems super complex to me - kind of like how I usually feel reading perl.. :) I'm sure real sql-heads would get it right away but I'm not able to. If you're looking to optimize the use-case you provided in your first email, the best thing I can suggest from what I understand would make an assumption: Are the data in your tables are slowly changing? So could you build some analytic/pre-calculated data into these tables or related supporting ones to guide your searches/queries? For example, if you want to find only records which are immediate children of other records, why not make a table which stores just that information? Your current tables are fully hierarchical which is great, but you want to look things up quickly based on a specific relationship: records who are direct children of a particular record.. So if you made a calculated table that stores this information, you could keep it up to date either by running the calculation script periodically or by attaching updates to relevant triggers / rules. I'm sorry I'm not able to get into the SQL / example you sent further. I got lost in the code, which I'm a little embarrassed to admit but there you are. If you're interested in this idea of precalculating values to optimize your search, I'd be happy to discuss further. Also, Ralph Kimball's Data Warehousing books are excellent on this subject (one of the few authors who truly changed the way I think about data). Steve ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
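A sketch of the precalculation idea using the tables from your test database (the distance = 1 assumption is mine - adjust if category_links is already authoritative for immediate children):

  CREATE TABLE category_immediate_children AS
    SELECT ancestor_id AS parent_id, descendant_id AS child_id
    FROM category_descendants
    WHERE distance = 1;

  CREATE INDEX cic_parent_idx ON category_immediate_children (parent_id);

Refresh it from a periodic script, or keep it current with triggers on category_links, and the immediate-children half of your query becomes a single indexed lookup.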
Re: [SQL] improvements to query with hierarchical elements
Date: Sun, 20 Jan 2008 20:01:08 -0800 From: Ryan Wallace [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: improvements to query with hierarchical elements Message-ID: [EMAIL PROTECTED] Greetings, I have a complex query which I am trying to figure out the most efficient way of performing. My database is laid out as follows: items -have_many- events -have_many- event_locations -have_many- locations also rows in the location_links table link two locations together in a parent-child relationship and rows in the location_descendants table provide a full list of the descendants of a particular location. I am trying to find all locations which both are direct children of a given parent location, and are associated with at least one item in a constrained subset of items. (eg. Find all states of the USA in which at least one wooden axe was made. Also find the number of wooden axes made in each state.) I have developed the following query: SELECT locations.*, location_ids.item_count AS item_count FROMlocations JOIN (SELECT immediate_descendants.ancestor_id AS id, COUNT(DISTINCT creation_events.item_id) AS item_count FROMevent_locations JOIN (SELECT * FROMlocation_descendants WHERE ancestor_id IN (SELECT child_id FROMlocation_links WHERE parent_id = *note 1* ) ) AS immediate_descendants ON event_locations.location_id = immediate_descendants.descendant_id JOIN (SELECT * FROMevents WHERE item_id IN (*note 2*) AND association = 'creation' ) AS creation_events ON event_locations.event_id = creation_events.id GROUP BY immediate_descendants.ancestor_id ) AS location_ids ON locations.id = location_ids.id *note 1* - the id of the parent location. *note 2* - the query which returns a list of constrained item ids This works but I am looking for any way to improve the performance of the query (including changing the layout of the tables). Any ideas, suggestions or general pointers would be greatly appreciated. Thanks very much, Ryan Hi Ryan, I have built some similar queries so I might be able to help you. But it's a little hard (for me) to dig into your query without a test set. Could you please post some create table and insert statements to give us a little test bed to run your query in? I realize that may be a fair bit of work for you but it would help me to give you some ideas. Without seeing a more formal schema and being able to toy with it, I'm not sure I can give good advice. Others may have different opinions which I would welcome. Sincerely, Steve ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [SQL] UTF8 encoding and non-text data types
On Jan 13, 2008 8:51 PM, Steve Midgley [EMAIL PROTECTED] wrote:

At 02:22 PM 1/13/2008, [EMAIL PROTECTED] wrote: Date: Sat, 12 Jan 2008 14:21:00 -0800 From: Medi Montaseri [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: UTF8 encoding and non-text data types Message-ID: [EMAIL PROTECTED]

I understand PG supports UTF-8 encoding and I have successfully inserted Unicode text into columns. I was wondering about other data types such as numbers, decimal, dates. That is, say I have a table t1 with create table t1 { name text, cost decimal }. I can insert UTF8 text datatypes into this table with no problem, but if my application attempts to insert numbers encoded in UTF8, then I get a wrong datatype error. Is the solution for the application layer (not the database) to convert the non-text UTF8 numbers to ASCII and then insert them into the database? Thanks Medi

Hi Medi, I have only limited experience in this area, but it sounds like you're sending your numbers as strings? In your example: create table t1 { name text, cost decimal }; insert into t1 (name, cost) values ('name1', '1'); I can't think of how else you're sending numeric values as UTF8? I know that Pg will accept numbers as strings and convert internally (that has worked for me in some object relational environments where I don't choose to cope with data types), but I think it would be better if you simply didn't send your numeric data in quotations, whether as UTF8 or ASCII. If you don't have control over this layer (that quotes your values), then I'd say converting to ASCII would solve the problem. But better to convert to numeric and not ship quoted strings at all. I may be totally off-base and missing something fundamental and I'm very open to correction (by anyone), but that's what I can see here. Best regards, Steve

At 11:01 AM 1/14/2008, Medi Montaseri wrote: Thanks Steve, Actually I do not insert text data into my numeric field. As I mentioned, given create table t1 { name text, cost decimal }, I would like to insert numeric data into column cost because then I can later benefit from numerical operators like SUM, AVG, etc. More specifically, I am using HTML, Perl and PG. So from the HTML point of view a textfield is just some strings. So my user would enter 12345 but expressed in UTF8. Perl would get this and use DBI to insert it into PG. What I am experiencing now is DB errors saying that I am trying to insert incorrect data into column cost, which is numeric, while the data is coming in from HTML in UTF8. Maybe I have to convert it to ASCII numbers in Perl before inserting them into PG. Thanks Medi

Hi Medi, I agree that you should convert your values in Perl before handing to DBI. I'm not familiar with DBI but presumably if you're sending it UTF8 values it's attempting to quote them or do something with them that a numeric field in Pg can't handle. Can you trap/monitor the exact sql statement that is generated by DBI and sent to Pg? That would help a lot in knowing what it is doing, but I suspect if you just convert your numbers from the HTML/UTF8 source values into actual Perl numeric values and then ship to DBI you'll be better off. And you'll get some input validation for free. I hope this helps, Steve
Re: [SQL] UTF8 encoding and non-text data types
At 12:43 PM 1/14/2008, Medi Montaseri wrote: Here are my traces from the Perl CGI code. I'll include two samples, one in ASCII and one in UTF8, so we know what to expect. Here is the actual SQL statement being executed in Perl and DBI. I do not quote the numerical value; it is provided to DBI raw.

  insert into t1 (c1, cost) values ('tewt', 1234)

This works fine.

  insert into t1 (c1, cost) values ('&#1588;&#1583;', &#1777;&#1778;&#1779;&#1780;)

  DBD::Pg::db do failed: ERROR: syntax error at or near ; at character 59

And the PG log itself is very similar and says: ERROR: syntax error at or near ; at character 59. Character 59, by the way, is the first occurrence of a semi-colon (as in '&#...;'), which is being caught by the PG parser. Medi

On Jan 14, 2008 12:18 PM, Steve Midgley [EMAIL PROTECTED] wrote: [snip]
Perl would get this and use DBI to insert it into PG What I am experiencing now is that DB errors that I am trying to insert an incorrect data into column cost which is numeric and the data is coming in from HTML in UTF8 Mybe I have to convert it to ASCII numbers in Perl before inserting them into PG Thanks Medi Hi Medi, I agree that you should convert your values in Perl before handing to DBI. I'm not familiar with DBI but presumably if you're sending it UTF8 values it's attempting to quote them or do something with them, that a numeric field in Pg can't handle. Can you trap/monitor the exact sql statement that is generated by DBI and sent to Pg? That would help a lot in knowing what it is doing, but I suspect if you just convert your numbers from the HTML/UTF8 source values into actual Perl numeric values and then ship to DBI you'll be better off. And you'll get some input validation for free. I hope this helps, Steve Hi Medi, That structure for numeric values is never going to work, as best as I understand Postgres (and other sql pipes). You have to convert those UTF chars to straight numeric format. Hopefully that solves your problem? I hope it's not too hard for you to get at the code which is sending the numbers as UTF? Steve
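As an aside, if the application layer can't easily be fixed, the digit conversion can also be done inside Postgres. A minimal sketch, assuming the incoming characters are the Eastern Arabic-Indic digits from Medi's trace (table t1 is from the thread; everything else here is illustrative):

select translate('۱۲۳۴', '۰۱۲۳۴۵۶۷۸۹', '0123456789')::numeric as cost;
-- translate() maps each digit character to its ASCII counterpart,
-- so the same expression can feed the numeric column directly:
insert into t1 (c1, cost)
values ('name1', translate('۱۲۳۴', '۰۱۲۳۴۵۶۷۸۹', '0123456789')::numeric);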
[SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
At 07:20 AM 1/9/2008, [EMAIL PROTECTED] wrote: Date: Tue, 8 Jan 2008 17:41:18 + From: Jamie Tufnell [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..? Message-ID: [EMAIL PROTECTED] On 1/8/08, codeWarrior [EMAIL PROTECTED] wrote: Jamie: I think you are probably having slowdown issues in your DELETE FROM ... WHERE NOT IN (SELECT ... ORDER BY ... DESCENDING) construct -- that seems a bit convoluted to me. Hmm, so rather than NOT IN ( ... LIMIT 50) would you suggest IN ( ... OFFSET 50) like in Erik's example? Or something else entirely? ALSO: It looks to me like you have a column named 'timestamp'??? This is bad practice since timestamp is a reserved word... You really ought NOT to use reserved words for column names... different debate. I do realize it would be better to use something else, and thanks for the tip. This is an established database and timestamp has been used in other tables, which is why I stuck to it here.. one day when time permits maybe I'll rename them all! Why bother deleting records anyway? Why not alter your query that tracks the 50 records to LIMIT 50??? The read query does LIMIT 50, and the reason for deleting the rest of the records is that they're not needed by the application and there are loads of them being created all the time (currently several million unnecessary rows) -- I imagine eventually this will slow things down? Do you think a regular batch process to delete rows might be more appropriate than a trigger in this scenario? Thanks, Jamie This is kludgy, but you would have some kind of random number test at the start of the trigger - if it evals true once per every ten calls to the trigger (say), you'd cut your delete statement execs by about 10x and still periodically truncate every set of user rows fairly often. On average you'd have ~55 rows per user, never less than 50, and a few outliers with 60 or 70 rows before they get trimmed back down to 50.. Seems more reliable than a cron job, and solves your problem of an ever growing table? You could adjust the random number test easily if you change your mind about the balance of size of table vs. # of delete statements down the road. Steve
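A minimal sketch of that randomized trigger, assuming the user_item_history table from Jamie's original post (below) and that plpgsql is installed; NEW.user_id is how the user id reaches the per-user subquery:

CREATE OR REPLACE FUNCTION trim_user_history() RETURNS trigger AS $$
BEGIN
  -- evals true on roughly one in ten calls, per the suggestion above
  IF random() < 0.1 THEN
    DELETE FROM user_item_history
     WHERE user_id = NEW.user_id
       AND id NOT IN (SELECT id FROM user_item_history
                       WHERE user_id = NEW.user_id
                       ORDER BY "timestamp" DESC
                       LIMIT 50);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_history_trim AFTER INSERT ON user_item_history
  FOR EACH ROW EXECUTE PROCEDURE trim_user_history();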
[SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?
I think what you want is related to this post on how to create a FIFO queue in Postgres: http://people.planetpostgresql.org/greg/index.php?/archives/89-Implementing-a-queue-in-SQL-Postgres-version.html The major difference is that you want a FIFO queue per user_id, so the triggering code would want to bump old records aggregating on user_id to calculate the limit subquery. His original code is this: DELETE FROM q WHERE id NOT IN (SELECT id FROM q ORDER BY id DESC LIMIT maxrows); And that subquery is where (I think!) you'd want to add where user_id = [user_id] - I'm not sure how you'll pass user_id into this function, maybe someone else can help with that? Hopefully this is useful? Steve At 06:24 AM 1/8/2008, [EMAIL PROTECTED] wrote: Date: Tue, 8 Jan 2008 14:24:22 + From: Jamie Tufnell [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: How to keep at-most N rows per group? periodic DELETEs or constraints or..? Message-ID: [EMAIL PROTECTED] Hi, I have a table that stores per-user histories of recently viewed items and I'd like to limit the number of history items to <= 50 per user. I'm considering doing this with a query run from cron every so often, but I'm not happy with what I've come up with so far, and since it's a quite active table I thought I'd ask here to see if there's a more efficient way. Right now the table structure is as follows... user_item_history: id (PK), user_id (FK), item_id (FK), timestamp For user_ids that have more than 50 rows, I want to keep the most recent 50 and delete the rest. The most obvious way of doing this for me is: -- Get the user_ids with more than 50 history entries like this -- SELECT user_id, count(*) FROM user_item_history GROUP BY user_id HAVING count(*) > 50; -- Then iterate the ids above (_user_id) -- DELETE FROM user_item_history WHERE user_id = _user_id AND id NOT IN ( SELECT id FROM user_item_history WHERE user_id = _user_id ORDER BY timestamp DESC LIMIT 50); I've left out the simple logic tying the above two queries together for clarity.. I haven't actually tested this, but while I assume it would work, I imagine there is a neater and possibly more efficient way of attacking this. I'm also open to different approaches to limiting the user's history ... perhaps with table constraints so they can simply never exceed 50 entries? But I'm not sure how to do this.. Any help would be greatly appreciated.. Thanks, Jamie
[SQL] System catalog future changes
Hello again, Reading a previous recent post and its answers, called Describe Table, got me thinking about a little piece of SQL I use in an application to get a list of all the tables for a specific namespace: select pg_class.relname as table_name from pg_class join pg_namespace on pg_namespace.oid = relnamespace where pg_class.relkind = 'r' and pg_namespace.nspname = 'public' order by pg_class.relname I have assumed that this sort of query would be relatively durable - that future system catalog changes are unlikely to make this query stop working? Does anyone have any knowledge of how fixed the columns and values of this query are (i.e. are there a lot of internal and external dependencies that make future Pg versions unlikely to break the above code)? Any other input on the above SQL - should I be doing this in another way? Thanks for any thoughts or advice, Steve
Re: [SQL] System catalog future changes
Thanks Erik - that cleans things up significantly for me. For the record, the SQL for finding all the table names (alpha ordered) in the public schema using the pg_tables view is: select tablename from pg_tables where schemaname='public' order by tablename Steve At 09:38 AM 12/18/2007, Erik Jones wrote: If all you're looking for is regular tables, then I'd use the pg_tables view. Erik Jones Software Developer | Emma® [EMAIL PROTECTED]
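Another version-durable option (not mentioned in the thread, but worth noting) is the SQL-standard information_schema, available since Pg 7.4, which is insulated from system catalog changes in the same way the built-in views are:

select table_name
from information_schema.tables
where table_schema = 'public' and table_type = 'BASE TABLE'
order by table_name;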
Re: [SQL] Query design assistance - getting daily totals
Hi, Rodrigo is exactly right in my opinion. To provide a little more info on this calendar or day dimension idea.. You can create, for example, a time table dimension which stores every day of every year as a unique record (for as far into the future as you need). You can then associate various attributes to each day, depending on your business needs, like so: id|datetime|is_business_day|is_weekday|is_fed_holiday Of course it's not normalized, but that's the point. You then just store the id in various places, and it's easy to join back to this table and figure out if a particular day has an attribute you're interested in (or you can find the ids for all the days which have a particular attribute for a given date range - to go the other direction, for example). You can get more on this type of thinking from the most excellent resource by Ralph Kimball, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition) - this book did more to open my eyes to alternatives to traditional normalized modeling than anything else. It also made me feel less guilty about building certain non-normal structures. :) I hope that's helpful.. Steve At 12:21 PM 12/13/2007, [EMAIL PROTECTED] wrote: Date: Wed, 12 Dec 2007 14:53:08 -0500 From: Rodrigo De León [EMAIL PROTECTED] To: Paul Lambert [EMAIL PROTECTED] Cc: pgsql-sql@postgresql.org Subject: Re: Query design assistance - getting daily totals Message-ID: [EMAIL PROTECTED] On Dec 12, 2007 1:39 AM, Paul Lambert [EMAIL PROTECTED] wrote: It's a financial application which needs to work using a concept of 'financial periods' which may not necessarily correspond to calendar months, and it's much easier to manage in this way than it is to merge it all together using a date field. E.g., 1st January may actually be the 15th 'working day' of the 9th 'financial period' - however, looking at just a date of jan-1 there is no way of knowing this, and it's the periods that matter more so than the actual date. I think what you need is a Calendar Table to map actual dates to buckets, e.g. 'financial periods', etc. See: http://codeinet.blogspot.com/2006/08/auxiliary-calendar-table-for-sql.html
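A minimal sketch of such a day-dimension table (all names and the five-year range here are illustrative, not from the thread); date arithmetic against generate_series fills it in one statement:

create table day_dim (
  id serial primary key,
  day date not null unique,
  is_weekday boolean,
  is_fed_holiday boolean default false  -- flag the holidays afterward
);
insert into day_dim (day, is_weekday)
select d, extract(dow from d) between 1 and 5  -- dow: 0 = Sunday .. 6 = Saturday
from (select date '2008-01-01' + s.i as d
      from generate_series(0, 1825) as s(i)) as days;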
Re: [SQL] INSERT INTO relational tables
At 09:23 AM 12/7/2007, [EMAIL PROTECTED] wrote: Date: Fri, 07 Dec 2007 14:22:26 +0100 From: Stefan Scheidegger [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: INSERT INTO relational tables Message-ID: [EMAIL PROTECTED] Hi all I'm new to SQL and I'm facing a problem I can't find any information about (google, forums). By now I wonder if I understood something wrong about relational DBs. An example to explain my problem: Let's say I have a table containing information about the customer (name, address, ...) and about his order (pieces, product-name, price). Because one customer can order several products, I split the table into two relational tables to prevent redundancy: tbl_customer (cust_id, cust_name, cust_address) and tbl_order (ord_pieces, ord_productname, ord_price, ord_customer REFERENCES tbl_customer(cust_id)) Now I want to insert several new customers with several orders each at once. If I had all information in one table, this would be easy with something like: INSERT INTO tbl_customerorders (name, address, pieces, productname, price) VALUES ('MR. X', '1st street', 3, 't-shirts', 30), ('MR. X', '1st street', 5, 'books', 50), ('MRS. Y', '2nd street', 1, 't-shirt', 10),... But how can I do this in one query if I split the table? I can add one new customer, get his ID with currval() and then add his orders. But this won't work if I want to add several customers at once. To read this information I can do a query with the argument WHERE cust_id=ord_customer. I can create a VIEW doing this so I can read the data as if it was stored in only one table. But is there in postgres/sql an abstraction-layer that allows me to insert as if the information was stored in one table? (Something like a VIEW that provides INSERT, UPDATE, ... and automatically inserts the referenced ID.) Thanks for any help! Stefan Andreas has given you some deep answers (i.e. smarter than what I'm going to say), but since you say you're a newbie to SQL maybe this will help. Perhaps all you need is to wrap your entire set of statements into a transaction (http://www.postgresql.org/docs/8.2/interactive/tutorial-transactions.html) to ensure that your entries are ACID (http://databases.about.com/od/specificproducts/a/acid.htm). BEGIN; insert tbl_customer ...; insert tbl_order ...; insert tbl_order ...; END; It's not clear whether you're looking for syntax simplicity or referential integrity (or both). If syntax is your need, then Andreas has given some good ideas. As another thought about syntax enhancement: there are lots of object-relational mappers out there that let you model your relations in a higher level language (I use Ruby and ActiveRecord, which are easy to write and learn, but Java, .NET, etc. all have great stuff). Then you just build your entities in the domain specific language and it handles all the id inserts and relational mapping for you. They can even handle mapping many-to-many joined entities, if you're careful in setting it up. I hope this is helpful, Steve
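A sketch of that transaction pattern for Stefan's example tables, using currval() to pick up the id of the customer just inserted (tbl_customer_cust_id_seq is the default name Pg gives a serial column's sequence - an assumption here):

BEGIN;
INSERT INTO tbl_customer (cust_name, cust_address) VALUES ('MR. X', '1st street');
-- currval() is session-local, so this is safe even with concurrent inserts:
INSERT INTO tbl_order (ord_pieces, ord_productname, ord_price, ord_customer)
  VALUES (3, 't-shirts', 30, currval('tbl_customer_cust_id_seq'));
INSERT INTO tbl_order (ord_pieces, ord_productname, ord_price, ord_customer)
  VALUES (5, 'books', 50, currval('tbl_customer_cust_id_seq'));
COMMIT;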
[SQL] (repost) Help understanding expressions in order by clause
Hi, (I posted this to the list on 10/25 but it didn't seem to get distributed - apologies if it did and I'm actually double posting right now..) I've read on this list about some pretty powerful examples of using expressions in order by clauses to sort according to very complex rules. I kind of half get it and was playing around with it today to learn more. I'm hoping for one or two additional pointers from the resident experts.. Take this sample: --sql start drop table if exists test_order; create temp table test_order ( id serial, email varchar(255) ); insert into test_order (email) values ('[EMAIL PROTECTED]'); insert into test_order (email) values ('[EMAIL PROTECTED]'); insert into test_order (email) values ('[EMAIL PROTECTED]'); insert into test_order (email) values ('[EMAIL PROTECTED]'); insert into test_order (email) values ('[EMAIL PROTECTED]'); select * from test_order order by id <> 3, email --sql end Putting the expression id <> 3 causes the id of 3 to sort to the TOP of the list. This statement does the same thing: select * from test_order order by id = 3 DESC, email I know there is a good, rational reason for this, and I'm hoping someone will explain it to me? From what I can see, true is sorting AFTER false in Postgres? In general, is it better/preferred just to use case statements to get the results I'm expecting? Or maybe someone could point me to the relevant docs that explain order by behavior in more detail? I read this http://www.postgresql.org/docs/8.2/interactive/sql-select.html#SQL-ORDERBY and it didn't make things any clearer. I'm quite interested in the power of expressioned sort by's and so I appreciate any assistance in this regard. Thanks! Steve
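For the record, since booleans sort with false before true, the expression form and an explicit CASE spelling are interchangeable here (both put the id = 3 row first):

select * from test_order order by (id <> 3), email;
-- equivalent, with the ordering rule spelled out:
select * from test_order
order by case when id = 3 then 0 else 1 end, email;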
[SQL] SQL performance help: self join or static var
Hi, Given a table (I've truncated some real fields for simplicity):

CREATE TABLE city (
  id serial NOT NULL,
  muni_city_id integer,
  post_code_city_id integer,
  alias_city_id integer,
  city_type character varying(15),
  post_code_type character varying(15),
  CONSTRAINT city_pkey PRIMARY KEY (id)
) WITHOUT OIDS;
CREATE INDEX index_city_on_muni_city_id ON city USING btree (muni_city_id);
CREATE INDEX index_city_on_post_code_type ON city USING btree (post_code_type);

Filled with ~168,000 records. Which of the following SQL statements should I expect better performance on?

select * from city where post_code_type in ('P', 'R')

EXPLAIN ANALYZE:
Seq Scan on city (cost=0.00..4492.82 rows=76172 width=290) (actual time=0.039..163.564 rows=30358 loops=1)
  Filter: ((post_code_type)::text = ANY (('{P,R}'::character varying[])::text[]))
Total runtime: 231.947 ms

OR

select * from city where id = muni_city_id

EXPLAIN ANALYZE:
Seq Scan on city (cost=0.00..3535.41 rows=383 width=290) (actual time=0.022..124.463 rows=30200 loops=1)
  Filter: (muni_city_id = id)
Total runtime: 195.342 ms

In my case both statements are semantically equivalent, and I'm trying to figure out if I should prefer the search of a varchar field for static values over the self join constraint to an indexed integer column? My (almost completely ignorant) eyes say that the latter (id=muni_city_id) is faster by a little - any voices of support or other ideas in this regard? Caveats? Thanks, Steve
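One further thing that may be worth testing (an assumption, not something from the thread): with both predicates matching ~30,000 of 168,000 rows, a seq scan is the expected plan either way; if the production filter were much more selective, a partial index matching the self-join predicate would give the planner a smaller structure to scan:

-- hypothetical; only pays off if the predicate is selective enough
CREATE INDEX index_city_is_muni ON city (id) WHERE id = muni_city_id;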
Re: [SQL] raw data into table process
Hi, I'm not sure if you have access to a scripting language (like perl or ruby), but my experience is that if you transform the source text file into a secondary text file that postgres copy can read natively/directly into the data formats you want, the copy command will move everything into your primary table, and way faster than using sql to transform. I've had great experience with using Ruby/regex to do text file transforms such as this - it's amazingly fast (considering many say it's a performance dog). The best way to get started is to copy some sample data OUT of PG to a file and then use your scripting language to build a transformation that formats exactly like your sample. (I've imported polygon, circle and point types using this method, as well as dates, like what you're trying to do). I hope that helps - drop me a line off-list if you'd like some sample Ruby code to read/write/transform your source. Steve At 06:33 AM 8/22/2007, [EMAIL PROTECTED] wrote: Date: Wed, 22 Aug 2007 14:36:15 +1000 From: novice [EMAIL PROTECTED] To: [EMAIL PROTECTED], pgsql-sql@postgresql.org Subject: raw data into table process Message-ID: [EMAIL PROTECTED] I am trying to record the following entries into a table. I'm curious to know if there's an efficient/effective way of doing this? These entries come from an ancient datalogger (note: separated by space and uses YY/MM/DD format to record date). Plain file sample.dat:

3665 OK BS 07/08/16 07:28
3665 CC BS 07/08/16 07:29
3665 CS BS 07/08/16 07:29
3665 CS BS 07/08/16 07:29
4532 OK BS 07/08/16 07:34
4004 OK BS 07/08/16 07:51
3991 OK BS 07/08/16 07:54

This is the table that I'm adding the entries to:

CREATE TABLE maintenance (
  maintenance_id SERIAL PRIMARY KEY,
  meter_id integer,
  status text,
  inspector text,
  inspection_date timestamp with time zone
);

-- Begin SQL Script
-- First table to dump the records in
CREATE TABLE dataload1 (data text);
-- Dump records using \copy
\copy dataload1 FROM sample.dat
-- Second table to import unique records ONLY
CREATE TABLE dataload2 AS SELECT DISTINCT data FROM dataload1;
-- Now I update unique records into the maintenance table
-- maintenance_id is SERIAL so it will be populated automatically
INSERT INTO maintenance (meter_id, status, inspector, inspection_date)
SELECT substr(data, 1, 4)::int,
       substr(data, 8, 3),
       substr(data, 21, 2),
       ('20'||substr(data, 24, 2) ||'-'|| substr(data, 27, 2) ||'-'|| substr(data, 30, 2)||' '||substr(data, 33, 5))::timestamp as inspection_date
FROM dataload2
-- So the new records will also be in timestamp order
ORDER BY inspection_date;
-- Some housekeeping
VACUUM FULL VERBOSE ANALYZE maintenance;
-- Finally, drop the temporary tables
DROP TABLE dataload1;
DROP TABLE dataload2;
-- End SQL script

Any thoughts and suggestions welcome.
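As a side note, the date assembly can be collapsed into a single to_timestamp() call, assuming the fixed-width layout implied by the substr() offsets above (positions 24-37 hold "YY/MM/DD HH:MI"):

SELECT to_timestamp('20' || substr(data, 24, 14), 'YYYY/MM/DD HH24:MI')
FROM dataload2;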
Re: [SQL] Increment a sequence by more than one
Hi Peter, I struggled to implement Michael's suggestion to use CACHE in this regard when he made it, but after your encouragement I've studied it more, and you and he are both totally right - CACHE is designed to do exactly what I want. Here's the sample code, so as to put this issue to bed and to record what the solution is: Scenario: Bob wants a block of 50 ids. Alice just wants a single id but will accidentally interlope into Bob's sequence obtainment. property_id_seq = 100

Bob:   # alter sequence property_id_seq CACHE 50
Alice: # select nextval('property_id_seq') = 101 (wastes ids up to 150)
Bob:   # select nextval('property_id_seq') = 151 (Bob now knows that 151-200 are locked permanently for his exclusive use)
Bob:   # alter sequence property_id_seq CACHE 1 = Sequence will now return single ids to everyone

So in the worst case, there will be id wastage equal to the CACHE size times the number of interlopers who grab ids while Bob is obtaining his block. And Bob's time to grab a set of ids is fairly small, since he's only issuing a couple of very fast sql statements.. NOTE: If all calling parties agree to always use the same CACHE number for obtaining blocks of ids, then this method seems bulletproof (if two parties use differing CACHE #'s, then they could cause too few ids to be CACHED to one of the parties). I hope this helps someone else on the archives down the road. Thanks to everyone for putting their time and attention on this problem. I'm very grateful. Sincerely, Steve At 08:00 AM 8/6/2007, Peter Childs wrote: On 03/08/07, Michael Glaesemann [EMAIL PROTECTED] wrote: On Aug 3, 2007, at 15:27, Erik Jones wrote: Is there actually a requirement that the block of 5000 values not have gaps? Good point. If not, why not make the versioned table's id column default to nextval from the same sequence? Of course, the ids of the two tables could be interleaved in this case. This might not be an issue, of course. Michael Glaesemann grzm seespotcode net It seems to me that one should use the cache feature of a sequence, which is there just for this purpose. That way when you get the next value, your session caches it and any other sessions will get one after your cache range. Peter
[SQL] Increment a sequence by more than one
Hi, I'm writing an import app in a third party language. It's going to use COPY to move data from STDIN to a postgres (8.2) table. There are some complexities though: it's going to copy the records to a versioned table first, and then at a later time the records will be copied by a different procedure to the live table. The live table and versioned table are identical in terms of their field definitions. But there is no sequence associated with the versioned table (whose primary key is id plus import_group_id, whereas the live table's pk is just id). So all versioned table entries must already know what their id would be in the live table. (This makes sense for other business processes we have, but it's a bit of a problem in this instance). My problem: I'd like to be able to grab a block of ids from the live table's pk sequence. So let's say my importer has 5,000 new rows to import and the current max pk in the live table is 540,203. I'd like to be able to increment the primary key sequence in such a way that I get a block of ids all to myself and the sequence is reset to 545,203 with a guarantee that all the ids between 540203 and 545203 are unused. I'm guessing this can be done with a stored procedure, but if possible I'd like to be able to run this command from my third party app without calling a stored procedure (we try not to use stored procedures here b/c we code entirely in this third party language - if we had to, it's possible we could install a stored procedure though). But since I've seen so much magic on display from people on this list, I'm going to ask if it's possible to do this solely from PG SQL sent from a third party language? The main tricky bit seems to be ensuring that everything is locked so two such increment calls at the same time don't yield overlapping blocks of ids. Is there a way to lock the sequence generator for the duration of a nextval and setval call? Since pk sequence functions like nextval cannot be rolled back, I'm guessing that begin/end won't do the job? I.e.:

-- need magic lock statement on pk sequence here
nextval -- returns 540203
setval(545203) -- now sequence is set to where I want it and I own 5000 ids
-- release magic lock here

My fallback is to just have a routine that calls nextval a bunch of times and stores all the ids it gets - they may or may not be sequential, but they'll be unique. This is going to be a really slow way to get a large number of ids of course, and just seems plain wrongheaded in many ways. Any insights? All help is appreciated, and input on a better way to solve the problem completely is of course welcome as well. Sincerely, Steve
Re: [SQL] Increment a sequence by more than one
Hi Scott, Thanks for this info (and Michael too!). Let me see if I understand your suggestion. I would run these three commands in sequence: # select nextval('[my_seq_name]'); returns = 52 [I believe that the sequence is at 52] # alter sequence [my_seq_name] increment by 5000; # select nextval('[my_seq_name]'); returns = 5052 If the third command doesn't return 5052 - i.e. it returns 5053 - then I know that somewhere in this sequence another process grabbed an id out from under me. It doesn't matter where, but I would know that my 5000 ids are not unique and should be discarded? If the third command DOES return 5052, then I know that those 5000 ids are locked for my use and no other application could have grabbed one of them? Can anyone see a flaw in that? It looks right to me.. Scott - it also seems to me that I need not waste all those ids if another application does grab one during my statement: If I detect a failure, I could just reset the pk sequence back to the max id of the underlying table before trying again. I think this code would do it (stolen from Ruby's postgres adaptor): SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT min_value FROM [seq_name])) FROM [table_of_pk]), false) So for table property with pk of id and sequence name property_id_seq: SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT min_value FROM property_id_seq)) FROM property), false) Then I could just retry the first process - though on a table that is very busy, I might have to retry for a while.. Regarding Michael's suggestion - I tried messing around with LOCK and similar commands, but they're only allowed to run against TABLES, not SEQUENCES - too bad - that would have been perfect. I'm now starting to think that there's no way to solve this problem in an elegant manner even in a stored procedure? Your method seems to be as good as it's going to get? (Not that I'm complaining!) Thanks again - any thoughts are welcome, Steve At 08:01 PM 8/3/2007, Scott Marlowe wrote: The real danger in doing this is race conditions. Most anything you do involves a possible race condition. As long as the race condition doesn't result in an id getting used twice, you're safe.
So:

test=# create sequence m;
CREATE SEQUENCE
test=# select nextval('m');
 nextval
---------
       1
(1 row)

test=# alter sequence m increment by 5000;
ALTER SEQUENCE
test=# select nextval('m');
 nextval
---------
    5001
(1 row)

test=# alter sequence m increment by 1;
ALTER SEQUENCE
test=# select nextval('m');
 nextval
---------
    5002
(1 row)

In this method, the only possible race condition is that someone might run a nextval('m') between the time you set the increment to 5000 and 1 again. If that happens, you'd have 5,000 missing ids, but since sequences are designed to prevent dupes, not holes, that's ok. Is there a way to lock the sequence generator for the duration of a nextval and setval call? Avoiding the setval is the real key. It doesn't scale. Missing 5,000 ids is no big deal. Repeating them IS a big deal. Not using setval is the best answer.
[SQL] Race condition in resetting a sequence
Hi Scott, You've moved into more general territory, so I'm starting a new thread. The code I provided to reset a primary key sequence is actually part of Ruby on Rails' core library - actually they use something very similar to what I originally sent: SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence})) FROM #{table}), false) Where: #{sequence} = sequence name #{pk} = primary key of table under sequence #{table} = table under sequence Their code is a little different from what I provided before b/c it increments by one (times the increment #) above the max(pk). But essentially it's the same. (I think their method leaves small gaps in the sequence every time it runs). Also I think their method is likely to be a little slower (one extra select statement) and therefore (perhaps) more vulnerable to a race? You mentioned something more general though: As long as you're using setval you have a race condition? However the postgres manual states: The sequence functions, listed in Table 9-34 (http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html#FUNCTIONS-SEQUENCE-TABLE), provide simple, multiuser-safe methods for obtaining successive sequence values from sequence objects. (http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html) Included in Table 9-34 is setval - so I'm not clear how it can have a race condition all by itself? Or maybe it only has a race condition when used in ways similar to how Ruby/Rails is using it? (i.e. with a compound select/coalesce statement as one of its parameters?) Would this command have a race condition: select setval('my_pk_seq', 500) This issue is reasonably important since Ruby on Rails is fairly widely used. As you say, the race window would be pretty small on a compound select -- and the Ruby function doesn't actually get called very often, but if you wouldn't mind explaining how the race condition would manifest, I'll post a warning on the RoR bug tracking site so that people can at least understand that there's a potential bug here.. Thanks again, Steve At 08:42 PM 8/3/2007, Scott Marlowe wrote: On 8/3/07, Steve Midgley [EMAIL PROTECTED] wrote: Hi Scott, Thanks for this info (and Michael too!). Let me see if I understand your suggestion. I would run these three commands in sequence: # select nextval('[my_seq_name]'); returns = 52 [I believe that the sequence is at 52] # alter sequence [my_seq_name] increment by 5000; # select nextval('[my_seq_name]'); returns = 5052 If the third command doesn't return 5052 - i.e. it returns 5053 - then I know that somewhere in this sequence another process grabbed an id out from under me. It doesn't matter where, but I would know that my 5000 ids are not unique and should be discarded? If the third command DOES return 5052, then I know that those 5000 ids are locked for my use and no other application could have grabbed one of them? No, that's not what would happen. If someone grabbed an id after the increment value was changed, then you'd get 10052, cause they would increment the sequence by 5,000. Since you're not using setval, and you're keeping the increment positive, there's no danger of collision, only of over-incrementing and leaving a giant hole in your sequence, which is ok. Can anyone see a flaw in that? It looks right to me..
Scott - it also seems to me that I need not waste all those id's if another application does grab one during my statement: If I detect a failure, I could just reset the pk sequence back to the max id of the underlying table before trying again. I think this code would do it (stolen from Ruby's postgres adaptor): That is open to a race condition. The bad kind. SELECT setval('[seq_name]', (SELECT COALESCE(MAX([pk_of_seq]), (SELECT min_value FROM [seq_name])) FROM [table_of_pk]), false) As long as you're using setval, you have a race condition. Please avoid it. Unless you can guarantee that no one else is using the database at the same time (during a maintenance window etc...) So for table property with pk of id and sequence name property_id_seq: SELECT setval('property_id_seq', (SELECT COALESCE(MAX(id), (SELECT min_value FROM property_id_seq)) FROM property), false) You'd think that the select coalesce and the outer select setval would not have a race condition, but they still do. Just a much smaller one. I'm now starting to think that there's no way to solve this problem in an elegant manner even in a stored procedure? Your method seems to be as good as it's going to get? (Not that I'm complaining!) Yep. Safe is better than pretty or elegant. :)
[SQL] Selecting rows with static ordering
Hello, I have a strange problem (or one that I've never had before anyway). I am searching for a list of ids for a given table (these id values are generated at run-time and held statically in an application-local variable). From that application, I want to retrieve all those rows, and I want them in the order they are currently stored in that variable. So take for example this foreign application variable: ids = 3,2,5,1,4 The application then executes this sql: select * from table where id in (3,2,5,1,4) As-is, of course, the above query will return the 5 records in a semi-random (i.e. unpredictable/unreliable) order. And I don't want to just order by id - I want to order by id(3,2,5,1,4) (if you see what I mean). Is there a neat trick that anyone knows for pulling this off in a single query? Basically right now I'm issuing 5 queries to the backend to ensure ordering, but this is horribly inefficient. Any input or advice would be appreciated, Steve
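One single-query approach (a sketch - my_table stands in for the real table name) is to join against a VALUES list that carries the desired rank alongside each id; VALUES in FROM works as of 8.2:

select t.*
from my_table t
join (values (3, 1), (2, 2), (5, 3), (1, 4), (4, 5)) as ord (id, rank)
  on t.id = ord.id
order by ord.rank;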
Re: [SQL] [pgsql-sql] Daily digest v1.2492 (19 messages)
Hi John, It sounds like a disk-bound operation, so cpu is not maxed out. I'm not clear on all the details of your operation, but it sounds like you're using Java to do row-by-row based inserts, selects and updates within a transaction, from a file. This can be a very slow process if you have many rows. The OS stats you describe fit that theory (but not conclusively). If you are using (pseudo-)code such as:

Open file {
  Read line {
    select from Pg: select from [other_table] where val = [line[colN]]
    exec to Pg: insert into [table] (col1, col2, ...) values (line[col1], line[col2], ...)
  }
}

You can radically speed up such a system by using the copy (http://www.postgresql.org/docs/8.2/interactive/sql-copy.html) command to load all the data at once from the file into Pg, and then do post-processing with Java/SQL to get all the fields looking right. Doing a bulk update with a join across several tables is so much faster than looping through them with a wrapper in Java (or other lang) you won't believe it. I hope this helps and is on-topic for you. Steve At 09:38 AM 4/3/2007, [EMAIL PROTECTED] wrote: Date: Tue, 03 Apr 2007 22:16:13 +0800 From: John Summerfield [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: A long-running transaction Message-ID: [EMAIL PROTECTED] I have a Java (java 1.1) program that I wrote some years ago, to read records from a text file and insert them into a postgresql database. One of the assumptions I made was that one file contained one day's data, maybe as many as 1500 records, and I coded it to do the whole lot as one transaction so either a single file was loaded in its entirety, or none of its data was. I lost the Java code, but revived the idea and I've collected about two years' data using (Linux) shell scripts, and loading the data using psql. Then, I found the Java code on a disused hard disk :-) I made the necessary changes for it to build in java 1.5, and used psql to extract data from my new database in the correct format for the old program. This time, I have a little more data than I ever loaded at once before:

[EMAIL PROTECTED]:~$ wc -l testdata
6242217 testdata
[EMAIL PROTECTED]:~$ \ls -hl testdata
-rw-r--r-- 1 summer summer 285M 2007-03-28 22:32 testdata
[EMAIL PROTECTED]:~$

Now, it wouldn't surprise me if postgresql used lots of memory - but how much could it possibly need? My laptop, where I first tried this, has 1.25 Gbytes, so I could allow it some. It wouldn't surprise me a lot if it used lots of memory and caused all sorts of problems, but no, it's chugging away, still using no more RAM than it could have had on my old Pentium 133 all those years ago. In the process of checking it out, I've set it running on a machine with an AMD Sempron(tm) 2400+ running Kubuntu 6.10 (kernel is 2.6.17-6-server-xen0) and 512 Mbytes of RAM. This is the java program :-)

summer pts/6:0.0 Thu205days 1:07 1:07 /usr/bin/gij-4.1 -cp /usr/s

It's been running five days so far, and I can see where it's up to by attaching strace. It's reading 2k of the input file every few seconds. Okay, clearly something's wrong, and I don't think it's all my cruddy code. No probs swapping:

[EMAIL PROTECTED]:~$ free
             total       used       free     shared    buffers     cached
Mem:        460800     456472       4328          0        860     262164
-/+ buffers/cache:     193448     267352
Swap:      1461872        284    1461588
[EMAIL PROTECTED]:~$

It is hitting the disk pretty hard now on this machine, but the laptop's still going too, and the disk seems to run about half the time, part of a second running, part idle (but the intervals are getting shorter).
It struck me as fairly curious that neither postgresql nor the application was hogging the CPU. Perhaps the laptop is more interesting: look at the size of the buffer pool:

[EMAIL PROTECTED]:~ free
             total       used       free     shared    buffers     cached
Mem:       1295528    1268548      26980          0       3976     392388
-/+ buffers/cache:     872184     423344
Swap:      1941496      32656    1908840
[EMAIL PROTECTED]:~

Again, no problem with over-use of RAM, and I'm logged on using KDE too and that's running fine. It's been running a little longer here:

summer pts/2328Mar07 5days 25:12 25:11 java -cp /home/summer/Classes/:/usr/share/p

This is Sun's Java 1.5 on OpenSUSE 10.2. This is what suggested I should write:

[EMAIL PROTECTED]:~ procinfo
Linux 2.6.18.8-0.1-default ([EMAIL PROTECTED]) (gcc 4.1.2 20061115) #1 1CPU [Echidna.]

Memory:      Total        Used        Free      Shared     Buffers
Mem:       1295528     1271720       23808           0        3716
Swap:      1941496       32656     1908840

Bootup: Tue Mar 27 18:50:19 2007    Load average: 2.21 2.65 2.69 2/243 19305

user  :   1d  3:17:04.03  16.0%   page in :  131097310   disk 1: 3079516r 20087664w
nice  :       0:05:39.64   0.1%   page out:  197016649
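A sketch of the COPY-then-join approach Steve recommends above (all table and column names are hypothetical stand-ins for John's schema):

-- load the whole day file in one shot instead of row-by-row inserts:
COPY staging (col1, col2, lookup_val) FROM '/tmp/testdata';
-- then resolve the per-row lookup as a single set-based statement:
INSERT INTO target (col1, col2, other_id)
SELECT s.col1, s.col2, o.id
FROM staging s
JOIN other_table o ON o.val = s.lookup_val;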
Re: [SQL] Issue with copying data from a text file.
Hi, I think I had the exact same problem as you do a while back, and I solved it by removing the header row and the CSV HEADER clause of the statement. For the large files I had, it was easier (for me) to remove the header row than it was to escape out all the quotes (or regen the file): COPY deal_lines_temp_load FROM 'c:/temp/autodrs_deal_lines.txt' WITH DELIMITER AS '^'; I think the parser doesn't look for nor generate quoted rows except when CSV is specified.. It would be nice if there was a way to specify a HEADER row without invoking CSV parsing rules (friendly hint to core devs!) :) Let us all know if that works! Steve At 03:14 AM 3/20/2007, you wrote: Date: Tue, 20 Mar 2007 11:25:38 +0900 From: Paul Lambert [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: Issue with copying data from a text file. Message-ID: [EMAIL PROTECTED] I have a procedure in place that copies data from a caret delimited text file into a table storing some information. One of the fields in the table contains an item description which may contain item dimensions such as 17" alloy wheels. The problem I am getting when I do my load is I believe due to the presence of the double quotation marks, giving the copy the impression that it is to include the information following as a single text string until it gets to the next set of double quotes. As a result, I get the following: AutoDRS=# COPY deal_lines_temp_load FROM 'c:/temp/autodrs_deal_lines.txt' WITH DELIMITER AS '^' CSV HEADER; ERROR: value too long for type character varying(30) CONTEXT: COPY deal_lines_temp_load, line 87, column order_desc: 17" 5 spoke alloy wheels.^1291.18^117.38^983.69^1291.18^^C^^ The column as you can see is defined as a 30 character field; the load contains in this column ^17" 5 spoke alloy wheels.^ I note an option in the COPY command to specify the quote character, defaulting to double quote. The problem being a single quote will also be used in the data, as will other characters. Is there any way to get a copy to have no quote character? I.e. read the file and put whatever is between the caret characters straight into the appropriate field exactly as is. TIA, Paul.
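If the header row needs to stay, another workaround sometimes used (untested here, so treat it as a sketch) is to keep CSV mode but point QUOTE at a character that can never occur in the data, such as backspace:

COPY deal_lines_temp_load FROM 'c:/temp/autodrs_deal_lines.txt'
WITH DELIMITER AS '^' CSV HEADER QUOTE AS E'\b';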
Re: [SQL] A form of inheritance with PostgreSQL
Hi Greg, While not in a C++ framework, you might find that it's not too hard to implement something similar in your system - it's called Single Table Inheritance. References to the Ruby on Rails implementation here: http://wiki.rubyonrails.org/rails/pages/SingleTableInheritance It's based on Martin Fowler's Patterns of Enterprise Application Architecture book - please find references to his original patterns here: http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html The key, I believe, is simply adding a type and a parent_id to the class table, so you can model all your types and their hierarchical relations. Fowler's diagram is pretty clear. I think then you would store the data in another table (or tables) and link into this inheritance structure to establish ancestry for any piece of data (some people try to store the data in this table too, but I think that's a mistake personally). If I understand what you're trying to do, you can use this design pattern in your application language to implement an inheritance scheme without any special database features (i.e. in a SQL-standard manner). I hope this is helpful, Steve At 12:28 AM 3/9/2007, [EMAIL PROTECTED] wrote: Date: Thu, 08 Mar 2007 13:01:51 -0500 From: Greg Toombs [EMAIL PROTECTED] To: pgsql-sql@postgresql.org Subject: A form of inheritance with PostgreSQL Message-ID: [EMAIL PROTECTED] Hello. I'm trying to figure out how to nicely implement a C++ class-like system with PostgreSQL. Consider the following: Tables Fruit, Apple, Orange. I want to design the foreign key scheme such that there are relations between Fruit and Apple, and Fruit and Orange, that imply that Apple is a Fruit, and Orange is a Fruit. I don't want to eliminate the existence of Apple and Orange tables, because there will be columns specific to both Apple and Orange; if I include these columns in Fruit, then if a Fruit is an Orange, the Apple columns will be needlessly present in Orange rows. The different ways of implementing this scheme that I've thought of (some uglier than others): - Have Fruit contain foreign keys to both Apple and Orange, and write a check constraint in Fruit specifying that exactly one of (Apple FK, Orange FK) needs to be non-null. The disadvantage of this method is that it isn't exactly loosely coupled. For every other fruit type table I implemented, I'd have to go back and add a foreign key in Fruit. - Have a foreign key in Apple to Fruit, and in Orange to Fruit; then somehow create a constraint that imposes uniqueness on the union of foreign keys in both Apple and Orange. To figure out what type of fruit a Fruit row is, run a query for foreign keys in Orange and Apple matching the primary key of Fruit. You'd also want to somehow create a constraint that the result of this query should always return exactly one row (perhaps with a trigger?) Any advice will be appreciated! As I'm relatively new to Postgres, I might need some help with the actual implementation as well. Thank you. - Greg
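A minimal sketch of that shape (hypothetical columns; this is Fowler's pattern expressed in plain SQL, not Postgres's own table inheritance): a class table carrying name and parent_id to model the hierarchy, with data rows linking into it:

create table fruit_class (
  id serial primary key,
  name text not null,                            -- 'fruit', 'apple', 'orange'
  parent_id integer references fruit_class (id)  -- null for the root class
);
create table fruit_item (
  id serial primary key,
  class_id integer not null references fruit_class (id),
  attrs text                                     -- subtype-specific data
);
insert into fruit_class (name, parent_id) values ('fruit', null);
insert into fruit_class (name, parent_id) values ('apple', 1);
insert into fruit_class (name, parent_id) values ('orange', 1);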
Re: [SQL] GiST index question: performance
Hi Peter, :) All my Pg code is written via (or handed to) an abstraction layer, and I actually write no functions or stored procedures at all. I write using Rails, so in this case it's a Ruby library called ActiveRecord which has a Postgres module that allows me to talk via ActiveRecord-speak or via direct Postgres sql commands. (For example, AR has no idea how to create a GiST index, so I issue that DDL statement manually using the special syntax - also AR is not always so smart about SQL queries, so tricky ones I write by hand). Maybe I misunderstand Q3C completely, but it looks like C code that has to be installed into the Postgres server itself - not a series of SQL functions that can be implemented on an unmodified server. I think my ISP is fine with anything that gets installed via user-level privileges. Anything that requires root and/or anything that involves binary code they are more cautious about. To be fair, I'm cautious about the same things, but given Oleg's reputation and contributions to Pg, I wouldn't be so concerned about Q3C specifically. Am I ignorant of something fundamental in this conversation? I really do appreciate any education or insight here. Are C code patches or functions more of a risk to server stability/reliability than higher level code? Or am I speaking gibberish? Thanks, Steve At 01:01 AM 3/6/2007, Peter Eisentraut wrote: Steve Midgley wrote: my ISP that manages my Pg SQL server is (in my interests) concerned about installing anything non-standard (read: unstable) onto their server. I was able to get them to install your TSearch2 b/c it's been proven many times, but I'm hesitant to even bring up Q3C since it's less widely deployed. How do you manage to get your own code installed under that theory? -- Peter Eisentraut http://developer.postgresql.org/~petere/
[SQL] GiST index question: performance
Hi, First off, can I say how much I love GiST? It's already solved a few problems for me that seemed impossible to solve in real-time queries. Thanks to everyone who works on that project! I'm developing a geographic index based on a set of zip code boundaries. Points of interest (POI) will fall within some boundaries and not others. I need to search to find which POI are within a specified boundary. I think I have two options (see below), and I'm wondering if anyone has an opinion or experience as to whether one or the other will have substantially different performance characteristics. I can obviously test when I get that far, but I'd prefer to try the anticipated faster route first, if anyone has existing experience they can share: 1) Index a series of circles of NN radius around each boundary marker (lat/long point). Run a search on POI for those that fall within any of the specified circles. 2) Index a set of polygons that mark the minimum area around the boundary markers in question. Run a search on POI that fall within this single polygon. The polygon will have more points, but there will be more circles to search - my understanding of GiST is limited, so I'm not sure if there's a performance benefit to searching many circles or a few polygons. My tables are of this size: # of POI: 50,000 # of zip blocks (with and without regions): 217,000 # of zip blocks in a given city (and hence in a given polygon): ~5 Any thoughts or ideas? Thank you, Steve p.s. I could use a GIS system alongside of Postgres, but performance and efficiency are key to this system, and it seems to me that raw GiST indexed SQL queries are going to be fastest and create the lowest load on the server?
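A sketch of option 1 using only the built-in geometric types (table and column names are made up; the radius is in raw coordinate units, matching the rough-but-fast approximation discussed in this thread): the circles get a GiST index, and a zero-radius circle stands in for each POI point so the circle-contains-circle operator can use it:

create table zip_circle (zip varchar(5), area circle);
create index zip_circle_area_gist on zip_circle using gist (area);

-- POIs falling inside any of a city's circles (poi.loc is a point column):
select p.id
from poi p, zip_circle z
where z.zip in ('99762')
  and z.area @> circle(p.loc, 0);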
Re: [SQL] GiST index question: performance
Thanks Oleg - very interesting stuff you are working on. You may recall I exchanged emails with you on openfts a little while ago - my ISP that manages my Pg SQL server is (in my interests) concerned about installing anything non-standard (read: unstable) onto their server. I was able to get them to install your TSearch2 b/c it's been proven many times, but I'm hesitant to even bring up Q3C since it's less widely deployed. The search method I proposed in my first email is not totally accurate, but just searching circles with radii using a GiST index and standard Pg circle datatypes seems like a close enough solution for me (as opposed to Q3C's conical search intersections with a spherical projection). I realize that at higher latitudes my circles will be elliptical, but our needs are for approximations that are very fast rather than accurate, and the radii being searched are small relative to the size of the sphere (i.e. when searching Nome, find everything in +/- 40 miles and especially don't return Anchorage POI).. It's an end user database, so if the query takes 500ms, that's really too long. On the Q3C site, I see that your measure of speed is processing many, many rows in 20 hours, which is a whole different ballgame. :) Do you have a thought as to whether GiST is going to be faster/more efficient with Pg standard types of polygons or circles? I suppose I should just test out both, and quit wasting your time. I'll certainly repost to the list with whatever I uncover. I really do appreciate the help you've provided. Sincerely, Steve At 12:21 PM 3/5/2007, you wrote: On Mon, 5 Mar 2007, Steve Midgley wrote: Hi, First off, can I say how much I love GiST? It's already solved a few problems for me that seemed impossible to solve in real-time queries. Thanks to everyone who works on that project! Thanks, Steve! I'm developing a geographic index based on a set of zip code boundaries. Points of interest (POI) will fall within some boundaries and not others. I need to search to find which POI are within a specified boundary. Your POI is what we call a ConeSearch query in astronomy. Please take a look at the Q3C algorithm available from http://q3c.sf.net. Some information: http://www.sai.msu.su/~megera/wiki/SkyPixelization This is what we use in our Virtual Observatory project, and we're able to work with 10^9 objects on moderate hardware. It doesn't use GiST, but a special pixelization scheme allows the use of a standard btree. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [SQL] COPY FROM query
Hi, For what it's worth, WindowsXP (at least) seems to currently support forward slashes at the OS level, so this Postgres behavior isn't as odd as it might seem. If you enclose your paths with double quote (") marks, Windows will even accept Unix style paths for some instructions on the command line: dir "/temp" Feeding Windows API calls with forward slashes seems to work with everything I've tried so far, so using them in Postgres seems perfectly smart and reasonable.. Hope that helps, Steve At 11:03 AM 2/12/2007, [EMAIL PROTECTED] wrote: Date: Mon, 12 Feb 2007 09:09:08 +0900 From: Paul Lambert [EMAIL PROTECTED] To: Joe [EMAIL PROTECTED] Cc: pgsql-sql@postgresql.org Subject: Re: COPY FROM query. Message-ID: [EMAIL PROTECTED] Joe wrote: Hi Paul, On Mon, 2007-02-12 at 08:19 +0900, Paul Lambert wrote: I'm attempting to copy from a table into a file using a select query inside the copy. The following is my command: COPY (SELECT DISTINCT ON (dealer_id,appraisal_id) * FROM appraisals_temp) TO 'c:\autodrs_appraisal_new.txt' WITH DELIMITER AS '^' CSV HEADER; I get the following returned: WARNING: nonstandard use of escape in a string literal LINE 1: ...ealer_id,appraisal_id) * FROM appraisals_temp) TO 'C:\autodr... ^ HINT: Use the escape string syntax for escapes, e.g., E'\r\n'. ERROR: relative path not allowed for COPY to file SQL state: 42602 (The caret character is pointing to the M in FROM) I believe that on Windows you need to use double backslashes, i.e., 'c:\\autodrs_appraisal_new.txt', although the regular slash may also work, i.e., 'c:/autodrs_appraisal_new.txt'. Joe If this is the case, it is strange that the first copy statement works, as that is also only using a single backslash. Having said that, if I switch it to a forward slash it works without error... odd, given that weenblows standard is backslash. Thanks for the help though. -- Paul Lambert Database Administrator AutoLedgers