[GENERAL] Understanding autocommit

2012-08-22 Thread Chris Angelico
I'm looking at these two pages:

http://www.postgresql.org/docs/9.1/static/ecpg-sql-set-autocommit.html
http://www.postgresql.org/docs/9.1/static/sql-start-transaction.html

According to the first page, autocommit is off by default for
embedded SQL programs. Does this mean everything except the 'psql'
command line interpreter, or is that some special case, and most
programs default to autocommit on?

Currently, I have explicit 'begin' statements in all our code. What
I'd like to achieve is DB2-style semantics where a transaction is
automatically and implicitly opened as soon as any query is performed,
and that transaction remains until committed or rolled back (or until
end of session implicit rollback).

I'm sure there's something really obvious here, but... how do I find
out whether my program's running in autocommit mode or not?

Thanks!

Chris Angelico


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Understanding autocommit

2012-08-22 Thread Albe Laurenz
Chris Angelico wrote:
 I'm looking at these two pages:
 
 http://www.postgresql.org/docs/9.1/static/ecpg-sql-set-autocommit.html
 http://www.postgresql.org/docs/9.1/static/sql-start-transaction.html
 
 According to the first page, autocommit is off by default for
 embedded SQL programs. Does this mean everything except the 'psql'
 command line interpreter, or is that some special case, and most
 programs default to autocommit on?
 
 Currently, I have explicit 'begin' statements in all our code. What
 I'd like to achieve is DB2-style semantics where a transaction is
 automatically and implicitly opened as soon as any query is performed,
 and that transaction remains until committed or rolled back (or until
 end of session implicit rollback).
 
 I'm sure there's something really obvious here, but... how do I find
 out whether my program's running in autocommit mode or not?

An embedded SQL program is a program written in C that makes use
of ecpg.

The server itself has autocommit, so every program that uses
PostgreSQL will be in autocommit mode by default.

The only way around that is explicit BEGIN or START TRANSACTION
commands (which is what ecpg uses).
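As an aside, the "implicit BEGIN" pattern is easy to see in any driver that emulates it. A minimal sketch, using Python's sqlite3 module purely as a stand-in (its legacy default mode happens to behave exactly as Chris describes; this says nothing about PostgreSQL's wire protocol):

```python
import sqlite3

# sqlite3's legacy default mode mimics the DB2-style semantics described
# above: the first INSERT/UPDATE/DELETE implicitly opens a transaction
# that stays open until it is committed or rolled back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")   # implicitly begins a transaction
conn.rollback()                            # discards the uncommitted insert
after_rollback = conn.execute("SELECT count(*) FROM t").fetchone()[0]

conn.execute("INSERT INTO t VALUES (1)")   # a new implicit transaction
conn.commit()                              # makes it durable
after_commit = conn.execute("SELECT count(*) FROM t").fetchone()[0]

print(after_rollback, after_commit)  # 0 1
```

PostgreSQL drivers such as psycopg2 get the same effect by sending an explicit BEGIN before the first statement, which is consistent with the point above that the server itself always autocommits.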

Yours,
Laurenz Albe




Re: [GENERAL] Problems with timestamp with time zone and old dates?

2012-08-22 Thread hubert depesz lubaczewski
On Tue, Aug 21, 2012 at 05:29:14PM -0400, Michael Clark wrote:
 For example, if I insert like so:
 INSERT INTO sometable (startdate) values ('1750-08-21 21:17:00+00:00');
 
 I get the following when I select:
 SELECT startdate FROM sometable;
            startdate
 -------------------------------
  1750-08-21 15:59:28-05:17:32
 (1 row)

It's impossible to tell without knowing what your time zone is, but
I don't see anything particularly strange about it. Non-integer offsets
do happen, and in the past they were more common.

Based on what I can see in the timezone data, it looks like the time zone
America/Toronto
used this offset until 1895.
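This can be checked against the same tz database from Python (a sketch assuming Python 3.9+ with system tzdata available): dates before a zone's first recorded transition fall back to local mean time, which for Toronto is 5:17:32 behind UTC.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# 1750 predates standard time, so America/Toronto resolves to its
# local-mean-time offset, which is not a whole number of minutes.
dt = datetime(1750, 8, 21, 21, 17, tzinfo=ZoneInfo("America/Toronto"))
offset = dt.utcoffset()

print(offset)  # 5 hours, 17 minutes, 32 seconds west of Greenwich
```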

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/




Re: [GENERAL] How hard would a path operator be to implement in PostgreSQL

2012-08-22 Thread Chris Travers
Hi all;

So I found an interesting and relatively manageable way of doing this.

Suppose we have an inventory table:

CREATE TABLE inventory_item (
id serial primary key,
cogs_account_id int references account(id),
inv_account_id int references account(id),
income_account_id int references account(id),
sku text not null,
description text,
last_cost numeric, -- null if never purchased
sell_price numeric not null,
active bool not null default true
);

Now we want to be able to add pointers to this table in other tables
without adding a lot of decentralized code.  So what we do is:


CREATE TABLE joins_inventory_item (
inventory_item_id int
);

Then we create a table method function like:


CREATE FUNCTION inventory_item(joins_inventory_item) RETURNS inventory_item
LANGUAGE SQL AS $$
SELECT * FROM inventory_item where id = $1.inventory_item_id;
$$;

Then any table which inherits joins_inventory_item gets a path back.
So for example:

CREATE TABLE inventory_barcode (
barcode text primary key,
inventory_item_id int references inventory_item(id)
);

Then we can:

select (bc.inventory_item).sku FROM inventory_barcode bc WHERE barcode
= '12345';

Best Wishes,
Chris Travers




Re: [GENERAL] Are there any options to parallelize queries?

2012-08-22 Thread Seref Arikan
Craig and Pavel: thanks to you both for the responses.

Craig, this is for my PhD work, so no commercial interest at this point.
However, I'm pushing very hard at various communities for funding/support
for a Postgres based implementation of an EHR repository, that'll hopefully
benefit from my PhD efforts. I'll certainly add the option of funding some
key work into those discussions, which actually fits the model that we've
been discussing at the university for some time very well.

Kind regards
Seref


On Wed, Aug 22, 2012 at 4:24 AM, Craig Ringer ring...@ringerc.id.au wrote:

 On 08/21/2012 04:45 PM, Seref Arikan wrote:

  Parallel software frameworks such as Erlang's OTP or Scala's Akka do
 help a lot, but it would be a lot better if I could feed those
 frameworks with data faster. So, what options do I have to execute
 queries in parallel, assuming a transactional system running on
 postgresql?


 AFAIK native support for parallelisation of query execution is currently
 almost non-existent in Pg. You generally have to break your queries up into
 smaller queries that do part of the work, run them in parallel, and
 integrate the results together client-side.

 There are some tools that can help with this. For example, I think
 PgPool-II has some parallelisation features, though I've never used them.
 Discussion I've seen on this list suggests that many people handle it in
 their code directly.

 Note that Pg is *very* good at concurrently running many queries, with
 features like synchronized scans. The whole DB is written around fast
 concurrent execution of queries, and it'll happily use every CPU and I/O
 resource you have. However, individual queries cannot use multiple CPUs or
 I/O threads, you need many queries running in parallel to use the
 hardware's resources fully.


 As far as I know the only native in-query parallelisation Pg offers is via
 effective_io_concurrency, and currently that only affects bitmap heap scans:

 
 http://archives.postgresql.org/pgsql-general/2009-10/msg00671.php

 ... not seqscans or other access methods.

 Execution of each query is done with a single process running a single
 thread, so there's no CPU parallelism except where the compiler can
 introduce some behind the scenes - which isn't much. I/O isn't parallelised
 across invocations of nested loops, by splitting seqscans up into chunks,
 etc either.

 There are some upsides to this limitation, though:

 - The Pg code is easier to understand, maintain, and fix

 - It's easier to add features

 - It's easier to get right, so it's less buggy and more
   reliable.


 As the world goes more and more parallel Pg is likely to follow at some
 point, but it's going to be a mammoth job. I don't see anyone volunteering
 the many months of their free time required, there's nobody being funded to
 work on it, and I don't see any of the commercial Pg forks that've added
 parallel features trying to merge their work back into mainline.

 If you have a commercial need, perhaps you can find time to fund work on
 something that'd help you out, like honouring effective_io_concurrency for
 sequential scans?

 --
 Craig Ringer



Re: [GENERAL] Are there any options to parallelize queries?

2012-08-22 Thread Chris Travers
Does Postgres-XC support query parallelism (at least splitting the
query up for portions that run on different nodes)?  They just
released 1.0.  I don't know if this sort of thing is supported there
and it might be overkill at any rate.

Best Wishes,
Chris Travers




[GENERAL] NULL value comparison

2012-08-22 Thread Michael Sacket
Good Day,

I'm trying to figure out why a postgresql query doesn't return what I'd expect 
with a query like this where there are NULL values:

select * from users where is_enabled <> 'Y';

I'm expecting it to return all records where is_enabled is 'N' or NULL.  
Perhaps my expectations are misguided.  Any thoughts would be appreciated.

Thanks!
Michael


Example:

CREATE TABLE users (
name char(50) NOT NULL,
is_enabled char
);

insert into users (name, is_enabled) values ('Michael', 'Y');
insert into users (name, is_enabled) values ('Jeremy', 'N');
insert into users (name, is_enabled) values ('Sherry', NULL);


select * from users where is_enabled <> 'Y';
+--------+------------+
| name   | is_enabled |
+--------+------------+
| Jeremy | N          |
+--------+------------+
1 rows in set (0.03 sec)







Re: [GENERAL] NULL value comparison

2012-08-22 Thread Pavel Stehule
2012/8/22 Michael Sacket msac...@gammastream.com:
 Good Day,

 I'm trying to figure out why a postgresql query doesn't return what I'd 
 expect with a query like this where there are NULL values:

 select * from users where is_enabled <> 'Y';

 I'm expecting it to return all records where is_enabled is 'N' or NULL.  
 Perhaps my expectations are misguided.  Any thoughts would be appreciated.

no - NULL is not comparable with any value

your query should be

WHERE is_enabled <> 'Y' OR is_enabled IS NULL

or

WHERE is_enabled IS DISTINCT FROM 'Y'
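The behavioral difference is easy to demonstrate with any SQL engine; a sketch using Python's built-in sqlite3, where the `IS NOT` operator plays roughly the role PostgreSQL gives to IS DISTINCT FROM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, is_enabled TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Michael", "Y"), ("Jeremy", "N"), ("Sherry", None)])

# Three-valued logic: NULL <> 'Y' evaluates to unknown, so the WHERE
# clause drops the NULL row.
plain = [n for (n,) in conn.execute(
    "SELECT name FROM users WHERE is_enabled <> 'Y' ORDER BY name")]

# NULL-aware inequality includes the NULL row (SQLite's IS NOT here;
# in PostgreSQL you would write IS DISTINCT FROM 'Y').
distinct = [n for (n,) in conn.execute(
    "SELECT name FROM users WHERE is_enabled IS NOT 'Y' ORDER BY name")]

print(plain)     # ['Jeremy']
print(distinct)  # ['Jeremy', 'Sherry']
```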

Regards

Pavel Stehule


 Thanks!
 Michael


 Example:

 CREATE TABLE users (
 name char(50) NOT NULL,
 is_enabled char
 )

 insert into users (name, is_enabled) values ('Michael', 'Y');
 insert into users (name, is_enabled) values ('Jeremy', 'N');
 insert into users (name, is_enabled) values ('Sherry', NULL);


 select * from users where is_enabled <> 'Y';
 +--------+------------+
 | name   | is_enabled |
 +--------+------------+
 | Jeremy | N          |
 +--------+------------+
 1 rows in set (0.03 sec)







Re: [GENERAL] NULL value comparison

2012-08-22 Thread Adrian Klaver

On 08/22/2012 06:23 AM, Michael Sacket wrote:

Good Day,

I'm trying to figure out why a postgresql query doesn't return what I'd expect 
with a query like this where there are NULL values:

select * from users where is_enabled <> 'Y';

I'm expecting it to return all records where is_enabled is 'N' or NULL.  
Perhaps my expectations are misguided.  Any thoughts would be appreciated.


See below for explanation:
http://www.postgresql.org/docs/9.1/interactive/functions-comparison.html



Thanks!
Michael





--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] NULL value comparison

2012-08-22 Thread David Johnston
On Aug 22, 2012, at 9:23, Michael Sacket msac...@gammastream.com wrote:

 Good Day,
 
 I'm trying to figure out why a postgresql query doesn't return what I'd 
 expect with a query like this where there are NULL values:
 
 select * from users where is_enabled <> 'Y';
 
 I'm expecting it to return all records where is_enabled is 'N' or NULL.  
 Perhaps my expectations are misguided.  Any thoughts would be appreciated.

The only record known to be not equal to 'Y' is 'N', since it is possible the
unknown value represented by NULL could be 'Y'.  If you really want both you
need to use IS DISTINCT FROM.

http://www.postgresql.org/docs/9.1/static/functions-comparison.html

Note a useful alternative is

COALESCE(is_enabled, 'N') <> 'Y'

This explicitly indicates that unknown values are to be treated as 'N'.

A better solution is to not allow NULL values in the first place.  Add a NOT NULL
constraint on the column and a DEFAULT expression on the table as well.

You should consider enums and/or a check constraint for allowed values as well.
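Putting those suggestions together (NOT NULL, a DEFAULT, and a CHECK constraint for the legal values), here is a sketch via Python's sqlite3; essentially the same DDL is accepted by PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL + DEFAULT keeps the unknown state out of the table entirely,
# and CHECK restricts the flag to its legal values.
conn.execute("""
    CREATE TABLE users (
        name       TEXT NOT NULL,
        is_enabled TEXT NOT NULL DEFAULT 'N'
                   CHECK (is_enabled IN ('Y', 'N'))
    )
""")

conn.execute("INSERT INTO users (name) VALUES ('Sherry')")  # takes the default
default_val = conn.execute(
    "SELECT is_enabled FROM users WHERE name = 'Sherry'").fetchone()[0]

rejected = []
for name, flag in [("Bob", None), ("Eve", "X")]:
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", (name, flag))
    except sqlite3.IntegrityError:
        rejected.append(name)   # NULL and out-of-range values both fail

print(default_val, rejected)  # N ['Bob', 'Eve']
```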

 
 Thanks!
 Michael
 
 
 Example:
 
 CREATE TABLE users (
   name char(50) NOT NULL,
   is_enabled char
 )
 
 insert into users (name, is_enabled) values ('Michael', 'Y');
 insert into users (name, is_enabled) values ('Jeremy', 'N');
 insert into users (name, is_enabled) values ('Sherry', NULL);
 
 
 select * from users where is_enabled <> 'Y';
 +--------+------------+
 | name   | is_enabled |
 +--------+------------+
 | Jeremy | N          |
 +--------+------------+
 1 rows in set (0.03 sec)
 
 
 
 
 


Re: [GENERAL] NULL value comparison

2012-08-22 Thread Michael Sacket
Thank you all very much!

Unfortunately I can't change the query... but I can modify the data.  I updated 
the NULL values to 'N' and put the appropriate NOT NULL constraint and a 
default value of 'N'.

On Aug 22, 2012, at 8:37 AM, David Johnston wrote:

 On Aug 22, 2012, at 9:23, Michael Sacket msac...@gammastream.com wrote:
 
 Good Day,
 
 I'm trying to figure out why a postgresql query doesn't return what I'd 
 expect with a query like this where there are NULL values:
 
 select * from users where is_enabled <> 'Y';
 
 I'm expecting it to return all records where is_enabled is 'N' or NULL.  
 Perhaps my expectations are misguided.  Any thoughts would be appreciated.
 
 The only record known to be not equal to 'Y' is 'N', since it is possible the
 unknown value represented by NULL could be 'Y'.  If you really want both you
 need to use IS DISTINCT FROM.
 
 http://www.postgresql.org/docs/9.1/static/functions-comparison.html
 
 Note a useful alternative is
 
 COALESCE(is_enabled, 'N') <> 'Y'
 
 This explicitly indicates that unknown values are to be treated as 'N'
 
 A better solution is to not allow NULL values in the first place.  Add a NOT
 NULL constraint on the column and a DEFAULT expression on the table as well.
 
 You should consider enums and/or a check constraint for allowed values as 
 well.
 
 
 Thanks!
 Michael
 
 
 Example:
 
 CREATE TABLE users (
  name char(50) NOT NULL,
  is_enabled char
 )
 
 insert into users (name, is_enabled) values ('Michael', 'Y');
 insert into users (name, is_enabled) values ('Jeremy', 'N');
 insert into users (name, is_enabled) values ('Sherry', NULL);
 
 
 select * from users where is_enabled <> 'Y';
 +--------+------------+
 | name   | is_enabled |
 +--------+------------+
 | Jeremy | N          |
 +--------+------------+
 1 rows in set (0.03 sec)
 
 
 
 
 


Re: [GENERAL] Amazon High I/O instances

2012-08-22 Thread Vincent Veyron
On Wednesday, August 22, 2012 at 13:15 +0800, Craig Ringer wrote:

 He appears to be suggesting that buying access to real hardware in a 
 datacenter (if not buying the hardware yourself) is more cost effective 
 and easier to manage than using cloud style services with more 
 transient hosts like EC2 offers. At least that's how I understood it. 

Hi Craig,

Actually, my comments about costs were misleading: I simply reacted to
the fact that the OP wanted to test his application for high
performance, and thought that it would be easier with bare metal rather
than with AWS, because you have fewer parameters to control this way.

Also, I'll admit that I jumped the gun without reading about the SSD
offer by Amazon. Still, I would test first with a machine that I
control, but it may be that Sébastien already did that.

I am curious to know what kind of application requires 10s to 100s of
instances with a PostgreSQL database, because that could get unwieldy
with big data (which I assumed from the high performance specification).

-- 
Vincent Veyron
http://marica.fr/
IT management of insurance claims and litigation files for legal
departments





Re: [GENERAL] Database Bloat

2012-08-22 Thread Jerry Sievers
elliott elli...@cpi.com writes:

 Hi,

 I am using PostgreSQL 9.1 and loading very large tables ( 13 million
 rows each ).  The flat file size is only 25M.  However, the equivalent
 database table is 548MB.  This is without any indexes applied and auto
 vacuum  turned on.  I have read that the bloat can be around 5 times
 greater for tables than flat files so over 20 times seems quite
 excessive.

 Any ideas on how to go about decreasing this bloat or is this not
 unexpected for such large tables?

Well, check if the table has a low fill-factor setting.

\d+ footable

Hasn't come up yet in the remarks by others on this thread but worth verifying.

 Thanks




-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 732.216.7255




Re: [GENERAL] Database Bloat

2012-08-22 Thread Jeff Janes
On Mon, Aug 20, 2012 at 10:53 AM, elliott elli...@cpi.com wrote:
 Hi,

 I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows
 each ).  The flat file size is only 25M.

That is only 2 bytes per row.  Is the size given for the flat file for
a compressed file?

Cheers,

Jeff




Re: [GENERAL] Database Bloat

2012-08-22 Thread elliott

Yes, it is a tif file.  Uncompressed it is around 85M.

On 8/22/2012 1:20 PM, Jeff Janes wrote:

On Mon, Aug 20, 2012 at 10:53 AM, elliottelli...@cpi.com  wrote:
   

Hi,

I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows
each ).  The flat file size is only 25M.
 

That is only 2 bytes per row.  Is the size given for the flat file for
a compressed file?

Cheers,

Jeff

   






Re: [GENERAL] NULL value comparison

2012-08-22 Thread Thalis Kalfigkopoulos
Hi Michael.

NULL is not any specific value, so Pg correctly does not tell you that it
is 'Y'. NULL means that we don't know the value, so it may be 'Y'
as much as it may not be 'Y'. The comparison is not applicable in the case
of NULL, and that's why there are the IS NULL and IS NOT NULL operators.

Regards,
Thalis

On Aug 22, 2012 10:24 AM, Michael Sacket msac...@gammastream.com wrote:

 Good Day,

 I'm trying to figure out why a postgresql query doesn't return what I'd
expect with a query like this where there are NULL values:

 select * from users where is_enabled <> 'Y';

 I'm expecting it to return all records where is_enabled is 'N' or NULL.
 Perhaps my expectations are misguided.  Any thoughts would be appreciated.

 Thanks!
 Michael


 Example:

 CREATE TABLE users (
 name char(50) NOT NULL,
 is_enabled char
 )

 insert into users (name, is_enabled) values ('Michael', 'Y');
 insert into users (name, is_enabled) values ('Jeremy', 'N');
 insert into users (name, is_enabled) values ('Sherry', NULL);


 select * from users where is_enabled <> 'Y';
 +--------+------------+
 | name   | is_enabled |
 +--------+------------+
 | Jeremy | N          |
 +--------+------------+
 1 rows in set (0.03 sec)







Re: [GENERAL] Amazon High I/O instances

2012-08-22 Thread Sébastien Lorion
Vincent, I would appreciate that you stop assuming things based on zero
information about what I am doing. I understand that you are trying to be
helpful, but I can assure you that going bare-metal only does not make any
sense in my context.

Sébastien

On Wed, Aug 22, 2012 at 12:44 PM, Vincent Veyron vv.li...@wanadoo.frwrote:

 On Wednesday, August 22, 2012 at 13:15 +0800, Craig Ringer wrote:

  He appears to be suggesting that buying access to real hardware in a
  datacenter (if not buying the hardware yourself) is more cost effective
  and easier to manage than using cloud style services with more
  transient hosts like EC2 offers. At least that's how I understood it.

 Hi Craig,

 Actually, my comments about costs were misleading : I simply reacted to
 the fact that the OP wanted to test his application for high
 performance, and thought that it would be easier with bare metal rather
 than with AWS, because you have less parameters to control this way.

 Also, I'll admit that I jumped the gun without reading about the SSD
 offer by Amazon. Still, I would test first with a machine that I
 control, but it maybe that Sébastien already did that.

 I am curious to know what kind of application requires 10s to 100s of
 instances with a PostgreSQL database, because that could get unwieldy
 with big data (which I assumed from the high performance specification)

 --
 Vincent Veyron
 http://marica.fr/
 Gestion informatique des sinistres d'assurances et des dossiers
 contentieux pour le service juridique




Re: [GENERAL] Database Bloat

2012-08-22 Thread Merlin Moncure
On Wed, Aug 22, 2012 at 12:25 PM, elliott elli...@cpi.com wrote:
 Yes, it is a tif file.  Uncompressed it is around 85M.

ok, 85MB -> 548MB is reasonable considering you have very narrow rows
and an index that covers 2/3 of your column data.   if you want to see
a dramatic reduction in table size, you probably need to explore use of
arrays in some fashion.
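The back-of-envelope arithmetic behind that assessment, assuming PostgreSQL's typical fixed per-row costs of roughly a 23-byte heap tuple header (padded) plus a 4-byte line pointer (approximations, not measurements from this particular table):

```python
# 13 million rows in 548MB works out to ~44 bytes per row.
rows = 13_000_000
table_bytes = 548 * 1024 * 1024
bytes_per_row = table_bytes / rows

# Subtract the assumed fixed overhead (~24 bytes of padded tuple header
# plus a 4-byte line pointer): only ~16 bytes per row are actual data,
# so comparing against a 25MB compressed flat file greatly overstates
# the "bloat".
overhead = 24 + 4
payload = bytes_per_row - overhead

print(round(bytes_per_row, 1), round(payload, 1))  # 44.2 16.2
```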

merlin




Re: [GENERAL] Amazon High I/O instances

2012-08-22 Thread Andrew Hannon
Just looking into High IO instances for a DB deployment. In order to get past 
1TB, we are looking at RAID-0. I have heard 
(http://hackerne.ws/item?id=4266119) there might be a problem if TRIM isn't 
supported. Does anyone know if it is and has anyone used RAID-0 on these 
instances? (Linux of course…) 

On Aug 21, 2012, at 9:36 AM, Merlin Moncure wrote:
 On Tue, Aug 21, 2012 at 12:33 AM, Sébastien Lorion
 s...@thestrangefactory.com wrote:
 Hello,
 
 Since Amazon has added new high I/O instance types and EBS volumes, anyone
 has done some benchmark of PostgreSQL on them ?
 
 http://perspectives.mvdirona.com/2012/07/20/IOPerformanceNoLongerSucksInTheCloud.aspx
 http://perspectives.mvdirona.com/2012/08/01/EBSProvisionedIOPSOptimizedInstanceTypes.aspx
 http://aws.typepad.com/aws/2012/08/fast-forward-provisioned-iops-ebs.html
 
 I will be testing my app soon, but was curious to know if others have done
 some tests so I can compare / have a rough idea to what to expect. Looking
 on Google, I found an article about MySQL
 (http://palominodb.com/blog/2012/07/24/palomino-evaluates-amazon%E2%80%99s-new-high-io-ssd-instances),
 but nothing about PostgresSQL.
 
 here's a datapoint, stock config:
 pgbench -i -s 500
 pgbench -c 16 -T 60
 number of transactions actually processed: 418012
 tps = 6962.607292 (including connections establishing)
 tps = 6973.154593 (excluding connections establishing)
 
 not too shabby.  this was run by a friend who is evaluating high i/o
 instances for their high load db servers.   we didn't have time to
 kick off a high scale read only test unfortunately.
 
 Regarding 'AWS vs bare metal', I think high i/o instances fill a huge
 niche in their lineup.   Dollar for dollar, I'm coming around to the
 point of view that dealing with aws is a cheaper/more effective
 solution than renting out space from a data center or (even worse)
 running your own data center unless you're very large or have other
 special requirements.  Historically the problem with AWS is that you
 had no solution for highly transaction bound systems which forced you
 to split your environment which ruined most of the benefit, and they
 fixed that.
 
 merlin


[GENERAL] Statistical aggregates with intervals

2012-08-22 Thread Thomas Munro
Hi

I noticed that 'avg' works on 'interval', but 'stddev' and 'variance' don't:

hack=> create table test (start_time timestamptz, end_time timestamptz);
CREATE TABLE
hack=> insert into test values (now(), now() + interval '1 second');
INSERT 0 1
hack=> insert into test values (now(), now() + interval '1 second');
INSERT 0 1
hack=> insert into test values (now(), now() + interval '4 second');
INSERT 0 1
hack=> select avg(end_time - start_time) from test;
   avg
----------
 00:00:02
(1 row)

hack=> select stddev(end_time - start_time) from test;
ERROR:  function stddev(interval) does not exist
LINE 1: select stddev(end_time - start_time) from test;
               ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.

Sure enough the standard deviation of time intervals can be computed
by following that hint:

hack=> select interval '1 second' * stddev(extract(epoch from end_time
- start_time)) as stddev from test;
     stddev
-----------------
 00:00:01.732051
(1 row)
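As a sanity check on that transform: PostgreSQL's stddev is the sample standard deviation, and the same number falls out of the three example intervals in Python:

```python
from statistics import stdev
from datetime import timedelta

# The three intervals from the session above: 1s, 1s, 4s.
intervals = [timedelta(seconds=s) for s in (1, 1, 4)]

# Same shape as the SQL workaround: extract epoch seconds, take the
# sample standard deviation, then turn the result back into an interval.
seconds = [iv.total_seconds() for iv in intervals]
result = timedelta(seconds=stdev(seconds))

print(result)  # 0:00:01.732051, i.e. sqrt(3) seconds
```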

But is there some way I can use CREATE AGGREGATE to define stddev for
intervals in terms of the built-in stddev aggregate, just transforming
the inputs and output?  Or am I missing something fundamental that
explains why stddev(interval) isn't supported?

Thanks!




Re: [GENERAL] Database Bloat

2012-08-22 Thread John R Pierce

On 08/22/12 10:25 AM, elliott wrote:
Yes, it is a tif file.  Uncompressed it is around 85M. 


a tif file is a pixel map image, e.g. graphics, no? I thought we were 
talking about CSV data here?




--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast





Re: [GENERAL] Amazon High I/O instances

2012-08-22 Thread Alan Hodgson
On Wednesday, August 22, 2012 04:10:01 PM Andrew Hannon wrote:
 Just looking into High IO instances for a DB deployment. In order to get
 past 1TB, we are looking at RAID-0. I have heard
 (http://hackerne.ws/item?id=4266119) there might be a problem if TRIM isn't
 supported. Does anyone know if it is and has anyone used RAID-0 on these
 instances? (Linux of course…)

Just use LVM striping. The idea that missing TRIM would be a problem seems 
to be mostly conjecture.

I note that the SSDs are only instance storage. The data will be gone when the 
instance goes away. I have used instance storage in replicated setups but it 
always feels rather fragile unless your data really is transient or you can 
maintain 2 replicas.

Their other new service, provisioned IOPS for EBS, might be more useful for a 
persistent database. Although not nearly SSD speeds, of course.




[GENERAL] Migrating from 8.3 to 9.1 - date/time storage types do not match

2012-08-22 Thread Scott Briggs
So we have a large TB database that we need to migrate to 9.1 and I'm
wondering if there's a way to do this process in stages.

Since the date/time storage types changed between 8.3 and 8.4, I
realize we'll have to dump the database and my plan is to create a
backup using pg_start_backup(), run that backup on another server
using postgres 8.3 which I could then dump and import into a 9.1
server.

My question is, is there some way I can take WAL files created since
the beginning of the original backup on the 8.3 server, convert those
into text that I could then apply to the 9.1 server?  The mysql
equivalent to this is mysqlbinlog which you can apply to mysql binary
log files.

Cheers,
Scott




Re: [GENERAL] Migrating from 8.3 to 9.1 - date/time storage types do not match

2012-08-22 Thread John R Pierce

On 08/22/12 2:17 PM, Scott Briggs wrote:

So we have a large TB database that we need to migrate to 9.1 and I'm
wondering if there's a way to do this process in stages.

Since the date/time storage types changes between 8.3 and 8.4, I
realize we'll have to dump the database and my plan is to create a
backup using pg_start_backup(), run that backup on another server
using postgres 8.3 which I could then dump and import into a 9.1
server.

My question is, is there some way I can take WAL files created since
the beginning of the original backup on the 8.3 server, convert those
into text that I could then apply to the 9.1 server?  The mysql
equivalent to this is mysqlbinlog which you can apply to mysql binary
log files.



postgres makes changes to the binary format with every x.y version; 
there are far more changes than just the binary date/time format.



those WAL files contain, for all practical purposes, binary deltas to 
the data files.  they can only be applied to an exact file-by-file 
duplicate of the original.  when you do your dump/import from 8.x to 
9.1, there's no resemblance between the original and the 9.1 files.




--
john r pierceN 37, W 122
santa cruz ca mid-left coast





Re: [GENERAL] Migrating from 8.3 to 9.1 - date/time storage types do not match

2012-08-22 Thread Alan Hodgson
On Wednesday, August 22, 2012 05:17:10 PM Scott Briggs wrote:
 So we have a large TB database that we need to migrate to 9.1 and I'm
 wondering if there's a way to do this process in stages.
 
 Since the date/time storage types changes between 8.3 and 8.4, I
 realize we'll have to dump the database and my plan is to create a
 backup using pg_start_backup(), run that backup on another server
 using postgres 8.3 which I could then dump and import into a 9.1
 server.
 
 My question is, is there some way I can take WAL files created since
 the beginning of the original backup on the 8.3 server, convert those
 into text that I could then apply to the 9.1 server?  The mysql
 equivalent to this is mysqlbinlog which you can apply to mysql binary
 log files.

The most likely way to get this done is with Slony. Set up a Slony slave, 
upgrade the slave to 9.1 with a dump/reload, run it and let Slony catch it up, 
and then promote it to be the Slony cluster master and switch your clients 
over.

Slony is table-based and certainly not trivial to set up, but it'll work, 
provided your change rate isn't too horrendously high.

The Slony tables need some updating after a dump/reload to fix OID issues. 
There are tools included to do that. You'd need to experiment to nail down the 
process. And you would want to make sure no one is making DDL changes outside 
of Slony during the whole process.

If you aren't already familiar with Slony, this is probably weeks of work to 
fully prepare for and get right. But it will let you avoid the downtime.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Can column name aliases be supported?

2012-08-22 Thread Gauthier, Dave
Here's the problem

I have a table with a column called last_name.  I have one customer who likes 
to articulate queries and updates for this using the column name last_name (no 
problem there), but another who likes to call it lname, and yet another who 
likes to call it surname.  So 3 different names for the same physical 
column in the table.

I know a view can be used to alias one of them. But what about the other 2?  
Three views (or 2 views and the main table) is awkward.  Also, you can't modify 
the data through a view.

I thought of having a table with the 3 column names in there: last_name, 
lname and surname.  The queries should work fine.  As for the updates and 
inserts, have a before trigger make sure they're all synched up.  But that 
only works if only one of the values changes; otherwise how do you know which 
one to sync up to!  Hmmm.

Does anyone know of an elegant way to do this?

Thanks in Advance


Re: [GENERAL] Can column name aliases be supported?

2012-08-22 Thread Bosco Rama
On 08/22/12 15:19, Gauthier, Dave wrote:
 
 I know a view can be used to alias one of them.

It can alias all of them:

create view xyz as
   select *, last_name as lname, last_name as surname
  from mytable;

(not the nicest version but functional)

HTH.

Bosco.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can column name aliases be supported?

2012-08-22 Thread Rob Sargent

On 08/22/2012 04:19 PM, Gauthier, Dave wrote:

Here's the problem

I have a table with a column called last_name.  I have one customer
who likes to articulate queries and updates for this using column name
last_name (no problem there) but another who likes to call it lname
and yet another who likes to call it surname.So 3 different names
for the same physical column in the table.

I know a view can be used to alias one of them. But what about the other
2?  Three views (or 2 views and the main table) is awkward.  Also, you
can't modify the data through a view.

I thought of having a table with the 3 column names in there
last_name, lname and surname.  The queries should work fine.  As
for the updates and inserts,  have a before trigger make sure they're
all synched up.  But that only works if only one of the value changes,
otherwise how do you know which one to sync up to! H.

Does anyone know of an elegant way to do this?

Thanks in Advance

What environment lets the customer articulate queries using apparently 
random choices for column names?


If the customers are seeing (in their client-side vision) their 
flavour of column name, the same magic presenting that flavour might 
then be responsible for deconvolving same?


Else some nasty per client sed scripts are in your future :)

rjs


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can column name aliases be supported?

2012-08-22 Thread Bosco Rama
Ooops!  Hit send too fast...

On 08/22/12 15:34, Bosco Rama wrote:
 On 08/22/12 15:19, Gauthier, Dave wrote:
 
 I know a view can be used to alias one of them.
 
 It can alias all of them:
 
 create view xyz as
select *, last_name as lname, last_name as surname
   from mytable;
 
 (not the nicest version but functional)

... and then use an INSTEAD/ALSO rule to do the update, if needed.
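
A minimal sketch of such a rule, assuming mytable has a primary key
column "id" (hypothetical here; not from the thread):

```sql
-- Hypothetical: route updates on the aliasing view back to the base table.
-- Assumes mytable has a primary key column "id".
CREATE RULE xyz_update AS ON UPDATE TO xyz
    DO INSTEAD
    UPDATE mytable
       SET last_name = NEW.last_name
     WHERE id = OLD.id;
```

Each aliased column (lname, surname) would need its own handling, which
is where the ambiguity Dave worries about reappears.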

Bosco.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database Bloat

2012-08-22 Thread David Johnston
 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of John R Pierce
 Sent: Wednesday, August 22, 2012 4:32 PM
 To: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Database Bloat
 
 On 08/22/12 10:25 AM, elliott wrote:
  Yes, it is a tif file.  Uncompressed it is around 85M.
 
 a tif file is a pixel map image, eg, graphics, no? I thought we were
 talking about CSV data here?
 

If I had to hazard a guess I think he meant TAR since he is talking about
compression...

David J.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Migrating from 8.3 to 9.1 - date/time storage types do not match

2012-08-22 Thread Alan Hodgson
On Wednesday, August 22, 2012 02:43:05 PM Alan Hodgson wrote:
 The most likely way to get this done is with Slony. Setup a Slony slave,
 upgrade the slave to 9.1 with a dump/reload, run it and let Slony catch it
 up, and then promote it to be the Slony cluster master and switch your
 clients over.
 

Or, of course, just set up a clean 9.1 and let Slony populate it. Duh. Too long 
since I did that ...



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Mike Christensen
I'd like to import this data into a Postgres database:

http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

However, I'm not quite sure what format this is.  It's definitely not
CSV.  Here's an example of a few rows:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER
OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese,
blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01005~^~0100~^~Cheese,
brick~^~CHEESE,BRICK~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01006~^~0100~^~Cheese,
brie~^~CHEESE,BRIE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01007~^~0100~^~Cheese,
camembert~^~CHEESE,CAMEMBERT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01008~^~0100~^~Cheese,
caraway~^~CHEESE,CARAWAY~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01009~^~0100~^~Cheese,
cheddar~^~CHEESE,CHEDDAR~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01010~^~0100~^~Cheese,
cheshire~^~CHEESE,CHESHIRE~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01011~^~0100~^~Cheese,
colby~^~CHEESE,COLBY~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

Is there an easy way to get this into PG, or a tool I can download for
this, or do I need to parse it myself with a script or something?
Thanks!


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance implications of numeric?

2012-08-22 Thread Craig Ringer

On 08/23/2012 12:48 AM, Wells Oliver wrote:

Hey, thanks for your feedback. Just to clarify: pg_total_relation_size
returns bytes, correct?


Please reply to the list, not directly to me.

Yes, pg_total_relation_size returns bytes. The documentation 
(http://www.postgresql.org/docs/9.1/static/functions-admin.html) doesn't 
seem to explicitly say that for pg_total_relation_size though it does 
for pg_relation_size and other functions.


Use pg_size_pretty to convert bytes to human values for display.
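
For example (mytable is a hypothetical table name):

```sql
SELECT pg_size_pretty(pg_total_relation_size('mytable'));
-- returns something like "73 MB" instead of a raw byte count
```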

--
Craig Ringer



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Rob Sargent

On 08/22/2012 06:23 PM, Mike Christensen wrote:

I'd like to import this data into a Postgres database:

http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

However, I'm not quite sure what format this is.  It's definitely not
CSV.  Here's an example of a few rows:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER
OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese,
blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01005~^~0100~^~Cheese,
brick~^~CHEESE,BRICK~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01006~^~0100~^~Cheese,
brie~^~CHEESE,BRIE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01007~^~0100~^~Cheese,
camembert~^~CHEESE,CAMEMBERT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01008~^~0100~^~Cheese,
caraway~^~CHEESE,CARAWAY~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01009~^~0100~^~Cheese,
cheddar~^~CHEESE,CHEDDAR~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01010~^~0100~^~Cheese,
cheshire~^~CHEESE,CHESHIRE~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01011~^~0100~^~Cheese,
colby~^~CHEESE,COLBY~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

Is there an easy way to get this into PG, or a tool I can download for
this, or do I need to parse it myself with a script or something?
Thanks!




Looks CSV-ish to me, with '^' as the separator (after all, CSV can be read 
as Character-Separated Data)


rjs



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Rob Sargent

On 08/22/2012 06:23 PM, Mike Christensen wrote:

I'd like to import this data into a Postgres database:

http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

However, I'm not quite sure what format this is.  It's definitely not
CSV.  Here's an example of a few rows:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER
OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese,
blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01005~^~0100~^~Cheese,
brick~^~CHEESE,BRICK~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01006~^~0100~^~Cheese,
brie~^~CHEESE,BRIE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01007~^~0100~^~Cheese,
camembert~^~CHEESE,CAMEMBERT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01008~^~0100~^~Cheese,
caraway~^~CHEESE,CARAWAY~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01009~^~0100~^~Cheese,
cheddar~^~CHEESE,CHEDDAR~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01010~^~0100~^~Cheese,
cheshire~^~CHEESE,CHESHIRE~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01011~^~0100~^~Cheese,
colby~^~CHEESE,COLBY~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

Is there an easy way to get this into PG, or a tool I can download for
this, or do I need to parse it myself with a script or something?
Thanks!



oops, more like a funky string sep
~^~ between values, single tilde for start of record
the nasty bit will be to undo the exponentiation

rjs



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Mike Christensen
On Wed, Aug 22, 2012 at 5:34 PM, Bosco Rama postg...@boscorama.com wrote:
 On 08/22/12 17:23, Mike Christensen wrote:
 I'd like to import this data into a Postgres database:

 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

 However, I'm not quite sure what format this is.  It's definitely not
 CSV.  Here's an example of a few rows:

 ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
 SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/sr24_doc.pdf

 Section Relation Files (Logical page 25)

Okay, so is there a way to import these "Relation Files" into Postgres?

The alternative would be to download the Excel version, then Save as
CSV, and import into Postgres via the COPY command.
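
That route might look like this in psql (hypothetical file and table
names; the real table has many more columns):

```sql
-- Assuming the spreadsheet was saved as foods.csv with a header row:
CREATE TABLE foods (ndb_no text, food_group text, long_desc text /* ... */);
\copy foods FROM 'foods.csv' WITH CSV HEADER
```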

Mike


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Edson Richter
For me it seems to be CSV, but you have '~' in place of double quotes, and 
'^' as the separator.


Regards,

Edson.


Em 22/08/2012 21:23, Mike Christensen escreveu:

I'd like to import this data into a Postgres database:

http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

However, I'm not quite sure what format this is.  It's definitely not
CSV.  Here's an example of a few rows:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER
OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese,
blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01005~^~0100~^~Cheese,
brick~^~CHEESE,BRICK~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01006~^~0100~^~Cheese,
brie~^~CHEESE,BRIE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01007~^~0100~^~Cheese,
camembert~^~CHEESE,CAMEMBERT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01008~^~0100~^~Cheese,
caraway~^~CHEESE,CARAWAY~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01009~^~0100~^~Cheese,
cheddar~^~CHEESE,CHEDDAR~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01010~^~0100~^~Cheese,
cheshire~^~CHEESE,CHESHIRE~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
~01011~^~0100~^~Cheese,
colby~^~CHEESE,COLBY~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

Is there an easy way to get this into PG, or a tool I can download for
this, or do I need to parse it myself with a script or something?
Thanks!






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Alternatives to very large tables with many performance-killing indicies?

2012-08-22 Thread Jeff Janes
On Thu, Aug 16, 2012 at 1:54 PM, Wells Oliver wellsoli...@gmail.com wrote:
 Hey folks, a question. We have a table that's getting large (6 million rows
 right now, but hey, no end in sight).

Does it grow in chunks, or one row at a time?

 It's wide-ish, too, 98 columns.

How many of the columns are NULL for any given row?  Or perhaps
better, what is the distribution of values for any given column?  For
a given column, is there some magic value (NULL, 0, 1, -1, , '')
which most of the rows have?

 The problem is that each of these columns needs to be searchable quickly at
 an application level, and I'm far too responsible an individual to put 98
 indexes on a table.

That is somewhat melodramatic.  Sure, creating 98 indexes does not
come for free.  And it is great that you are aware of this.  But just
because they are not free does not mean they are not worth their cost.
 Look at all the other costs of using a RDBMS.  Each letter of ACID
does not come for free.  But it is often worth the price.

In the generic case, you have a large amount of data to index.
Indexing a lot of data requires a lot of resources.  There is no magic
bullet for this.

 Wondering what you folks have come across in terms of
 creative solutions that might be native to postgres. I can build something
 that indexes the data and caches it and runs separately from PG, but I
 wanted to exhaust all native options first.

If the data is frequently updated/inserted, then how would you
invalidate the cache when needed?  And if the data is not frequently
updated/inserted, then what about the obvious PG solution (building 96
indexes) is a problem?

If your queries are of the nature of:

where
  col1=:1 or
  col2=:1 or
  col3=:1 or
...
  col96=:1

then a full text index would probably be a better option.

Otherwise, it is hard to say.  You could replace 96 columns with a
single hstore column which has 96 different keys.  But from what I can
tell, maintaining a gin index on that hstore column would probably be
slower than maintaining 96 individual btree indexes.

And if you go with a gist index on the single hstore column, the cost
of maintenance is greatly reduced relative to gin.  But the index is
basically useless, you might as well just drop the index and do the
full table scan instead.
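
As a sketch of the hstore variant Jeff describes (hypothetical table and
key names; hstore is a contrib extension):

```sql
CREATE EXTENSION hstore;  -- contrib module, available as an extension in 9.1+

CREATE TABLE wide_table (
    id    int PRIMARY KEY,
    attrs hstore   -- the sparse columns collapsed into key/value pairs
);

-- One GIN index covers containment lookups on every key:
CREATE INDEX wide_table_attrs_gin ON wide_table USING gin (attrs);

-- "where col17 = 42" becomes a containment query:
SELECT id FROM wide_table WHERE attrs @> 'col17=>42';
```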

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Bosco Rama
On 08/22/12 17:23, Mike Christensen wrote:
 I'd like to import this data into a Postgres database:
 
 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip
 
 However, I'm not quite sure what format this is.  It's definitely not
 CSV.  Here's an example of a few rows:
 
 ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
 SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/sr24_doc.pdf

Section Relation Files (Logical page 25)

HTH

Bosco.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] NULL value comparison

2012-08-22 Thread Craig Ringer

On 08/22/2012 09:37 PM, David Johnston wrote:

On Aug 22, 2012, at 9:23, Michael Sacket msac...@gammastream.com wrote:


Good Day,

I'm trying to figure out why a postgresql query doesn't return what I'd expect 
with a query like this where there are NULL values:

select * from users where is_enabled <> 'Y';

I'm expecting it to return all records where is_enabled is 'N' or NULL.  
Perhaps my expectations are misguided.  Any thoughts would be appreciated.


The only records known to be not equal to 'Y' are the 'N' ones, since it is possible the unknown 
value represented by NULL could be 'Y'.  If you really want both, you need to use IS DISTINCT FROM.
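
A sketch of that form, using the users table from the original question:

```sql
-- IS DISTINCT FROM treats NULL as a comparable value:
SELECT * FROM users WHERE is_enabled IS DISTINCT FROM 'Y';
-- returns both the 'N' rows and the NULL rows
```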



http://sqlblog.com/blogs/paul_nielsen/archive/2007/11/11/the-real-problem-with-null.aspx

Teaching that NULL means "unknown" tends to lead to confusion down the 
track, in cases where NULL means "no value" or "bork bork oogabooga" 
instead.


Null is interpreted as the known value "no value" by aggregate 
functions; were that not the case, the result of:


regress=# SELECT SUM(i) FROM ( VALUES (1),(2),(NULL),(3) ) x(i);
 sum
-
  6
(1 row)


would be NULL, not 6, and the result of:

regress=# SELECT SUM(i) FROM  generate_series(1,0) i;
 sum
-

(1 row)

ie a sum on no values would not make sense; it's "no value" here, not 
"unknown".


Null isn't consistent in meaning, and trying to treat it as "unknown" 
just leads to confusion. It'd be nice if SQL had separate UNKNOWN and 
NO_VALUE_OR_NA keywords instead of NULL, but alas, it doesn't - and 
I'm not sure that'd cover all the cases either.





--
Craig Ringer


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread John R Pierce

On 08/22/12 5:40 PM, David Johnston wrote:

The first delimiter is ~^~ (tilde-carat-tilde)

The last field is itself delimited with just ^ (carat)


simpler than that, ~ is the QUOTE character, ^ is the field delimiter :)
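
John's reading can be checked with Python's csv module, which accepts
arbitrary quote and delimiter characters (a sketch; the sample line is
taken from this thread):

```python
import csv
import io

# One record from the USDA file, with '~' as the quote char and '^' as the delimiter.
sample = ('~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~'
          '^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87')

reader = csv.reader(io.StringIO(sample), delimiter='^', quotechar='~')
row = next(reader)

print(row[0])    # 01001
print(row[2])    # Butter, salted  (the comma survives because it was quoted)
print(len(row))  # 14
```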



--
john r pierceN 37, W 122
santa cruz ca mid-left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] NULL value comparison

2012-08-22 Thread Craig Ringer

On 08/22/2012 10:58 PM, Michael Sacket wrote:

Thank you all very much!

Unfortunately I can't change the query... but I can modify the data.  I updated 
the NULL values to 'N' and put the appropriate NOT NULL constraint and a 
default value of 'N'.



What tool/app is generating the query? They need to be told they're 
doing something wrong and unsafe - unless it's documented that the 
target column must be NOT NULL, anyway.


PostgreSQL has a workaround for one such wrong, broken and unsafe 
program, Microsoft Access. However, the workaround is limited to 
transforming "= NULL" to "IS NULL"; it doesn't actually change the 
semantics of NULL.


http://www.postgresql.org/docs/9.1/interactive/runtime-config-compatible.html#GUC-TRANSFORM-NULL-EQUALS
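
A sketch of that workaround (not recommended for new code; shown only
for illustration, using the users table from this thread):

```sql
SET transform_null_equals = on;
-- With the setting on, this is parsed as "is_enabled IS NULL":
SELECT * FROM users WHERE is_enabled = NULL;
```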

--
Craig Ringer




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Bosco Rama


On 08/22/12 17:41, Mike Christensen wrote:
 On Wed, Aug 22, 2012 at 5:38 PM, Mike Christensen m...@kitchenpc.com wrote:
 On Wed, Aug 22, 2012 at 5:34 PM, Bosco Rama postg...@boscorama.com wrote:
 On 08/22/12 17:23, Mike Christensen wrote:
 I'd like to import this data into a Postgres database:

 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

 However, I'm not quite sure what format this is.  It's definitely not
 CSV.  Here's an example of a few rows:

 ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
 SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/sr24_doc.pdf

 Section Relation Files (Logical page 25)

 Okay so is there a way to import Relational Files into Postgres?

Hmm.  with a little sed or awk you could convert them into CSV for copy.  Or
you could even just use copy with the '^' as delimiter and massage the tables
afterwards.
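
A sketch of the sed route (assuming fields never contain a literal '^'
or a double quote):

```shell
# Turn '~' quoting into '"' and '^' delimiters into ',' to get plain CSV.
echo '~01001~^~0100~^~Butter, salted~' | sed 's/\^/,/g; s/~/"/g'
# "01001","0100","Butter, salted"
```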

There is also pgloader which may help.  I haven't used it but it gets mentioned
here quite a lot.

 The alternative would be to download the Excel version, then Save as
 CSV, and import into Postgres via the COPY command.

That's the lowest effort course you have, I think.

 Oh, also if anyone knows of a way to export an Access database to
 Postgres, that might be helpful. I don't have a copy of Access.

mdb-tools?   http://mdbtools.sourceforge.net/

Not sure if it's up to date.

There are a few other things mentioned here (some links no longer work :-( ) :
http://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#Microsoft_Access

Bosco.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Mike Christensen
On Wed, Aug 22, 2012 at 5:38 PM, Mike Christensen m...@kitchenpc.com wrote:
 On Wed, Aug 22, 2012 at 5:34 PM, Bosco Rama postg...@boscorama.com wrote:
 On 08/22/12 17:23, Mike Christensen wrote:
 I'd like to import this data into a Postgres database:

 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

 However, I'm not quite sure what format this is.  It's definitely not
 CSV.  Here's an example of a few rows:

 ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
 SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/sr24_doc.pdf

 Section Relation Files (Logical page 25)

 Okay so is there a way to import Relational Files into Postgres?

 The alternative would be to download the Excel version, then Save as
 CSV, and import into Postgres via the COPY command.

Oh, also if anyone knows of a way to export an Access database to
Postgres, that might be helpful. I don't have a copy of Access.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread John R Pierce

On 08/22/12 5:23 PM, Mike Christensen wrote:

I'd like to import this data into a Postgres database:

http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

However, I'm not quite sure what format this is.  It's definitely not
CSV.  Here's an example of a few rows:



this worked for me... (in psql)

create table test (f1 text,f2 text,f3 text,f4 text,f5 text,f6 text,f7 
text,f8 text,f9 text);

 \copy test from DATA_SRC.txt with delimiter '^' csv quote as '~'
select * from test;
-[ RECORD 1 
]---

--
f1 | D1066
f2 | G.V. Mann
f3 | The Health and Nutritional status of Alaskan Eskimos.
f4 | 1962
f5 | American Journal of Clinical Nutrition
f6 | 11
f7 |
f8 | 31
f9 | 76
-[ RECORD 2 
]---

--
f1 | D1073
f2 | J.P. McBride, R.A. Maclead
f3 | Sodium and potassium in fish from the Canadian Pacific coast.
f4 | 1956
f5 | Journal of the American Dietetic Association
f6 | 32
f7 |
f8 | 636
f9 | 638
-[ RECORD 3 
]---

--
f1 | D1107
f2 | M.E. Stansby
f3 | Chemical Characteristics of fish caught in the northwest Pacific 
Oceans.

f4 | 1976
f5 | Marine Fish Rev.
f6 | 38
f7 | 9
f8 | 1
f9 | 11
.



obviously, use better field  names...

--
john r pierceN 37, W 122
santa cruz ca mid-left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] no null checking/check constraint checking in nested tables: Bug, missing feature, or desired behavior?

2012-08-22 Thread Chris Travers
Hi;

I was noticing that when storing nested data (composite-typed columns) in
PostgreSQL, neither CHECK nor NOT NULL constraints are enforced.

It seems like this is a case where inheritance provides a cleaner way
to incorporate re-usable data structures (with internal integrity
enforcement and method mapping) in the database (and I am thinking
that nested tables might be something largely relegated to views), so
I don't see this as a big deal at all.  But it did puzzle me at first.
 I guess I kinda see the logic in it right now.

As an aside, the more I delve into table inheritance, the more amazing
and useful it actually is (warts and all), and multiple inheritance
(which to my knowledge is only supported by PostgreSQL) turns this
into something I expect to use a lot more of in the future.  I will
probably send a second email out at some point with my thoughts on
this.

I guess my major reason for asking is wondering if this is behavior
that is expected to change in the future or if the idea that table
constraints are only enforced on the named table is something that is
likely to change.

Here is a minimal example case:


or_examples=# create table test.typetest (id int not null, check (id > 0));
CREATE TABLE
or_examples=# create table test.tabletest (test test.typetest);
CREATE TABLE
or_examples=# insert into test.tabletest values (row(-1));
INSERT 0 1
or_examples=# insert into test.tabletest values (row(null));
INSERT 0 1

or_examples=# select * from test.tabletest;
 test
--
 (-1)
 ()
(2 rows)


To do this, I have to (after deleting rows):
or_examples=# alter table test.tabletest add check ((test).id is not
null and (test).id > 0);

or_examples=# insert into test.tabletest values (row(null));
ERROR:  new row for relation "tabletest" violates check constraint
"tabletest_test_check"

or_examples=# select version()
or_examples-# ;
                                 version
---------------------------------------------------------------------------
 PostgreSQL 9.1.4 on i386-redhat-linux-gnu, compiled by gcc (GCC)
 4.7.0 20120507 (Red Hat 4.7.0-5), 32-bit
(1 row)


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread David Johnston
The first delimiter is ~^~ (tilde-carat-tilde)

The last field is itself delimited with just ^ (carat)

I would use text parsing tools to do this myself, though various commands in 
PostgreSQL could be combined to get the desired result.  The last 4 numbers 
(second parse) should probably be stored in a numeric[]

Look at COPY and regexp_matches()

David J.


On Aug 22, 2012, at 20:23, Mike Christensen m...@kitchenpc.com wrote:

 I'd like to import this data into a Postgres database:
 
 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip
 
 However, I'm not quite sure what format this is.  It's definitely not
 CSV.  Here's an example of a few rows:
 
 ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
 SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH
 SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER
 OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01004~^~0100~^~Cheese,
 blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01005~^~0100~^~Cheese,
 brick~^~CHEESE,BRICK~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01006~^~0100~^~Cheese,
 brie~^~CHEESE,BRIE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01007~^~0100~^~Cheese,
 camembert~^~CHEESE,CAMEMBERT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01008~^~0100~^~Cheese,
 caraway~^~CHEESE,CARAWAY~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01009~^~0100~^~Cheese,
 cheddar~^~CHEESE,CHEDDAR~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01010~^~0100~^~Cheese,
 cheshire~^~CHEESE,CHESHIRE~^~~^~~^~~^~~^0^~~^6.38^4.27^8.79^3.87
 ~01011~^~0100~^~Cheese,
 colby~^~CHEESE,COLBY~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
 
 Is there an easy way to get this into PG, or a tool I can download for
 this, or do I need to parse it myself with a script or something?
 Thanks!
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can column name aliases be supported?

2012-08-22 Thread Craig Ringer

On 08/23/2012 06:41 AM, Bosco Rama wrote:

Ooops!  Hit send too fast...

On 08/22/12 15:34, Bosco Rama wrote:

On 08/22/12 15:19, Gauthier, Dave wrote:


I know a view can be used to alias one of them.


It can alias all of them:

create view xyz as
select *, last_name as lname, last_name as surname
   from mytable;

(not the nicest version but functional)


... and then use an INSTEAD/ALSO rule to do the update, if needed.


... though if the user UPDATEd more than one of them, who knows what'd 
happen.


I'd either (a) fire the client, or (b) define different views for 
different clients, with appropriate DO INSTEAD triggers (9.1) or rules 
(9.0 and below).


(a) would be awfully tempting.
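
A sketch of the 9.1 trigger route Craig mentions (hypothetical names;
assumes mytable has a primary key column "id"):

```sql
-- View for the client who says "lname":
CREATE VIEW client_b AS
    SELECT id, last_name AS lname FROM mytable;

CREATE FUNCTION client_b_upd() RETURNS trigger AS $$
BEGIN
    UPDATE mytable SET last_name = NEW.lname WHERE id = OLD.id;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER client_b_upd_trg
    INSTEAD OF UPDATE ON client_b
    FOR EACH ROW EXECUTE PROCEDURE client_b_upd();
```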

--
Craig Ringer


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Mike Christensen
On Wed, Aug 22, 2012 at 5:57 PM, John R Pierce pie...@hogranch.com wrote:
 On 08/22/12 5:23 PM, Mike Christensen wrote:

 I'd like to import this data into a Postgres database:


 http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

 However, I'm not quite sure what format this is.  It's definitely not
 CSV.  Here's an example of a few rows:


 this worked for me... (in psql)

 create table test (f1 text,f2 text,f3 text,f4 text,f5 text,f6 text,f7
 text,f8 text,f9 text);
  \copy test from DATA_SRC.txt with delimiter '^' csv quote as '~'
 select * from test;
 -[ RECORD 1
 ]---
 --
 f1 | D1066
 f2 | G.V. Mann
 f3 | The Health and Nutritional status of Alaskan Eskimos.
 f4 | 1962
 f5 | American Journal of Clinical Nutrition
 f6 | 11
 f7 |
 f8 | 31
 f9 | 76
 -[ RECORD 2
 ]---
 --
 f1 | D1073
 f2 | J.P. McBride, R.A. Maclead
 f3 | Sodium and potassium in fish from the Canadian Pacific coast.
 f4 | 1956
 f5 | Journal of the American Dietetic Association
 f6 | 32
 f7 |
 f8 | 636
 f9 | 638
 -[ RECORD 3
 ]---
 --
 f1 | D1107
 f2 | M.E. Stansby
 f3 | Chemical Characteristics of fish caught in the northwest Pacific
 Oceans.
 f4 | 1976
 f5 | Marine Fish Rev.
 f6 | 38
 f7 | 9
 f8 | 1
 f9 | 11
 .



 obviously, use better field  names...

SWEET!

This is exactly what I needed.  Thanks so much..  Already got the
first table imported..

Mike




Re: [GENERAL] NULL value comparison

2012-08-22 Thread Michael Sacket

On Aug 22, 2012, at 8:17 PM, Craig Ringer wrote:

 On 08/22/2012 10:58 PM, Michael Sacket wrote:
 Thank you all very much!
 
 Unfortunately I can't change the query... but I can modify the data.  I 
 updated the NULL values to 'N' and put the appropriate NOT NULL constraint 
 and a default value of 'N'.
 
 
 What tool/app is generating the query? They need to be told they're doing 
 something wrong and unsafe - unless it's documented that the target column 
 must be NOT NULL, anyway.
 
 PostgreSQL has a workaround for one such wrong, broken and unsafe program, 
 Microsoft Access. However the workaround is limited to transforming = NULL 
 to IS NULL; it doesn't actually change the semantics of NULL.
 
 http://www.postgresql.org/docs/9.1/interactive/runtime-config-compatible.html#GUC-TRANSFORM-NULL-EQUALS



Well... the query would be my fault, before I learned the value of having most 
columns NOT NULL.  It's from an old WebObjects application.  EOF at the time 
was generally not especially helpful with modeling boolean values and hooking 
them up to checkboxes, so I (erroneously, as it turns out) defined true to be 'Y' 
and anything else as false.  In any case, it worked without issue until I 
switched to PostgreSQL yesterday and I didn't understand why.  I was looking 
for the quickest route to a working version without having to coax EOF into 
generating a different select.

The good news is I now have the proper constraints in place and the app and 
its 130 tables are working with PostgreSQL in less than a day.  Additionally, 
thanks to this list, I believe I understand the semantics of NULL now.  
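For anyone else tripped up by this, those semantics are easy to see directly in psql (a quick sketch):

```sql
SELECT NULL = 'N';                 -- NULL, not false: such rows drop out of a WHERE clause
SELECT NULL <> 'Y';                -- also NULL: why "flag <> 'Y'" missed the NULL rows
SELECT NULL IS DISTINCT FROM 'Y';  -- true: the NULL-safe comparison, if the query can change
```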

Thanks!
Michael



Re: [GENERAL] NULL value comparison

2012-08-22 Thread Craig Ringer

On 08/23/2012 10:32 AM, Michael Sacket wrote:


The good news is I now have the proper constraints in place and the app and 
its 130 tables are working with PostgreSQL in less than a day.


Wow, that's cool, especially without SQL changes.

What was the previous database? I'm curious now.

--
Craig Ringer




Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Adrian Klaver

On 08/22/2012 05:41 PM, Mike Christensen wrote:

On Wed, Aug 22, 2012 at 5:38 PM, Mike Christensen m...@kitchenpc.com wrote:

On Wed, Aug 22, 2012 at 5:34 PM, Bosco Rama postg...@boscorama.com wrote:

On 08/22/12 17:23, Mike Christensen wrote:

I'd like to import this data into a Postgres database:

http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/dnload/sr24.zip

However, I'm not quite sure what format this is.  It's definitely not
CSV.  Here's an example of a few rows:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH
SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87


http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR24/sr24_doc.pdf

Section Relation Files (Logical page 25)


Okay so is there a way to import Relational Files into Postgres?

The alternative would be to download the Excel version, then Save as
CSV, and import into Postgres via the COPY command.


Oh, also if anyone knows of a way to export an Access database to
Postgres, that might be helpful. I don't have a copy of Access.


There is MDB Tools.  It only works with older versions of Access and 
is limited in what it can do:


http://mdbtools.sourceforge.net/

Do you have a Windows machine available? Depending on what is installed 
there may be an Access ODBC driver available that will allow you to jump 
through an intermediate step, i.e. Excel.








--
Adrian Klaver
adrian.kla...@gmail.com




Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread John R Pierce

On 08/22/12 7:17 PM, Mike Christensen wrote:

This is exactly what I needed.  Thanks so much..  Already got the
first table imported..


I think I would use 'TEXT' for the string fields, INTEGER for the whole 
numbers, and NUMERIC for the fractional ones...
once you have the data imported, and define the appropriate field of 
each table as its PRIMARY KEY, you should be able to do a wide range of 
joins to collect specific sorts of data.
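Following that suggestion, the move from the all-text staging table to typed columns might look like this (the target table and column names below are illustrative; the real column list is in sr24_doc.pdf):

```sql
-- Illustrative typed target table
CREATE TABLE food_des (
    ndb_no    text PRIMARY KEY,
    fdgrp_cd  text NOT NULL,
    long_desc text NOT NULL,
    n_factor  numeric
);

-- Cast from the all-text staging table "test", treating empty strings as NULL
INSERT INTO food_des (ndb_no, fdgrp_cd, long_desc, n_factor)
SELECT f1, f2, f3, nullif(f9, '')::numeric
  FROM test;
```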




--
john r pierce                          N 37, W 122
santa cruz ca mid-left coast





Re: [GENERAL] Alternatives to very large tables with many performance-killing indicies?

2012-08-22 Thread Scott Marlowe
On Thu, Aug 16, 2012 at 2:54 PM, Wells Oliver wellsoli...@gmail.com wrote:
 Hey folks, a question. We have a table that's getting large (6 million rows
 right now, but hey, no end in sight). It's wide-ish, too, 98 columns.

 The problem is that each of these columns needs to be searchable quickly at
 an application level, and I'm far too responsible an individual to put 98
 indexes on a table. Wondering what you folks have come across in terms of
 creative solutions that might be native to postgres. I can build something
 that indexes the data and caches it and runs separately from PG, but I
 wanted to exhaust all native options first.

I submit that you're far better off working with the users to see
which fields they really need indexes on, and especially which
combinations of fields with functional and / or partial indexes serve
them the best.

To start with you can create indexes willy nilly if you want and then
use the pg_stat*index tables to see which are or are not getting used
and start pruning them as time goes by.  But keep an eye out for long
running queries with your logging and investigate to see what
specialized indexes might help the most for those queries.  Often a
simple index on (a,b) where x is not null or something can give great
improvements over any bitmap hash scans of multiple indexes ever
could, especially on large data sets.
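Concretely, the kind of targeted index described above, plus the pg_stat query for pruning unused ones, might look like this (table and column names hypothetical):

```sql
-- Partial composite index serving one known query pattern
CREATE INDEX big_table_a_b_live ON big_table (a, b) WHERE x IS NOT NULL;

-- Pruning candidates: indexes never scanned since statistics were last reset
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY pg_relation_size(indexrelid) DESC;
```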




[GENERAL] Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)

2012-08-22 Thread Chris Travers
I have now been working with table inheritance for a while and, after
starting to grapple with many of its use cases, I have become
increasingly impressed with this feature.  I also think that some of
the apparent limitations fundamentally follow from the support for
multiple inheritance, and multiple inheritance itself is so useful I
would not want to see it go away.  Inheritance really starts to come
into its own once you start using table methods, and some features
that are useful in some sorts of inheritance modelling are useless in
others.

Below I will offer a few suggestions regarding what can be done to
make life a little easier for those of us using these features and
they are not the typical suggestions.  I still stand by my view that
at least from what I have looked at, PostgreSQL allows you to do
object-relational modelling better than you can do in other ORDBMS's I
have looked at so far.

What table inheritance gives you is an ability to model pieces of a
database, and derived information, on small units which can then be
later combined.  When we think of that as the primary use case
(instead of set/subset modelling) then the current DDL caveats largely
don't apply.  Logic can thus follow a group of columns rather than
having to be repetitively attached to tables.  The fact that this
allows you to create essentially derived values from groups of re-used
columns is itself remarkable and can be used to implement path
traversal etc. which is not directly supported in PostgreSQL in the
sense that it is in Oracle or DB2.  With multiple inheritance you can
actually build superior path traversal systems than you can easily on
DB2 or Oracle because you can re-use enforced foreign keys (Oracle has
an IS DANGLING operator for cross-table references!).

As far as I can tell, this sort of use is PostgreSQL-only because it
relies on multiple inheritance which is not supported on DB2,
Informix, or Oracle.  I am not aware of any other ORDBMS that allows
for multiple inheritance and this has profound impacts on things like
primary key inheritance, which I think becomes meaningless when
combined with multiple inheritance.  If I inherit two tables each with
a different primary key, I obviously cannot inherit both without
having multiple primary keys in the child table.

I have to be the sort of person who sees bugs as features, but in this
respect I cannot see the lack of inheriting a primary key as a bug
anymore.  It seems to me mathematically incompatible with PostgreSQL's
take on table inheritance generally and this is one of those cases
where multiple inheritance changes everything.

Additionally it is important to note that primary key management is
not a huge problem because it can be solved using techniques borrowed
from table partitioning.  If you are doing set/subset modelling (as in
the cities/capitals example) the solution is to have a cities table
which is constrained with a trigger or rule which does not allow
inserts and then capitals and noncapitals tables. The primary key can
then include an is_capital bool field which can be constrained
differently on both tables.  This has the advantage of knowing whether
a city selected is a capital from the top-level query as well, and
allows for the planner to treat the inheritance tree as a partitioned
table set.  Superset-constraint management would also have to use a
second table which would be referenced by all child tables (and
perhaps maintained by triggers).   While superset management tables
can be used to solve a subset of foreign key problems, they highlight
a different (and perhaps more solvable) set of these problems.
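The cities/capitals arrangement just described might be sketched as follows (an untested sketch; the children re-declare the key and differ only in their CHECK on is_capital):

```sql
CREATE TABLE cities (
    name       text NOT NULL,
    is_capital boolean NOT NULL,
    PRIMARY KEY (name, is_capital)
);
-- Parent holds no rows of its own
CREATE RULE cities_no_insert AS ON INSERT TO cities DO INSTEAD NOTHING;

-- Children re-declare the primary key and constrain is_capital differently
CREATE TABLE capitals (
    PRIMARY KEY (name, is_capital),
    CHECK (is_capital)
) INHERITS (cities);

CREATE TABLE noncapitals (
    PRIMARY KEY (name, is_capital),
    CHECK (NOT is_capital)
) INHERITS (cities);
```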

As far as I can tell, Oracle and DB2 do not discuss primary key
inheritance and it isn't clear whether this is a problem on those
platforms too.  Foreign key management pretty clearly is a problem
given the way these platforms handle cross-relational REFs.  In other
words, I think that on the whole table inheritance is still cutting
edge on PostgreSQL and has been for some time.

Foreign keys can be managed in a few ways including superset
constraint tables maintained with triggers.  These work well for
enforcing foreign keys against subsets, but inheriting a foreign key
constraint means redefining it repetitively on every child table.  At
the same time, not all foreign keys may want to be inherited.

The following changes to behavior I would personally find very useful
(and I believe would be useful in partitioned tables as well):

   * foreign keys (i.e. REFERENCES clauses) being able to be marked
INHERIT or NOINHERIT on the parent table.  INHERIT foreign keys would
be automatically created on child tables.  The default could be left
to be NOINHERIT to avoid breaking backwards compatibility.

   * unique constraints being able to be marked INHERIT or NOINHERIT.
A unique constraint that is marked INHERIT would be automatically
created again on the child table.  This could be documented to be
domain-specific to each child table, and that if you need super-set
unique constraints, you need to borrow techniques from table
partitioning.

Re: [GENERAL] Can column name aliases be supported?

2012-08-22 Thread Chris Angelico
On Thu, Aug 23, 2012 at 8:19 AM, Gauthier, Dave dave.gauth...@intel.com wrote:
 I have a table with a column called last_name.  I have one customer who
 likes to articulate queries and updates for this using column name
 last_name (no problem there) but another who likes to call it lname and
 yet another who likes to call it surname.  So 3 different names for the
 same physical column in the table.

Here's an out-of-the-box suggestion.

Drop the column altogether and have a single column name. Trying to
divide names up never works properly. Does surname mean family name?
Not all cultures put the family name last. Is last_name simply the
part of the name after the last space?

Save yourself a whole lot of trouble and just store names in single
fields. And you dodge the field naming issue at the same time!

ChrisA




Re: [GENERAL] What text format is this and can I import it into Postgres?

2012-08-22 Thread Mike Christensen
 This is exactly what I needed.  Thanks so much..  Already got the
 first table imported..


 I think would use 'TEXT' for the string fields, INTEGER for the whole
 numbers and NUMERIC for the fractional ones...
 once you have the data imported, and define the appropriate field of each
 table as its PRIMARY KEY, you should be able to do a wide range of joins to
 collect specific sorts of data.

The PDF that's included in the ZIP file actually has all the data
types and precisions, primary keys, and relations.  I just copied that
for my table schemas.  I decided not to define any FK constraints
because I want to easily zap all the data and re-import it when new
versions of the database are released, and I never change any of the
data myself.
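That zap-and-reload cycle stays simple in psql precisely because there are no FK constraints (table and file names here are hypothetical):

```sql
BEGIN;
TRUNCATE food_des;
\copy food_des from 'FOOD_DES.txt' with delimiter '^' csv quote as '~'
COMMIT;
```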

Mike




Re: [GENERAL] Can column name aliases be supported?

2012-08-22 Thread Craig Ringer

On 08/23/2012 11:56 AM, Chris Angelico wrote:

On Thu, Aug 23, 2012 at 8:19 AM, Gauthier, Dave dave.gauth...@intel.com wrote:

I have a table with a column called last_name.  I have one customer who
likes to articulate queries and updates for this using column name
last_name (no problem there) but another who likes to call it lname and
yet another who likes to call it surname.  So 3 different names for the
same physical column in the table.


Here's an out-of-the-box suggestion.

Drop the column altogether and have a single column name. Trying to
divide names up never works properly. Does surname mean family name?
Not all cultures put the family name last. Is last_name simply the
part of the name after the last space?


+1 to that, and it gets way worse:

http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

and while you're at it, read this:

http://www.joelonsoftware.com/articles/Unicode.html

--
Craig Ringer





Re: [GENERAL] Can column name aliases be supported?

2012-08-22 Thread Chris Angelico
On Thu, Aug 23, 2012 at 2:19 PM, Craig Ringer ring...@ringerc.id.au wrote:
 On 08/23/2012 11:56 AM, Chris Angelico wrote:

 Here's an out-of-the-box suggestion.

 Drop the column altogether and have a single column name. Trying to
 divide names up never works properly. Does surname mean family name?
 Not all cultures put the family name last. Is last_name simply the
 part of the name after the last space?

 +1 to that, and it gets way worse:

 http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Yes, that link was posted on python-list a little while back, and
that's what I had in mind as I was writing that up. Couldn't remember
the actual link though. Thanks!

 and while you're at it, read this:

 http://www.joelonsoftware.com/articles/Unicode.html

Definitely. I disagree with Joel Spolsky on many things, but I agree
with that post. These days, Unicode is an absolute necessity. Our
PHP-based web site has a number of issues with Unicode input, but at
least everything that goes through the database (we use Postgres for
everything) is safe.

ChrisA




Re: [GENERAL] Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)

2012-08-22 Thread David Johnston
On Aug 22, 2012, at 23:22, Chris Travers chris.trav...@gmail.com wrote:

  * unique constraints being able to be marked INHERIT or NOINHERIT.
 A unique constraint that is marked INHERIT would be automatically
 created again on the child table.  This could be documented to be
 domain-specific to each child table, and that if you need super-set
 unique constraints, you need to borrow techniques from table
 partitioning.
 
 
 * PRIMARY KEY inheritance would be documented as necessarily
 excluded by multiple inheritance.  The concept simply doesn't make
 sense when a child table can have multiple parents.  If it did, you'd
 have multiple primary keys.  According to Oracle and DB2
 documentation, the best they offer regarding such things is an OID
 field anyway.
 

late night thinking here...

An inherited PK constraint could be made into a unique/not-null constraint if a 
second PK constraint becomes inherited.  In that case the table no longer has a 
PK constraint until the user creates one that makes sense.

This whole issue occurs due to surrogate keys being used as PK.  In a 
partitioning scheme the partitioning field should be part of the natural key 
and thus cross-relation matching could not occur in the absence of a 
mis-allocation which a partition specific check constraint on that column would 
solve.

In an OO situation, in the absence of partitioning, a key is a concept of 
identity.  Identity requires that the type of two entities matches; and the 
type of a child object will never match the type of an object of its parent.  
Thus regardless of single or multiple inheritance PK inheritance makes no sense 
in an OO situation.

Even with multiple inheritance you might want to inherit a PK from one parent 
but from the other parent(s) you might simply want to inherit their PK as a 
unique constraint.  In so doing you assert that you are on the same level as 
the PK parent objects while you have different attributes than your siblings.  
Jack and Jill can both inherit PK from human being but could inherit 
phone_number and email from contact_info (not the best example, I know... I tried 
making gender work here but my mind went blank when trying to rationalize why 
gender wouldn't just be an FK).

For FK, however, the question is whether I am referencing a specific instance 
or whether I simply am referencing an arbitrary set of properties that anything 
matching those properties could match.  Currently the former is what we have, 
and since we are dealing with entities (as opposed to behavior) that makes 
sense.  Ignoring partitioning, if I define an FK relationship to flying things, 
I supposedly do not care whether you store a bird-type flyer or an 
airplane-type flyer.  If someone names their pet bird Polly and someone else 
names a plane Polly, then what?  Inheriting an FK to a target non-partitioned 
table makes sense, but how does one deal with inheriting onto a target table 
that has children?

Just some thoughts as I have not, as became obvious writing this, thought 
through using the database in this fashion.  Most inheritance I have used is 
behavioral in nature whereas a database deals with identity.  Segregating 
between type composition and partitioning mentally, and ideally in the 
language, makes a lot of sense to me.  It seems that currently both models are 
partially implemented and done so using the same syntactical foundation...

David J.








