Re: [GENERAL] Deletion Challenge

2015-12-14 Thread Benjamin Smith
> test=> delete from cash_journal where ARRAY[click, cash_journal_id] NOT in
> (select max(ARRAY[click,cash_journal_id]) from cash_journal group by
> fairian_id); DELETE 7


For what it's worth, we've run into *severe* performance issues using IN() when 
there are a large number of values (EG: more than 10,000) in conjunction with a 
complex query. Using a WITH prefix table and joining against that doesn't seem 
to carry anything like a similar performance penalty.
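
(To illustrate the pattern, with hypothetical table/column names rather than the 
actual query from this thread:)

-- The literal-list form that degrades badly with tens of thousands of values:
SELECT *
FROM widgets
WHERE widget_id IN (101, 102, 103 /* ... thousands more ... */);

-- The same result via a WITH prefix table joined against the target:
WITH wanted (widget_id) AS (
    VALUES (101), (102), (103)  -- ... thousands more ...
)
SELECT w.*
FROM widgets w
JOIN wanted USING (widget_id);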


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deletion Challenge

2015-12-14 Thread Benjamin Smith
On Saturday, December 05, 2015 11:08:05 AM Berend Tober wrote:
> WITH max_click AS (
>SELECT
>  cash_journal.fairian_id,
>  max(cash_journal.click) AS click
>  FROM cash_journal
>  GROUP BY cash_journal.fairian_id
>  )
>delete from cash_journal j
>  using max_click b
>  where j.fairian_id = b.fairian_id
>  and j.click< b.click;
> 
> WITH max_journal_id AS (
>SELECT
>  cash_journal.fairian_id,
>  cash_journal.click,
>  max(cash_journal.cash_journal_id) AS cash_journal_id
>  FROM cash_journal
>  GROUP BY cash_journal.fairian_id, cash_journal.click
>  )
>delete from cash_journal j
>   using max_journal_id b
>   where j.fairian_id= b.fairian_id
>   and j.click   = b.click
>   and j.cash_journal_id < b.cash_journal_id;

Although I couldn't be sure if this would provide atomicity, I'd merge these 
into one query like: 

WITH max_click AS (
    SELECT
        cash_journal.fairian_id,
        max(cash_journal.click) AS click
    FROM cash_journal
    GROUP BY cash_journal.fairian_id
),
max_journal_id AS (
    SELECT
        cash_journal.fairian_id,
        cash_journal.click,
        max(cash_journal.cash_journal_id) AS cash_journal_id
    FROM cash_journal
    GROUP BY cash_journal.fairian_id, cash_journal.click
),
delete_journal1 AS (
    delete from cash_journal j
    using max_click b
    where j.fairian_id = b.fairian_id
    and j.click < b.click
    returning j.*, 'journal1'::varchar AS source
),
delete_journal2 AS (
    delete from cash_journal j
    using max_journal_id b
    where j.fairian_id = b.fairian_id
    and j.click = b.click
    and j.cash_journal_id < b.cash_journal_id
    returning j.*, 'journal2'::varchar AS source
)
-- AND THEN TO FIND OUT WHAT HAPPENED
SELECT * FROM delete_journal1
UNION ALL
SELECT * FROM delete_journal2;




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Permissions, "soft read failure" - wishful thinking?

2015-12-14 Thread Benjamin Smith
On Monday, December 14, 2015 05:25:16 PM Adrian Klaver wrote:
> > FOLLOWUP QUESTION: is there a way to ask the query planner what
> > tables/fields were output in a database result?
> 
> Just dawned on me, are you asking if EXPLAIN can output more detailed 
> information?

Ha ha, in another post, I just explained that the idea for the follow up 
question came from EXPLAIN ANALYZE. Yes, the idea being to see if there was a 
way to ask PG what tables/fields were used to output a specific result, field 
by 
field, and then squelch these fields in our DB abstraction layer rather than in 
the DB directly. 

We're being asked to satisfy some pretty strict guarantees of data privacy 
that were unanticipated when designing our product. Adding strict permissions 
now would be an expensive proposition. 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Permissions, "soft read failure" - wishful thinking?

2015-12-14 Thread Benjamin Smith
On Monday, December 14, 2015 05:20:52 PM Adrian Klaver wrote:
> > FOLLOWUP QUESTION: is there a way to ask the query planner what
> > tables/fields were output in a database result?
> 
> I am not following, that would be in the query output would it not? A 
> more detailed explanation of what you want to see would be helpful.

The problem is that permissions were originally designed to be handled in the 
ORM (and they are, just fine) and read-level permissions were set up at the 
page level (it's a web based application) so you can either access the page or 
you can't. 

But in order to satisfy a large client, they want the guarantee of permissions 
at the field level. Problem is, our product has been in development for a LONG 
time (started when PHP3 was king of the hill) and if we did that, large, 
complex queries would break in many horrible ways. 

For example, it's typical to create a with prefix table with a 5-table query, 
then link to that two or three times over in order to develop a complex 
relationship with data, and then output the result. 

I may be wishing for magic, but would it be possible to be able to ask the DB 
server where the fields of data got their data from? (so we could squelch the 
output as it passes through our DB abstraction layer)

EXPLAIN ANALYZE *almost* does this, thus the thought. 
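
For reference, the closest built-in piece I've found so far: the VERBOSE option 
to EXPLAIN lists the output columns of every plan node, though it describes the 
plan rather than a per-field permission trace. A rough sketch, with hypothetical 
names:

EXPLAIN (VERBOSE) SELECT c.name, c.socialsecurity FROM customers c WHERE c.id = 42;
-- ...prints something along the lines of:
--   Index Scan using customers_pkey on public.customers c
--     Output: c.name, c.socialsecurity
--     Index Cond: (c.id = 42)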


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Permissions, "soft read failure" - wishful thinking?

2015-12-14 Thread Benjamin Smith
Is there a way to set PG field-level read permissions so that a deny doesn't 
cause the query to bomb, but the fields for which permission is denied to be 
nullified? 

In our web-based app, we have a request to implement granular permissions: 
table/field level permissions. EG: userX can't read customers.socialsecurity in 
any circumstance. We'd like to implement DB-level permissions; so far, we've 
been using an ORM to manage CRUD permissions. 

This is old hat, but our system has a large number of complex queries that 
immediately break if *any* field permission fails. So, implementing this for 
customers could be *very* painful 

Is there a way to let the query succeed, but nullify any fields where read 
permissions fail? (crossing fingers) We'd be watching the PG logs to identify 
problem queries in this case.
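
For illustration only, a rough sketch of the closest pieces I'm aware of (table, 
column, and role names here are hypothetical):

-- Column-level privileges exist, but a denied column makes the query bomb outright:
REVOKE SELECT ON customers FROM userx;
GRANT SELECT (id, name) ON customers TO userx;
-- SELECT socialsecurity now errors for userx instead of returning NULL.

-- A view can nullify the column per-user instead of erroring, at the cost of
-- hand-maintaining such a view for every table:
CREATE VIEW customers_masked AS
SELECT id,
       name,
       CASE WHEN has_column_privilege('customers', 'socialsecurity', 'SELECT')
            THEN socialsecurity
            ELSE NULL
       END AS socialsecurity
FROM customers;

What we're wishing for is the first behavior with the second outcome, without 
the per-table view maintenance.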


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] converting in() clause into a with prefix?

2015-10-16 Thread Benjamin Smith
I have a horribly-performing query similar to below, and I'd like to convert 
it to use a "WITH mytable as ( ... ) " without having to re-architect my code. 
For some reason, using a WITH prefix seems to generally work much faster than an 
IN() sub-clause even though the results are identical. (runs in about 1/4th the time)

Is there a PG native function that can convert the listing format of in() 
clause to row-level results from a WITH prefix? I see the array* functions but 
they seem to work with arrays like "array[1,2,3]" and unnest seems to drill 
right through nested arrays and flattens every single element to a new row, 
regardless of depth.  EG: the following two lines are equivalent: 

select unnest(array[1,2,2,3]);
select unnest(array[array[1,2],array[2,3]]); 

I'd expect the latter to put out two rows as
 
1, 2
2, 3

Thanks for your input, clarifying pseudo code examples below (PHP). We're 
running 9.4.4 on CentOS 6. 

Ben 


// DESIRED END RESULT PSEUDO CODE
$query = "
WITH mytable AS 
(
unnest(". $in .", school_id, building_id) 
)
SELECT
id, 
name
FROM mytable 
JOIN classes ON 
(
mytable.school_id = classes.school_id
AND mytable.building_id = classes.building_id 
)" ;


// CURRENT CODE EXAMPLE (PHP) 
$query = "
SELECT 
id, 
name 
FROM classes 
WHERE 
(classes.school_id, classes.building_id) IN (" . $in . ")"; 


// EXAMPLE RESULT (small list) 
SELECT 
id, 
name 
FROM classes 
WHERE 
(classes.school_id, classes.building_id) IN ((291,189),(291,192),
(291,191),(291,195),(291,198),(291,197),(291,188),(291,184),(291,199),
(291,193),(291,185),(291,194),(291,-2),(291,186),(291,183),(291,187),
(291,200),(291,190),(291,196),(291,202),(200,455),(200,433),(200,442),
(200,440),(200,445),(200,438),(200,437),(200,450),(200,452),(200,459),
(200,456),(200,432),(200,441),(200,460),(200,446),(200,447),(200,448),
(200,449),(200,434),(200,443),(200,-2),(200,444),(200,457),(200,458),
(200,451),(200,454),(200,453),(246,-1),(246,190),(246,186),(246,188),
(246,195),(246,196),(246,197),(246,192),(246,-2),(246,184),(246,189),
(246,193),(246,194),(246,198),(246,191),(246,199),(246,187),(246,185),
(246,183),(246,200),(63,-1),(63,269),(63,267),(63,264),(63,258),(63,126),
(63,270),(63,262),(63,127),(63,134),(63,263),(63,271),(63,272),(63,265),
(63,-2),(63,259),(63,260),(63,266),(9,-1),(9,283),(9,192),(9,266),(9,276),
(9,313),(9,294),(9,311),(9,298),(9,299),(9,300),(9,312),(9,310),(9,263),
(9,272),(9,305),(9,314),(9,152),(9,284),(9,277),(9,260),(9,271),(9,278),
(9,315),(9,287),(9,289),(9,316),(9,288),(9,290),(9,259),(9,258),(9,269),
(9,279),(9,265),(9,273),(9,280),(9,64),(9,264),(9,281),(9,317),(9,304),
(9,303),(9,322),(9,267),(9,282),(9,318),(9,306),(9,295),(9,319),(9,301),
(9,302),(9,43),(9,268),(9,275),(9,-2),(9,262),(9,296),(9,297),(9,320),(9,286),
(9,274),(9,307),(9,308),(9,321),(9,270),(9,261),(9,309),(9,292),(9,293),
(9,285),(9,291),(94,-1),(94,280),(94,285),(94,268),(94,265),(94,278),(94,283),
(94,287),(94,288),(94,289),(94,284),(94,281),(94,110),(94,262),(94,259),
(94,273),(94,270),(94,266),(94,258),(94,269),(94,274),(94,263),(94,290),
(94,272),(94,264),(94,260),(94,279),(94,261),(94,286),(94,275),(94,277),
(94,291),(94,109),(94,-2),(94,257),(111,-1),(111,452),(111,486),(111,404),
(111,456),(111,455),(111,485),(111,484),(111,483),(111,457),(111,441),
(111,468),(111,430),(111,442),(111,446),(111,490),(111,440),(111,466),
(111,491),(111,449),(111,454),(111,461),(111,492),(111,469),(111,465),
(111,458),(111,426),(111,443),(111,493),(111,488),(111,475),(111,481),
(111,482),(111,474),(111,477),(111,478),(111,476),(111,479),(111,480),
(111,471),(111,463),(111,450),(111,470),(111,494),(111,447),(111,448),
(111,495),(111,-2),(111,473),(111,496),(111,429),(111,487),(111,497),
(111,167),(111,444),(111,451),(111,431),(111,445),(111,166),(111,453),
(111,467),(111,462),(111,464),(111,472),(111,489),(334,-1),(334,188),
(334,189),(334,203),(334,201),(334,200),(334,193),(334,192),(334,191),
(334,202),(334,185),(334,190),(334,204),(334,186),(334,187),(334,197),
(334,199),(334,205),(334,184),(334,194),(334,-2),(334,196),(334,183),
(334,195),(334,206),(334,198),(334,207),(201,-1),(201,445),(201,442),
(201,446),(201,468),(201,458),(201,455),(201,456),(201,469),(201,454),
(201,453),(201,466),(201,474),(201,443),(201,441),(201,440),(201,447),
(201,470),(201,460),(201,459),(201,461),(201,471),(201,465),(201,463),
(201,448),(201,451),(201,472),(201,457),(201,-2),(201,462),(201,464),
(201,452),(201,449),(201,450),(201,473),(201,467),(201,475),(62,-1),(62,279),
(62,280),(62,294),(62,281),(62,282),(62,285),(62,274),(62,299),(62,300),
(62,290),(62,291),(62,289),(62,273),(62,286),(62,194),(62,295),(62,275),
(62,-2),(62,292),(62,301),(62,196),(62,195),(62,296),(62,276),(62,284),
(62,287),(62,297),(62,288),(62,277),(62,298),(62,278),(188,-1),(188,443),
(188,446),(188,449),(188,453),(188,454),(188,455),(188,456),(188,450),
(188,445),(188,

Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-30 Thread Benjamin Smith
On Wednesday, September 30, 2015 03:49:44 PM Keith Fiske wrote:
> We've run postgres on ZFS for years with great success (first on
> OpenSolaris, now on OmniOS, and I personally run it on FreeBSD). The
> snapshotting feature makes upgrades on large clusters much less scary
> (snapshot and revert if it goes bad) and being able to bring a snapshot
> backup up as a clone to restore an accidentally dropped table is great.

Somebody mentioned some trouble running it with ZFS on Linux, which is exactly 
how we're planning our roll out. (We're a RHEL/CentOS shop) Have you tried 
that config, and has it worked for you? 

> Others have given a lot of great advice as far as system tuning. Only other
> thing I can add is you definitely do want your data directory on its own
> pool. But I recommend putting the actual data in a folder under that pool
> (possibly by major version name). For example if your pool is
> 
> /data/postgres
> 
> Create a folder under that directory to actually put the data:
> 
> mkdir /data/postgres/9.4
> 
> This allows pg_upgrade's --link option to work during major upgrades since
> you can't have an upgrade destination on a different filesystem. Just make
> a 9.5 directory in the same spot when the time comes around. With ZFS
> snapshots available, there's really no reason not to use the --link option
> to greatly speed up upgrades.

Recently, the PGDG RPMs provided by PostgreSQL have done something similar by 
moving from /var/lib/pgsql/ to (EG) /var/lib/pgsql/9.4 and we've followed 
suit, trying to keep things "stock" where possible. 

Our intent is to make /var/lib/pgsql a filesystem in a pool containing no other 
file systems, with SSD-based VDEVs that aren't shared for any other purpose.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-30 Thread Benjamin Smith
On Wednesday, September 30, 2015 09:58:08 PM Tomas Vondra wrote:
> On 09/30/2015 07:33 PM, Benjamin Smith wrote:
> > On Wednesday, September 30, 2015 02:22:31 PM Tomas Vondra wrote:
> >> I think this really depends on the workload - if you have a lot of
> >> random writes, CoW filesystems will perform significantly worse than
> >> e.g. EXT4 or XFS, even on SSD.
> > 
> > I'd be curious about the information you have that leads you to this
> > conclusion. As with many (most?) "rules of thumb", the devil is
> > quiteoften the details.
> 
> A lot of testing done recently, and also experience with other CoW
> filesystems (e.g. BTRFS explicitly warns about workloads with a lot of
> random writes).
> 
> >>> We've been running both on ZFS/CentOS 6 with excellent results, and
> >>> are considering putting the two together. In particular, the CoW
> >>> nature (and subsequent fragmentation/thrashing) of ZFS becomes
> >>> largely irrelevant on SSDs; the very act of wear leveling on an SSD
> >>> is itself a form of intentional thrashing that doesn't affect
> >>> performance since SSDs have no meaningful seek time.
> >> 
> >> I don't think that's entirely true. Sure, SSD drives handle random I/O
> >> much better than rotational storage, but it's not entirely free and
> >> sequential I/O is still measurably faster.
> >> 
> >> It's true that the drives do internal wear leveling, but it probably
> >> uses tricks that are impossible to do at the filesystem level (which is
> >> oblivious to internal details of the SSD). CoW also increases the amount
> >> of blocks that need to be reclaimed.
> >> 
> >> In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x
> >> faster than ZFS. But of course, if the ZFS features are interesting
> >> for you, maybe it's a reasonable price.
> > 
> > Again, the details would be highly interesting to me. What memory
> > optimization was done? Status of snapshots? Was the pool RAIDZ or
> > mirrored vdevs? How many vdevs? Was compression enabled? What ZFS
> > release was this? Was this on Linux,Free/Open/Net BSD, Solaris, or
> > something else?
> 
> I'm not sure what you mean by "memory optimization" so the answer is
> probably "no".

I mean the full gamut: 

Did you use an l2arc? Did you use a dedicated ZIL? What was arc_max set to? 
How much RAM/GB was installed on the machine? How did you set up PG? (PG 
defaults are historically horrible for higher-RAM machines) 

> FWIW I don't have much experience with ZFS in production, all I have is
> data from benchmarks I've recently done exactly with the goal to educate
> myself on the differences of current filesystems.
> 
> The tests were done on Linux, with kernel 4.0.4 / zfs 0.6.4. So fairly
> recent versions, IMHO.
> 
> My goal was to test the file systems under the same conditions and used
> a single device (Intel S3700 SSD). I'm aware that this is not a perfect
> test and ZFS offers interesting options (e.g. moving ZIL to a separate
> device). I plan to benchmark some additional configurations with more
> devices and such.

Also, did you try with/without compression? My information so far is that 
compression significantly improves overall performance. 

> > A 2x performance difference is almost inconsequential in my
> > experience, where growth is exponential. 2x performance change
> > generally means 1 to 2 years of advancement or deferment against the
> > progression of hardware; our current, relatively beefy DB servers
> > are already older than that, and have an anticipated life cycle of at
> > leastanother couple years.
> 
> I'm not sure I understand what you suggest here. What I'm saying is that
> when I do a stress test on the same hardware, I do get ~2x the
> throughput with EXT4/XFS, compared to ZFS.

What I'm saying is only what it says on its face: a 50% performance difference 
is rarely enough to make or break a production system; performance/capacity 
reserves of 95% or more are fairly typical, which means the difference between 
5% utilization and 10%. Even if latency rose by 50%, that's typically the 
difference between 20ms and 30ms - not enough that anybody'd notice over the 
'net for a SOAP/REST call, even if it's enough to make you want to optimize 
things a bit. 

> > // Our situation // Lots of RAM for the workload: 128 GB of ECC RAM
> > with an on-disk DB size of ~ 150 GB. Pretty much, everything runs
> > straight out of RAM cache, with only writes hitting disk. Smart
> &

Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-30 Thread Benjamin Smith
On Wednesday, September 30, 2015 02:22:31 PM Tomas Vondra wrote:
> I think this really depends on the workload - if you have a lot of
> random writes, CoW filesystems will perform significantly worse than
> e.g. EXT4 or XFS, even on SSD.

I'd be curious about the information you have that leads you to this 
conclusion. As with many (most?) "rules of thumb", the devil is quite often  
the details. 

> > We've been running both on ZFS/CentOS 6 with excellent results, and
> > are considering putting the two together. In particular, the CoW
> > nature (and subsequent fragmentation/thrashing) of ZFS becomes
> > largely irrelevant on SSDs; the very act of wear leveling on an SSD
> > is itself a form of intentional thrashing that doesn't affect
> > performance since SSDs have no meaningful seek time.
> 
> I don't think that's entirely true. Sure, SSD drives handle random I/O
> much better than rotational storage, but it's not entirely free and
> sequential I/O is still measurably faster.
> 
> It's true that the drives do internal wear leveling, but it probably
> uses tricks that are impossible to do at the filesystem level (which is
> oblivious to internal details of the SSD). CoW also increases the amount
> of blocks that need to be reclaimed.
> 
> In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x faster
> than ZFS. But of course, if the ZFS features are interesting for you,
> maybe it's a reasonable price.

Again, the details would be highly interesting to me. What memory optimization 
was done? Status of snapshots? Was the pool RAIDZ or mirrored vdevs? How many 
vdevs? Was compression enabled? What ZFS release was this? Was this on Linux, 
Free/Open/Net BSD, Solaris, or something else? 

A 2x performance difference is almost inconsequential in my experience, where 
growth is exponential. 2x performance change generally means 1 to 2 years of 
advancement or deferment against the progression of hardware; our current, 
relatively beefy DB servers are already older than that, and have an 
anticipated life cycle of at least another couple years.

// Our situation //
Lots of RAM for the workload: 128 GB of ECC RAM with an on-disk DB size of ~ 
150 GB. Pretty much, everything runs straight out of RAM cache, with only 
writes hitting disk. Smart reports 4/96 read/write ratio. 

Query load: Constant, heavy writes and heavy use of temp tables in order to 
assemble very complex queries. Pretty much the "worst case" mix of reads and 
writes, average daily peak of about 200-250 queries/second. 

16 Core XEON servers, 32 HT "cores". 

SAS 3 Gbps

CentOS 6 is our O/S of choice. 

Currently, we're running Intel 710 SSDs in a software RAID1 without trim 
enabled and generally happy with the reliability and performance we see. We're 
planning to upgrade storage soon (since we're over 50% utilization) and in the 
process, bring the magic goodness of snapshots/clones from ZFS. 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-29 Thread Benjamin Smith
On Tuesday, September 29, 2015 10:35:28 AM John R Pierce wrote:
> On 9/29/2015 10:01 AM, Benjamin Smith wrote:
> > Does anybody here have any recommendations for using PostgreSQL 9.4
> > (latest) with ZFS?
> 
> For databases, I've always used mirrored pools, not raidz*.


> put pgdata in its own zfs file system in your zpool.  on that dedicated
> zfs, set the blocksize to 8k.

Based on my reading here, that would be -o ashift=13 ? 
HowDoesZFSonLinuxHandleAdvancedFormatDrives

EG: 2^13 = 8192 




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Postgresql 9.4 and ZFS?

2015-09-29 Thread Benjamin Smith
Does anybody here have any recommendations for using PostgreSQL 9.4 (latest) 
with ZFS? 

We've been running both on ZFS/CentOS 6 with excellent results, and are 
considering putting the two together. In particular, the CoW nature (and 
subsequent fragmentation/thrashing) of ZFS becomes largely irrelevant on SSDs; 
the very act of wear leveling on an SSD is itself a form of intentional 
thrashing that doesn't affect performance since SSDs have no meaningful seek 
time. It would seem that PGCon  2013 even had a workshop on it! 
https://www.pgcon.org/2013/schedule/events/612.en.html

The exact configuration we're contemplating is either (3x 400 RAIDZ1) or (4x 
400 RAIDZ2) with Intel Enterprise SATA3 SSDs, with default (lz4) compression 
enabled. 

If this is a particularly good or bad idea, I'd like to hear it, and why? 

Thanks, 

BenP


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Recommendations for SSDs in production?

2011-11-03 Thread Benjamin Smith
On Thursday, November 03, 2011 10:59:37 AM you wrote:
> There's a pretty varied mix of speed, durability, and price with any
> SSD based architecture, but the two that have proven best in our
> testing and production use (for ourselves and our clients) seem to be
> Intel (mostly 320 series iirc), and Fusion-IO. I'd start with looking
> at those.

This is *exactly* the type of feedback that I've been looking for - thanks! 

The Fusion IO looks to be in the "if you have to ask about prices you probably 
can't afford it" range, although getting a million IOPS is damned impressive. 
I'm surprised a bit by the Intel 320 referenced, since this is typically 
touted as a consumer device, but we've consequently decided to give the 710 a 
shot since it's basically a 320 with some reliability upgrades, and see how it 
goes. 

I will post my results here later, time permitting. 

Thanks again. 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Recommendations for SSDs in production?

2011-11-02 Thread Benjamin Smith
On Wednesday, November 02, 2011 01:01:47 PM Yeb Havinga wrote:
> Could you tell a bit more about the sudden death? Does the drive still 
> respond to queries for smart attributes?

Just that. It's almost like somebody physically yanked them out of the 
machine, after months of 24x7 perfect performance. A cold reboot seems to 
restore order for a while, but the drives die again similarly fairly soon 
after a failure like this. 

From what I can tell, SMART is not worth much with SSDs. 

-Ben 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Recommendations for SSDs in production?

2011-11-02 Thread Benjamin Smith
On Wednesday, November 02, 2011 11:39:25 AM Thomas Strunz wrote:
> I have no idea what you do but just the fact that you bought ssds to 
> improve performance means it's rather high load and hence important. 

Important enough that we back everything up hourly. Because of this, we 
decided to give the SSDs a try. 

> Using a consumer drive for that IMHO is not the best idea. I know a lot 
> about ssds but just in consumer space. Intel has a good reputation in 
> terms of reliability but they are not the fastest. 

Which is what we're trying next, X25E. 710's apparently have 1/5th the rated 
write endurance, without much speed increase, so don't seem like such an 
exciting product. 

> I guess go Intel 
> route or some other crazy expensive enterprise stuff.

It's advice about some of the "crazy expensive enterprise" stuff that I'm 
seeking...? I don't mind spending some money if I get to keep up this level of 
performance, but also am not looking to make somebody's private plane payment, 
either. 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Recommendations for SSDs in production?

2011-11-02 Thread Benjamin Smith
Well, 

After reading several glowing reviews of the new OCZ Vertex3 SSD last spring, 
we did some performance testing in dev on RHEL6. (CentOS) 

The results were nothing short of staggering. Complex query results returned 
in 1/10th the time as a pessimistic measurement. System loads dropped from 2+ 
to 0.1 or less. 

Wow. 

So after months of using this SSD without any issues at all, we tentatively 
rolled this out to production, and had blissful, sweet beauty until about 2 
weeks ago; now we are running into sudden-death scenarios. We have an excellent 
backup system, so the damage is reduced to road bumps, but we are looking for a 
longer-term solution that doesn't compromise performance too much. 

The config is super-basic, basically no tuning at all was done: 

# fdisk /dev/NNN; 
mke2fs -j $partn; 
mount $partn /var/lib/pgsql; 
rsync -vaz /var/lib/pgsql.old /var/lib/pgsql; 
service postgresql start; 

I don't mind spending some money. Can anybody comment on a recommended drive 
in real world use?

After some review I found: 

1) Micron P300 SSD: claims excellent numbers, can't find them for sale 
anywhere. 

2) Intel X25E - good reputation, significantly slower than the Vertex3. We're 
buying some to reduce downtime. 

3) OCZ "Enterprise" - reviews are mixed. 

4) Kingston "Enterprise" drives appear to be identical to consumer drives with 
a different box. 

5) STEC drives are astronomically expensive. (EG: "You're kidding, right?") 

6) Corsair consumer drives getting excellent reviews, Aberdeen Inc recommended 
in use with RAID 1. 

7) Seagate Pulsar drives, XT.2 drives are expensive SLC but can't find a 
vendor, Pulsar .2 drives are more available but having trouble finding reviews 
other than rehashed press releases. 

Thanks! 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSDs with Postgresql?

2011-04-19 Thread Benjamin Smith
On Sunday, April 17, 2011 01:55:02 AM Henry C. wrote:
> On Thu, April 14, 2011 18:56, Benjamin Smith wrote:
> > After a glowing review at AnandTech (including DB benchmarks!) I decided
> > to spring for an OCX Vertex 3 Pro 120 for evaluation purposes. It cost
> > about $300
> > 
> >  with shipping, etc and at this point, won't be putting any
> > 
> > Considering that I sprang for 96 GB of ECC RAM last spring for around
> > $5000, even if I put the OCX drives in pairs w/RAID1, I'd still come out
> > well ahead if it allows me to put off buying more servers for a year or
> > two.
> 
> Exactly.  Be aware of the risks, plan for failure and reap the rewards.

Just curious what your thoughts are with respect to buying SSDs and mirroring 
them with software RAID 1. (I use Linux/CentOS) 

-Ben 




Re: [GENERAL] pgsql 9.0.1 table corruption

2011-04-15 Thread Benjamin Smith
On Friday, April 15, 2011 09:50:57 AM Tom Lane wrote:
> If you simply unpacked the tar archive and started a postmaster on that,
> you'd be pretty much guaranteed to get a corrupt database.  The tar
> archive is not a valid snapshot by itself --- you have to replay
> whatever WAL was generated during the archiving sequence in order to get
> to a consistent database state. 

I have, more than once, "moved" a PG instance from one machine to another with 
the following sequence, without apparent issue. is there anything I'm missing 
and/or need to be concerned with? 

1) service postgresql stop; 
2) rsync -vaz /var/lib/pgsql root@newserver:/var/lib/pgsql; 
3) ssh root@newserver; 
4) (edit postgresql.conf, set IP addresses, memory, etc)
5) service postgresql start; 

This is a "done at 10 PM on Friday night" kind of process. 

-Ben 




Re: [GENERAL] SSDs with Postgresql?

2011-04-14 Thread Benjamin Smith
After a glowing review at AnandTech (including DB benchmarks!) I decided to 
spring for an OCZ Vertex 3 Pro 120 for evaluation purposes. It cost about $300 
with shipping, etc and at this point, won't be putting any 

Considering that I sprang for 96 GB of ECC RAM last spring for around $5000, 
even if I put the OCZ drives in pairs w/RAID1, I'd still come out well ahead 
if it allows me to put off buying more servers for a year or two. 

-Ben 

On Thursday, April 14, 2011 02:30:06 AM Leonardo Francalanci wrote:
> have a look at
> 
>  http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td426
> 8261.html
> 
> 
> It looks like those are "safe" to use with a db, and aren't that expensive.




[GENERAL] SSDs with Postgresql?

2011-04-13 Thread Benjamin Smith
The speed benefits of SSDs as benchmarked would seem incredible. Can anybody 
comment on SSD benefits and problems in real life use? 

I maintain some 100 databases on 3 servers, with 32 GB of RAM each and an 
extremely rich, complex schema. (300+ normalized tables) 

I was wondering if anybody here could comment on the benefits of SSD in 
similar, high-demand rich schema situations? 


-Ben 




Re: [GENERAL] Web Hosting

2011-03-07 Thread Benjamin Smith
Try this: 

http://lmgtfy.com/?q=web+hosting+postgresql



On Sunday, March 06, 2011 11:33:01 am Eduardo wrote:
> At 17:24 06/03/2011, you wrote:
> >On 3/5/2011 4:08 PM, matty jones wrote:
> >>I already have a domain name but I am looking for a hosting company
> >>that I can use PG with.  The few I have contacted have said that
> >>they support MySQL only and won't give me access to install what I
> >>need or they want way to much.  I don't need a dedicated host which
> >>so far seems the only way this will work, all the companies I have
> >>researched so far that offer shared hosting or virtual hosting only
> >>use MySQL.  I will take care of the setup and everything myself but
> >>I have already written my code using PG/PHP and I have no intention
> >>of switching.
> >>
> >>Thanks.
> >
> >http://hub.org/
> 
> +1




Re: [GENERAL] pg_dump makes our system unusable - any way to pg_dump in the middle of the day? (postgres 8.4.4)

2011-02-25 Thread Benjamin Smith
I'd also add: run pg_tune on your server. Made a *dramatic* difference for us. 

On Friday, February 25, 2011 05:26:56 am Vick Khera wrote:
> On Thu, Feb 24, 2011 at 6:38 PM, Aleksey Tsalolikhin
> 
>  wrote:
> > In practice, if I pg_dump our 100 GB database, our application, which
> > is half Web front end and half OLTP, at a certain point, slows to a
> > crawl and the Web interface becomes unresponsive.  I start getting
> > check_postgres complaints about number of locks and query lengths.  I
> > see locks around for over 5 minutes.
> 
> I'd venture to say your system does not have enough memory and/or disk
> bandwidth, or your Pg is not tuned to make use of enough of your
> memory.  The most likely thing is that you're saturating your disk
> I/O.
> 
> Check the various system statistics from iostat and vmstat to see what
> your baseline load is, then compare that when pg_dump is running.  Are
> you dumping over the network or to the local disk as well?




Re: [GENERAL] Massively Parallel transactioning?

2010-08-19 Thread Benjamin Smith
On Wednesday, August 18, 2010 04:58:08 pm Joshua D. Drake wrote:
> Well if you are just using it for updates to the schema etc... you
> should only need to launch a single connection to each database to make
> those changes.

And that's exactly the problem. On each server, we have at least dozens of 
active databases - one for each client. Opening a connection for each database 
starts to become problematic. 

This problem is now solved with prepared transactions. 
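
Roughly, the shape of it (assuming max_prepared_transactions is set above zero 
in postgresql.conf; the DDL and transaction name here are just placeholders):

-- On each database, from a short-lived connection:
BEGIN;
ALTER TABLE testtable ADD COLUMN address varchar;
PREPARE TRANSACTION 'schema_update_batch_1';
-- The connection can be closed now; the prepared transaction persists on disk.

-- Once every database has prepared successfully, reconnect briefly to each and:
COMMIT PREPARED 'schema_update_batch_1';
-- or, if any database failed to prepare, on the others:
ROLLBACK PREPARED 'schema_update_batch_1';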



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Massively Parallel transactioning?

2010-08-19 Thread Benjamin Smith
On Wednesday, August 18, 2010 08:40:21 pm Adrian von Bidder wrote:
> Heyho!
> 
> On Thursday 19 August 2010 01.32:06 Benjamin Smith wrote:
> > This way we can be sure that either all the databases are in synch, or
> > that we  need to rollback the program patch/update.
> 
> I guess this might be more a hack than a solution: do the updates in
> batches and use 2pc: first connect to batches of databases, but instead of
> commit, you "prepare to commit".  Prepared commits like this are
> persistent accross connections, so you can come back later and commit or
> rollback.
> 
> Note that such prepared commits will block (some) stuff and use resources
> (not sure how many) before they are finally committed or rolled back, so
> you'll want to make sure they don't stick around too long.

I can't see how this would be a hack, it's EXACTLY what I'm looking for! 

So often I find that when limits in Postgres get in my way, it's because I 
don't understand Postgres well enough. 

Much kudos to all of the Postgres team!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Clustering, parallelised operating system, super-computing

2010-08-18 Thread Benjamin Smith
On Thursday, May 13, 2010 11:51:08 pm Brian Modra wrote:
> Maybe the best way to solve this is not to do automatic distribution
> of the data, but rather to provide tools for implementing distributed
> references and joins.

Here's my vote! I'd *LOVE* it if I could do a simple cross-database join 
(without the ugliness of dblink), it would be just awesome. Two beers for 
cross-database foreign keys... 

We already use dblink extensively, with a wrapper, for various reporting 
functions. Since the cross-database queries are in the minority, it does 
function as load balancing, even if the cross-joined queries aren't so 
balanced. 
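
(For anyone curious, what the wrapper emits boils down to something like this; 
the connection string and column list are hypothetical:)

SELECT *
FROM dblink('dbname=reporting host=db2',
            'SELECT id, name FROM customers')
    AS remote_customers(id integer, name varchar);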



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Massively Parallel transactioning?

2010-08-18 Thread Benjamin Smith
Is there a way to update a number of databases hosted on a single server 
without opening a separate psql connection to each database? 

We have a cluster of servers hosting an application on Postgres. Right now, we 
have dozens of databases per server, enough that we're starting to have 
problems with our update process. 

When we release updates, we have a migrate script within our update process 
that runs all the database schema updates for all our clients. The way that it 
works is to open a transaction on all the databases concurrently, run the 
commands in sequence on the databases within the transactions, and then commit 
them all (or rollback if there was a problem) 

This way we can be sure that either all the databases are in synch, or that we 
need to rollback the program patch/update. 

So far, it's been a dream, but now, as we continue to grow, we're starting to 
reach connection limits per server. Short of raising the number of 
simultaneous connections, is there a way to run all the transactions for a 
single server for all databases within it on a single (or small number) of 
connections? 

I've tried the following: 

# ATTEMPT 1
$psql -U postgres template1 -h server1; 
template1=# begin transaction; create table testtable (name varchar); 
BEGIN
CREATE TABLE
\c somedatabase; ri
psql (8.4.4, server 8.4.0)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
You are now connected to database "somedatabase".
somedatabase=# rollback; 
NOTICE:  there is no transaction in progress
ROLLBACK
somedatabase=# \c template1; 
template1=# rollback; 
NOTICE:  there is no transaction in progress
ROLLBACK
template1=# 

# ATTEMPT 2
$psql -U postgres template1 -h server1; 
template1=# alter table somedatabase.testtable add address varchar; 
ERROR:  cross-database references are not implemented: 
"somedatabase.public.students"
template1=#

Is there a better way?



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Code tables, conditional foreign keys?

2009-05-26 Thread Benjamin Smith
"A deep unwavering belief is a sure sign that you're missing something."
-- Unknown

I had no intention of sparking an ideological discussion. 

I read Joe's article referenced previously - a simple case for using a 
normalized database. I would pretty much agree with his assessment from 
beginning to end. However, he really doesn't address my scenario at all. 
I'm not trying to mash my database together into a single table, I'm 
trying to deal with the fact that we have hundreds of tables with nearly 
identical syntax, but further, deal with the concept of "code tables". 

See our product has to work in many venues, and each venue has their own 
set of code-table data that they'd like to support. Worse, they often use 
similar values for the different things, so the "natural key" is just not 
natural. Sometimes I've seen venues "re-using" the old code table value 
from previous years to mean new things in current/future years. Yes, this 
is a bad, bad, bad idea but it was still there and it's still my job to 
deal with it. 

Surrogate keys are used to make sure that 15 to mean "BS College Degree" 
in venue A aren't confused with 15 to mean "No High School Education" in 
another venue. They cover a similar value, EG: applicant's educational 
level. Some values don't translate at all, (EG: differing representations 
of vocational arts) so using our own code table set and then translating 
doesn't work consistently, either. 

So we have multiple, distinct sets of data to be used within a single 
field. Either that, or we create massive data tables with every possible 
different set of otherwise similar data, each of which has a foreign key 
to a table with a slightly different name, which is, far and away, even 
uglier. (EG: applicants.ca_edlevel, applicants.or_edlevel 
applicants.nv_edlevel, applicants.southca_edlevel...) 

Educational level is one example; there are hundreds that we have to deal 
with! 

So back to the first question: is there a way to have a conditional 
foreign key? 

On Saturday 23 May 2009 17:22:36 Lew wrote:
> Conrad Lender wrote:
> > I didn't intend any disrespect to Joe Celko. I have read a number of his
> > articles, which tend to be well written and informative. Last year, when
> > I posted to comp.databases asking for advice on whether to refactor that
> > table, he wrote "You will have to throw it all out and start over with a
> > relational design", "Throw away the idiot who did the EAV. This is not a
> > good design -- in fact, it is not a design at all", and "This is basic
> > stuff!!" Then he copied the same EAV example that was linked earlier by
> > Rodrigo, claiming that "someone like me" had suggested it. With all the
> > respect I have for Mr. Celko, that was hardly helpful, as that example
> > and the situation I had described were quite different. It also did not
> > encourage me to follow his advice and start from scratch (and fire my
> > boss, who was the mentioned "idiot").
>
> If we fired every boss who actually is an idiot there would be about half
> the number of bosses.
>
> All kidding aside, why is the boss specifying a database architecture? 
> That is not the boss's job.
>
> > I understand the problems that can arise from bad design choices, and I
> > know that Celko is vehemently opposed to anything that resembles EAV,
>
> For good reasons.
>
> > but I felt that in our case "throwing it all away" would be excessive.
>
> Perhaps not.  I had a situation some years ago where a supervisor would not
> let me normalize a database and consequently the project nearly failed.
> Fortunately, the company assigned a new team lead/project manager who did
> the normalization or it would have been a disaster.  Trying to make a bad
> approach work is often, if not always, more expensive than replacing it
> with a good approach.
>
> > We had safeguards to ensure referential integrity, and keeping the
> > values in the same table allowed us to let users manage them all with
> > the same form. So I guess it's like Stefan Keller said in a different
> > thread today: "Know when to break the rules."
>
> Managing all the values in the same form is not intrinsically connected to
> whether one stores the values in an EAV layout.
>
> Telling oneself that one should know when to break the rules is not the
> same as knowing when to break the rules.  They are the rules for good
> reason.
>
> All I'm saying is that EAV is a very problematic approach.  I've been on
> projects that tried to use it, and while that didn't make me an expert on
> the matter by any means, it gave me some cause to trust Mr. Celko's opinion
> on the matter.
>
> --
> Lew
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: [GENERAL] Aggregate Function to return most common value for a column

2009-05-23 Thread Benjamin Smith
I've used this same concept in subqueries for a very long time. Doing this 
allows me to "dive in" and get other values from the joined table, rather than 
just the thing that we're getting the most of. 
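
A rough sketch of the pattern (table and column names here are hypothetical):

SELECT c.id,
       c.name,
       t.title AS most_common_type
FROM customers c
JOIN customer_types t ON t.id = (
    SELECT o.customer_types_id
    FROM orders o
    WHERE o.customer_id = c.id
    GROUP BY o.customer_types_id
    ORDER BY count(*) DESC
    LIMIT 1
);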



- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"I kept looking for somebody to solve the problem. 
Then I realized... I am somebody!" 

-- Author Unknown




[GENERAL] Code tables, conditional foreign keys?

2009-05-22 Thread Benjamin Smith
I have some questions about the best way to best use foreign keys in complex 
schemas. It's becoming cumbersome to manage a large set of foreign keys - is 
there a better way? 

// FOUNDATIONAL // 

Let's say that you want to keep addresses, and one of the values that you need 
to keep is the state. So you have two tables defined: 

create table states 
( state varchar unique); 
create table customers 
(... state varchar not null references states(state), ...); 

If you want to be a bit more "pure", you might do it like this: 

create table states
(id serial primary key, state varchar(2), description varchar); 
create table customers 
(... states_id integer not null references states(id), ...); 

So far, so good. But when you have a large number of fields with foreign key 
references, you end up with a bazillion reference tables, all with very 
similar layouts. EG: 

create table customer_types 
(id serial primary key, title varchar(4), description varchar); 
create table customer_taxcode
(id serial primary key, title varchar(4), description varchar); 
... 
create table customers
(... 
customer_types_id integer not null references customer_types(id), 
customer_taxcode_id integer not null references customer_taxcode(id), 
...); 

Getting the appropriate code tables from all these different tables becomes 
cumbersome, just because there are SO MANY tables to get these values from. 

So the next idea is to create a master set of code tables and foreign key to 
there, but this has its own set of problems EG: 

create table codetables 
(
id serial primary key, 
tablename varchar unique not null -- ("table" itself is a reserved word)
); 
create table codevalues 
(
id serial primary key,
codetables_id integer not null references codetables(id), 
value varchar not null, 
unique(codetables_id, value) 
); 
create table customers 
(
customer_types_id integer not null references codevalues(id), 
customer_taxcode_id integer references codevalues(id)
)

How do you know that taxcode_id references the correct set of code values? You 
could use a dual foreign key, but then you have to have a field for each and 
every codetable you reference, eg: 

insert into codetables (id, tablename) values (33, 'customertypes'); 
insert into codevalues (codetables_id, value) values (33, 'Gubbmint'); 
create table customers 
(
customer_types_id integer not null, 
customer_taxcode_id integer, 
custtypes integer not null default 33, -- the codetables.id for customer types 
taxcodes integer not null default 34, -- the codetables.id for taxcodes 
-- (these composite keys also need a unique constraint on codevalues(codetables_id, id))
foreign key (custtypes, customer_types_id) 
references codevalues(codetables_id, id), 
foreign key (taxcodes, customer_taxcode_id) 
references codevalues(codetables_id, id)
); 

This also becomes cumbersome. Is there a better way? Is there some way to do 
this not covered in the docs? What would be ideal is to treat the reference 
something like a join - might be something like: 

 
create table customers
(
... 
customer_taxcode_id integer not null references codevalues(id) ON 
codevalues.codetables_id = 33, 
...
)
...


Thanks! 

-Ben  




Re: [GENERAL] Is this good spec for a PostgreSQL server?

2007-09-24 Thread Benjamin Smith
On Wednesday 19 September 2007, Bjørn T Johansen wrote:
> It's a Dell server with the following spec:
> 
> PE2950 Quad-Core Xeon E5335 2.0GHz, dual 
> 4GB 667MHz memory
> 3 x 73GB SAS 15000 rpm disk
> PERC 5/I Integrated controller card (8 ports, 256MB cache, battery backup) x 
6 backplane

Asking "is this a good database server?" is a meaningless question without 
more information. I have an ancient 500 Mhz Pentium III that runs a 
lightweight Postgres database excellently, but I wouldn't recommend it for 
enterprise duty!

I've admin'd a few Dell servers, and consistently ran into minor driver 
niggles. They often pick hardware that isn't supported in the source kernel 
tree, though to their credit, they DO usually provide appropriate drivers. 

In one case, it was an ethernet driver that was unsupported by my distro. 
(RedHat/CentOS) There were sources available that I could recompile, and I 
did, and it worked fine, but it was sure a pain in the [EMAIL PROTECTED] to 
have to 
recompile it everytime a new kernel came out, and there was no way to test 
whether or not the recompile "took" until the reboot - and the reboot is the 
WORST way to test an ethernet driver when you are admining remotely. 

Personally, I prefer generic, white-box solutions, like a Tyan reference 
system, or maybe a SuperMicro. They tend to be conservative in their hardware 
choices, they're quite reliable, very solid performers, and for the price of 
one "on brand" server, you can get two whitebox systems and have a hot 
failover on site. I have 4x quad-core Opteron 1U rackmounts that I've been 
blissfully happy with, 2x 300 GB 10k SCSI (software RAID 1), 4 GB of RAM, 
dual Gb NICs. 

I can pull any one of the RAID 1 drives out any machine, plug it into any 
other machine, and have a working, booted system in < 5 minutes. No driver 
headaches, no hassle, with excellent reliability under load. (knocks on wood) 

Each person picks their favorite blend of poison, I guess. 

-Ben 
-- 
I kept looking for somebody to solve the problem.
Then I realized - I am somebody. 
-- Author Unknown



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Stuck on Foreign Keys

2007-09-24 Thread Benjamin Smith
On Wednesday 19 September 2007, Chester wrote:
> Hi
> 
> I have a question regarding foreign keys, I just cannot get it to create 
> them for meI must be doing something wrong but I have no idea what 
> that might be :)
> 
> I have a table "clients"
> 
> clientID (primary)
> ticode
> Firstname
> SecondName
> 
> I have a second table "titles"
> 
> ticode (primary)
> Title
> 
> I am trying to create a foreign key on TIcode "clients" table as in below,
> 
> ALTER TABLE clients ADD CONSTRAINT the_title FOREIGN KEY (ticode) 
> REFERENCES titles (ticode) ;
> 
> I keep getting this error
> 
> ERROR:  insert or update on table "clients" violates foreign key 
> constraint "the_title"
> DETAIL:  Key (ticode)=( ) is not present in table "titles".

This foreign key constraint requires that every value of clients.ticode have a 
corresponding (unique) titles.ticode. But you don't - there are records in 
clients whose ticode value is not found in titles.ticode. 
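
Something like this should show the offending rows (untested, but the general idea):

SELECT c.*
FROM clients c
LEFT JOIN titles t ON t.ticode = c.ticode
WHERE t.ticode IS NULL;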

Cheers! 

-Ben 

-- 
I kept looking for somebody to solve the problem.
Then I realized - I am somebody. 
-- Author Unknown



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Postgresql and SSL

2007-09-19 Thread Benjamin Smith
I'm using 8.1 RPMs for CentOS and so far, it's been great. 

Now, I'm going to enable SSL. I had no trouble with the instructions on the 
documentation for server-only certificates, and verified that psql (Linux) 
acknowledges the SSL connection. 

But I am stumped as to how to create a client certificate that's enforced!  

I tried the instructions found 
http://marc.info/?l=tomcat-user&m=106293430225790&w=2

and used the "ca.pem" created there as the postgres root.crt and although the 
PG daemon no longer indicates that it couldn't find root.crt, it also doesn't 
require a client certificate installed to access with psql. 

Any pointers for somebody who is NOT an ssl guru? (like myself!) 

-Ben 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] How to enforce uniqueness when NULL values are present?

2007-03-12 Thread Benjamin Smith
On Saturday 10 March 2007, Christian Schröder wrote:
> Let's assume that the values in this table are some limits that are
> given for different data (identified by the id). Some of the limits are
> only valid after a given date, whereas other limits are valid all the
> time. How would you put this information into one or more tables? Of
> course, I could use a special date to indicate that a limit is valid all
> the time (e.g. 1970-01-01), but I don't think that this is better design
> than representing this with a NULL value. Or I could split the data into
> two different tables, one with the date column and one without. But then
> I had to work with two tables with more or less the same meaning.
> Wouldn't it be quite strange to model the same entities (the limits)
> with two tables?

- SNIP - 

>1. If a record with a given id and a null value in the date field
>   exists, no other record with the same id is allowed.
>2. If multiple records with the same id exist, they must have
>   different values in the date field and none of them must have a
>   null value in this field.
Seems to me that this is what you are looking for: 

TABLE listofids:

     Column   |  Type   | Modifiers
    ----------+---------+--------------
     id       | integer | not null
     hasdates | bool    | default null
    unique(id, hasdates)

TABLE listofidsdates:

     Column       |  Type   | Modifiers
    --------------+---------+------------------------------------
     listofids_id | integer | not null REFERENCES listofids(id)
     date         | date    | not null
    unique(listofids_id, date)

When there are dates, set listofids.hasdates=null. Otherwise, set it to true. 
Does this seem most properly normalized? (it's how I would do it!) How could 
this be done better? 

-Ben

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] server vendor recommendation

2007-03-07 Thread Benjamin Smith
I've purchased a number of systems (a dozen or so) from avadirect. 

http://www.avadirect.com

Their prices are excellent, hardware is solid quality, their service is 
median. This is a discount shop, so don't expect lightening support. But you 
can buy three fast AVA systems of top-notch quality for the price of a single 
HP or Sun server with "enterprise support". Those sound like pretty good 
numbers if you are technically inclined. 

I especially like their Tyan-based 1U Opterons with 10k SCSI drives - 
almost all of the systems I've purchased from them have been in this 
configuration and all of these have worked very well for me. I have two right 
now being torture tested in the other room. Quad-core Opterons, 4 GB ECC RAM, 
dual 10k SCSI, great DB servers - sweet! 

I like to buy two identical drives with the storage I need, and use Linux 
kernel software RAID 1. This gives me good performance, redundancy, and 
compatability that doesn't tie me to a specific chipset or RAID card. There 
is room for 4 bays on the front, so I can add two more drives later if it's 
needed. My O/S of choice is CentOS 4. http://www.centos.org 

The only problem I've had with them is that I bought a couple of SuperMicro 1U 
systems in a similar configuration and had issues, but so far, they've been 
resolved equitably. 

-Ben 

PS: No, I don't work for them in any way. 

On Wednesday 07 March 2007, snacktime wrote:
> Any recommendations for vendors that can build custom servers?
> Specifically opteron based with scsi raid.
> 
> Chris
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
> 
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
> 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Converting 7.x to 8.x

2007-01-25 Thread Benjamin Smith
On Tuesday 23 January 2007 13:55, Carlos wrote:
> What would be the faster way to convert a 7.4.x database into an 8.x
> database?  A dump of the database takes over 20 hours so we want to convert
> the database without having to do a dump and resptore.

You've probably already accounted for this, but make sure you've tried your 
options for loading the database. Using long ("insert") form vs copy can make 
a *huge* performance difference. 

(Hours vs seconds, in some cases!) 
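
Concretely, the difference is just which flag you hand pg_dump when making the 
dump (flag spellings from memory for the 7.x/8.x tools, so double-check your 
version's man page): 

# default: data comes out as COPY blocks - fast to reload
pg_dump mydb > mydb.sql

# -d (aka --inserts): one INSERT statement per row - much slower to reload,
# but portable and restartable mid-file
pg_dump -d mydb > mydb_inserts.sql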

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] sequence increment jumps?

2007-01-25 Thread Benjamin Smith
On Thursday 25 January 2007 09:53, Douglas McNaught wrote:
> Nature of the beast.  Sequence increments aren't rolled back on
> transaction abort (for performance and concurrency reasons), so you
> should expect gaps.

Behavior long ago noted and accounted for. But I've always wondered why this 
was so? Is there a specific reason for this behavior? 

-Ben 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Postgresql.conf

2007-01-23 Thread Benjamin Smith
Andreas, 

Would you mind explaining what you mean by "localized object names" and why it 
might be bad? Or where I might go to learn more? 

Thanks,

-Ben 

On Tuesday 23 January 2007 07:38, Tino Wildenhain wrote:
> A. Kretschmer schrieb:
> > am  Tue, dem 23.01.2007, um 10:12:13 -0500 mailte Brandon Aiken folgendes:
> >> Out of curiosity, has the COUNT(*) with no WHERE clause slowness been
> >> fixed in 8.x?  Or is it still an issue of "there's no solution that
> >> won't harm aggregates with WHERE clauses"?
> > 
> > I will try it:
> > 
> > scholl=# \timing
> > Timing is on.
> > scholl=# select count(1) from bde_meldungen ;
> 
>  ^^
>   k localized object names ;)))
> 
> Tino
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
> 
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
> 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Partitioning Vs. Split Databases - performance?

2006-12-21 Thread Benjamin Smith
On Thursday 21 December 2006 14:41, Joshua D. Drake wrote:
>You should read up on schemas and how they work. Plus the
> addition of schemas and table spaces means you can infinite scaling
> within the confines of your hardware itself.

Ok, so I'd like you to correct me if I'm wrong: 

1) Schemas operate within a database. A schema is analogized as a filesystem 
directory in the docs, except that schemas can't be nested. 

2) A database runs on one machine, with the following addenda: 
A) slony lets you copy that database to another system, 
B) pgpool lets you duplicate the database if you're real careful around 
updating with unique IDs and aggregate functions, 
C) you can essentially do cross-machine RAID so that if your primary DB gets 
hosed, you can fire up the backup machine and continue working. 
D) pg-cluster, a synchronous clustering solution, appears to be new/unstable, 
doesn't appear to be current to 8.x, and takes a performance hit on writes. 

3) Thus, any service that splits up a database (EG: a schema) is subject to 
all the limitations outlined in #2. 

Did I miss anything? 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Partitioning Vs. Split Databases - performance?

2006-12-21 Thread Benjamin Smith
On Thursday 21 December 2006 11:47, Ron Johnson wrote:
> This gives you linear growth potential, since if your current box
> gets over-utilized, buy a 2nd box and move some of the databases to it.

So far, I'm inclined to go this way, due to the option for linear scaling. 

> >> 2) Copy out the data specific to a customer and load into separate tables
> >> (with slightly different names, EG table "dates" becomes "cust1_dates")
> >> and 
> >> use data partitioning to help with performance as needed.
> 
> Definitely *not* scalable.  And *very* messy.  Yech.

Scales better than present, methinks, but still not the best idea. I'd have to 
revisit all my queries to make sure that they use the correct tablename. 

> > 3) Put each customer in their own schema/namespace which resides within
> > its own table space.
> >
> > Then you can move customers wherever you need in terms of IO.

How is that functionally different than using a separate database? What's the 
advantage here? I don't *need* to restrict myself to one database, and doing 
this does require that I revisit 100% of the SQL queries to make sure that 
I'm referencing the right schema. 

This solution seems to have the same problems as using dynamic tablenames. 

> Splitting like mentioned in these three tactics means that you've
> now got 2x as many tables.  Add more customers and you've got that
> many more tables.  Perfect candidate for "schema drift".
> 
> If each table has cust_id in it, then you could:
> 
> 4) retain 1 database and partition each table on cust_id.
> 
> http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

Intriguing idea, and one that I might do in the future. However, I have 170 
normalized tables defined at present. The task of auditing each of these 
tables for the constraint ranges sounds somewhat nightmarish. Not all the 
tables have the cust_id column defined - some are implicit. 

> >> Given the same physical hardware, which one is likely to perform better? 
Does
> >> it make any difference? Does using separate databases use more RAM than a
> >> single database with a bunch of different tables?
> 
> Config files are global, so I doubt it.
> 
> >> Company is growing rapidly, so growth room is important...
> 
> Then go for Option 1.

My conclusion, too. Another poster mentioned schema drift, and that's a real 
concern, but we're already updating the schema through a script which could 
be extended to update all databases, not just one, so I feel this problem 
would be minor to non-existent. 

Thanks! 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Partitioning Vs. Split Databases - performance?

2006-12-21 Thread Benjamin Smith
I'm breaking up a database into several sets of data with similar layout. (we 
currently have multiple customers using a single database and tableset, we're 
splitting it out to give us more "wiggle room") 

It seems that there are basically two ways to proceed: 

1) Copy out the data specific to a customer and load into a separate database 
for that customer, or 

2) Copy out the data specific to a customer and load into separate tables 
(with slightly different names, EG table "dates" becomes "cust1_dates") and 
use data partitioning to help with performance as needed. 

Given the same physical hardware, which one is likely to perform better? Does 
it make any difference? Does using separate databases use more RAM than a 
single database with a bunch of different tables? 

Company is growing rapidly, so growth room is important... 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Performance of outer joins?

2006-12-16 Thread Benjamin Smith
I have a situation that can be summarized to the following: 

-- day in 20061215 format 
Create table calendar (
day integer unique not null
); 

Create table customers (
id serial unique not null, 
name varchar, 
address varchar 
); 

Create table deliveries (
customers_id integer not null references customers(id), 
calendar_day integer not null references calendar(day), 
delivered bool not null default false, 
unique(customers_id, calendar_day)
); 

Imagine tens of thousands of customers, a few million deliveries. A query 
that's structurally similar to the following query is rather slow. It's 
taking over 30 seconds, climbing fast on reasonable HW. (SMP Opteron, 10k 
SCSI, 4 GB RAM) If I remove the outer join, performance is < 1 second. 

SELECT customers.id as customers_id, 
customers.name AS customers_name, 
calendar.day AS calendar_day, 
CASE WHEN (deliveries.delivered IS NULL) THEN 'n/a' 
WHEN (deliveries.delivered=TRUE) THEN 'yes'
ELSE 'no' END AS delivered
FROM customers 
JOIN calendar ON 
(
-- GIVE A CALENDAR OF POSSIBLE DAYS FOR DELIVERIES 
calendar.day < 20061201
AND calendar.day >= 20060101
) 
LEFT OUTER JOIN deliveries ON 
( 
customers.id=deliveries.customers_id 
AND deliveries.calendar_day=calendar.day 
) 
; 

What can I do to improve the performance of this oft-used query? Is there a 
better way to do this, or am I doomed to looping thru results and parsing the 
results in code? 

Thanks, 

-Ben 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Postgres Team: Thank You All

2006-11-23 Thread Benjamin Smith
On Wednesday 20 September 2006 18:59, Brian Maguire wrote:
> I justed wanted to let you know how impressed and pleased  I have been with
> postgres over the past 5 years .  The timeliness and quality of the releases
> are always robust and stable.  Every release has a very nice mix of admin,
> performance, platform, and feature adds.  The support of the listserves is
> bar none to any commercial support I have ever purchased. I can't remember a
> time I did not get 3-5 answers to any question I have ever had. I must also
> note that the new website design and organization have added very nice
> polish to the project. 

Just found this while searching my email list. I'm CTO of a small (but 
growing!) software company that uses PG exclusively, as the RPMs that come 
with CentOS. 

Under loads heavy and light, with virtually no administration overhead, PG 
chugs along and "just works". 

*round of applause* 

-Ben 

> Cheers and thank you all,
> 
> Brian
> 
>  
> 
>  
> 
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
> 
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
> 
> 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] off topic - web shop

2006-09-12 Thread Benjamin Smith
On Monday 11 September 2006 11:30, stig erikson wrote:
> Hi.
> We are looking to open a small web shop. I looked around to see if there are 
any open source web shops.
> Can anyone recommend any web shop system (free or non-free)?


I'd guess you're looking for OSCommerce. (Sucks, but less so than most other 
free alternatives) 

Good luck! 

-Ben 

> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
> 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Restricting access to rows?

2006-05-25 Thread Benjamin Smith
We have a growing ASP-hosted application built on PHP/Postgres 8.1, and are 
getting requests from clients to manipulate the databases more directly. 
However, the structure of our databases prevents this from happening readily.

Assume I have two tables configured thusly: 

create table customers (
id serial unique not null, 
name varchar not null
); 

create table widgets ( 
customers_id integer not null references customers(id), 
name varchar not null, 
value real not null default 0
);

insert into customers (name) values ('Bob'); 
insert into customers (name) values ('Jane'); 
insert into widgets (customers_id, name, value) VALUES (1, 'Foo', 100); 
insert into widgets (customers_id, name, value) VALUES (1, 'Bar', 50); 
insert into widgets (customers_id, name, value) VALUES (2, 'Bleeb', 500); 

This leaves us with two customers, Bob who has two widgets worth $150, and 
Jane with one widget worth $500. 

How can I set up a user so that Bob can update his records, without letting 
Bob update Jane's records? Is it possible, say with a view or some other 
intermediate data type? 
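
To make the "view" idea concrete, this is roughly the direction I've been 
sketching (just a sketch, assuming each customer gets a database role whose 
name matches customers.name): 

CREATE VIEW my_widgets AS
    SELECT w.*
      FROM widgets w
      JOIN customers c ON c.id = w.customers_id
     WHERE c.name = current_user;

REVOKE ALL ON widgets FROM PUBLIC;
GRANT SELECT ON my_widgets TO bob, jane;
-- updates through the view would still need rules (or, in much newer
-- releases, row-level security), which is the part I'm unsure about.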

Thanks, 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Checking for Foreign Keys constraining a record?

2006-04-27 Thread Benjamin Smith
I have a customer table (very important) and have numerous fields in other 
tables FK to the serial id of the customer table. 

There's an option to delete a customer record, but it has to fail if any 
records are linked to it (eg: invoices) in order to prevent the books from 
getting scrambled. 

I want to be able to determine in advance whether or not a record is 
"deleteable" before displaying the button to delete the record. If it's not 
deleteable, it should say so before the user hits the button. 

But, the only way that I've been able to find out if the customer record is 
deletable is to begin a transaction, try to delete it, check to see if it 
worked, and then rollback the session. 

This causes my error logger to log errors everytime somebody looks at a 
customer record, and (I'm sure) is not very efficient. 

Is there a way to ask the database: "Are there any FK constraints that would 
prevent this record from being deleted?" 
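
The closest I've found so far is asking the catalogs which foreign keys point 
at the table, which at least tells me what to check, though not whether a 
particular row is actually referenced. A sketch: 

-- list every foreign key constraint that references the customers table
SELECT conname,
       conrelid::regclass AS referencing_table
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'customers'::regclass;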

Thanks, 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Select first ten of each category?

2006-04-12 Thread Benjamin Smith
I'm stumped on this one... 

I have a table defined thusly: 

create table items ( 
id serial, 
category integer not null references category(id), 
name varchar not null, 
price real, 
unique(category, name)); 

It has a LARGE number of entries. I'd like to grab the 10 most expensive items 
from each category in a single query. How can this be done? Something like 

Select items.* 
FROM items 
where id IN (
select firstTen(id) FROM items 
group by category
ORDER BY price DESC
) 
ORDER BY price desc; 

But I've not found any incantation to make this idea work... 
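
The closest I've gotten is abandoning the firstTen() idea for a correlated 
subquery with LIMIT, which returns the right rows but which I'd expect to be 
slow on a table this size; much newer releases could use window functions 
instead. Sketches of both: 

-- correlated subquery with LIMIT (works on 8.1-era servers):
SELECT i.*
  FROM items i
 WHERE i.id IN (SELECT i2.id
                  FROM items i2
                 WHERE i2.category = i.category
                 ORDER BY i2.price DESC
                 LIMIT 10)
 ORDER BY i.category, i.price DESC;

-- window functions (PostgreSQL 8.4 and later):
SELECT *
  FROM (SELECT items.*,
               row_number() OVER (PARTITION BY category ORDER BY price DESC) AS rn
          FROM items) ranked
 WHERE rn <= 10;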

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Advantages of PostgreSQL over MySQL 5.0

2006-03-22 Thread Benjamin Smith
On Wednesday 22 March 2006 03:06, Jimbo1 wrote:
> Hello there,
> 
> I'm a freelance Oracle Developer by trade (can almost hear the boos now
> ;o)), and am looking into developing my own Snowboarding-related
> website over the next few years. Anyway, I'm making some decisions now
> about the site architecture, and the database I'm going to need is
> obviously included. If my site works out, I'm expecting reasonably
> heavy traffic, so want a database that I'm confident can cope with it.

I've built many sites based on PostgreSQL. Originally, like most, I started 
with MySQL, but after I discovered PG in about 2000, I've switched all 
development to it, and have never looked back. I have "enterprise" systems 
developed with PostgreSQL with 500 users, 50 online at a time, > 100 database 
tables. Although the data sample is still not that impressive, (71 MB sql 
file with pg_dump) the database itself is quite complex, with multiple 
foreign keys in a single table being the norm.

It's just been a dream. It's solid, reliable, and virtually always behaves as 
expected. 

My only caveat is that occasionally, you really have to watch the use of 
indexes. I had one query (nasty, with 7-8 tables involved in a combined 
inner->outer->inner join) that was taking some 20 seconds to execute. Just 
changing the order of some of the tables in the query, without logically 
changing the result at all, dropped that time down to < 50 ms! 

> Regarding MySQL, I've been put off by Oracle's recent purchase of
> InnoDB and realise this could badly impact the latest version of the
> MySQL database. I can almost hear Larry Ellison's laughter from here
> (allegedly)! I've also been put off by the heavy marketing propaganda
> on the MySQL website.

Perhaps the single thing I most like about PostgreSQL is the feeling that "it 
can't be taken away from me". The license is sufficiently open, and the 
product is sufficiently stable, that I don't ever wonder if I'm "compliant" 
or "paid up", nor do I wonder if my growth will be particularly limited 
anywhere in the forseeable future. 

> "With MySQL, customers across all industries are finding they can
> easily handle nearly every type of database workload, with performance
> and scalability outpacing every other open source rival. As Los Alamos
> lab (who uses MySQL to manage their terabyte data warehouse) said, "We
> chose MySQL over PostgreSQL primarily because it scales better and has
> embedded replication.".".

PostgreSQL has replication, as well. From what I've read, it's probably about 
on par with MySQL in terms of manageability and reliability. 

But, truthfully, having dealt with database replication, it's a PAIN IN THE 
ARSE and very unlikely worth it. In fact, systems that I've worked on that 
included replication are generally less reliable than those that simply do a 
dump/copy every hour or two, due to the increased management headaches and 
niggling problems that invariably seem to occur. 

Consider replication if the cost of a full-time DB Admin is justified by 
saving perhaps a few hours of uptime per year. If so, go for it. Be honest 
about it - most people grossly overestimate the actual cost of few hours of 
downtime every other year. 

> If any PostgreSQL devotees on this group can comment on the above and
> its accuracy/inaccuracy, I'd really appreciate it.

PG does constraints wonderfully. Its performance is midline with simple 
schemas. It handles very complex schemas wonderfully, and, with a little 
tuning, can make very effective use of memory to speed performance. 

My $0.02. Cheers! 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Updating a sequential range of unique values?

2006-02-17 Thread Benjamin Smith
How can I update a range of constrained values in order, without having to 
resubmit a query for every single possiblity? 

I'm trying to create a customer-specific sequence number, so that, for each 
customer, the number starts at one, and continues, 1, 2, 3, 4, 5... etc. with 
no values skipped. (This is necessary, as the record is used to sort values, 
and the order can be changed by the customer) 

Here's sample code that demonstrates my question: 

create table snark (custid integer not null, custseq integer not null, 
unique(custid, custseq));

insert into snark (custid, custseq) VALUES (1, 2);
insert into snark (custid, custseq) VALUES (1, 4);
insert into snark (custid, custseq) VALUES (1, 3);
insert into snark (custid, custseq) VALUES (1, 1);

begin transaction; 
DELETE FROM snark WHERE custid=1 AND custseq=2; 
UPDATE snark SET custseq=custseq-1 WHERE custid=1 and custseq>2; 

This generates an error! 
ERROR: duplicate key violates unique constraint "snark_custid_key"

I've tried putting an "order by" clause on the query: 

UPDATE snark SET custseq=custseq-1 WHERE custid=1 and custseq>2 
ORDER BY custseq ASC; 

But that got me nowhere. Also, I can't defer the enforcement of the 
constraint, as, according to the manual, this only works for foreign keys. 

Any ideas where to go from here? 
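
The least ugly workaround I've come up with so far is a two-pass update that 
first parks the affected rows on values that can't collide (this assumes 
custseq is always positive in normal use): 

BEGIN;
DELETE FROM snark WHERE custid=1 AND custseq=2;
-- pass 1: shift and negate, so nothing can collide with an existing custseq
UPDATE snark SET custseq = -(custseq - 1) WHERE custid=1 AND custseq > 2;
-- pass 2: flip the parked rows back to their final positive values
UPDATE snark SET custseq = -custseq WHERE custid=1 AND custseq < 0;
COMMIT;

For what it's worth, much newer releases can declare a unique constraint 
DEFERRABLE, which would make the straight single UPDATE legal, but that isn't 
an option here. 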

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Postgresql Segfault in 8.1

2006-01-26 Thread Benjamin Smith
// FIXED // 

Tom, thank you so much for your help! Now running 8.1.2, the query now works 
quickly and properly. 

-Ben 

On Wednesday 25 January 2006 13:17, Benjamin Smith wrote:
> Version: postgresql-8.1.0-4.c4
> 
> I'll have to see about getting an update... 
> 
> Thanks a TON, 
> 
> -Ben 
> 
> On Wednesday 25 January 2006 13:11, you wrote:
> > Benjamin Smith <[EMAIL PROTECTED]> writes:
> > > Aha, yep. Sorry: 
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x0043c82c in heap_modifytuple ()
> > > (gdb) bt
> > > #0  0x0043c82c in heap_modifytuple ()
> > > #1  0x0043c8f5 in slot_getattr ()
> > > #2  0x0047a50a in FormIndexDatum ()
> > > #3  0x004ebee3 in ExecInsertIndexTuples ()
> > > #4  0x004e5265 in ExecutorRun ()
> > > #5  0x00564312 in FreeQueryDesc ()
> > > #6  0x00565287 in PortalRun ()
> > > #7  0x00560f8b in pg_parse_query ()
> > > #8  0x00562e0e in PostgresMain ()
> > > #9  0x0053d316 in ClosePostmasterPorts ()
> > > #10 0x0053ea59 in PostmasterMain ()
> > > #11 0x005033c3 in main ()
> > 
> > Oh, so this is happening during index entry creation?  (The reference to
> > heap_modifytuple is misleading, but in a debug-symbol-free backend it's
> > not so surprising.)
> > 
> > This suddenly looks a whole lot like a known bug:
> > http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php
> > 
> > Which version did you say you were using exactly?  That bug is fixed
> > in 8.1.1 ...
> > 
> > regards, tom lane
> > 
> > -- 
> > This message has been scanned for viruses and
> > dangerous content by MailScanner, and is
> > believed to be clean.
> > 
> 
> -- 
> "The best way to predict the future is to invent it."
> - XEROX PARC slogan, circa 1978
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
> 
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
> 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Postgresql Segfault in 8.1

2006-01-25 Thread Benjamin Smith
Version: postgresql-8.1.0-4.c4

I'll have to see about getting an update... 

Thanks a TON, 

-Ben 

On Wednesday 25 January 2006 13:11, you wrote:
> Benjamin Smith <[EMAIL PROTECTED]> writes:
> > Aha, yep. Sorry: 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0043c82c in heap_modifytuple ()
> > (gdb) bt
> > #0  0x0043c82c in heap_modifytuple ()
> > #1  0x0043c8f5 in slot_getattr ()
> > #2  0x0047a50a in FormIndexDatum ()
> > #3  0x004ebee3 in ExecInsertIndexTuples ()
> > #4  0x004e5265 in ExecutorRun ()
> > #5  0x00564312 in FreeQueryDesc ()
> > #6  0x00565287 in PortalRun ()
> > #7  0x00560f8b in pg_parse_query ()
> > #8  0x00562e0e in PostgresMain ()
> > #9  0x0053d316 in ClosePostmasterPorts ()
> > #10 0x0053ea59 in PostmasterMain ()
> > #11 0x005033c3 in main ()
> 
> Oh, so this is happening during index entry creation?  (The reference to
> heap_modifytuple is misleading, but in a debug-symbol-free backend it's
> not so surprising.)
> 
> This suddenly looks a whole lot like a known bug:
> http://archives.postgresql.org/pgsql-hackers/2005-11/msg01016.php
> 
> Which version did you say you were using exactly?  That bug is fixed
> in 8.1.1 ...
> 
>   regards, tom lane
> 
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Postgresql Segfault in 8.1

2006-01-25 Thread Benjamin Smith
Tom, 

Since we host customer data, I have to get OK from the company attorney before 
I can give you a full "howto create". I've been unable to recreate it without 
a full database dump. I'm waiting for a call back on that. I also can't 
recreate it on IA32. 

I tried to replicate the issue on a uniproc P4/32, but it worked fine there, 
so it does seem to be something specific about the fact that it's either 
X86/64 or that it's dual proc. The production server has 4GB of ECC RAM. 

I can consistently create the problem by dumping and reloading the database to 
a different PG database, and running it there, so AFAICT I'm not bugging 
anybody when I run this query. 

In the meantime, I found the "debuginfo" rpm, and installed it without a 
hitch. Luckily, it seems to "take effect" without having to restart the PG 
daemon. (which is busy serving 10-20 people at any given moment...) 

Again, here's the output from gdb. This looks a bit more useful, I hope this 
helps! 

Program received signal SIGSEGV, Segmentation fault.
slot_deform_tuple (slot=0xa669c8, natts=36) at heaptuple.c:1262
1262off = att_addlength(off, thisatt->attlen, tp + off);
(gdb) bt
#0  slot_deform_tuple (slot=0xa669c8, natts=36) at heaptuple.c:1262
#1  0x0043c8f5 in slot_getattr (slot=0xa669c8, attnum=36, 
isnull=0x7fbfffde87 "")
at heaptuple.c:1367
#2  0x0047a50a in FormIndexDatum (indexInfo=0xa66b60, slot=0xa669c8,
estate=0xa61190, values=0x7fbfffdf10, isnull=0x7fbfffdef0 "") at 
index.c:962
#3  0x004ebee3 in ExecInsertIndexTuples (slot=0xa669c8, 
tupleid=0xa6efc4,
estate=0xa61190, is_vacuum=0 '\0') at execUtils.c:925
#4  0x004e5265 in ExecutorRun (queryDesc=Variable "queryDesc" is not 
available.
) at execMain.c:1437
#5  0x00564312 in ProcessQuery (parsetree=Variable "parsetree" is not 
available.
) at pquery.c:174
#6  0x00565287 in PortalRun (portal=0xa5ed70, 
count=9223372036854775807,
dest=0xa596f8, altdest=0xa596f8, completionTag=0x7fbfffe380 "") at 
pquery.c:1076
#7  0x00560f8b in exec_simple_query (
query_string=0xa440e0 "INSERT INTO lcclasses (id, schoolyear, modified, 
entrydate, creator, status, name, location, city, maxclasssize, 
prerequisites, cost, costnote, coursecode, section, credits, whytake, 
materialsnote, te"...) at postgres.c:1014
#8  0x00562e0e in PostgresMain (argc=4, argv=0xa0cca0,
username=0xa0cc60 "cworksdev") at postgres.c:3168
#9  0x0053d316 in ServerLoop () at postmaster.c:2852
#10 0x0053ea59 in PostmasterMain (argc=5, argv=0x9ea510) at 
postmaster.c:943
#11 0x005033c3 in main (argc=5, argv=0x9ea510) at main.c:256
(gdb) continue
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

##
Postgresql.conf 
listen_addresses = '127.0.0.1'
port = 5432
max_connections = 96
shared_buffers=25
temp_buffers = 1
max_prepared_transactions = 0
work_mem = 1024 # min 64, size in KB
maintenance_work_mem = 16384# min 1024, size in KB
max_stack_depth = 9240
redirect_stderr = on            # Enable capturing of stderr into log
log_directory = 'pg_log'        # Directory where log files are written
log_truncate_on_rotation = on   # If on, any existing log file of the same
log_rotation_age = 1440         # Automatic rotation of logfiles will
log_rotation_size = 0           # Automatic rotation of logfiles will
autovacuum = on
autovacuum_naptime = 600
lc_messages = 'en_US.UTF-8'     # locale for system error message
lc_monetary = 'en_US.UTF-8'     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'      # locale for number formatting
lc_time = 'en_US.UTF-8'         # locale for time formatting
add_missing_from = on

-Ben 

On Wednesday 25 January 2006 11:18, you wrote:
> Benjamin Smith <[EMAIL PROTECTED]> writes:
> > OK, here's the output: 
> > (gdb) continue
> > Continuing.
> 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0043c82c in heap_modifytuple ()
> > (gdb)  
> 
> > // not very hopeful, I'd think // 
> 
> You forgot the "bt" part ... although I'm not sure we'd learn a whole
> lot more without debug symbols.
> 
> > Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled?
> 
> Current Red Hat practice is to put the debug symbols into separate
> "debuginfo" RPMs.  Hopefully you can find the debuginfo RPM wherever
> you got the postgres RPM from.
> 
>   

Re: [GENERAL] Postgresql Segfault in 8.1

2006-01-25 Thread Benjamin Smith
OK, here's the output: 

(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0043c82c in heap_modifytuple ()
(gdb)  

// not very hopeful, I'd think // 

Can I get a RHES/CENTOS PG 8.1 RPM with the symbols enabled? Unfortunately, I 
don't think I can give out a dump of the DB (heavily constrained) because of 
private customer information... and the query works *FINE* with different 
datasets. There's something specific about THIS QUERY that's causing the 
failure. 

I'm going to try to get this to fail on another system that's not in 
production use, though it's a uniprocessor P4. 

-Ben 

On Wednesday 25 January 2006 07:52, you wrote:
> Benjamin Smith <[EMAIL PROTECTED]> writes:
> > What's the best way to do this? Take PG down (normally started as a 
service) 
> > and run directly in a single-user mode? 
> 
> No, just start a psql session in one window, then in another window
> determine the PID of the backend process it's connected to, and attach
> gdb to that process.  Something like
> 
>   ps auxww | grep postgres:
>   ... eyeball determination of correct PID ...
>   gdb /path/to/postgres-executable PID
>   gdb> continue
> 
> Now, in the psql window, do what's needed to provoke the crash.  gdb
> should trap at the instant of the segfault and give you another gdb>
> prompt.  Type "bt" to get the backtrace, then "q" to disconnect.
> 
>   regards, tom lane
> 
> -- 
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Postgresql Segfault in 8.1

2006-01-24 Thread Benjamin Smith
Thanks, 

What's the best way to do this? Take PG down (normally started as a service) 
and run directly in a single-user mode? 

I've never reallly worked with gdb... 

-Ben 

On Tuesday 24 January 2006 17:27, you wrote:
> > What information do you need to help figure this out? 
> 
> Reproduce it with gdb attached to the backend process and post the
> backtrace...  You may need to recompile PG with debugging symbols to
> get the most info.

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Postgresql Segfault in 8.1

2006-01-24 Thread Benjamin Smith
I'm running PostgreSQL 8.1 on CentOS 4.2, Dual proc Athlon 64 w/4 GB RAM. 

I'm trying to get a PHP app to work, but the failure happens when the command 
is copy/pasted into pgsql. Trying to run a large insert statement, and I get: 

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
# 

in /var/log/messages, I see 

Jan 24 17:00:04 kepler kernel: postmaster[26185]: segfault at 2516d728 
rip 0043c82c rsp 007fbfffddd0 error 4

The insert statement is long, but doesn't seem to violate anything strange - 
no weird characters, and all the fields have been properly escaped with 
pg_escape(). 

What information do you need to help figure this out? 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Putting restrictions on pg_dump?

2006-01-05 Thread Benjamin Smith
Good ideas, all. But what about keeping things like check constraints, 
foreign keys, etc.? 

Hmmm... maybe, if I dumped the entire DB schema, with no data, and then looped 
thru the tables, creating a temp table (as you describe) with a funky name 
(such as TABLEaBcDeFgH_U) and then pg_dumping that, and then using a regex to 
rename the table in the output... (eg 

s/TABLE\s+TABLEaBcDeFgH_U/TABLE customers/

Ugh. I was hoping there was a cleaner way...

-Ben 

On Wednesday 04 January 2006 23:35, you wrote:
> On Wed, 4 Jan 2006 21:00:25 -0800, Benjamin Smith <[EMAIL PROTECTED]> 
wrote:
> > Is there a way to put a limit on pg_dump, so that it doesn't dump ALL 
data, 
> > but that matching a particular query? 
> > 
> > Something like: 
> > 
> > pg_dump -da --attribute-inserts -t "customers" \
> > --matching-query="select * from customers where id=11"; 
> > 
> > I'd like to selectively dump information from a query, but using the 
output 
> > format from pg_dump so that it can be used to create a (partial) database. 
> > 
> > Can this sort of thing be done? 
> 
> Not directly with pg_dump.  
> 
> You could create a table (create table customers_1 as select * from
> customers where id=11) and dump that but remember to change the
> tablename in the dump file or after loading it. You dont get any
> pk/fk/indexes on the table definition.
> 
> You could also use copy to stdout/stdin.
> 
> eg dump
> psql -d dbname -c "create temp table dump as select * from customers
> where id=11; copy dump to stdout;" >dumpfile
> 
> eg restore
> psql -d newdb -c "copy customers from stdin"  
> You might need to play around with supplying username/password.
> 
> klint.
> 
> +---+-+
> : Klint Gore: "Non rhyming:
> : EMail   : [EMAIL PROTECTED]   :  slang - the:
> : Snail   : A.B.R.I.:  possibilities  :
> : Mail  University of New England   :  are useless"   :
> :   Armidale NSW 2351 Australia : L.J.J.  :
> : Fax : +61 2 6772 5376 : :
> +---+-+
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Putting restrictions on pg_dump?

2006-01-04 Thread Benjamin Smith
Is there a way to put a limit on pg_dump, so that it doesn't dump ALL data, 
but that matching a particular query? 

Something like: 

pg_dump -da --attribute-inserts -t "customers" \
--matching-query="select * from customers where id=11"; 

I'd like to selectively dump information from a query, but using the output 
format from pg_dump so that it can be used to create a (partial) database. 

Can this sort of thing be done? 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Anyone doing a 8.1.0-ia64-RHEL4-as.rpm ?

2006-01-04 Thread Benjamin Smith
I'm using CentOS 4.2 on a dual-opteron, but I'd guess it'd probably work for 
IA64... 

Try putting this somewhere in your /etc/yum.repos.d/... 

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
baseurl=http://mirror.centos.org/centos/$releasever/centosplus/$basearch/
gpgcheck=1
enabled=1
includepkgs=postgresql* compat-postgresql-libs*
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos4

You'll need to import the RPM-GPG-KEY first, and make sure you backup your 
postgres. Simply yum update from PG 7.x to 8.x will have some nasty surprises 
without a dump/reload. 

Then, use yum... 

-Ben 

On Wednesday 04 January 2006 03:04, DANTE ALEXANDRA wrote:
> Hello,
> 
> Is anyone working on an 8.1.0 RPM for IA64 on Red Hat Enterprise Linux 4 
> AS ?
> On the web site : 
> http://www.postgresql.org/ftp/binary/v8.1.1/linux/rpms/redhat/, I have 
> found a "rhel-as 4" version for RHEL4 but I don't think it is a version 
> for IA64.
> 
> Could someone explain me how generate this rpm ?
> 
> Thank you for your help.
> Regards,
> Alexandra DANTE
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Performance woes

2005-12-12 Thread Benjamin Smith
Wow! I did exactly what you suggested, and played with the ordering for about 
20-30 minutes. After 10 minutes or so, I'd gotten the response time down to 
1700 ms from 2200 ms. 

Moving the join conditions up into the "FROM" clause, and dropping the "WHERE" 
clause altogether allowed me to reorder the statements easily without having 
to worry about rethinking all the logic. 

And, this dropped the query time from between 2.2-30 seconds all the way down 
to just 55-ish ms, without any new indexes! What's more, the improvement came 
from a move of a block I thought more or less unimportant! 
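
For anyone hitting this thread in the archives, the structural change was 
along these lines (column names and the gradebook table are invented for 
illustration): 

-- before: comma-join plus WHERE clause, with the selective table trailing
SELECT e.student_id, g.score
  FROM stmoduleobjass s
  LEFT JOIN gradebook g ON g.obj_id = s.obj_id,
       enrollments e
 WHERE e.student_id = s.student_id
   AND e.schoolyear = '2005 - 2006';

-- after: join conditions moved into the FROM clause, selective join first
SELECT e.student_id, g.score
  FROM enrollments e
  JOIN stmoduleobjass s ON s.student_id = e.student_id
  LEFT JOIN gradebook g ON g.obj_id = s.obj_id
 WHERE e.schoolyear = '2005 - 2006';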

// tries to put jaw back into mouth // 

-Ben 

On Monday 12 December 2005 16:11, you wrote:
> Benjamin Smith <[EMAIL PROTECTED]> writes:
> > The example that I gave was a small one to illustrate my understanding of 
> > multiple foreign keys, indexes and how they work together. (or don't) The 
> > actual query is quite a bit bigger and nastier. I've considered breaking 
it 
> > up into smaller pieces, but this query has been extensively tested and 
> > debugged. It's on a complex schema carefully designed to meet some very 
> > demanding requirements. 
> 
> What you probably need to do is rethink the join order.  As coded, the
> planner has no freedom to change the join order, which means it's up to
> you to get it right.  In particular it seems a bad idea to be processing
> the join to enrollments last when that table is the best-constrained
> one.  Instead of "enrollments, stmoduleobjass LEFT JOIN lots-o-stuff"
> consider "enrollments JOIN stmoduleobjass ON relevant-join-conditions
> LEFT JOIN lots-o-stuff".  Likewise for lcregistrations vs lcclasses.
> 
>   regards, tom lane
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Performance woes

2005-12-12 Thread Benjamin Smith
id = 18) AND ((schoolyear)::text = '2005 
- 2006'::text))
   ->  Hash  (cost=5.19..5.19 rows=12 width=20) (actual 
time=0.102..0.102 rows=10 loops=1)
 ->  Bitmap Heap Scan on tr_summary  (cost=2.04..5.19 
rows=12 width=20) (actual time=0.043..0.063 rows=10 loops=1)
   Recheck Cond: (scope_id = 18)
   ->  Bitmap Index Scan on unique_rollover  
(cost=0.00..2.04 rows=12 width=0) (actual time=0.027..0.027 rows=10 loops=1)
 Index Cond: (scope_id = 18)
 Total runtime: .063 ms
**

Configuration: I tried tweaking shared_buffers, but adding more/less did 
nothing to improve performance. Current values: 

Dual proc Opteron 2.0 Ghz, 
4 GB ECC RAM. 
10k SCSI drives, software RAID 1
Centos 4.2 (Redhat ES clone) 
PostgreSQL 8.1, 64 bit, loaded with RPMs from the PG website for Redhat ES. 

max_connections 64
shared_buffers 25
temp_buffers 1
max_prepared_transactions = 0 
work_mem 1024 
maintenance_work_mem = 16384 
max_stack_depth 9240 
autovacuum on 
autovacuum_naptime 600 

Side note: When I add indexes or change table definitions to try to get PG to 
use indexes, performance tanks instantly to very, poor. (> 30 seconds query 
time) But, when I run the query a few times, and then run vacuum analyze, it 
snaps back down to the 2-ish second range. 

-Ben 

On Saturday 10 December 2005 11:50, Stephan Szabo wrote:
> 
> On Sat, 10 Dec 2005, Benjamin Smith wrote:
> 
> > A few questions:
> >
> > 1) Let's assume that I have some multipile foreign keys, and I join on 
three
> > values. For example:
> >
> > Create table gangsters (
> > name varchar not null,
> > birthdate integer not null,
> > shirtnumber integer not null,
> > primary key (name, birthdate, shirtnumber);
> >
> > create table children (
> > father_name varchar not null,
> > father_bd integer not null,
> > father_shirtnumber integer not null,
> > birthdate integer not null,
> > name varchar not null,
> > foreign key (father_name, father_bd, father_shirtnumber) REFERENCES
> > gangsters(name, birthdate, shirtnumber)
> > );
> >
> > We have two table declarations, each with implicit indexes:
> > 1) table gangsters has a primary_key index on name, birthdate, 
shirtnumber.
> > 2) children has an implicit index on father_name, father_bd,
> > father_shirtnumber. (right?)
> 
> AFAIK, not unless you create one.
> 
> Explain analyze output for the query would probably be useful as well.
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Performance woes

2005-12-10 Thread Benjamin Smith
I have a small company growing fast, selling a product based largely on 
Postgres. We have a rapidly growing database with (currently) 117 tables. 

Our primary server, an aging system based around an IDE, 2.4 Ghz P4, is being 
replaced by a new, 10k SCSI, dual-proc AMD opteron/64, with 4 GB of ECC RAM. 

At just about every task, the newer machine just blows away the P4. Rebuilding 
the database happens in < 1 minute, instead of nearly 1/2 hour! Copying GB of 
data files is blazing fast! 

But, the real money shot is a single query. It's big, nasty, and complex, and 
hit pretty hard. It's not IO bound, it is clearly CPU bound. I've allocated 
up to 3 GB of RAM for pg, and tweaked the shmmax and shared_buffers. 

And, it's exactly as fast on the new, dual-proc Opteron as the aging P4. 2.2 
seconds. It's literally within 1 ms time! (2,206 ms vs 2,207 ms) Throwing 
more RAM at it makes no difference. 

WTF??!?!? 

A few questions: 

1) Let's assume that I have some multiple foreign keys, and I join on three 
values. For example: 

Create table gangsters ( 
name varchar not null, 
birthdate integer not null, 
shirtnumber integer not null, 
primary key (name, birthdate, shirtnumber)
); 

create table children ( 
father_name varchar not null, 
father_bd integer not null, 
father_shirtnumber integer not null, 
birthdate integer not null, 
name varchar not null, 
foreign key (father_name, father_bd, father_shirtnumber) REFERENCES 
gangsters(name, birthdate, shirtnumber)
); 

We have two table declarations, each with implicit indexes: 
1) table gangsters has a primary_key index on name, birthdate, 
shirtnumber. 
2) children has an implicit index on father_name, father_bd, 
father_shirtnumber. (right?) 

If I were to join on gangsters and children, EG: 

Select gangsters.name AS father, 
gangsters.birthdate AS father_bd, 
children.name AS kid_name 
from gangsters, children 
where gangsters.name=children.father_name 
AND gangsters.birthdate = children.father_bd
AND gangsters.shirtnumber=children.father_shirtnumber; 

Wouldn't this use the indexes implicitly created in the primary_key and 
foreign key constraints? 



-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Errors upgrading from 7.3 to 8.1

2005-11-28 Thread Benjamin Smith
Currently running Postgres 7.3.9 32bit on Fedora Core 1, and upgrading to PG 
8.1 64bit on Centos 4. 

When I load the file, 
psql -U dbname < dbname.sql 

I get this error: 
ERROR:  invalid UTF-8 byte sequence detected near byte 0x96
when inserting fields that seem to contain HTML. What could be causing this? 
My understanding is that pg_dump should properly escape things so that I'm 
not trying to dump/load things improperly. 

The dumps are made (on the PG 7.3 server) 
pg_dump -d -f $OUTPUT.pgsql $db 

and are being restored with (on the new 8.1 server) 
psql -U $db -e < $OUTPUT.pgsql 
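
In case it helps anyone searching later: byte 0x96 is the Windows-1252 en 
dash, so my working theory is that the old 7.3 data isn't really UTF-8. Two 
things worth trying (assuming the data is actually Latin-1/Windows-1252): 

# convert the dump before loading it
iconv -f WINDOWS-1252 -t UTF-8 dbname.sql > dbname.utf8.sql
psql -U dbname < dbname.utf8.sql

# or declare the real encoding near the top of the dump file, e.g.
#   SET client_encoding = 'WIN1252';   -- if the server knows that name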

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] SQL injection

2005-11-04 Thread Benjamin Smith
Prepared statements are the way to go. 

I developed my own prepared statements methodology (I called it "SafeQuery") 
some time back before discovering that others had done it. It's so nice, 
since I've not worried about SQL injection for YEARS. 

Sample of my API: 

<?php
// illustrative query: placeholders in [brackets] line up with the keys in $todb
$sql = "SELECT * FROM users WHERE username=[username] AND password=[password]";
$todb = array('username'=>$_REQUEST['username'], 
              'password'=>$_REQUEST['password']); 
if (!$res=$MDB->SafeQuery($sql, $todb)) 
 return Error("Database query failure"); 
?> 

SafeQuery checks: 
1) That the variables in the query (in brackets) and in the input array 
all 
match up. 
2) Runs pg_escape_string on all elements in $todb; 
3) Copy/Pastes strings from the array into the query. 
4) Runs query against DB, returns results from pg_exec(); 

-Ben 

On Tuesday 01 November 2005 05:27, Kevin Murphy wrote:
> Can some knowledgeable person set the record straight on SQL injection, 
> please?  I thought that the simple answer was to use prepared statements 
> with bind variables (except when you are letting the user specify whole 
> chunks of SQL, ugh), but there are many people posting who either don't 
> know about prepared statements or know something I don't.
> 
> Thanks,
> Kevin Murphy
> 
> P.S.  I don't use PHP, but google informs me that PHP definitely has 
> prepared statement options: PEAR::DB, PDO in 5.X+, etc.
> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Dumb Questions - upgrade notes?

2005-11-01 Thread Benjamin Smith
I'm running 7.3, and considering the upgrade to 8.1 to make use of multiple 
indexes. Where is the upgrade notes from 7.3->7.4, and from 7.4-> 8.x so that 
I can see what impact this would have on my app? I can't seem to find them... 

Thanks

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Frequency of Analyze?

2005-11-01 Thread Benjamin Smith
Wow. 

Does it really produce the expected (probably dramatic, in my case!) 
improvement in performance? (I'll be trying it out anyway..., but I'd love 
your feedback) 

How stable is it? Looks like I have a PG upgrade in my near future... 

-Ben 

On Friday 28 October 2005 14:51, you wrote:
> > 3) What's the best way to handle indexes when only 1 index is used per 
table 
> > in a query, but there are several due to the various constraints on it? Is 
> > there a way to benefit from all of these other indexes somehow? Is there a 
> > definitive, detailed book for optimizing PG queries? 
> 
> 8.1 can utilize multiple indexes per table.

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Frequency of Analyze?

2005-10-28 Thread Benjamin Smith
I have a rapidly growing database with a very complex schema, and I'm looking 
to improve performance. It's typical to have 2-4 foreign keys in each table, 
and there are currently 113 tables, and queries with 5-10 tables with 
combined inner/outer joins are pretty typical. (I avoid subqueries anywhere I 
can) 

So far, my guiding philosophy has been "constrain everything" with primary 
keys, unique, foreign keys and the like, relying on the implicit indexes and 
the query scheduler to handle things, and so far, it's worked very well. 

The database has grown in total size (calculated by the size of a pg_dump) 25% 
in the past month, and the growth rate seems to be accellerating. (yikes!) We 
are buying new equipment now, but I'd still like to optimize as best as 
possible. 

A few questions: 

1) Any problem with running "Analyze" hourly via cron? 

2) Does "Vacuum analyze" also do the actions performed by "Analyze"? 

3) What's the best way to handle indexes when only 1 index is used per table 
in a query, but there are several due to the various constraints on it? Is 
there a way to benefit from all of these other indexes somehow? Is there a 
definitive, detailed book for optimizing PG queries? 

-Ben 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Why different execution times for different instances for the

2005-10-25 Thread Benjamin Smith
I ran into something like this once, where a complex update occurred inside a 
transaction. When the update happened, I saw what you describe - the DB hung, 
and the load average shot out thru the roof until I restarted the PG daemon.

The query otherwise worked fine, but only failed with this specific set of 
input data, about which nothing was unusual.  (and now long since gone) 

I never found out exactly what it was that caused the problem, but rewording 
the query to something functionally identical but differently structured 
solved it. I've never seen the problem since. Wish I had more to say that 
might help identify the actual problem. 

I'm using Postgres 7.3.9 on Fedora Core 1. 

-Ben 

On Tuesday 25 October 2005 13:33, Oliver Elphick wrote:
> On Tue, 2005-10-25 at 13:14 -0700, Kishore B wrote: 
> > Hi , 
> > I am Kishore doing freelance development of J2EE applications. 
> >  
> > 
> > We switched to use Postgresql recently because of the advantages it
> > has over other commercial databases. All went well untill recently,
> > untill we began working on an application that needs to maintain  a
> > huge database
> > 
> > 
> >  
> > We are executing a single query that returned very fast on the first
> > instance. But when I executed the same query for multiple times, it is
> > giving strange results. It is not coming back. 
> >  
> > When I checked with the processes running in the system, I observed
> > that multiple instances of postmaster are running and all of them are
> > consuming very high amounts of memory. I could also observe that they
> > are sharing the memory in a uniform distribution across them. 
> >  
> > Can you please let me know if you have experienced the same and how do
> > you resolved it.
> >  
> > Thank you,
> > Kishore.
> 
> No, I haven't seen anything like that.
> 
> I am forwarding this mail to the general enquiries list, in case anyone
> else can help.
> 
> -- 
> Oliver Elphick  olly@lfix.co.uk
> Isle of Wight  http://www.lfix.co.uk/oliver
> GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
>  
>Do you want to know God?   http://www.lfix.co.uk/knowing_god.html
> 
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
> 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Dumb question about count()

2005-07-21 Thread Benjamin Smith
I'm sure I've done this before, but for some reason, my main noodle is drawing 
a blank. Assume we have three tables defined thusly: 

create table classrooms (
id serial unique not null, 
name varchar
); 

create table seats (
classrooms_id integer not null references classrooms(id), 
position varchar
); 

create table students(
classrooms_id integer not null references classrooms(id), 
name varchar
); 

Now, I want to get a result like: 

classroom | students | seats 
101A      |        0 |    25
101B      |       22 |    30
102A      |       11 |     0
... etc. 

Something somewhat akin to 

select classroom.title, 
count(students.id) AS students, 
count(seats.id) AS seats
from classrooms, students, seats
where classrooms.id=students.classrooms_id
and classrooms.id=seats.id 

Except that it counts 0s for seats/students. 

Why can't I recall/find how to do this particular join? 
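
The closest I've gotten is pushing the counts into subselects so the zero 
cases survive; a sketch against the tables above: 

SELECT c.name AS classroom,
       (SELECT count(*) FROM students s WHERE s.classrooms_id = c.id) AS students,
       (SELECT count(*) FROM seats t WHERE t.classrooms_id = c.id) AS seats
  FROM classrooms c
 ORDER BY c.name;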

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Deleting a rule?

2005-06-17 Thread Benjamin Smith
I wrote a rule a while back that, due to the software being extended, now 
needs to be deleted. 

How do I drop a rule? 

DELETE FROM pg_rules WHERE rulename='foo'; 

doesn't seem to cut it... 
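
For the archives: the incantation I was groping for seems to be the DDL 
command rather than touching the catalog directly, along the lines of the 
following, where mytable is a placeholder for whichever table the rule was 
created on: 

DROP RULE foo ON mytable;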

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Postgres vs Firebird?

2005-05-05 Thread Benjamin Smith
Based on the extensive feedback here, as well as other information from other 
websites found since asking here, I've decided that I'm still, very happily, 
a PG user. 

No significant issues to date - PG has "just worked" for me for 5 years now, 
and the frustrating limitations (EG: alter table drop field, row size limits) 
functionally disappeared some time ago. It's always good to reevaluate where 
you are now to make sure you're not doing something stupid. 

Replication is the only thing that remains, and there appear to be 
commercially viable solutions for that, as well, though we've not yet 
implemented it. We're still in a very heavy product development phase, and we 
have a pretty good backup solution. 

When will PG replication come ready "out of the box" with the RH RPMs?

Anyway, once again, Postgres is my friend... 

-Ben 

-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 8: explain analyze is your friend


[GENERAL] Postgres vs Firebird?

2005-05-04 Thread Benjamin Smith
As a long-time user of Postgres (first started using it at 7.0), I've been 
reading recently that Firebird has been taking off as a database. 

Perhaps this is not the best place to ask this, but is there any compelling 
advantage to using Firebird over Postgres? We have a large database (almost 
100 tables of highly normalized data) heavily loaded with foreign keys and 
other constraints, and our application makes heavy use of transactions. 

I say this as my company's growth has been exponential, showing no sign of 
letting up soon, and I'm reviewing clustering and replication technologies so 
that we can continue to scale as nicely as we have to date with our single 
server. (now with a load avg around .30 typically) 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[GENERAL] Lost in Foreign Key land

2005-04-09 Thread Benjamin Smith
Ok, I have a stupid-simple table: 

create table files (
 id serial unique not null, 
 mime varchar not null, 
 name varchar not null
 ); 

Idea being that I can save values from a file upload into a table, and use 
throughout my application. This gives me a central repository to look for 
files of a particular name, etc. while allowing multiple files with the same 
name to be uploaded. It might be used something like this: 

create table personalinfo ( 
 name varchar, 
 address varchar, 
 resume integer not null references files(id)
 ); 

But, I want to keep it "clean" - meaning that if the file is orphaned (isn't 
used anywhere), I want to delete it. I asked a related question a few months 
ago, and never understood the responses. (See thread "Search for restricting 
foreign keys") I just spent the last few hours trying to make sense of Forian 
Plug's query from an email dated 1/25/05 and reading up on the attribute 
tables, and I am lost. 

I'm sure it's very logical, and I'm just as sure that the logic, for now, 
escapes me. What I'd like to be able to do is get a list of files table id 
fields that have no values tied to them. 

If I execute "delete from files;", it won't delete them, because of foreign 
keys that refer to one or more of the files records. 

How can I get a list of files records with no foreign key records referencing 
the id field, without having to join on every single table that refers to 
files(id)? (now maybe a dozen, and growing) Something like "select id from 
files where id not in (select references to files.id)"; 
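
I gather the pg_constraint catalog (7.3 and later, if I understand correctly) 
can at least name the referencing tables - an untested sketch: 

select conrelid::regclass as referencing_table, conname 
  from pg_constraint 
 where contype = 'f' 
   and confrelid = 'files'::regclass; 

But that only lists the tables and constraints; getting the orphaned files rows 
themselves would still seem to mean one query per referencing table. 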

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[GENERAL] pseudo-serial values in dual primary key?

2005-03-19 Thread Benjamin Smith
Is it possible to have the equivalent of a serial data type in a table, 
sub-categorized? 

Assume the following: 

create table categories (id serial, title varchar); 

Now, I want to create an entries table, and by default, count serially by 
category, so that category 1 has entries.sequence of 1, 2, 3, and so does 
category 2. (where sequence= 1, 2, 3...)  Something like: 

create table entries (
categories_id integer not null references categories(id), 
sequence default max(entries.sequence WHERE categories_id=this.categories_id), 
primary key (categories_id, sequence) 
); 

I'm not sure about the semantics of this, but I want sequence to start at 1 
and count up for its category as defined by categories_id. I already know 
that I can enforce the uniqueness of categories_id and sequence with the 
primary key, and I could just write some more app code to do a query to get 
the max value of sequence where categories_id=$categories_id, but can this be 
done without adding a bunch of extra application code? 
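
One workaround might be computing the next value at insert time - an untested 
sketch, and presumably unsafe under concurrent inserts without some locking 
(42 stands in for whichever category is being inserted into): 

insert into entries (categories_id, sequence) 
select 42, coalesce(max(sequence), 0) + 1 
  from entries 
 where categories_id = 42; 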

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[GENERAL] Checking for schedule conflicts

2005-03-12 Thread Benjamin Smith
Given the tables defined below, what's the easiest way to check for schedule 
conflicts? 

So far, the only way I've come up with is to create a huge, multi-dimensional 
array in PHP, with a data element for every minute of all time taken up by 
all events, and then check for any of these minutes to be set as I go through 
all the records. (ugh!) 

But, how could I do this in the database? 

I'd like to see something like 
"select count(*) FROM events, sched 
 WHERE sched.date=$date
 AND events.id=sched.events_id 
 ...
 GROUP BY date, start" 

And here's where I get stumped. You can't group by start or end because we 
need to check if they OVERLAP any other records on the same date. 

Ideas? 


-- Sometimes, recurring events 
create table events (
 id serial not null primary key, 
 title varchar
 ); 
-- date = YYYYMMDD, start/"end" = HHMM (24-hour) 
create table sched ( 
 events_id integer not null references events(id), 
 date integer not null, 
 start integer not null, 
 "end" integer not null
 ); 
insert into events (title) 
 VALUES ('Tuesday Pickup'); 
insert into sched (events_id, date, start, "end") 
 VALUES (1, 20050308, 0900, 1300); 
insert into sched (events_id, date, start, "end") 
 VALUES (1, 20050315, 0900, 1300); 
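
A self-join against the tables above might get at the overlap test directly - 
an untested sketch (it assumes no two sched rows share the same events_id, date 
and start): 

select a.events_id, a.start, a."end", 
       b.events_id, b.start, b."end", a.date 
  from sched a, sched b 
 where a.date = b.date 
   and a.start < b."end" 
   and b.start < a."end" 
   and (a.events_id < b.events_id 
        or (a.events_id = b.events_id and a.start < b.start)); 

Any rows returned would be conflicting pairs, though I don't know if this is 
the idiomatic way to express it. 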


-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 8: explain analyze is your friend


[GENERAL] PG and OpenSSI?

2005-02-14 Thread Benjamin Smith
Has anybody had any experience using PG with OpenSSI (Single System Image) for 
a high-availability cluster? 

http://openssi.org/cgi-bin/view?page=openssi.html

Is this feasible? Possible? Easier/harder than other PG clustering solutions? 

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Search for restricting foreign keys

2005-01-24 Thread Benjamin Smith
Is there a way in PG 7.3, given a field, to find out what other tables & 
records are linked to it via a foreign key? EG: 

create table cities (id serial primary key, 
 title varchar not null); 
insert into cities(title) values ('San Francisco'); 
insert into cities(title) values ('Los Angeles'); 

create table stores (id serial primary key, 
 city integer not null references cities(id), 
 title varchar); 
insert into stores(city, title) values (1, 'North City'); 
insert into stores(city, title) values (2, 'Central District'); 
insert into stores(city, title) values (1, 'Beachfront'); 

Given the above, suppose I wanted to know all the tables/records that relate to id 
1, San Francisco, and get a result something like: 

table  | primary key 
stores | 1 
stores | 3 

Does such functionality exist in PG? Isn't it already doing essentially this 
when attempting to delete a record with other records linked to it? 

Currently, I do this by attempting to delete a record in a transaction and 
trapping the error - it's a terrible way to do it, and sometimes I'm already in 
a transaction. 
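
For the toy schema above it's easy enough to write by hand, of course - an 
untested sketch that produces the table/primary-key listing for city 1: 

select 'stores' as "table", id as "primary key" 
  from stores 
 where city = 1; 

The part I can't see is how to do that generically, without knowing in advance 
which tables reference cities(id). 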

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[GENERAL] Tracking back foreign keys?

2004-12-30 Thread Benjamin Smith
In one of my apps, I have an "images manager" that's become unmanageable. I'd 
like to be able to easily track key dependencies. 

Let's say I have a set of tables like this: 

create table Customers ( 
id serial unique not null primary key, 
name varchar not null, 
address varchar not null, 
image integer references images(id) 
); 

create table Vendors ( 
id serial unique not null primary key, 
name varchar not null, 
vendor_account varchar, 
picture integer references images(id) 
); 

create table Images ( 
id serial unique not null, 
filename varchar not null, 
mime varchar not null
); 

I know that in the images table I have lots of cruft, "dead wood", but when I 
delete from images, is there a "nice" way of finding out what dependencies 
there are? 

Something like 

Select pg_table.name from pg_table where pg_field references images.id

? 

How else do I put it? The output I'd like would be something like
images.id / tablename / table.primary key 
11 / Vendors / 14
12 / Customers / 9

Can this be done? 
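
The brute-force version would presumably be a hand-written union over every 
referencing table - an untested sketch that produces the output above for the 
two tables listed here, but has to be extended by hand each time a new table 
starts referencing images: 

select i.id as images_id, 'Vendors' as tablename, v.id as pk 
  from images i join Vendors v on v.picture = i.id 
union all 
select i.id, 'Customers', c.id 
  from images i join Customers c on c.image = i.id; 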

-Ben 

-- 
"I kept looking around for somebody to solve the problem. 
Then I realized I am somebody" 
   -Anonymous

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] Non-aggregate values attached to aggregates?

2004-12-16 Thread Benjamin Smith
Thanks much for your help! 

It took a few tries to get what it was all about, but I got it. 

On Thursday 16 December 2004 14:09, Michael Fuhr wrote:
> > Dates are kept as YYYYMMDD, eg 20041114 for Nov 14, 2004. 
> 
> Why not use a DATE type?  You can reformat it with to_char() if
> need be.

Not a bad idea. I used int because it was simple and "good enough". My app is 
written in PHP, and by using strtotime() I can trap typo errors before they 
get near the database (whose error messages are frequently difficult to present 
in a layperson-readable format). 

EG: 

$sql="INSERT INTO enrollments (date) values ('[date]')"; 
$pass=array('date'=>$_ENROLL['date']; 
if (!$DB->SafeQuery($sql, $pass)) 
 return error("Database Error: ".$DB->Error()); 

If there's an error here, it'd say something like: "ERROR: Bad date external 
representation 'mya 11 2004'. Compare with: 

$sql="INSERT INTO enrollments (date) VALUES [date]"; 
IF (($date=strtotime($_REQUEST['date'])) ==-1) 
return Error("I'm sorry, but ".$_REQUEST['date']." does not appear to 
be a 
valid date. Please fix and try again"); 
if (!$DB->SafeQuery($sql, array('date'=>$date))) 
return error("Database Error: ".$DB->Error()); 

which is much more "friendly" to the end user. 

BTW: What is to_char, and why doesn't this work in PG 7.3.x? 

select to_char(datefield) from TableX; 
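
(Presumably it wants a format pattern as a second argument - something like the 
following, assuming datefield were an actual date/timestamp column; with the 
integer dates above it would need a cast or a numeric format pattern first.) 

select to_char(datefield, 'YYYY-MM-DD') from TableX; 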

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[GENERAL] Non-aggregate values attached to aggregates?

2004-12-16 Thread Benjamin Smith
I have a list of students, and a list of enrollment records, and I'm trying to 
get a list of students and their most recent enrollment/disenrollment dates. 

create table students (id serial primary key, name varchar);
create table enrollments (
students_id integer not null references students(id), 
start integer not null, 
finish integer not null default 0);
insert into students (name) VALUES ('johnny');
insert into enrollments (students_id, start, finish) VALUES 
(1, 20030901, 20040530);
insert into enrollments (students_id, start, finish) VALUES 
(1, 20040901, 0);

Student enrolled last year, and is currently enrolled. If students are 
currently enrolled, the finish date is "0". Dates are kept as YYYYMMDD, eg 
20041114 for Nov 14, 2004. 

I want to be able to export the student name, most recent enrollment date, and 
disenrollment date. I've successfully gotten the student name and most recent 
enrollment date, but never the associated exit date. 

This returns most recent enrollment date: 
select students.name, max(enrollments.start) as start from students, 
enrollments where enrollments.students_id=students.id group by students.name;

Now, to get the exit date, I've tried 

select students.name, 
max(enrollments.start) as start, 
finish
from students, enrollments 
where enrollments.students_id=students.id 
AND max(enrollments.start)=enrollments.start 
group by students.name, enrollments.finish

which results in "ERROR:  Aggregates not allowed in WHERE clause" and also: 

select students.name, 
max(enrollments.start) as start, 
finish 
from students, enrollments 
where enrollments.students_id=students.id 
group by students.name, enrollments.finish 
having enrollments.start=max(enrollments.start);

which returns "ERROR: Attribute enrollments.start must be GROUPed or used in 
an aggregate function" 

How can this be done? Can it be done? 
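
A correlated subquery seems like it might do it - an untested sketch: 

select s.name, e.start, e.finish 
  from students s, enrollments e 
 where e.students_id = s.id 
   and e.start = (select max(e2.start) 
                    from enrollments e2 
                   where e2.students_id = s.id); 

But I don't know whether that's the idiomatic approach, or how it behaves if 
two enrollments share the same start date. 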

-Ben 
-- 
"The best way to predict the future is to invent it."
- XEROX PARC slogan, circa 1978


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org