[GENERAL] How do you handle shared memory corruption issues? (contrib/xml2)

2007-06-27 Thread CG
Postgresql 8.1

I made a simple modification to contrib/xml2 to include the ability to process 
exslt tags... On the production side, the postmaster crashes when I try to 
process my exslt stylesheet. On my development machine, everything runs without 
crashing. There's a number of differences there... There's little or no load. 
It has far less memory, and its PostgreSQL memory configuration reflects that. 
It has a different processor (P4 versus x86_64 on production). There are other 
differences at the compiler and library level I'm sure. Any of these things 
could be contributing to the crashes.

They are both using the same version of PostgreSQL, compiled with the same 
options. They are both using the same version of libxml2 and libxslt.

Incidentally, I installed the Perl bindings for libxml2 and libxslt. I created a 
plperlu function which uses those libraries to do the XML transformations. It 
blows up in the same way.

So, I suspect that there is an issue with libxslt, but I have no idea how to 
nail it down. Please advise...

I'm up against a deadline, so my main goal is to get this working. Timelines 
and paychecks being as they are, I'm not nearly as concerned about fixing the 
larger problem. Is there a different way to process xml and (e)xslt at the 
database not using contrib/xml2 or XML::LibXSLT ?


   



Re: [GENERAL] How do you handle shared memory corruption issues? (contrib/xml2)

2007-06-27 Thread CG
I compiled the libraries from the source tarballs, configured with --with-crypt 
--with-debugger --with-python. It is always possible I missed something!

Do you think the RPMs will even be applicable considering I'm not using RedHat 
or Fedora?


- Original Message 
From: Martin Gainty <[EMAIL PROTECTED]>
To: CG <[EMAIL PROTECTED]>; postgresql listserv 
Sent: Wednesday, June 27, 2007 12:46:59 PM
Subject: Re: [GENERAL] How do you handle shared memory corruption issues? 
(contrib/xml2)


Hi CG

looks as if your 64bit box needs 64bit libraries instead of default 32 bit

did you check here
http://dries.studentenweb.org/apt/packages/perl-XML-LibXSLT/info.html

Let us know
M--




Re: [GENERAL] How do you handle shared memory corruption issues? (contrib/xml2)

2007-06-27 Thread CG
...As we continue through this process, I just want to express my sincere thanks 
for your suggestions heretofore.

I use neither RedHat nor Fedora :( I also didn't look carefully at the RPMs you 
were offering me. They are for XML::LibXSLT, not LibXSLT. Sorry about that!

I originally built XML::LibXSLT using CPAN. For argument's sake, I dropped in 
just the binary from the Fedora Core 6 x86_64 RPM (the rest of the RPM's 
contents according to diff were identical). It blew up with a Floating Point 
Error when I ran 

#!/usr/bin/perl
 use XML::LibXSLT;
 use XML::LibXML;
 my $parser = XML::LibXML->new();
 my $xslt = XML::LibXSLT->new();
 my $source = $parser->parse_file('foo.xml');
 my $style_doc = $parser->parse_file('bar.xsl');
 my $stylesheet = $xslt->parse_stylesheet($style_doc);
 my $results = $stylesheet->transform($source);
 print $stylesheet->output_string($results);

The auto/XML/LibXSLT/LibXSLT.so library compiled on the machine on which it is 
running should be sufficient, right?



- Original Message 
From: Martin Gainty <[EMAIL PROTECTED]>
To: CG <[EMAIL PROTECTED]>; postgresql listserv 
Sent: Wednesday, June 27, 2007 2:23:37 PM
Subject: Re: [GENERAL] How do you handle shared memory corruption issues? 
(contrib/xml2)


yes..go with the specific distro for your OS (either RH or Fedora)

keep us apprised!

Thanks/
M--

[GENERAL] xpath_string namespace issue...

2007-06-29 Thread CG
I'm not sure what I'm missing here... :)

select xpath_string($xml$
<foo xmlns:f="http://example.com/ns">
  <f:bar>baz</f:bar>
</foo>
$xml$
,'//f:bar/text()')

This does not give me back "baz" as I was expecting it to... How does one 
clue-in the xpath functions to the namespaces in the XML document?


   



[GENERAL] xpath_* namespace bug

2007-07-09 Thread CG
select xpath_string($xml$
<foo xmlns:f="http://example.com/ns">
  <f:bar>baz</f:bar>
</foo>
$xml$
,'//f:bar/text()');

This does not give me back "baz" as I was expecting it to... It seems like 
xpath is ignoring the namespace directives in the source XML document.


 



Re: [GENERAL] xpath_* namespace bug

2007-07-09 Thread CG
Can't wait... Got to have it now. :(

I patched xpath.c and created a function which takes a third parameter: the 
known namespaces, as a space-delimited list of prefix/URI pairs... I based the 
patch on some example code from the libxml website which works perfectly from 
the command line. However, in PostgreSQL, it only works 65% of the time. The 
other 35% of the time it returns an empty string. I have no idea how to debug 
that kind of problem.

You make it sound like 8.3 will have full-featured xpath_* functions. Will it 
have full-featured XSLT support as well?
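
For reference, the xpath() function built into 8.3 and later does take an 
explicit namespace mapping as a third argument, which addresses exactly this 
case. A minimal sketch (the namespace URI here is only illustrative):

SELECT xpath('//f:bar/text()',
             '<foo xmlns:f="http://example.com/ns"><f:bar>baz</f:bar></foo>'::xml,
             ARRAY[ARRAY['f', 'http://example.com/ns']]);
-- returns {baz}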

- Original Message 
From: Nikolay Samokhvalov <[EMAIL PROTECTED]>
To: CG <[EMAIL PROTECTED]>
Cc: postgresql listserv 
Sent: Monday, July 9, 2007 4:33:04 PM
Subject: Re: [GENERAL] xpath_* namespace bug


AFAIK, contrib/xml2 doesn't support namespaces for XPath expressions
at all. Wait for PostgreSQL 8.3 :-)



-- 
Best regards,
Nikolay


   



[GENERAL] Need help to clean up after failed CLUSTER

2008-04-16 Thread CG
I'm using PostgreSQL 8.1 ... 

I had to terminate some clustering before it had completed. I think I have 
quite a bit of wasted disk space in half-baked table files. I need some 
suggestions for an easy way to find and clean out the files left over from the 
failed cluster operations.

TIA!


  



[GENERAL] Automatic type conversion

2009-07-15 Thread CG
I'm in the process of attempting to upgrade from 8.1 to 8.4, and I've been 
using the uniqueidentifier contrib module for my UUIDs... In 8.1, PostgreSQL 
was able to figure out what to do with statements like
# SELECT 'Your new UUID is ' || newid();
   ?column?
---
 Your new UUID is 5f1a4987-9005-42d6-8e62-d1ff14bb1433
(1 row)

but in 8.4 (and probably 8.3 also...), 

# SELECT 'Your new UUID is ' || newid();
ERROR:  operator is not unique: unknown || uniqueidentifier
LINE 1: SELECT 'Your new UUID is ' || newid();
   ^
HINT:  Could not choose a best candidate operator. You might need to add 
explicit type casts.

But, with the new internal UUID datatype, what is the magic recipe that allows 
this to run?

# SELECT 'Your new UUID is ' || uuid_generate_v4();
   ?column?
---
 Your new UUID is cd8d8f5f-3eea-4e96-be4f-f93daa174d8f
(1 row)
 
I could add the explicit type casts, but I'd rather find out what the nature of 
the subtle (or not-so-subtle) difference I've stumbled upon is...
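
For what it's worth, a hedged workaround consistent with the HINT above is an 
explicit cast to text (assuming the contrib type provides one), which removes 
the ambiguity:

SELECT 'Your new UUID is ' || newid()::text;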


  

[GENERAL] best practice transitioning from one datatype to another

2009-07-15 Thread CG
Another UUID question... I was hoping to put this off for a while, but it looks 
like it's going to need to happen sooner than later. Rats! I keep telling 
myself transitioning to core datatypes is good for the soul.

While transitioning from 8.1 to 8.4, I need to transition to the internal UUID 
type in place of the contrib/uniqueidentifier module. I've built the database 
around uniqueidentifier, so nearly every table has one column of that data 
type. It's going to be tedious to 

ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE uuid;
...repeat 600 times...

I'll also have to drop and reload the views and the rules on tables. It'll be 
tedious even if the tables have no data in them. 

Can anyone recommend a better/faster way to make the transition?
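
One hedged way to avoid hand-writing those 600 statements is to generate them 
from the system catalogs and run the output as a script; a rough sketch (the 
USING cast may need adjusting for the contrib type, and it does nothing for the 
dependent views and rules, which still have to be dropped and recreated):

SELECT 'ALTER TABLE ' || quote_ident(n.nspname) || '.' || quote_ident(c.relname)
    || ' ALTER COLUMN ' || quote_ident(a.attname)
    || ' SET DATA TYPE uuid USING ' || quote_ident(a.attname) || '::text::uuid;'
  FROM pg_attribute a
  JOIN pg_class c     ON c.oid = a.attrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN pg_type t      ON t.oid = a.atttypid
 WHERE t.typname = 'uniqueidentifier'
   AND c.relkind = 'r'
   AND a.attnum > 0
   AND NOT a.attisdropped;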


  

Re: [GENERAL] best practice transitioning from one datatype to another

2009-07-16 Thread CG
Trying to fix a dump file ...

cat backup.sql | sed -e "s/uniqueidentifier/uuid/g" > backup_fixed.sql

... gives me a dump that won't import. It is hard digging through 30+ gigs of 
text data to find where sed ate a field delimiter, so I'm going to give Tom's 
idea a try. I didn't know ALTER TYPE x RENAME TO y; was even available, and I 
probably wouldn't have tried it if Tom hadn't suggested it. It takes a certain 
amount of chutzpah to make that kind of change before diving into the lengthy 
process of database upgrading.
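
A minimal sketch of the rename, run in the old database before pg_dump: the 
ALTER TYPE form is the clean route if the server accepts it, while the direct 
catalog update is what the follow-up message below implies was actually done 
(superuser only, and hand-editing pg_type deserves real caution):

ALTER TYPE public.uniqueidentifier RENAME TO uuid;

-- or, the raw catalog change (sketch only):
UPDATE pg_type
   SET typname = 'uuid'
 WHERE typname = 'uniqueidentifier'
   AND typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public');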



From: Arndt Lehmann 
To: pgsql-general@postgresql.org
Sent: Thursday, July 16, 2009 5:22:26 AM
Subject: Re: [GENERAL] best practice transitioning from one datatype to another

On Jul 16, 5:46 pm, a.w...@netzmeister-st-pauli.de (Andreas Wenk)
wrote:
> Arndt Lehmann schrieb:
>
> > On Jul 16, 8:05 am, t...@sss.pgh.pa.us (Tom Lane) wrote:
> >> CG  writes:
> >>> While transitioning from 8.1 to 8.4, I need to transition to the internal 
> >>> UUID type in place of the contrib/uniqueidentifier module. I've built the 
> >>> database around uniqueidentifier, so nearly every table has one column of 
> >>> that data type. It's going to be tedious to
> >>> ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE uuid;
> >>> ...repeat 600 times...
> >>> I'll also have to drop and reload the views and the rules on tables. 
> >>> It'll be tedious even if the tables have no data in them.
> >>> Can anyone recommend a better/faster way to make the transition?
> >> Couldn't you rename the type to uuid in the 8.1 database before you
> >> dump?
>
> >>                         regards, tom lane
>
>
> > Just an idea - don't know if it works, first try on a test server:
> > 1. Dump the complete database into text format (pg_dump --format=t)
> > 2. Do a search and replace from "uniqueidentifier" to "uuid"
> > 3. Reimport
>
> > Best Regards,
> >   Arndt Lehmann
>
> uh - --format=t means tar format. --format=p means plain text ...
>
> Or am I missing something?
>
> $pg_dump --help
> Usage:
>    pg_dump [OPTION]... [DBNAME]
>
> General options:
> ...
>    -F, --format=c|t|p       output file format (custom, tar, plain text)
> ...
>
> Cheers
>
> Andy
>

Hi Andy,

You are perfectly right. My oversight.

Best Regards,
  Arndt Lehmann




  

Re: [GENERAL] best practice transitioning from one datatype to another

2009-07-20 Thread CG
I'm stumped-- at least for an easy way to do this.

When referencing the uniqueidentifier data type in PostgreSQL 8.1 (now uuid, 
because of an UPDATE to the pg_ tables) in function definitions in schemas not 
in the search path, one must reference the data type as 
"public.uniqueidentifier" (ahem, "public.uuid"). This was done because the 
query planner couldn't determine type equality with 100% certainty (i.e. for 
foreign key constraints, comparisons in WHERE clauses...), so it chose 
sequential scans where index scans would have been most appropriate.

When I reload into PostgreSQL 8.4, it doesn't understand public.uuid. So the 
only way I can think of to get those functions back into the database is to use 
a restore list to keep them from being loaded in the first place, and then 
re-create them manually, one at a time. There must be a better way, though!

I'm sure this only my second of several more hurdles to overcome before I'm 
finished with the transition.

Your wisdom will be appreciated!

CG




 



____
From: Tom Lane 
To: CG 
Cc: pgsql-general@postgresql.org
Sent: Wednesday, July 15, 2009 7:05:47 PM
Subject: Re: [GENERAL] best practice transitioning from one datatype to another

CG  writes:
> While transitioning from 8.1 to 8.4, I need to transition to the internal 
> UUID type in place of the contrib/uniqueidentifier module. I've built the 
> database around uniqueidentifier, so nearly every table has one column of 
> that data type. It's going to be tedious to 

> ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE uuid;
> ...repeat 600 times...

> I'll also have to drop and reload the views and the rules on tables. It'll be 
> tedious even if the tables have no data in them. 

> Can anyone recommend a better/faster way to make the transition?

Couldn't you rename the type to uuid in the 8.1 database before you
dump?

            regards, tom lane



  

[GENERAL] anyelement and anynonarray inclusion

2009-09-29 Thread CG
 
 
I use a custom datatype that's a good candidate for being included in the 
family of data types that fit in the "anynonarray" and "anyelement" categories. 
How can I get PostgreSQL to include that data type when it is considering 
selecting functions and operators that take the polymorphic types?


  

[GENERAL] ALTER TABLE ... DISABLE TRIGGERS Isolation level

2011-05-09 Thread CG
I'm using 8.4.1

I want to add a column to a table, but there are update triggers that will fire 
that don't need to fire for this operation. So, I'd like to add the column with 
triggers off. Normally this operation would take 10 or so seconds, so locking 
the table for that amount of time is not a big deal. I just want to make sure 
that no new data gets written to the table while the triggers are disabled.

BEGIN;
ALTER TABLE foo ADD COLUMN bar DISABLE TRIGGER USER;
COMMIT;

seems to leave the triggers disabled.

My tests seem to show that 


BEGIN;
ALTER TABLE foo DISABLE TRIGGER USER;

locks the table fully, then

ALTER TABLE foo ADD COLUMN bar;
ALTER TABLE foo ENABLE TRIGGER USER;
COMMIT;

gets the job done. I only pause because I figured that the single DISABLE 
TRIGGER in a transaction would have flopped back when the transaction 
committed. I was wrong about that. I only need a little bit of affirmation or a 
kick in the right direction.

Thanks folks.



[GENERAL] information_schema.referential_constraints contains NULLs

2011-10-14 Thread CG
PostgreSQL 9.1.0


For some of the referential constraints listed in my 
information_schema.referential_constraints table the values for the fields 
unique_constraint_catalog, unique_constraint_schema, and unique_constraint_name 
are NULL. There doesn't seem to be any rhyme or reason to which ones have 
values and which ones don't. They should all have values, right? Any ideas what 
could be going on?




[GENERAL] Best way to debug user defined type

2008-11-05 Thread CG
I'm trying to debug a C function that is used for the binary send/receive part 
of a user defined type. I can debug the send part fine, but the receive part 
takes an input parameter of type "internal", and I can't seem to conjure up an 
"internal" to feed to my function to test if the output is working. Can you 
suggest a debugging strategy?
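
One hedged way to get at the receive path: a function taking "internal" (the 
StringInfo buffer) can't be invoked directly from SQL, but binary COPY 
round-trips rows through the type's send and recv functions, so it can drive 
the code under a debugger. Table, column, and type names below are illustrative:

COPY my_table TO '/tmp/mytype_test.bin' WITH BINARY;        -- exercises mytype_send
CREATE TABLE my_table_copy (LIKE my_table);
COPY my_table_copy FROM '/tmp/mytype_test.bin' WITH BINARY; -- exercises mytype_recv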



  



[GENERAL] pg_dump and ON DELETE CASCADE problem

2009-12-09 Thread CG
Hi all,
 
We're using PostgreSQL 8.4 ... We do our nightly database backups with pg_dump. 
I was doing a test restore and encountered some data during the reload that 
violated a foreign key constraint. I run my restores with the "-e" option to 
halt on errors, so this data halted the restore... I went to check the running 
database, and the row in question had been deleted. 
 
I had defined the foreign key to cascade on delete, and I imagine that during 
the dump the delete occurred on the master table. Perhaps the referencing table 
had already been dumped, so when it came time to dump the master table, the 
referenced row was no longer there to be dumped. One would imagine that 
PostgreSQL would have protections for that sort of thing... 
 
Can you think of how I can protect against this in the future?
 
CG


  

Re: [GENERAL] pg_dump and ON DELETE CASCADE problem

2009-12-10 Thread CG
The command's nothing out-of-the-ordinary:
 
#!/bin/bash

export LD_LIBRARY_PATH=/usr/local/pgsql/lib
 
#
# Set Variables
#
DAY_NUM=`/bin/date +"%d"`
MON_NUM=`/bin/date +"%m"`
YEAR_NUM=`/bin/date +"%Y"`

/usr/local/pgsql/bin/pg_dump -h 192.168.1.5 -Upostgres -f backup.$YEAR_NUM$MON_NUM$DAY_NUM.pga -Fc -b data
 
#END
 
Curiouser and curiouser... Last night's dump failed to restore in the same way:
 
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 7545; 2606 311883439 FK 
CONSTRAINT packet_search_trigram_puuid_fkey postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  insert or update 
on table "packet_search_trigram" violates foreign key constraint 
"packet_search_trigram_puuid_fkey"
DETAIL:  Key (packet_uuid)=(0ab44da9-544d-413a-9ab1-a1b442310b24) is not 
present in table "packet".
    Command was:
ALTER TABLE ONLY packet_search_trigram
    ADD CONSTRAINT packet_search_trigram_puuid_fkey FOREIGN KEY (packet_uuid) 
REFERE...
pg_restore: *** aborted because of error
pg_restore: finished item 7545 FK CONSTRAINT packet_search_trigram_puuid_fkey
pg_restore: [archiver] worker process failed: exit code 1
pg_restore: *** aborted because of error

That was the same failure I got the previous night. I go to the live database 
and rows with that key are /not/ in either one of those tables. They /were/ in 
the tables at one point. I have an ON DELETE trigger that copies deleted rows 
into another table, so I can see that a row with that key once existed in those 
tables.
 
This may not be a pg_dump problem, but some sort of MVCC irregularity where 
pg_dump is able to dump rows that it shouldn't. I bet a VACUUM FULL would clean 
this up, but I have a live problem here. If I eradicate it, who knows when 
we'll see it again...
 

--- On Wed, 12/9/09, Craig Ringer  wrote:


From: Craig Ringer 
Subject: Re: [GENERAL] pg_dump and ON DELETE CASCADE problem
To: cgg...@yahoo.com
Cc: "postgresql listserv" 
Date: Wednesday, December 9, 2009, 9:02 PM


On 10/12/2009 3:31 AM, CG wrote:
> Hi all,
> We're using PostgreSQL 8.4 ... We do our nightly database backups with
> pg_dump. I was doing a test restore and I encountered some data during
> the reload that was in a table against the conditions of a foreign key
> constraint. I run my restores with the "-e" option to halt on errors, so
> this data halted the restore... I went to check the running database and
> the row in question had been deleted.

> I had defined the foreign key to cascade on delete, and I imagine that
> during the dump the delete occurred on the master table. Perhaps the
> keyed table had already been dumped so when it came time to dump the
> master table, the referencing row was not there to be dumped.

pg_dump does all its work in a single serializable transaction to avoid this 
sort of problem. It doesn't see any changes made to the database after it 
starts. So, assuming you used pg_dump to dump the database as a whole rather 
than invoking it separately for a bunch of separate tables, that should not be 
your problem.

How do you run pg_dump? Can you supply the script or command line?

> One would
> imagine that PostgreSQL would have protections for that sort of thing...

It does, which is what makes the issue you've encountered somewhat strange.

--
Craig Ringer



  

Re: [GENERAL] pg_dump and ON DELETE CASCADE problem

2009-12-10 Thread CG

Thanks for the suggestion. I'm not sure what you mean when you say I should 
restore to a file. Do you mean I should dump the database to an SQL file 
instead of the "compressed" format?

What do you think I will find?

In the database dump, it is including a row that should be marked as deleted. I 
can select on that key in the production database and get zero rows, and I can 
select on that key in the restored database and find the row. When I ignore 
errors the data is restored, but the foreign key can't be created (and that is 
the only error I encounter). The presence of the data in the dump can not be 
contested... :)



--- On Thu, 12/10/09, Adrian Klaver  wrote:

> 
> One thing that comes to mind is to restore the dump file to
> a file instead of a 
> database and see what is being dumped from the live
> database.
> 
> 
> 
> -- 
> Adrian Klaver
> akla...@comcast.net
> 



  



Re: [GENERAL] pg_dump and ON DELETE CASCADE problem

2009-12-11 Thread CG
That's really nifty! I didn't know you could do that!

So I expanded it, and I grepped for that UUID through the 46 gig file, and I 
found the row in the dump that shouldn't be there... It defies explanation.

--- On Thu, 12/10/09, John R Pierce  wrote:

> From: John R Pierce 
> Subject: Re: [GENERAL] pg_dump and ON DELETE CASCADE problem
> To: cgg...@yahoo.com, pgsql-general@postgresql.org
> Date: Thursday, December 10, 2009, 3:29 PM
> CG wrote:
> > Thanks for the suggestion. I'm not sure what you mean
> when you say I should restore to a file. Do you mean I
> should dump the database to an SQL file instead of the
> "compressed" format?
> >   
> 
> he meant...
> 
>    pg_restore -f outputfile.sql
> yourdumpfile
> 
> this will convert the dumpfile to SQL...
> 
> 
> 
> 
> 






Re: [GENERAL] pg_dump and ON DELETE CASCADE problem

2009-12-15 Thread CG


--- On Fri, 12/11/09, Scott Marlowe  wrote:

> From: Scott Marlowe 
> Subject: Re: [GENERAL] pg_dump and ON DELETE CASCADE problem
> To: cgg...@yahoo.com
> Cc: pgsql-general@postgresql.org, "Adrian Klaver" , 
> "Craig Ringer" 
> Date: Friday, December 11, 2009, 1:17 PM
> On Thu, Dec 10, 2009 at 1:21 PM, CG
> 
> wrote:
> >
> > Thanks for the suggestion. I'm not sure what you mean
> when you say I should restore to a file. Do you mean I
> should dump the database to an SQL file instead of the
> "compressed" format?
> >
> > What do you think I will find?
> >
> > In the database dump, it is including a row that
> should be marked as deleted. I can select on that key in the
> production database and get zero rows, and I can select on
> that key in the restored database and find the row. When I
> ignore errors the data is restored, but the foreign key
> can't be created (and that is the only error I encounter).
> The presence of the data in the dump can not be contested...
> :)
> 
> This could be a corrupted index problem maybe?  If you
> do this:
> 
> set enable_indexscan=off;
> select * from table where key=value;
> 
> does it still not show up?
> 

Bingo. Showed right up. I did a reindex, and now it shows up searching via 
sequential scan or index scan.

So that's pretty scary to have a corrupted index. Once I reindexed, I'm able to 
see /a lot/ of data I couldn't before. This is the first time in 9 years that 
I've been bitten by PostgreSQL, and this one HURT.

PostgreSQL didn't crash, so there was no indication of failure until the 
dump-reload. To quote from the masters: although in theory this should never 
happen, in practice indexes may become corrupted due to software bugs or 
hardware failures. I'm reasonably certain that the hardware for the server is 
sound. No crashes, no alarms... That leaves software bugs. 

We're running PostgreSQL 8.4.1. I don't see any smoking gun bugfixes in 8.4.2, 
but we'll upgrade ASAP anyway... 

What are your suggestions for how to proceed?
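
One way to proceed, offered as a sketch and assuming the damage is confined to 
indexes on the two tables named in the restore error (REINDEX takes exclusive 
locks, so it needs a maintenance window):

REINDEX TABLE dpo.packet;
REINDEX TABLE dpo.packet_search_trigram;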






Re: [GENERAL] pg_dump and ON DELETE CASCADE problem

2009-12-17 Thread CG


--- On Tue, 12/15/09, Adrian Klaver  wrote:

> From: Adrian Klaver 
> Subject: Re: [GENERAL] pg_dump and ON DELETE CASCADE problem
> To: cgg...@yahoo.com
> Cc: "postgresql listserv" , "Craig Ringer" 
> , "Scott Marlowe" 
> Date: Tuesday, December 15, 2009, 6:53 PM
> 
> Interesting, though something is still bothering me. To
> quote from one of your 
> posts upstream;
> 
> "That was the same failure I got the previous night. I go
> to the live database 
> and rows with that key are /not/ in either one of those
> tables. They /were/ in 
> the tables at one point. I have an ON DELETE trigger that
> copies deleted rows 
> into another table, so I can see that a row with that key
> once existed in those 
> tables."
>  
> Would seem that the rows were deleted and should not be there when the table
> was reindexed. Are the 'new' rows you are seeing also in the delete table?
> 

select foo from bar where baz = 'key';

I was mistaken when I said that the row was not in the table. If I had an index 
on baz, and the index was corrupted, that SQL would return 0 rows leading me to 
believe that there were no rows in the table with that key.

And the reason that row remained in the database after its foreign-keyed 
parent row was deleted is that the delete operation depended on the index to 
find the rows to delete, and that index was corrupt.

Of course, I had no idea that the index was corrupt when I made my first post.

On the table that has the "martian" row, there is no delete storage. Since the 
data in the table is trigger-generated for FTI searches, there's no reason to 
keep that data around.

I'm still faced with the unpleasant and unresolved issue of why the index was 
corrupted in the first place.



  



Re: [GENERAL] pg_dump and ON DELETE CASCADE problem

2009-12-17 Thread CG


--- On Thu, 12/17/09, Adrian Klaver  wrote:

> 
> Would it be possible to see the table schemas and indices
> ?
> 
> > 

Sure (you asked for it!!) :

CREATE TABLE packet
(
  id integer NOT NULL DEFAULT nextval('packet_id_seq'::regclass),
  packet_uuid uniqueidentifier NOT NULL DEFAULT newid(),
  username character varying(50) NOT NULL DEFAULT ''::character varying,
  pgroup_uuid uniqueidentifier DEFAULT newid(),
  orig_trans_uuid uniqueidentifier,
  user_reference_id character varying(50) DEFAULT ''::character varying,
  trans_data character varying(100) NOT NULL DEFAULT ''::character varying,
  trans_type character varying(50) NOT NULL DEFAULT 'unknown'::character 
varying,
  trans_date timestamp with time zone DEFAULT ('now'::text)::timestamp(6) with 
time zone,
  processor character varying(10),
  service character varying(10),
  CONSTRAINT packet_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=TRUE
);
ALTER TABLE packet OWNER TO postgres;
GRANT ALL ON TABLE packet TO postgres;
GRANT ALL ON TABLE packet TO adduser;

CREATE INDEX packet_otuuid_idx
  ON packet
  USING btree
  (orig_trans_uuid);

CREATE INDEX packet_pgroup_uuid_idx
  ON packet
  USING btree
  (pgroup_uuid);

CREATE INDEX packet_puuid_hash_uniq
  ON packet
  USING hash
  (packet_uuid);

CREATE UNIQUE INDEX packet_puuid_idx
  ON packet
  USING btree
  (packet_uuid);

CREATE INDEX packet_trans_date_idx
  ON packet
  USING btree
  (trans_date);

CREATE INDEX packet_user_idx
  ON packet
  USING btree
  (username);

CREATE INDEX packet_user_puuid_idx
  ON packet
  USING btree
  (username, packet_uuid);

CREATE OR REPLACE RULE packet_delete_rule AS
ON DELETE TO packet DO  INSERT INTO removed_packet (id, packet_uuid, 
username, pgroup_uuid, orig_trans_uuid, user_reference_id, trans_data, 
trans_type, trans_date, processor, service)  SELECT packet.id, 
packet.packet_uuid, packet.username, packet.pgroup_uuid, 
packet.orig_trans_uuid, packet.user_reference_id, packet.trans_data, 
packet.trans_type, packet.trans_date, packet.processor, packet.service
   FROM packet
  WHERE packet.id = old.id;

CREATE TRIGGER packet_count_delete_trig
  BEFORE DELETE
  ON packet
  FOR EACH ROW
  EXECUTE PROCEDURE letter_count_trig();

CREATE TRIGGER packet_count_insert_trig
  AFTER INSERT
  ON packet
  FOR EACH ROW
  EXECUTE PROCEDURE letter_count_trig();

CREATE TRIGGER packet_delete_trig
  BEFORE DELETE
  ON packet
  FOR EACH ROW
  EXECUTE PROCEDURE packet_datalink_status_trig();

CREATE TRIGGER packet_insert_trig
  AFTER INSERT
  ON packet
  FOR EACH ROW
  EXECUTE PROCEDURE packet_ins_trig();

CREATE TABLE dpo.packet_search_trigram
(
  id integer NOT NULL DEFAULT nextval('packet_search_trigram_id_seq'::regclass),
  packet_uuid uniqueidentifier NOT NULL,
  trigram_vector tsvector NOT NULL,
  CONSTRAINT packet_search_trigram_id_pkey PRIMARY KEY (id),
  CONSTRAINT packet_search_trigram_puuid_fkey FOREIGN KEY (packet_uuid)
  REFERENCES dpo.packet (packet_uuid) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);
ALTER TABLE dpo.packet_search_trigram OWNER TO postgres;
GRANT ALL ON TABLE dpo.packet_search_trigram TO postgres WITH GRANT OPTION;
GRANT ALL ON TABLE dpo.packet_search_trigram TO addgroup;

CREATE INDEX packet_search_trigram_packet_uuid_idx
  ON dpo.packet_search_trigram
  USING hash
  (packet_uuid);

CREATE INDEX packet_search_trigram_trigram_vector_idx
  ON dpo.packet_search_trigram
  USING gin
  (trigram_vector);



  



[GENERAL] Function written in C, hangs on one machine and not another...

2005-10-28 Thread CG
PostgreSQL 7.4 ...

Essentially, I've written a function in C for use with PostgreSQL. The debugger
shows that the program is hanging on the part of the program that is writing
data into its own STDIN. 

[snip]

  // Open up and hijack STDIN
  int pipe_pair[2];
  int pipe_rv = pipe(pipe_pair);
  if (pipe_rv != 0)
  // Abort! Abort!
  {
close(pipe_pair[1]);
pfree(param_1);
pfree(param_2);
PG_RETURN_NULL();
  }

  int newfd = dup2(pipe_pair[0],STDIN_FILENO);
  if (newfd != 0)
  // Abort! Abort!
  {
close(pipe_pair[1]);
pfree(param_1);
pfree(param_2);
PG_RETURN_NULL();
  }

  // Write param_1 to hijacked pipe
  write(pipe_pair[1], param_1, param_1_len); // Hangs here...
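  // NOTE (per the follow-up discussion): nothing reads from the hijacked stdin
  // before this write, so once the pipe buffer (~4096 bytes) fills, the write
  // blocks forever -- the deadlock described later in the thread.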

[/snip]

It works on the machine I use for testing from within PostgreSQL, but it
doesn't work on the machine which is the production server. I'd hate for this
to matter, but I ought to disclose that testing machine is a 1-way AMD Box with
a more recent version of the Linux 2.6 kernel, and a more recent version of
libc. The production machine is a 2-way Dell Xeon processor. Same version of
PostgreSQL, compiled with the same flags (except with debugging symbols for the
testing machine). You'd, or at least I would, think simple code like this would
compile and run on multiple platforms...

I can perform the same STDIN hijacking on both machines in a standalone
program, but it fails under PostgreSQL.

I'm completely stumped, and I need YOUR insight! Thank you!!

CGV





Re: [GENERAL] Function written in C, hangs on one machine and not another...

2005-10-28 Thread CG
--- Martijn van Oosterhout  wrote:

> On Fri, Oct 28, 2005 at 06:38:29AM -0700, CG wrote:
> 
> Umm, what *are* you trying to do? Is this running in the backend?

Yes, running on the back-end. I'm trying to utilize Adobe's FDF toolkit to
parse the FDF files stored in my database. They distribute a C library that can
be used to parse FDF files. 

> Firstly, depending on the size of param_1, the write will block
> because it can't write all of it (usually PIPE_BUF). Perhaps recent
> kernel versions have changed to make it so no data is accepted until a
> reader appears even if the data is smaller than that.
> 
> Since apparently you want the read to happen in the same process as the
> write, you've just deadlocked yourself. The write won't happen till
> someone reads, and the read won't happen because you're stuck
> writing...

So it might be a kernel thing. What is different when the function is called
from within PostgreSQL versus when it is called in a standalone program?

> Finally, this is insane, why would you want to change STDIN?

Insanity? I agree completely. The major issue is that the FDF Toolkit has only
one function for reading in FDF Data:

/*
  FDFOpen: Reads an FDF file into memory. Client should call FDFClose() when
  the FDF is no longer needed. Parameters:

  - fileName: Complete pathname (in Host encoding), or  "-" to read from stdin.
  - howMany: If fileName specifies stdin, then howMany should indicate the
number of characters to read. Otherwise, it is unused. In a web server
environment, this is available as the value of the CONTENT_LENGTH
environment variable. In some servers executing cgi-bin scripts, if the
script tries to read stdin until an EOF is reached, the script hangs.
Thus this parameter.
  - pTheFDF: If FDFOpen() returns FDFErcOK, then pTheFDF will point to an
FDFDoc, which is needed for most other calls in the API.
  - Error codes: FDFErcBadParameter, FDFErcFileSysErr, FDFErcBadFDF,
FDFErcInternalError
*/
FDFLIBAPI FDFErc FDFOpen(const char* fileName, ASInt32 howMany, FDFDoc*
pTheFDF);

There's no other way to load data into the toolkit! (Can you /feel/ the
insanity?)

Does this give you any more insight into an alternate method of getting this
thing done?







Re: [GENERAL] Function written in C, hangs on one machine and not another...

2005-10-28 Thread CG
--- Martijn van Oosterhout  wrote:

> On Fri, Oct 28, 2005 at 07:24:12AM -0700, CG wrote:
> Not entirely sure, but I'm sure the size of the write matters. For
> example, in your test program, did you check that the write actually
> wrote everything?

There are beginning and ending tokens in the FDF filespec. The toolkit complains
if the data isn't all there...

> Firstly, instead of using stdin, you can pass /dev/fd/N (the pipe's file
> descriptor number) as the filename (on Linux). This avoids stuffing with stdin.

That's a FANTASTIC idea. I'll give it a go. We'll cross our fingers, hold our
breath, and hope that the blocking issue evaporates. :)

> ISTM the best idea: write the data to disk then read it back. Why be
> difficult when you can do it easily...

I was never supposed to have to do this sort of thing. The idea was never to
pull individual pieces of data out of the FDFs. Now, the bosses say I have to
do some usage analysis, and the data is locked up tight in an FDF. I suppose I
could write 100+ files to disk and read them back off and then delete them.
At the time, that seemed more insane to me than trying to pump data into stdin.
I'm not so sure anymore... :)







Re: [GENERAL] Function written in C, hangs on one machine and not another...

2005-10-28 Thread CG

Thanks to the great suggestions I've at least gotten it to not hang... 

Martijn's hint about blocking led me to open up those filehandles in a
non-blocking mode. It appears that write() will write, at a maximum, only 4096
bytes when it is called from within PostgreSQL. I've tried to push data into it
in <=4096-byte slugs, but after 4096 bytes it just won't take anymore. Since (I
think) using a non-blocking mode could cause problems with thread safety, it's
probably a lost cause.

I'm new to C, so this may seem extremely naive: I'm not sure how to use exec()
to solve this problem. Could you give me a few pointers to get me started?



--- Douglas McNaught <[EMAIL PROTECTED]> wrote:

> Martijn van Oosterhout  writes:
> 
> > The things that have screwed me up in the past with pulling tricks like
> > this are:
> >
> > 1. Program has registered atexit() handlers. _exit() avoids this.
> > 2. Pending stdio output that gets flushed. The backend doesn't use
> > stdio much so you might be fine here.
> > 3. Signals. Make sure you don't get sent signals that screw state.
> > Might be wise to block them all, or reset them all to default.
> >
> > Truly, exec() is the cleanest way to solve all this, it simply replaces
> > the current process, lock, stock and barrel.
> 
> Definitely.  It would probably also be good to close all file
> descriptors (except for stdin/stdout/stderr) before exec(), just in
> case the other binary does something screwy with random file
> descriptors (which it obviously shouldn't).
> 
> -Doug
> 








[GENERAL] CLUSTERing on Insert

2006-09-17 Thread CG
As I'm waiting for a CLUSTER operation to finish, it occurs to me that in a lot 
of cases, the performance benefits of having one's data stored on disk in index 
order can outweigh the overhead involved in inserting data on disk in index 
order... Just an idea I thought I'd throw out. :) 
 
Also, the CLUSTER operation is about as straightforward as one can get. It 
basically reads each row, one by one, in index order over to the new table, 
reindexes, then renames the new table to preserve references. I've been 
thinking about how to speed up the copy process. Perhaps taking contiguous 
blocks of data and moving them into place would save some I/O time. Locking the 
table is another problem. Would it be impossible to perform the CLUSTER within 
the context of a READ COMMITTED transaction, and then pick up the leftover CRUD 
rows and put them at the end of the file? The existing code makes some 
assumptions that the table was not altered; there would be no more assumptions. 
 
I'm sure I'm not the first person to scratch his head thinking about CLUSTER. 
Maybe I just don't really understand the limitations that are out there 
preventing these things from being created. But what else is there to do at 
1AM on a Sunday night waiting for a 500MB table to CLUSTER? :)
 
 
CG
 



[GENERAL] Creating an index-type for LIKE '%value%'

2005-02-07 Thread CG
Once upon a time there was an FTI contrib module that split up a varchar field
into little bits and placed them into an FTI table to facilitate a full text
index search. It was like being able to do a "SELECT * FROM table WHERE field
LIKE '%value%';" and have it search an index!

It was a great idea! What a pain it was to implement!

You see, a trigger had to be run on INSERT and UPDATE to split up the varchar
field into little pieces. On DELETE you'd have to clear out the rows from the
FTI table. And when you wanted to use the FTI table in a SELECT you had to
write your SQL to join up that FTI table and dig through it. 

As I was exploring ways to optimize my application's use of the database, which
has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
places, I thought this solution could be built upon to allow for an easier
deployment.

AFAICT, the "right" way to do this would be to create an index type which would
take care of splitting the varchar field, and to have the query planner use the
index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.

Tsearch2 is fantastic, but it works best for fields that contain words. I have
to sift through alphanumeric identification numbers. 

Is the split-field FTI the best way to tackle my problem?

What can I do to get better performance on "SELECT * FROM table WHERE field
LIKE '%value%';" ??

CG





[GENERAL] ltree + gist index performance degrades significantly over a night

2006-02-24 Thread CG
PostgreSQL 8.1.3

I'm trying to collect some hard numbers to show just how much it degrades and
over how long a time interval.

All I have now is anecdotal evidence, and I was hoping to save myself some
downtime by seeking advice early.

I have a search table which I use for partial-match text searches:

CREATE TABLE search
(
  id int4 NOT NULL DEFAULT nextval('search_id_seq'::regclass),
  item_id int8 NOT NULL,
  search_vector ltree NOT NULL,
  CONSTRAINT search_id_pkey PRIMARY KEY (id),
  CONSTRAINT search_item_id_fkey FOREIGN KEY (item_id)
  REFERENCES items (id) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE CASCADE
) 
WITH OIDS;

CREATE INDEX lsearch_vector_idx
  ON search
  USING gist
  (search_vector);

I have some triggers that insert rows into the search table as rows are
inserted into "items".

I implemented this yesterday, and the immediate effect was a fantastic return
time for partial text searches in the sub-second range. By today, these queries
take 10 minutes sometimes... There are about 134000 rows in the table.

The table gets analyzed nightly. Should the frequency be more? There are about
1000 rows added a day, only about 30 or so rows removed, and nothing is ever
updated. There's not that much turnover.

The search vectors are built like this:

For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d' ...
If I wanted to find all rows with "orl" in them i would construct an lquery
like '*.o.r.l.*' and use the "~" operator in the where clause. I would link to
the table "items" by the item_id ... 

What could be making this go so wrong? Is there a better way to accomplish my
task?

CG



Re: [GENERAL] ltree + gist index performance degrades significantly over a night

2006-02-24 Thread CG
--- Scott Marlowe <[EMAIL PROTECTED]> wrote:

> Are you vacuuming regularly, are your fsm settings high enough, and what
> does vacuum verbose say?

Autovacuum is running, but I do a nightly vacuum analyze. When I just do a
vacuum analyze on the table I get:

data=# vacuum analyze verbose search;
INFO:  vacuuming "search"
INFO:  index "search_id_pkey" now contains 1344661 row versions in 5134 pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.07u sec elapsed 4.91 sec.
INFO:  index "search_vector_idx" now contains 1344672 row versions in 47725
pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.77s/0.37u sec elapsed 407.55 sec.
INFO:  index "search_item_id_idx" now contains 1344690 row versions in 6652
pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.24s/0.08u sec elapsed 45.62 sec.
INFO:  "search": removed 9 row versions in 2 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  "letter_search": found 9 removable, 1344661 nonremovable row versions in
33984 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 141 unused item pointers.
0 pages are entirely empty.
CPU 2.41s/0.62u sec elapsed 483.06 sec.
INFO:  vacuuming "pg_toast.pg_toast_174918394"
INFO:  index "pg_toast_174918394_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "pg_toast_174918394": found 0 removable, 0 nonremovable row versions in
0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "search"
INFO:  "search": scanned 3000 of 33984 pages, containing 119035 live rows and 0
dead rows; 3000 rows in sample, 1348428 estimated total rows
VACUUM

max_fsm_pages = 6   # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000# min 100, ~70 bytes each






Re: [GENERAL] ltree + gist index performance degrades significantly over a night

2006-02-24 Thread CG


--- Martijn van Oosterhout  wrote:

> That's very odd. Like the other person said, do you vacuum and analyse?
> But my question is: is it using the index? What does EXPLAIN / EXPLAIN
> ANALYZE tell you?

data=# explain select * from search where search_vector ~ '*.o.r.l.*'::lquery;
QUERY PLAN
---
 Bitmap Heap Scan on search  (cost=53.71..4566.65 rows=1345 width=161)
   Recheck Cond: (search_vector ~ '*.o.r.l.*'::lquery)
   ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..53.71 rows=1345
width=0)
 Index Cond: (search_vector ~ '*.o.r.l.*'::lquery)
(4 rows)

data=# explain analyze select * from search where search_vector ~
'*.o.r.l.*'::lquery;
QUERY PLAN
--
 Bitmap Heap Scan on search  (cost=53.71..4566.65 rows=1345 width=161) (actual
time=183684.156..196997.278 rows=1655 loops=1)
   Recheck Cond: (search_vector ~ '*.o.r.l.*'::lquery)
   ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..53.71 rows=1345
width=0) (actual time=183683.857..183683.857 rows=1655 loops=1)
 Index Cond: (search_vector ~ '*.o.r.l.*'::lquery)
 Total runtime: 197000.061 ms
(5 rows)

I appreciate you taking the time to help me out. Thank you all.




Re: [GENERAL] ltree + gist index performance degrades significantly over a night

2006-02-27 Thread CG
Tsearch2 searches for whole words, and is designed with language in mind, yes? 

I'm looking for consecutive characters in words or serial numbers, etc. 

As for support, the same guys who wrote Tsearch2 wrote ltree. Can't go wrong
there!

Here's the solution to this problem: As usual, operator error. :(

For some reason I thought it would be a good idea to cluster the table on the
item_id index... What in the world was I thinking? When I clustered the search
table on the search_vector index (which makes the most sense, yes?) it seemed
to bring actual performance in-line with the type of performance I imagined
that I would receive.
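
For reference, a minimal sketch of that re-cluster in 8.1 syntax (the index 
name is taken from the CREATE INDEX statement earlier in the thread), followed 
by a fresh ANALYZE:

CLUSTER lsearch_vector_idx ON search;
ANALYZE search;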

I could probably get even better performance out of the table, at the cost of a
significant increase in table and index size, by chopping up the columns into
smaller chunks.

"Hello World" would yield

'h.e.l.l.o.w.o.r.l.d'
'e.l.l.o.w.o.r.l.d'
'l.l.o.w.o.r.l.d'
'l.o.w.o.r.l.d'
'o.w.o.r.l.d'
'w.o.r.l.d'
'o.r.l.d'
'r.l.d'

and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to the
vectors which start with "o.r.l" ... 

Thanks for all the responses! They did get my head pointed in the right
direction.

CG

--- "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> On Fri, Feb 24, 2006 at 09:02:04AM -0800, CG wrote:
> > I have a search table which I use for partial-match text searches:
> 
> > For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d'
> ...
> > If I wanted to find all rows with "orl" in them i would construct an lquery
> > like '*.o.r.l.*' and use the "~" operator in the where clause. I would link
> to
> > the table "items" by the item_id ... 
> 
> Is there some reason you can't use tsearch2? I suspect it would probably
> work better; if nothing else you'd probably get better support since a
> lot more people use it.
> -- 
> Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
> Pervasive Software  http://pervasive.comwork: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
> 




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] ltree + gist index performance degrades significantly over a night

2006-02-27 Thread CG

That would do the job, wouldn't it? :)

I don't think it's a naive question at all. It's quite a good question, and the
solution you suggest is a good option to have; it would probably work better
than the single-vector ltree index for simple substring matching. In my case,
the ltree+gist index actually contains more pages than the data table itself.
I'd need to see whether the space required by the varchar+btree tables is
comparable to, better than, or worse than that of the ltree+gist tables. Now
that I think about it, building substrings out of ltree nodes would be
incredible overkill compared to the effectiveness of the varchar+btree approach.

The extra advantages of ltree are the ability to match and extract nodes in a
path based not only on their contents but also on their proximity, and to
aggregate on those characteristics.

In my case this might be useful for serial numbers, where each digit or grouping
of digits has a special meaning that could be aggregated on.

The ltree method was suggested to me a while back when I was frustrated with
the performance of "like '%something%'" ... 
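
For comparison, a hedged sketch of the varchar+btree alternative being discussed;
the table, column, and index names are hypothetical. Every suffix is stored as
plain text, so the substring search becomes an anchored LIKE that a btree can serve:

CREATE TABLE search_suffix (
    item_id integer NOT NULL,
    suffix  varchar(50) NOT NULL
);

-- varchar_pattern_ops lets the btree satisfy anchored LIKE in non-C locales
CREATE INDEX search_suffix_suffix_idx
    ON search_suffix (suffix varchar_pattern_ops);

-- "orl" anywhere in the original string is a prefix of some stored suffix
SELECT DISTINCT item_id
FROM search_suffix
WHERE suffix LIKE 'orl%';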



--- Bernhard Weisshuhn <[EMAIL PROTECTED]> wrote:

> On Mon, Feb 27, 2006 at 09:14:40AM -0800, CG <[EMAIL PROTECTED]> wrote:
> 
> > I could probably get even better performance out of the table, at the cost
> of a
> > significant increase in table and index size, by chopping up the columns
> into
> > smaller chunks.
> > 
> > "Hello World" would yield
> > 
> > 'h.e.l.l.o.w.o.r.l.d'
> > 'e.l.l.o.w.o.r.l.d'
> > 'l.l.o.w.o.r.l.d'
> > 'l.o.w.o.r.l.d'
> > 'o.w.o.r.l.d'
> > 'w.o.r.l.d'
> > 'o.r.l.d'
> > 'r.l.d'
> > 
> > and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to
> the
> > vectors which start with "o.r.l" ... 
> 
> But with this approch you'd be fine with a normal varchar_ops btree index
> for textfields and searching using "like 'world%'", wouldn't you?
> Or is the ltree approch more efficient?
> 
> I'm not trying to be smart-assed, it's a naive question, since I'm
> looking for an efficient substring search solution in postgresql myself.
> 
> regards,
> bkw
> 
> 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] ltree + gist index performance degrades significantly over a night

2006-02-28 Thread CG


--- Bernhard Weisshuhn <[EMAIL PROTECTED]> wrote:

> On Mon, Feb 27, 2006 at 10:27:20AM -0800, CG <[EMAIL PROTECTED]> wrote:
> 
> > [...] I'd need to see if the space required for the varchar+btree tables
> are
> > comparible, better, or worse than the ltree+gist tables with regards to
> size.
> 
> Please test this, I'm guessing (hoping actually) that having bazillions of
> combinations of 26 (or so) characters (ltree labels) might be consuming
> less space than having bazillions of substings in the database.
> 
> Or maybe some clever combination of both approaches?
> 
> If you find out something interesting, please let me know.

Performance using varchar+btree, breaking the string up into distinct letter
groups of >= 3 chars, is slightly better. The varchar search-vector table is
much bigger. Most of my fields are about 15-25 characters in length; expect even
bigger tables for longer fields. The btree index itself is smaller. The time to
bootstrap the data into the tables was significantly longer. I used two
triggers: one that normalized the search field before insert, and another that
inserted a breakdown row after each insert. There's a recursive effect built in
to get down to the smallest unique element.
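
A rough sketch of the breakdown trigger only, building on the hypothetical
search_suffix table sketched earlier; the recursion comes from the trigger
re-firing on each row it inserts:

CREATE OR REPLACE FUNCTION breakdown_suffix() RETURNS trigger AS $$
BEGIN
    -- re-insert the same string minus its first character until only
    -- three characters remain; each INSERT fires this trigger again
    IF length(NEW.suffix) > 3 THEN
        INSERT INTO search_suffix (item_id, suffix)
        VALUES (NEW.item_id, substr(NEW.suffix, 2));
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER search_suffix_breakdown
    AFTER INSERT ON search_suffix
    FOR EACH ROW EXECUTE PROCEDURE breakdown_suffix();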

I'm sticking with ltree and setting up a vacuum analyze on a cron to keep the
searches snappy. Hope that helps you with your project!

CG


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] GiST index slower than seqscan

2006-04-20 Thread CG
I'm still trying to wrap my brain around this one. Please forgive me if this is
the proverbial "dead horse" that I'm beating.

In case you're unfamiliar with this particular horse, I'm using ltree to create
a full-text index on some <= 50-character fields for a lookup table. The idea
was to be able to tear through tons of data quickly, finding case-insensitive
substring matches.

Here's my index...

CREATE INDEX letter_search_vector_idx
  ON letter_search
  USING gist
  (search_vector);

I know that the index is bigger than the table, but shouldn't it be able to
quickly scan the few branches that matter?

I've tried a varchar-based substring lookup table, and the size of that
table+index is enormous compared to the ltree table + index (which is huge
anyway).

I'm thinking that I've created something or am using something incorrectly.

I need to be set straight. Please advise!

[snip]

Welcome to psql 8.1.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit
data=# analyze letter_search;
ANALYZE
data=# explain select * from letter_search where search_vector ~ charslquery('669344');
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on letter_search  (cost=59.14..4978.98 rows=1467 width=162)
   Recheck Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
   ->  Bitmap Index Scan on letter_search_vector_idx  (cost=0.00..59.14 rows=1467 width=0)
         Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
(4 rows)

data=# explain analyze select * from letter_search where search_vector ~ charslquery('669344');
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on letter_search  (cost=59.14..4978.98 rows=1467 width=162) (actual time=63061.402..63072.362 rows=2 loops=1)
   Recheck Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
   ->  Bitmap Index Scan on letter_search_vector_idx  (cost=0.00..59.14 rows=1467 width=0) (actual time=63058.094..63058.094 rows=2 loops=1)
         Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
 Total runtime: 63072.411 ms
(5 rows)

data=# set enable_bitmapscan=off;
SET
data=# explain select * from letter_search where search_vector ~ charslquery('669344');
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using letter_search_vector_idx on letter_search  (cost=0.00..5837.70 rows=1467 width=162)
   Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
(2 rows)

data=# explain analyze select * from letter_search where search_vector ~ charslquery('669344');
                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using letter_search_vector_idx on letter_search  (cost=0.00..5837.70 rows=1467 width=162) (actual time=14582.619..162637.887 rows=2 loops=1)
   Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
 Total runtime: 162637.977 ms
(3 rows)

data=# set enable_indexscan=off;
SET
data=# explain select * from letter_search where search_vector ~ charslquery('669344');
                                  QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on letter_search  (cost=0.00..55232.18 rows=1467 width=162)
   Filter: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
(2 rows)

data=# explain analyze select * from letter_search where search_vector ~ charslquery('669344');
                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on letter_search  (cost=0.00..55232.18 rows=1467 width=162) (actual time=4725.525..9428.087 rows=2 loops=1)
   Filter: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
 Total runtime: 9428.118 ms
(3 rows)

[/snip]



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] GiST index slower than seqscan

2006-04-21 Thread CG


--- Teodor Sigaev <[EMAIL PROTECTED]> wrote:

> > In case you're unfamiliar with this particular horse, I'm using ltree to
> create
> > a full text index on some <= 50 char long fields for a lookup table. The
> idea
> > was to be able to tear through tons of data quickly finding case
> insensitive
> > substring matches.  
> > 
> 
> Why it is a ltree, not a tsearch?

When I said full text, I meant substring. Please correct me if I am wrong, but
tsearch would be useful for finding words in a paragraph, not characters in a
word (or small group of words) ... If I had fields 'Hello World!', 'Low Tide',
and 'Following Day' they would all be hits for a search on 'low' ...
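
As a hedged illustration using the operators from the EXPLAIN output above (the
pattern simply follows the same per-character lquery form used throughout):

-- 'Hello World!', 'Low Tide', and 'Following Day' all contain the letter
-- sequence l, o, w, so each of their search vectors would match:
SELECT *
FROM letter_search
WHERE search_vector ~ '*.l.o.w.*'::lquery;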

> 
> 
> >  Index Cond: (search_vector ~ '*.6.6.9.3.4.4.*'::lquery)
> 
> That's the problem. Queries which begin with '*' will be slow enough...
> 

Indeed. Substring searches are quite costly... I was hoping that the
hierarchical nature of ltree would allow me to sift quickly through
the list, since every alpha or numeric character would be a branch on the tree.

> Try to reduce SIGLENINT in tsearch2/gistidx.h up to 8 (do not forget reindex
> !!) 
> and try it

I bet you meant ltree/ltree.h ... I'll give that a try and see what happens!
Thank you! 


---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Postmaster crashes after upgrade to 8.1.4!

2006-05-25 Thread CG
[...]nect to the database and repeat your command.
2006-05-25 08:30:50.089 EDT myuser mydata 10.0.1.1(4133)WARNING:  terminating connection because of crash of another server process
2006-05-25 08:30:50.089 EDT myuser mydata 10.0.1.1(4133)DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-05-25 08:30:50.089 EDT myuser mydata 10.0.1.1(4133)HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.091 EDT   LOG:  all server processes terminated; reinitializing
2006-05-25 08:30:50.103 EDT   LOG:  database system was interrupted at 2006-05-25 08:29:27 EDT
2006-05-25 08:30:50.103 EDT   LOG:  checkpoint record is at 28/3C101AE8
2006-05-25 08:30:50.103 EDT   LOG:  redo record is at 28/3C101AE8; undo record is at 0/0; shutdown TRUE
2006-05-25 08:30:50.103 EDT   LOG:  next transaction ID: 204190698; next OID: 186879866
2006-05-25 08:30:50.103 EDT   LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2006-05-25 08:30:50.103 EDT   LOG:  database system was not properly shut down; automatic recovery in progress
2006-05-25 08:30:50.114 EDT   LOG:  redo starts at 28/3C101B38


What could be wrong? I am panicking. Please advise!

CG


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Postmaster crashes after upgrade to 8.1.4!

2006-05-25 Thread CG
I didn't find a core dump. 

Perhaps I'm looking in the wrong spot or for the wrong file. The file should be
called "core.32140", correct? ... I did a "find / -name core*" ... that found
nothing useful.



--- Tom Lane <[EMAIL PROTECTED]> wrote:

> CG <[EMAIL PROTECTED]> writes:
> > 2006-05-25 08:30:50.076 EDT   LOG:  server process (PID 32140) was
> terminated
> > by signal 11
> 
> That should be leaving a core dump file (if not, restart the postmaster
> under "ulimit -c unlimited").  Get a stack trace with gdb to get some
> more info about what's going on.
> 
>   regards, tom lane
> 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Postmaster crashes after upgrade to 8.1.4!

2006-05-25 Thread CG
Okay, there was no core dump to be found.

I had to revert back to 8.1.3 which seems to be running fine. I am /extremely/
thankful that there was no data corruption.

I took a 24 hour old dumpfile of the database it was crashing on and I restored
it to a similar AMD64 box (SunFire x2100 instead of SunFire x4100) running
8.1.4 and tried to crash it as the other was crashing. No joy. It seems to run.
I'll leave it running and try to put a decent load on the box to get it to
crash. 

Since I would have to down the production database to get a working copy, I
won't be able to copy the offending data directory over to the test
installation until my next maint window rolls around in a few weeks. That, or
we have another outage of some type which would give me the ability to down the
database and copy the tree over.

I wish I could've done more analysis while the server was crippled. I'll keep
trying.

CG

--- Tom Lane <[EMAIL PROTECTED]> wrote:

> CG <[EMAIL PROTECTED]> writes:
> > 2006-05-25 08:30:50.076 EDT   LOG:  server process (PID 32140) was
> terminated
> > by signal 11
> 
> That should be leaving a core dump file (if not, restart the postmaster
> under "ulimit -c unlimited").  Get a stack trace with gdb to get some
> more info about what's going on.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster