[HACKERS] pg_largeobject implementation question

2012-10-10 Thread Scott Corscadden
I'm pruning an 8.4 database that was using a *lot* of space in pg_largeobject 
(400GB according to my size query). What I've done seems to work, but I don't 
know if there's a time-bomb waiting for me, so I have to ask those of you who've 
implemented this part. Steps:

1) On new.better.server.com:  time nc -l -p 1234 | tee backup.dump | pg_restore --verbose -Fc -d mydbname
2) On old.godhelpme.server.com:  pg_dump --verbose -Fc -Z 0 --schema=foo --schema=bar --schema=public mydbname | nc other.better.server 1234
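
A quick integrity check on both ends is cheap insurance for the netcat hop. A sketch only (assumes bash process substitution on the sending box; sent.md5 is a hypothetical file name):

On old.godhelpme.server.com:  pg_dump --verbose -Fc -Z 0 --schema=foo --schema=bar --schema=public mydbname | tee >(md5sum > sent.md5) | nc other.better.server 1234
On new.better.server.com:     md5sum backup.dump   # should match sent.md5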

That part's pretty cool; streaming out/in works really well. By specifically 
picking my schemas, no large objects come across. I've already migrated most of 
them out to the filesystem; there are around 102 actual LOs I still need, though, so 
then:

3) On old.godhelpme.server.com:  time psql -qAt -d mydbname -c 'copy (select loidcolumnname from thistable where loidcolumnname is not null) to STDOUT' > myneat.list

I do #3 on a few tables, then sort and uniquify the combined rows (sort -u across the per-table lists), then get the data out by:

4) On old.godhelpme.server.com:  time cat myneat.list | xargs -n 1 
./extract_blobs.sh

Where extract_blobs.sh looks like this:

#!/bin/bash
# append the pg_largeobject pages for one loid (passed as $1) to a single COPY file
psql -qAt -d mydbname -c "copy (select * from pg_largeobject where loid = ${1}) to STDOUT" >> /Volumes/some-less-bad-disk/the-blobs.copy

I rsync that to the new machine, then:

5) On new.better.server.com:  cat the-blobs.copy | psql -At -d mydbname -c 'copy pg_largeobject (loid, pageno, data) from STDIN'
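
A rough sanity check at this point (a sketch, not one of the original steps): dump per-loid page counts on both servers and diff them.

On each server:  psql -At -d mydbname -c 'select loid, count(*) from pg_largeobject group by loid order by loid' > pages.counts

The counts for the loids in myneat.list should match between the old and new machines.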

That seems to work - I spun up the new db and can add new blobs, but I notice that 
the loid assigned to a new blob is /lower/ than some of the ones I've imported, so

** MY QUESTION ** - Will pg_largeobject automatically choose new OIDs that 
don't conflict, if I've added lo's this way, by direct COPY?
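
That observation can be reproduced with something along these lines (a sketch; lo_creat(-1) just makes a throwaway empty LO so you can see which OID the server assigns next, and it can be lo_unlink'd afterwards):

psql -At -d mydbname -c 'select max(loid) from pg_largeobject'
psql -At -d mydbname -c 'select lo_creat(-1)'   # returns the OID of the freshly assigned blob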



Thanks so much for reading this far. If you're in London, Ontario, Canada, let 
me know and I'll buy you a good beer. (some limitations may apply! :)

./scc



Re: [HACKERS] pg_largeobject implementation question

2012-10-10 Thread Scott Corscadden
A very timely answer. We'd been debating moving to 9.2 at the same time but 
decided on staying on the 8.4 line (latest patch level, though). After we do 
this we should be able to do a regular, normal pg_dump (not excluding anything) 
to get from 8.4 to 9.2 a few weeks from now.

Thanks so much Tom. 

./scc

On 2012-10-10, at 11:04 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Scott Corscadden <sc...@corscadden.ca> writes:
>> ** MY QUESTION ** - Will pg_largeobject automatically choose new OIDs that
>> don't conflict, if I've added lo's this way, by direct COPY?
>
> In 8.4, yes.  In later versions, you'd need to do something about
> creating corresponding rows in pg_largeobject_metadata.
>
> In general, all built-in OID columns have mechanisms for choosing
> nonconflicting new values --- but in 9.0 and up, pg_largeobject_metadata
> is the authoritative store of existing OIDs for blobs, not
> pg_largeobject.
>
> 			regards, tom lane
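
For anyone who does land on a 9.0-or-later target before doing this kind of direct COPY, a rough, untested sketch of the extra step Tom describes would be to pre-create each OID first (which gives you the pg_largeobject_metadata row), and only then load the pages:

while read loid; do
  psql -qAt -d mydbname -c "select lo_create(${loid})"   # creates an empty LO with exactly that OID (and its metadata row on 9.0+)
done < myneat.list
cat the-blobs.copy | psql -At -d mydbname -c 'copy pg_largeobject (loid, pageno, data) from STDIN'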





[HACKERS] MVCC and large objects

2012-10-02 Thread Scott Corscadden
While I'm referring to the 8.4 release, this is a slightly more general question. 
We're moving away from using pg_largeobject as fast as we can, and while doing 
so I'm seeing interesting behaviour. I'd like to confirm that this is 
expected, or perhaps I don't have tuning parameters set quite correctly. I 
believe we have standard autovacuum running.
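
A quick way to double-check that last assumption, for what it's worth (just a sketch):

psql -d mydbname -c "select name, setting from pg_settings where name like 'autovacuum%'"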

* Table Foo has an oid data column that points to a valid BLOB. We modified this 
table to also have a datafilepath column.
* We wrote a throttleable copier process which walks rows, reads the BLOB 
data out to a file, and then UPDATEs the datafilepath column with the location it 
wrote to. We did not alter the BLOB data in any way. When asked for byte data, 
the higher-level code will first return it from the datafilepath if it's there, 
and fall back on the lo otherwise.
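
For concreteness, the copier boils down to something like the following - hypothetical table/column names (foo, id, blob_oid), and the real process is throttled and more careful than this sketch:

psql -qAt -d mydbname -c 'select id, blob_oid from foo where datafilepath is null' |
while IFS='|' read id loid; do
  path="/exports/${loid}.bin"
  psql -qAt -d mydbname -c "\lo_export ${loid} ${path}"                      # read the BLOB out to a file
  psql -qAt -d mydbname -c "update foo set datafilepath = '${path}' where id = ${id}"
done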

While the above was working away, we nearly missed the fact that the 
pg_largeobject table seemed to be growing commensurately with what we 
were exporting! Since we were doing this while the primary disk was nearly out of 
space, it was fortunate I could pause the work. We were able to move the 
entire system and it's now continuing along, but my question:

Is this expected? I'm a little surprised. My theory is that MVCC is 
treating the referenced pg_largeobject rows as part of the row we UPDATE, so even though 
we're not updating the BLOB at all, a new snapshot of it is getting created. *Is this 
expected*?
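
One way to see whether that theory holds up (a sketch; this just asks the statistics views whether pg_largeobject is accumulating dead rows and when autovacuum last touched it):

psql -d mydbname -c "select n_live_tup, n_dead_tup, last_autovacuum from pg_stat_sys_tables where relname = 'pg_largeobject'"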

Many thanks - single-link RTFM answers welcome; I've seen the MVCC-in-pictures 
explanations and I get it - just not how a BLOB fits into MVCC here.

./scc
