Well, I downloaded the latest FTC DoNotCall Area Code files Tuesday
afternoon, then unzipped them for processing. Most of the files are grouped
by state, so they contain more than a single Area Code. A few other
files are for specific Area Codes, or for US territories with one or more
Area Codes (Puerto Rico, American Samoa, US Virgin Islands).
The file names are rather long (50+ characters), but the core
State/Territory/Area Code value in each file name is preceded and followed
by an underscore ("_") character. Using the ADIR() function, I renamed each
of the 67 files to a shorter name (the State, Area Code, or Territory,
preceded by "_") to keep things simple to track. As part of the
file-renaming code I also created a SQL script file to use in the
PGAdminIII SQL command window to import each of the csv files. I ended up
importing 9.07Gb of csv data into the PostgreSQL database (plus another
4.3Gb for the indexes). That represents 157,889,571 records.
It took the machinery 3 hours and 34 minutes to import all the records from
all 67 csv files, which covers all Area Codes in the US and its territories.
Normally I would have imported only the files for the Area Codes my clients
subscribe to, but curiosity got the better of me this time <g>.
I guess I just wanted to see how well PostgreSQL could handle this size
table. It did fine, but I see the "DoNotCall" table is actually stored as
a series of files, each "only" 1Gb in size. So PostgreSQL does its own
internal file spanning for the entire DoNotCall table. Ditto for the
index: its files are split into 1Gb pieces. Very interesting.
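That 1Gb splitting is PostgreSQL's standard relation segmenting: a big table's data file is broken into 1Gb segments named <relfilenode>, <relfilenode>.1, <relfilenode>.2, and so on. A tiny sketch of the naming (the relfilenode value below is hypothetical):

```python
def segment_files(relfilenode, total_bytes, seg_bytes=1 << 30):
    """List the on-disk segment file names PostgreSQL would use for a
    relation of the given size (default segment size is 1Gb)."""
    nsegs = max(1, -(-total_bytes // seg_bytes))  # ceiling division
    return [relfilenode] + ["%s.%d" % (relfilenode, i)
                            for i in range(1, nsegs)]
```

So a table in the 9Gb range ends up spanning roughly nine or ten segment files, which matches what showed up on disk.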
After finishing the csv import process I ran my DoNotCall build routine that
creates the updated .dbf files for my client machinery based on the Area
Codes they subscribed to. Part of the build routine includes zipping the
.dbf file for each client/group using ZipGenius via its CLI. From there
each client PC connects to my ftp Server on a timed basis, downloads and
unzips the updated .dbf file, then processes the contents of the updated
file against the Customer.dbf table's phone numbers. All automated. Very
cool.
Were it not for the PostgreSQL piece of the puzzle handling the enormous
table size (even for only the 23 Area Codes I normally would have imported),
I could not provide the same level of automation and flexibility with
native VFP, given the size of the required DoNotCall table alone (VFP
tables are capped at 2Gb). Splitting the Area Codes across multiple VFP
tables could have worked, but it would have been a kludge and would have
resulted in far slower processing overall. All of the VFP table builds
took only a few minutes once the csv file
imports were done. I did not notice any speed degradation between using a
2.5Gb PostgreSQL table with only 23 Area Codes in it vs using a 9.07Gb
PostgreSQL table with all Area Codes in it. The parameterized views ran extremely
fast against the fully populated PostgreSQL table - very impressive of both
PostgreSQL and VFP. They certainly seem to play together well.
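The parameterized-view idea boils down to binding the Area Code at query time rather than baking it into the SQL. A minimal runnable sketch, with sqlite3 standing in for the PostgreSQL connection (the donotcall table and phone column are invented for the example):

```python
import sqlite3

def numbers_for_area_code(conn, area_code):
    """Fetch DoNotCall numbers for one Area Code via a bound parameter,
    the same idea as a VFP parameterized view against PostgreSQL."""
    cur = conn.execute(
        "SELECT phone FROM donotcall WHERE phone LIKE ? || '%'",
        (area_code,))
    return [row[0] for row in cur.fetchall()]
```

With an index on the phone column, the backend can satisfy each Area Code prefix lookup without scanning the whole table, which is presumably why the views stayed fast even against the fully populated table.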
The significant time burden came from my decision to import all the FTC csv
files rather than only the ones I needed - no surprise there (just over 20
minutes vs 3 hours 34 minutes). I did note, however, that importing the
csv files into the PostgreSQL table via PGAdminIII was significantly faster
than running an APPEND FROM <FileName> CSV into a VFP table would have
been. I have no recorded
time comparisons, just my gut feel based on years of watching VFP import
records from large csv files (millions of records at a crack). And, for the
record, when I do imports into a VFP table I leave any index/order off,
since an active index/order slows an import horribly on large tables as
VFP must maintain the index for every record it appends...
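That index-off habit is easy to demonstrate outside VFP: appending everything and ordering once is far cheaper than keeping the data ordered as each record arrives. A small Python illustration of the two strategies:

```python
import bisect

def load_then_index(values):
    """Append all records, then build the order once -- like importing
    with no active index and issuing INDEX ON afterward."""
    table = list(values)
    table.sort()          # one O(n log n) pass at the end
    return table

def load_with_active_index(values):
    """Keep the table ordered while loading -- analogous to importing
    with an active index/order that must be maintained per record."""
    table = []
    for v in values:
        bisect.insort(table, v)   # each insert shifts elements: O(n)
    return table
```

Both produce the same ordered result, but on millions of rows the second approach degrades badly, which matches the APPEND FROM behavior described above.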
Now to test the process on the Compaq Server with half the RAM and a simple
SATA drive, but running Linux <g>... heh-heh... I may wait a few weeks to
do that.
Gil
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Gil Hale
> Sent: Tuesday, April 01, 2008 1:05 PM
> To: [EMAIL PROTECTED]
> Subject: RE: [NF] RE: Microsoft New SQL OS >>> PostgreSQL info
>
>
> I forgot to advise the ProFox group of the apparent latent impact
> on my Dell sc430 Server from hammering it so heavily processing a huge
> number of Federal Do Not Call records the other week. The sc430 runs
> Windows 2003 Server, and had 1Gb DDR2 ECC RAM (512Mb x 2, from Dell).
> I noticed the
> morning after running the VFP processes used to update dbf tables for
> distribution to clients, drawing the client record sets from the
> PostgreSQL
> database, my scheduled morning VFP TranslationEngine app was running dog
> slow. I checked the sc430 Server and found it had not fully released the
> RAM it ate up running the PostgreSQL processes - likely because I
> still had
> that app (and PostgreSQL tables) open so I could tweak it "later
> on". Upon closing the DoNotCall project, the sc430 Server's free RAM
> climbed back to its normal level.
>
> So, in an effort to head off performance issues, and in keeping with all
> I have read about PostgreSQL performance tuning, I ordered two 1Gb DDR2
> ECC RAM modules (installed them this afternoon). I will be
> processing the DoNotCall updates again next week, using the VFP
> project that
> interfaces with the PostgreSQL database. Assuming anyone cares
> to know what
> happens with the performance with the additional RAM I will report the
> results, even if somewhat anecdotal. The sc430 can handle 4Gb RAM, but I
> wanted to see how this does first. After that little experiment I plan to
> migrate the PostgreSQL database from the Windows 2003 Server to an Ubuntu
> Linux 7.10 Server with a "paltry" 1.5Gb RAM in it, just to see how it
> performs in comparison. Both Servers have an Intel P4 2.8Ghz
> Dual-Core CPU,
> the Dell has 3Gb RAM and uses a non-boot SCSI HDD for the PG database, the
> Compaq has 1.5Gb RAM, and uses its boot SATA HDD for its PG
> database. Both
> Servers can be pumped to 4Gb RAM if need be. Let the games begin!
> heh-heh... This is too much fun.
>
> Gil
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Gil Hale
> > Sent: Tuesday, April 01, 2008 11:50 AM
> > To: [EMAIL PROTECTED]
> > Subject: [NF] RE: Microsoft New SQL OS
> >
> >
> > Brat! I was about to cut from PostgreSQL to the new SQL OS because I
> > thought M$ finally did something right. Grrr...
> >
> > Gil
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] Behalf Of Alan Bourke
> > > Sent: Tuesday, April 01, 2008 6:33 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Microsoft New SQL OS
> > >
> > >
> > > 2 things!
> > >
> > > April Fool, and XP SP2 is as bug free as any OS I've used.
> > >
> > >
_______________________________________________
Post Messages to: [email protected]
Subscription Maintenance: http://leafe.com/mailman/listinfo/profox
OT-free version of this list: http://leafe.com/mailman/listinfo/profoxtech
Searchable Archive: http://leafe.com/archives/search/profox
This message: http://leafe.com/archives/byMID/profox/[EMAIL PROTECTED]
** All postings, unless explicitly stated otherwise, are the opinions of the
author, and do not constitute legal or medical advice. This statement is added
to the messages for those lawyers who are too stupid to see the obvious.