On Wed, Apr 2, 2008 at 9:13 AM, Gil Hale <[EMAIL PROTECTED]> wrote:

> Well, I downloaded the latest FTC DoNotCall Area Code files Tuesday
> afternoon, then unzipped them for processing.  Most of the files are
> grouped by state, so they will contain more than a single Area Code.  A
> few other files are for specific Area Codes, or for US territories with
> one or more Area Codes (Puerto Rico, American Samoa, US Virgin Islands).
>
> The file names are rather long (50+ characters), but the core
> State/Territory/Area Code value in each file name is delimited by
> underscore ("_") characters.  Using the ADIR() function I renamed each of
> the 67 files with a shorter name (State, Area Code, or Territory, preceded
> with "_") than originally provided, just to keep things simple for me to
> keep track of.  As part of the file-renaming code I also created a SQL
> script file to use with the pgAdmin III SQL command window to import each
> of the CSV files.  I ended up importing 9.07 GB of CSV data into the
> PostgreSQL database (plus another 4.3 GB for the indexes).  That
> represents 157,889,571 records.
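The rename-plus-script step above can be sketched roughly as follows. This is a minimal illustration, not the author's ADIR()-based VFP code: the sample file names, the `donotcall` table name, and the column list are all assumptions; only the "core value delimited by underscores" convention comes from the post.

```python
import re

# Hypothetical sketch of the rename + SQL-script step.  File names, the
# "donotcall" table name, and its columns are assumptions, not the
# author's actual code or schema.
def build_import_script(filenames):
    """Derive a short name from the _STATE_ or _AREACODE_ token embedded
    in each long file name, and emit one COPY command per file."""
    renames = {}
    copy_lines = []
    for name in filenames:
        # The core value (two-letter state or three-digit Area Code) is
        # delimited by underscores, e.g. ..._TN_... or ..._901_...
        m = re.search(r'_([A-Z]{2}|\d{3})_', name)
        if not m:
            continue
        short = '_' + m.group(1) + '.csv'
        renames[name] = short
        copy_lines.append(
            "COPY donotcall (area_code, phone) FROM '{}' CSV;".format(short)
        )
    return renames, '\n'.join(copy_lines)
```

The generated script can then be pasted into the pgAdmin III SQL window, one COPY per file, which is essentially what the author describes automating.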
>
> It took the machinery 3 hours and 34 minutes to import all the records
> from all 67 CSV files, which covers all Area Codes in the US and its
> territories.  Normally I would have imported only the files I needed for
> the Area Codes my clients subscribed to, but curiosity got the better of
> me this time <g>.  I guess I just wanted to see how well PostgreSQL could
> handle a table this size.  It did fine, but I see the "DoNotCall" table
> is actually stored as a series of files, each "only" 1 GB in size.  So I
> guess PostgreSQL does its own internal file spanning for the entire
> DoNotCall table.  Ditto for the index; its files are split into 1 GB
> pieces.  Very interesting.
>
> After finishing the CSV import process I ran my DoNotCall build routine
> that creates the updated .dbf files for my client machinery based on the
> Area Codes they subscribed to.  Part of the build routine includes
> zipping the .dbf file for each client/group using ZipGenius via the CLI.
> From there each client PC connects to my FTP server on a timed basis,
> downloads and unzips the updated .dbf file, then processes the contents
> of the updated file against the Customer.dbf table's phone numbers.  All
> automated.  Very cool.
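The client-side loop above (download, unzip, match against Customer.dbf) can be sketched in miniature. This is a hedged analogy, not the author's VFP code: the zip layout, file names, and CSV-instead-of-.dbf format are assumptions, and the FTP download step is elided (the zip simply arrives as bytes).

```python
import io
import zipfile

# Hedged sketch of the client-side update step.  The zip layout, member
# name, and plain-CSV format are assumptions; the real system ships .dbf
# files and pulls them over FTP, which is elided here.

def load_dnc_numbers(zip_bytes, member='dnc_update.csv'):
    """Unzip the update in memory and return the set of DNC numbers."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        text = zf.read(member).decode('ascii')
    return {line.strip() for line in text.splitlines() if line.strip()}

def flag_customers(customers, dnc):
    """Mark each (name, phone) customer record found on the DNC list."""
    return [(name, phone, phone in dnc) for name, phone in customers]

# Build a fake update zip for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('dnc_update.csv', '9015550100\n9015550199\n')

dnc = load_dnc_numbers(buf.getvalue())
flags = flag_customers([('Acme', '9015550100'), ('Beta', '9015550123')], dnc)
```

A set lookup per customer phone number keeps the matching pass linear in the customer count, which is why the per-client processing stays cheap even when the master DNC table is enormous.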
>
> Were it not for the PostgreSQL piece of the puzzle handling the enormous
> table size (even for only the 23 Area Codes I normally would have
> imported), I would not be able to provide the same level of automation
> and flexibility with native VFP, due to the size of the required
> DoNotCall table alone.  Splitting the Area Codes into multiple VFP
> tables could have worked, but that would not only have been a kludge, it
> would have resulted in far slower processing time overall.  All of the
> VFP table builds took only a few minutes once the CSV imports were done.
> I did not notice any speed degradation between using a 2.5 GB PostgreSQL
> table with only 23 Area Codes in it vs. using a 9.07 GB PostgreSQL table
> with all Area Codes in it.  The parameterized views ran extremely fast
> against the fully populated PostgreSQL table - very impressive on the
> part of both PostgreSQL and VFP.  They certainly seem to play together
> well.
>
> The significant time burden came from my deciding to import all the FTC
> CSV files as opposed to only the ones I needed - no surprise there (just
> over 20 minutes vs. 3:34 hours).  I did note, however, that importing
> the CSV files into the PostgreSQL table via pgAdmin III was
> significantly faster than running an APPEND FROM <FileName> CSV into a
> VFP table would have been.  I have no recorded time comparisons, just my
> gut feel based on years of watching VFP import records from large CSV
> files (millions of records at a crack).  And, for the record, when I do
> my imports into a VFP table I leave any index/order set off, as having
> an active index/order will slow down an import process horribly with
> large tables, since VFP tries to rearrange imported records into the
> active sort order...
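The index-off import advice generalizes beyond VFP, and a small Python analogy makes the cost difference concrete: keeping a collection sorted after every insert (an active index) does far more work than appending everything unordered and sorting once at the end (reindexing after the import). This is an illustration of the principle, not VFP code.

```python
import bisect
import random

# Analogy for the VFP point above: maintaining sort order on every
# insert vs. bulk-appending and sorting once at the end.

records = list(range(1000))
random.shuffle(records)

# "Index active" style: keep the list ordered after every insert.
ordered = []
for r in records:
    bisect.insort(ordered, r)   # O(n) element shift per insert -> O(n^2) total

# "Index off" style: bulk append, then one sort (rebuild the index once).
bulk = list(records)
bulk.sort()                     # O(n log n) total
```

Both approaches end with identical data; only the total work differs, and at millions of records per import that difference is exactly the "horrible" slowdown described above.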
>
> Now to test the process on the Compaq Server with half the RAM and a
> simple SATA drive, but running Linux <g>...  heh-heh...  I may wait a
> few weeks to do that.
>
> -------------------------------------------------

Now that your data is in place, what calls do you perform on it? Do you
have much in the way of adds/updates, or is it all queries - either checking
whether you can't use a given number, or pulling a list of numbers you can
use, along with the name/address data as well?





-- 
Stephen Russell
Sr. Production Systems Programmer
Mimeo.com
Memphis TN

901.246-0159


