Hi Greg,

That's pretty much how we had things mapped out. Basic source validation in the beginning, then a more standard validation later on. I guess my question was - is there any open source project that would help with this work flow, such as handling exceptions in the case of bad data, etc. etc.?

-Matt

On Tue, 29 Dec 2009, Greg Rundlett (freephile) wrote:

There are several issues all bundled up here.

First you have to define the file format(s) that your feeds come in so
that you can read them and isolate the data elements correctly.  Are
they CSV, TSV, XML, custom?
At the first (read) step you can do gross validation like validate the
source for correct character encoding, record separators, record
length etc.
You also want to define "layouts" for various feeds.  By defining the
layout, you can apply a set of processing instructions to transform
the received layout into the desired layout.
Once you have the data in the desired layout, you can then do finer
validation such as per field rules.  E.g. last name must
/(a-zA-Z'-){2,100}/
Then you can write an import routine that determines the correct
layout transformation to apply, reads the source, tranposes it,
validates it, stores it in the database, logs the event

hth,

Greg Rundlett

nbpt 978-225-8302
m. 978-764-4424
-skype/aim/irc/twitter freephile
http://profiles.aim.com/freephile




On Tue, Dec 29, 2009 at 12:28 PM, Matt Juszczak <m...@atopia.net> wrote:
Hi all,

I need to write a tool that somehow takes multiple data feeds (some xls
files, some csv files, etc.), defines each format,
verifies/validates/sanitizes the data, and imports it into standardized
MySQL tables.

As an example, one feed might look like this:

First Name, Last Name, DOB

And another like this:

Name, DOB

And another:

DOB, Last Name, First Name, Middle Initial

I need to somehow be able to say "For this source, this column needs to be
split into two, and this other column is the date of birth column".  Once
the data is parsed, it would need to be imported into standard MySQL tables:

first_name, last_name, birth_date, etc.

This is a simple example - the files and tables actually have 20-30 columns
each, so using mysqlimport or a simple shell script probably isn't the best
option.

Are there open source tools out there that do this already?  If not, should
I just build something?

Thanks,

-Matt
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation

_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation
_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation

Reply via email to