To be honest, this is a task I'd rather solve in Perl, but only because I'm used to doing things like this in Perl; I honestly have no idea whether Perl is actually better suited to it.
In Perl you'd also use opendir and readdir to get all the filenames in a directory. It may be possible to get at the data with regular expressions; if not, take a look at the HTML::TreeBuilder module for parsing. Whenever regular expressions didn't work out, I've always been able to get the job done with HTML::TreeBuilder.

Good luck!
Piet.

> -----Original Message-----
> From: Brandon Smith [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 12 July 2005 12:27
> To: [email protected]
> Subject: Re: [php-list] data mining
>
> The easy part will be getting the list of files and opening them. You
> can use opendir, readdir, and fopen for this. You're going to have to
> parse a lot of HTML, so you'll need to either find a parser or write
> one. You can also use the XML parsing functions built into PHP, but
> you might have problems if the documents are not well-formed.
>
> It looks like the data you want is all in tables, so you would probably
> extract the contents of your td tags. Since the order of the columns is
> the same throughout the table, you can extract each row in the same
> fashion.
>
> I have written an HTML parser you may use if you'd like. It seems to
> work well in a variety of situations, but I make no guarantees that it
> will work with your data.
>
> http://sproutworks.com/displaysource.php?filename=htmlparser.php
>
> There are also plenty of other HTML parsers out there.
>
> Good luck,
>
> Brandon
>
> > Hello,
> >
> > Recently I've been given a hell of a task.
> > I have about 17962 files from which I have to extract data and
> > insert it into a DB.
> > The files are all formatted the same way, but how can I extract the
> > name of each person and the corresponding data? The insert into the
> > DB I already know how to do.
> > Basically, I have to pull out each row and insert it into the DB.
> > Note: all 17962 files are in the same directory, and I'll have to
> > go through all of them and parse all the existing data.
> > You will get some errors if you try the page, because I have not
> > included the JavaScript files.
> > Even if it's a lot of work, I would appreciate some guidance, or some
> > schematic syntax for cycling through the files and, most importantly,
> > for pulling the data out of each file.
> > Thank you for your time,
> > Mike
> >
> > Here's how my files look; I'll give you the first one (P.S. it's kind
> > of long):
> > --snip--
>
> --
> ---------------------------------
> * Brandon Smith
> * programmer / web designer
> * http://sproutworks.com
>
> Community email addresses:
> Post message: [email protected]
> Subscribe: [EMAIL PROTECTED]
> Unsubscribe: [EMAIL PROTECTED]
> List owner: [EMAIL PROTECTED]
>
> Shortcut URL to this page:
> http://groups.yahoo.com/group/php-list
> Yahoo! Groups Links
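Neither reply includes code, so here is a minimal sketch of the approach Brandon and Piet describe: list the files in one directory, run each through an HTML parser, and collect the text of every td cell row by row. It is written in Python, with the standard-library `os.listdir` and `html.parser.HTMLParser` standing in for PHP's opendir/readdir and Perl's HTML::TreeBuilder. The names `TableRowParser`, `extract_rows` and `rows_from_directory`, the `.html` suffix and the UTF-8 encoding are all assumptions, since Mike's sample file was snipped.

```python
import os
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by the <tr> row it sits in."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True: "&amp;" arrives as "&"
        self.rows = []              # finished rows, each a list of cell strings
        self._row = None            # cells of the row currently open, if any
        self._cell = None           # text fragments of the cell currently open

    def _close_cell(self):
        # HTML lets authors omit </td>, so a cell is also closed by the next
        # <td>, by </tr>, or by a new <tr>.
        if self._cell is not None:
            # Collapse runs of whitespace (newlines, indentation) to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:               # skip rows that contained no <td> cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag == "td" and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()             # flush any text still buffered by the parser
        self._close_row()           # keep a final row whose </tr> is missing


def extract_rows(html_text):
    """Return the table rows of one HTML document as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def rows_from_directory(directory, suffix=".html", encoding="utf-8"):
    """Yield (filename, rows) for every file in `directory` ending in `suffix`.

    The suffix and encoding are assumptions; the original files were not shown.
    """
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(suffix):
            continue
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # errors="replace" keeps one badly encoded file from stopping the run.
        with open(path, encoding=encoding, errors="replace") as f:
            yield name, extract_rows(f.read())
```

In Mike's case you would loop over `rows_from_directory(...)` with the real directory path and run the INSERT he already knows for each row. Because the sample file was snipped, which column holds the person's name is unknown. Also, an inner `<tr>` closes the outer row here, so pages with nested tables would need extra handling.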
