To be honest, this is a task I'd rather solve in Perl, but that's only
because I'm used to doing things like this in Perl. I honestly have no
idea whether it would be better suited or not.

To get all filenames in a directory you'd also use opendir and readdir
in Perl. It may be possible to get at the data with regular expressions.
If not, take a look at the HTML::TreeBuilder module for parsing. Whenever
regular expressions didn't work out, I've always been able to get the job
done with HTML::TreeBuilder.
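
For the PHP side, here is a minimal sketch of the regular-expression
route, assuming the data sits in plain <tr>/<td> rows as Brandon
describes. The function names extract_rows and extract_rows_from_dir are
made up for illustration, and the patterns will not cope with nested
tables or badly broken markup, so treat it as a starting point rather
than a finished parser:

```php
<?php
// Hypothetical helper: returns each table row of an HTML string
// as an array of trimmed cell texts.
function extract_rows($html)
{
    $rows = array();
    // Match each <tr>...</tr> block; "s" lets "." span newlines,
    // "i" ignores tag case.
    preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/is', $html, $trMatches);
    foreach ($trMatches[1] as $rowHtml) {
        preg_match_all('/<td\b[^>]*>(.*?)<\/td>/is', $rowHtml, $tdMatches);
        $cells = array();
        foreach ($tdMatches[1] as $cellHtml) {
            // Drop nested tags, decode entities, trim whitespace.
            $cells[] = trim(html_entity_decode(strip_tags($cellHtml)));
        }
        // Rows without any <td> (e.g. header rows using <th>) are skipped.
        if (count($cells) > 0) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Hypothetical helper: walks one directory with opendir/readdir and
// returns the extracted rows keyed by filename.
function extract_rows_from_dir($dir)
{
    $result = array();
    $handle = opendir($dir);
    if ($handle === false) {
        return $result;
    }
    while (($name = readdir($handle)) !== false) {
        $path = $dir . '/' . $name;
        // Skip ".", "..", and subdirectories.
        if (!is_file($path)) {
            continue;
        }
        $result[$name] = extract_rows(file_get_contents($path));
    }
    closedir($handle);
    return $result;
}
```

Each row comes back as a plain array of cell strings in column order, so
the insert into the db can loop over the result one row at a time.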

Good luck!

Piet.


> -----Original Message-----
> From: Brandon Smith [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, 12 July 2005 12:27
> To: [email protected]
> Subject: Re: [php-list] data mining
> 
> 
> The easy part will be getting the list of files and opening them. You
> can use opendir, readdir, and fopen for this. You're going to have to
> parse a lot of HTML, so you're going to have to either find a parser
> or write one. You can also use some XML parsing stuff built into PHP,
> but you might have problems if the document is not well formed.
> 
> It looks like the data you want is all in tables, so you would
> probably extract the contents of your td tags. Since the order of the
> columns is the same throughout the table, you can extract each row in
> the same fashion.
> 
> I have written an HTML parser you may use if you'd like. It seems to
> work well in a variety of situations, but I make no guarantees that it
> will work with your data.
> 
> http://sproutworks.com/displaysource.php?filename=htmlparser.php
> 
> There are also plenty of other HTML parsers out there.
> 
> good luck,
> 
> Brandon
> 
> >Hello,
> >Recently I've been given a hell of a task.
> >I have about 17962 files from which I have to take out data and
> >insert it into a db.
> >The files are all formatted the same, but how can I take out the name
> >of each person and the corresponding data? The insert into the DB I
> >know how to do.
> >Basically I have to pull out each row and insert it into the db.
> >Note: all 17962 files are in the same directory and I'll have to
> >go through all of them and parse all existing data.
> >You will get some errors if you try the page because I have not
> >included the JavaScript files.
> >Even if it's lots of work I would appreciate some guidance, or some
> >schematic syntax for cycling through the files and, most important,
> >how I can pull the data out of each file.
> >Thank you for your time Mike,
> >Here's how my files look, I'll give you the first one (p.s. it's
> >kinda long):
> >--snip--
> >  
> >
> 
> -- 
> ---------------------------------
> * Brandon Smith
> * programmer / web designer
> * http://sproutworks.com
> 
> 
> 
> Community email addresses:
>   Post message: [email protected]
>   Subscribe:    [EMAIL PROTECTED]
>   Unsubscribe:  [EMAIL PROTECTED]
>   List owner:   [EMAIL PROTECTED]
> 
> Shortcut URL to this page:
>   http://groups.yahoo.com/group/php-list 
> Yahoo! Groups Links
> 
> 
> 
>  
> 
> 

