Part of the problem is that there is a lot of dirty, non-standards-conforming HTML out there on the net.
It sounds like you are building a screen scraper. If you only want to parse a few values from a page, you might consider using regular expressions to find them.

--
Walt

Aaron Luman wrote:
> As a matter of introduction I should say that I struggle with PHP. I
> do most of my web work in PHP, but probably not how I should be doing
> it, haha. Anyway, I'm trying to learn better practices as well as how
> to use pre-made scripts and tools.
>
> My current struggle with PHP is that I am trying to figure out a
> simple way to parse an HTML file. My hope is that I will be able to
> copy the source and then submit it via a form using POST. I found two
> pre-made scripts that looked promising, but I am not sure how good an
> option they are.
>
> The first - http://sourceforge.net/projects/simplehtmldom/ - looked
> good, but when I tried using it with the entire file it returned
> memory usage errors while testing on my local machine.
>
> The second - http://php-html.sourceforge.net/ - works without error,
> but takes nearly a minute to parse the full file (nearly 5 MB of text).
>
> I am concerned that when I get my code working and posted to my host,
> they will freak out about the heavy workload imposed by parsing
> the large files.
>
> The end result of parsing the source is that I would like to be able
> to find particular values so that they can be filtered out along with
> their markup (a table of between 1500 and 2000 characters) and
> reposted in a final results page, which should be fairly easy once I
> get the parsing concerns worked out.
>
> Should I be worried about the memory/processor draw while using these
> (or similar) parsers? Do any of you have experience with another
> parsing tool that is more efficient? Is there a better way to go
> about doing this?
>
> Thanks in advance for any help that you might be able to offer.
>
> Aaron
>
> _______________________________________________
> UPHPU mailing list
> [email protected]
> http://uphpu.org/mailman/listinfo/uphpu
> IRC: #uphpu on irc.freenode.net
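A minimal sketch of the regular-expression approach suggested above, assuming the tables you want are not nested inside other tables and that the value you are filtering on appears literally in the table's markup. The function name `find_tables_containing` and the sample markup are hypothetical. Because it only scans the raw string, it does not care whether the rest of the page is valid HTML, and it never builds a DOM tree for the whole 5 MB file (the source string itself is still held in memory).

```php
<?php
// Sketch only: regex-based extraction of <table> blocks, assuming the
// target tables are NOT nested. The function name and sample markup are
// hypothetical, for illustration.

/**
 * Return every <table>...</table> block in $html whose markup contains
 * $needle (case-insensitive).
 */
function find_tables_containing(string $html, string $needle): array
{
    // \b keeps "<table" from matching e.g. "<tablefoo"; the lazy .*? stops
    // at the first closing tag, which is why nested tables are unsupported.
    // i = case-insensitive tag names, s = let "." match newlines.
    // preg_match_all returns the match count, or false on error
    // (e.g. a PCRE limit was hit); both 0 and false yield an empty result.
    if (!preg_match_all('#<table\b.*?</table>#is', $html, $matches)) {
        return [];
    }

    $found = [];
    foreach ($matches[0] as $table) {
        // Keep only the tables that mention the value we care about.
        if (stripos($table, $needle) !== false) {
            $found[] = $table;
        }
    }
    return $found;
}

// Usage: in practice $html would be the source submitted via the form.
$html = <<<HTML
<p>intro</p>
<table class="a"><tr><td>Widget</td><td>4.99</td></tr></table>
<TABLE><tr><td>Gadget</td><td>9.99</td></tr></TABLE>
HTML;

foreach (find_tables_containing($html, 'gadget') as $table) {
    // prints: <TABLE><tr><td>Gadget</td><td>9.99</td></tr></TABLE>
    echo $table, "\n";
}
```

The trade-off is robustness: a pattern like this will break on nested tables or unusual markup, so it only makes sense when the pages you scrape have a predictable structure.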
