Part of the problem is that a lot of the HTML out there on the net is 
dirty and doesn't conform to standards.

It sounds like you are building a screen scraper.  If you only need to 
pull a few values from a page, you might consider using regular 
expressions to find them rather than parsing the whole document.
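
As a rough sketch of what I mean, assuming the tables you want are not 
nested and the value you are looking for is a plain string (the function 
name and the sample HTML here are made up for illustration):

```php
<?php
// Hypothetical helper: returns every non-nested <table>...</table> block
// whose markup contains $needle. This is pattern matching, not a real
// HTML parser, so nested tables will be cut off at the first </table>.
function find_tables_containing(string $html, string $needle): array
{
    // "~" delimiters avoid escaping "/".
    // "i" matches <TABLE> as well as <table>; "s" lets "." span newlines.
    // ".*?" is lazy, so each match stops at the nearest closing tag.
    $pattern = '~<table\b[^>]*>.*?</table>~is';

    if (preg_match_all($pattern, $html, $matches) === false) {
        return []; // regex engine error, e.g. backtrack limit reached
    }

    $found = [];
    foreach ($matches[0] as $table) {
        // Case-insensitive substring check for the value of interest.
        if (stripos($table, $needle) !== false) {
            $found[] = $table;
        }
    }
    return $found;
}

// Made-up sample input standing in for the pasted page source.
$html = '<p>intro</p>'
      . '<table id="a"><tr><td>Widget 42</td></tr></table>'
      . '<TABLE id="b"><tr><td>Other</td></tr></TABLE>';

$tables = find_tables_containing($html, 'Widget 42');
echo count($tables), "\n";
echo $tables[0], "\n";
```

This works on the raw string and never builds a document tree, which is 
where I would expect most of the memory in a DOM-style parser to go.  On 
a 5 MB page you may still need to watch pcre.backtrack_limit if any 
single table is very large.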

-- Walt

Aaron Luman wrote:
> As a matter of introduction I should say that I struggle with PHP.  I  
> do most of my web work in PHP, but probably not how I should be doing  
> it, haha.  Anyway, I'm trying to learn better practices as well as how  
> to use pre-made scripts and tools.
> 
> My current struggle with PHP is that I am trying to figure out a  
> simple way to parse an HTML file.  My hope is that I will be able to  
> copy the source and then submit it via a form using POST.  I found two  
> pre-made scripts that looked promising, but I am not sure how good of  
> an option they are.
> 
> The first - http://sourceforge.net/projects/simplehtmldom/  - looked  
> good but then when I tried using it with the entire file it returned  
> memory usage errors while testing on my local machine.
> 
> The second - http://php-html.sourceforge.net/ - works without error,  
> but takes nearly a minute to parse the full file (nearly 5 MB of text).
> 
> I am concerned that when I get my code working and posted to my host  
> that they will freak out about the heavy workload imposed by parsing  
> the large files.
> 
> The end result of parsing the source is that I would like to be able  
> to find particular values so that they can be filtered out along with  
> their markup (a table of between 1500 and 2000 characters) and  
> reposted in a final, results page, which should be fairly easy once I  
> get the parsing concerns worked out.
> 
> Should I be worried about the memory/processor draw while using these  
> (or similar) parsers?  Do any of you have experience with another  
> parsing tool that is more efficient?  Is there a better way to go  
> about doing this?
> 
> Thanks in advance for any help that you might be able to offer.
> 
> Aaron
> 
> _______________________________________________
> 
> UPHPU mailing list
> [email protected]
> http://uphpu.org/mailman/listinfo/uphpu
> IRC: #uphpu on irc.freenode.net
> 


