If all you want is to parse HTML code, then you could treat it as
XML. Of course, that would assume that the sites are all well-formed
XHTML. Other than that, I can only think of PCRE.
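A minimal sketch of the "treat it as XML" idea, written in Python rather than the PHP discussed in the thread. The sample markup, the `more-news` id, and the `extract_links` helper are hypothetical; the point is only that a strict XML parser works on well-formed XHTML and refuses anything else.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML standing in for a fetched news page.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="more-news">
      <ul>
        <li><a href="/2006/a.html">First headline</a></li>
        <li><a href="/2006/b.html">Second headline</a></li>
      </ul>
    </div>
  </body>
</html>"""

# XHTML elements live in this namespace, so tag names must be qualified.
XHTML = "{http://www.w3.org/1999/xhtml}"


def extract_links(xhtml_text):
    """Return (href, text) pairs for every <a> element.

    Raises xml.etree.ElementTree.ParseError if the markup is not
    well-formed XML, which is the main risk of this approach.
    """
    root = ET.fromstring(xhtml_text)
    return [(a.get("href"), (a.text or "").strip())
            for a in root.iter(XHTML + "a")]


links = extract_links(PAGE)
for href, text in links:
    print(href, text)

# Typical tag-soup HTML (an unclosed <br>) is rejected outright.
try:
    extract_links("<p><br></p>")
except ET.ParseError as err:
    print("not well-formed:", err)
```

Since most real sites are not valid XHTML, the `ParseError` branch is the one to expect in practice, so a fallback is needed.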
Boby wrote:
I need to extract news items from several news sites.
In order to do that, I need to parse
Jay Blanchard wrote:
I need to extract news items from several news sites.
...
Can anybody please give me some pointers?
Can you be more specific here? This is awfully broad.
I'll give an example:
Let's say I want to extract some news-items from the www.CNN.com web
page (If you visit
[snip]
Let's say I want to extract some news-items from the www.CNN.com web
page (If you visit CNN's page, you can see the 'MORE NEWS' block at the
right side).
I know how to extract the news-items (or any other data in the page)
using regular expressions, but I wonder if there are other ways.
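One alternative to regular expressions is an event-based HTML parser that tolerates sloppy markup. The sketch below is in Python (not the PHP used elsewhere in the thread) and uses only the standard library. The `more-news` id and the sample page are assumptions for illustration; the real structure of CNN's page is not known here.

```python
from html.parser import HTMLParser


class BlockLinkParser(HTMLParser):
    """Collect (href, text) for links inside <div id="block_id">."""

    def __init__(self, block_id):
        super().__init__()
        self.block_id = block_id
        self.div_depth = 0        # > 0 while inside the target <div>
        self.current_href = None  # set while inside a wanted <a>
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Count nested <div>s so we know when the block really ends.
            if self.div_depth or attrs.get("id") == self.block_id:
                self.div_depth += 1
        elif tag == "a" and self.div_depth and "href" in attrs:
            self.current_href = attrs["href"]
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((self.current_href, text))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


# Hypothetical page with unclosed <p> and <br>, as real HTML often has.
PAGE = """<html><body>
<div id="top"><a href="/home">Home</a></div>
<div id="more-news">
  <p>MORE NEWS<br>
  <div class="item"><a href="/story1">Story one</a></div>
  <div class="item"><a href="/story2">Story <b>two</b></a></div>
</div>
<a href="/about">About</a>
</body></html>"""

parser = BlockLinkParser("more-news")
parser.feed(PAGE)
print(parser.links)
```

Only `<div>` nesting is tracked, so unclosed `<p>` or `<br>` tags do not throw the depth count off. The trade-off is that the block must be delimited by a `<div>` with a stable id.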
On Thu, February 16, 2006 1:20 pm, Boby wrote:
Jay Blanchard wrote:
I need to extract news items from several news sites.
...
Can anybody please give me some pointers?
Can you be more specific here? This is awfully broad.
I'll give an example:
Let's say I want to extract some
What; nobody has anything to say about parsing HTML and doing search and
replaces!! Is there another newsgroup that might be better suited? I do
want to do it in PHP, if I hadn't made that clear.
Somebody, anybody, please help.
Henry [EMAIL PROTECTED] wrote in message
[snip]
What; nobody has anything to say about parsing HTML and doing search and
replaces!! Is there another newsgroup that might be better suited? I do
want to do it in PHP, if I hadn't made that clear.
Somebody, anybody, please help.
[/snip]
What? No one wants to help someone who didn't search the
Thanks Jay, I am still a newbie and I will read the manual. Thank you for the
help.
Having an OK day in the UK.
Henry
Jay Blanchard [EMAIL PROTECTED] wrote in message
news:003f01c27e93$87bc1da0$8102a8c0;000347D72515...
[snip]
What; nobody has anything to say about parsing HTML
The tools for you to execute the regular expression are there for you in
the manual. The actual regular expression that you're looking for is
not a php issue. And I can't say that I'm totally convinced that you're
still not trying to circumvent Google's TOS.
Henry wrote:
What; nobody has
[snip]
Thanks Jay, I am still a newbie and I will read the manual. Thank you for the
help.
Having an OK day in the UK.
[/snip]
Henry, your questions will get answered more quickly and accurately when you
provide:
1. A clear explanation of the problem at hand
2. Proof that you have
I did a search and I remembered that I have previously seen some of your
work, in particular your guide to CMS on evolt.org, which I think is
absolutely wonderful. Thank you for your help and I hope that I haven't gone
too far with my disingenuous comment posting.
I hadn't appreciated the time
Hi Henry,
If it is so simple perhaps you might spend 5
minutes generating the regular expression to
use that will ignore the contents of tags save
for the contents of quotes within meta tags and
do the replace for an associative array of mappings.
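A rough sketch of what such a replacement could look like, in Python rather than PHP (the same PCRE-style patterns should carry over to PHP's `preg_*` functions, but that port is not shown or tested here). The `translate` helper, the sample mapping, and the sample markup are all hypothetical. It splits the page on tags, rewrites keywords only in the text between tags, and additionally rewrites the quoted `content` value of `<meta>` tags.

```python
import re

# Anything from "<" to the next ">" is treated as a tag.
TAG_SPLIT = re.compile(r"(<[^>]*>)")
# The double-quoted value of a content attribute inside a <meta> tag.
META_CONTENT = re.compile(r'(content\s*=\s*")([^"]*)(")', re.IGNORECASE)


def translate(html, mapping):
    """Replace whole-word keys of `mapping` outside tags and in meta content."""
    if not mapping:
        return html
    # Longest keys first, so a longer keyword wins over its own prefix.
    keys = sorted(mapping, key=len, reverse=True)
    word = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

    def swap(text):
        return word.sub(lambda m: mapping[m.group(1)], text)

    out = []
    for piece in TAG_SPLIT.split(html):
        if piece.startswith("<"):
            # Tags pass through untouched, except meta content="...".
            if piece[:5].lower() == "<meta":
                piece = META_CONTENT.sub(
                    lambda m: m.group(1) + swap(m.group(2)) + m.group(3),
                    piece)
            out.append(piece)
        else:
            out.append(swap(piece))
    return "".join(out)


page = ('<head><meta name="description" content="cat and dog news">'
        '<title>cat</title></head>'
        '<body><a href="/cat.html" title="cat">The cat sat</a></body>')
print(translate(page, {"cat": "chat", "dog": "chien"}))
```

This is deliberately naive: it matches case-sensitively, and it will misfire on a `>` inside an attribute value, on comments, and on `<script>` or `<style>` bodies, which is a large part of why the task is harder than it first sounds.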
I assure you that I am not trying to circumvent Google's anything. I'm
trying to provide an HTML translation page tool for some of my visitors where
they will provide their own URL and a translation of some keywords will be
done for them. That's all.
I never actually was going to flout Google's
What?!? You're not awake at 4:30 in the morning writing code?!? I
think the committee will have to reconsider your geek club membership. :)
Jay Blanchard wrote:
[snip]
Thanks Jay, I am still a newbie and I will read the manual. Thank you for the
help.
Having an OK day in the UK.
[snip]
I did a search ...
[/snip]
My apologies Henry, I had just received a piece of disturbing news along
with starting my Monday at 4:30 CST with some database server problems. You
just happened to get in the line of fire.
Start with the regular expression functions in PHP. Once you have an
[snip]
What?!? You're not awake at 4:30 in the morning writing code?!? I
think the committee will have to reconsider your geek club membership. :)
[/snip]
How do you think that I knew the original post came in at that time?
ROFLMAO. Go ahead, revoke my Geek Club card, the discounts no longer
Lee Doolan wrote:
I have written a form screen which has as one of its elements a
textarea box in which a user can input some text --like a simple
bio-- which will appear on another screen. I'd like to edit check
this text. It would be a good idea to make sure that it has, among other
Are you looking at the site the calling script is on? I had the same problem
spidering my site to build a search engine. Using http://localhost/
instead of the site address got it working.
I find that file() works fine on external pages but falls over (in exactly
the same way as yours) when