Gordon Stewart wrote:
> 
> 
> Hi,
> 
> i know this is possible (& want to do it...) - But i'm wondering if
> anyone has already done this & has a working script ?
> 
> :- Basically, I want to create a Google doc (thats easy....) - Make it
> published - so no login required (thats easy)...
> 
> :- I'm wanting a PHP script to check that google doc every 24 hours
> :- Take away all the HTML codes / Codings etc...
> :- So that I'm left with just the bare text of what is entered in the 
> doc.....
> 
> I've checked on a published test document...
> 
> the ONLY word is "test" - & the document (HTML) is 33KB in size...
> 
> Lots of Divs / HTML / Javascript etc...
> 
> Of course - I'll change it (the document) & test on more words /
> paragraphs etc - To see how they look in HTML...
> 
> But - Is there a script already out there ?
> 
> PS - I don't mind blank lines in my output (or 2-3 blank lines per 1
> blank line on the screen...) - as long as it's got a space between
> lines/paragraphs
> 
> Thanks...
> 
I'm not familiar with Google Docs, but if the published document is 
served as HTML, you can do the following:

Get this function: http://pastebin.com/m41377320
Then do something like this:
<?php
$html = GetPage('http://www.google.com/wherever/my/page/is.html');
$html = preg_replace('#<script\b[^>]*>.*?</script>#is','',$html); //remove scripts; the s flag lets . match newlines
$html = preg_replace("#<br\b[^>]*>#i","\n",$html); //turn br tags to newlines
$html = preg_replace("#<p\b[^>]*>#i","\n",$html); //turn p tags to newlines
$text = strip_tags($html); //strip out all other tags
?>
Now $text has your page text with no html in it.
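
If it helps, here is a rough sketch of the same steps wrapped in one function, so they can be tried on a string without fetching anything. The name html_to_text() and the sample markup are made up for illustration (they are not from the pastebin link), and dropping style blocks, decoding entities and collapsing extra blank lines are additions I am assuming would be wanted for a page as heavy as the one described:

```php
<?php
// Hypothetical helper, for illustration only: reduce an HTML string to plain text.
function html_to_text(string $html): string
{
    // Drop script and style blocks, contents included.
    // The "s" flag lets "." span newlines; \1 matches the same tag name on close.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);

    // Turn <br> tags into single newlines.
    $html = preg_replace('#<br\b[^>]*>#i', "\n", $html);

    // Turn closing </p> tags into a blank line so paragraphs stay separated.
    $html = preg_replace('#</p\s*>#i', "\n\n", $html);

    // Strip all remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    // Collapse runs of three or more newlines down to one blank line.
    $text = preg_replace("/\n{3,}/", "\n\n", $text);

    return trim($text);
}

// Made-up sample input standing in for the fetched page.
$sample = "<html><head><script>\nvar x = 1;\n</script></head>"
        . "<body><p>test</p><p>second &amp; last<br>line</p></body></html>";

echo html_to_text($sample), "\n";
```

For the real page you would pass in whatever GetPage() returns instead of $sample. Regex-based stripping like this is only a best effort on messy markup, so check the output against your actual document.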

William Piper
