Gordon Stewart wrote:
> Hi,
>
> I know this is possible (& want to do it...) - but I'm wondering if
> anyone has already done this & has a working script?
>
> :- Basically, I want to create a Google doc (that's easy....) - make it
> published - so no login required (that's easy)...
>
> :- I'm wanting a PHP script to check that Google doc every 24 hours
> :- Take away all the HTML codes / codings etc...
> :- So that I'm left with just the bare text of what is entered in the
> doc.....
>
> I've checked on a published test document...
>
> The ONLY word is "test" - & the document (HTML) is 33KB in size...
>
> Lots of divs / HTML / JavaScript etc...
>
> Of course - I'll change it (the document) & test on more words /
> paragraphs etc - to see how they look in HTML...
>
> But - is there a script already out there?
>
> PS - I don't mind blank lines in my output (or 2-3 blank lines per 1
> blank line on the screen...) - as long as it's got a space between
> lines/paragraphs
>
> Thanks...

I'm not familiar with Google Docs, but if it displays as HTML, you can do the following:
Get this function: http://pastebin.com/m41377320

Then do something like this:

<?php
$html = GetPage('http://www.google.com/wherever/my/page/is.html');
// remove script blocks; the "s" modifier lets "." match across newlines
$html = preg_replace('#<script[^>]*>.+?</script>#is', '', $html);
$html = preg_replace("#<br\b[^>]*>#i", "\n", $html); // turn br tags into newlines
$html = preg_replace("#<p\b[^>]*>#i", "\n", $html);  // turn p tags into newlines
$text = strip_tags($html); // strip out all other tags
?>

Now $text has your page text with no HTML in it.

William Piper
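
As a rough sketch, the same steps can be wrapped into one function. The name html_to_text is made up for illustration and is not part of the pastebin code. It additionally drops style blocks, decodes entities such as &amp;, and collapses long runs of blank lines, on the assumption that the published page contains those; I have not checked it against a real Google doc.

```php
<?php
// Hypothetical helper (not from the original post): reduce an HTML string to plain text.
function html_to_text(string $html): string
{
    // Drop script and style blocks with their contents; "s" lets "." span newlines.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Turn <br> tags and opening/closing <p> tags into newlines.
    $html = preg_replace('#<br\b[^>]*>#i', "\n", $html);
    $html = preg_replace('#</?p\b[^>]*>#i', "\n", $html);
    // Strip all remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of three or more newlines into a single blank line.
    $text = preg_replace("/\n{3,}/", "\n\n", $text);
    return trim($text);
}
```

You would pass it whatever GetPage returns, e.g. $text = html_to_text($html);. For the 24-hour check, running the script from a daily cron job is probably the simplest option, assuming your host gives you cron access.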