Re: [PHP] extract data from html

2001-06-30 Thread Hugh Bothwell

>>>   1. Open the html file in read only mode
>>>   2. Start reading the html file till I encounter a <td> tag (I don't know
>>>   how to do this)
>>>   3. Grab that data after the <td> tag (and then what?)
>>
>>  See http://php.net/manual/en/function.fopen.php and
>>  http://php.net/manual/en/function.fgetss.php plus the chapter for
>>  whatever DBMS you want to drop the file contents into.
>
> Thanks.  One thing: just reading the manual without an idea of how the
> function works is of no use.  Some examples would help.  In fact I did use
> fopen, fgets, fgetss, but the problem is that the HTML tag I am
> looking for is <td>.  Now this is easy, but <td width=25%> or
> <td colspan=7> would give a problem.

(grimace) The PHP manual is actually very well written; I can usually find
exactly what I need in under 10 seconds.  I think your complaint just covers
sloppy thinking.

I'd think you should be able to find screen-scraper code around; if not, try
this:

- search for '<td'.  NOTE: use a case-insensitive search!
- search for the first trailing '>'.  Save (this character position + 1).
- search for the first trailing '</td>'.  Again, case-insensitive!
- take everything between the two; strip all HTML tags, add slashes, and
store it.
- advance your file position past the '</td>' (5 characters) and repeat.

I'd give you actual code, but I think you could use some manual practice
(smirk).
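For readers of the archive, the search-and-advance steps above could look roughly like this in PHP. This is a minimal sketch, not code posted in the thread: the function name extract_td_cells() is hypothetical, it scans a string (for example the result of file_get_contents()) instead of a file position, and it leaves escaping to the database layer rather than calling addslashes().

```php
<?php
// Hypothetical helper following the steps above: find '<td', then the next
// '>', then '</td>', keep the text in between, and move on.
function extract_td_cells(string $html): array
{
    $cells = [];
    $pos = 0;
    // Case-insensitive search for the next opening tag, e.g. <TD width=25%>.
    while (($start = stripos($html, '<td', $pos)) !== false) {
        // The first '>' after '<td' closes the opening tag, attributes and all.
        $gt = strpos($html, '>', $start);
        if ($gt === false) {
            break;
        }
        $contentStart = $gt + 1;
        // Case-insensitive search for the closing tag.
        $end = stripos($html, '</td>', $contentStart);
        if ($end === false) {
            break;
        }
        // Strip any markup nested inside the cell (<b>, <font>, ...).
        $cells[] = trim(strip_tags(substr($html, $contentStart, $end - $contentStart)));
        // Skip the 5-character '</td>' and repeat.
        $pos = $end + 5;
    }
    return $cells;
}
```

Called on '<TD width=25%>Arion</td><td colspan=7><b>Fr.1349</b></TD>', it returns ['Arion', 'Fr.1349']: the attributes never matter because the scan only looks for the '>' that ends the opening tag.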



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
To contact the list administrators, e-mail: [EMAIL PROTECTED]




Re: [PHP] extract data from html

2001-06-29 Thread Adrian D'Costa

On Thu, 28 Jun 2001, CC Zona wrote:

> In article [EMAIL PROTECTED],
>  [EMAIL PROTECTED] (Adrian D'Costa) wrote:
>
>>  1.  Open the html file in read only mode
>>  2.  Start reading the html file till I encounter a <td> tag (I don't know
>>  how to do this)
>>  3.  Grab that data after the <td> tag (and then what?)
>
> See http://php.net/manual/en/function.fopen.php and 
> http://php.net/manual/en/function.fgetss.php plus the chapter for 
> whatever DBMS you want to drop the file contents into.

Thanks.  One thing: just reading the manual without an idea of how the
function works is of no use.  Some examples would help.  In fact I did use
fopen, fgets, fgetss, but the problem is that the HTML tag I am
looking for is <td>.  Now this is easy, but <td width=25%> or
<td colspan=7> would give a problem.
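Attributes only cause trouble when the search looks for the literal string '<td>'. A minimal sketch, assuming the Word-generated HTML does not nest tables: a PCRE pattern whose [^>]* part absorbs whatever attributes the opening tag carries. The function name td_contents() is hypothetical.

```php
<?php
// Hypothetical helper: match <td>, <td width=25%>, <TD colspan=7>, ... alike.
function td_contents(string $html): array
{
    // \b stops "<td" from matching a longer tag name, [^>]* swallows the
    // attributes, i makes the match case-insensitive, and s lets a cell's
    // content span several lines. (.*?) is non-greedy, so each match ends
    // at the nearest </td>.
    preg_match_all('#<td\b[^>]*>(.*?)</td>#is', $html, $matches);
    // $matches[1] holds the captured text of every cell, in document order.
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $matches[1]);
}
```

A regex like this is fine for regular, machine-generated tables; for nested tables a real parser (such as PHP's DOM extension) is the safer choice.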

Adrian






Re: [PHP] extract data from html

2001-06-29 Thread CC Zona

In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Adrian D'Costa) wrote:

>>>  1.   Open the html file in read only mode
>>>  2.   Start reading the html file till I encounter a <td> tag (I don't know
>>>  how to do this)
>>>  3.   Grab that data after the <td> tag (and then what?)
>>
>> See http://php.net/manual/en/function.fopen.php and 
>> http://php.net/manual/en/function.fgetss.php plus the chapter for 
>> whatever DBMS you want to drop the file contents into.
>
> Thanks.  One thing: just reading the manual without an idea of how the
> function works is of no use.  

It should be exactly of use.  Explaining how a function works, and how to 
use it, is the point of a manual.  If there are finer points that you need 
clarification on *after reading the manual entry*, that's understandable.  
But for the most part the manual is clear and accessible, even to a newbie.  
I learned PHP, as a complete novice to that language and programming in 
general, by simply reading the manual through.

> Some examples would help.  

..which is why most manual entries include 1-2 official examples, plus 
several others in the user annotations.  If you're not reading the 
annotated version of the manual (as in the links provided), then you're 
only scratching the surface of what the manual has to offer.

> In fact I did use
> fopen, fgets, fgetss, but the problem is that the HTML tag I am
> looking for is <td>.  Now this is easy, but <td width=25%> or
> <td colspan=7> would give a problem.

See http://php.net/manual/en/function.strip_tags.php (as is noted in the 
see also note and user annotation for fgetss, BTW; did you bother to read 
the links provided before proclaiming them of no use?)
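To illustrate the strip_tags() point: it removes a tag together with its attributes, so <td width=25%> disappears just like a bare <td>. The one thing it cannot do is keep cells apart, so this minimal sketch first turns each closing </td> into a tab. The helper name row_to_fields() and the tab delimiter are assumptions for illustration (a cell that itself contains a tab would be split).

```php
<?php
// Hypothetical helper: split one table row into its cell texts.
function row_to_fields(string $rowHtml): array
{
    // Mark every cell boundary (any case) with a tab before stripping tags.
    $marked = str_ireplace('</td>', "\t", $rowHtml);
    // Remove all remaining tags; attributes such as colspan=7 go with them.
    $text = strip_tags($marked);
    $parts = explode("\t", $text);
    // Whatever follows the last </td> (e.g. a stripped </tr>) is not a cell.
    array_pop($parts);
    return array_map('trim', $parts);
}
```

For '<tr><td width=25%>Arion</td><td colspan=7>Fr.1349</td></tr>' this yields ['Arion', 'Fr.1349'], and empty cells stay in place as '' so the columns keep their positions.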

-- 
CC





[PHP] extract data from html

2001-06-28 Thread Adrian D'Costa

Hi,

I keep receiving a lot of Word documents whose data I need to extract and put
into a MySQL table.  As of now, I cut and paste manually into an HTML form
that submits to a PHP script, which dumps it into the MySQL table.

Since the format is usually the same, I am sure there
should be another way to do this.  I converted the Word doc into an HTML
file.  Now I need to write a PHP script to open that
file and read through its contents.  I am not that experienced in PHP but can
understand the basics.  The steps that I would take are:

1.  Open the html file in read only mode
2.  Start reading the html file till I encounter a <td> tag (I don't know
how to do this)
3.  Grab that data after the <td> tag (and then what?)
4.  Repeat this process till I encounter the </table> tag.

If anyone could give me the steps in actual code, or suggest any other
alternative, I would be happy.
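The four steps above map onto PHP's file functions fairly directly. A minimal sketch, not taken from the thread: read_first_table() is a hypothetical name, the file is read line by line into a buffer because a Word-generated cell may span several lines, and cells are matched with a regex on the assumption that tables are not nested.

```php
<?php
// Hypothetical helper: return the cell texts of the first table in a file.
function read_first_table(string $path): array
{
    $fp = fopen($path, 'r');                 // step 1: read-only mode
    if ($fp === false) {
        return [];
    }
    $buffer = '';
    while (($line = fgets($fp)) !== false) {
        $buffer .= $line;                    // cells may span several lines
        $end = stripos($buffer, '</table>');
        if ($end !== false) {                // step 4: stop at </table>
            $buffer = substr($buffer, 0, $end);
            break;
        }
    }
    fclose($fp);
    // Steps 2 and 3: find each <td ...> and grab the text up to </td>.
    preg_match_all('#<td\b[^>]*>(.*?)</td>#is', $buffer, $m);
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $m[1]);
}
```

Each returned string could then be inserted into MySQL instead of being pasted into the form by hand.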

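A sample of the Word doc (too big to attach):

+-------------------------------------------+
| COS  Zurich                               |
+------+------+-----+-------------+---------+
| June | July | Aug | Hotel       | Price   |
+------+------+-----+-------------+---------+
|  28  |  2   |  8  | Arion       | Fr.1349 |
|      |      |     | Inc Village | Fr.1099 |
+------+------+-----+-------------+---------+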

This is a table.  Each table belongs to a different city.  Each row has
more than 2 hotels.
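Because each table belongs to one city, a parser that walks table by table fits this layout better than a flat list of cells. A minimal sketch using PHP's DOM extension, assuming (from the sample above) that the first row of each table holds the city name, the second the column headings, and the rest the hotel rows; parse_city_tables() is a hypothetical name.

```php
<?php
// Hypothetical helper: map each city name to the data rows of its table.
function parse_city_tables(string $html): array
{
    $doc = new DOMDocument();
    // Word's HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $cities = [];
    foreach ($doc->getElementsByTagName('table') as $table) {
        $rows = [];
        foreach ($table->getElementsByTagName('tr') as $tr) {
            $cells = [];
            foreach ($tr->getElementsByTagName('td') as $td) {
                $cells[] = trim($td->textContent);  // text only, tags dropped
            }
            $rows[] = $cells;
        }
        if (count($rows) < 2) {
            continue;                               // not a city table
        }
        $city = implode(' ', $rows[0]);             // e.g. "COS Zurich"
        // Skip the heading row; keep every data row under the city name.
        $cities[$city] = array_slice($rows, 2);
    }
    return $cities;
}
```

Keying the result by city keeps the per-table grouping, so each hotel row can be stored together with the city it came from.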

What would be the best solution?

TIA

Adrian






Re: [PHP] extract data from html

2001-06-28 Thread CC Zona

In article [EMAIL PROTECTED],
 [EMAIL PROTECTED] (Adrian D'Costa) wrote:

> 1. Open the html file in read only mode
> 2. Start reading the html file till I encounter a <td> tag (I don't know
> how to do this)
> 3. Grab that data after the <td> tag (and then what?)

See http://php.net/manual/en/function.fopen.php and 
http://php.net/manual/en/function.fgetss.php plus the chapter for 
whatever DBMS you want to drop the file contents into.
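For the database part, a prepared statement quotes each value for you, which avoids escaping with addslashes() by hand. A minimal sketch using PDO; the function store_hotels(), the table `hotels`, its columns, and the two-element row format are all hypothetical and would need to match your own schema.

```php
<?php
// Hypothetical helper: insert [hotel, price] rows for one city.
function store_hotels(PDO $pdo, string $city, array $rows): int
{
    // Placeholders are bound per execute(), so quotes in hotel names are safe.
    $stmt = $pdo->prepare(
        'INSERT INTO hotels (city, hotel, price) VALUES (:city, :hotel, :price)'
    );
    $count = 0;
    foreach ($rows as $row) {
        // Assumes each row is [hotel name, price], e.g. ['Arion', 'Fr.1349'].
        $stmt->execute([':city' => $city, ':hotel' => $row[0], ':price' => $row[1]]);
        $count++;
    }
    return $count;
}

// With MySQL the connection would look something like this (placeholder
// host, database name, and credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=travel', 'user', 'password');
```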

-- 
CC
