Re: [PHP-WIN] DOM

Tom Wed, 17 Jan 2001 02:54:52 -0800
At this point I would say RTFM, but I'm all too aware that the PHP function
description runs to about 900 pages, so that may not be too helpful. Seriously
though, if you haven't already, it's well worth downloading the BigManual.html
from the php site, as this is a complete command reference!
The functions that will be useful to do this are:
strstr() (or stristr() for case insensitive), which finds the first occurrence
of a string and returns the rest of the string from that point on
strtr(), which is technically called 'string translate' and lets you
translate a set of characters (or fixed substrings) from one thing to another.
It doesn't do wildcards, so it can turn <table> into "" but not "everything up
to <table>"
substr(), which returns only part of a string - find where the first occurrence
is with strpos() and then use this as the parameter to chop the string.

You may also be interested in strip_tags() which (fairly obviously) strips all
HTML and PHP tags from a string (which could be the whole file if you felt like
it)
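
A minimal sketch of the cut-and-strip idea above, assuming the page is already
sitting in a string. The helper name cut_to_table() and the sample markup are
made up for illustration; stripos(), strpos(), substr(), str_replace() and
strip_tags() are standard PHP string functions (stripos() arrived in PHP 5, so
on older versions strpos() on a lower-cased copy would be the equivalent).

```php
<?php
// Hypothetical helper: return everything after the first <table ...> tag,
// or the string unchanged when no table is found.
function cut_to_table(string $html): string
{
    $start = stripos($html, '<table');   // case-insensitive position, or false
    if ($start === false) {
        return $html;
    }
    $end = strpos($html, '>', $start);   // end of the opening tag
    if ($end === false) {
        return $html;
    }
    return substr($html, $end + 1);
}

// Made-up sample page, only for illustration.
$page = '<html><body><h1>Prices</h1><TABLE border="1">'
      . '<tr><td>ABC</td><td>123.5</td></tr></TABLE></body></html>';

$rest = cut_to_table($page);
// Put a space where each cell ends, then drop every remaining tag.
$text = trim(strip_tags(str_replace('</td>', ' ', $rest)));
echo $text, "\n";
```

Checking the result of stripos() with === false matters here, because a match
at position 0 would otherwise be mistaken for "not found".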

Tom

[EMAIL PROTECTED] wrote:

> That sounds the way to go. Yes the sites I have looked at so far put their
> data into tables. Is there a PHP command that performs a replace all with ""
> until first occurrence of <table> kind of thing? I agree with the DOM statement
> too. I worked out the DOM access via Javascript on IE quite quickly but I've
> had a very brief look at the PHP DOM access libraries and they look like much
> harder work (especially as I'm new to programming).

> Thanks

>
> James
>
> -----Original Message-----
> From: Tom [mailto:[EMAIL PROTECTED]]
> Sent: 16 January 2001 10:27
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PHP-WIN] DOM
>
> James
>
> My guess is that whatever the site, the data will be in a table.
> It is fairly trivial to strip off everything before the beginning of the table
> (replace everything up to and including <TABLE> with "")
> Then replace (say) <tr> with </td> and </tr> with ""
> Finally strip out the end of the file (</table> and onwards)
>
> You will then just have the table data, all separated by </td><td>, which
> should be easy to handle (presumably you want to make a distinction between
> EPIC and price - do this by using the datatype, and there'll probably be a pile
> of formatting to sort out as well, but that should really be pretty trivial!)
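
The steps above could be sketched roughly as follows. This is only one way to
do it, assuming a simple, regular table with no nesting; the helper name
table_rows() and the sample page are invented for the example, and it splits
on the closing row and cell tags rather than doing the literal <tr> to </td>
replacement.

```php
<?php
// Hypothetical helper: turn the first HTML table in $html into an array of
// rows, each row being an array of trimmed cell strings.
function table_rows(string $html): array
{
    $start = stripos($html, '<table');
    $end   = stripos($html, '</table>');
    if ($start === false || $end === false || $end < $start) {
        return [];
    }
    // Keep only the table itself: drop the page before it and after it.
    $table = substr($html, $start, $end - $start);

    $rows = [];
    foreach (preg_split('/<\/tr\s*>/i', $table) as $chunk) {
        // Split the row on closing cell tags; the last piece is whatever
        // follows the final cell, so it is discarded.
        $parts = preg_split('/<\/t[dh]\s*>/i', $chunk);
        array_pop($parts);
        if ($parts === []) {
            continue;   // no cells in this chunk
        }
        $cells = [];
        foreach ($parts as $part) {
            $cells[] = trim(strip_tags($part));
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Made-up sample page, only for illustration.
$page = "<html><body><TABLE>\n"
      . "<tr><th>EPIC</th><th>Price</th></tr>\n"
      . "<tr><td>ABC</td><td>123.5</td></tr>\n"
      . "<tr><td>XYZ</td><td>47</td></tr>\n"
      . "</TABLE></body></html>";

print_r(table_rows($page));
```

Each row comes back as a plain array, so the EPIC and the price can be told
apart by position or by checking is_numeric() on the cell.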
>
> Alternatively, you could spend a considerable amount of time trying to get a
> generic XML parser to work and rebuild a DOM, which would no doubt improve
> your XML skills immeasurably, but you would probably die trying!
>
> Tom
>
> James Duncan wrote:
>
> > Thanks Tom. Yes you have it exactly right. That is the approach I'm
> > currently aiming for! However, as you say this approach is hard-coded to
> > each source website. These websites have a nasty habit of changing their
> > format slightly on a fairly regular basis. I'm also attempting to pull share
> > price information from many different websites at the same time because none
> > provide the full set of data I require, plus some shares (off market
> > particularly) are only provided on dedicated web sites.
> >
> > The reason I'm attempting to access the HTML textual data via the DOM is
> > because I can run a looped search on all the #text fields until I find a
> > match on a company name or EPIC code and then all data on the nested #text
> > elements will be referring to that company. This allows easy data capture
> > and transfer to my database. Another major benefit of this approach is that
> > the same PHP code can be used to search ANY HTML file and recover the
> > required data without source code changes. That's the idea but whether it's
> > actually possible in reality is another matter ;)
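
For what it's worth, the #text search described above might look something
like this with the DOM extension that ships with PHP 5 and later (DOMDocument
and DOMXPath). That extension postdates this thread, so treat it as an
assumption about the environment; the helper name find_row() and the sample
page are invented for the example.

```php
<?php
// Hypothetical helper: find the first table row whose text mentions $needle
// and return the text of every cell in that row.
function find_row(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Loop over every text node inside a table cell, in document order.
    foreach ($xpath->query('//td//text()') as $textNode) {
        if (stripos($textNode->nodeValue, $needle) === false) {
            continue;
        }
        // Climb from the matching text node to its enclosing <tr>.
        $node = $textNode->parentNode;
        while ($node !== null && $node->nodeName !== 'tr') {
            $node = $node->parentNode;
        }
        if ($node === null) {
            continue;
        }
        $cells = [];
        foreach ($xpath->query('td', $node) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        return $cells;
    }
    return [];
}

// Made-up sample page, only for illustration.
$page = '<html><body><table>'
      . '<tr><td><b>Acme Widgets</b></td><td>ACM</td><td>123.5</td></tr>'
      . '<tr><td>Other Co</td><td>OTH</td><td>47</td></tr>'
      . '</table></body></html>';

print_r(find_row($page, 'ACM'));
```

Because the search works on the parsed tree rather than on the raw markup, the
same function can be pointed at differently formatted pages, as long as the
data still sits in table cells.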
> >
> > Thanks
> >
> > James
> >
> > -----Original Message-----
> > From: Tom [mailto:[EMAIL PROTECTED]]
> > Sent: 15 January 2001 10:31
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [PHP-WIN] DOM
> >
> > James
> >
> > If I'm reading your many posts right, then what you are trying to do is pull
> > the share prices from the same site at (say) half hourly intervals, so that
> > you can use them yourself / analyse them or whatever.
> > In this case, I suspect that the format of the page you pull down will ALWAYS
> > BE IDENTICAL, so you actually only have to work out a suitable parser to
> > extract the data once.
> > If I remember rightly from a couple of weeks back, you are using MySQL as the
> > database? In this case, pull the html file down, save it on your server and
> > examine how the html is constructed (it will almost certainly be an ASP / PHP
> > while construct to build a table, all of whose rows will thus be identical
> > apart from the data).
> > Then you can use a command line (run from a PHP script if you like) MySQL
> > LOAD DATA INFILE 'blah.html' INTO TABLE Share_Prices FIELDS TERMINATED BY
> > '</td><td>';
> > type of construct.
> > Note that you will want to strip out the beginning and end of the file first
> > as well. This may sound like a bit of work, but you only have to do it once,
> > as the file format will always be the same (barring the addition of new
> > stocks).
> >
> > Tom
> >
> > James Duncan wrote:
> >
> > > I don't think this will work in my case because I don't control the layout
> > > of the HTML page and hence can't add the hidden fields. I'm downloading the
> > > HTML pages from a website. It would require as much work to insert the
> > > hidden fields as trying to strip the HTML tags in an attempt to read the
> > > data directly from the HTML page itself. There must be a way to access the
> > > DOM directly from PHP? I notice in the manual there is a section regarding
> > > XML DOM but not the DOM itself.
> > >
> > > Are the DOM values only available on the client? If that's the case then PHP
> > > can't be used to read them because it's limited to the server side?
> > >
> > > Thanks
> > >
> > > James
> > >
> > > -----Original Message-----
> > > From: Michael Stearne [mailto:[EMAIL PROTECTED]]
> > > Sent: 13 January 2001 17:06
> > > To: James Duncan
> > > Cc: [EMAIL PROTECTED]
> > > Subject: Re: [PHP-WIN] DOM
> > >
> > > Could you do something like:
> > >
> > > myForm.myField.value=tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeValue;
> > >
> > > Set up a form of hidden fields.  Extract the values from the DOM and then
> > > have the user hit a Submit button to get to the next page.  At that point
> > > the values that were collected and put into the hidden form fields will be
> > > submitted and your next page (the PHP page) could INSERT the values into the
> > > database.
> > >
> > > Michael
> > >
> > > On Friday, January 12, 2001, at 07:30 PM, James Duncan wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I'm still new to HTML, Javascript and PHP but learning (fast hopefully).
> > > > I've just started accessing DOM elements. I have worked out how to update
> > > > the contents of table cells directly using this method, etc. In Javascript
> > > > I would use code like:
> > > >
> > > >   alert("Value is: " + tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeName);
> > > >   alert("Value is: " + tablejames.firstChild.childNodes[1].childNodes[5].firstChild.firstChild.nodeValue);
> > > >
> > > > This Javascript shows the name and value of the child element.
> > > >
> > > > Now I want to use PHP to extract data (values) from HTML pages like I do
> > > > with the above Javascript. Is this possible? Obviously with the Javascript
> > > > the HTML page has already been rendered in the browser (i.e. all tree
> > > > elements have been created). This makes extracting data a simple case of
> > > > finding the "#text" elements and reading in the values. Can I do the same
> > > > thing with PHP and an HTML file I've downloaded from the Internet? Obviously
> > > > this file is sitting on my server and hasn't been rendered in a browser...
> > > >
> > > > The whole point of this exercise is so that I can extract values from an
> > > > HTML table and populate them into a database. Maybe it's easier to process
> > > > the HTML file line by line and strip the unwanted HTML tags? However, with
> > > > this approach I've got to hardcode each webpage...
> > > >
> > > > If this is a silly question then sorry but you only learn if you ask ;)
> > > >
> > > > Thanks
> > > >
> > > > James
> > > >
> > > >
> > > >
> > > > --
> > > > PHP Windows Mailing List (http://www.php.net/)
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > > To contact the list administrators, e-mail: [EMAIL PROTECTED]
> > > >
> > > >
> > > >
> > >

