-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

jdalton wrote:
> Like I said I am rather new to the xpath (mainly because its not cross
> browser at the moment) and css3 syntax.
CSS3 falls under the same "not cross browser" umbrella given its current
status (both in standardization and in implementation).

> I recognize the power it holds and it has peeked my curiosity.
> Do you know of any resources for finding good usage examples?
Whatever links we give would all count as "advertising", so let's stick
to the Wikipedia article [1], as it provides a lot of good pointers.
Besides, it looks like Google still works OK for this query [2] ;-)

[1] http://en.wikipedia.org/wiki/Xpath
[2] http://www.google.com/search?q=xpath%20tutorial

> P.S. Here is a regexp I use for extracting html elements from text in 
> php. The regexp can be used in js too. Its very thorough and allows for 
> tags like <div title='next >' id="nav">test</div>
That ">" char inside @title should have been encoded (as "&gt;"), but you
already knew that; same goes for things like <foo onclick="x = y > 2 ? a : b">
or <a href="...?foo=a&bar=b"> (where "&" should be "&amp;"), etc.
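
To make the encoding point concrete, here is a minimal PHP sketch. The
variable names and the bare "?foo=a&bar=b" query string are only
illustrations of mine; htmlspecialchars() is the stock PHP function.

```php
<?php
// Illustrative attribute values, modelled on the examples above.
$title = 'next >';
$href  = '?foo=a&bar=b';

// htmlspecialchars() encodes &, < and >; ENT_QUOTES adds both quote types.
$html = '<div title="' . htmlspecialchars($title, ENT_QUOTES)
      . '" id="nav">test</div>';
$link = '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">next</a>';

echo $html, "\n";  // <div title="next &gt;" id="nav">test</div>
echo $link, "\n";  // <a href="?foo=a&amp;bar=b">next</a>
```

With the values encoded like this, a plain [^>]* would be enough to skip
over the attributes, which is why the encoding matters for your regexp.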

> $selfClosing = preg_match('/(br|hr|img|input|link|meta|param)/i',
> $tagName);
You missed 'base', 'area', 'col' (maybe more, depending on which standard
you are referring to - like 'frame'?) ;-)
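
A rough sketch of a fuller check, assuming HTML 4.01 is the standard you
care about (its empty elements are listed below). The helper name
isSelfClosing() is made up, and the "^...$" anchoring is my own addition:
unanchored, your pattern would also accept 'abbr', since it contains 'br'.

```php
<?php
// Empty elements in HTML 4.01 (including the Transitional/Frameset ones).
$emptyTags = array('area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
                   'img', 'input', 'isindex', 'link', 'meta', 'param');

// Hypothetical helper: true only when $tagName is exactly one of the
// names above, compared case-insensitively.
function isSelfClosing($tagName, $emptyTags) {
    $pattern = '/^(?:' . implode('|', $emptyTags) . ')$/i';
    return preg_match($pattern, $tagName) === 1;
}

var_dump(isSelfClosing('BR', $emptyTags));    // bool(true)
var_dump(isSelfClosing('abbr', $emptyTags));  // bool(false)
```

A plain in_array(strtolower($tagName), $emptyTags) would do the same job
without any regexp at all.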

> if($selfClosing){
>   //very thorough regular expression
>   $pattern = '%<'.$tagName.'(?:(?:(?:\s|\n)+\w+(?:(?:\s|\n)*=(?:\s|
> \n)*(?:".*?"|\'.*?\'|[^\'">\s]+))?)+(?:\s|\n)*|(?:\s|\n)*)/>%i';

I wouldn't really call this "thorough", but OK, if it serves your purposes,
then use it. If you allow me some comments (ignoring the ugly "new lines"
workaround, which is redundant since "\s" already matches "\n"), I'd say:

- - the approach you chose for matching the attribute list results in more
alternation than needed: TAG ( ATTRIBUTE+ BLANK* | BLANK* )
you could reduce it to: TAG ATTRIBUTE* BLANK*

- - attribute names may be more than "\w+" (e.g. 'http-equiv'). If you want
to play friendly with at least HTML4, allow at least '-', but extending it
to XML Name/Nmtoken rules would be even better ;-)

- - be more economical when matching attribute values by using negated
character classes, e.g. ".*?" -> "[^"]*"

- - note that browsers can handle even aberrations like "<foo bar= >";
making the attribute value optional (even when "=" is present) would
cover that
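
Putting the four comments above together, the empty-tag pattern could look
roughly like this. It is only a sketch: emptyTagPattern() and the $name,
$value and $attr pieces are names I made up, and the name class is a crude
approximation, not the full XML Name rules.

```php
<?php
// Hypothetical helper building the pattern as: TAG ATTRIBUTE* BLANK* "/>"
function emptyTagPattern($tagName) {
    $name  = '[\w:.-]+';                            // allows 'http-equiv'
    $value = '(?:"[^"]*"|\'[^\']*\'|[^\'">\s]+)';   // negated classes
    // The value stays optional even after "=", to tolerate <foo bar= />.
    $attr  = '\s+' . $name . '(?:\s*=\s*' . $value . '?)?';
    return '%<' . preg_quote($tagName, '%') . '(?:' . $attr . ')*\s*/>%i';
}

$html = "<p><img src=\"a.png\" alt='next >' /></p>";
preg_match(emptyTagPattern('img'), $html, $m);
echo $m[0], "\n";  // <img src="a.png" alt='next >' />
```

Since "\s" already covers "\n", the "(?:\s|\n)" groups are gone as well.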

> }
> else{
>   $pattern = '%<'.$tagName.'(?:(?:(?:\s|\n)+\w+(?:(?:\s|\n)*=(?:\s|
> \n)*(?:".*?"|\'.*?\'|[^\'">\s]+))?)+(?:\s|\n)*|(?:\s|\n)*)>(?:(?:.|
> \n)*?)</'.$tagName.'>%';

OK, I'm sure you know this one may fail badly (for instance on nested
elements with the same tag name), but anyway, at least allow whitespace in
the end tag, after the tagName. ;-)
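
A small sketch of both points, assuming 'div' as the tag name. The $attr
piece and the sample strings are mine; the "s" modifier lets "." match new
lines (so no "(?:.|\n)" is needed) and "i" is there because HTML tag names
are case-insensitive.

```php
<?php
$tagName = 'div';

// Attribute part: name, then an optional "=" with an optional value.
$attr    = '\s+[\w:.-]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\'">\s]+)?)?';
// "\s*" before the final ">" accepts end tags such as "</div >".
$pattern = '%<' . $tagName . '(?:' . $attr . ')*\s*>.*?</'
         . $tagName . '\s*>%is';

preg_match($pattern, "<div title='next >' id=\"nav\">test</div >", $ok);
echo $ok[0], "\n";   // <div title='next >' id="nav">test</div >

// The "fail badly" case: the lazy ".*?" stops at the first end tag.
preg_match($pattern, '<div><div>a</div>b</div>', $bad);
echo $bad[0], "\n";  // <div><div>a</div>
```

The second match shows why a single regexp cannot really pair start and
end tags once elements of the same name are nested.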

> }

cheers

and my apologies for this off-topic message
(is "Saturday night" a good excuse?) ;-)
- --
Marius Feraru
-----BEGIN PGP SIGNATURE-----

iD8DBQFGPQQntZHp/AYZiNkRAix0AJ9MGhh1DX2lHvX6a7RhTJRntDhkxQCfZbOP
JwlQQvGoofJ/B2Ndz4AQbhw=
=W7Av
-----END PGP SIGNATURE-----
