jdalton wrote:
> Like I said, I am rather new to XPath (mainly because it's not cross
> browser at the moment) and CSS3 syntax.

CSS3 falls under the same "not cross browser" umbrella given its current
status (both in standardization and in implementation).

> I recognize the power it holds and it has piqued my curiosity.
> Do you know of any resources for finding good usage examples?

Whatever links we give will come across as "advertising", so let's stick
to the Wikipedia article [1], as it provides a lot of good pointers.
Besides, it looks like Google still works OK for this query [2] ;-)

[1] http://en.wikipedia.org/wiki/Xpath
[2] http://www.google.com/search?q=xpath%20tutorial

> P.S. Here is a regexp I use for extracting html elements from text in
> php. The regexp can be used in js too. It's very thorough and allows for
> tags like <div title='next >' id="nav">test</div>

That ">" char inside @title should have been encoded (as "&gt;"), but you
already knew that; the same goes for things like
<foo onclick="x = y > 2 ? a : b"> or <a href="...?foo=a&bar=b"> (where the
"&" should be "&amp;"), etc.

> $selfClosing = preg_match('/(br|hr|img|input|link|meta|param)/i', $tagName);

You missed 'base', 'area', 'col' (maybe more, depending on which standard
you are referring to, like 'frame'?) ;-)

> if($selfClosing){
>   //very thorough regular expression
>   $pattern = '%<'.$tagName.'(?:(?:(?:\s|\n)+\w+(?:(?:\s|\n)*=(?:\s|\n)*(?:".*?"|\'.*?\'|[^\'">\s]+))?)+(?:\s|\n)*|(?:\s|\n)*)/>%i';

I wouldn't really call this "thorough", but OK, if it serves your
purposes, then use it. If you allow me some comments (OK, ignoring the
ugly "new lines" workaround), I'd say:

- the approach you chose for matching the attribute list results in more
  alternation than needed:
      TAG ( ATTRIBUTE+ BLANK* | BLANK* )
  you could reduce it to:
      TAG ATTRIBUTE* BLANK*
- attribute names may be more than "\w+" (e.g. 'http-equiv'). If you want
  to play nicely with at least HTML4, allow at least '-', but extending
  it to the XML Name/Nmtoken rules would be even better ;-)
- be more economical when matching attribute values by using negated
  character classes, e.g. ".*?" -> "[^"]*"
- note that browsers can handle even aberrations like "<foo bar= >";
  making the attribute value optional (even when "=" is present) would
  cover that

> }
> else{
>   $pattern = '%<'.$tagName.'(?:(?:(?:\s|\n)+\w+(?:(?:\s|\n)*=(?:\s|\n)*(?:".*?"|\'.*?\'|[^\'">\s]+))?)+(?:\s|\n)*|(?:\s|\n)*)>(?:(?:.|\n)*?)</'.$tagName.'>%';

OK, I'm sure you know this one may fail badly, but anyway, at least allow
whitespace in the end tag, after the tagName. ;-)

> }

Cheers, and my apologies for this off-topic message (is "Saturday night"
a good excuse?) ;-)

-- 
Marius Feraru
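
For illustration, the regexp comments above (TAG ATTRIBUTE* BLANK*, '-' in
attribute names, negated character classes, optional attribute values,
whitespace in the end tag) could be combined roughly as follows. This is
only a sketch in JavaScript, since the thread notes the regexp can be used
in js too. The names VOID_TAGS and buildTagPattern are hypothetical, the
name pattern is a loose approximation of the XML rules, tagName is assumed
to be a plain alphanumeric name, and the "i" flag is applied to both
patterns (the quoted code only used it on the first one).

```javascript
// Hypothetical helper names; not taken from the original message.

// Empty elements named in the thread. Exact set membership is used, so a
// tag such as "abbr" is not mistaken for "br".
const VOID_TAGS = new Set([
  "br", "hr", "img", "input", "link", "meta", "param",
  "base", "area", "col", "frame",
]);

const BLANK = "\\s"; // \s already matches newlines
// Assumption: a loose approximation of XML names ('-', ':' and '.' allowed).
const NAME = "[\\w:.-]+";
// Negated character classes instead of lazy ".*?" for quoted values.
const VALUE = '(?:"[^"]*"|\'[^\']*\'|[^\'">\\s]+)';
// The value stays optional even after "=", to tolerate <foo bar= >.
const ATTRIBUTE = `${BLANK}+${NAME}(?:${BLANK}*=${BLANK}*${VALUE}?)?`;

function buildTagPattern(tagName) {
  // TAG ATTRIBUTE* BLANK*
  const open = `<${tagName}(?:${ATTRIBUTE})*${BLANK}*`;
  if (VOID_TAGS.has(tagName.toLowerCase())) {
    // Like the quoted code, this only accepts the "/>" form.
    return new RegExp(`${open}/>`, "i");
  }
  // Still breaks on nested elements of the same name, as noted above;
  // whitespace is allowed after the tag name in the end tag.
  return new RegExp(`${open}>[\\s\\S]*?</${tagName}${BLANK}*>`, "i");
}

// Usage
const meta = buildTagPattern("meta");
console.log(meta.test('<meta http-equiv="refresh" content="5"/>'));

const div = buildTagPattern("div");
console.log(div.test('<div title=\'next >\' id="nav">test</div>'));
```

A real HTML parser remains the safer choice whenever the input is not
under your control.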
