Re: ITS 2.0, Selectors 4 and Selectors API 2
On Mon, 2013-07-01 at 19:46 -0700, Tab Atkins Jr. wrote:
> [...] If you want Selectors to be able to select attribute nodes, address it directly with a new selector. This should not be smuggled in via the subject indicator.

Maybe it would be simpler to support an XPath() selector?

When you start using ITS you'll find other cases that get difficult with existing CSS selectors, e.g. partShortDescription elements whose id attribute value appears in the list of id attributes in the includes attribute of a partsDiagram element in the same section, and where that diagram has a language=only attribute on the replacementCopies element, and the diagram issue year is earlier than 1996.

This sort of thing is fairly frequently written with XPath selectors today, and is a plausible use case (e.g. the older exploded parts diagram is only available in Spanish and includes Spanish labels for the parts, which readers will have to match up to a table of part numbers, so they need the same text in the diagram and in the table).

A rigorous comparison of XPath with CSS selectors would be worth doing; piecemeal attempts to duplicate functionality don't seem worthwhile to me.

On the other hand I do agree that it sounds like some limitation in CSS selector namespace handling could be alleviated.

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
Re: XPath and find/findAll methods
On Tue, 2011-11-29 at 18:09 +0200, Henri Sivonen wrote:
> On Tue, Nov 29, 2011 at 7:33 AM, Liam R E Quin l...@w3.org wrote:
> > (2) Not a dead end
> [...]

Thanks for responding, Henri.

A detailed reply follows, but the short answer is -

(1) yes, browsers could be using the latest XPath. It would help authors greatly.

(2) yes, there are some issues to resolve. The way to resolve issues is liaison, and working together. We should do more of that.

(3) the example I gave can in fact work in an XPath 2 environment.

(4) Backwards Compatibility Mode is probably rather badly named; just as HTML has legacy content, so does XSLT.

This is why I suggest that, if a findAll() is introduced that can do languages other than CSS Selectors, it allow a version number, with the meaning "I'm expecting at least this version of the language". Then one could write code to fall back if necessary, maybe downloading emulation code in JavaScript if e.g. CSS selectors 5 or XPath 4 or whatever wasn't available.

None of this is a reason not to make the existing XPath in Web browsers easier to use today, though. I felt I needed to post to balance things a bit because the status of XPath seemed unclear to some people on the list.

Slightly more detailed reply...

> Sure, XPath and XSLT keep being developed. What I meant by evolutionary dead end is that the XPath 1.0-compatible evolutionary path has been relegated to a separate mode instead of XPath 2.0 and newer being compatible by design. So the new development you cite happens with Compatibility Mode set to false.

We can't change history; the compatibility mode is about compatibility e.g. with pre-existing XSLT 1 stylesheets. I don't see HTML 5 adding the canvas element to RFC 1866 (HTML 2) either - and wouldn't expect it.
Just as you don't expect people working on Mosaic or Cello to need new features in the HTML 2 spec, we're not adding new features to XPath 1, because we already did that and called it XPath 2 :-)

Rather, the XPath spec is careful to document the differences in behaviour. We should really get rid of calling it compatibility mode and have specific feature tests instead -- it'd be a lot clearer.

The main differences are

* XPath 2 introduces sequences, so you can have sequences of arbitrary values, not just nodelists; the empty sequence is returned in some cases where it makes sense but XPath 1 used NaN or silently failed in some other way; this could be a sequence-available feature.

* XPath 2 introduced named typing, rather like the C Programming Language - e.g. a sockSize can be treated differently from a shoeSize even though they are both numeric. Amongst other things, this allows a saner interpretation of A = B, in the case that A and B are xs:boolean. Although these are factored out into compatibility mode, in fact, they are mostly cases that could never arise in XPath 1, as it didn't have the typing system, so we could probably merge them rather than having a type-system-available feature test.

* a number of error cases now raise errors instead of either failing or doing an obviously wrong thing. E.g. if $n is a nodelist of 3 paragraphs having content 11, 2, and 3 respectively, in XPath 1, $n + 6 gives 17 (only the first node is used), and in XPath 2 it gives an error. But a Web browser could plausibly use the XPath 1 behaviour and also emit a warning in the developer console, I think.

* XPath 2 allows implementations much more freedom in rearranging evaluation order, greatly improving performance.

* there are some other minor changes listed in appendix I of the current XPath 2.0 draft [1]. Most of these result in errors, so a browser could easily allow them and produce helpful warnings. E.g.
1 < $a < 6 will always be true, if $a is numeric, in XPath 1 (it's evaluated left to right and you get 0 or 1 for 1 < $a), and is an error in XPath 2.

* there are some changes to do with DTD and Schema handling that do not affect Web browsers.

Frankly the differences are probably comparable to differences between versions of HTML or editions of CSS -- the later specs get more precise, and in the case of ambiguities or weird corner cases sometimes it means a change. The HTML 5 parsing goal was that all browsers would produce the same DOM for a given document, and since that wasn't true previously, some documents clearly now generate a different DOM.

> I don't have enough data about existing XPath-using Web content to know how badly the Web would break if browsers started interpreting existing XPath (1.x) expressions as XPath 2.x expressions with Compatibility Mode set to false, but the fact that the WG felt that it needed to define a compatibility mode suggests that the WG itself believed the changes to be breaking ones.

They are breaking in a sense - the culture of the XML Activity is to be very detailed and increasingly precise, so yes, there _are_ possible XPath expressions which changed meaning. The answer to this is not to avoid any spec that changes, as that would also mean avoiding
XPath and find/findAll methods
Wearing my XML Activity Lead hat, I want to give some information that may help people decide here. The actual answer isn't my concern, but only that it's based on clear information.

(0) XPath

XPath is a language for selecting from XML (or HTML or SGML) document trees. It is used by some other specs, including XML Schema and XSLT, and it's extended by XQuery.

XPath is very widely used in the XML world, e.g. in servers and on desktops and in shoes :-) I've lost track of the number of implementations of XPath, even though there are just a few dozen major implementations of course.

XPath is popular because it has a regular syntax that's easy to learn and a good fit for XML.

(1) XPath 1, 2 and 3 compatibility

XPath 2 is backwards compatible with XPath 1; there _are_ some very minor differences, most of which would not affect Web browsers at all because they depend on DTDs or Schemas.

Similarly, an XSLT 2 engine will interpret XSLT 1 transformations. There are some exceptions listed in the Backward Compatibility section of the XSLT 2 spec, but they are very minor.

(2) Not a dead end

XSLT 1 and XPath 1 are not evolutionary dead ends, although it's true that neither the xt nor the libxml2 library supports XSLT 2 and XPath 2. There's some support (along with XQuery) in the Qt libraries, and also in C++ with XQilla and Zorba. There are maybe 50 implementations of XPath 2 and/or XQuery 1 that I've encountered.

XQuery 3.0 and XPath 3.0 are about to go to Last Call, we hope, with XSLT 3.0 to follow next year. The work is very much active and alive.

(3) XPath and efficiency

XPath can be implemented very efficiently. In most cases in practice, O(1) or O(log n) can be achieved.

Some of the techniques modern XPath libraries use are also used by Web browsers for CSS selectors - e.g. keep an index of elements, and evaluate from the right-hand end (most specific) or start with whichever element occurs the fewest times.
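
As a rough illustration of the "keep an index and evaluate from the right-hand end" idea, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The helper names build_index and match_child_path are made up for this example, and it handles only absolute paths made of child steps; it is not how any particular XPath library or browser is implemented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(root):
    """Map each tag name to its elements (document order) and each element to its parent."""
    by_tag = defaultdict(list)
    parent = {root: None}
    for elem in root.iter():
        by_tag[elem.tag].append(elem)
        for child in elem:
            parent[child] = elem
    return by_tag, parent


def match_child_path(steps, by_tag, parent):
    """Evaluate an absolute child path, e.g. ['html', 'body', 'div', 'p'], right to left.

    Start from the indexed candidates for the last (most specific) step,
    then walk up the parent chain checking the remaining steps.
    """
    results = []
    for candidate in by_tag.get(steps[-1], []):
        node = candidate
        for tag in reversed(steps):
            if node is None or node.tag != tag:
                break
            node = parent[node]
        else:
            # Every step matched; the path is absolute, so we must have
            # walked past the document element.
            if node is None:
                results.append(candidate)
    return results


# Hypothetical sample document for the sketch.
DOC = (
    "<html><body>"
    "<div><p>a</p><p>b</p></div>"
    "<section><div><p>c</p></div></section>"
    "<p>d</p>"
    "</body></html>"
)

root = ET.fromstring(DOC)
by_tag, parent = build_index(root)
hits = [p.text for p in match_child_path(["html", "body", "div", "p"], by_tag, parent)]
print(hits)  # ['a', 'b']
```

Only the p elements are ever visited as candidates here; elements with other names are never touched once the index exists, which is the point of starting from the most specific step.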
There are implementations of XQuery (an extension of XPath) being used with petabytes of XML data. That is not to say you couldn't also use CSS selectors on petabytes of data -- it's not an either-or or a battle between the two languages.

XQuery response times are generally measured in milliseconds, although, as with SQL or JavaScript or Java or C, you can write infinite loops :-)

The trick is to notice that there are idioms in XPath that can be optimised much more easily than the corresponding JavaScript code. There have been some papers at VLDB on XPath and XQuery optimization. XPath was written with efficiency and optimization in mind, and drew on implementation experience.

(4) XMLness

XPath 2 is actually defined in terms of a data model, and can work over non-XML sources - e.g. not just HTML and XML, but also relational data and anything else that can be represented usefully with a similar data model. It lacks arrays and hashes/maps, which makes JSON support somewhat inconvenient, but there are people working on extending XPath to handle JSON more gracefully (e.g. via JSONiq).

(5) Orthogonality

I think this was mentioned in discussions but may not have been clear. In general, XPath is an expression language - anywhere you can have an expression, you can have any expression, and the expressions all work together. For example, predicates can contain any XPath expression, recursively:

    /html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong

This is all strong elements in p elements that are direct children of div elements that are direct children of the body element, and whose p parent has an id attribute that has the same value as the src attribute of a link element in the head which has rel=me. (This is a microformat-style query on a document, of course.)

XPath selectors give a different way of looking at finding things than CSS selectors and probably appeal in differing amounts to different people.
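
To make the nested-predicate query in (5) concrete, here is a small sketch that spells out the same selection step by step in Python. It uses the standard xml.etree.ElementTree module, which supports only a subset of XPath, so the cross-reference inside the predicate is evaluated by hand; a full XPath engine would accept the expression as written. The sample document and the function name strong_in_me_paragraphs are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for the sketch.
DOC = """<html>
  <head>
    <link rel="me" src="author"/>
    <link rel="stylesheet" src="style"/>
  </head>
  <body>
    <div>
      <p id="author">Written by <strong>Alice</strong></p>
      <p id="other">Not <strong>this one</strong></p>
    </div>
    <p id="author"><strong>Not a child of div</strong></p>
  </body>
</html>"""


def strong_in_me_paragraphs(root):
    """Emulate /html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong."""
    # Inner path: /html/head/link[@rel = 'me']/@src
    me_values = {
        link.get("src")
        for link in root.findall("./head/link[@rel='me']")
        if link.get("src") is not None
    }
    # Outer path: /html/body/div/p[...]/strong. An XPath "=" against a
    # node-set is true if any node in the set has an equal value, which
    # is what the set membership test stands in for.
    result = []
    for p in root.findall("./body/div/p"):
        if p.get("id") in me_values:
            result.extend(p.findall("./strong"))
    return result


root = ET.fromstring(DOC)  # root is the html element
matches = [s.text for s in strong_in_me_paragraphs(root)]
print(matches)  # ['Alice']
```

The hand-written version needs two loops and a temporary set for what XPath expresses as a single predicate, which is the orthogonality point: the inner path is just another expression.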
(6) Note on History

Not really important today, but someone mentioned it, so I'll note that XPath came out of SGML and (later) HyTime work dating back long before the World Wide Web and CSS; that work really ended with the publication of DSSSL and HyTime, but many of the same people were (and in some cases are) involved with XML and XPath. XPath has different goals from CSS selectors, and there's not actually a battle between them.

XSLT and XQuery are widely used on the back end of Web apps, and less often in the browser, but in some environments the browser-based support can be very useful, depending on the division of labour.

(7) XPath Selectors and CSS Selectors

There's a huge overlap of functionality. Some claims were made based on misunderstandings (in both directions probably, but I can only correct the ones about XPath)...

"XPath can't handle things like :hover or :first-line" -- not true. XPath has a mechanism by which a browser could support them, using a functional notation:

    //a[hover()]

It's perfectly standard for an implementation to add functions.