Re: ITS 2.0, Selectors 4 and Selectors API 2

2013-07-01 Thread Liam R E Quin
On Mon, 2013-07-01 at 19:46 -0700, Tab Atkins Jr. wrote:
[...]
> If you want Selectors to be able to select attribute nodes, address it
> directly with a new selector.  This should not be smuggled in via the
> subject indicator.

Maybe it would be simpler to support an XPath() selector?

When you start using ITS you'll find other cases that get difficult with
existing CSS selectors, e.g.
  * partShortDescription elements whose id attribute value appears in
the list of ids in the includes attribute of a partsDiagram element in
the same section, where that diagram has a language=only attribute on
the replacementCopies element and the diagram's issue year is earlier
than 1996.

This sort of thing is fairly frequently written with XPath selectors
today, and is a plausible use case (e.g. the older exploded parts
diagram is only available in Spanish and includes Spanish labels for
the parts, which readers will have to match up to a table of part
numbers, so they need the same text in the diagram and in the table).

A rigorous comparison of XPath with CSS selectors would be worth doing;
piecemeal attempts to duplicate functionality don't seem worthwhile to
me. On the other hand I do agree that it sounds like some limitation in
CSS selector namespace handling could be alleviated.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml




Re: XPath and find/findAll methods

2011-11-30 Thread Liam R E Quin
On Tue, 2011-11-29 at 18:09 +0200, Henri Sivonen wrote:
> On Tue, Nov 29, 2011 at 7:33 AM, Liam R E Quin l...@w3.org wrote:
> > (2) Not a dead end
[...]

Thanks for responding, Henri.

A detailed reply follows, but the short answer is -
(1) yes, browsers could be using the latest XPath. It would help authors
greatly.
(2) yes, there are some issues to resolve. The way to resolve issues is
liaison, and working together.  We should do more of that.
(3) the example I gave can in fact work in an XPath 2 environment.
(4) Backwards Compatibility Mode is probably rather badly named; just as
HTML has legacy content, so does XSLT.

This is why I suggest that, if a findAll() is introduced that can do
languages other than CSS Selectors, it allow a version number, with the
meaning "I'm expecting at least this version of the language". Then one
could write code to fall back if necessary, maybe downloading emulation
code in JavaScript if e.g. CSS Selectors 5 or XPath 4 or whatever wasn't
available.
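
A minimal sketch of that version-negotiation idea, written in Python
only for illustration. Nothing here is a real or proposed browser API:
the engine registry, the find_all() signature, the min_version
parameter and the UnsupportedVersion error are all assumptions made up
for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: an "engine" takes an expression and returns matches.
Engine = Callable[[str], List[str]]


class UnsupportedVersion(Exception):
    """Raised when no engine satisfies the requested minimum version."""


def make_find_all(engines: Dict[str, List[Tuple[int, Engine]]]):
    """Build a find_all() over a registry of {language: [(version, engine)]}."""

    def find_all(expr: str, language: str = "css", min_version: int = 1) -> List[str]:
        # Keep only engines at or above the version the caller expects.
        candidates = [(v, e) for v, e in engines.get(language, []) if v >= min_version]
        if not candidates:
            raise UnsupportedVersion(f"{language} >= {min_version} not available")
        # Use the newest engine that satisfies the request.
        _version, engine = max(candidates, key=lambda pair: pair[0])
        return engine(expr)

    return find_all


def find_all_with_fallback(find_all, expr, language, min_version, fallback: Engine):
    """Try the native engine; on a version mismatch use the caller's emulation."""
    try:
        return find_all(expr, language, min_version)
    except UnsupportedVersion:
        return fallback(expr)
```

The point of the design is that the caller states the minimum version
it needs, and a missing version is a detectable condition rather than a
silent difference in behaviour.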

None of this is a reason not to make the existing XPath in Web browsers
easier to use today, though. I felt I needed to post to balance things a
bit because the status of XPath seemed unclear to some people on the
list.

Slightly more detailed reply...

> Sure, XPath and XSLT keep being developed. What I meant by
> "evolutionary dead end" is that the XPath 1.0-compatible evolutionary
> path has been relegated to a separate mode instead of XPath 2.0 and
> newer being compatible by design. So the new development you cite
> happens with Compatibility Mode set to false.

We can't change history; the compatibility mode is about compatibility
e.g. with pre-existing XSLT 1 stylesheets.  I don't see HTML 5 adding
the canvas element to RFC 1866 (HTML 2) either - and wouldn't expect it.
Just as you don't expect people working on Mosaic or Cello to need new
features in the HTML 2 spec, we're not adding new features to XPath 1,
because we already did that and called it XPath 2 :-)

Rather, the XPath spec is careful to document the differences in
behaviour.

We should really get rid of calling it compatibility mode and have
specific feature tests instead -- it'd be a lot clearer.  The main
differences are
* XPath 2 introduces sequences, so you can have sequences of arbitrary
values, not just nodelists; the empty sequence is returned in some cases
where it makes sense but XPath 1 used NaN or silently failed in some
other way; this could be a sequence-available feature.

* XPath 2 introduced named typing, rather like the C Programming
Language - e.g. a sockSize can be treated differently from a shoeSize
even though they are both numeric. Amongst other things, this allows a
saner interpretation of A = B, in the case that A and B are xs:boolean.
Although these are factored out into compatibility mode, in fact, they
are mostly cases that could never arise in XPath 1, as it didn't have
the typing system, so we could probably merge them rather than having a
type-system-available feature test.

* a number of error cases now raise errors instead of either failing or
doing an obviously wrong thing.  E.g. if $n is a nodelist of 3
paragraphs having content 11, 2, and 3 respectively, in XPath 1,
$n + 6 gives 17, and in XPath 2 it gives an error. But a Web browser
could plausibly use the XPath 1 behaviour and also emit a warning in the
developer console I think.

* XPath 2 allows implementations much more freedom in rearranging
evaluation order, greatly improving performance.

* there are some other minor changes listed in appendix I of the current
XPath 2.0 draft [1]. Most of these result in errors, so a browser could
easily allow them and produce helpful warnings. E.g. 1 < $a < 6 will
always be true, if $a is numeric, in XPath 1 (it's evaluated left to
right and you get 0 or 1 for 1 < $a), and is an error in XPath 2.

* there are some changes to do with DTD and Schema handling that do not
affect Web browsers.
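
The two XPath 1 behaviours mentioned in the list above ($n + 6 giving
17, and 1 < $a < 6 always being true) can be reproduced with a small
emulation of the XPath 1 conversion rules. This is a sketch in Python,
not an XPath engine: a list of strings stands in for a nodelist of
string-values, the function names are invented for this example, and
Python's float() accepts a few spellings (such as "1e5") that XPath 1
would turn into NaN.

```python
import math


def xpath1_number(value) -> float:
    """Approximate the XPath 1 number() conversion."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0          # true -> 1, false -> 0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, list):
        # Nodelist: only the first node (in document order) is used.
        return xpath1_number(value[0]) if value else math.nan
    try:
        return float(value.strip())           # string -> number
    except ValueError:
        return math.nan                       # unparsable string -> NaN


def xpath1_add(a, b) -> float:
    return xpath1_number(a) + xpath1_number(b)


def xpath1_less_than(a, b) -> bool:
    # Numbers and booleans only; comparisons involving nodelists follow
    # different (existential) rules and are not modelled here.
    if isinstance(a, list) or isinstance(b, list):
        raise TypeError("nodelist comparison is not modelled in this sketch")
    return xpath1_number(a) < xpath1_number(b)


# $n is three paragraphs with content 11, 2 and 3.
n = ["11", "2", "3"]
total = xpath1_add(n, 6)                      # uses only "11"

# 1 < $a < 6 is evaluated as (1 < $a) < 6, i.e. 0 or 1 compared with 6.
chained = [xpath1_less_than(xpath1_less_than(1, a), 6) for a in (0, 3, 100)]
```

Here total is 17.0 and every entry of chained is True, whatever the
value of $a, which is why XPath 2 treats both expressions as errors.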

Frankly the differences are probably comparable to differences between
versions of HTML or editions of CSS -- the later specs get more precise,
and in the case of ambiguities or weird corner cases sometimes it means
a change.  The HTML 5 parsing goal was that all browsers would produce
the same DOM for a given document, and since that wasn't true
previously, some documents clearly now generate a different DOM.


> I don't have enough data about existing XPath-using Web content to
> know how badly the Web would break if browsers started interpreting
> existing XPath (1.x) expressions as XPath 2.x expressions with
> Compatibility Mode set to false, but the fact that the WG felt that it
> needed to define a compatibility mode suggests that the WG itself
> believed the changes to be breaking ones.

They are breaking in a sense - the culture of the XML Activity is to be
very detailed and increasingly precise, so yes, there _are_ possible
XPath expressions which changed meaning.

The answer to this is not to avoid any spec that changes, as that would
also mean avoiding

XPath and find/findAll methods

2011-11-28 Thread Liam R E Quin
Wearing my XML Activity Lead hat, I want to give some information that
may help people decide here.  The actual answer isn't my concern, but
only that it's based on clear information.

(0) XPath

XPath is a language for selecting from XML (or HTML or SGML) document
trees.  It is used by some other specs, including XML Schema and XSLT,
and it's extended by XQuery.  XPath is very widely used in the XML
world, e.g. in servers and on desktops and in shoes :-)  I've lost track
of the number of implementations of XPath, even though there are just a
few dozen major implementations of course.

XPath is popular because it has a regular syntax that's easy to learn
and a good fit for XML.


(1) XPath 1, 2 and 3 compatibility

XPath 2 is backwards compatible with XPath 1; there _are_ some very
minor differences, most of which would not affect Web browsers at all
because they depend on DTDs or Schemas.

Similarly, an XSLT 2 engine will interpret XSLT 1 transformations.
There are some exceptions listed in the Backward Compatibility section
of the XSLT 2 spec, but they are very minor.


(2) Not a dead end

XSLT 1 and XPath 1 are not evolutionary dead ends although it's true
that neither the xt nor the libxml2 library supports XSLT 2 and XPath 2.
There's some support (along with XQuery) in the Qt libraries, and also
in C++ with XQilla and Zorba.  There are maybe 50 implementations of
XPath 2 and/or XQuery 2 that I've encountered.  XQuery 3.0 and XPath 3.0
are about to go to Last Call, we hope, and XSLT 3.0 to follow next year.
The work is very much active and alive.


(3) XPath and efficiency

XPath can be implemented very efficiently.

In most cases in practice, O(1) or O(log n) can be achieved. Some of the
techniques modern XPath libraries use are also used by Web browsers for
CSS selectors - e.g. keep an index of elements, and evaluate from the
right-hand end (most specific) or start with whichever element occurs
the fewest times.
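
As a rough illustration of the index-plus-right-to-left technique, here
is a sketch in Python using the standard xml.etree.ElementTree module.
It handles only absolute paths made of child steps with plain element
names, and the function names are invented for this example; it is not
how any particular XPath library or browser is implemented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(root):
    """Index elements by tag name and record each element's parent."""
    by_tag = defaultdict(list)
    parent = {}
    for elem in root.iter():                  # document order
        by_tag[elem.tag].append(elem)
        for child in elem:
            parent[child] = elem
    return by_tag, parent


def match_child_path(root, steps):
    """Match e.g. ['html', 'body', 'div', 'p'] for /html/body/div/p.

    Starts from the rightmost (most specific) step and walks up the
    parent chain, instead of walking down from the root.
    """
    by_tag, parent = build_index(root)
    results = []
    for candidate in by_tag.get(steps[-1], []):
        node = candidate
        matched = True
        for tag in reversed(steps[:-1]):
            node = parent.get(node)
            if node is None or node.tag != tag:
                matched = False
                break
        # The leftmost step must be the document element itself.
        if matched and parent.get(node) is None:
            results.append(candidate)
    return results


doc = ET.fromstring(
    "<html><body><div><p>a</p><span><p>b</p></span></div><p>c</p></body></html>"
)
texts = [p.text for p in match_child_path(doc, ["html", "body", "div", "p"])]
```

Only the p elements are ever visited, and each is checked with a short
walk up its ancestors; the rest of the tree is never scanned once the
index exists.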

There are implementations of XQuery (an extension of XPath) being used
with petabytes of XML data.  That is not to say you couldn't also use
CSS selectors on petabytes of data -- it's not an either-or or a battle
between the two languages.  XQuery response times are generally measured
in milliseconds, although, as with SQL or JavaScript or Java or C, you
can write infinite loops :-)

The trick is to notice that there are idioms in XPath that can be
optimised much more easily than the corresponding JavaScript code.
There have been some papers at VLDB on XPath and XQuery optimization.

XPath was written with efficiency and optimization in mind, and drew on
implementation experience.


(4) XMLness

XPath 2 is actually defined in terms of a data model, and can work over
non-XML sources - e.g. not just HTML and XML, but also relational data
and anything else that can be represented usefully with a similar data
model.  It lacks arrays and hashes/maps, which makes JSON support
somewhat inconvenient, but there are people working on extending XPath
to handle JSON more gracefully (e.g. via JSONiq).


(5) Orthogonality

I think this was mentioned in discussions but may not've been clear.

In general, XPath is an expression language - anywhere you can have an
expression, you can have any expression, and the expressions all work
together.  For example, predicates can contain any XPath expression,
recursively:
/html/body/div/p[@id = /html/head/link[@rel = 'me']/@src]/strong

This is all strong elements in p elements that are direct children of
div elements that are direct children of the body element, and whose p
parent has an id attribute that has the same value as the src attribute
of a link element in the head which has rel=me.  (This is a
microformat-style query on a document, of course.)

XPath selectors give a different way of looking at finding things than
CSS selectors and probably appeal in differing amounts to different
people.


(6) Note on History

Not really important today, but someone mentioned it, so I'll note that
XPath came out of SGML and (later) HyTime work dating back long before
the World Wide Web and CSS; that work really ended with the publication
of DSSSL and HyTime, but many of the same people were (and in some cases
are) involved with XML and XPath.

XPath has different goals from CSS selectors, and there's not actually a
battle between them.

XSLT and XQuery are widely used on the back end of Web apps, and less
often in the browser, but in some environments the browser-based support
can be very useful, depending on the division of labour.

(7) XPath Selectors and CSS Selectors

There's a huge overlap of functionality. Some claims were made based on
misunderstandings (in both directions probably, but I can only correct
the ones about XPath)...

"XPath can't handle things like :hover or :first-line" -- not true.
XPath has a mechanism by which a browser would support them, using a
functional notation:
//a[hover()]
It's perfectly standard for an implementation to add functions.