[xml] How could I understand the slight difference between the evaluation of E1/E2[E3] and E1[E2][E3]?

Ming Chen Thu, 12 Jan 2012 21:47:52 -0800

 
Hi Experts,
 
Recently a colleague and I disagreed about how E1/E2[E3] is evaluated when 
E3 has a numeric type. Here is the example XML file:
 
<?xml version="1.0" encoding="UTF-8"?>
<xml>
   <table>
      <rec id="1">
         <para type="error" position="11"/>
         <para type="warning" position="12"/>
         <para type="warning" position="13"/>
      </rec>
      <rec id="2">
         <para type="warning" position="21"/>
         <para type="warning" position="22"/>
         <para type="warning" position="23"/>
      </rec>
      <rec id="3">
         <para type="info" position="31"/>
         <para type="warning" position="32"/>
         <para type="warning" position="33"/>
      </rec>
   </table>
</xml>

For the XPath expression "//rec/para[1]", xmllint.exe outputs:
<para type="error" position="11"/><para type="warning" position="21"/><para type="info" position="31"/>


My colleague, however, says that the output should be:
<para type="error" position="11"/>

According to his explanation, the evaluation of E3 can begin only after E2 
has been evaluated against all nodes resulting from the evaluation of E1. For 
the example, the process looks like this:

rec(id = "1") -> all para under rec(id = "1") -> rec(id = "2") -> all para 
under rec(id = "2") -> rec(id = "3") -> all para under rec(id = "3"). The 
resulting sequence then serves as the input sequence for the evaluation of E3, 
so only the very first para node should be output. To me, this looks a little 
like BFS (breadth-first search).

In contrast, I think the process should be:

rec(id = "1") -> all para under rec(id = "1") -> get the first para under 
rec(id = "1") -> rec(id = "2") -> all para under rec(id = "2") -> get the first 
para under rec(id = "2") -> rec(id = "3") -> all para under rec(id = "3") 
-> get the first para under rec(id = "3"). The first para node under each 
rec should be output, as xmllint.exe did above. Could this be regarded as DFS 
(depth-first search)?
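
To make the two readings concrete, here is a small Python sketch. It is only 
my own illustration of the two processes described above, not something taken 
from the spec or from xmllint; the function names flatten_then_index and 
index_per_rec are made up for this example, and ElementTree is used only to 
parse the file, not to evaluate XPath.

```python
import xml.etree.ElementTree as ET

DOC = """<xml>
  <table>
    <rec id="1">
      <para type="error" position="11"/>
      <para type="warning" position="12"/>
      <para type="warning" position="13"/>
    </rec>
    <rec id="2">
      <para type="warning" position="21"/>
      <para type="warning" position="22"/>
      <para type="warning" position="23"/>
    </rec>
    <rec id="3">
      <para type="info" position="31"/>
      <para type="warning" position="32"/>
      <para type="warning" position="33"/>
    </rec>
  </table>
</xml>"""

root = ET.fromstring(DOC)
recs = list(root.iter("rec"))  # stands in for //rec, in document order


def flatten_then_index(recs, n):
    """Colleague's reading: collect every para first, then take the n-th."""
    all_paras = [p for rec in recs for p in rec.findall("para")]
    return all_paras[n - 1:n]


def index_per_rec(recs, n):
    """My reading: take the n-th para separately under each rec."""
    out = []
    for rec in recs:
        out.extend(rec.findall("para")[n - 1:n])
    return out


flat_positions = [p.get("position") for p in flatten_then_index(recs, 1)]
per_rec_positions = [p.get("position") for p in index_per_rec(recs, 1)]
print(flat_positions)     # ['11']
print(per_rec_positions)  # ['11', '21', '31']
```

Only the second function reproduces the three nodes that xmllint.exe printed 
for "//rec/para[1]".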

I haven't found a clear definition of the evaluation rule in the XPath spec. 
Or you could say that I have not understood the spec well :). Anyway, I cannot 
persuade him, even though I have shown him many XML tools that behave the same 
way as xmllint.exe. Could someone give me the theoretical support?

My colleague used another XPath expression to support his opinion: 
"//rec[1]/para[@type="warning"][2]" (Please focus on the latter part: 
"para[@type="warning"][2]", I purposely used rec[1] to avoid mix-up).

In this case, all evaluations of E2 against all nodes resulting from the 
evaluation of E1 have been done before the evaluation of E3. The output of 
para[@type="warning"] acts as the input sequence of E3. Sounds reasonable? BFS?
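
As a hand-written illustration of that claim (again only a sketch, restricted 
to the first rec and not using a real XPath engine), the two adjacent 
predicates can be applied one after the other like this:

```python
import xml.etree.ElementTree as ET

REC1 = """<rec id="1">
  <para type="error" position="11"/>
  <para type="warning" position="12"/>
  <para type="warning" position="13"/>
</rec>"""

rec1 = ET.fromstring(REC1)

# Step "para": all para children of the first rec.
paras = rec1.findall("para")

# First predicate [@type="warning"]: keep only the warning paras.
warnings = [p for p in paras if p.get("type") == "warning"]

# Second predicate [2]: positions are counted within the filtered list,
# not among all para children.
second_warning = warnings[1:2]

result = [p.get("position") for p in second_warning]
print(result)  # ['13']
```

Here the numeric predicate does see the whole output of the previous 
predicate, but only for the para children of a single rec.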

As described in section 3.3.2 Predicates of the XPath 2.0 spec (1.0 should 
follow it too): In the case of multiple adjacent predicates, the predicates are 
applied from left to right, and the result of applying each predicate serves as 
the input sequence for the following predicate.

So, what exactly is the distinction between E1/E2[E3] and E1[E2][E3]? Both 
are evaluated using an inner focus, and a paragraph of the spec lists the two 
together as if they were completely consistent (but do they really follow the 
same evaluation rule?):

Cited from 2.1.2 Dynamic Context (Spec 2.0):

Certain language constructs, notably the path expression E1/E2 and the 
predicate E1[E2], create a new focus for the evaluation of a sub-expression. In 
these constructs, E2 is evaluated once for each item in the sequence that 
results from evaluating E1. Each time E2 is evaluated, it is evaluated with a 
different focus. The focus for evaluating E2 is referred to below as the inner 
focus, while the focus for evaluating E1 is referred to as the outer focus. The 
inner focus exists only while E2 is being evaluated. When this evaluation is 
complete, evaluation of the containing expression continues with its original 
focus unchanged.
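
The inner focus described in this paragraph can be modelled with two small 
helpers. This is a simplified sketch under my own assumptions: apply_predicate 
and apply_step are hypothetical names, nodes are plain tuples, and the path 
helper skips the de-duplication and document-order sorting a real engine 
performs. It also assumes that in E1/E2[E3] the predicate binds to the step E2 
rather than to the whole path, which is the reading that matches the xmllint 
output above.

```python
def apply_predicate(seq, pred):
    """E1[E2]: call pred once per item, each time with its own inner focus
    (context item, context position, context size)."""
    size = len(seq)
    return [item for pos, item in enumerate(seq, start=1)
            if pred(item, pos, size)]


def apply_step(seq, step):
    """E1/E2: call step once per item of seq and concatenate the results.
    De-duplication and document-order sorting are omitted in this sketch."""
    out = []
    for item in seq:
        out.extend(step(item))
    return out


# Each rec is modelled as the list of its para children: (type, position).
recs = [
    [("error", "11"), ("warning", "12"), ("warning", "13")],
    [("warning", "21"), ("warning", "22"), ("warning", "23")],
    [("info", "31"), ("warning", "32"), ("warning", "33")],
]


def is_first(item, pos, size):
    return pos == 1


def is_second(item, pos, size):
    return pos == 2


def is_warning(item, pos, size):
    return item[0] == "warning"


# E1/E2[E3], read as E1/(E2[E3]): the predicate runs inside each rec.
path_result = apply_step(recs, lambda rec: apply_predicate(rec, is_first))

# E1[E2][E3]: the second predicate sees the output of the first one.
chained_result = apply_predicate(apply_predicate(recs[0], is_warning), is_second)

print(path_result)     # [('error', '11'), ('warning', '21'), ('info', '31')]
print(chained_result)  # [('warning', '13')]
```

In this model both constructs create an inner focus in the same way; the only 
difference is which sequence the numeric predicate is applied to.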


Any comments will be appreciated.

Hi Liam, 

Could you have a look at this :)

Thanks,
Ming

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
