David,

The problem you're describing is not a bug; your XPath query is executed correctly.

Let's see what happens when the query /martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')] is executed. First, for each termEntry, XPath finds all nodes matching langSet/ntig/termGrp/term below it and selects their child text nodes. The result of this step is a node-set that includes the <term> data for every language. Then XPath evaluates the function contains(), whose first argument is that node-set. Per the XPath specification [1], contains() expects two arguments of type string, not a node-set, so the first argument is converted to a string as if by calling string(). When applied to a node-set, string() returns the string-value of the _first_ node in document order (or an empty string if the node-set is empty).
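
To make the conversion rule concrete, here is a minimal Python sketch that models it. It is not an XPath engine (the standard xml.etree.ElementTree module has no contains() function); the helper names xpath_string and xpath_contains and the heavily trimmed sample entry are made up for illustration, and the namespace is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the termEntry (no namespace).
ENTRY = ET.fromstring(
    "<termEntry>"
    "<langSet lang='de-CH'><ntig><termGrp><term>Bankgarantie</term></termGrp></ntig></langSet>"
    "<langSet lang='fr-CH'><ntig><termGrp><term>garantie bancaire</term></termGrp></ntig></langSet>"
    "</termEntry>"
)


def xpath_string(node_set):
    """Model of XPath 1.0 string() on a node-set: the string-value of the
    first node in document order, or '' for an empty node-set."""
    return node_set[0] if node_set else ""


def xpath_contains(node_set, needle):
    """Model of contains(node-set, string): only the first node is examined."""
    return needle in xpath_string(node_set)


# Text of every non-empty <term>, in document order.
term_texts = [t.text for t in ENTRY.findall("langSet/ntig/termGrp/term") if t.text]

print(xpath_contains(term_texts, "Bank"))      # True: the German term comes first
print(xpath_contains(term_texts, "bancaire"))  # False: the French term is never looked at
```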

So instead of checking the <term> data for every language, the query only checks whether the <term> data contains the given string for the language that happens to come first in the document. You can easily verify this by rearranging the order of the langSet elements in the document. The query /martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')] works for the same reason: the contains() function only sees the terms of a single langSet, so the first node it gets is the French term.

If you want a query that checks all the text nodes to see whether any of them contains some substring, you can try something like this: /martif/text/body/termEntry[langSet/ntig/termGrp/term[contains(text(),'bancaire')]]
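
The difference is that the contains() test now sits in a predicate on <term>, so it is applied to each term separately. A rough Python model of that per-term check, again only a sketch: the function name entry_matches and the trimmed sample entry are hypothetical, and the namespace is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the termEntry (no namespace).
ENTRY = ET.fromstring(
    "<termEntry>"
    "<langSet lang='de-CH'><ntig><termGrp><term>Bankgarantie</term></termGrp></ntig></langSet>"
    "<langSet lang='fr-CH'><ntig><termGrp><term>garantie bancaire</term></termGrp></ntig></langSet>"
    "<langSet lang='it-CH'><ntig><termGrp><term /></termGrp></ntig></langSet>"
    "</termEntry>"
)


def entry_matches(entry, needle):
    """Model of termEntry[langSet/ntig/termGrp/term[contains(text(), needle)]]:
    the entry matches if at least one individual <term> contains the needle."""
    return any(
        needle in (term.text or "")  # empty <term /> elements have no text
        for term in entry.iterfind("langSet/ntig/termGrp/term")
    )


print(entry_matches(ENTRY, "bancaire"))  # True: found in the French term
print(entry_matches(ENTRY, "Bank"))      # True: found in the German term
print(entry_matches(ENTRY, "banque"))    # False: no term contains it
```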

[1] http://www.w3.org/TR/1999/REC-xpath-19991116

Regards,
Natalia


On Oct 21, 2009, at 12:02 PM, David Vergnaud wrote:

Hi,

I'm reporting on a problem which I'm pretty much convinced is a bug in the current 1-2.dev version of Xindice (1.2m1). I'm using Xindice running on its own (no Tomcat) as a daemon on a Linux box (Suse 11) with JDK 1.6.

Basically, I have a DB where I've stored terminology entries that contain information about various banking terms in 4 languages. I want to be able to conduct two types of searches, one where the term is searched for only one of the languages, and one where the search is carried out in all languages. For this, I use two versions of a somewhat complicated XPath expression: one where the language is specified (as attribute of one of the nodes, in a predicate) and one where it isn't. This is the only difference between the two expressions. Surprisingly, the one where the language is fixed does return results where the one without specification doesn't. Besides, I've tested the XPath expression on other systems, and seen that there really should be results.

The first impression is that when evaluating function arguments inside a predicate, only the first node of a node set is evaluated. In my case, that would be confirmed by the following fact: each entry contains first the German word, then either French or English. When doing an "unrefined" search (no language specification) with a German word, results are returned. When doing the same unrefined search with French or English, no results are returned.

Here's an example of an XPath we're using, first with the language refinement, then without:

/martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')]

/martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')]

As you can see, the goal is to extract a termEntry element which contains the word "bancaire" under the specified path. In the first path, I set the langSet to have attribute lang start with "fr" (for French), in the second I don't. As I said before, the first expression yields a result and the second one doesn't.

I'm including an example DB entry which can be used to test this -- I assume it should be possible to observe this behaviour with only one entry in the DB as well. In order to use the xpath above with it, one would need to prefix all node names in the xpath expression with "tbx" (I only removed that for legibility).

Should this prove to be an error on my side, I'd be grateful to anyone who'd point it out. Otherwise, it might need to be taken onto the Xindice bug list.

Cheers,

David

<?xml version="1.0"?>
<martif xmlns="http://www.lisa.org/tbx" type="TBX" xml:lang="de-CH">
 <martifHeader>
   <fileDesc>
     <titleStmt>
       <title>
         Test-TerminologieDB        </title>
     </titleStmt>
     <publicationStmt>
       <p>
          Version 1.1        </p>
     </publicationStmt>
     <sourceDesc>
       <p>
          Version 1.1        </p>
     </sourceDesc>
   </fileDesc>
 </martifHeader>
 <text>
   <body>
     <termEntry>
       <descrip type="classificationCode" />
       <descrip type="subjectField">
       </descrip>
       <langSet xml:lang="de-CH">
         <transacGrp>
           <transac type="transactionType">
             created            </transac>
           <transacNote type="responsibility">
             STEA            </transacNote>
           <date>
             2009-09-15T14:44:54.924+02:00            </date>
         </transacGrp>
         <descrip type="reliabilityCode">
           1          </descrip>
         <note />
         <descripGrp>
           <descrip type="definition">
Die Garantie ist eine selbstständige, vom Hauptschuldverhältnis unabhängige Verpflichtung. Der Garant (die Bank) kann keinerlei Einwendungen und Einreden aus dem Grundgeschäft erheben. Das heisst: Der Garant zahlt auf erste schriftliche Anforderung (Inanspruchnahme) des Begünstigten, gegen Einreichung der im Garantietext vorgeschriebenen Bestätigung und allenfalls vorgeschriebenen Dokumente. </descrip>
           <adminGrp>
             <admin type="source">
               CS Glossar              </admin>
           </adminGrp>
         </descripGrp>
         <ntig>
           <termGrp>
             <term>
               Bankgarantie              </term>
             <termNote type="partOfSpeech" />
             <termNote type="grammaticalGender" />
             <termNote type="grammaticalNumber" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
             <termNote type="termType">
               main              </termNote>
             <termNote type="usageNote" />
           </termGrp>
           <adminGrp>
             <admin type="source">
               CS Glossar              </admin>
             <note />
           </adminGrp>
           <descripGrp>
             <descrip type="example" />
             <adminGrp>
               <admin type="source" />
             </adminGrp>
           </descripGrp>
           <note />
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               abbr              </termNote>
           </termGrp>
           <adminGrp>
             <admin type="source" />
           </adminGrp>
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               syn              </termNote>
             <termNote type="grammaticalGender" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
           </termGrp>
           <adminGrp>
             <admin type="source" />
             <note />
           </adminGrp>
           <descrip type="example" />
           <adminGrp>
             <admin type="source" />
           </adminGrp>
           <note />
         </ntig>
       </langSet>
       <langSet xml:lang="en-GB">
         <transacGrp>
           <transac type="transactionType">
             created            </transac>
           <transacNote type="responsibility">
             STEA            </transacNote>
           <date>
             2009-09-15T14:44:54.924+02:00            </date>
         </transacGrp>
         <descrip type="reliabilityCode">
           1          </descrip>
         <note />
         <descripGrp>
           <descrip type="definition" />
           <adminGrp>
             <admin type="source" />
           </adminGrp>
         </descripGrp>
         <ntig>
           <termGrp>
             <term>
               bank guarantee              </term>
             <termNote type="partOfSpeech" />
             <termNote type="grammaticalGender" />
             <termNote type="grammaticalNumber" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
             <termNote type="termType">
               main              </termNote>
             <termNote type="usageNote" />
           </termGrp>
           <adminGrp>
             <admin type="source">
               CS Glossar              </admin>
             <note />
           </adminGrp>
           <descripGrp>
             <descrip type="example" />
             <adminGrp>
               <admin type="source" />
             </adminGrp>
           </descripGrp>
           <note />
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               abbr              </termNote>
           </termGrp>
           <adminGrp>
             <admin type="source" />
           </adminGrp>
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               syn              </termNote>
             <termNote type="grammaticalGender" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
           </termGrp>
           <adminGrp>
             <admin type="source" />
             <note />
           </adminGrp>
           <descrip type="example" />
           <adminGrp>
             <admin type="source" />
           </adminGrp>
           <note />
         </ntig>
       </langSet>
       <langSet xml:lang="fr-CH">
         <transacGrp>
           <transac type="transactionType">
             created            </transac>
           <transacNote type="responsibility">
             STEA            </transacNote>
           <date>
             2009-09-15T14:44:54.924+02:00            </date>
         </transacGrp>
         <descrip type="reliabilityCode">
           1          </descrip>
         <note />
         <descripGrp>
           <descrip type="definition" />
           <adminGrp>
             <admin type="source" />
           </adminGrp>
         </descripGrp>
         <ntig>
           <termGrp>
             <term>
               garantie bancaire              </term>
             <termNote type="partOfSpeech" />
             <termNote type="grammaticalGender" />
             <termNote type="grammaticalNumber" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
             <termNote type="termType">
               main              </termNote>
             <termNote type="usageNote" />
           </termGrp>
           <adminGrp>
             <admin type="source">
               CS Glossar              </admin>
             <note />
           </adminGrp>
           <descripGrp>
             <descrip type="example" />
             <adminGrp>
               <admin type="source" />
             </adminGrp>
           </descripGrp>
           <note />
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               abbr              </termNote>
           </termGrp>
           <adminGrp>
             <admin type="source" />
           </adminGrp>
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               syn              </termNote>
             <termNote type="grammaticalGender" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
           </termGrp>
           <adminGrp>
             <admin type="source" />
             <note />
           </adminGrp>
           <descrip type="example" />
           <adminGrp>
             <admin type="source" />
           </adminGrp>
           <note />
         </ntig>
       </langSet>
       <langSet xml:lang="it-CH">
         <transacGrp>
           <transac type="transactionType">
             created            </transac>
           <transacNote type="responsibility">
             STEA            </transacNote>
           <date>
             2009-09-15T14:44:54.924+02:00            </date>
         </transacGrp>
         <descrip type="reliabilityCode">
           1          </descrip>
         <note />
         <descripGrp>
           <descrip type="definition" />
           <adminGrp>
             <admin type="source" />
           </adminGrp>
         </descripGrp>
         <ntig>
           <termGrp>
             <term>
               garanzia bancaria              </term>
             <termNote type="partOfSpeech" />
             <termNote type="grammaticalGender" />
             <termNote type="grammaticalNumber" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
             <termNote type="termType">
               main              </termNote>
             <termNote type="usageNote" />
           </termGrp>
           <adminGrp>
             <admin type="source">
               CS Glossar              </admin>
             <note />
           </adminGrp>
           <descripGrp>
             <descrip type="example" />
             <adminGrp>
               <admin type="source" />
             </adminGrp>
           </descripGrp>
           <note />
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               abbr              </termNote>
           </termGrp>
           <adminGrp>
             <admin type="source" />
           </adminGrp>
         </ntig>
         <ntig>
           <termGrp>
             <term />
             <termNote type="termType">
               syn              </termNote>
             <termNote type="grammaticalGender" />
             <termCompList type="lemma">
               <termComp />
             </termCompList>
             <termCompList type="morphologicalElement">
               <termComp />
             </termCompList>
           </termGrp>
           <adminGrp>
             <admin type="source" />
             <note />
           </adminGrp>
           <descrip type="example" />
           <adminGrp>
             <admin type="source" />
           </adminGrp>
           <note />
         </ntig>
       </langSet>
     </termEntry>
   </body>
 </text>
</martif>
