David,
The problem you're describing is not a bug; your XPath query is being
evaluated correctly.
Let's see what happens when the query
/martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')]
is executed. First, for each termEntry, XPath finds all nodes at
langSet/ntig/termGrp/term and selects their child text nodes. The
result of this step is a node-set that includes the <term> data for
every language.
Then XPath evaluates the function contains(), whose first argument is
that node-set. Per the XPath specification [1], contains() expects two
arguments of type string, not a node-set, so it converts the first
argument to a string using the function string(). When applied to a
node-set, string() returns the string value of the _first_ node in
document order. Instead of checking the <term> data for every language,
the query only checks whether the <term> data contains the given string
for the language that happens to come first in the document. You can
easily verify that by rearranging the order of the langSet tags in the
document. The query
/martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')]
works for the same reason: the contains() function only gets one
langSet.
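To make the conversion rule concrete, here is a rough sketch in Python
that emulates what string() does to a node-set. It is not Xindice code
and does not run an XPath engine; the helper xpath_string() and the
cut-down entry are made up for illustration, and the namespace is left
out for legibility.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down entry: German term first, French term second.
DOC = """
<termEntry>
  <langSet lang="de-CH"><ntig><termGrp><term>Bankgarantie</term></termGrp></ntig></langSet>
  <langSet lang="fr-CH"><ntig><termGrp><term>garantie bancaire</term></termGrp></ntig></langSet>
</termEntry>
"""


def xpath_string(node_set):
    """Emulate XPath 1.0 string() on a node-set: first node's value, or ''."""
    return node_set[0] if node_set else ""


entry = ET.fromstring(DOC)

# Text of every <term> in document order; stands in for .../term/text().
term_texts = [t.text for t in entry.findall("langSet/ntig/termGrp/term") if t.text]

# contains() only ever sees the string value of the first node.
first_only = xpath_string(term_texts)

print(first_only)                # Bankgarantie
print("bancaire" in first_only)  # False: the French term is never inspected
print("Bank" in first_only)      # True: the German term comes first
```

Swapping the two langSet elements in DOC flips the last two results,
which is the same effect as rearranging the langSet tags in the real
document.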
If you want a query that checks all the text nodes to see whether
they contain some substring, you can try something like this:
/martif/text/body/termEntry[langSet/ntig/termGrp/term[contains(text(),'bancaire')]]
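The logic of that query, testing each <term> on its own and keeping a
termEntry if any of them matches, can be sketched in Python as follows.
This is only an illustration of the idea, not how Xindice evaluates
the expression; the function matching_entries(), the id attributes and
the sample data are assumptions, and the namespace is again omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: entry e1 has a French term containing "bancaire",
# entry e2 does not. The id attributes exist only for this example.
DOC = """
<body>
  <termEntry id="e1">
    <langSet lang="de-CH"><ntig><termGrp><term>Bankgarantie</term></termGrp></ntig></langSet>
    <langSet lang="fr-CH"><ntig><termGrp><term>garantie bancaire</term></termGrp></ntig></langSet>
  </termEntry>
  <termEntry id="e2">
    <langSet lang="de-CH"><ntig><termGrp><term>Konto</term></termGrp></ntig></langSet>
    <langSet lang="fr-CH"><ntig><termGrp><term>compte</term></termGrp></ntig></langSet>
  </termEntry>
</body>
"""


def matching_entries(root, needle):
    """Return termEntry elements where at least one <term> contains needle."""
    return [
        entry
        for entry in root.iter("termEntry")
        # Each <term> is tested individually, so its position does not matter.
        if any(
            needle in (term.text or "")
            for term in entry.findall("langSet/ntig/termGrp/term")
        )
    ]


root = ET.fromstring(DOC)
hits = matching_entries(root, "bancaire")
print([entry.get("id") for entry in hits])  # ['e1']
```

Because the test is applied per <term>, the French term is found even
though the German langSet comes first.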
[1] http://www.w3.org/TR/1999/REC-xpath-19991116
Regards,
Natalia
On Oct 21, 2009, at 12:02 PM, David Vergnaud wrote:
Hi,
I'm reporting on a problem which I'm pretty much convinced is a bug
in the current 1-2.dev version of Xindice (1.2m1). I'm using Xindice
running on its own (no Tomcat) as a daemon on a Linux box (Suse 11)
with JDK 1.6.
Basically, I have a DB where I've stored terminology entries that
contain information about various banking terms in 4 languages. I
want to be able to conduct two types of searches: one where the term
is searched for in only one of the languages, and one where the search
is carried out in all languages. For this, I use two versions of a
somewhat complicated XPath expression: one where the language is
specified (as attribute of one of the nodes, in a predicate) and one
where it isn't. This is the only difference between the two
expressions. Surprisingly, the one where the language is fixed does
return results, whereas the one without a language doesn't. Besides,
I've tested the XPath expression on other systems, and seen that
there really should be results.
The first impression is that when evaluating function arguments
inside a predicate, only the first node of a node set is evaluated.
In my case, that would be confirmed by the following fact: each
entry contains first the German word, then either French or English.
When doing an "unrefined" search (no language specification) with a
German word, results are returned. When doing the same unrefined
search with French or English, no results are returned.
Here's an example of an XPath we're using, first with the language
refinement, then without:
/martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')]
/martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')]
As you can see, the goal is to extract a termEntry element which
contains the word "bancaire" under the specified path. In the first
path, I set the langSet to have attribute lang start with "fr" (for
French), in the second I don't. As I said before, the first
expression yields a result and the second one doesn't.
I'm including an example DB entry which can be used to test this --
I assume it should be possible to observe this behaviour with only
one entry in the DB as well. In order to use the xpath above with
it, one would need to prefix all node names in the xpath expression
with "tbx" (I only removed that for legibility).
Should this prove to be an error on my side, I'd be grateful to
anyone who'd point it out. Otherwise, it might need to be taken onto
the Xindice bug list.
Cheers,
David
<?xml version="1.0"?>
<martif xmlns="http://www.lisa.org/tbx" type="TBX" xml:lang="de-CH">
<martifHeader>
<fileDesc>
<titleStmt>
<title>
Test-TerminologieDB </title>
</titleStmt>
<publicationStmt>
<p>
Version 1.1 </p>
</publicationStmt>
<sourceDesc>
<p>
Version 1.1 </p>
</sourceDesc>
</fileDesc>
</martifHeader>
<text>
<body>
<termEntry>
<descrip type="classificationCode" />
<descrip type="subjectField">
</descrip>
<langSet xml:lang="de-CH">
<transacGrp>
<transac type="transactionType">
created </transac>
<transacNote type="responsibility">
STEA </transacNote>
<date>
2009-09-15T14:44:54.924+02:00 </date>
</transacGrp>
<descrip type="reliabilityCode">
1 </descrip>
<note />
<descripGrp>
<descrip type="definition">
Die Garantie ist eine selbstständige, vom
Hauptschuldverhältnis unabhängige Verpflichtung. Der Garant (die
Bank) kann keinerlei Einwendungen und Einreden aus dem Grundgeschäft
erheben. Das heisst: Der Garant zahlt auf erste schriftliche
Anforderung (Inanspruchnahme) des Begünstigten, gegen Einreichung
der im Garantietext vorgeschriebenen Bestätigung und allenfalls
vorgeschriebenen Dokumente. </descrip>
<adminGrp>
<admin type="source">
CS Glossar </admin>
</adminGrp>
</descripGrp>
<ntig>
<termGrp>
<term>
Bankgarantie </term>
<termNote type="partOfSpeech" />
<termNote type="grammaticalGender" />
<termNote type="grammaticalNumber" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
<termNote type="termType">
main </termNote>
<termNote type="usageNote" />
</termGrp>
<adminGrp>
<admin type="source">
CS Glossar </admin>
<note />
</adminGrp>
<descripGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
</descripGrp>
<note />
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
abbr </termNote>
</termGrp>
<adminGrp>
<admin type="source" />
</adminGrp>
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
syn </termNote>
<termNote type="grammaticalGender" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
</termGrp>
<adminGrp>
<admin type="source" />
<note />
</adminGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
<note />
</ntig>
</langSet>
<langSet xml:lang="en-GB">
<transacGrp>
<transac type="transactionType">
created </transac>
<transacNote type="responsibility">
STEA </transacNote>
<date>
2009-09-15T14:44:54.924+02:00 </date>
</transacGrp>
<descrip type="reliabilityCode">
1 </descrip>
<note />
<descripGrp>
<descrip type="definition" />
<adminGrp>
<admin type="source" />
</adminGrp>
</descripGrp>
<ntig>
<termGrp>
<term>
bank guarantee </term>
<termNote type="partOfSpeech" />
<termNote type="grammaticalGender" />
<termNote type="grammaticalNumber" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
<termNote type="termType">
main </termNote>
<termNote type="usageNote" />
</termGrp>
<adminGrp>
<admin type="source">
CS Glossar </admin>
<note />
</adminGrp>
<descripGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
</descripGrp>
<note />
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
abbr </termNote>
</termGrp>
<adminGrp>
<admin type="source" />
</adminGrp>
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
syn </termNote>
<termNote type="grammaticalGender" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
</termGrp>
<adminGrp>
<admin type="source" />
<note />
</adminGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
<note />
</ntig>
</langSet>
<langSet xml:lang="fr-CH">
<transacGrp>
<transac type="transactionType">
created </transac>
<transacNote type="responsibility">
STEA </transacNote>
<date>
2009-09-15T14:44:54.924+02:00 </date>
</transacGrp>
<descrip type="reliabilityCode">
1 </descrip>
<note />
<descripGrp>
<descrip type="definition" />
<adminGrp>
<admin type="source" />
</adminGrp>
</descripGrp>
<ntig>
<termGrp>
<term>
garantie bancaire </term>
<termNote type="partOfSpeech" />
<termNote type="grammaticalGender" />
<termNote type="grammaticalNumber" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
<termNote type="termType">
main </termNote>
<termNote type="usageNote" />
</termGrp>
<adminGrp>
<admin type="source">
CS Glossar </admin>
<note />
</adminGrp>
<descripGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
</descripGrp>
<note />
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
abbr </termNote>
</termGrp>
<adminGrp>
<admin type="source" />
</adminGrp>
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
syn </termNote>
<termNote type="grammaticalGender" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
</termGrp>
<adminGrp>
<admin type="source" />
<note />
</adminGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
<note />
</ntig>
</langSet>
<langSet xml:lang="it-CH">
<transacGrp>
<transac type="transactionType">
created </transac>
<transacNote type="responsibility">
STEA </transacNote>
<date>
2009-09-15T14:44:54.924+02:00 </date>
</transacGrp>
<descrip type="reliabilityCode">
1 </descrip>
<note />
<descripGrp>
<descrip type="definition" />
<adminGrp>
<admin type="source" />
</adminGrp>
</descripGrp>
<ntig>
<termGrp>
<term>
garanzia bancaria </term>
<termNote type="partOfSpeech" />
<termNote type="grammaticalGender" />
<termNote type="grammaticalNumber" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
<termNote type="termType">
main </termNote>
<termNote type="usageNote" />
</termGrp>
<adminGrp>
<admin type="source">
CS Glossar </admin>
<note />
</adminGrp>
<descripGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
</descripGrp>
<note />
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
abbr </termNote>
</termGrp>
<adminGrp>
<admin type="source" />
</adminGrp>
</ntig>
<ntig>
<termGrp>
<term />
<termNote type="termType">
syn </termNote>
<termNote type="grammaticalGender" />
<termCompList type="lemma">
<termComp />
</termCompList>
<termCompList type="morphologicalElement">
<termComp />
</termCompList>
</termGrp>
<adminGrp>
<admin type="source" />
<note />
</adminGrp>
<descrip type="example" />
<adminGrp>
<admin type="source" />
</adminGrp>
<note />
</ntig>
</langSet>
</termEntry>
</body>
</text>
</martif>