Hi Natalia,

thanks a lot for your reply and for making this clear to me. I thought the default behaviour of functions such as contains(), when given a set of text nodes as an argument, was to process each node separately and return a set of booleans. Seeing as this is wrong, I get your point completely. It appears the online tool I used to check my claim made the same incorrect assumption.
Again, thanks and regards,
David

----- Original Message ----
From: Natalia Shilenkova <nshilenk...@gmail.com>
To: xindice-users@xml.apache.org
Sent: Thu, October 22, 2009 12:13:55 AM
Subject: Re: Incorrect "No result for query" for XPath expression

David,

The problem you're describing is not a bug; your XPath query is executed correctly. Let's see what happens when the query

/martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')]

is executed.

First, XPath finds all nodes with path /martif/text/body/termEntry/langSet/ntig/termGrp/term and selects their child text nodes. The result of this step is a node-set that includes <term> data for every language. Then XPath evaluates the function contains(), whose first argument is that node-set. Per the XPath specification [1], contains() expects two arguments of type string, not node-set, so it converts the first argument to a string using the function string(). When applied to a node-set, string() returns the string value of the _first_ node in document order. Instead of checking <term> data for every language, the query just checks whether the <term> data contains the given string for the language that happens to be first in the document. You can easily verify that by rearranging the order of the langSet tags in the document.

The query

/martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')]

works for the same reason: the contains() function only gets one langSet.

If you want a query that checks all the text nodes to see if they contain some substring, you can try something like this:

/martif/text/body/termEntry[langSet/ntig/termGrp/term[contains(text(),'bancaire')]]

[1] http://www.w3.org/TR/1999/REC-xpath-19991116

Regards,
Natalia

On Oct 21, 2009, at 12:02 PM, David Vergnaud wrote:

> Hi,
>
> I'm reporting on a problem which I'm pretty much convinced is a bug in the
> current 1-2.dev version of Xindice (1.2m1). I'm using Xindice running on its
> own (no Tomcat) as a daemon on a Linux box (Suse 11) with JDK 1.6.
>
> Basically, I have a DB where I've stored terminology entries that contain
> information about various banking terms in 4 languages. I want to be able to
> conduct two types of searches, one where the term is searched for only one of
> the languages, and one where the search is carried out in all languages. For
> this, I use two versions of a somewhat complicated XPath expression: one
> where the language is specified (as attribute of one of the nodes, in a
> predicate) and one where it isn't. This is the only difference between the
> two expressions. Surprisingly, the one where the language is fixed does
> return results where the one without specification doesn't. Besides, I've
> tested the XPath expression on other systems, and seen that there really
> should be results.
>
> The first impression is that when evaluating function arguments inside a
> predicate, only the first node of a node set is evaluated. In my case, that
> would be confirmed by the following fact: each entry contains first the
> German word, then either French or English. When doing an "unrefined" search
> (no language specification) with a German word, results are returned. When
> doing the same unrefined search with French or English, no results are
> returned.
>
> Here's an example of an XPath we're using, first with the language
> refinement, then without:
> /martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')]
> /martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')]
>
> As you can see, the goal is to extract a termEntry element which contains the
> word "bancaire" under the specified path. In the first path, I set the
> langSet to have attribute lang start with "fr" (for French), in the second I
> don't. As I said before, the first expression yields a result and the second
> one doesn't.
>
> I'm including an example DB entry which can be used to test this -- I assume
> it should be possible to observe this behaviour with only one entry in the DB
> as well. In order to use the xpath above with it, one would need to prefix
> all node names in the xpath expression with "tbx" (I only removed that for
> legibility).
>
> Should this prove to be an error on my side, I'd be grateful to anyone who'd
> point it out. Otherwise, it might need to be taken onto the Xindice bug list.
>
> Cheers,
>
> David
>
> <?xml version="1.0"?>
> <martif xmlns="http://www.lisa.org/tbx" type="TBX" xml:lang="de-CH">
> <martifHeader>
> <fileDesc>
> <titleStmt>
> <title>
> Test-TerminologieDB </title>
> </titleStmt>
> <publicationStmt>
> <p>
> Version 1.1 </p>
> </publicationStmt>
> <sourceDesc>
> <p>
> Version 1.1 </p>
> </sourceDesc>
> </fileDesc>
> </martifHeader>
> <text>
> <body>
> <termEntry>
> <descrip type="classificationCode" />
> <descrip type="subjectField">
> </descrip>
> <langSet xml:lang="de-CH">
> <transacGrp>
> <transac type="transactionType">
> created </transac>
> <transacNote type="responsibility">
> STEA </transacNote>
> <date>
> 2009-09-15T14:44:54.924+02:00 </date>
> </transacGrp>
> <descrip type="reliabilityCode">
> 1 </descrip>
> <note />
> <descripGrp>
> <descrip type="definition">
> Die Garantie ist eine selbstständige, vom Hauptschuldverhältnis
> unabhängige Verpflichtung. Der Garant (die Bank) kann keinerlei Einwendungen
> und Einreden aus dem Grundgeschäft erheben. Das heisst: Der Garant zahlt auf
> erste schriftliche Anforderung (Inanspruchnahme) des Begünstigten, gegen
> Einreichung der im Garantietext vorgeschriebenen Bestätigung und allenfalls
> vorgeschriebenen Dokumente. </descrip>
> <adminGrp>
> <admin type="source">
> CS Glossar </admin>
> </adminGrp>
> </descripGrp>
> <ntig>
> <termGrp>
> <term>
> Bankgarantie </term>
> <termNote type="partOfSpeech" />
> <termNote type="grammaticalGender" />
> <termNote type="grammaticalNumber" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> <termNote type="termType">
> main </termNote>
> <termNote type="usageNote" />
> </termGrp>
> <adminGrp>
> <admin type="source">
> CS Glossar </admin>
> <note />
> </adminGrp>
> <descripGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </descripGrp>
> <note />
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> abbr </termNote>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> syn </termNote>
> <termNote type="grammaticalGender" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> <note />
> </adminGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> <note />
> </ntig>
> </langSet>
> <langSet xml:lang="en-GB">
> <transacGrp>
> <transac type="transactionType">
> created </transac>
> <transacNote type="responsibility">
> STEA </transacNote>
> <date>
> 2009-09-15T14:44:54.924+02:00 </date>
> </transacGrp>
> <descrip type="reliabilityCode">
> 1 </descrip>
> <note />
> <descripGrp>
> <descrip type="definition" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </descripGrp>
> <ntig>
> <termGrp>
> <term>
> bank guarantee </term>
> <termNote type="partOfSpeech" />
> <termNote type="grammaticalGender" />
> <termNote type="grammaticalNumber" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> <termNote type="termType">
> main </termNote>
> <termNote type="usageNote" />
> </termGrp>
> <adminGrp>
> <admin type="source">
> CS Glossar </admin>
> <note />
> </adminGrp>
> <descripGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </descripGrp>
> <note />
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> abbr </termNote>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> syn </termNote>
> <termNote type="grammaticalGender" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> <note />
> </adminGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> <note />
> </ntig>
> </langSet>
> <langSet xml:lang="fr-CH">
> <transacGrp>
> <transac type="transactionType">
> created </transac>
> <transacNote type="responsibility">
> STEA </transacNote>
> <date>
> 2009-09-15T14:44:54.924+02:00 </date>
> </transacGrp>
> <descrip type="reliabilityCode">
> 1 </descrip>
> <note />
> <descripGrp>
> <descrip type="definition" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </descripGrp>
> <ntig>
> <termGrp>
> <term>
> garantie bancaire </term>
> <termNote type="partOfSpeech" />
> <termNote type="grammaticalGender" />
> <termNote type="grammaticalNumber" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> <termNote type="termType">
> main </termNote>
> <termNote type="usageNote" />
> </termGrp>
> <adminGrp>
> <admin type="source">
> CS Glossar </admin>
> <note />
> </adminGrp>
> <descripGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </descripGrp>
> <note />
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> abbr </termNote>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> syn </termNote>
> <termNote type="grammaticalGender" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> <note />
> </adminGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> <note />
> </ntig>
> </langSet>
> <langSet xml:lang="it-CH">
> <transacGrp>
> <transac type="transactionType">
> created </transac>
> <transacNote type="responsibility">
> STEA </transacNote>
> <date>
> 2009-09-15T14:44:54.924+02:00 </date>
> </transacGrp>
> <descrip type="reliabilityCode">
> 1 </descrip>
> <note />
> <descripGrp>
> <descrip type="definition" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </descripGrp>
> <ntig>
> <termGrp>
> <term>
> garanzia bancaria </term>
> <termNote type="partOfSpeech" />
> <termNote type="grammaticalGender" />
> <termNote type="grammaticalNumber" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> <termNote type="termType">
> main </termNote>
> <termNote type="usageNote" />
> </termGrp>
> <adminGrp>
> <admin type="source">
> CS Glossar </admin>
> <note />
> </adminGrp>
> <descripGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </descripGrp>
> <note />
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> abbr </termNote>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> </ntig>
> <ntig>
> <termGrp>
> <term />
> <termNote type="termType">
> syn </termNote>
> <termNote type="grammaticalGender" />
> <termCompList type="lemma">
> <termComp />
> </termCompList>
> <termCompList type="morphologicalElement">
> <termComp />
> </termCompList>
> </termGrp>
> <adminGrp>
> <admin type="source" />
> <note />
> </adminGrp>
> <descrip type="example" />
> <adminGrp>
> <admin type="source" />
> </adminGrp>
> <note />
> </ntig>
> </langSet>
> </termEntry>
> </body>
> </text>
> </martif>
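
For readers of the archive, the conversion rule Natalia describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's standard xml.etree.ElementTree does not implement the XPath contains() function, so the XPath 1.0 string() conversion (take the first node of the node-set in document order) is modelled by hand in contains_xpath1(). The helper names and the trimmed-down, namespace-free termEntry are hypothetical and are not part of Xindice or of the original data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced version of the termEntry from the thread:
# namespace and most child elements are dropped, only the main terms remain.
DOC = """
<termEntry>
  <langSet lang="de-CH"><ntig><termGrp><term>Bankgarantie</term></termGrp></ntig></langSet>
  <langSet lang="en-GB"><ntig><termGrp><term>bank guarantee</term></termGrp></ntig></langSet>
  <langSet lang="fr-CH"><ntig><termGrp><term>garantie bancaire</term></termGrp></ntig></langSet>
  <langSet lang="it-CH"><ntig><termGrp><term>garanzia bancaria</term></termGrp></ntig></langSet>
</termEntry>
"""


def term_texts(entry):
    """Return the text of every langSet/ntig/termGrp/term in document order."""
    return [t.text or "" for t in entry.findall("langSet/ntig/termGrp/term")]


def contains_xpath1(node_set_texts, needle):
    """Hand-written model of contains(node-set, needle) in XPath 1.0.

    The node-set is first converted with string(), which yields the string
    value of the first node only (or "" for an empty node-set).
    """
    first = node_set_texts[0] if node_set_texts else ""
    return needle in first


def contains_any(node_set_texts, needle):
    """Model of term[contains(text(), needle)]: every term is tested on its own."""
    return any(needle in text for text in node_set_texts)


entry = ET.fromstring(DOC)
texts = term_texts(entry)

print(contains_xpath1(texts, "bancaire"))  # False: only "Bankgarantie" is inspected
print(contains_xpath1(texts, "Bank"))      # True: the German term comes first
print(contains_any(texts, "bancaire"))     # True: the French term is found
```

The first call corresponds to the query without a language restriction, which only ever sees the German term; the last call corresponds to the rewritten query with the predicate moved onto term.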