Re: Need simple XPath help

Eirikur Hrafnsson Fri, 14 Jan 2005 09:58:15 -0800

This was the same example I tried a week ago, and now it worked for me! But only because I changed my XML and the config in Domain.xml, so there must be a bug in SimpleXMLExtractor!

If your XML specifies a namespace like mine:
<article xmlns="http://xmlns.idega.com/com.idega.block.article">
  <headline>Header</headline>
  <teaser>Teaser</teaser>
  <body>Body</body>
  <author>Author</author>
  <source/>
  <comment>Comment</comment>
</article>

And even if your domain.xml definition looks like this (exactly the same, only with the added namespace):

<extractor classname="org.apache.slide.extractor.SimpleXmlExtractor" uri="/files">
  <configuration>
    <instruction namespace="http://xmlns.idega.com/com.idega.block.article" property="headline" xpath="/article/headline/text()" />
    <instruction namespace="http://xmlns.idega.com/com.idega.block.article" property="teaser" xpath="/article/teaser/text()" />
    <instruction namespace="http://xmlns.idega.com/com.idega.block.article" property="body" xpath="/article/body/text()" />
    <instruction namespace="http://xmlns.idega.com/com.idega.block.article" property="author" xpath="/article/author/text()" />
    <instruction namespace="http://xmlns.idega.com/com.idega.block.article" property="source" xpath="/article/source/text()" />
    <instruction namespace="http://xmlns.idega.com/com.idega.block.article" property="comment" xpath="/article/comment/text()" />
  </configuration>
</extractor>

The xpath will not find anything...
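
One likely explanation, sketched here with the JDK's built-in javax.xml.xpath API rather than the JDOM XPath class that Slide uses (an assumption, made so the example runs without extra libraries): in XPath 1.0 an unprefixed name such as article only matches elements that are in no namespace, so a document that declares a default namespace needs a prefix bound to that URI inside the expression. The class name NamespaceXPathDemo and the prefix "a" are illustrative and not part of Slide.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath engine, not JDOM/Slide.
class NamespaceXPathDemo {
    static final String NS = "http://xmlns.idega.com/com.idega.block.article";
    static final String XML =
        "<article xmlns=\"" + NS + "\"><headline>Header</headline></article>";

    // Parse the XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Unprefixed steps only match elements in no namespace, so nothing is selected.
    static String unprefixed() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/article/headline/text()", parse(XML));
    }

    // Binding the prefix "a" to the document's namespace URI lets the steps match.
    static String prefixed() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "a".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            // The two reverse lookups are not needed for this evaluation.
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });
        return xpath.evaluate("/a:article/a:headline/text()", parse(XML));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unprefixed: '" + unprefixed() + "'"); // empty string
        System.out.println("prefixed:   '" + prefixed() + "'");   // Header
    }
}
```

Judging from the createInstruction code quoted further down in this thread, the namespace attribute on <instruction> only names the extracted property; it does not appear to bind any prefix for the XPath expression itself, which would be consistent with the behaviour described here.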

best regards
Eirikur, Idega.


On 14.1.2005, at 17:26, Daniel Florey wrote:

I just checked out the Slide CVS head to test it, and it still works fine for me :-) I just took the demo configuration in Domain.xml:

<extractors>
  <extractor classname="org.apache.slide.extractor.SimpleXmlExtractor" uri="/files/articles/test.xml">
    <configuration>
      <instruction property="title" xpath="/article/title/text()" />
      <instruction property="summary" xpath="/article/summary/text()" />
    </configuration>
  </extractor>
</extractors>
and put a file under /files/articles/ with name test.xml and content:
<article>
        <title>Some title</title>
        <summary>Summary</summary>
</article>
Everything worked fine!
Check that your event configuration contains the PropertyExtractorTrigger:

<events>
    ...
    <listener classname="org.apache.slide.extractor.PropertyExtractorTrigger" />
    ...
</events>
Check whether this works for you as well.
Cheers,
Daniel
"Slide Developers Mailing List" <[email protected]> wrote on 14.01.05 17:09:10:
On 14.1.2005, at 15:32, Daniel Florey wrote:
It worked for me when I implemented it, and as far as I remember my XPath expressions were extremely simple. So I assume it has something to do with the new JDOM version. Have you debugged the code?
Just the Slide side, I don't have any sources for XPath.
What did your XML file look like, and which XPaths worked for it?
Daniel
"Slide Developers Mailing List" <[email protected]> wrote on 14.01.05 15:53:36:
Nice try but no cookie! ; )
"nodeList" in the SimpleXMLExtractor code is always empty for all the XPaths I have tried, and therefore "propertyValue" will always be null:

    Instruction instruction = (Instruction) i.next();
    XPath xPath = instruction.getxPath();
    List nodeList = xPath.selectNodes(document);
    Object propertyValue = filter(nodeList, instruction);
    if (propertyValue != null) {
        properties.put(instruction.getPropertyName(), propertyValue);
    }

The "compiled" value of the xpath you suggested is
/child::*[(name() = "article")]/child::*[(name() = "author")]
if that tells anyone anything...
Has anyone gotten SimpleXMLExtractor to work at all?
best regards
Eirikur, Idega.
On 14.1.2005, at 04:21, James Mason wrote:
Disclaimer: I have no idea if this will work ;).
I ran into a similar issue with dom4j and namespaces in another application. Using an XPath like:
 /*[name() = 'article']/*[name() = 'headline']
worked in that case.
-James
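
A sketch of the namespace-agnostic matching James describes, again using the JDK's javax.xml.xpath API as a stand-in for dom4j or JDOM (an assumption). It uses local-name() instead of name(), because name() returns the qualified name including any prefix and would stop matching if the document switched from a default namespace to a prefixed one. LocalNameXPathDemo is a hypothetical class name. Eirikur's reply above reports that the name() variant did not help inside Slide, so treat this only as an illustration of the XPath technique.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath engine, not dom4j/JDOM.
class LocalNameXPathDemo {
    // Returns the string value of /article/headline regardless of namespace or prefix.
    static String headline(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() compares only the part after any prefix.
        return xpath.evaluate(
                "/*[local-name() = 'article']/*[local-name() = 'headline']", doc);
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://xmlns.idega.com/com.idega.block.article";
        String withDefaultNs =
            "<article xmlns=\"" + ns + "\"><headline>Header</headline></article>";
        String withPrefix =
            "<I:article xmlns:I=\"" + ns + "\"><I:headline>Header</I:headline></I:article>";
        System.out.println(headline(withDefaultNs)); // Header
        System.out.println(headline(withPrefix));    // Header
    }
}
```

The trade-off is that such an expression ignores the namespace entirely, so it would also match an article element from an unrelated vocabulary.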
On Wed, 2005-01-12 at 13:32 +0000, Eirikur Hrafnsson wrote:
I found one thing that it might be related to. I used an example <instruction> from Domain.xml, and that doesn't specify a namespace, but I found that in SimpleXMLExtractor the method

    protected Instruction createInstruction(Configuration instruction) throws ConfigurationException {
        try {
            String property = instruction.getAttribute("property");
            String namespace = instruction.getAttribute("namespace", "DAV:");
            XPath xPath = XPath.newInstance(instruction.getAttribute("xpath"));
            return new Instruction(xPath, PropertyName.getPropertyName(property, namespace));
        } catch (JDOMException e) {
            throw new ConfigurationException("Could not create xPath from given attribute", instruction);
        }
    }

seems to require a namespace, or else it uses the default "DAV:" namespace, so I changed the xpath instructions to

<instruction namespace="http://xmlns.idega.com/com.idega.block.article" property="headline" xpath="/article/headline" />
...
and the XML to (just testing; I hope there isn't a real need for this):
<article xmlns:I="http://xmlns.idega.com/com.idega.block.article">
   <I:headline>Header</I:headline>
   <I:teaser>Teaser</I:teaser>
   <I:body>Body</I:body>
   <I:author>Author</I:author>
   <I:source/>
   <I:comment>Comment</I:comment>
</article>
But sadly it behaves the same way....
Eirikur, idega.
On 11.1.2005, at 22:21, Daniel Florey wrote:
I think the problem might be different. It's been some time since I wrote the SimpleXMLExtractor. If I find the time I'll look at it tomorrow. My first guess would be that it was developed using a different version of JDOM. Something might have changed when Slide was updated to JDOM 1.0, AFAIK.
Cheers,
Daniel
"Slide Developers Mailing List" <[email protected]> wrote on 11.01.05 22:33:44:
Nope...same deal.
The "compiled" path for that example is
/child::article/child::headline
and propertyValue is still null.
-Eirikur, idega.
On 11.1.2005, at 20:27, Flick, Tim wrote:
It may be simpler than you think!
Try this:
"/article/headline"
This should return the value of article/headline, which in your example would be "Header".  I hope?
Regards,
Tim
-----Original Message-----
From: Eirikur Hrafnsson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 11, 2005 2:52 PM
To: Slide Developers Mailing List
Subject: Re: Need simple XPath help
The problem is that I am trying to get the SimpleXMLExtractor to extract those values as properties for the Lucene index, but this code in SimpleXMLExtractor always returns "propertyValue" as null.
    public Map extract(InputStream content) throws ExtractorException {
        Map properties = new HashMap();
        try {
            SAXBuilder saxBuilder = new SAXBuilder();
            Document document = saxBuilder.build(content);
            for (Iterator i = instructions.iterator(); i.hasNext();) {
                Instruction instruction = (Instruction) i.next();
                XPath xPath = instruction.getxPath();
                List nodeList = xPath.selectNodes(document);
                Object propertyValue = filter(nodeList, instruction);
                if (propertyValue != null) {
                    properties.put(instruction.getPropertyName(), propertyValue);
                }
In Domain.xml I have the third try... I obviously know nothing about XPath ;)

    <extractors>
        <extractor classname="org.apache.slide.extractor.SimpleXmlExtractor" uri="/files">
            <configuration>
                <instruction property="headline" xpath="/article/headline[1]" />
                <instruction property="teaser" xpath="/article/teaser[1]" />
                <instruction property="body" xpath="/article/body[1]" />
                <instruction property="author" xpath="/article/author[1]" />
                <instruction property="source" xpath="/article/source[1]" />
                <instruction property="comment" xpath="/article/comment[1]" />
            </configuration>
        </extractor>
...
-Eirikur, Idega.
On 11.1.2005, at 19:36, Daniel Florey wrote:
I assume your examples didn't work... The XPath expression looks correct to me. What exactly is the problem you encountered?
Cheers,
Daniel
"Slide Developers Mailing List" <[email protected]> wrote on 11.01.05 20:18:45:
Does anyone know the XPath to the content of the XML tags in this example? I've tried some paths, but I'm barely a novice in XPath...
<article xmlns="http://idega.com/block/article/bean">
   <headline>Header</headline>
   <teaser>Teaser</teaser>
   <body>Body</body>
   <author>Author</author>
   <source/>
   <comment>Comment</comment>
</article>
I tried e.g. :
"/article/headline[1]/text()"
and just
"/article/headline/text()"
best regards
Eirikur, idega.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Best Regards
Eirikur S. Hrafnsson, [EMAIL PROTECTED]
Chief Software Engineer
Idega Software
http://www.idega.com