Re: Problems with DIH XPath flatten
Here's a sample:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE document [
      <!ENTITY nbsp "&#160;">
      <!ENTITY copy "&#169;">
      <!ENTITY reg "&#174;">
    ]>
    <document>
      <kbml version="-//Indiana University//DTD KBML 0.9//EN">
        <kbq>In Mac OS X, how do I enable or disable the firewall?</kbq>
        <body>
          <p><kbh docid="aghe" access="allowed">Mac OS X<domain>all</domain><visibility>visible</visibility></kbh> includes an easy-to-use <kbh docid="aoru" access="allowed">firewall<domain>all</domain><visibility>visible</visibility></kbh> that can prevent potentially harmful incoming connections from other computers. To turn it on or off:</p>
          <h3>Mac OS X 10.6 (Snow Leopard)</h3>
          <ol><li>From the Apple menu, select <mi>System Preferences...†</mi>. When the <code>System Preferences</code> window appears, from the <mi>View</mi> menu, select <mi>Security</mi>. <br clear="none"/><br clear="none"/> </li><li>Click the <mi>Firewall</mi> tab. ... </li></ol>
        </body>
        <xtra>
          <term weight="0">macos</term>
          <term weight="0">macintosh</term>
          <term weight="0">apple</term>
          <term weight="0">macosx</term>
          ...
        </xtra>
      </kbml>
      <metadata>
        <docid>aozg</docid>
        <owner firstname="" lastname="Macintosh Support">scmac</owner>
        ...
      </metadata>
    </document>

The /document/kbml/kbq works fine, but as you can see, it has no children. The actual content of the document is within the body element, though, which requires some flattening.

Thanks for your time,
Adam

2009/10/6 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
> send a small sample xml snippet you are trying to index and it may help
>
> On Tue, Oct 6, 2009 at 9:29 PM, Adam Foltzer acfolt...@gmail.com wrote:
>> Hi all,
>>
>> I'm trying to set up DataImportHandler to index some XML documents available over web services. The XML includes both content and metadata, so for the indexable content, I'm trying to just index everything under the content tag:
>>
>>     <entity dataSource="kbws" name="kbxml" pk="title" url="resturl"
>>             processor="XPathEntityProcessor" forEach="/document"
>>             transformer="HTMLStripTransformer" flatten="true">
>>       <field column="content" name="content" xpath="/document/kbml/body" flatten="true" stripHTML="true" />
>>       <field column="title" name="title" xpath="/document/kbml/kbq" />
>>     </entity>
>>
>> The result of this is that the title field gets populated and indexed (there are no child nodes of /document/kbml/kbq), but content does not get indexed at all. Since /document/kbml/body has many children, I expected that flatten=true would store all of the body text in the field. Instead, it stores nothing at all. I've tried this with many combinations of transformers and flatten options, and the result is the same each time.
>>
>> Here are the relevant field declarations from the schema (the type=text is just the one from the example's schema.xml). I have tried combinations here as well of stored= and multiValued=, with the same result each time.
>>
>>     <field name="title" type="text" indexed="true" stored="true" multiValued="true" />
>>     <field name="content" type="text" indexed="true" stored="true" multiValued="true" />
>>
>> If it would help troubleshooting, I could send along some sample XML. I don't want to spam the list with an attachment unless it's necessary, though :)
>>
>> Thanks in advance for your help,
>> Adam Foltzer
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com
Problems with DIH XPath flatten
Hi all,

I'm trying to set up DataImportHandler to index some XML documents available over web services. The XML includes both content and metadata, so for the indexable content, I'm trying to just index everything under the content tag:

    <entity dataSource="kbws" name="kbxml" pk="title" url="resturl"
            processor="XPathEntityProcessor" forEach="/document"
            transformer="HTMLStripTransformer" flatten="true">
      <field column="content" name="content" xpath="/document/kbml/body" flatten="true" stripHTML="true" />
      <field column="title" name="title" xpath="/document/kbml/kbq" />
    </entity>

The result of this is that the title field gets populated and indexed (there are no child nodes of /document/kbml/kbq), but content does not get indexed at all. Since /document/kbml/body has many children, I expected that flatten=true would store all of the body text in the field. Instead, it stores nothing at all. I've tried this with many combinations of transformers and flatten options, and the result is the same each time.

Here are the relevant field declarations from the schema (the type=text is just the one from the example's schema.xml). I have tried combinations here as well of stored= and multiValued=, with the same result each time.

    <field name="title" type="text" indexed="true" stored="true" multiValued="true" />
    <field name="content" type="text" indexed="true" stored="true" multiValued="true" />

If it would help troubleshooting, I could send along some sample XML. I don't want to spam the list with an attachment unless it's necessary, though :)

Thanks in advance for your help,
Adam Foltzer
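For readers following the thread, the result the poster expects from flatten=true (all text beneath the body element collected into one value) can be illustrated outside Solr. The following is a minimal sketch, not DataImportHandler's implementation: it uses Python's standard xml.etree.ElementTree, the SAMPLE string is a hypothetical, trimmed-down stand-in for a KBML document, and flatten_text is a helper name invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for a KBML document (not the real data).
SAMPLE = """<document>
  <kbml>
    <kbq>In Mac OS X, how do I enable or disable the firewall?</kbq>
    <body>
      <p><kbh docid="aghe">Mac OS X</kbh> includes an easy-to-use
         <kbh docid="aoru">firewall</kbh>.</p>
      <h3>Mac OS X 10.6 (Snow Leopard)</h3>
      <ol><li>Click the <mi>Firewall</mi> tab.</li></ol>
    </body>
  </kbml>
</document>"""


def flatten_text(root, path):
    """Join all descendant text under the element at `path`.

    Illustrative helper only: whitespace runs are collapsed to single
    spaces, and a missing element yields an empty string.
    """
    node = root.find(path)
    if node is None:
        return ""
    # itertext() walks the element and every descendant in document order.
    return " ".join("".join(node.itertext()).split())


root = ET.fromstring(SAMPLE)  # root is the <document> element
title = flatten_text(root, "kbml/kbq")
content = flatten_text(root, "kbml/body")
print(title)
print(content)
```

Here the title comes from an element with no children, while the content value only has text because the walk descends into p, h3, ol and their children. That is the behaviour the original post is asking flatten=true to provide for /document/kbml/body.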
Re: Problems with DIH XPath flatten
Hi Shalin,

Good question; sorry I forgot it in the initial post. I have tried with both a nightly build from earlier this month (Oct 2, I believe) and a build from the trunk as of yesterday afternoon.

Adam

On Tue, Oct 6, 2009 at 5:04 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> On Tue, Oct 6, 2009 at 9:29 PM, Adam Foltzer acfolt...@gmail.com wrote:
>> Hi all,
>>
>> I'm trying to set up DataImportHandler to index some XML documents available over web services. The XML includes both content and metadata, so for the indexable content, I'm trying to just index everything under the content tag:
>>
>>     <entity dataSource="kbws" name="kbxml" pk="title" url="resturl"
>>             processor="XPathEntityProcessor" forEach="/document"
>>             transformer="HTMLStripTransformer" flatten="true">
>>       <field column="content" name="content" xpath="/document/kbml/body" flatten="true" stripHTML="true" />
>>       <field column="title" name="title" xpath="/document/kbml/kbq" />
>>     </entity>
>>
>> The result of this is that the title field gets populated and indexed (there are no child nodes of /document/kbml/kbq), but content does not get indexed at all. Since /document/kbml/body has many children, I expected that flatten=true would store all of the body text in the field. Instead, it stores nothing at all. I've tried this with many combinations of transformers and flatten options, and the result is the same each time.
>
> Which Solr version are you using? The flatten attribute was introduced after 1.3 was released.
>
> --
> Regards,
> Shalin Shekhar Mangar.