Re: Problems with DIH XPath flatten

2009-10-07 Thread Adam Foltzer
Here's a sample:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE document [
<!ENTITY nbsp "&#160;">
<!ENTITY copy "&#169;">
<!ENTITY reg "&#174;">
]>
<document>
  <kbml version="-//Indiana University//DTD KBML 0.9//EN">
<kbq>In Mac OS X, how do I enable or disable the firewall?</kbq>
<body>
<p><kbh docid="aghe" access="allowed">Mac OS
X<domain>all</domain><visibility>visible</visibility></kbh> includes
an easy-to-use <kbh docid="aoru"
access="allowed">firewall<domain>all</domain><visibility>visible</visibility></kbh>
that
can prevent potentially harmful incoming connections from other
computers. To turn it on or off:</p>


<h3>Mac OS X 10.6 (Snow Leopard)</h3>

<ol><li>From the Apple menu, select <mi>System Preferences...†</mi>.
When the <code>System Preferences</code> window appears, from the
<mi>View</mi> menu, select <mi>Security</mi>.

<br clear="none"/><br clear="none"/>
</li><li>Click the <mi>Firewall</mi> tab.

...

</li></ol>
</body>
<xtra>
  <term weight="0">macos</term>
  <term weight="0">macintosh</term>
  <term weight="0">apple</term>
  <term weight="0">macosx</term>

...

</xtra>
  </kbml>
  <metadata>
<docid>aozg</docid>
<owner firstname="" lastname="Macintosh Support">scmac</owner>

...

  </metadata>
</document>

The /document/kbml/kbq works fine, but as you can see, it has no
children. The actual content of the document is within the body
element, though, which requires some flattening.
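
For illustration only, the kind of flattening meant here (joining every piece of text beneath an element, whatever child tags it is wrapped in) can be sketched outside Solr with Python's standard library. This is not how DIH implements flatten; the SAMPLE string is a trimmed, hypothetical stand-in for the KBML document, and flatten_text is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the KBML sample (not the real document).
SAMPLE = """<document>
  <kbml>
    <kbq>How do I enable the firewall?</kbq>
    <body><p><kbh docid="aghe">Mac OS X</kbh> includes a <kbh docid="aoru">firewall</kbh>.</p></body>
  </kbml>
</document>"""


def flatten_text(root, path):
    """Join all descendant text under the element at `path`; None if absent."""
    node = root.find(path)
    if node is None:
        return None
    # itertext() yields the text of the element and of every descendant,
    # including the tails that follow child elements, in document order.
    return " ".join("".join(node.itertext()).split())


root = ET.fromstring(SAMPLE)

# kbq has no children, so its direct text is the whole title.
title = root.find("kbml/kbq").text

# body starts with a child element, so it has no direct text of its own.
direct = root.find("kbml/body").text

# Flattening collects the text that lives in body's descendants.
flat = flatten_text(root, "kbml/body")

print(title)
print(repr(direct))
print(flat)
```

The contrast between direct and flat mirrors the symptom in this thread: an element with no children yields its text directly, while an element whose text all sits in descendants yields nothing unless the descendants are walked.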

Thanks for your time,
Adam

2009/10/6 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 send a small sample xml snippet you are trying to index and it may help


 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Problems with DIH XPath flatten

2009-10-06 Thread Adam Foltzer
Hi all,

I'm trying to set up DataImportHandler to index some XML documents available
over web services. The XML includes both content and metadata, so for the
indexable content, I'm trying to just index everything under the content
tag:

<entity dataSource="kbws" name="kbxml" pk="title"
        url="resturl" processor="XPathEntityProcessor"
        forEach="/document" transformer="HTMLStripTransformer"
        flatten="true">
  <field column="content" name="content" xpath="/document/kbml/body"
         flatten="true" stripHTML="true" />
  <field column="title" name="title" xpath="/document/kbml/kbq" />
</entity>

The result of this is that the title field gets populated and indexed (there
are no child nodes of /document/kbml/kbq), but content does not get indexed
at all. Since /document/kbml/body has many children, I expected that
flatten=true would store all of the body text in the field. Instead, it
stores nothing at all. I've tried this with many combinations of
transformers and flatten options, and the result is the same each time.

Here are the relevant field declarations from the schema (the type=text is
just the one from the example's schema.xml). I have tried combinations here
as well of stored= and multiValued=, with the same result each time.

<field name="title" type="text" indexed="true" stored="true"
       multiValued="true" />
<field name="content" type="text" indexed="true" stored="true"
       multiValued="true" />

If it would help troubleshooting, I could send along some sample XML. I
don't want to spam the list with an attachment unless it's necessary, though
:)

Thanks in advance for your help,

Adam Foltzer


Re: Problems with DIH XPath flatten

2009-10-06 Thread Adam Foltzer
Hi Shalin,

Good question; sorry I forgot it in the initial post. I have tried with both
a nightly build from earlier this month (Oct 2 I believe) as well as a build
from the trunk as of yesterday afternoon.

Adam

On Tue, Oct 6, 2009 at 5:04 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Which Solr version are you using? The flatten attribute was introduced
 after 1.3 was released.

 --
 Regards,
 Shalin Shekhar Mangar.