Or could I use a filter in schema.xml, where I define a fieldType and use some
filter that understands XPath?
On 4. Sep 2013, at 11:52 AM, Shalin Shekhar Mangar wrote:
No, that wouldn't work. It seems that you probably need a custom
Transformer to extract the right div content. I do not know if
TikaEntityProcessor supports such a thing.
On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen wrote:
So could I just nest it in an XPathEntityProcessor to filter the HTML, or is
there something like XPath for Tika?
But now I don't know how to pass the text to Tika. What do I put in url and
datasou
I don't know much about Tika but in the example data-config.xml that
you posted, the "xpath" attribute on the field "text" won't work
because the xpath attribute is used only by a XPathEntityProcessor.
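To illustrate the point, here is a rough sketch of where an "xpath" attribute does take effect: on a field that sits directly under an entity using XPathEntityProcessor. The URL, forEach and dataSource name are taken from the config posted in this thread. The dataSource type, the entity name "rec" and the "title" and "url" columns with their paths are assumptions for illustration only, since the layout of docImportUrl.xml is not shown here.

```xml
<dataConfig>
  <!-- Assumption: the XML list is fetched over HTTP, so a URLDataSource is used. -->
  <dataSource name="main" type="URLDataSource"/>
  <document>
    <!-- XPathEntityProcessor emits one row per /docs/doc element. -->
    <entity name="rec" processor="XPathEntityProcessor"
            url="http://127.0.0.1/tkb/internet/docImportUrl.xml"
            forEach="/docs/doc" dataSource="main">
      <!-- xpath is evaluated here because the parent entity is an XPathEntityProcessor. -->
      <!-- Hypothetical columns; the real element names depend on docImportUrl.xml. -->
      <field column="title" xpath="/docs/doc/title"/>
      <field column="url" xpath="/docs/doc/url"/>
    </entity>
  </document>
</dataConfig>
```

The same "xpath" attribute on a field under a TikaEntityProcessor entity is simply not evaluated, which is why the whole page ends up in "text".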
On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen wrote:
I want Tika to index only the content in ... for the
field "text". Unfortunately, it's indexing the whole page. Can't XPath do this?
data-config.xml:
http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc"
dataSource="main">