On 06/28/2011 12:04 PM, Chris Hostetter wrote:

: I'm streaming over the document content (presumably via Tika) and it's
: gathering the document's metadata, which includes the keywords metadata field.
: Since I'm also passing that field from the DB to the REST call as a list (as
: you suggested) there is a collision because the keywords field is single
: valued.
:
: I can change this behavior using a copy field.  What I wanted to know is if
: there was a specific reason the default schema defined a field like keywords
: single valued so I could make sure I wasn't missing something before I changed
: things.

That file is just an example, you're absolutely free to change it to meet
your use case.

I'm not very familiar with Tika, but based on the comment in the example
config...

    <!-- Common metadata fields, named specifically to match up with
      SolrCell metadata when parsing rich documents such as Word, PDF.
      Some fields are multiValued only because Tika currently may return
      multiple values for them.
    -->

...I suspect it was intentional that that field is *not* multiValued (I
guess Tika always returns a single delimited value?), but if you have
multiple discrete values you want to send for your DB-backed data, there is
no downside to changing that.
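For instance, the schema.xml change is a one-attribute edit. This is a minimal sketch that assumes your keywords field uses a field type named "text". Keep whatever type and other attributes your own schema already declares for it:

```xml
<!-- keywords: marked multiValued so each keyword from the DB can be sent
     as a separate value. The type "text" is an assumption; reuse the type
     your schema already has for this field. -->
<field name="keywords" type="text" indexed="true" stored="true" multiValued="true"/>
```

After changing the schema, restart Solr (or reload the core). Reindexing is only needed if existing documents should pick up the change.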

: While I'm at it, I'd REALLY like to know how to use DIH to index the metadata
: from the database while simultaneously streaming over the document content and
: indexing it.  I've never quite figured it out yet but I have to believe it is
: a possibility.

There's a TikaEntityProcessor that can be used to have Tika crunch the
data that comes from an "entity" and extract out specific fields, and it
can be used in combination with a JdbcDataSource and a BinFileDataSource
so that a field in your db data specifies the name of a file on disk to
use as the Tika entity's input. I've personally never tried it, though.

Here's a simple example someone posted last year that they got working...

http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html
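Roughly, the JdbcDataSource + BinFileDataSource + TikaEntityProcessor combination described above might look like the following data-config.xml. This is an untested sketch. The JDBC driver, URL and credentials, the "documents" table, and its id/title/keywords/file_path columns are all hypothetical. It also assumes the schema has id, title, keywords (multiValued) and text fields:

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; replace driver, URL and credentials -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/docs"
              user="solr" password="secret"/>
  <!-- Binary data source that reads files from disk for Tika to parse -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: one DB row per document (table/columns are hypothetical) -->
    <entity name="doc" dataSource="db" transformer="RegexTransformer"
            query="SELECT id, title, keywords, file_path FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <!-- Split a comma-delimited DB column into separate values -->
      <field column="keywords" name="keywords" splitBy=","/>
      <!-- Inner entity: Tika parses the file whose path is in the current row -->
      <entity name="content" dataSource="bin" processor="TikaEntityProcessor"
              url="${doc.file_path}" format="text">
        <!-- "text" holds the body text Tika extracted -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Only Tika metadata explicitly mapped with meta="true" fields is copied into the document. Because this sketch maps none, Tika's own keywords value would not collide with the DB-supplied one.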



-Hoss


Thanks Hoss, I'll just change the schema then.

The problem with TikaEntityProcessor is that this installation is still running v1.4.1, so I'll need to upgrade.

Any short and sweet instructions for upgrading to 3.2? I have a pretty straightforward Tomcat install. Would just dropping in the new war suffice?


- Tod
