Splitting fields

Joe Fitzgerald Fri, 27 May 2011 06:46:34 -0700

Hello,


I am in an odd position.  The application server I use has built-in
integration with Solr.  Unfortunately, its native capabilities are
fairly limited: it supports only a standard, predefined set of fields
that can be indexed.  As a result, I'm left kludging how I work with
Solr, doing things like packing what I'd like to be multiple, separate
fields into a single Solr field.

As an example, I may put a customer id and name into a single field
called 'custom1'.  Ideally, I'd like this information to be returned in
separate fields; even better would be for them to be indexed as
separate fields, but I can live without the latter.  Currently, I
build a JSON representation of this information, which makes it easy
to deal with when I extract the results, but it all feels wrong.

I do have complete control over the actual Solr installation (just not
the indexing call to Solr), so I was hoping there might be a way to
configure Solr to take my single field and split it into a separate
field for each key in my JSON representation.
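
To make the desired transformation concrete, here is a minimal Python
sketch of the split, written as if it ran on a document before it was
indexed.  Only the 'custom1' field comes from the setup described above;
the JSON keys (customer_id, customer_name) and the custom1_<key> naming
scheme are made up for illustration, and this is not something Solr is
known to do out of the box.

```python
import json


def split_packed_field(doc, field="custom1"):
    """Return a copy of doc with the JSON-packed field expanded into one field per key."""
    expanded = dict(doc)
    raw = expanded.get(field)
    if raw is None:
        return expanded
    try:
        packed = json.loads(raw)
    except (TypeError, ValueError):
        # Not valid JSON: leave the document untouched.
        return expanded
    if not isinstance(packed, dict):
        return expanded
    for key, value in packed.items():
        # Hypothetical naming scheme: custom1_customer_id, custom1_customer_name, ...
        expanded[f"{field}_{key}"] = value
    return expanded


# Example document with illustrative keys packed into 'custom1'.
doc = {"id": "42", "custom1": '{"customer_id": 1001, "customer_name": "Acme"}'}
print(split_packed_field(doc))
```

The original packed field is kept alongside the new ones, so existing
code that reads 'custom1' would keep working.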

 

I don't see anything native to Solr that would do this for me, but a
few features sounded similar, and I was hoping to get some opinions on
how I might move forward with this...

Poly fields, such as the spatial location, might help?  Can I build my
own poly field that would split the main field into subfields?  Do
poly fields let me return the subfields?  I don't quite have my head
around poly fields yet.

Another option, although I suspect it won't be considered a good
approach: what about extending the copyField functionality of
schema.xml to support my needs?  It doesn't seem entirely unreasonable
for copyField to provide a means of extracting only a portion of the
source field's contents to place in the destination field, no?  I'm
sure people more familiar with Solr's architecture could explain why
this isn't really an appropriate thing for Solr to handle (just because
it could doesn't mean it should)...

The other option, and probably the best one, would be to use Solr
directly, bypassing the native integration of my application server,
which we've already done for most cases.  I'd love to go this route, but
I'm having a hard time figuring out how to "easily" reproduce the
functionality provided by my app server integration.  Perhaps someone on
the list could help me with this path forward?  Here is what I'm trying
to accomplish:

I'm indexing documents (text, pdf, html...), but I need to include
fields in my search results that are only available from a db query.
I know how to have Solr index results from a db query, but I'm having
trouble getting it to index the document associated with each record
of that query (the full path/filename is one of the fields of that
query).
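
A rough Python sketch of that pairing, done from outside Solr: run the
db query and, for each row, prepare a request that would send the file
to Solr together with the db values.  It assumes Solr's extracting
request handler (Solr Cell), which accepts literal.<fieldname>
parameters, is enabled; the table, the column names, and the URL are
placeholders, and an in-memory SQLite database stands in for the real
one.  The sketch only builds the requests and does not send anything.

```python
import sqlite3
from urllib.parse import urlencode

# Assumed endpoint: the extracting handler is commonly mapped to /update/extract,
# but host, port and path depend on the installation.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_requests(conn):
    """Yield (file_path, url) pairs, one per row of the database query."""
    # Hypothetical table and columns standing in for the real db query.
    rows = conn.execute(
        "SELECT id, customer_name, file_path FROM documents ORDER BY id"
    )
    for doc_id, customer_name, file_path in rows:
        # literal.<field> attaches a db value to the document extracted from the file.
        params = {
            "literal.id": doc_id,
            "literal.customer_name": customer_name,
        }
        yield file_path, EXTRACT_URL + "?" + urlencode(params)


# Stand-in database with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT, customer_name TEXT, file_path TEXT)")
conn.execute(
    "INSERT INTO documents VALUES ('doc1', 'Acme Corp', '/data/docs/report.pdf')"
)
for file_path, url in build_extract_requests(conn):
    print(file_path, url)
```

Each file would then be posted as the body of its request, with a
commit issued once the batch is done.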

 

I started to try to use the DataImportHandler for this by setting up
a FileDataSource in addition to my JDBC data source.  I tried to use
the FileDataSource to populate a sub-entity based on the db field that
contains the full path/filename, but I wasn't sure how to reference
that db field from the root query/entity.  Before I spent too much time
on it, I also realized I wasn't sure how to get Solr to deal with binary
file types this way; from further reading, it seems I would need to use
Tika.  Can that be done within the confines of the DataImportHandler?

Any advice is greatly appreciated.  Thanks in advance,

 

Joe
