On 2010-03-24 15:58, Fábio Aragão da Silva wrote:
Hello there,
I'm developing a piece of code that integrates Solr
with Vignette/OpenText Content Management, meaning Vignette content
instances will be indexed in Solr when published and deleted from Solr
when unpublished. I'm using Solr 1.4, SolrJ and Solr Cell.

I've implemented most of the code and I've run into only a single
issue so far: Vignette Content Management supports the attachment of
multiple binary documents (such as .doc, .pdf or .xls files) to a
single content instance. I am mapping each content instance in
Vignette to a Solr document, but now I have a content instance in
Vignette with multiple binary files attached to it.

So my question is: is it possible to have more than one binary file
indexed into a single document in Solr?

I'm a beginner in Solr, but from what I understand I have two options
to index content using SolrJ: either use UpdateRequest and its
add() method to add a SolrInputDocument to the request (in case the
document doesn't represent a binary file), or use
ContentStreamUpdateRequest and its addFile() method to add a binary
file to the content stream request.

I don't see a way, though, to say "this document is comprised of two
files, a Word document and a PDF, so index them as one document in
Solr using content1 and content2 fields (or merge their content into a
single 'content' field)".

I tried calling addFile() twice (one call for each file). There was no
error, but nothing got indexed either.

ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("file1.doc"));
req.addFile(new File("file2.pdf"));
req.setParam("literal.id", "multiple_files_test");
req.setParam("uprefix", "attr_");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(req);

Any thoughts on this would be greatly appreciated.

Write your own RequestHandler that uses the existing ExtractingRequestHandler to actually parse the streams, and then you combine the results arbitrarily in your handler, eventually sending an AddUpdateCommand to the update processor. You can obtain both the update processor and SolrCell instance from req.getCore().
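
As a rough illustration of the "combine the results" step only, here is a minimal sketch in plain Java. It is not Solr code: TextExtractor is a hypothetical stand-in for the parsing that ExtractingRequestHandler would do, and the field names "content" and "attachment_name" are assumptions chosen for the example. In a real handler the resulting field map would presumably be copied into a SolrInputDocument and handed to the update processor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergedAttachments {

    /** Hypothetical stand-in for whatever turns one binary attachment into plain text. */
    public interface TextExtractor {
        String extract(String fileName, byte[] data);
    }

    /**
     * Builds the field map for ONE document from MANY attachments.
     * Each attachment contributes one value to the multi-valued "content"
     * field and one value to "attachment_name" (both names are assumptions).
     */
    public static Map<String, List<String>> merge(String id,
                                                  Map<String, byte[]> attachments,
                                                  TextExtractor extractor) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("id", Collections.singletonList(id));

        List<String> contents = new ArrayList<>();
        List<String> names = new ArrayList<>();
        // Iterate in the map's own order so values line up across both fields.
        for (Map.Entry<String, byte[]> attachment : attachments.entrySet()) {
            names.add(attachment.getKey());
            contents.add(extractor.extract(attachment.getKey(), attachment.getValue()));
        }
        fields.put("attachment_name", names);
        fields.put("content", contents);
        return fields;
    }
}
```

Keeping one field value per attachment (rather than concatenating everything into one string) leaves the choice between a multi-valued 'content' field and separate content1/content2 fields to the code that builds the final document.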


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
