Re: Metadata and FullText, indexed at different times - looking for best approach

Erick Erickson Tue, 17 Jul 2012 06:12:44 -0700

In that case, I think your best option is to re-index the entire document,
metadata and all, once you have the text available. Which actually
raises the question of whether you want to index the bare metadata at
all. Does the user actually get any value from a record that has no
text yet? If not, forget DIH and just index the metadata at the point
the text becomes available.
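A minimal sketch of that approach, assuming a small Python indexing agent. When a file arrives, it looks up the metadata record by file name, adds the extracted text, and sends the complete document in a single update. The field names `id` and `fulltext`, the helper names, and the "file base name equals the record key" convention are all hypothetical. Text extraction (e.g. with Tika) is left out.

```python
import json
import os

# Hypothetical schema field names; adjust to the real schema.
ID_FIELD = "id"
TEXT_FIELD = "fulltext"


def key_from_filename(path):
    """Derive the lookup key from a file name, e.g. '/in/ABC123.pdf' -> 'ABC123'.

    Assumes (hypothetically) that the base name without extension equals
    the value of a metadata field used to find the record.
    """
    return os.path.splitext(os.path.basename(path))[0]


def build_full_document(metadata, text):
    """Combine a metadata record with extracted text into one complete document."""
    doc = dict(metadata)  # copy so the caller's record is not mutated
    doc[TEXT_FIELD] = text
    return doc


def update_payload(docs):
    """Serialize documents as a JSON array of plain documents for a JSON update request."""
    return json.dumps(list(docs))


if __name__ == "__main__":
    # Stand-in for the metadata database, keyed by the same value as the file name.
    metadata_db = {"ABC123": {ID_FIELD: "ABC123", "title": "Annual report"}}
    path = "/incoming/ABC123.pdf"
    text = "Extracted body text..."  # would come from a text extractor such as Tika
    meta = metadata_db[key_from_filename(path)]
    print(update_payload([build_full_document(meta, text)]))
```

Because the whole document is rebuilt every time, nothing has to be stored in the index for this to work, which is the main difference from the partial-update route. The resulting JSON array could then be POSTed to the core's update handler with a JSON content type.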


Best
Erick

On Mon, Jul 16, 2012 at 1:43 PM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> Thank you,
>
> I am already on 4alpha. The patch feels a little too unstable for my
> needs and my familiarity with the code.
>
> What about something involving multiple cores? Could I have the full-text
> fields stored in separate cores and somehow (again, with minimal
> hand-coding) search against all those cores and get back a combined
> list of document IDs? Or would that make comparative ranking/sorting
> impossible?
>
> Regards,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Sun, Jul 15, 2012 at 12:08 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>> You've got a couple of choices. There's a new patch in town
>> https://issues.apache.org/jira/browse/SOLR-139
>> that allows you to update individual fields in a doc if (and only if)
>> all the fields in the original document were stored (actually, all the
>> non-copy fields).
>>
>> So if you're storing (stored="true") all your metadata information, you can
>> just update the document when the text becomes available, assuming you
>> know the uniqueKey when you update.
>>
>> Under the covers, this will find the old document, get all the fields, add the
>> new fields to it, and re-index the whole thing.
>>
>> Otherwise, your fallback idea is a good one.
>>
>> Best
>> Erick
>>
>> On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch
>> <arafa...@gmail.com> wrote:
>>> Hello,
>>>
>>> I have a database of metadata and I can inject it into SOLR with DIH
>>> just fine. But I also have documents whose full text I want to extract
>>> and add to the same records as additional fields. I
>>> think DIH allows running Tika at ingestion time, but I may not have
>>> the full-text files at that point (they could arrive days later). I
>>> can match each file to its metadata because the file name matches
>>> one of the metadata fields.
>>>
>>> What is the best approach to do that staggered indexing with minimum
>>> custom code? I guess my fallback position is a custom full-text
>>> indexer agent that re-adds the metadata fields when the file is being
>>> indexed. Is there anything better?
>>>
>>> I am a newbie using v4.0alpha of SOLR (and loving it).
>>>
>>> Thank you,
>>>     Alex.
>>> Personal blog: http://blog.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all
>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>> book)
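For reference, the partial-update route from the SOLR-139 patch mentioned in the thread can be sketched as follows. This assumes the `{"field": {"set": value}}` modifier syntax that atomic updates use in released Solr 4.x; the patch available at the time may have differed. `id` and `fulltext` are hypothetical field names. The second function is only a rough model of the "find the old document, add the new fields, re-index the whole thing" behaviour described above, not Solr's actual code.

```python
import json

ID_FIELD = "id"  # hypothetical uniqueKey field name


def atomic_set_payload(unique_key, new_fields):
    """Build a partial-update request that sets only the given fields.

    The uniqueKey must be known, and every non-copy field of the existing
    document must be stored for the server to rebuild the rest of it.
    """
    doc = {ID_FIELD: unique_key}
    for name, value in new_fields.items():
        doc[name] = {"set": value}  # "set" replaces the field's value
    return json.dumps([doc])


def simulate_server_merge(stored_doc, update_doc):
    """Model of the server side: start from the old document's stored fields,
    overwrite the fields being set, and return the document to re-index."""
    merged = dict(stored_doc)
    for name, value in update_doc.items():
        if isinstance(value, dict) and "set" in value:
            merged[name] = value["set"]
        else:
            merged[name] = value  # plain values such as the uniqueKey
    return merged


if __name__ == "__main__":
    stored = {ID_FIELD: "ABC123", "title": "Annual report"}
    payload = atomic_set_payload("ABC123", {"fulltext": "Extracted body text..."})
    print(payload)
    print(simulate_server_merge(stored, json.loads(payload)[0]))
```

The model makes the stored="true" requirement visible: any field missing from `stored_doc` simply cannot reappear in the re-indexed document.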
