Hi Jaya

Text extraction is a step that happens before you put data into Solr. Say
you have PDF or DOC documents: you extract the text (minus unnecessary
formatting details etc.) and store it in Solr. Later you can query it as
you said. I have not worked in the extraction area, but look at this for
an idea:
https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html

`Tika will automatically attempt to determine the input document type
(Word, PDF, HTML) and extract the content appropriately. If you like, you
can explicitly specify a MIME type for Tika with the stream.type parameter.`
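
For the "Outstanding Amount + 3 or 4 words" part, here is a rough sketch of
one way to pull such strings out of the plain memo text, either before
indexing or on the stored text you get back from a query. It is plain
Python with a regular expression, not a Solr feature; the function name
extract_after_key and the sample memo are made up for illustration.

```python
import re


def extract_after_key(text, key, max_words=4):
    """Return each occurrence of `key` plus up to `max_words` following words.

    Hypothetical helper for illustration; not part of Solr or Tika.
    """
    # The key, an optional colon, then up to `max_words` whitespace-separated
    # tokens. re.escape keeps any special characters in the key literal.
    pattern = re.compile(
        re.escape(key) + r"\s*:?\s*(?:\S+\s*){0,%d}" % max_words,
        re.IGNORECASE,
    )
    results = []
    for match in pattern.finditer(text):
        # Trim surrounding whitespace and trailing sentence punctuation.
        results.append(match.group(0).strip().rstrip(".,;"))
    return results


# Made-up sample memo text.
memo = "Memo 17. Outstanding Amount: 1000. Due Date: 2017-10-01."
print(extract_after_key(memo, "Outstanding Amount", max_words=1))
```

If the memos really do follow a fixed template, running something like this
before indexing would let you store the extracted value in its own Solr
field and query that field directly.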



Regards
Nawab


On Thu, Sep 28, 2017 at 6:56 AM, Johnson, Jaya <jaya.john...@moodys.com>
wrote:

> Hi:
> I am trying to ingest a few memos. They do not have any standard format
> (JSON, XML etc.) but are just plain text; however, the memos all follow
> some template. What I would like to do post ingestion is to extract
> keywords and some values around them. Say, for instance, the text contains
> the keyword Outstanding Amount: 1000. I would like to search for
> Outstanding Amount (I can do that using the query interface). How do I
> extract the entire string Outstanding Amount + 3 or 4 words from Solr?
>
> I am really new to Solr, so any documentation etc. would be super helpful.
> Also, is Solr the right tool for this use case?
>
> Thanks.
