I would strongly suggest going directly to the repositories rather than the
Solr index, where possible, as the source for the documents you are
indexing.  This is MCF's standard use case.  It is meant to handle
disparate repositories all going into a single output.  Effort is made in
every repository connector to handle the following:

- Document content
- Document metadata
- Document security information

Much of this is lost if you use Solr as an intermediate repository.  The
Solr index does not contain the original document, but just a list of terms
and term frequencies.  I do not know how you would reconstruct the document
properly for indexing into Elasticsearch.
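
To illustrate the point about terms and term frequencies, here is a minimal
Python sketch.  It is not MCF or Solr code: SourceDocument, allow_tokens and
index_terms are made-up names, and it assumes the fields were indexed without
also being stored.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SourceDocument:
    """Hypothetical stand-in for what a repository connector hands over."""
    content: str
    metadata: dict = field(default_factory=dict)
    allow_tokens: list = field(default_factory=list)  # security information


def index_terms(doc: SourceDocument) -> Counter:
    """Toy 'index': keep only lowercased terms and their frequencies."""
    return Counter(doc.content.lower().split())


a = SourceDocument("dog bites man", {"author": "alice"}, ["group:editors"])
b = SourceDocument("man bites dog", {"author": "bob"}, ["group:public"])

# Two different documents collapse to the same term/frequency list, so the
# original text, metadata and security cannot be recovered from it alone.
same_index = index_terms(a) == index_terms(b)
print(same_index)
```

Going to the original repository avoids this, because the connector sees the
content, metadata and security information before anything is thrown away.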

Karl


On Mon, Aug 5, 2019 at 10:11 AM Dileepa Jayakody <dileepajayak...@gmail.com>
wrote:

> Hi Karl and all,
>
> In my use-case, one of the data-sources is an already populated Solr index
> which is an e-commerce web-site data index (customers, products &
> services).
> Apart from the Solr index, I need to ingest several other heterogeneous
> data sources, such as PostgreSQL databases, CRM data, etc., into the
> federated search index (the output index will be either Solr or
> Elasticsearch. We haven't yet finalized the output index, but I know that
> both of these are supported in MCF as output connectors.).
>
> @Karl, based on your comments, I would appreciate your opinion on the
> ingestion flow below.
> Solr repository/data-source > Solr schema transformations >
> Solr/Elastic-search search-index
>
> For such a scenario, do you think MCF is not the ideal option as the
> ETL/ingestion tool?  Should I go for a lower-level ETL tool such as Apache
> NiFi?
> Or will writing an MCF Solr repository connector be useful to achieve this?
> WDYT?
>
> Thanks a lot.
> Regards,
> Dileepa
>
>
>
> On Mon, Aug 5, 2019 at 3:40 PM Karl Wright <daddy...@gmail.com> wrote:
>
>> If you are trying to extract data from a Solr index, I know of no way to
>> do that.
>> Karl
>>
>>
>> On Mon, Aug 5, 2019 at 9:08 AM Dileepa Jayakody <
>> dileepajayak...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Thanks for your replies.
>>> I'm looking for a repository connector. I've used the Solr output
>>> connector before. But now what I need is to connect to a Solr index as a
>>> repository and retrieve the documents from there. So I need a Solr
>>> repository connector.
>>>
>>> @Karl
>>> I will look at the Solr connector, but this is an output connector, isn't
>>> it? Can I use this as a repository connector to retrieve docs?
>>>
>>> Thanks,
>>> Dileepa
>>>
>>> On Mon, Aug 5, 2019 at 12:45 PM Cihad Guzel <cguz...@gmail.com> wrote:
>>>
>>>> Hi Dileepa,
>>>>
>>>> You can check the full list of MCF connectors at
>>>> https://manifoldcf.apache.org/release/release-2.13/en_US/included-connectors.html
>>>>
>>>> MCF has a Solr output connector. It is not a repository connector. If
>>>> you want to use Solr as a repository, you will need to write a new
>>>> repository connector.
>>>>
>>>> Regards,
>>>> Cihad Guzel
>>>>
>>>>
>>>> On Mon, Aug 5, 2019 at 1:18 PM Dileepa Jayakody
>>>> <dileepajayak...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I'm working on a project which needs to implement a federated search
>>>>> solution with heterogeneous data repositories. One repository is a Solr
>>>>> index. I would like to use ManifoldCF as the data ingestion engine in this
>>>>> project as I have worked with MCF before.
>>>>>
>>>>> Does ManifoldCF have a Solr repository connector which I can use here?
>>>>> Or will I need to implement a new repository connector for Solr?
>>>>> Any guidance here is much appreciated.
>>>>>
>>>>> Thanks,
>>>>> Dileepa
>>>>>
>>>>
