Hi Rafa,

Thank you for your valuable suggestions.

On Tue, Aug 13, 2019 at 5:25 PM Rafa Haro <rh...@apache.org> wrote:

> Hi Dileepa,
>
> IMHO, Furkan's approach makes the most sense here. As Olivier pointed out,
> to retrieve the original content from a Lucene-based index, all the fields
> you are interested in must be stored. If that is your case, you can probably
> implement a repository connector. You can enable incremental crawling
> by querying for all the Solr documents (q=*:*), using pagination and using
> one of the fields as a filter to locate only new or modified documents at
> each crawl.
>
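To make the incremental-crawl idea above concrete, here is a minimal sketch
of the query parameters such a crawl could send to Solr's /select handler.
It is only an illustration, not ManifoldCF connector code: the field name
last_modified and the unique key id are assumptions about the schema, and
the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the schema has an indexed date field updated on every change,
# and the unique key field is called "id". Adjust both to the real schema.
MODIFIED_FIELD = "last_modified"
UNIQUE_KEY = "id"


def build_crawl_params(since_iso, start=0, rows=100):
    """Build Solr /select parameters for one page of an incremental crawl."""
    return {
        "q": "*:*",                                        # match every document
        "fq": f"{MODIFIED_FIELD}:[{since_iso} TO *]",      # only new or modified docs
        "sort": f"{MODIFIED_FIELD} asc, {UNIQUE_KEY} asc",  # stable order across pages
        "start": start,                                    # page offset
        "rows": rows,                                      # page size
        "wt": "json",
    }


def page_offsets(num_found, rows=100):
    """Return the start offsets needed to page through num_found results."""
    return list(range(0, num_found, rows))


# Parameters for the first page of documents changed since the last crawl.
params = build_crawl_params("2019-08-13T00:00:00Z", start=0, rows=100)
query_string = urlencode(params)
```

The crawler would remember the time of the previous crawl, pass it as
since_iso, and request one page per offset until all matches are read.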
> That said, it seems to make more sense to include your Solr index as a new
> distributed index alongside the other index (ES or Solr) that you plan to
> populate using ManifoldCF. The typical components you will need for that
> are 1) a query adapter to convert the user query into a query language
> supported by all your indexes (easy in this case, because both can talk
> Lucene query syntax) and 2) a module to normalize the scores of the results
> from all your indexes. You can use a min-max approach for normalizing, for
> example.
>
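For reference, the min-max normalization mentioned above could look roughly
like the sketch below. It assumes each index returns a list of
(doc_id, score) pairs; the function names and the sample hits are made up
for illustration and are not part of any Solr, Elasticsearch or ManifoldCF
API.

```python
def min_max_normalize(hits):
    """Rescale the (doc_id, score) pairs of one index to the range [0, 1]."""
    if not hits:
        return []
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so treat every hit as equally relevant.
        return [(doc_id, 1.0) for doc_id, _ in hits]
    return [(doc_id, (score - lo) / (hi - lo)) for doc_id, score in hits]


def merge_results(*result_lists):
    """Normalize each result list separately, then merge by descending score."""
    merged = []
    for hits in result_lists:
        merged.extend(min_max_normalize(hits))
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


# Made-up hits from two indexes whose raw scores are on different scales.
solr_hits = [("solr-1", 12.0), ("solr-2", 8.0), ("solr-3", 4.0)]
es_hits = [("es-1", 3.0), ("es-2", 2.0), ("es-3", 1.0)]
merged = merge_results(solr_hits, es_hits)
```

Normalizing per index before merging keeps an index with larger raw scores
from dominating the combined ranking.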
Are you referring to a query-time-merge approach instead of an
index-time-merge approach (getting all the content into a central index
first) as a solution here? If so, yes, I am considering that approach as
well. However, my concern is that some data sources are RDF data stores,
and querying RDF stores is typically slow in my view, so the whole search
would then be slow with a query-federation approach. WDYT?

Thanks,
Dileepa

>
> This is quite a typical scenario, so I'm sure you can easily find good
> literature about how to architect a distributed federated search engine.
>
> Cheers,
> Rafa
>
> On Tue, Aug 6, 2019 at 2:52 PM Dileepa Jayakody <dileepajayak...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> Thank you for your replies.
>>
>> @Furkan, Olivier, thanks for the pointers. I will check the Solr
>> repository connector approach as per the given references.
>> @Olivier, if you can contribute the Solr repo-connector you are working
>> on to MCF, that would be awesome! I will be looking forward to an update
>> on that.
>>
>> Regards,
>> Dileepa
>>
>>
>> On Mon, Aug 5, 2019 at 5:01 PM Olivier Tavard <
>> olivier.tav...@francelabs.com> wrote:
>>
>>> Hello,
>>>
>>> We are currently working on this kind of repository connector for a
>>> customer. We plan to give the code to the MCF project if the customer lets
>>> us do it legally. We will know it at the end of the month or at the
>>> beginning of next month.
>>>
>>> For this to work, all the fields of the target Solr index need to
>>> be stored; this condition is mandatory. You can take a look at the Solr
>>> entity processor of the Data Import Handler component:
>>> https://lucene.apache.org/solr/guide/8_0/uploading-structured-data-store-data-with-the-data-import-handler.html#entity-processors
>>> We were inspired by that for the development of the connector.
>>>
>>> Best regards,
>>>
>>> Olivier
>>>
>>>
>>>
>>> On Aug 5, 2019, at 16:38, Furkan KAMACI <furkankam...@gmail.com> wrote:
>>>
>>> Hi Dileepa,
>>>
>>> Writing a custom repository connector can let you achieve your goal: it
>>> would read the documents and write them directly to an output connector.
>>>
>>> You should check your requirements, i.e. which data sources you will
>>> connect. MCF may rid you of huge integration pains compared to many other
>>> ETL tools in your case.
>>>
>>> On the other hand, if you want to achieve a federated search, you could
>>> search across distributed indexes. Otherwise, it is a heterogeneously
>>> sourced indexing architecture. You can federate your search query to Solr
>>> without ingesting its content anywhere else. By the way, MCF lets you
>>> apply document-level security; with federated search you would have to
>>> handle it manually.
>>>
>>> Kind Regards,
>>> Furkan KAMACI
>>>
>>> On Mon, Aug 5, 2019 at 17:11, Dileepa Jayakody <
>>> dileepajayak...@gmail.com> wrote:
>>>
>>>> Hi Karl and all,
>>>>
>>>> In my use-case, one of the data-sources is an already populated Solr
>>>> index which holds e-commerce web-site data (customers, products &
>>>> services).
>>>> Apart from the Solr index, I need to ingest several other heterogeneous
>>>> data-sources such as PostgreSQL databases, CRM data, etc. into the
>>>> federated search index (the output index will be either Solr or
>>>> Elasticsearch. We haven't yet decided on the output index, but I know
>>>> that both of these are supported in MCF as output connectors).
>>>>
>>>> @Karl, based on your comments, I would appreciate your opinion on the
>>>> ingestion flow below.
>>>> Solr repository/data-source > Solr schema transformations >
>>>> Solr/Elasticsearch search-index
>>>>
>>>> For such a scenario, do you think MCF is not the ideal option as the
>>>> ETL/ingestion tool? Should I go for a lower-level ETL tool such as Apache
>>>> NiFi?
>>>> Or will writing an MCF Solr repository connector be useful to achieve
>>>> this?
>>>> WDYT?
>>>>
>>>> Thanks a lot.
>>>> Regards,
>>>> Dileepa
>>>>
>>>>
>>>>
>>>> On Mon, Aug 5, 2019 at 3:40 PM Karl Wright <daddy...@gmail.com> wrote:
>>>>
>>>>> If you are trying to extract data from a Solr index, I know of no way
>>>>> to do that.
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Aug 5, 2019 at 9:08 AM Dileepa Jayakody <
>>>>> dileepajayak...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Thanks for your replies.
>>>>>> I'm looking for a repository connector. I've used the Solr output
>>>>>> connector before. But now what I need is to connect to a Solr index as a
>>>>>> repository and retrieve the documents from there. So I need a Solr
>>>>>> repository connector.
>>>>>>
>>>>>> @Karl
>>>>>> I will look at the Solr connector, but this is an output connector,
>>>>>> isn't it? Can I use this as a repository connector to retrieve docs?
>>>>>>
>>>>>> Thanks,
>>>>>> Dileepa
>>>>>>
>>>>>> On Mon, Aug 5, 2019 at 12:45 PM Cihad Guzel <cguz...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Dileepa,
>>>>>>>
>>>>>>> You can check the list of all MCF connectors at
>>>>>>> https://manifoldcf.apache.org/release/release-2.13/en_US/included-connectors.html
>>>>>>>
>>>>>>> MCF has a Solr output connector. It is not a repository connector.
>>>>>>> If you want to use Solr as a repository, you should write a new
>>>>>>> repository connector.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Cihad Guzel
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 5, 2019 at 13:18, Dileepa Jayakody <dileepajayak...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I'm working on a project which needs to implement a federated
>>>>>>>> search solution with heterogeneous data repositories. One repository
>>>>>>>> is a Solr index. I would like to use ManifoldCF as the data ingestion
>>>>>>>> engine in this project as I have worked with MCF before.
>>>>>>>>
>>>>>>>> Does ManifoldCF have a Solr repository connector which I can use
>>>>>>>> here? Or will I need to implement a new repository connector for Solr?
>>>>>>>> Any guidance here would be much appreciated.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Dileepa
>>>>>>>>
>>>>>>>
>>>
