Hi Dileepa,

IMHO, Furkan's approach makes the most sense here. As Olivier pointed out, to retrieve the original content from a Lucene-based index, all the fields you are interested in must be stored. If that is your case, you can probably implement a repository connector. You can enable incremental crawling by querying for all the Solr documents (q=*:*), paginating through the results, and filtering on one of the fields (a last-modified timestamp, for example) to pick up only new or modified documents at each crawl.

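To make the incremental crawl idea concrete, here is a minimal Python sketch. It is not ManifoldCF connector code, only an illustration of the query loop. The request parameters (q, fq, sort, start, rows, wt) are standard Solr ones; the field names `last_modified` and `id`, the `fetch_page` callable, the `http_fetch` helper and the base URL are assumptions made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_params(since_iso, start, rows=100, ts_field="last_modified"):
    """Build the /select parameters for one page of an incremental crawl.

    ts_field is a hypothetical stored date field; "id" is assumed to be
    the unique key and is only used to keep the sort order stable.
    """
    return {
        "q": "*:*",                                # match every document
        "fq": f"{ts_field}:[{since_iso} TO *]",    # keep only recent changes
        "sort": f"{ts_field} asc, id asc",
        "start": start,                            # pagination offset
        "rows": rows,                              # page size
        "wt": "json",
    }


def crawl_changed_docs(fetch_page, since_iso, rows=100):
    """Yield every document modified at or after since_iso, page by page.

    fetch_page(params) must return the parsed Solr JSON response.
    """
    start = 0
    while True:
        params = build_select_params(since_iso, start, rows)
        docs = fetch_page(params)["response"]["docs"]
        if not docs:
            break                                  # no more pages
        yield from docs
        start += len(docs)


def http_fetch(base_url):
    """Return a fetch_page callable for a Solr core at base_url (assumed URL)."""
    def fetch(params):
        with urlopen(f"{base_url}/select?{urlencode(params)}") as resp:
            return json.load(resp)
    return fetch
```

Usage would look like `crawl_changed_docs(http_fetch("http://localhost:8983/solr/mycore"), "2019-08-01T00:00:00Z")`, with the timestamp of the previous crawl saved between runs. For very large result sets, Solr's cursorMark paging is probably a better fit than start/rows offsets.
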
But it seems to make more sense to include your Solr index as a new distributed index alongside the other index (ES or Solr) that you plan to populate using ManifoldCF. The typical components you will need for that are 1) a query adapter to convert the user query into a query language supported by all your indexes (easy in this case, because both can talk Lucene query syntax) and 2) a module to normalise the scores of the results from all your indexes. You can use a min-max approach for normalising, for example. This is quite a typical scenario, so I'm sure you can easily find good literature on how to architect a distributed federated search engine.

Cheers,
Rafa

On Tue, Aug 6, 2019 at 2:52 PM Dileepa Jayakody <dileepajayak...@gmail.com> wrote:

> Hi All,
>
> Thank you for your replies.
>
> @Furkan, Olivier, thanks for the pointers. I will check the approach of
> the Solr repository connector as per the given references.
> @Olivier, if you can contribute the Solr repo-connector you are working on
> to MCF, that will be awesome! I will be looking forward to an update on that.
>
> Regards,
> Dileepa
>
>
> On Mon, Aug 5, 2019 at 5:01 PM Olivier Tavard <
> olivier.tav...@francelabs.com> wrote:
>
>> Hello,
>>
>> We are currently working on this kind of repository connector for a
>> customer. We plan to give the code to the MCF project if the customer lets
>> us do it legally. We will know at the end of the month or at the
>> beginning of next month.
>>
>> For this to work, all the fields of the target Solr need to be stored;
>> this condition is mandatory. You can take a look at the Solr entity
>> processor of the Data Import Handler component:
>> https://lucene.apache.org/solr/guide/8_0/uploading-structured-data-store-data-with-the-data-import-handler.html#entity-processors
>> We were inspired by that for the development of the connector.
>>
>> Best regards,
>>
>> Olivier
>>
>>
>>
>> On 5 Aug 2019 at 16:38, Furkan KAMACI <furkankam...@gmail.com> wrote:
>>
>> Hi Dileepa,
>>
>> Writing a custom repository connector can let you achieve your goal: read
>> the documents and write them directly to an output connector.
>>
>> You should check your requirements, i.e. which data sources you will
>> connect. MCF may rid you of huge integration pains compared to many other
>> ETL tools in your case.
>>
>> On the other hand, if you want to achieve a federated search, you could
>> search across distributed indexes. Otherwise, it is a heterogeneously
>> sourced indexing architecture. You can federate your search query into
>> Solr without ingesting it into any other place. By the way, MCF will let
>> you apply document-level security; you would have to handle that manually
>> in such a case.
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>> On Mon, 5 Aug 2019 at 17:11, Dileepa Jayakody <
>> dileepajayak...@gmail.com> wrote:
>>
>>> Hi Karl and all,
>>>
>>> In my use-case, one of the data-sources is an already populated Solr
>>> index which is an e-commerce web-site data index (customers, products &
>>> services).
>>> Apart from the Solr index, I need to ingest several other heterogeneous
>>> data-sources such as PostgreSQL databases, CRM data etc. into the
>>> federated search index (the output index will be either Solr or
>>> Elasticsearch. We haven't yet finalized the output index, but I know
>>> that both of these are supported in MCF as output connectors).
>>>
>>> @Karl, based on your comments, I would appreciate your opinion on the
>>> ingestion flow below.
>>> Solr repository/data-source > Solr schema transformations >
>>> Solr/Elasticsearch search-index
>>>
>>> For such a scenario, do you think MCF is not the ideal option as the
>>> ETL/ingestion tool? Should I go for a lower-level ETL tool such as Apache
>>> NiFi?
>>> Or will writing an MCF Solr repository connector be useful to achieve
>>> this?
>>> WDYT?
>>>
>>> Thanks a lot.
>>> Regards,
>>> Dileepa
>>>
>>>
>>>
>>> On Mon, Aug 5, 2019 at 3:40 PM Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> If you are trying to extract data from a Solr index, I know of no way
>>>> to do that.
>>>> Karl
>>>>
>>>>
>>>> On Mon, Aug 5, 2019 at 9:08 AM Dileepa Jayakody <
>>>> dileepajayak...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Thanks for your replies.
>>>>> I'm looking for a repository connector. I've used the Solr output
>>>>> connector before. But now what I need is to connect to a Solr index as
>>>>> a repository and retrieve the documents from there. So I need a Solr
>>>>> repository connector.
>>>>>
>>>>> @Karl
>>>>> I will look at the Solr connector, but this is an output connector,
>>>>> isn't it? Can I use this as a repository connector to retrieve docs?
>>>>>
>>>>> Thanks,
>>>>> Dileepa
>>>>>
>>>>> On Mon, Aug 5, 2019 at 12:45 PM Cihad Guzel <cguz...@gmail.com> wrote:
>>>>>
>>>>>> Hi Dileepa,
>>>>>>
>>>>>> You can check the list of all MCF connectors at
>>>>>> https://manifoldcf.apache.org/release/release-2.13/en_US/included-connectors.html
>>>>>>
>>>>>> MCF has a Solr output connector. It is not a repository connector.
>>>>>> If you want to use Solr as a repository, you should write a new
>>>>>> repository connector.
>>>>>>
>>>>>> Regards,
>>>>>> Cihad Guzel
>>>>>>
>>>>>>
>>>>>> On Mon, 5 Aug 2019 at 13:18, Dileepa Jayakody <
>>>>>> dileepajayak...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I'm working on a project which needs to implement a federated search
>>>>>>> solution with heterogeneous data repositories. One repository is a
>>>>>>> Solr index. I would like to use ManifoldCF as the data ingestion
>>>>>>> engine in this project as I have worked with MCF before.
>>>>>>>
>>>>>>> Does ManifoldCF have a Solr repository connector which I can use
>>>>>>> here? Or will I need to implement a new repository connector for Solr?
>>>>>>> Any guidance here is much appreciated.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dileepa
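
As a footnote to the score normalisation point in Rafa's reply at the top of this thread: raw relevance scores from two different indexes are not directly comparable, so each result list is usually rescaled before merging. Below is a minimal Python sketch of the min-max approach, score' = (score - min) / (max - min), applied per index. The function names, the (doc_id, score) tuple shape and the choice to map a list of identical scores to 1.0 are assumptions for illustration, not something prescribed by Solr, Elasticsearch or ManifoldCF.

```python
def min_max_normalize(scores):
    """Rescale a list of raw scores into the range [0, 1]."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All hits scored the same; treat them all as top hits (assumption).
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def merge_results(results_by_index):
    """Merge hits from several indexes into one ranked list.

    results_by_index maps an index name to a list of (doc_id, raw_score).
    Returns (index_name, doc_id, normalized_score) tuples, best first.
    """
    merged = []
    for index_name, hits in results_by_index.items():
        normalized = min_max_normalize([score for _, score in hits])
        for (doc_id, _), norm in zip(hits, normalized):
            merged.append((index_name, doc_id, norm))
    return sorted(merged, key=lambda hit: hit[2], reverse=True)
```

Min-max is simple but sensitive to outliers: a single very high score compresses all the other hits from that index towards zero, so other schemes may be worth comparing.
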