https://lucidworks.com/post/indexing-with-solrj/
> On Jun 7, 2020, at 3:22 PM, Fiz N <fiznewy...@gmail.com> wrote:
>
> Thanks Jörn and Erick.
>
> Hi Erick, it looks like the skeletal SolrJ program attachment is missing.
>
> Thanks
> Fiz
>
> On Sun, Jun 7, 2020 at 12:20 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Here’s a skeletal SolrJ program using Tika as another alternative.
>>
>> Best,
>> Erick
>>
>>> On Jun 7, 2020, at 2:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>
>>> You have to write an external application that creates multiple threads, parses the PDFs, and indexes them in Solr. Ideally you parse the PDFs once, store the resulting text on some file system, and then index it. The reason is that if you upgrade Solr by two major versions you might need to reindex. You then save time because you don’t need to parse the PDFs again.
>>> It can also be useful if you are not yet sure about the final schema and need to index several times with different schemas, etc.
>>>
>>> You can also use Apache ManifoldCF.
>>>
>>>> On 07.06.2020 at 19:19, Fiz N <fiznewy...@gmail.com> wrote:
>>>>
>>>> Hello Solr experts,
>>>>
>>>> I am working on a POC to index millions of PDF documents stored in
>>>> multiple folders on a file share.
>>>>
>>>> Could you please let me know the best practices and the steps to implement it?
>>>>
>>>> Thanks
>>>> Fiz Nadiyal.
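The approach Jörn describes (parse each PDF once with several threads, keep the extracted text on disk, then index from that text) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Erick's missing attachment and not an official SolrJ or Tika API. The class `PdfIndexPipeline`, the interfaces `TextExtractor` and `DocSink`, the `.pdf.txt` cache layout, and the `id`/`content` field names are all hypothetical. It uses only the JDK so it runs without Solr. In a real setup, `TextExtractor` might wrap Tika's `AutoDetectParser` with a `BodyContentHandler`, and `DocSink` might turn each map into a `SolrInputDocument` and call `SolrClient.add(...)`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical two-stage pipeline: parse each PDF once into a text cache,
// then index from the cache as often as needed (new Solr version, new schema).
public class PdfIndexPipeline {
    // Hypothetical hook: a real one might wrap Tika's AutoDetectParser.
    public interface TextExtractor { String extract(Path pdf) throws Exception; }
    // Hypothetical hook: a real one might build SolrInputDocuments and call SolrClient.add.
    public interface DocSink { void addBatch(List<Map<String, String>> docs) throws Exception; }

    // Stage 1: extract text from every *.pdf under sourceRoot with a thread pool,
    // mirroring each relative path into cacheRoot as "<name>.pdf.txt".
    // Returns how many PDFs were actually parsed in this run.
    public static int extractAll(Path sourceRoot, Path cacheRoot, TextExtractor extractor, int threads)
            throws IOException, InterruptedException {
        List<Path> pdfs;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            pdfs = walk.filter(Files::isRegularFile)
                    .filter(p -> p.toString().toLowerCase().endsWith(".pdf"))
                    .collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path pdf : pdfs) {
                Path out = cacheRoot.resolve(sourceRoot.relativize(pdf).toString() + ".txt");
                results.add(pool.submit(() -> {
                    if (Files.exists(out)) return false; // parsed in an earlier run: skip
                    String text = extractor.extract(pdf);
                    Files.createDirectories(out.getParent());
                    // Write to a .part file and rename it, so an interrupted run does not
                    // leave a truncated .txt that later runs would treat as finished.
                    Path tmp = out.resolveSibling(out.getFileName() + ".part");
                    Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                    Files.move(tmp, out, StandardCopyOption.REPLACE_EXISTING);
                    return true;
                }));
            }
            int parsed = 0;
            for (Future<Boolean> f : results) {
                try {
                    if (f.get()) parsed++;
                } catch (ExecutionException e) {
                    // One corrupt PDF should not abort all the others: report and go on.
                    System.err.println("extraction failed: " + e.getCause());
                }
            }
            return parsed;
        } finally {
            pool.shutdown();
        }
    }

    // Stage 2: read the cached text and hand it to the sink in batches of batchSize.
    // "id" and "content" are assumed field names; adjust them to your schema.
    public static int indexAll(Path cacheRoot, DocSink sink, int batchSize) throws Exception {
        List<Path> texts;
        try (Stream<Path> walk = Files.walk(cacheRoot)) {
            texts = walk.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Map<String, String>> batch = new ArrayList<>();
        int sent = 0;
        for (Path txt : texts) {
            String rel = cacheRoot.relativize(txt).toString();
            Map<String, String> doc = new HashMap<>();
            doc.put("id", rel.substring(0, rel.length() - ".txt".length())); // original PDF path
            doc.put("content", new String(Files.readAllBytes(txt), StandardCharsets.UTF_8));
            batch.add(doc);
            if (batch.size() == batchSize) {
                sink.addBatch(batch);
                sent += batch.size();
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            sink.addBatch(batch);
            sent += batch.size();
        }
        return sent;
    }
}
```

Splitting the work this way is what makes Jörn's point pay off: after a Solr upgrade or a schema change, only `indexAll` needs to run again, and a second `extractAll` over the same folders skips every PDF whose text is already cached. Sending documents in batches rather than one at a time keeps the number of update requests down when there are millions of files.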