It's obviously a configuration problem. Are you using the extract update handler? If not, do you have Tika in the pipeline?
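If the handler is missing, registering it in solrconfig.xml looks roughly like this (a minimal sketch; the field mappings are illustrative only, and the extraction contrib jars must also be on the classpath):

```xml
<!-- Registers Tika-based extraction at /update/extract.
     Requires the Solr Cell (extraction) contrib and its dependencies
     to be loaded via <lib> directives or the classpath. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Example mapping only: send extracted body text to a "text" field -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```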
Karl

On Tue, Sep 25, 2018 at 4:24 AM Ronny Heylen <[email protected]> wrote:

> Hi,
> We have been using Solr for a few years, and the server has now been
> transferred to the VMs in our HQ (and reinstalled).
> We are having the following issue: forcing Solr indexation by curl works,
> as we can see from:
>
>   curl "http://gbsloappwp0083.corp.qbe.com:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@z:\qbere_bru\common\testsolr.txt"
>
> which has successfully indexed testsolr.txt, as can be checked by:
>
>   http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=ella
>
> giving:
>
>   <result name="response" numFound="1" start="0">
>
> Searching for john returns 0 files:
>
>   http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=john
>   <result name="response" numFound="0" start="0"/>
>
> and a search for everything also returns 1 file:
>
>   http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=*
>   <result name="response" numFound="1" start="0">
>
> However, launching a job from ManifoldCF doesn't seem to work. We see the
> folder names in the file definition, and we see that the job indexes
> documents (or at least seems to do so), but the Solr API:
>
>   http://gbsloappwp0083.corp.qbe.com:8080/solr/collection1/select?q=*
>
> still returns only 1 file, the one we indexed manually.
>
> If anybody has any suggestion, we would be really grateful.
>
> [email protected]
>
> On Tue, 31 Jul 2018 at 12:12, Karl Wright <[email protected]> wrote:
>
>> Hi Vinay,
>>
>> Dynamic rescan is meant for web crawling and revisits already-crawled
>> documents based on how often they have changed in the past. It is
>> therefore wholly inappropriate for something like a file crawl, since
>> directory contents (one of the kinds of documents there are in a file
>> crawl) change very infrequently.
>>
>> Instead, I recommend that you run complete crawls, non-dynamic.
>> You can even run minimal crawls fairly often, which will pick up new and
>> changed documents, and run non-minimal crawls on a less frequent schedule
>> to capture deletions.
>>
>> Thanks,
>> Karl
>>
>> On Tue, Jul 31, 2018 at 4:05 AM VINAY Bengaluru <[email protected]> wrote:
>>
>>> Hi Karl,
>>> We have set up a scheduler for our jobs with input connector as file
>>> system and output connector as Solr.
>>> We have set up the scheduler as follows:
>>> Schedule type: Rescan documents dynamically
>>> Recrawl interval: blank
>>> Schedule time: appropriate times, with job invocation as "complete".
>>>
>>> We see that the job is not picking up documents at the scheduled
>>> intervals.
>>>
>>> Why doesn't the job pick up new docs at the scheduled interval? Is
>>> anything wrong with our job configuration or our understanding?
>>>
>>> Thanks and regards,
>>> Vinay
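A footnote on the select URLs quoted in this thread: the canonical match-all query in Solr is q=*:* (q=* relies on wildcard expansion against the default field). Checking numFound programmatically is a one-liner; here is a minimal sketch, with an illustrative sample payload standing in for a live request to the gbsloappwp0083 host:

```python
import json

# Sample payload shaped like Solr's standard JSON response writer
# (wt=json); the numbers mirror the thread: one manually indexed doc.
sample = """
{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 1, "start": 0, "docs": []}}
"""

def num_found(payload: str) -> int:
    """Return the total hit count from a Solr select response."""
    return json.loads(payload)["response"]["numFound"]

print(num_found(sample))  # -> 1
```

Comparing this count before and after a ManifoldCF job run (with a commit issued) shows whether the crawl is actually reaching the index.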
