Hi Karl,

Unfortunately I don't currently have a copy of your book, so I am asking all of my architectural and configuration questions here. Can you please confirm that by the first option you mean "rescan dynamically" and by the second option "scan every document once"?
Regarding my second question: from List Output Connections, if I click View for an existing SOLR connection and then click "Re-ingest all associated documents", what changes occur within ManifoldCF? Does this action delete any records from the existing tables?

Regards
Anupam

On Tue, Nov 6, 2012 at 11:46 PM, Karl Wright <[email protected]> wrote:
> Hi Anupam,
>
> I'm having difficulty understanding what you posted here, but I will
> try to explain the difference between "rescan dynamically" and "scan
> every document once". You may find more help also in ManifoldCF in
> Action, at http://www.manning.com/wright .
>
> The first option causes your job to run forever. The job runs only in
> the schedule windows allotted for it. It periodically "discovers" new
> documents, and (depending on the crawling model of the connector) may
> check for existence or modification of an already-crawled document.
> Each document has its own schedule for doing this.
>
> The second option is more likely to be what you want. Each job
> starts, runs, and completes, being sure to run only in the scheduling
> windows you provide. You then run it again, and again (or your job
> schedule makes that happen). It will do the minimal work to keep your
> index up to date.
>
> There are significant differences between how you would set up a job
> using one model vs. the other. I strongly suggest you read at least
> the first few chapters of the book.
>
> Karl
>
> On Tue, Nov 6, 2012 at 12:35 PM, Anupam Bhattacharya
> <[email protected]> wrote:
> > My incremental indexing was working previously, but I have messed up a
> > few settings, and now the documents indexed on the previous day get
> > deleted and only the new ones show up. I suspect this is due to the
> > setting in List all Jobs > Edit selected job > Scheduling > Schedule type:
> > "Rescan documents dynamically" OR "Scan every document once". Please let
> > me know the appropriate settings to index only the new documents in the
> > repository.
> >
> > After deleting the SOLR index data folder and clearing the table records
> > in jobqueue, repohistory and ingeststatus, I found that ManifoldCF scans
> > only the remaining new documents, until I go to List Output Connections,
> > click View for the SOLR connection and click OK on "Re-ingest all
> > associated documents". How does it keep track of which documents were
> > ingested previously, so that it fetches only the new documents?
> >
> > Regards
> > Anupam
> >
> >
> > On Tue, Aug 14, 2012 at 10:01 AM, Anupam Bhattacharya <
> > [email protected]> wrote:
> >>
> >> Thanks.
> >>
> >> There is an option to set the Start Method on the Connection tab in the
> >> Job settings. I changed it to "Start when the Schedule window starts",
> >> and the problem got resolved.
> >>
> >> Regards
> >> Anupam
> >>
> >>
> >> On Thu, Aug 2, 2012 at 10:59 PM, Karl Wright <[email protected]> wrote:
> >>>
> >>> Incremental indexing will work the same whether the job is run
> >>> manually or started automatically.
> >>>
> >>> If you have added the appropriate schedule record to your job, you
> >>> also have to select the "run job automatically" radio button on one of
> >>> the other job tabs for automatic runs to take place. I suspect that
> >>> is what you are missing.
> >>>
> >>> Karl
> >>>
> >>> On Thu, Aug 2, 2012 at 1:12 PM, Anupam Bhattacharya <
> >>> [email protected]> wrote:
> >>> > I have a job which indexes properly, including incremental indexing,
> >>> > if initiated/run manually. However, even after adding a specific time
> >>> > for the scheduler to run it, the job does not start on its own.
> >>> >
> >>> > What is the ideal configuration for a job that runs automatically
> >>> > every day at 12 am and does incremental re-indexing (only looking for
> >>> > documents which are new OR modified after the last crawl) of the
> >>> > repository?
> >>> >
> >>> > Is it necessary to enter the total run time when adding a specific
> >>> > schedule time?
> >>> >
> >>> > Regards
> >>> > Anupam
> >
> >
> > --
> > Thanks & Regards
> > Anupam Bhattacharya

--
Thanks & Regards
Anupam Bhattacharya
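The incremental behaviour discussed in this thread can be illustrated with a toy model. This is a simplified sketch built on stated assumptions. It is not ManifoldCF's code, and the class `ToyIngester` and its methods are hypothetical. The model assumes three things:

- Each document carries a version string.
- Only documents whose version changed since the last run are re-sent to the output connection.
- A "scan every document once" run ends by removing previously indexed documents it did not encounter.

Under those assumptions, a run whose repository query returns only new documents deletes the previous day's documents, which matches the symptom described above. "Re-ingest all associated documents" is modeled, again as an assumption, as forgetting the stored versions rather than deleting records.

```python
# Toy model of version-based incremental indexing.
# Hypothetical names; an assumption-based sketch, NOT ManifoldCF's implementation.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ToyIngester:
    # doc_id -> version string last sent to the index (None means "must resend")
    last_version: Dict[str, Optional[str]] = field(default_factory=dict)
    # Simulated search index: ids of the documents currently indexed
    index: Set[str] = field(default_factory=set)

    def run_scan_once_job(self, repo_versions: Dict[str, str]) -> Dict[str, List[str]]:
        """One complete crawl: send new or changed documents, then remove
        previously indexed documents that this run did not encounter."""
        sent: List[str] = []
        skipped: List[str] = []
        for doc_id, version in repo_versions.items():
            if self.last_version.get(doc_id) == version:
                skipped.append(doc_id)  # unchanged since last run: no work
            else:
                self.index.add(doc_id)  # new or changed: (re)send to the index
                self.last_version[doc_id] = version
                sent.append(doc_id)
        # Assumed cleanup phase: documents not seen in this run are deleted.
        deleted = sorted(set(self.last_version) - set(repo_versions))
        for doc_id in deleted:
            self.index.discard(doc_id)
            del self.last_version[doc_id]
        return {"sent": sorted(sent), "skipped": sorted(skipped), "deleted": deleted}

    def reset_output_connection(self) -> None:
        """Assumed model of "re-ingest all": keep the records but forget the
        stored versions, so the next run re-sends every document it sees."""
        for doc_id in self.last_version:
            self.last_version[doc_id] = None


if __name__ == "__main__":
    ingester = ToyIngester()
    print(ingester.run_scan_once_job({"a": "v1", "b": "v1"}))
    # Full repository scanned again, "b" modified, "c" new: only b and c are sent.
    print(ingester.run_scan_once_job({"a": "v1", "b": "v2", "c": "v1"}))
    # Query returns only the newest document: a, b and c are cleaned up.
    print(ingester.run_scan_once_job({"d": "v1"}))
```

The usage at the bottom shows why the repository query of a "scan every document once" job should cover the whole document set rather than only today's documents. In this model, skipping unchanged documents is what keeps each run cheap. Anything the query does not return is treated as gone.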
