The indexing screenshot is below.
[image: image.png]
On Tue, Oct 29, 2019 at 7:57 PM Karl Wright wrote:
> I need both ingestion and deletion.
> Karl
>
> On Tue, Oct 29, 2019 at 8:09 AM Priya Arora wrote:
>
>> The history is shown below; it does not indicate any error.
>> [image: 12.JPG]
>>
>> Thanks
>> Priya
>>
>> On Tue, Oct 29, 2019 at 5:02 PM Karl Wright wrote:
>>
>>> What does the history say about these documents?
>>> Karl
>>>
>>> On Tue, Oct 29, 2019 at 6:53 AM Priya Arora wrote:
>>>
>>>> it may be that (a) they weren't found, or (b) that the document specification in the job changed and they are no longer included in the job.
>>>>
>>>> The URLs that were deleted are valid (they do not return a 404 or page-not-found error), and they are not listed in the Exclusion tab of the job configuration.
>>>> These URLs were being indexed earlier, and apart from the index name in Elasticsearch, nothing has changed in the job specification or in the other connectors.
>>>>
>>>> Thanks
>>>> Priya
>>>>
>>>> On Tue, Oct 29, 2019 at 3:40 PM Karl Wright wrote:
>>>>
>>>>> ManifoldCF is an incremental crawler, which means that on every (non-continuous) job run it sees which documents it can find and removes the ones it can't. The history for the documents being deleted should tell you why they are being deleted -- it may be that (a) they weren't found, or (b) that the document specification in the job changed and they are no longer included in the job.
>>>>>
>>>>> Karl
>>>>>
>>>>> On Tue, Oct 29, 2019 at 5:30 AM Priya Arora wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have a query regarding the ManifoldCF job process. I have a job to crawl an intranet site:
>>>>>> Repository type: Web
>>>>>> Output connector type: Elasticsearch
>>>>>>
>>>>>> The job has to crawl around 4-5 lakh (roughly 400,000 to 500,000) records in total. I discarded the previous index, created a new index in Elasticsearch with proper mappings and settings, cleaned the database (PostgreSQL), and started the job again.
>>>>>> While the job runs, it ingests the records properly, but just before finishing (and sometimes midway as well) it starts deleting documents, and it does not index the deleted documents again.
>>>>>>
>>>>>> Can you please tell me whether I am doing anything wrong? Or is this expected ManifoldCF behavior, and if so, why are the documents not being ingested again?
>>>>>>
>>>>>> Thanks and regards
>>>>>> Priya
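Karl's description of incremental crawling can be modeled roughly as below. This is a minimal Python sketch under simplifying assumptions, not ManifoldCF's actual code: the IncrementalJob class, its run() method and the included predicate are hypothetical names, and only a non-continuous job is modeled. On each run, any document that was indexed on the previous run but is not both found and included by the document specification gets deleted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IncrementalJob:
    """Hypothetical model of a non-continuous incremental crawl job.

    Not ManifoldCF's implementation; it only illustrates the rule that
    documents indexed on an earlier run but not ingested on this run
    are removed from the output index.
    """

    indexed: set[str] = field(default_factory=set)

    def run(
        self, found: set[str], included: Callable[[str], bool]
    ) -> tuple[set[str], set[str]]:
        """Run one pass and return (ingested, deleted).

        found: documents the crawler could actually fetch on this run.
        included: stand-in for the job's document specification.
        """
        # Only documents that were found AND match the specification are ingested.
        ingested = {doc for doc in found if included(doc)}
        # Anything indexed before but not ingested now is deleted: either
        # (a) it was not found, or (b) the specification no longer includes it.
        deleted = self.indexed - ingested
        self.indexed = ingested
        return ingested, deleted


job = IncrementalJob()
job.run({"doc-a", "doc-b", "doc-c"}, lambda doc: True)  # first run indexes all three
_, deleted = job.run({"doc-a", "doc-b"}, lambda doc: True)  # doc-c not found this time
print(sorted(deleted))  # ['doc-c']
```

In this model a deleted document comes back only when a later run finds it again and the specification still includes it, which is why the history entry for each deleted document (not found vs. no longer included) is the thing to check.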
