Thanks Karl. I've launched the job a couple of times with a small set of documents, and what I see is that Elasticsearch reindexes every document each time, even though the size of each document is always the same and I don't notice any dynamic HTML content (like the current time) that could cause the checksum to differ.
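One caveat on the size observation: two fetches can have exactly the same length and still produce different checksums. The minimal Python sketch below assumes a SHA-256 digest of the raw response body. The hash algorithm and the helper name `content_checksum` are illustrative assumptions, not necessarily what the Web Connector uses. It shows that a page whose only change is a rendered timestamp keeps its size but not its checksum.

```python
import hashlib


def content_checksum(body: bytes) -> str:
    """Hex SHA-256 digest of a fetched page body (hypothetical helper)."""
    return hashlib.sha256(body).hexdigest()


# Two fetches of the "same" page that differ only in an embedded timestamp.
fetch_1 = b"<html><body>News<span>06:27:19</span></body></html>"
fetch_2 = b"<html><body>News<span>06:27:20</span></body></html>"

same_size = len(fetch_1) == len(fetch_2)
same_checksum = content_checksum(fetch_1) == content_checksum(fetch_2)
print(f"same size: {same_size}, same checksum: {same_checksum}")
```

Comparing digests of two saved responses, rather than their sizes, is therefore a more reliable way to confirm that nothing in the page changes between runs.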
Consulting the "Simple history" report shows the Elastic output connector being called: "08-23-2018 06:27:19.274 Indexation (Elasticsearch 2.4.6)". So I guess there is a misconfiguration somewhere...

On Thu, Aug 23, 2018 at 1:45, Karl Wright (<[email protected]>) wrote:

> Hi Gustavo,
>
> I take it from your question that you are using the Web Connector?
>
> All connectors create a version string that is used to determine whether
> content needs to be reindexed or not. The Web Connector's version string
> uses a checksum of the page contents; we found the "last modified" header
> to be unreliable, if I recall correctly.
>
> Thanks,
> Karl
>
>
> On Wed, Aug 22, 2018 at 12:35 PM Gustavo Beneitez <[email protected]> wrote:
>
>> Hi everyone,
>>
>> I am currently creating a job that indexes part of our Liferay intranet
>> content.
>> Every time the job is executed, the documents are fully reindexed in
>> Elastic, even though they didn't change.
>> I thought I had read somewhere that the crawler uses the "Last-Modified"
>> HTTP header, but also that it saves a hash into the database.
>> I looked for the right answer in the user's manual but had no luck, so
>> could you please tell me which one is correct?
>>
>> Thanks in advance!
>>
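The version-string mechanism Karl describes can be sketched as follows. This is a simplified illustration under stated assumptions, not ManifoldCF's actual code: the names `version_string`, `needs_reindex` and the in-memory `stored_versions` dict are hypothetical, and the version is assumed to be a SHA-256 digest of the page body. A document is sent to the output connector only when it is new or its version differs from the one recorded on the previous run.

```python
import hashlib


def version_string(body: bytes) -> str:
    # Hypothetical: derive the version from a checksum of the page content,
    # following the idea described in the thread (not ManifoldCF's real code).
    return hashlib.sha256(body).hexdigest()


def needs_reindex(doc_id: str, body: bytes, stored_versions: dict[str, str]) -> bool:
    """Return True if doc_id is new or its content changed; record the new version."""
    new_version = version_string(body)
    if stored_versions.get(doc_id) == new_version:
        return False  # unchanged: skip sending it to the output connector
    stored_versions[doc_id] = new_version
    return True


store: dict[str, str] = {}
page = b"<html><body>Intranet page</body></html>"
print(needs_reindex("http://intranet/page", page, store))         # first crawl
print(needs_reindex("http://intranet/page", page, store))         # same content
print(needs_reindex("http://intranet/page", page + b" ", store))  # content changed
```

Under this model, a page that is reindexed on every run either changes between fetches (even by a single byte) or is not finding its previous version in the store.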
