CONNECTORS-551

FWIW, I will not be able to look at this for another few hours, most likely.

Karl

On Tue, Oct 9, 2012 at 9:31 AM, Karl Wright <[email protected]> wrote:
> Hi Martin,
>
> FWIW, the agents startup sequence also does not have logic which
> deletes documents or jobs.
>
> Nevertheless I will create a ticket and have a look at this ASAP.
>
> Karl
>
> On Tue, Oct 9, 2012 at 9:25 AM, Martin Gielow <[email protected]> wrote:
>> I have just completed testing the behaviour on the unaltered
>> multiprocess-example using the provided HSQL instance.
>>
>> Indeed, when using the file system connector, Manifold works as it should.
>> The agent can be stopped and restarted and the previously processed
>> documents are retained. When I tried the JDBC (pointed to a MySQL DB) and
>> Wiki connectors, however, I received the same results as yesterday - all
>> documents are deleted as soon as the agent restarts (not on shutdown but
>> when running the agent again after it has been stopped).
>>
>> For the JDBC connector I could imagine that this may somehow be related to
>> flawed seeding or version queries (although I believe them to be ok), but in
>> the case of Wiki there are hardly any settings I believe I could have gotten
>> wrong.
>>
>>
>> On Mon, Oct 8, 2012 at 6:58 PM, Karl Wright <[email protected]> wrote:
>>>
>>> I just tried this; the experiment yields no document deletions
>>> recorded in the simple history (as expected).
>>>
>>> So clearly there is a complicating factor somewhere that you will need to
>>> find.
>>>
>>> I would suggest going about the basic process of eliminating
>>> variables. For example, try a continuous crawl in your environment
>>> using the file system connector on a moderately-sized set of sample
>>> documents, and see if it seems to do the same thing as the other
>>> connectors you are using. If it does, then that would suggest that
>>> one of your modifications was in fact causing the problem. If not,
>>> then I should look at trying to repeat the experiment here with one of
>>> the connectors you are working with.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Mon, Oct 8, 2012 at 12:22 PM, Karl Wright <[email protected]> wrote:
>>> > There is no logic whatsoever in agents-shutdown that should delete
>>> > documents from the queue and from the index, and I have never seen
>>> > this behavior before, but this is really easy to verify. It should be
>>> > simple to take an unaltered 1.0 distribution, create a filesystem job
>>> > on the multiprocess example, start it crawling continuously, then stop
>>> > and restart the agents process, and then look at the simple history to
>>> > see whether any documents get deleted or not. I may have time to try
>>> > this later in the evening, we'll see.
>>> >
>>> > Karl
>>> >
>>> > On Mon, Oct 8, 2012 at 12:06 PM, Martin Gielow <[email protected]> wrote:
>>> >> Hi Karl,
>>> >>
>>> >> thanks for the lightning-speed reply! :)
>>> >>
>>> >> On Mon, Oct 8, 2012 at 5:23 PM, Karl Wright <[email protected]> wrote:
>>> >>>
>>> >>> Hi Martin,
>>> >>>
>>> >>> The behavior you describe is expected only if you are either deleting
>>> >>> the job, or the job is set to expire old documents after a certain
>>> >>> time interval (and that interval has transpired).
>>> >>>
>>> >>> Can you tell me what your expiration interval is?
>>> >>>
>>> >>
>>> >> The expiration interval is set to 1440 (minutes, according to the
>>> >> interface). I also just tried to leave the box empty, so that there
>>> >> should be no expiration, but the behaviour remained the same.
>>> >>
>>> >>>
>>> >>> Also, when you say "shutting down agents process", can you clarify
>>> >>> what deployment model you are using? How are you shutting down this
>>> >>> process?
>>> >>
>>> >>
>>> >> I am using a slightly modified version of the multiprocess-example with
>>> >> postgres as the DBMS. To run and shutdown the agents I use the batch
>>> >> files that are provided with the example (start-agents.bat and
>>> >> stop-agents.bat).
>>> >> I have also tried to run the agents process from Eclipse to be able to
>>> >> debug into it and was getting the same results.
>>> >>
>>> >>>
>>> >>> Thanks,
>>> >>> Karl
>>> >>
>>> >>
>>> >> Regards,
>>> >> Martin
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>> On Mon, Oct 8, 2012 at 11:18 AM, Martin Gielow <[email protected]> wrote:
>>> >>> > Hello,
>>> >>> >
>>> >>> > I'm using Manifold to crawl several data sources using the Wiki and
>>> >>> > the JDBC connectors. I have set the associated jobs to run
>>> >>> > continuously so that new documents will be added in a timely manner.
>>> >>> > The problem I am having with this is that whenever the Agent is
>>> >>> > stopped and then restarted, the jobs will delete all of their
>>> >>> > documents (also propagating the deletes to the associated output
>>> >>> > connection) before turning themselves inactive (which they shouldn't
>>> >>> > as they are set to run continuously).
>>> >>> >
>>> >>> > If I then restart the job, in case of the JDBC connection, it is not
>>> >>> > finding any previously added documents and will set itself inactive
>>> >>> > again. In case of the Wiki connection, the documents are also
>>> >>> > deleted, but are successfully reindexed when the job is restarted
>>> >>> > manually.
>>> >>> >
>>> >>> > The only way I found to prevent the jobs from deleting their items
>>> >>> > in this case, was to manually stop the affected jobs before the
>>> >>> > Agent is stopped (using the abort option) and to restart them after
>>> >>> > the Agent has been restarted.
>>> >>> >
>>> >>> >
>>> >>> > I am using the 1.0 release of Manifold and couldn't find anything
>>> >>> > regarding this behaviour in either the documentation or the wiki.
>>> >>> >
>>> >>> > Is there an obvious flaw with my setup or something I may have
>>> >>> > missed in the configuration?
>>> >>> >
>>> >>> > Thanks in advance for any tips!
>>> >>> >
>>> >>> > Regards,
>>> >>> > Martin
>>> >>
>>> >>
>>
>>
