If you are going to write to a file, you might as well write to the log file, since that mechanism is already available.
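
For what it's worth, here is a rough sketch of how the reconciliation could be done offline from the logs once the job has finished. It assumes a hypothetical line format, `[INFO] job=<jobId> event=SEEDED|PROCESSED doc=<documentId>`, which ManifoldCF does not emit by itself; you would have to add those statements in the framework. `LogReconciler` and `unprocessed` are illustrative names, not part of any ManifoldCF API. Because it collects document IDs into sets rather than incrementing counters, it does not matter which agent wrote which line or whether a document was logged more than once.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: finds documents that were seeded but never processed, per job. */
final class LogReconciler {
    // Hypothetical line format (an assumption, not something ManifoldCF logs today):
    //   [INFO] job=<jobId> event=SEEDED|PROCESSED doc=<documentId>
    // Job and document IDs are assumed to contain no whitespace.
    private static final Pattern LINE =
        Pattern.compile("job=(\\S+) event=(SEEDED|PROCESSED) doc=(\\S+)");

    /** Returns, for each job ID, the seeded document IDs with no PROCESSED line. */
    static Map<String, Set<String>> unprocessed(List<String> logLines) {
        Map<String, Set<String>> seeded = new TreeMap<>();
        Map<String, Set<String>> processed = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // not a tracking line, skip it
            }
            Map<String, Set<String>> target =
                m.group(2).equals("SEEDED") ? seeded : processed;
            // Sets absorb duplicate lines from retries or multiple agents.
            target.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(3));
        }
        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : seeded.entrySet()) {
            Set<String> diff = new TreeSet<>(e.getValue());
            diff.removeAll(processed.getOrDefault(e.getKey(), Set.of()));
            missing.put(e.getKey(), diff);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Lines as they might look after concatenating the logs of all agents.
        List<String> lines = List.of(
            "[INFO] job=42 event=SEEDED doc=a",
            "[INFO] job=42 event=SEEDED doc=b",
            "[INFO] job=43 event=SEEDED doc=c",
            "[INFO] job=42 event=PROCESSED doc=a",
            "[INFO] job=43 event=PROCESSED doc=c");
        System.out.println(unprocessed(lines)); // prints {42=[b], 43=[]}
    }
}
```

An empty set for a job means everything seeded for it was also processed. Holding every document ID in memory is fine for a sketch; for very large crawls you would probably sort the log lines externally and compare the streams instead.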

Karl

On Tue, Sep 16, 2014 at 2:44 AM, lalit jangra <[email protected]> wrote:

> Thanks Karl,
>
> Compared to the three methods you suggested, I believe writing to a
> file would be easier; correct me if I am wrong.
>
> What I initially thought was that while the job is running, I need to write
> counter values for each document seeded and processed, as we are calling the
> addSeedDocument() & processDocument() methods for each document. In this
> case, it would not be easy to reconcile after the job is complete, as I have
> loads of data once the job finishes and mapping it would be tough. This is
> why I am trying to avoid a file-based mechanism. I would also hit the
> tracking issue, as we are calling the connector object multiple times and
> have multiple agents running in parallel.
>
> Please suggest.
>
> Regards.
>
> On Tue, Sep 16, 2014 at 11:59 AM, Karl Wright <[email protected]> wrote:
>
>> Hi Lalit,
>>
>> So, let me clarify: you want some independent measure as to whether every
>> document seeded, per job, has been in fact processed?
>>
>> If that is a correct statement, there is by definition no "in code" way
>> to do it, since there are multiple agents running in your setup. Each agent
>> may process some of the documents, and certainly no agent will process all
>> of them. Also, restarting any agents process will lose the information you
>> are attempting to record.
>>
>> So you are stuck with three possibilities:
>>
>> The first possibility is to use [INFO] statements written to the log.
>> This would work, but you don't have the information you need in your
>> connector (specifically the job ID), so you would have to add these logging
>> statements to various places in the ManifoldCF framework.
>>
>> The second possibility is to make use of the history database table,
>> where events are recorded. You could create two new activity types, also
>> written within the framework, for tracking seeding of records and for
>> tracking processing of records. There are already activity types for job
>> start and end.
>>
>> Finally, the third possibility: If you must absolutely avoid the file
>> system, you would have to write a tracking process which allowed ManifoldCF
>> threads to connect via sockets and communicate document seeding and
>> processing events. Once again, within the framework, you would transmit
>> events to the recording process. This system would be at risk of losing
>> tracking data when your tracking process needed to be restarted, however.
>>
>> None of these are trivial to implement. Essentially, keeping track of
>> documents is what MCF uses the database for in the first place, so this
>> requirement is like insisting that there be a second ManifoldCF there to be
>> sure that the first one did the right thing. It's an incredible waste of
>> resources, frankly. Using the log is perhaps the simplest to implement and
>> most consistent with what clients might be expecting, but it has very
>> significant I/O costs. Using the history table has a similar problem,
>> while also putting your database under load. The last solution requires a
>> lot of well-constructed code and remains vulnerable to system instability.
>> Take your pick.
>>
>> Thanks,
>> Karl
>>
>>
>> On Tue, Sep 16, 2014 at 12:54 AM, lalit jangra <[email protected]>
>> wrote:
>>
>>> Greetings,
>>>
>>> As part of an implementation, I need to put a reconciliation mechanism in
>>> place so that it can be verified how many documents have been crawled for a
>>> job, and the same can be displayed in the logs.
>>>
>>> The first thing that came to my mind was to put counters in e.g. the CMIS
>>> connector code, in the addSeed() and processDocuments() methods, and
>>> increase them as we progress. But as I could see for CMIS,
>>> CmisRepositoryConnector.java is getting called for each seeded document to
>>> be ingested, so these counters are not accurate. Is there any method where
>>> I can persist these counters within the code itself, as I do not want to
>>> persist them in the file system?
>>>
>>> Please suggest.
>>> --
>>> Regards,
>>> Lalit.
>>>
>>
>>
>
>
> --
> Regards,
> Lalit.
>
