Hi Karl,

I checked the source code. In IncrementalIngester.java, at line 555 in the checkFetchDocument() method, the forced metadata of the previous run is compared with that of the current run. If there is a change, the file is considered updated. So please advise: how can I send a parameter that changes on every job execution from the StartupThread class to the output connector?
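
To check that I am reading the behaviour correctly, here is a minimal standalone sketch of what I think is happening. It is not the actual ManifoldCF code: the class and method names (ForcedMetadataVersionSketch, versionString, needsReindex) are made up for illustration, and I am assuming that the forced metadata is effectively part of what gets compared between runs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in ManifoldCF.
class ForcedMetadataVersionSketch {

    // Builds a comparison string from the document's own version plus the
    // forced metadata (assumption: both take part in the change check).
    static String versionString(String documentVersion, Map<String, List<String>> forcedMetadata) {
        StringBuilder sb = new StringBuilder(documentVersion);
        // Sort keys so that map ordering cannot cause a spurious difference.
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(forcedMetadata).entrySet()) {
            sb.append('+').append(entry.getKey()).append('=')
              .append(String.join(",", entry.getValue()));
        }
        return sb.toString();
    }

    // A document is re-fetched when nothing was recorded for it yet,
    // or when the recorded string differs from the current one.
    static boolean needsReindex(String previousVersion, String currentVersion) {
        return previousVersion == null || !previousVersion.equals(currentVersion);
    }

    public static void main(String[] args) {
        String unchangedFile = "file-v1"; // the file itself did not change

        // Per-run value, as in my StartupThread change: differs on every run.
        Map<String, List<String>> run1 = new HashMap<>();
        run1.put("JOB_INSTANCE_ID", Arrays.asList("1001"));
        Map<String, List<String>> run2 = new HashMap<>();
        run2.put("JOB_INSTANCE_ID", Arrays.asList("1002"));
        System.out.println("per-run id, re-index: "
            + needsReindex(versionString(unchangedFile, run1), versionString(unchangedFile, run2)));

        // Constant value, as with the Metadata Adjuster: identical on every run.
        Map<String, List<String>> constant = new HashMap<>();
        constant.put("JOB_NAME", Arrays.asList("MyJob"));
        System.out.println("constant value, re-index: "
            + needsReindex(versionString(unchangedFile, constant), versionString(unchangedFile, constant)));
    }
}
```

If this matches the real logic, any value that changes per run and is stored as forced metadata will make every unchanged file look updated, which is what I am seeing.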

Thanks,
Jitu

On Tue, Dec 23, 2014 at 5:32 PM, Jitu <[email protected]> wrote:

> Hi Karl,
>
> Thanks for your support. Here is what I tried. In StartupThread.java,
> inside the run method, I am trying to create a unique id called
> InstanceId and store it as part of the forced metadata, which will be
> sent to the output connector. It all works fine. But when I re-run the
> same job again and again, all files are getting crawled again. Is this
> because the forced metadata is getting changed? Is forced metadata used
> to check whether the file is updated or not?
>
> Code snippet:
>
>     final String instanceId = IDFactory.make(threadContext);
>     // Only now record the fact that we are trying to start the job.
>     connectionMgr.recordHistory(jobDescription.getConnectionName(),
>       null,connectionMgr.ACTIVITY_JOBSTART,null,
>       jobID.toString()+"("+jobDescription.getDescription()+")",null,instanceId,null);
>     jobDescription.clearForcedMetadata();
>     jobDescription.addForcedMetadataValue("JOB_INSTANCE_ID", instanceId);
>     jobManager.save(jobDescription);
>
> Thanks,
> Jitu
>
> On Mon, Dec 22, 2014 at 6:58 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Jitu,
>>
>> Your client's needs seem rather unusual, and will potentially be
>> somewhat expensive performance-wise. So unless I hear from others as
>> well that this is a key feature, there's no point in contributing a
>> patch.
>>
>> You will of course need to keep track of whatever changes you develop
>> so that you can later upgrade to newer versions of MCF.
>>
>> Thanks,
>> Karl
>>
>> On Mon, Dec 22, 2014 at 8:14 AM, Jitu <[email protected]> wrote:
>>
>>> Hi Karl,
>>>
>>> Thanks for the quick reply and support. This is exactly what I was
>>> looking for. Thank you so much. If I modify WorkerThread.java, do I
>>> need to submit a patch for the same?
>>>
>>> Thanks,
>>> Jitu
>>>
>>> On Mon, Dec 22, 2014 at 4:12 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Jitu,
>>>>
>>>> I'm sorry for the miscommunication. What I meant is that without
>>>> any modifications, you can add the job's name as metadata for all
>>>> documents indexed with the job.
>>>>
>>>> If you need to index hard-wired metadata for every job run, you
>>>> will need to modify WorkerThread.java. The IJobDescription object
>>>> is readily available there, but you will also need to write a SQL
>>>> query to obtain the job's start time.
>>>>
>>>> Karl
>>>>
>>>> On Mon, Dec 22, 2014 at 4:33 AM, Jitu <[email protected]> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> Thanks for the quick reply and support. I have gone through the
>>>>> source code of "ForcedMetadataConnector.java" as well as the end
>>>>> user document
>>>>> "http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster".
>>>>> It says we can add a string constant for every job run. But my
>>>>> client wants to know which files were crawled in each run of the
>>>>> job. To search for that, I need to send a unique id for every job
>>>>> run as part of the metadata. This unique id changes for every job
>>>>> run, so I cannot use ForcedMetadataConnector. You advised "It's
>>>>> certainly possible to add the current job's start time field as
>>>>> hard-wired metadata". Please let me know how to achieve it.
>>>>>
>>>>> Thanks,
>>>>> Jitu
>>>>>
>>>>> On Fri, Dec 19, 2014 at 1:09 PM, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Jitu,
>>>>>>
>>>>>> You can certainly add a unique string associated with a job to
>>>>>> every document using the Metadata Adjuster transformation
>>>>>> connector (which of course can be the job name). The time of
>>>>>> indexing is already sent as a metadata field (can't remember
>>>>>> which one off the top of my head, but I'm sure you can find it).
>>>>>> What you can't get, mainly because it basically has little
>>>>>> meaning in MCF, is the time the job was started. It's certainly
>>>>>> possible to add the current job's start time field as hard-wired
>>>>>> metadata, but I bet your client would prefer the actual time of
>>>>>> indexing of the document anyhow.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>> On Fri, Dec 19, 2014 at 2:30 AM, Jitu <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> Thanks for all your support. One of our customers needs job
>>>>>>> schedule information to be sent through the output connector.
>>>>>>> Basically, my customer wants to know, using Solr search, which
>>>>>>> files were indexed in one job run.
>>>>>>>
>>>>>>> For example, if my job ran on 17th Dec 2014 at 11:23 AM, then I
>>>>>>> will send a unique string, say "JobName 17-12-2014 11:23", as
>>>>>>> part of the file metadata to the Solr output connector. A Solr
>>>>>>> search on this string will then find all files that were
>>>>>>> indexed as part of that job run.
>>>>>>>
>>>>>>> Please correct me if I am wrong, or suggest how to achieve it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jitu
