Re: schedule information

Jitu Tue, 23 Dec 2014 04:34:32 -0800

Hi Karl,

I checked the source code. In IncrementalIngester.java, at line 555 (in the
checkFetchDocument() method), the forced metadata of the previous run is
compared with that of the current run; if it has changed, the file is
considered updated. So please advise: how can I send a parameter that
changes on every job execution from the StartupThread class to the output
connector?


Thanks,
Jitu

On Tue, Dec 23, 2014 at 5:32 PM, Jitu <[email protected]> wrote:

> Hi Karl,
>
> Thanks for your support. Here is what I tried. In the run() method of
> StartupThread.java, I create a unique ID called instanceId and store it as
> part of the forced metadata, which is sent to the output connector. That
> all works fine. But when I re-run the same job, all files are crawled
> again every time. Is this because the forced metadata is changing? Is
> forced metadata used to check whether a file has been updated?
>
> Code snippet:
>
>     // Generate a new unique ID for this run of the job.
>     final String instanceId = IDFactory.make(threadContext);
>     // Only now record the fact that we are trying to start the job.
>     connectionMgr.recordHistory(jobDescription.getConnectionName(),
>       null, connectionMgr.ACTIVITY_JOBSTART, null,
>       jobID.toString() + "(" + jobDescription.getDescription() + ")",
>       null, instanceId, null);
>     // Replace any existing forced metadata with the run ID, then persist
>     // the modified job description.
>     jobDescription.clearForcedMetadata();
>     jobDescription.addForcedMetadataValue("JOB_INSTANCE_ID", instanceId);
>     jobManager.save(jobDescription);
>
>
> Thanks,
> Jitu
>
> On Mon, Dec 22, 2014 at 6:58 PM, Karl Wright <[email protected]> wrote:
>
>> Hi Jitu,
>>
>> Your client's needs seem rather unusual, and will potentially be somewhat
>> expensive performance-wise.  So unless I hear from others as well that this
>> is a key feature, there's no point in contributing a patch.
>>
>> You will of course need to keep track of whatever changes you develop so
>> that you can later upgrade to newer versions of MCF.
>>
>> Thanks,
>> Karl
>>
>>
>> On Mon, Dec 22, 2014 at 8:14 AM, Jitu <[email protected]> wrote:
>>
>>> Hi Karl,
>>>
>>> Thanks for the quick reply and support. This is exactly what i was
>>> looking for. Thank you so much. If i modify WorkerThread.java do i need to
>>> submit a patch for the same?
>>>
>>> Thanks,
>>> Jitu
>>>
>>> On Mon, Dec 22, 2014 at 4:12 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi Jitu,
>>>>
>>>> I'm sorry for the miscommunication.  What I meant is that without any
>>>> modifications, you can add the job's name as metadata for all documents
>>>> indexed with the job.
>>>>
>>>> If you need to index hard-wired metadata for every job run, you will
>>>> need to modify WorkerThread.java.  The IJobDescription object is readily
>>>> available there, but you will also need to write a SQL query to obtain the
>>>> job's start time.
>>>>
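The WorkerThread change described above can be sketched roughly as follows: for each document, look up the current run's start time and add it to the metadata before the document goes to the output connector. This is a minimal sketch under stated assumptions, not MCF code. The `JobStartTimeLookup` interface is a hypothetical stand-in for the SQL query Karl mentions (the actual table and column names are not given in this thread), and the field name `job_start_time` is a placeholder.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class RunMetadataSketch {
  // Hypothetical stand-in for the SQL query that reads the job's start time
  // from the MCF database; the real query and schema are not shown here.
  @FunctionalInterface
  interface JobStartTimeLookup {
    long startTimeMillis(long jobId);
  }

  // Hypothetical field name for the hard-wired per-run metadata.
  static final String RUN_FIELD = "job_start_time";

  // Returns a copy of the document's metadata with the run's start time
  // (ISO 8601, UTC) added, leaving the caller's map untouched.
  static Map<String, String> withRunStartTime(Map<String, String> metadata,
      long jobId, JobStartTimeLookup lookup) {
    Map<String, String> result = new HashMap<>(metadata);
    result.put(RUN_FIELD, Instant.ofEpochMilli(lookup.startTimeMillis(jobId)).toString());
    return result;
  }

  public static void main(String[] args) {
    // Fake lookup for illustration: 1418815380000 ms is 2014-12-17T11:23:00Z.
    JobStartTimeLookup fakeLookup = jobId -> 1418815380000L;
    Map<String, String> doc = Map.of("title", "report.pdf");
    System.out.println(withRunStartTime(doc, 42L, fakeLookup).get(RUN_FIELD));
  }
}
```

Looking the start time up once per run and caching it would keep the extra query cost down; whether this field also takes part in MCF's change detection is not established in this thread.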
>>>> Karl
>>>>
>>>>
>>>> On Mon, Dec 22, 2014 at 4:33 AM, Jitu <[email protected]> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> Thanks for the quick reply and support. I have gone through the source
>>>>> code of "ForcedMetadataConnector.java" as well as the end-user
>>>>> documentation at
>>>>> "http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster".
>>>>> It says we can add a string constant for every job run, but my client
>>>>> wants to know which files were crawled in each run of the job. To search
>>>>> for those, I need to send a unique ID for every job run as part of the
>>>>> metadata. Because this ID changes with every run, I cannot use
>>>>> ForcedMetadataConnector. You advised, "It's certainly possible to add the
>>>>> current job's start time field as hard-wired metadata." Please let me
>>>>> know how to achieve this.
>>>>>
>>>>> Thanks,
>>>>> Jitu
>>>>>
>>>>> On Fri, Dec 19, 2014 at 1:09 PM, Karl Wright <[email protected]> wrote:
>>>>>
>>>>>> Hi Jitu,
>>>>>>
>>>>>> You can certainly add a unique string associated with a job to every
>>>>>> document using the Metadata Adjuster transformation connector (which of
>>>>>> course can be the job name).  The time of indexing is already sent as a
>>>>>> metadata field (can't remember which one off the top of my head, but I'm
>>>>>> sure you can find it).  What you can't get, mainly because it basically
>>>>>> has little meaning in MCF, is the time the job was started.  It's
>>>>>> certainly possible to add the current job's start time field as
>>>>>> hard-wired metadata, but I bet your client would prefer the actual time
>>>>>> of indexing of the document anyhow.
>>>>>>
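The per-document indexing time mentioned above could, on the search side, answer "which files were indexed in this run" with a Solr date range query between the run's start and end. The sketch below only builds the query string. The field name `indexed_time` is a placeholder (the thread does not name the field MCF sends), and the run boundaries are assumed to be known, for example from the job's history.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class SolrRunQuerySketch {
  // Solr date syntax: UTC, ISO 8601, e.g. 2014-12-17T11:23:00.000Z.
  // Milliseconds are kept so the end bound is not truncated.
  private static final DateTimeFormatter SOLR_DATE =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

  // Builds an inclusive Solr range query of the form field:[start TO end].
  static String runRangeQuery(String field, Instant runStart, Instant runEnd) {
    if (runEnd.isBefore(runStart)) {
      throw new IllegalArgumentException("run end precedes run start");
    }
    return field + ":[" + SOLR_DATE.format(runStart) + " TO " + SOLR_DATE.format(runEnd) + "]";
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2014-12-17T11:23:00Z");
    Instant end = Instant.parse("2014-12-17T11:58:30Z");
    // "indexed_time" is a placeholder field name.
    System.out.println(runRangeQuery("indexed_time", start, end));
    // prints indexed_time:[2014-12-17T11:23:00.000Z TO 2014-12-17T11:58:30.000Z]
  }
}
```

This approach leaves the job's forced metadata untouched, so it does not by itself change which documents are considered updated.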
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 19, 2014 at 2:30 AM, Jitu <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> Thanks for all your support. One of our customers needs job schedule
>>>>>>> information to be sent through the output connector. Basically, my
>>>>>>> customer wants to use Solr search to find out which files were indexed
>>>>>>> in a single job run.
>>>>>>>
>>>>>>> For example, if my job ran on 17 Dec 2014 at 11:23 AM, I would send a
>>>>>>> unique string such as "JobName 17-12-2014 11:23" as part of each file's
>>>>>>> metadata to the Solr output connector. A Solr search on this string would
>>>>>>> then return all files indexed in that job run.
>>>>>>>
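The label in this example can be built from the job name and the run's start time. A minimal sketch, assuming the start time is available as a `LocalDateTime` in the server's local time zone; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class RunLabelSketch {
  // Matches the example format "JobName 17-12-2014 11:23" (day-month-year, 24-hour clock).
  private static final DateTimeFormatter LABEL_TIME =
      DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm");

  // Hypothetical helper: combine the job name with the run's start time.
  static String runLabel(String jobName, LocalDateTime runStart) {
    return jobName + " " + LABEL_TIME.format(runStart);
  }

  public static void main(String[] args) {
    LocalDateTime start = LocalDateTime.of(2014, 12, 17, 11, 23);
    System.out.println(runLabel("JobName", start)); // JobName 17-12-2014 11:23
  }
}
```

The label has minute resolution, so two runs of the same job started within the same minute would share a label; adding seconds to the pattern would avoid that.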
>>>>>>> Please correct me if I am wrong, or suggest how to achieve this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jitu
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
