The "path" attribute is not meant to include terminal file names, only
directories.  I'm surprised that this works at all.  The file spec part
should include the file name.
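
As a rough illustration of that split (a sketch only, not code from the connector): the directory part of a full file location goes into the "path" attribute and the file name goes into a "file" include rule. The helper names split_share_path and build_startpoint are hypothetical, and the leading "/" on the filespec is an assumption that simply mirrors the "\/*.pdf" style rules in the job JSON quoted below.

```python
import json
import posixpath


def split_share_path(full_path):
    """Split a forward-slash share path into (directory, file name)."""
    # Drop any trailing slash first so the last component is the file name.
    directory, file_name = posixpath.split(full_path.rstrip("/"))
    return directory, file_name


def build_startpoint(full_path):
    """Build a startpoint node shaped like the ones in the quoted job JSON.

    Hypothetical helper: the directory goes into "_attribute_path" and the
    file name goes into the filespec of a single "file" include rule.
    """
    directory, file_name = split_share_path(full_path)
    return {
        "_type_": "startpoint",
        "_attribute_path": directory,
        "_value_": "",
        "include": [
            {
                "_attribute_type": "file",
                "_attribute_indexable": "yes",
                # Assumption: leading slash, as in the "/*.pdf" rules below.
                "_attribute_filespec": "/" + file_name,
                "_value_": "",
            }
        ],
    }


if __name__ == "__main__":
    example = "windows/Job/Demo School Network/Information/restpuntion.docx"
    print(json.dumps(build_startpoint(example), indent=2))
```

Whether the connector wants exactly this filespec form for a single file is worth checking against a job created through the UI.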

Karl


On Wed, Aug 11, 2021 at 2:14 AM ritika jain <ritikajain5...@gmail.com>
wrote:

> *Dynamic Job*
>
> {"job":{"_children_":[{"_type_":"id","_value_":"1628595470228"},{"_type_":"description","_value_":"DEMo
>  TEMP 
> API-1628595484"},{"_type_":"repository_connection","_value_":"Demo_Repo"},{"_type_":"document_specification","_children_":[{"_type_":"startpoint","include":[{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pdf","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.doc","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dotx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dot","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docm","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.ppt","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pptx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wpd","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp5","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp4","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp6","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp7","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.png","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpeg","_value_":"","_attribute_type":"file"},{"_attribute_index
able":"yes","_attribute_filespec":"\/*.gif","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.bmp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsm","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xls","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.doc","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpeg","_value_":"","_attribute_type":"file"},{"_attribute_filespec":"*","_value_":"","_attribute_type":"directory"}],"_attribute_path":*"windows\/Job\/Demo
>  School 
> Network\/Information\/restpuntion.docx"*,"_value_":""},{"_type_":"maxlength","_value_":"","_attribute_value":"2000000"},{"_type_":"security","_value_":"","_attribute_value":"on"},{"_type_":"sharesecurity","_value_":"","_attribute_value":"on"},{"_type_":"parentfoldersecurity","_value_":"","_attribute_value":"on"}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Tika"},{"_type_":"stage_specification","_children_":[{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"ignoreException","_value_":"","_attribute_value":"true"},{"_type_":"lowerNames","_value_":"","_attribute_value":"false"},{"_type_":"writeLimit","_value_":"","_attribute_value":""},{"_type_":"boilerplateprocessor","_value_":"","_attribute_value":"de.l3s.boilerpipe.extractors.KeepEverythingExtractor"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"1"},{"_type_":"stage_prerequisite","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Metadata
>  
> Adjuster"},{"_type_":"stage_specification","_children_":[{"_type_":"expression","_attribute_parameter":"d_connector_type","_value_":"","_attribute_value":"FileShare"},{"_type_":"expression","_attribute_parameter":"d_description","_value_":"","_attribute_value":"\"${dc:description}\""},{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"filterEmpty","_value_":"","_attribute_value":"true"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"2"},{"_type_":"stage_prerequisite","_value_":"1"},{"_type_":"stage_isoutput","_value_":"true"},{"_type_":"stage_connectionname","_value_":"Deltares_Output"},{"_type_":"stage_specification"}]},{"_type_":"start_mode","_value_":"manual"},{"_type_":"run_mode","_value_":"scan
>  
> once"},{"_type_":"hopcount_mode","_value_":"accurate"},{"_type_":"priority","_value_":"1"},{"_type_":"recrawl_interval","_value_":"86400000"},{"_type_":"max_recrawl_interval","_value_":"infinite"},{"_type_":"expiration_interval","_value_":"infinite"},{"_type_":"reseed_interval","_value_":"3600000"}]}}
>
>
> *Other Manual Job*
>
> {"job":{"_children_":[{"_type_":"id","_value_":"1599130705168"},{"_type_":"description","_value_":"Demo_job"},{"_type_":"repository_connection","_value_":"mas_Repo"},{"_type_":"document_specification","_children_":[{"_type_":"startpoint","include":[{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pdf","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.doc","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docm","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.docb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dot","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.dotx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wpd
>  
> ","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.pptx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.ppt","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp4","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp5","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp6","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.wp7","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsm
>  
> ","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xls","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xls","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsb","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.xlsx","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.png","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.jpeg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.bmp","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.gif","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpeg","_value_":"","_attribute_type":"file"},{"_attribute_indexable":"yes","_attribute_filespec":"\/*.mpg","_value_":"","_attribute_type":"file"},{"_attribute_filespec":"*","_value_":"","_attribute_type":"directory"}],"_attribute_path":"*windows\/Job\/Demo
>  School 
> Network\/Information\*","_value_":""},{"_type_":"maxlength","_value_":"","_attribute_value":"5000000"},{"_type_":"security","_value_":"","_attribute_value":"on"},{"_type_":"sharesecurity","_value_":"","_attribute_value":"on"},{"_type_":"parentfoldersecurity","_value_":"","_attribute_value":"off"}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Tika"},{"_type_":"stage_specification","_children_":[{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"lowerNames","_value_":"","_attribute_value":"false"},{"_type_":"writeLimit","_value_":"","_attribute_value":""},{"_type_":"ignoreException","_value_":"","_attribute_value":"true"},{"_type_":"boilerplateprocessor","_value_":"","_attribute_value":"de.l3s.boilerpipe.extractors.KeepEverythingExtractor"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"1"},{"_type_":"stage_prerequisite","_value_":"0"},{"_type_":"stage_isoutput","_value_":"false"},{"_type_":"stage_connectionname","_value_":"Metadata
>  
> Adjuster"},{"_type_":"stage_specification","_children_":[{"_type_":"expression","_attribute_parameter":"d_connector_type","_value_":"","_attribute_value":"FileShare"},{"_type_":"expression","_attribute_parameter":"d_description","_value_":"","_attribute_value":"\"${dc:description}\"
>  
> "},{"_type_":"keepAllMetadata","_value_":"","_attribute_value":"true"},{"_type_":"filterEmpty","_value_":"","_attribute_value":"true"}]}]},{"_type_":"pipelinestage","_children_":[{"_type_":"stage_id","_value_":"2"},{"_type_":"stage_prerequisite","_value_":"1"},{"_type_":"stage_isoutput","_value_":"true"},{"_type_":"stage_connectionname","_value_":"Deltares_Output"},{"_type_":"stage_specification"}]},{"_type_":"start_mode","_value_":"manual"},{"_type_":"run_mode","_value_":"scan
>  
> once"},{"_type_":"hopcount_mode","_value_":"accurate"},{"_type_":"priority","_value_":"5"},{"_type_":"recrawl_interval","_value_":"86400000"},{"_type_":"max_recrawl_interval","_value_":"infinite"},{"_type_":"expiration_interval","_value_":"infinite"},{"_type_":"reseed_interval","_value_":"3600000"}]}}
>
> These two job structures are essentially identical. The only difference is
> the path: 1) the complete path down to the file location, 2) the path only
> down to the folder.
>
> In the first case the ingested file has a slash at the end; in the second
> case it does not.
>
>
> Thanks,
>
> Ritika
>
>
> On Tue, Aug 10, 2021 at 6:52 PM Karl Wright <daddy...@gmail.com> wrote:
>
>> I am sorry, but I'm having trouble understanding how exactly you are
>> configuring the JCIFS connector in these two cases.    Can you view the job
>> in each case and provide cut-and-paste of the view?
>>
>> Karl
>>
>>
>> On Tue, Aug 10, 2021 at 9:09 AM ritika jain <ritikajain5...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I am using the Windows Shares connector in ManifoldCF 2.14, with
>>> Elasticsearch as the output.
>>> I have created an API that dynamically creates a ManifoldCF job with an
>>> inclusions list and a path, where only one particular file path is
>>> given. Example file path: C:/Users/Dell/Desktop/abc.txt.
>>>
>>> The job is created to crawl only this single file.
>>> *Issue:*
>>> When this job ingests the document into Elasticsearch, a slash is
>>> appended at the end.
>>>
>>> *Ingested file:* C:/Users/Dell/Desktop/abc.txt/
>>>
>>> But when the same file is crawled via the ManifoldCF job settings, with the
>>> path given only down to the folder (manual job creation does not allow a
>>> path down to a particular file, only down to folders), no slash is
>>> appended.
>>>
>>> *Ingested file in this case:*
>>> C:/Users/Dell/Desktop/abc.txt
>>> which matches the original file, as expected.
>>>
>>> *Query*
>>> Why does this happen? It makes searching in ES ambiguous.
>>>
>>> Thanks
>>> Ritika
>>>
>>>
>>>
