Hi,

We are using Tika-data-config.xml to index some of the PDF and other document files in our application, as shown below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <document>
    <entity baseDir="\usr\share\help" dataSource="null"
            fileName=".*\.(PDF|pdf|doc|docx|DOC|DOCX|txt|ppt|xls|csv)$"
            name="f" onError="skip" processor="FileListEntityProcessor"
            recursive="true" rootEntity="false">
      <field column="fileAbsolutePath" name="path"/>
      <field column="fileSize" name="size"/>
      <field column="fileLastModified" name="lastmodified"/>
      <field column="file" name="fileName"/>
      <entity format="text" name="tika-test" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}">
        <field column="Author" meta="true" name="author"/>
        <field column="title" meta="true" name="title"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Now I want to index documents that are stored in the cloud or on another file system. Is it possible to give an S3 (AWS) path, or the path of any other external file system, in the baseDir parameter?

Thanks,
Srinivas
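A possible workaround, sketched under assumptions rather than confirmed: FileListEntityProcessor appears to treat baseDir as a directory on the local file system, so an S3 URI there would most likely be rejected unless the bucket is first mounted locally. If the objects are reachable over HTTP(S) (public objects or pre-signed URLs, which is an assumption), DIH's LineEntityProcessor can read a plain-text list of URLs and pass each one to TikaEntityProcessor through a BinURLDataSource. The list file path /path/to/doc-urls.txt and the data source names listFile and web are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <!-- Hypothetical name "listFile": reads a local text file with one document URL per line -->
  <dataSource name="listFile" type="FileDataSource" encoding="UTF-8"/>
  <!-- Hypothetical name "web": fetches each URL as a binary stream; assumes no AWS request signing is needed -->
  <dataSource name="web" type="BinURLDataSource"/>
  <document>
    <!-- LineEntityProcessor exposes each line in the "rawLine" column; blank lines are skipped -->
    <entity name="u" processor="LineEntityProcessor" dataSource="listFile"
            url="/path/to/doc-urls.txt" skipLineRegex="^\s*$" rootEntity="false">
      <field column="rawLine" name="path"/>
      <!-- TikaEntityProcessor reads the bytes through "web" and extracts text and metadata -->
      <entity name="tika-test" processor="TikaEntityProcessor" dataSource="web"
              url="${u.rawLine}" format="text" onError="skip">
        <field column="Author" meta="true" name="author"/>
        <field column="title" meta="true" name="title"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Producing the URL list itself (for example with pre-signed URLs from the AWS CLI's `aws s3 presign`) would have to happen outside Solr.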