Re: How to process files in a sorted order
Hey Konstantinos,

The best way to take care of this is, as Cam said, to subclass the crawler and define a new FileFilter (note that there are default ones defined in ProductCrawler). So you could create, e.g., a:

    public class SortedFilesCrawler extends ProductCrawler {
      // new FileFilter defined here

      // override crawl methods
      @Override
      public void crawl() {
        // use your filter
      }

      @Override
      public void crawl(File dirRoot) {
        // use your filter
      }
    }

Hope that helps!

Cheers,
Chris

Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: Konstantinos Mavrommatis kmavromma...@celgene.com
Reply-To: dev@oodt.apache.org
Date: Thursday, November 7, 2013 8:44 PM
To: dev@oodt.apache.org
Subject: How to process files in a sorted order
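A minimal sketch of the ordering step behind the subclassing suggestion above, assuming Java 8 or later. `SortedFileLister` and all of its members are hypothetical names, not OODT classes; the sketch uses only `java.io` and `java.util`, and it leaves out the call that hands each file to the crawler because the thread does not show that hook.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not part of OODT): lists a directory in a fixed order. */
public class SortedFileLister {

  /** Oldest file first; ties on modification time are broken by file name. */
  public static final Comparator<File> OLDEST_FIRST =
      Comparator.comparingLong(File::lastModified).thenComparing(File::getName);

  /** Alphabetical by file name (case-sensitive String ordering). */
  public static final Comparator<File> BY_NAME = Comparator.comparing(File::getName);

  /** Accepts regular files only, so subdirectories are skipped. */
  public static final FileFilter FILES_ONLY = File::isFile;

  /**
   * Returns the entries of dirRoot accepted by the filter, sorted with the
   * given comparator. Does not recurse into subdirectories.
   */
  public static List<File> list(File dirRoot, FileFilter filter, Comparator<File> order) {
    File[] found = dirRoot.listFiles(filter);
    if (found == null) {
      // listFiles returns null when dirRoot is not a directory or an I/O error occurs.
      return new ArrayList<>();
    }
    Arrays.sort(found, order);
    return new ArrayList<>(Arrays.asList(found));
  }
}
```

Inside an overridden `crawl(File dirRoot)`, one would presumably loop over `SortedFileLister.list(dirRoot, SortedFileLister.FILES_ONLY, SortedFileLister.OLDEST_FIRST)` and pass each file to the crawler's per-file handling. Swapping in `BY_NAME` gives the alphabetical variant. Modification time is only a stand-in for "older file"; if derived files can be touched after creation, a name-based or metadata-based order may be more reliable.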
Re: How to process files in a sorted order
Thanks Cameron,

I will try to play with it and I will post a solution if I find one.

Konstantinos

On 11/13/13 3:18 PM, Cameron Goodale sigep...@gmail.com wrote:

Konstantinos,

My name is Cameron and I am a committer on the Apache OODT project. I am not familiar with the internals of crawler, but I don't believe there is a way to accomplish your goal of enforcing a sorting algorithm within the crawler config. I think you will have to write your own crawler that will implement your sorting logic.

Sincerely,
Cameron Goodale
Re: How to process files in a sorted order
Konstantinos,

My name is Cameron and I am a committer on the Apache OODT project. I am not familiar with the internals of crawler, but I don't believe there is a way to accomplish your goal of enforcing a sorting algorithm within the crawler config. I think you will have to write your own crawler that will implement your sorting logic.

Sincerely,
Cameron Goodale

--
Sent from a Tin Can attached to a String
How to process files in a sorted order
Hi,

In my environment I am using cas-crawler to process directories of 1000s of files. The metadata for these files is extracted automatically using the mime-type definitions and small wrapper scripts.

In these directories some of the files are derived from other files, and metadata from the older files needs to be transferred to the newer file. To achieve this I need the files to be processed by cas-crawler from the oldest file to the newest, or in other cases in alphabetical order.

Any ideas how this can be achieved?

The crawler command I currently use is:

    ./crawler_launcher --operation --launchAutoCrawler \
        --productPath $FILEPATH \
        --filemgrUrl $FMURL \
        --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
        --mimeExtractorRepo ../policy/mime-extractor-map.xml

Thanks in advance for your help,
Konstantinos

* THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE. If the reader is not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please reply to the sender to notify us of the error and delete the original message. Thank You. *