Re: How to process files in a sorted order

2013-11-19 Thread Chris Mattmann
Hey Konstantinos,

The best way to take care of this is, as Cam said, to subclass the
crawler and define a new FileFilter (note there are default ones defined
in ProductCrawler). So you could create, e.g.:

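import java.io.File;
import java.io.FileFilter;
import org.apache.oodt.cas.crawl.ProductCrawler;

public class SortedFilesCrawler extends ProductCrawler {
   // new FileFilter defined here

   // override crawl methods
   @Override
   public void crawl() {
      // use your filter, then sort the accepted files before processing
   }

   @Override
   public void crawl(File dirRoot) {
      // use your filter, then sort the accepted files before processing
   }
}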
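
Note that a java.io.FileFilter only decides which files are accepted; the ordering itself has to come from sorting the listing. Below is a minimal sketch of that step in plain Java, assuming nothing about the crawler internals. The class SortedListing and its members are hypothetical names, not part of OODT, and how the sorted array would be fed into the overridden crawl methods is left open.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of OODT: lists one directory in a fixed order.
final class SortedListing {

    // Accepts regular files only, so sub-directories are left out of the listing.
    static final FileFilter FILES_ONLY = new FileFilter() {
        @Override
        public boolean accept(File f) {
            return f.isFile();
        }
    };

    // Oldest first by last-modified time; ties are broken by file name.
    static final Comparator<File> OLDEST_FIRST = new Comparator<File>() {
        @Override
        public int compare(File a, File b) {
            int byTime = Long.compare(a.lastModified(), b.lastModified());
            return byTime != 0 ? byTime : a.getName().compareTo(b.getName());
        }
    };

    // Alphabetical by file name (case-sensitive).
    static final Comparator<File> BY_NAME = new Comparator<File>() {
        @Override
        public int compare(File a, File b) {
            return a.getName().compareTo(b.getName());
        }
    };

    // Returns the regular files in dir, sorted with the given comparator.
    static File[] list(File dir, Comparator<File> order) {
        File[] files = dir.listFiles(FILES_ONLY);
        if (files == null) {
            // listFiles returns null if dir is not a directory or cannot be read
            return new File[0];
        }
        Arrays.sort(files, order);
        return files;
    }
}
```

SortedListing.list(dir, SortedListing.OLDEST_FIRST) would cover the oldest-to-newest case, and SortedListing.BY_NAME the alphabetical one.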

Hope that helps!

Cheers,
Chris


Chris Mattmann
chris.mattm...@gmail.com




-----Original Message-----
From: Konstantinos Mavrommatis kmavromma...@celgene.com
Reply-To: dev@oodt.apache.org
Date: Thursday, November 7, 2013 8:44 PM
To: dev@oodt.apache.org dev@oodt.apache.org
Subject: How to process files in a sorted order

Hi,
In my environment I am using cas-crawler to process directories containing
thousands of files. The metadata for these files is extracted automatically
using the mimetype definitions and small wrapper scripts.
In these directories some of the files are derived from other files, and
metadata from the older files needs to be transferred to the newer files.
To achieve this I need the cas-crawler to process the files from oldest
to newest or, in other cases, in alphabetical order.
Any ideas how this can be achieved?

The crawler command I currently use is:
./crawler_launcher --operation --launchAutoCrawler \
  --productPath $FILEPATH --filemgrUrl $FMURL \
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
  --mimeExtractorRepo ../policy/mime-extractor-map.xml

Thanks in advance for your help
Konstantinos

*
THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS
CONFIDENTIAL AND MAY CONTAIN LEGALLY PRIVILEGED
INFORMATION INTENDED ONLY FOR THE USE OF THE INDIVIDUAL
OR INDIVIDUALS NAMED ABOVE.
If the reader is not the intended recipient, or the
employee or agent responsible to deliver it to the
intended recipient, you are hereby notified that any
dissemination, distribution or copying of this
communication is strictly prohibited. If you have
received this communication in error, please reply to the
sender to notify us of the error and delete the original
message. Thank You.
* 




Re: How to process files in a sorted order

2013-11-15 Thread Konstantinos Mavrommatis
Thanks Cameron,
I will try to play with it and I will post a solution if I find one.
Konstantinos





Re: How to process files in a sorted order

2013-11-13 Thread Cameron Goodale
Konstantinos,

My name is Cameron and I am a committer on the Apache OODT project.  I am
not familiar with the internals of the crawler, but I don't believe there is
a way to enforce a sort order through the crawler config.  I think you will
have to write your own crawler that implements your sorting logic.
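
One way such sorting logic might look, as a rough sketch in plain Java that does not use any OODT classes: a depth-first walk that sorts each directory's entries before visiting them. SortedWalk and filesInOrder are hypothetical names; a real crawler would presumably ingest each file as it is reached rather than collect it into a list.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OODT: visits a directory tree in sorted order.
final class SortedWalk {

    // Returns every regular file under root, in the order a sorted walk reaches it.
    static List<File> filesInOrder(File root) {
        List<File> out = new ArrayList<File>();
        collect(root, out);
        return out;
    }

    private static void collect(File dir, List<File> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // not a directory, or it could not be read
            return;
        }
        // File implements Comparable<File> and compares pathnames lexicographically
        Arrays.sort(entries);
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collect(entry, out); // descend before moving on to the next sibling
            } else if (entry.isFile()) {
                out.add(entry);
            }
        }
    }
}
```

Sorting per directory keeps memory use proportional to the largest single directory plus the result list, rather than sorting the whole tree at once.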



Sincerely,


Cameron Goodale







-- 

Sent from a Tin Can attached to a String


How to process files in a sorted order

2013-11-07 Thread Konstantinos Mavrommatis
Hi,
In my environment I am using cas-crawler to process directories containing
thousands of files. The metadata for these files is extracted automatically
using the mimetype definitions and small wrapper scripts.
In these directories some of the files are derived from other files, and
metadata from the older files needs to be transferred to the newer files.
To achieve this I need the cas-crawler to process the files from oldest
to newest or, in other cases, in alphabetical order.
Any ideas how this can be achieved?

The crawler command I currently use is:
./crawler_launcher --operation --launchAutoCrawler \
  --productPath $FILEPATH --filemgrUrl $FMURL \
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
  --mimeExtractorRepo ../policy/mime-extractor-map.xml

Thanks in advance for your help
Konstantinos
