Brian, I created a wiki page with your guidance below:

https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help

Others can feel free to jump on and contribute.

Cheers,
Chris

On Jun 1, 2011, at 2:20 PM, holenoter wrote:

> hey thomas,
> 
> you are using StdProductCrawler, which assumes a *.met file already exists for 
> each file (it has only one precondition, which is the existence of the *.met 
> file) . . . if you want a *.met file generated you will have to use one of 
> the other 2 crawlers.  running: ./crawler_launcher -psc will give you a list 
> of supported crawlers.  you can then run: ./crawler_launcher -h -cid 
> <crawler_id> where crawler_id is one of the ids from the previous command . . 
> . unfortunately i don't think the other crawlers are documented all that 
> extensively . . . MetExtractorProductCrawler will use a single extractor for 
> all files . . . AutoDetectProductCrawler requires a mapping file to be filled 
> out and mime-types defined
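> 
> fyi, the *.met file StdProductCrawler looks for is just the cas metadata xml 
> format sitting next to the product, so for your 1263940095.h5 it would be 
> named 1263940095.h5.met . . . roughly something like this (the keys here are 
> just examples, use whatever your product type actually needs):
> 
> <cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>    <keyval>
>       <key>ProductType</key>
>       <val>GenericFile</val>
>    </keyval>
>    <keyval>
>       <key>Filename</key>
>       <val>1263940095.h5</val>
>    </keyval>
> </cas:metadata>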
> 
> * MetExtractorProductCrawler example configuration can be found in the source:
>  - allows you to specify how the crawler will run your extractor
>  - see the example launch command below, after the AutoDetectProductCrawler notes
> https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
> 
> * AutoDetectProductCrawler example configuration can be found in the source:
>  - uses the same metadata extractor specification file (you will have one of 
> these for each mime-type)
>  - allows you to define your mime-types -- that is, give a mime-type for a 
> given filename regular expression
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
> 
>    - your file might look something like:
> 
>       <mime-info>
>          <mime-type type="product/hdf5">
>             <glob pattern="*.h5"/>
>          </mime-type>
>       </mime-info>
>  - maps your mime-types to extractors
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
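> 
> once the config is sorted, switching your launch over to 
> MetExtractorProductCrawler should be close to the command you already run . . . 
> i haven't tested this exact invocation, so double-check the required options 
> with: ./crawler_launcher -h -cid MetExtractorProductCrawler . . . but something 
> like:
> 
> ./crawler_launcher --crawlerId MetExtractorProductCrawler \
>     --productPath /usr/local/meerkat/data/staging/products/hdf5 \
>     --filemgrUrl http://localhost:9000 \
>     --failureDir /tmp \
>     --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>     --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>     --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor \
>     --metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config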
> 
> Hope this helps . . .
> -brian
> 
> On Jun 01, 2011, at 12:54 PM, Thomas Bennett <[email protected]> wrote:
> 
>> Hi,
>> 
>> I've successfully got the CmdLineIngester working with an ExternMetExtractor 
>> (written in Python).
>> 
>> However, when I try to launch the crawler I get a warning telling me that the 
>> preconditions for ingest have not been met. No .met file has been created.
>> 
>> Two questions:
>> 1) I'm just wondering if there is any configuration that I'm missing.
>> 2) Where should I start hunting in the code or logs to find out why my met 
>> extractor was not run?
>> 
>> Kind regards,
>> Thomas
>> 
>> For your reference, here is the command and output.
>> 
>> bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath 
>> /usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl 
>> http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile 
>> MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer 
>> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory 
>> --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor 
>> --metExtractorConfig 
>> /usr/local/meerkat/extractors/katextractor/katextractor.config
>> http://localhost:9000
>> StdProductCrawler
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>> INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> INFO: Handling file 
>> /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> WARNING: Failed to pass preconditions for ingest of product: 
>> [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
>> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
