Hi Ricky,

On Nov 3, 2011, at 2:04 PM, Nguyen, Ricky wrote:

> Cool. Thanks Chris. Briefly, what's the reasoning behind this design decision?

No problem. The rationale mainly has to do with the lifecycle of the Metadata object itself, and with the notion of versioning. Versioning is really a process to generate the final data store references, and is used in conjunction with data transfer as part of the archiving process. So intrinsically, versioning is co-located with wherever the data transfer occurs.

In addition, there is no guarantee that changes the versioner makes to the copy of the Metadata object provided to it are written back. That's because the metadata object is read-only at that point in time. This is done to reduce scoping, and to ensure that the only pieces of code that can modify that metadata object are:

1. client-side metadata extractors
2. server-side metadata extractors

This is mainly due to the desire to localize changes to metadata in order to eventually persist the metadata in the catalog.

The above is depicted pictorially here, in the final Use Case section:

http://oodt.apache.org/components/maven/filemgr/development/developer.html

> Also, when developing custom metExtractors, what factors go into the decision
> whether to use client-side vs server-side extraction for a particular
> ProductType?

Great question. Client-side metadata extractors plug into, e.g., the Curator, the Crawler, etc. These are particularly useful for stand-alone types of extraction processes.

Server-side metadata extractors are useful for derived metadata; for catch-all situations where you want to make sure certain fields are filled; and where you want to co-locate metadata extraction (and associated library dependencies) with the file manager server.

HTH!
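
To make the versioning behavior above concrete, here is a minimal stand-alone Java sketch. It does not use the OODT classes: `VersionerSketch`, `buildPath`, `version`, the `VersionedBy` key, and the plain `Map` standing in for the Metadata object are all hypothetical names chosen for illustration. It only models two points from this thread: a path template such as "/[MRN]/[Filename]" resolves to "null" segments when those fields are not in the metadata handed to the versioner, and anything the versioner adds to its copy does not flow back.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone illustration only; none of these names come from OODT.
class VersionerSketch {

    // Matches placeholders such as [MRN] or [Filename] in a path template.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[([^\\]]+)\\]");

    // Substitutes metadata values into the template. A key that is absent
    // from the metadata is rendered as the text "null".
    static String buildPath(String template, Map<String, String> met) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(template, last, m.start());
            out.append(met.get(m.group(1))); // appends "null" when the key is missing
            last = m.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }

    // Hypothetical versioning step: it works on its own copy of the metadata,
    // so anything it adds is not written back to the caller's metadata.
    static String version(String template, Map<String, String> met) {
        Map<String, String> copy = new LinkedHashMap<>(met);
        copy.put("VersionedBy", "VersionerSketch"); // stays local to the copy
        return buildPath(template, copy);
    }

    public static void main(String[] args) {
        String template = "/[MRN]/[Filename]";

        // Case (2): the fields would come from server-side extractors, so they
        // are not in the metadata handed to the versioner.
        Map<String, String> met = new LinkedHashMap<>();
        System.out.println(version(template, met)); // prints /null/null

        // Case (1): client-side extraction has already filled the fields in.
        met.put("MRN", "1010209");
        met.put("Filename", "vps_demog.csv");
        System.out.println(version(template, met)); // prints /1010209/vps_demog.csv

        // The versioner's addition never reached the caller's metadata.
        System.out.println(met.containsKey("VersionedBy")); // prints false
    }
}
```

In this sketch the fix for the "null" path is simply to have the fields present before the versioner runs, which corresponds to using a client-side extractor or having the versioner derive the fields itself.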
Cheers,
Chris

> On Nov 3, 2011, at 12:54 PM, Mattmann, Chris A (388J) wrote:
>
>> On Nov 3, 2011, at 2:44 AM, Nguyen, Ricky wrote:
>>
>>> in short:
>>> (1) client-side metExtractor + versioner = all client-extracted met is
>>> available to the versioner
>>> (2) server-side metExtractor + versioner = server-extracted met is NOT
>>> available to the versioner (unless, as Chris suggested, versioner re-runs
>>> server-side metExtractor)
>>>
>>> Is (2) expected behavior?
>>
>> Yep, sure is.
>>
>> Cheers,
>> Chris
>>
>>> -Ricky
>>>
>>> On Nov 2, 2011, at 10:16 PM, Mattmann, Chris A (388J) wrote:
>>>
>>>> Hi Ricky,
>>>>
>>>> You're running into the issue of where/when Versioning is done.
>>>>
>>>> Right now you are using a server-side met extractor -- that metadata is
>>>> extracted on the server side, cataloged,
>>>> but is _not_ passed back to the client, for use in client-side versioning
>>>> (which I'm guessing you're using).
>>>>
>>>> One way around this is to take an approach similar to the
>>>> FinalFileLocationExtractor -- that is: make your
>>>> versioner run the server side met extractor as part of its versioning
>>>> process to derive the same metadata
>>>> that you want used for versioning. Or, alternatively, bake in somehow (to
>>>> the metadata stream that you
>>>> use in read-only form in the versioner) the field that you are interested
>>>> in flowing through.
>>>>
>>>> HTH!
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On Nov 2, 2011, at 4:20 PM, Nguyen, Ricky wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> My MetadataBasedFileVersioner can't see the met produced by my custom
>>>>> metExtractor
>>>>>
>>>>> I've read OODT-72. That issue describes using the Versioner's calculated
>>>>> Reference to assign Metadata (ver -> met). My issue is the opposite
>>>>> direction, using extracted Metadata in the Versioner's Reference
>>>>> calculation.
>>>>>
>>>>> For example, suppose my metExtractor assigns a value to the "MRN"
>>>>> element.
>>>>> Then I want my versioner to create a datastore reference at
>>>>> "/[MRN]/[Filename]".
>>>>>
>>>>> My product-types.xml (abbreviated):
>>>>> <type name="CustomProdType">
>>>>>   <versioner class="CustomMetBasedFileVersioner"/>
>>>>>   <extractor class="CoreMetExtractor"/>
>>>>>   <extractor class="MimeTypeExtractor"/>
>>>>>   <extractor class="MRNExtractor"/>
>>>>>   <extractor class="FinalFileLocationExtractor"/>
>>>>> </type>
>>>>>
>>>>> After I ingest the file, I dump the met (using MetadataDumper) and the
>>>>> product (using ProductDumper). The met looks fine:
>>>>> <key>FileLocation</key>
>>>>> <val>%2FUsers%2Frnguyen%2Fvpicu%2Fdata%2Farchive%2FMRN_1010209</val>
>>>>>
>>>>> But the product reference doesn't:
>>>>> <reference
>>>>> dataStore="file:/Users/rnguyen/vpicu/data/archive/MRN_null/null"
>>>>> orig="file:///Users/rnguyen/vpicu/components/filemgr/policy/cerner/vps_demog.csv"
>>>>> size="1114427"/>
>>>>>
>>>>> Is this an issue? Or am I not using the components correctly? Is there a
>>>>> better way to achieve what I want?
>>>>>
>>>>> Thanks,
>>>>> Ricky
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>>>>> is for the sole use of the intended recipient(s) and may contain
>>>>> confidential
>>>>> or legally privileged information. Any unauthorized review, use,
>>>>> disclosure
>>>>> or distribution is prohibited. If you are not the intended recipient,
>>>>> please
>>>>> contact the sender by reply e-mail and destroy all copies of this
>>>>> original message.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: [email protected]
>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
