Hi Ricky,

On Nov 3, 2011, at 2:04 PM, Nguyen, Ricky wrote:

> Cool. Thanks Chris. Briefly, what's the reasoning behind this design decision?

No problem. The rationale mainly has to do with the lifecycle of the Metadata
object itself, and with the notion of versioning. Versioning is really the
process that generates the final data store references, and it is used in
conjunction with data transfer as part of the archiving process. So
intrinsically, versioning is co-located with wherever the data transfer
occurs.

In addition, there is no guarantee that anything the versioner writes to its
copy of the Metadata object gets written back, because the Metadata object is
read-only at that point in time. This is done to reduce scoping, and to
ensure that the only pieces of code that can modify that Metadata object are:

1. client-side metadata extractors
2. server-side metadata extractors

This is mainly due to the desire to localize changes to metadata in order to
eventually persist the metadata in the catalog.

The above is depicted pictorially here, in the final Use Case section:

http://oodt.apache.org/components/maven/filemgr/development/developer.html
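
To make the ordering concrete, here is a minimal sketch in plain Java. It is
not the file manager's code: the Versioner and Extractor interfaces, the
ingest method, and the map-based metadata are all hypothetical stand-ins, and
the sequence of steps is a simplified model of what is described above. It
shows why a versioner that builds "/[MRN]/[Filename]" sees no MRN when that
field comes from a server-side extractor, and how a versioner that derives
the field itself gets around that.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class IngestOrderSketch {

    /** Hypothetical versioner: reads metadata, returns a data store path. */
    interface Versioner {
        String createReference(Map<String, String> readOnlyMet);
    }

    /** Hypothetical server-side extractor: returns the fields it derives. */
    interface Extractor {
        Map<String, String> extract(Map<String, String> met);
    }

    /** Simplified model: version first, then server-side extraction, then catalog. */
    static String ingest(Map<String, String> clientMet, Versioner versioner,
                         Extractor serverSide, Map<String, String> catalog) {
        Map<String, String> met = new HashMap<>(clientMet);
        // The versioner only gets a read-only view; calling put() on it would throw.
        String reference = versioner.createReference(Collections.unmodifiableMap(met));
        // In this model, server-side extraction runs after versioning.
        met.putAll(serverSide.extract(met));
        // Only now is the full metadata persisted.
        catalog.putAll(met);
        return reference;
    }

    /** Stand-in for a server-side extractor that assigns the MRN field. */
    static Map<String, String> extractMrn(Map<String, String> met) {
        Map<String, String> out = new HashMap<>();
        out.put("MRN", "1010209");
        return out;
    }

    /** Builds "/[MRN]/[Filename]" from whatever metadata it is handed. */
    static String naiveReference(Map<String, String> met) {
        return "/" + met.get("MRN") + "/" + met.get("Filename");
    }

    /** Workaround: derive MRN inside the versioner, on a private copy. */
    static String selfDerivingReference(Map<String, String> met) {
        Map<String, String> derived = new HashMap<>(met);
        derived.putAll(extractMrn(met));
        return naiveReference(derived);
    }

    public static void main(String[] args) {
        Map<String, String> clientMet = new HashMap<>();
        clientMet.put("Filename", "vps_demog.csv");
        Map<String, String> catalog = new HashMap<>();

        // prints /null/vps_demog.csv
        System.out.println(ingest(clientMet, IngestOrderSketch::naiveReference,
                IngestOrderSketch::extractMrn, catalog));
        // prints /1010209/vps_demog.csv
        System.out.println(ingest(clientMet, IngestOrderSketch::selfDerivingReference,
                IngestOrderSketch::extractMrn, catalog));
    }
}
```

In both runs the catalog ends up with the MRN field; only the reference
differs, because only the second versioner derives the field on its own.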

> 
> Also, when developing custom metExtractors, what factors go into the decision 
> whether to use client-side vs server-side extraction for a particular 
> ProductType?

Great question. Client-side metadata extractors plug into, e.g., the Curator,
the Crawler, etc. These are particularly useful for stand-alone types of
extraction processes. Server-side metadata extractors are useful for derived
metadata; for catch-all situations where you want to make sure certain fields
are filled; and where you want to co-locate metadata extraction (and
associated library dependencies) with the file manager server.
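
As a rough illustration of the "derived" and "catch-all" cases, here is a
small sketch in plain Java. It is only a model and does not use the file
manager's extractor interface: the class name, the extract method, and the
"Owner" and "FileExtension" fields are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class CatchAllExtractorSketch {

    /** Returns a copy of the incoming metadata with extra fields added. */
    static Map<String, String> extract(Map<String, String> met) {
        Map<String, String> out = new HashMap<>(met);

        // Catch-all: make sure the (hypothetical) Owner field is always filled,
        // without overwriting a value that an earlier extractor supplied.
        out.putIfAbsent("Owner", "unknown");

        // Derived: compute a (hypothetical) FileExtension field from Filename.
        String name = out.get("Filename");
        if (name != null) {
            int dot = name.lastIndexOf('.');
            out.put("FileExtension", dot >= 0 ? name.substring(dot + 1) : "");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> met = new HashMap<>();
        met.put("Filename", "vps_demog.csv");
        Map<String, String> result = extract(met);
        System.out.println(result.get("Owner"));         // prints unknown
        System.out.println(result.get("FileExtension")); // prints csv
    }
}
```

Because logic like this depends only on metadata that already exists, it can
run in one place on the server no matter which client did the ingest.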

HTH!

Cheers,
Chris

> On Nov 3, 2011, at 12:54 PM, Mattmann, Chris A (388J) wrote:
> 
>> On Nov 3, 2011, at 2:44 AM, Nguyen, Ricky wrote:
>> 
>>> in short:
>>> (1) client-side metExtractor + versioner = all client-extracted met is 
>>> available to the versioner
>>> (2) server-side metExtractor + versioner = server-extracted met is NOT 
>>> available to the versioner (unless, as Chris suggested, versioner re-runs 
>>> server-side metExtractor)
>>> 
>>> Is (2) expected behavior?
>> 
>> Yep, sure is. 
>> 
>> Cheers,
>> Chris
>> 
>>> -Ricky
>>> 
>>> On Nov 2, 2011, at 10:16 PM, Mattmann, Chris A (388J) wrote:
>>> 
>>>> Hi Ricky,
>>>> 
>>>> You're running into the issue of where/when Versioning is done. 
>>>> 
>>>> Right now you are using a server-side met extractor -- that metadata is 
>>>> extracted on the server side, cataloged, 
>>>> but is _not_ passed back to the client, for use in client-side versioning 
>>>> (which I'm guessing you're using). 
>>>> 
>>>> One way around this is to take an approach similar to the 
>>>> FinalFileLocationExtractor -- that is: make your 
>>>> versioner run the server side met extractor as part of its versioning 
>>>> process to derive the same metadata 
>>>> that you want used for versioning. Or, alternatively, bake in somehow (to 
>>>> the metadata stream that you 
>>>> use in read-only form in the versioner) the field that you are interested 
>>>> in flowing through.
>>>> 
>>>> HTH!
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> On Nov 2, 2011, at 4:20 PM, Nguyen, Ricky wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> My MetadataBasedFileVersioner can't see the met produced by my custom 
>>>>> metExtractor.
>>>>> 
>>>>> I've read OODT-72. That issue describes using the Versioner's calculated 
>>>>> Reference to assign Metadata (ver -> met). My issue is the opposite 
>>>>> direction, using extracted Metadata in the Versioner's Reference 
>>>>> calculation.
>>>>> 
>>>>> For example, suppose my metExtractor assigns a value to the "MRN" 
>>>>> element. Then I want my versioner to create a datastore reference at 
>>>>> "/[MRN]/[Filename]".
>>>>> 
>>>>> My product-types.xml (abbreviated):
>>>>> <type name="CustomProdType">
>>>>> <versioner class="CustomMetBasedFileVersioner"/>
>>>>> <extractor class="CoreMetExtractor"/>
>>>>> <extractor class="MimeTypeExtractor"/>
>>>>> <extractor class="MRNExtractor"/>
>>>>> <extractor class="FinalFileLocationExtractor"/>
>>>>> </type>
>>>>> 
>>>>> After I ingest the file, I dump the met (using MetadataDumper) and the 
>>>>> product (using ProductDumper). The met looks fine:
>>>>> <key>FileLocation</key>
>>>>> <val>%2FUsers%2Frnguyen%2Fvpicu%2Fdata%2Farchive%2FMRN_1010209</val>
>>>>> 
>>>>> But the product reference doesn't:
>>>>> <reference 
>>>>> dataStore="file:/Users/rnguyen/vpicu/data/archive/MRN_null/null" 
>>>>> orig="file:///Users/rnguyen/vpicu/components/filemgr/policy/cerner/vps_demog.csv"
>>>>>  size="1114427"/>
>>>>> 
>>>>> Is this an issue? Or am I not using the components correctly? Is there a 
>>>>> better way to achieve what I want?
>>>>> 
>>>>> Thanks,
>>>>> Ricky
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, 
>>>>> is for the sole use of the intended recipient(s) and may contain 
>>>>> confidential
>>>>> or legally privileged information. Any unauthorized review, use, 
>>>>> disclosure
>>>>> or distribution is prohibited. If you are not the intended recipient, 
>>>>> please
>>>>> contact the sender by reply e-mail and destroy all copies of this 
>>>>> original message.  
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> 
>>>> 
>>>> 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: [email protected]
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 


