[ 
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363834 ] 

Jerome Charron commented on NUTCH-139:
--------------------------------------

Andrzej,

I really don't like this "X-Nutch" naming convention. First, it's really 
protocol-level oriented, and it forces us to map "X-Nutch" values to the original 
ones (of course a utility method can easily provide this mapping). But I 
really think this solution is really clean (from my point of view).

We should perhaps define once more what a MetaData value is.
I suggest defining a new class to represent a metadata value instead of using 
a simple String.
Thus, we can define a class that holds both the original and the final value.
The idea is that the only way to set the original value is to construct a new 
object (I will call this class MetaValue, but native English speakers are 
encouraged to propose a better name); then, when you set the value of this 
metadata value, it never overrides the original one, only the final one.
Here is a short piece of code:

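import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetaValue {
    // Values supplied at construction time; never modified afterwards
    private final String[] original;
    // Values set later; stays null until setValue() or addValue() is called
    private List<String> actual = null;

    public MetaValue(String[] values) {
        // Constructor for multiple values
        original = values.clone();
    }

    public MetaValue(String value) {
        // Constructor for a single value
        original = new String[] { value };
    }

    public void setValue(String[] values) {
        // Copies the values into a new actual list
        actual = new ArrayList<String>(Arrays.asList(values));
    }

    public void addValue(String value) {
        // Appends this value to the list of final values
        if (actual == null) {
            actual = new ArrayList<String>();
        }
        actual.add(value);
    }

    public String[] getOriginalValues() {
        return original.clone();
    }

    public String[] getFinalValues() {
        // Returns null if no final value has been set yet
        return (actual == null) ? null : actual.toArray(new String[0]);
    }

    public String[] getValues() {
        // Returns the final values if the list of values is not null,
        // otherwise returns the original values
        return (actual == null) ? getOriginalValues() : getFinalValues();
    }
}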

With this approach we can keep the same value (MetaValue) with the same key.
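
To illustrate the "same key" point, here is a minimal usage sketch. It is only an assumption of how this could look: MetaValueDemo, the "content-type" key, the sample values and the plain Map standing in for the metadata container are all hypothetical, and the MetaValue below is a condensed stand-in for the class sketched above, not the actual Nutch code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Condensed stand-in for the MetaValue idea (hypothetical, for illustration).
class MetaValue {
    private final String[] original;     // fixed at construction time
    private List<String> actual = null;  // set later; never touches original

    MetaValue(String value) {
        original = new String[] { value };
    }

    void setValue(String[] values) {
        actual = new ArrayList<>(Arrays.asList(values));
    }

    String[] getOriginalValues() {
        return original.clone();
    }

    String[] getValues() {
        // Final values if any were set, otherwise the original ones.
        return (actual == null) ? original.clone() : actual.toArray(new String[0]);
    }
}

class MetaValueDemo {
    public static void main(String[] args) {
        // Hypothetical metadata store; a plain Map stands in for the real one.
        Map<String, MetaValue> metadata = new HashMap<>();

        // The protocol layer records the value the server sent.
        metadata.put("content-type", new MetaValue("text/html"));

        // A later stage sets the final value under the same key,
        // so no separate "X-Nutch" prefixed key is needed.
        metadata.get("content-type").setValue(new String[] { "application/xhtml+xml" });

        MetaValue v = metadata.get("content-type");
        System.out.println(Arrays.toString(v.getOriginalValues()));
        System.out.println(Arrays.toString(v.getValues()));
    }
}
```

The original value stays reachable through getOriginalValues(), while callers that only care about the effective value use getValues().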


> Standard metadata property names in the ParseData metadata
> ----------------------------------------------------------
>
>          Key: NUTCH-139
>          URL: http://issues.apache.org/jira/browse/NUTCH-139
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Versions: 0.7.1, 0.7, 0.6, 0.7.2-dev, 0.8-dev
>  Environment: Power Mac OS X 10.4, Dual Processor G5 2.0 Ghz, 1.5 GB  RAM, 
> although bug is independent of environment
>     Reporter: Chris A. Mattmann
>     Assignee: Chris A. Mattmann
>     Priority: Minor
>      Fix For: 0.7.2-dev, 0.8-dev, 0.7.1, 0.7, 0.6
>  Attachments: NUTCH-139.060105.patch, NUTCH-139.Mattmann.patch.txt, 
> NUTCH-139.jc.review.patch.txt
>
> Currently, people are free to name their string-based properties anything 
> that they want, such as having names of "Content-type", "content-TyPe", 
> "CONTENT_TYPE" all having the same meaning. Stefan G., I believe, proposed a 
> solution in which all property names are converted to lower case, but in 
> essence this really only fixes half the problem (the case of identifying 
> that "CONTENT_TYPE" and "conTeNT_TyPE" and all the permutations are really 
> the same). What if I named it "Content     Type", or "ContentType"?
>  I propose that a way to correct this would be to create a standard set of 
> named Strings in the ParseData class that the protocol framework and the 
> parsing framework could use to identify common properties such as 
> "Content-type", "Creator", "Language", etc.
>  The properties would be defined at the top of the ParseData class, something 
> like:
>  public class ParseData{
>    .....
>     public static final String CONTENT_TYPE = "content-type";
>     public static final String CREATOR = "creator";
>    ....
> }
> In this fashion, users could at least know what the names of the standard 
> properties that they can obtain from the ParseData are, for example by making 
> a call to ParseData.getMetadata().get(ParseData.CONTENT_TYPE) to get the 
> content type or a call to ParseData.getMetadata().set(ParseData.CONTENT_TYPE, 
> "text/xml"). Of course, this wouldn't preclude users from doing what they are 
> currently doing; it would just provide a standard method of obtaining some of 
> the more common, critical metadata without poring over the code base to 
> figure out what they are named.
> I'll contribute a patch near the end of this week, or the beginning of next 
> week, that addresses this issue.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
