[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-02-08 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365623 ] Andrzej Bialecki commented on NUTCH-139: - I like this patch, the split of Metadata names into interfaces looks right. +1. > Standard metadata property names in the Par

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-02-08 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365619 ] Doug Cutting commented on NUTCH-139: +1 This looks great. Thanks for all the hard work on this one! > Standard metadata property names in the ParseData metadata > ---

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-02-03 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365103 ] Jerome Charron commented on NUTCH-139: -- > except for the sake of purity of OO approach Andrzej, as you have certainly noticed, it is my defect... ;-) You know, I still have th

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-02-03 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365098 ] Andrzej Bialecki commented on NUTCH-139: - FWIW, I agree with Doug on this - I don't see that subclasses would buy us much in terms of functionality, except for the sak

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-02-03 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365095 ] Jerome Charron commented on NUTCH-139: -- Ok Doug. Your point of view makes sense to me. I hope I can provide a (final) patch by next week. > Standard metadata proper

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-02-03 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365089 ] Doug Cutting commented on NUTCH-139: Jerome: yes, it makes sense, but there's also metadata that's not tightly related to the protocol or the parser, e.g., the nutch segmen

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-02-03 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365066 ] Jerome Charron commented on NUTCH-139: -- Sorry for this very late response... The idea behind separate subclasses of Metadata for content and parses is to enforce the seman

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-27 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364242 ] Doug Cutting commented on NUTCH-139: I was confused about which was the latest version. (I deleted the older versions. Is there a way to simply mark them obsolete?) So,

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-27 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364218 ] Jerome Charron commented on NUTCH-139: -- > I think we're near agreement here. I really hope ... ;-) > We should add an add() method to Metadata, and change set() to repl
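The add()/set() distinction quoted in this comment can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Java class from the patch. It assumes the truncated "change set() to repl" means that set() replaces any existing values while add() appends; the class name, `get` and `get_values` are hypothetical helpers chosen for illustration only.

```python
class Metadata:
    """Hypothetical multi-valued name -> values container (not the Nutch class)."""

    def __init__(self):
        # name -> list of string values, kept in insertion order
        self._values = {}

    def add(self, name, value):
        """Append a value, keeping any values already stored under name."""
        self._values.setdefault(name, []).append(value)

    def set(self, name, value):
        """Assumed semantics: discard existing values and store a single one."""
        self._values[name] = [value]

    def get(self, name):
        """Return the first value stored under name, or None if absent."""
        values = self._values.get(name)
        return values[0] if values else None

    def get_values(self, name):
        """Return a copy of every value stored under name."""
        return list(self._values.get(name, []))


# Usage: a header that legitimately repeats keeps all of its values with add(),
# while set() leaves exactly one value behind.
headers = Metadata()
headers.add("Set-Cookie", "a=1")
headers.add("Set-Cookie", "b=2")
headers.set("Content-Type", "text/plain")
headers.set("Content-Type", "text/html")
```

Keeping both methods lets single-valued callers stay simple while protocols that require repeated values, such as some HTTP headers, lose nothing.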

Re: [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-26 Thread Doug Cutting
Andrzej Bialecki wrote: Erhm.. please bear with me. I'd rather see these two classes in a separate package altogether, org.apache.nutch.metadata. The reason is that most likely these two classes will be used elsewhere too, not just in the protocol and parse/fetch related context. I'm specifical

Re: [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-26 Thread Andrzej Bialecki
Doug Cutting (JIRA) wrote: [ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364125 ] My apologies for commenting here - JIRA produces broken HTML for me, I can't use it... Doug Cutting commented on NUTCH-139: I think we're

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-26 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364125 ] Doug Cutting commented on NUTCH-139: I think we're near agreement here. Here are the changes I think this patch still needs: MetadataNames belongs in the protocol package,

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-26 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364116 ] Chris A. Mattmann commented on NUTCH-139: - Just to add to Jerome's last comment, I think the key here is simplicity. As a software developer, and ultimately as an end u

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-26 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364112 ] Jerome Charron commented on NUTCH-139: -- In fact, the more I look at this, the more I agree with Doug's last comment. There is no real need (for now) for such a complicated m

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-25 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363996 ] Doug Cutting commented on NUTCH-139: I think this is all easily handled by naming, and that we don't need another map. We keep using "title" and "content-type" as examples

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-25 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363942 ] Andrzej Bialecki commented on NUTCH-139: - Yes, this should work ok ... but it strikes me as unnecessarily complicated. After all, in most cases we will have single val

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-24 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363834 ] Jerome Charron commented on NUTCH-139: -- Andrzej, I really don't like this "X-Nutch" naming convention. First it's really protocol level oriented, and it forces to map "X-

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-20 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363394 ] Andrzej Bialecki commented on NUTCH-139: - Yes, I agree with the split into a generic MetaData container, and subclasses that define necessary constants for metadata na

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-19 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363352 ] Chris A. Mattmann commented on NUTCH-139: - Hi Jerome, >org.apache.nutch.parse.ParseData > * The constructor becomes ParseData(ParseStatus, String, Outlink[], > Con

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-13 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362618 ] Jerome Charron commented on NUTCH-139: -- Here is a new proposal for this issue. org.apache.nutch.util.MetaData * becomes an utility class that is only a container of mult

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-09 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362249 ] Doug Cutting commented on NUTCH-139: Let me try to be more concrete. I'd prefer that the X-nutch properties be removed from MetadataNames before this is committed, and mov

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-09 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362242 ] Doug Cutting commented on NUTCH-139: We can just use different names, rather than two metaData objects: X-nutch names for derived or other values that are usually protocol

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-07 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362061 ] Jerome Charron commented on NUTCH-139: -- I agree with your analysis, Andrzej. I suggested committing this patch because it is a response to this issue: standard metadata name

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-07 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362049 ] Andrzej Bialecki commented on NUTCH-139: - I see three issues here: * using standard metadata names and handling misspelled/erroneous ones: this patch provides this fu

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362013 ] Jerome Charron commented on NUTCH-139: -- Doug, The purpose of this patch is to provide some standard metadata names and to be able to handle erroneous names, not to handle

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362003 ] Doug Cutting commented on NUTCH-139: Also, since the primary use of multiple metadata values should be for protocols where multiple-values are required, the method to add a

Re: [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Doug Cutting
Chris Mattmann wrote: I've tried removing the 5 copies of the comment, however I can't find a button on JIRA to remove comments. Maybe an administrator for Nutch can do it? I removed the extra comments. No problem. Doug

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-06 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361994 ] Doug Cutting commented on NUTCH-139: Jerome, Some HTTP headers have multiple values. Correctly reflecting that was, I thought, the primary motivation for adding multiple va

RE: [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris Mattmann
ology. > -Original Message- > From: Doug Cutting (JIRA) [mailto:[EMAIL PROTECTED] > Sent: Thursday, January 05, 2006 8:04 PM > To: nutch-dev@incubator.apache.org > Subject: [jira] Commented: (NUTCH-139) Standard metadata property names in > the ParseData metadata > >

RE: [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris Mattmann
stitute of Technology. > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Thursday, January 05, 2006 8:28 PM > To: nutch-dev@lucene.apache.org > Subject: RE: [jira] Commented: (NUTCH-139) Standard metadata property > names in the ParseData

RE: [jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread chris.mattmann
ology. > -Original Message- > From: Doug Cutting (JIRA) [mailto:[EMAIL PROTECTED] > Sent: Thursday, January 05, 2006 8:04 PM > To: nutch-dev@incubator.apache.org > Subject: [jira] Commented: (NUTCH-139) Standard metadata property names in > the ParseData metadata > >

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361927 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361926 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361925 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361924 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361923 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361922 ] Doug Cutting commented on NUTCH-139: One more thing. Content length should also not need to be stored in the metadata as an x-nutch value. The content length is simply th

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361900 ] Jerome Charron commented on NUTCH-139: -- Doug, This implementation is a multi-valued implementation: 1. The protocol headers are stored as-is. 2. Then correct values (guess

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361891 ] Doug Cutting commented on NUTCH-139: If we store protocol headers as metadata then we should store them as-is. If they're incorrect, then we should store the correct value

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361889 ] Andrzej Bialecki commented on NUTCH-139: - Looks good to me, +1 > Standard metadata property names in the ParseData metadata > -

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-21 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361045 ] Jerome Charron commented on NUTCH-139: -- Andrzej, Do you read my mind? Yes of course, that's the way I want to do it: First check for the most common cases (lower case

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-21 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361043 ] Andrzej Bialecki commented on NUTCH-139: - Regarding the move to a class with public static fields: I don't have any problem with that. Regarding the Levenshtein dista
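The Levenshtein distance mentioned in this comment, combined with the "most common cases first" ordering from the reply, could look roughly like the sketch below. It is written in Python for brevity and is not the code from the patch. The list of standard names, the `normalize` function and the `max_distance` threshold of 2 are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical list of standard names; the real patch defines its own.
STANDARD_NAMES = ["Content-Type", "Content-Length",
                  "Content-Encoding", "Last-Modified"]


def normalize(name, max_distance=2):
    """Map a possibly misspelled name to a standard one, or return None."""
    key = name.strip().lower()
    # Fast path: exact match ignoring case and surrounding whitespace.
    for standard in STANDARD_NAMES:
        if standard.lower() == key:
            return standard
    # Slow path: closest standard name within the allowed edit distance.
    best = min(STANDARD_NAMES, key=lambda s: levenshtein(key, s.lower()))
    return best if levenshtein(key, best.lower()) <= max_distance else None
```

Running the cheap case-insensitive comparison first means the quadratic edit-distance computation is only paid for names that are actually unusual.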

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-21 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361041 ] Jerome Charron commented on NUTCH-139: -- Ok, Chris and I will implement MetadataNames in this way. Just a few comments: I plan to move the MetadataNames to a class rath

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360933 ] Andrzej Bialecki commented on NUTCH-139: - I like Jerome's proposal of using the new ContentProperties class; this could save a lot of work, especially this naming mess

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360931 ] Chris A. Mattmann commented on NUTCH-139: - Hmm, Okay, I just finished reading the rest of the comments :-) Sorry, just woke up out here in Los Angeles. Okay, I thin

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360929 ] Chris A. Mattmann commented on NUTCH-139: - Hi Andrzej, > I have an objection, in fact I think the patches miss the main point of using > of prefixed property names. D

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360920 ] Jerome Charron commented on NUTCH-139: -- And why not use the fact that the ContentProperties object can now handle multi-valued properties? Each piece of code that wants

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360909 ] Andrzej Bialecki commented on NUTCH-139: - Yes, that was again the reason for prefixing - we want to keep as much of the original metadata as we can, to facilitate vari

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360906 ] Jerome Charron commented on NUTCH-139: -- Andrzej, Here are more comments about my doubts, and how to handle metadata names. If, for instance, a protocol plugin doesn't have

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360902 ] Jerome Charron commented on NUTCH-139: -- Andrzej, Thanks for taking the time to look at the patch. In fact, we have had some discussion with Chris about this point (that's w

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360901 ] Andrzej Bialecki commented on NUTCH-139: - I have an objection, in fact I think the patches miss the main point of using of prefixed property names. In this patch only

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-17 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360681 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, Jerome, >I'm confused as to why all of the constant names have "X_nutch" in them. I'd >expect to see something lik

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-17 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360659 ] Jerome Charron commented on NUTCH-139: -- +1 to Doug's comments: * Remove X_nutch from constant names * Add "X-nutch-" prefix to constant values * Move constant definitions

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-16 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360645 ] Doug Cutting commented on NUTCH-139: I'm confused as to why all of the constant names have "X_nutch" in them. I'd expect to see something like that in their string values,

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-13 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360389 ] Chris A. Mattmann commented on NUTCH-139: - According to Andrzej: "I agree, too. Perhaps we should use the names as they appear in the Dublin Core for those properties