[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365619 ]
Doug Cutting commented on NUTCH-139:
+1 This looks great. Thanks for all the hard work on this one!
Standard metadata property names in the ParseData metadata
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365623 ]
Andrzej Bialecki commented on NUTCH-139:
-
I like this patch, the split of Metadata names into interfaces looks right. +1.
Standard metadata property names in the
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365066 ]
Jerome Charron commented on NUTCH-139:
--
Sorry for this very late response...
The idea behind separate subclasses of Metadata for content and parses is to
enforce the
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365089 ]
Doug Cutting commented on NUTCH-139:
Jerome: yes, it makes sense, but there's also metadata that's not tightly
related to the protocol or the parser, e.g., the nutch
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365095 ]
Jerome Charron commented on NUTCH-139:
--
OK, Doug. Your point of view makes sense to me.
I hope I can provide a (final) patch next week.
Standard metadata
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365098 ]
Andrzej Bialecki commented on NUTCH-139:
-
FWIW, I agree with Doug on this - I don't see that subclasses would buy us much
in terms of functionality, except for the
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365103 ]
Jerome Charron commented on NUTCH-139:
--
except for the sake of purity of OO approach
Andrzej, as you noticed certainly, it is my defect... ;-)
You know, I have still
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364218 ]
Jerome Charron commented on NUTCH-139:
--
I think we're near agreement here.
I really hope ... ;-)
We should add an add() method to Metadata, and change set() to
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364112 ]
Jerome Charron commented on NUTCH-139:
--
In fact, the more I look at this, the more I agree with Doug's last comment.
There is no real need (for now) for such a complicated
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364116 ]
Chris A. Mattmann commented on NUTCH-139:
-
Just to add to Jerome's last comment, I think the key here is simplicity. As a
software developer, and ultimately as an end
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364125 ]
Doug Cutting commented on NUTCH-139:
I think we're near agreement here.
Here are the changes I think this patch still needs:
MetadataNames belongs in the protocol
Doug Cutting (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364125 ]
My apologies for commenting here - JIRA produces broken HTML for me, I
can't use it...
Doug Cutting commented on NUTCH-139:
I think
Andrzej Bialecki wrote:
Erhm.. please bear with me. I'd rather see these two classes in a
separate package altogether, org.apache.nutch.metadata. The reason is
that most likely these two classes will be used elsewhere too, not just
in the protocol and parse/fetch related context. I'm
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363942 ]
Andrzej Bialecki commented on NUTCH-139:
-
Yes, this should work ok ... but it strikes me as unnecessarily complicated.
After all, in most cases we will have single
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363834 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
I really don't like this X-Nutch naming convention. First it's really
protocol level oriented, and it forces to map
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363394 ]
Andrzej Bialecki commented on NUTCH-139:
-
Yes, I agree with the split into a generic MetaData container, and subclasses
that define necessary constants for metadata
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363352 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Jerome,
org.apache.nutch.parse.ParseData
* The constructor becomes ParseData(ParseStatus, String, Outlink[],
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362618 ]
Jerome Charron commented on NUTCH-139:
--
Here is a new proposal for this issue.
org.apache.nutch.util.MetaData
* becomes a utility class that is only a container of
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362242 ]
Doug Cutting commented on NUTCH-139:
We can just use different names, rather than two metaData objects: X-nutch
names for derived or other values that are usually protocol
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362049 ]
Andrzej Bialecki commented on NUTCH-139:
-
I see three issues here:
* using standard metadata names and handling misspelled/erroneous ones: this
patch provides this
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362061 ]
Jerome Charron commented on NUTCH-139:
--
I agree with your analysis Andrzej.
I suggested to commit this patch because it is a response to this issue:
standard metadata
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361994 ]
Doug Cutting commented on NUTCH-139:
Jerome,
Some HTTP headers have multiple values. Correctly reflecting that was, I
thought, the primary motivation for adding multiple
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362003 ]
Doug Cutting commented on NUTCH-139:
Also, since the primary use of multiple metadata values should be for protocols
where multiple-values are required, the method to add
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361924 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Doug,
While it's true that content-length can be computed from the Content's data,
wouldn't it also be nice to have it
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361926 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Doug,
While it's true that content-length can be computed from the Content's data,
wouldn't it also be nice to have it
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361927 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Doug,
While it's true that content-length can be computed from the Content's data,
wouldn't it also be nice to have it
-----Original Message-----
From: Doug Cutting (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 05, 2006 8:04 PM
To: nutch-dev@incubator.apache.org
Subject: [jira] Commented: (NUTCH-139) Standard metadata property names in
the ParseData metadata
[ http://issues.apache.org/jira
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361041 ]
Jerome Charron commented on NUTCH-139:
--
OK, Chris and I will implement MetadataNames in this way.
Just a few comments:
I plan to move the MetadataNames to a class
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361043 ]
Andrzej Bialecki commented on NUTCH-139:
-
Regarding the move to a class with public static fields: I don't have any
problem with that.
Regarding the Levenshtein
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361045 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
Do you read my mind?
Yes of course, that's the way I want to do it: First checks for the most common
cases (lower
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360901 ]
Andrzej Bialecki commented on NUTCH-139:
-
I have an objection; in fact, I think the patches miss the main point of using
prefixed property names.
In this patch
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360902 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
Thanks for taking time to take a look at the patch.
In fact, we have some discussion with Chris about this point
(that's
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360906 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
Here are more comments about my doubts, and how to handle metadata names.
if for instance a protocol plugin doesn't have
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360920 ]
Jerome Charron commented on NUTCH-139:
--
And why not use the fact that the ContentProperties object can now handle
multi-valued properties?
Each piece of code that
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360929 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Andrzej,
I have an objection; in fact, I think the patches miss the main point of using
prefixed property names.
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360931 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hmm,
Okay, I just finished reading the rest of the comments :-) Sorry, just woke up
out here in Los Angeles. Okay, I
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360933 ]
Andrzej Bialecki commented on NUTCH-139:
-
I like Jerome's proposal of using the new ContentProperties class; this could
save a lot of work, especially this naming
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360681 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Doug, Jerome,
I'm confused as to why all of the constant names have X_nutch in them. I'd
expect to see something like
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360645 ]
Doug Cutting commented on NUTCH-139:
I'm confused as to why all of the constant names have X_nutch in them. I'd
expect to see something like that in their string values,
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360389 ]
Chris A. Mattmann commented on NUTCH-139:
-
According to Andrzej:
I agree, too. Perhaps we should use the names as they appear in the Dublin
Core for those properties