[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365623 ]
Andrzej Bialecki commented on NUTCH-139:
-
I like this patch, the split of Metadata names into interfaces looks right. +1.
> Standard metadata property names in the Par
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365619 ]
Doug Cutting commented on NUTCH-139:
+1 This looks great. Thanks for all the hard work on this one!
> Standard metadata property names in the ParseData metadata
> ---
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365103 ]
Jerome Charron commented on NUTCH-139:
--
> except for the sake of purity of OO approach
Andrzej, as you have certainly noticed, it is my defect... ;-)
You know, I still have th
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365098 ]
Andrzej Bialecki commented on NUTCH-139:
-
FWIW, I agree with Doug on this - I don't see that subclasses would buy us much
in terms of functionality, except for the sak
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365095 ]
Jerome Charron commented on NUTCH-139:
--
OK Doug. Your point of view makes sense to me.
I hope I can provide a (final) patch next week.
> Standard metadata proper
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365089 ]
Doug Cutting commented on NUTCH-139:
Jerome: yes, it makes sense, but there's also metadata that's not tightly
related to the protocol or the parser, e.g., the nutch segmen
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12365066 ]
Jerome Charron commented on NUTCH-139:
--
Sorry for this very late response...
The idea behind separate subclasses of Metadata for content and parses is to
enforce the seman
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364242 ]
Doug Cutting commented on NUTCH-139:
I was confused about which was the latest version. (I deleted the older
versions. Is there a way to simply mark them obsolete?)
So,
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364218 ]
Jerome Charron commented on NUTCH-139:
--
> I think we're near agreement here.
I really hope ... ;-)
> We should add an add() method to Metadata, and change set() to repl
Andrzej Bialecki wrote:
Erhm.. please bear with me. I'd rather see these two classes in a
separate package altogether, org.apache.nutch.metadata. The reason is
that most likely these two classes will be used elsewhere too, not just
in the protocol and parse/fetch related context. I'm specifical
Doug Cutting (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364125 ]
My apologies for commenting here - JIRA produces broken HTML for me, I
can't use it...
Doug Cutting commented on NUTCH-139:
I think we're
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364125 ]
Doug Cutting commented on NUTCH-139:
I think we're near agreement here.
Here are the changes I think this patch still needs:
MetadataNames belongs in the protocol package,
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364116 ]
Chris A. Mattmann commented on NUTCH-139:
-
Just to add to Jerome's last comment, I think the key here is simplicity. As a
software developer, and ultimately as an end u
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364112 ]
Jerome Charron commented on NUTCH-139:
--
In fact, the more I look at this, the more I agree with Doug's last comment.
There is no real need (for now) for such a complicated m
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363996 ]
Doug Cutting commented on NUTCH-139:
I think this is all easily handled by naming, and that we don't need another
map.
We keep using "title" and "content-type" as examples
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363942 ]
Andrzej Bialecki commented on NUTCH-139:
-
Yes, this should work ok ... but it strikes me as unnecessarily complicated.
After all, in most cases we will have single val
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363834 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
I really don't like this "X-Nutch" naming convention. First, it's really
protocol-level oriented, and it forces us to map "X-
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363394 ]
Andrzej Bialecki commented on NUTCH-139:
-
Yes, I agree with the split into a generic MetaData container, and subclasses
that define necessary constants for metadata na
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363352 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Jerome,
>org.apache.nutch.parse.ParseData
> * The constructor becomes ParseData(ParseStatus, String, Outlink[],
> Con
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362618 ]
Jerome Charron commented on NUTCH-139:
--
Here is a new proposal for this issue.
org.apache.nutch.util.MetaData
* becomes a utility class that is only a container of mult
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362249 ]
Doug Cutting commented on NUTCH-139:
Let me try to be more concrete. I'd prefer that the X-nutch properties be
removed from MetadataNames before this is committed, and mov
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362242 ]
Doug Cutting commented on NUTCH-139:
We can just use different names, rather than two metaData objects: X-nutch
names for derived or other values that are usually protocol
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362061 ]
Jerome Charron commented on NUTCH-139:
--
I agree with your analysis Andrzej.
I suggested committing this patch because it is a response to this issue:
standard metadata name
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362049 ]
Andrzej Bialecki commented on NUTCH-139:
-
I see three issues here:
* using standard metadata names and handling misspelled/erroneous ones: this
patch provides this fu
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362013 ]
Jerome Charron commented on NUTCH-139:
--
Doug,
The purpose of this patch is to provide some standard metadata names and to be
able to handle erroneous names, not to handle
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12362003 ]
Doug Cutting commented on NUTCH-139:
Also, since the primary use of multiple metadata values should be for protocols
where multiple-values are required, the method to add a
Chris Mattmann wrote:
I've tried removing the 5 copies of the comment, however I can't find a
button on JIRA to remove comments. Maybe an administrator for Nutch can do
it?
I removed the extra comments. No problem.
Doug
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361994 ]
Doug Cutting commented on NUTCH-139:
Jerome,
Some HTTP headers have multiple values. Correctly reflecting that was, I
thought, the primary motivation for adding multiple va
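For readers following the multi-valued discussion, here is a minimal Java sketch of a container in which one name can hold several values, as some HTTP headers require. The class name MultiValuedMetadataSketch and its method signatures are hypothetical and are not taken from the patch. That set() replaces every stored value while add() appends is an assumption based on the add()/set() proposal quoted elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; this is not the Nutch Metadata class.
final class MultiValuedMetadataSketch {
    // Each name maps to the list of values stored under it, in insertion order.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Appends a value, keeping any values already stored under the name. */
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Assumed semantics: replaces all values stored under the name. */
    void set(String name, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(name, single);
    }

    /** Returns the first value for the name, or null if none is stored. */
    String get(String name) {
        List<String> stored = values.get(name);
        return stored == null ? null : stored.get(0);
    }

    /** Returns every value for the name (never null, read-only). */
    List<String> getValues(String name) {
        List<String> stored = values.get(name);
        return stored == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(stored);
    }
}
```

With this shape, a protocol plugin could call add() once per repeated header line, while code that only cares about a single value keeps using get().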
ology.
> -Original Message-
> From: Doug Cutting (JIRA) [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 05, 2006 8:04 PM
> To: nutch-dev@incubator.apache.org
> Subject: [jira] Commented: (NUTCH-139) Standard metadata property names in
> the ParseData metadata
>
>
stitute of Technology.
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 05, 2006 8:28 PM
> To: nutch-dev@lucene.apache.org
> Subject: RE: [jira] Commented: (NUTCH-139) Standard metadata property
> names in the ParseData
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361927 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Doug,
While it's true that content-length can be computed from the Content's data,
wouldn't it also be nice to have it
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361922 ]
Doug Cutting commented on NUTCH-139:
One more thing. Content length should also not need to be stored in the
metadata as an x-nutch value. The content length is simply th
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361900 ]
Jerome Charron commented on NUTCH-139:
--
Doug,
This implementation is a multi-valued implementation:
1. The protocol headers are stored as-is.
2. Then correct values (guess
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361891 ]
Doug Cutting commented on NUTCH-139:
If we store protocol headers as metadata then we should store them as-is. If
they're incorrect, then we should store the correct value
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361889 ]
Andrzej Bialecki commented on NUTCH-139:
-
Looks good to me, +1
> Standard metadata property names in the ParseData metadata
> -
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361045 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
Do you read my mind?
Yes, of course, that's the way I want to do it: first check for the most common
cases (lower case
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361043 ]
Andrzej Bialecki commented on NUTCH-139:
-
Regarding the move to a class with public static fields: I don't have any
problem with that.
Regarding the Levenshtein dista
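As an illustration of the Levenshtein idea mentioned here, the following Java sketch first tries a cheap case-insensitive match and only then falls back to edit distance. It is a sketch under stated assumptions, not the code from the patch: the class name MetadataNameNormalizerSketch, the STANDARD_NAMES list, and the maxDistance threshold parameter are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch only; names and threshold are assumptions.
final class MetadataNameNormalizerSketch {
    // Illustrative list; the real set of standard names is not shown in this excerpt.
    static final List<String> STANDARD_NAMES = Arrays.asList(
        "Content-Type", "Content-Length", "Content-Encoding", "Last-Modified");

    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost);                     // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest standard name within maxDistance, else the input unchanged. */
    static String normalize(String name, int maxDistance) {
        String lower = name.toLowerCase(Locale.ROOT);
        // Cheap path first: a case-insensitive exact match.
        for (String standard : STANDARD_NAMES) {
            if (standard.toLowerCase(Locale.ROOT).equals(lower)) {
                return standard;
            }
        }
        // Slow path: the nearest standard name by edit distance.
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String standard : STANDARD_NAMES) {
            int d = levenshtein(lower, standard.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = standard;
            }
        }
        return best == null ? name : best;
    }
}
```

Checking the common cases before computing any distance keeps the expensive comparison off the hot path, which appears to be the concern raised in this exchange.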
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361041 ]
Jerome Charron commented on NUTCH-139:
--
OK, Chris and I will implement MetadataNames in this way.
Just a few comments:
I plan to move the MetadataNames to a class rath
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360933 ]
Andrzej Bialecki commented on NUTCH-139:
-
I like Jerome's proposal of using the new ContentProperties class; this could
save a lot of work, especially this naming mess
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360931 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hmm,
Okay, I just finished reading the rest of the comments :-) Sorry, just woke up
out here in Los Angeles. Okay, I thin
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360929 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Andrzej,
> I have an objection, in fact I think the patches miss the main point of using
> prefixed property names.
D
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360920 ]
Jerome Charron commented on NUTCH-139:
--
And why not use the fact that the ContentProperties object can now handle
multi-valued properties?
Each piece of code that wants
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360909 ]
Andrzej Bialecki commented on NUTCH-139:
-
Yes, that was again the reason for prefixing - we want to keep as much of the
original metadata as we can, to facilitate vari
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360906 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
Here are more comments about my doubts, and how to handle metadata names.
If, for instance, a protocol plugin doesn't have
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360902 ]
Jerome Charron commented on NUTCH-139:
--
Andrzej,
Thanks for taking time to take a look at the patch.
In fact, we had some discussion with Chris about this point
(that's w
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360901 ]
Andrzej Bialecki commented on NUTCH-139:
-
I have an objection, in fact I think the patches miss the main point of using
prefixed property names.
In this patch only
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360681 ]
Chris A. Mattmann commented on NUTCH-139:
-
Hi Doug, Jerome,
>I'm confused as to why all of the constant names have "X_nutch" in them. I'd
>expect to see something lik
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360659 ]
Jerome Charron commented on NUTCH-139:
--
+1 to Doug's comments:
* Remove X_nutch from the constant names
* Add the "X-nutch-" prefix to the constant values
* Move constant definitions
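The naming change listed above could look roughly like the Java sketch below: the constant identifiers carry no X_nutch marker, while their string values carry the "X-nutch-" prefix. The interface name and both constants are hypothetical illustrations and do not come from the patch.

```java
// Hypothetical sketch only; the interface and constant names are assumptions.
interface NutchMetadataNamesSketch {
    // Shared prefix that marks a value as Nutch-specific.
    String PREFIX = "X-nutch-";

    // Identifiers stay plain; only the string values are prefixed.
    String SEGMENT_NAME = PREFIX + "segment-name";
    String SIGNATURE = PREFIX + "signature";
}
```

Keeping the prefix in one place means a later change of convention touches a single line rather than every constant.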
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360645 ]
Doug Cutting commented on NUTCH-139:
I'm confused as to why all of the constant names have "X_nutch" in them. I'd
expect to see something like that in their string values,
[
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360389 ]
Chris A. Mattmann commented on NUTCH-139:
-
According to Andrzej:
"I agree, too. Perhaps we should use the names as they appear in the Dublin
Core for those properties