Hi, Andy,

Just applied msword-patch05.txt. Thanks.

On Thu, Aug 05, 2004 at 10:53:06AM +0100, Andy Hedges wrote:
> I stored the dates in millis since the epoch which is probably the 
> easiest to manipulate to human readable form (I'm thinking of the commons 
> taglibs and so on). I'm happy to add hyphens to the property names, want 
> another patch for that?

Does msword provide timezone info?
Whether to record raw millis or a formatted date is always a question.
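
To make the trade-off concrete, here is a minimal sketch, assuming the
date is kept as epoch millis and only formatted when displayed. The class
and method names are made up for illustration and are not part of the
msword plugin; java.time is used just for brevity. Raw millis carry no
timezone, so the formatted form has to pick one (UTC here).

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of Nutch.
class EpochMillisDemo {

    // Render epoch millis as an ISO-8601 timestamp in UTC.
    static String toIso8601Utc(long millis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(millis));
    }

    // Parse an ISO-8601 instant (e.g. "1970-01-01T00:00:00Z") back to epoch millis.
    static long fromIso8601(String text) {
        return Instant.parse(text).toEpochMilli();
    }

    public static void main(String[] args) {
        // Prints 1970-01-01T00:00:00Z
        System.out.println(toIso8601Utc(0L));
    }
}
```

Storing millis keeps the value unambiguous and leaves formatting to the
display layer; storing a formatted string is easier to read in a dump
but bakes in a timezone and a format.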

It would be good if you could send another patch with hyphens.

> 
> As for metadata it seems to me that there are three sets of metadata:
> 
> 1.) Protocol metadata
> 2.) Media-type metadata (MS Doc properties, HTML meta tags (Dublin Core 
> and so on))
> 3.) Stuff we add (e.g. truncate=true).

Yes, I have been thinking the same. I just never had enough incentive
from my own projects to actually do it. Maybe you could make a proposal
on this based on your experience?

> 
> Would it be worth representing it this way with three separate 
> properties objects?
> 
> We could go further and have a hashmap of properties files and have 
> unlimited metadata sets but I would rather wait and see if we need it.
> 
A simpler alternative is to treat every value as a string and
use a name prefix to create namespaces. That would require no
change to the current ParseData.
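
A rough sketch of the prefix idea, assuming the metadata lives in a plain
java.util.Properties with string values. The prefixes, property names,
and the subset() helper below are all hypothetical, chosen only to mirror
the three sets you listed; nothing here is existing Nutch code.

```java
import java.util.Properties;

// Hypothetical illustration of prefix-based namespaces in one flat Properties.
class PrefixedMetadataDemo {

    // Assumed prefixes, one per metadata set (not defined anywhere in Nutch).
    static final String PROTOCOL = "protocol.";
    static final String MEDIA = "media.";
    static final String NUTCH = "nutch.";

    // Return the entries under the given prefix, with the prefix stripped.
    static Properties subset(Properties all, String prefix) {
        Properties out = new Properties();
        for (String name : all.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                out.setProperty(name.substring(prefix.length()), all.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        all.setProperty(PROTOCOL + "content-type", "application/msword");
        all.setProperty(MEDIA + "last-saved", "1091699586000");
        all.setProperty(NUTCH + "truncated", "true");

        // Only the media-type entries, keyed without the "media." prefix.
        System.out.println(subset(all, MEDIA));
    }
}
```

Everything stays in one flat set of string properties, and a caller that
only cares about one set can filter by prefix.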

John


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
