Bugs item #999549, was opened at 2004-07-28 15:47
Message generated for change (Comment added) made by andyhedges
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=999549&group_id=59548

Category: plugin: other
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Andy Hedges (andyhedges)
Assigned to: Nobody/Anonymous (nobody)
Summary: MSWord document's title

Initial Comment:
MSWord document titles weren't being extracted and
stored. This patch does that by extracting the title
from the document's "properties".



----------------------------------------------------------------------

>Comment By: Andy Hedges (andyhedges)
Date: 2004-07-29 09:06

Message:
Logged In: YES 
user_id=583029

After doing some extensive testing on this, I have discovered
that occasionally Word 'Streams' don't have the
SummaryInformation document in them. This apparently
happens when a Word doc is opened in StarOffice (or, I
imagine, OO.o) and saved out again.

Anyway, this new patch sets a timeout on the listener, and if
no SummaryInformation is found, it sets the title to the empty
string.
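The timeout-and-fallback idea described above can be sketched as
follows. This is a minimal sketch, not the patch itself:
`PropertySource` and `titleOrEmpty` are hypothetical names, and the
reader is hidden behind a plain callback so the example runs without
POI. In the real plugin the callback would presumably be a POI
`POIFSReaderListener` registered for the SummaryInformation stream,
with the title taken from `SummaryInformation.getTitle()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class TitleExtractor {

    /** Hypothetical stand-in for the document reader (NOT a POI interface). */
    public interface PropertySource {
        // Calls onTitle at most once, and only if a SummaryInformation
        // stream is found in the document.
        void read(Consumer<String> onTitle);
    }

    /**
     * Returns the title reported by the source, or "" if none arrives
     * within timeoutMillis (e.g. the SummaryInformation stream is missing).
     */
    public static String titleOrEmpty(PropertySource source, long timeoutMillis) {
        CompletableFuture<String> title = new CompletableFuture<>();
        // Read on another thread so a read that never returns cannot block us.
        CompletableFuture.runAsync(() -> {
            try {
                source.read(title::complete);
            } finally {
                // No-op if a title was already delivered; otherwise the
                // read finished without SummaryInformation, so fall back to "".
                title.complete("");
            }
        });
        try {
            String t = title.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return t == null ? "" : t; // the property itself may be unset
        } catch (TimeoutException e) {
            return ""; // nothing reported in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        } catch (ExecutionException e) {
            return ""; // not expected: the future is only completed normally
        }
    }
}
```

One design choice here: the future is also completed with an empty
string as soon as the read finishes, so a document without
SummaryInformation does not have to wait out the full timeout; the
timeout then only guards against a read that never returns.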

This seems a bit complicated just to extract a title from a
document, but this may be due to the nature of the format or
the API. Could someone who is familiar with POI and the
Apache API please comment?

----------------------------------------------------------------------


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
