Bugs item #999549, was opened at 2004-07-28 15:47
Message generated for change (Comment added) made by andyhedges
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=999549&group_id=59548

Category: plugin: other
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Andy Hedges (andyhedges)
Assigned to: Nobody/Anonymous (nobody)
Summary: MSWord document's title

Initial Comment:
MSWord document titles weren't being extracted and stored. This patch does that by extracting the title from the document's "properties".

----------------------------------------------------------------------

>Comment By: Andy Hedges (andyhedges)
Date: 2004-07-29 09:06

Message:
Logged In: YES
user_id=583029

After some extensive testing on this, I have discovered that Word 'Streams' occasionally don't have the SummaryInformation document in them. This apparently happens when a Word doc is opened in StarOffice (or, I imagine, OO.o) and saved out again.

Anyway, this new patch sets a timeout on the listener and, if no SummaryInformation is found, sets the title to the empty string.

This seems a bit complicated just to extract a title from a document, but that may be due to the nature of the format or the API. Could someone who is familiar with POI and the Apache API please comment?

----------------------------------------------------------------------

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
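
For readers following the thread, the timeout-and-fallback idea described in the comment can be sketched roughly as below. This is not the attached patch and it does not use POI: `SummarySource`, `extractTitle`, and the timeout values are hypothetical names chosen for illustration, and the "reader" is a stand-in that either calls the listener with a title or never calls it at all (the StarOffice case).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

final class TitleWithTimeout {

    /**
     * Hypothetical stand-in for an event-driven document reader.
     * It invokes the listener only if the document contains summary info.
     */
    interface SummarySource {
        void read(Consumer<String> titleListener);
    }

    /** Waits up to timeoutMillis for the listener to fire; otherwise returns "". */
    static String extractTitle(SummarySource source, long timeoutMillis) {
        CompletableFuture<String> title = new CompletableFuture<>();

        // Run the reader on a daemon thread so a listener that never fires
        // cannot keep the JVM alive.
        Thread reader = new Thread(() -> source.read(title::complete));
        reader.setDaemon(true);
        reader.start();

        try {
            String value = title.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return value == null ? "" : value;
        } catch (TimeoutException | ExecutionException e) {
            // No summary information arrived in time: fall back to an empty title.
            return "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "";
        }
    }

    public static void main(String[] args) {
        SummarySource withSummary = listener -> listener.accept("Quarterly Report");
        SummarySource withoutSummary = listener -> { /* listener is never called */ };

        System.out.println("[" + extractTitle(withSummary, 500) + "]");
        System.out.println("[" + extractTitle(withoutSummary, 100) + "]");
    }
}
```

One design note, offered tentatively: a timeout is only needed when the reader gives no signal that it has finished. If the underlying read call turns out to be synchronous, checking after it returns whether the listener fired would give the same empty-title fallback without the wait.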
