Bugs item #999549, was opened at 2004-07-28 15:47
Message generated for change (Settings changed) made by johnnx
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=999549&group_id=59548

Category: plugin: other
Group: None
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Andy Hedges (andyhedges)
Assigned to: Nobody/Anonymous (nobody)
Summary: MSWord document's title

Initial Comment:
MSWord document titles weren't being extracted and
stored. This patch does that by extracting the title
from the document's "properties".



----------------------------------------------------------------------

Comment By: Andy Hedges (andyhedges)
Date: 2004-08-05 10:44

Message:
Logged In: YES 
user_id=583029

Altered the patch to take on board some feedback regarding
patch file creation.

----------------------------------------------------------------------

Comment By: Andy Hedges (andyhedges)
Date: 2004-08-03 16:32

Message:
Logged In: YES 
user_id=583029

Removed some unnecessary debug output.

----------------------------------------------------------------------

Comment By: Andy Hedges (andyhedges)
Date: 2004-08-03 15:41

Message:
Logged In: YES 
user_id=583029

Updated to neaten the patch file and to include all MS Word
properties.

----------------------------------------------------------------------

Comment By: Andy Hedges (andyhedges)
Date: 2004-07-29 09:06

Message:
Logged In: YES 
user_id=583029

After doing some extensive testing on this I have discovered
that occasionally Word 'Streams' don't have the
SummaryInformation document in them. This apparently
happens when a Word doc is opened in StarOffice (or, I
imagine, OO.o) and saved out again.

Anyway, this new patch sets a timeout on the listener and, if
no SummaryInformation is found, sets the title to the empty
string.
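
For readers following along, here is a minimal sketch of the
"wait with a timeout, fall back to an empty title" idea. It is
not the code from the patch: the class name TitleWithTimeout and
the method titleOrEmpty are hypothetical, and it assumes the
listener callback hands the title over through a
CompletableFuture. It uses only the Java standard library, so
the POI side is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Nutch or POI.
class TitleWithTimeout {

    /**
     * Waits up to timeoutMillis for a title to arrive.
     * Returns "" if nothing arrives in time, if the producer failed,
     * or if the delivered title is null.
     */
    static String titleOrEmpty(CompletableFuture<String> pendingTitle,
                               long timeoutMillis) {
        try {
            String title = pendingTitle.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return title == null ? "" : title;
        } catch (TimeoutException | ExecutionException e) {
            // No SummaryInformation seen in time (or reading it failed).
            return "";
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for callers, then fall back.
            Thread.currentThread().interrupt();
            return "";
        }
    }

    public static void main(String[] args) {
        // Case 1: the (assumed) listener callback delivered a title.
        CompletableFuture<String> found = new CompletableFuture<>();
        found.complete("Quarterly Report");
        System.out.println("[" + titleOrEmpty(found, 100) + "]");

        // Case 2: no callback ever fires, so the wait times out.
        CompletableFuture<String> missing = new CompletableFuture<>();
        System.out.println("[" + titleOrEmpty(missing, 100) + "]");
    }
}
```

In the setting described above, the callback registered for the
SummaryInformation document would presumably be the one to call
complete() on the future; that wiring is an assumption here.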

This seems a bit complicated for extracting a title from a
document, but this may be due to the nature of the format or
the API. Could someone who is familiar with POI and the
Apache API please comment?

----------------------------------------------------------------------
