[EMAIL PROTECTED] wrote:
Hi, Andy,
While you are at it, could it be extended to extract all available
standard properties of an MS Office document? That would be very useful.
I just skimmed the POI HPSF how-to; you seem to have the base code already.
Yes, I thought the same. However, the missing title was stopping me from
achieving my project goals, whereas the metadata was just a nice-to-have.
I'll put it in tomorrow though, no problem. Do you think it would be
appropriate to put the custom metadata in too?
The patch has a few problems (not critical, but they cause a bit of a headache):
(1) It is a reverse patch. You might want to do it the other way round.
Whoops - I'm working from nightlies due to network restrictions at work.
(2) Tab usage is not consistent.
Yep, I'll fix this too before creating the patch.
(3) Lines end with '\r\n' instead of '\n'.
Sorry, forced to use Windows at work :( I'll fix this too.
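For point (3), a minimal sketch of normalizing line endings before generating a patch. The class and method names (`LineEndings`, `toUnix`) are hypothetical and not part of Nutch or of the patch under discussion; in practice a tool such as dos2unix would likely do the same job.

```java
// Hypothetical helper, not part of the submitted patch.
final class LineEndings {
    private LineEndings() {}

    /** Replaces every CRLF pair, and any stray CR, with a single LF. */
    static String toUnix(String text) {
        // Handle "\r\n" first so a pair does not turn into two newlines.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

Running the file contents through `LineEndings.toUnix` before diffing should keep '\r\n' out of the resulting patch.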
Thanks,
John
On Mon, Aug 02, 2004 at 12:03:15PM +0100, Andy Hedges wrote:
This all seems to work fine. Has anyone else tried it? Any chance of a
commit on it?
Andy
Bugs item #999549, was opened at 2004-07-28 15:47
Message generated for change (Comment added) made by andyhedges
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=999549&group_id=59548
Category: plugin: other
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Andy Hedges (andyhedges)
Assigned to: Nobody/Anonymous (nobody)
Summary: MSWord document's title
Initial Comment:
MSWord document titles weren't being extracted and
stored. This patch does that by extracting the title
from the document's "properties".
----------------------------------------------------------------------
Comment By: Andy Hedges (andyhedges)
Date: 2004-07-29 09:06
Message:
Logged In: YES
user_id=583029
After doing some extensive testing on this I have discovered
that occasionally Word 'streams' don't have the
SummaryInformation document in them. This apparently
happens when a Word doc is opened in StarOffice (or, I
imagine, OO.o) and saved out again.
Anyway this new patch sets a timeout on the listener and if
no SummaryInformation is found sets the title to the empty
string.
This seems a bit complicated for extracting a title from a
document, but this may be due to the nature of the format or
the API. Could someone who is familiar with POI and the
Apache API please comment?
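
The timeout fallback described in this comment can be sketched as follows. This is not the code from the patch: it uses only the standard library (java.util.concurrent from a much later Java than was current in 2004), and the names `TitleWaiter`, `onTitleFound` and `awaitTitle` are hypothetical. It assumes some listener calls `onTitleFound` if and when it sees the SummaryInformation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of "wait for the title, else use the empty string".
final class TitleWaiter {
    private final CompletableFuture<String> title = new CompletableFuture<>();

    /** Called by the (assumed) listener once a title has been read. */
    void onTitleFound(String value) {
        title.complete(value == null ? "" : value);
    }

    /** Waits up to timeoutMillis for a title; returns "" if none arrives. */
    String awaitTitle(long timeoutMillis) {
        try {
            return title.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // No SummaryInformation was reported in time: fall back to "".
            return "";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return "";
        }
    }
}
```

A caller would create one `TitleWaiter` per document, hand it to the listener, and call `awaitTitle` with whatever timeout is acceptable for the fetch.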
----------------------------------------------------------------------
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers