FYI, I just updated the textmining.org homepage with the following info.
The tm-extractors library has a new release! v1.0. You can download it here:
http://text-mining.googlecode.com/files/tm-extractors-1.0.jar
The tm-extractors library is a pure Java library for extracting text
from Word docum
The textmining library (textmining.org) for Word docs should work fine
with non-English text as well. Let me know if it doesn't.
On 8/2/07, Ben Litchfield <[EMAIL PROTECTED]> wrote:
> In terms of PDF documents...
>
> PDFBox should work just fine with any Latin-based languages; at this
> time certai
I was playing around with MoreLikeThis and I noticed the problems you
are talking about as well.
One idea I thought of was for MoreLikeThis to focus only on proper
nouns for the purposes of similarity or give a significant boost to
those. Pretty much the same idea you had in #1.
I found a list o
The 512-byte thing is a limitation of POIFS, I think. I could be wrong
though. Have you tried opening the file with just POIFS?
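One quick way to narrow this down before involving POIFS at all is to look at the raw file. The sketch below is only an illustration and uses nothing but the JDK. It assumes "the 512 byte thing" refers to the 512-byte block size of the OLE2 container format that POIFS reads, which this thread does not confirm. The class and method names (Ole2Check, hasOle2Signature, isWholeBlocks) are made up for the example and are not part of POI or the textmining library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of POI or textmining.org.
public class Ole2Check {
    // Eight-byte signature found at the start of an OLE2 compound file.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumption: the block size in question is the 512-byte OLE2 sector size.
    static final int BLOCK_SIZE = 512;

    // True if the data begins with the OLE2 signature.
    static boolean hasOle2Signature(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // True if the length is a non-zero whole number of 512-byte blocks.
    static boolean isWholeBlocks(long length) {
        return length >= BLOCK_SIZE && length % BLOCK_SIZE == 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Ole2Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("OLE2 signature: " + hasOle2Signature(data));
        System.out.println("Whole 512-byte blocks: " + isWholeBlocks(data.length));
    }
}
```

If the signature check passes but the length check fails, the file's size is a plausible suspect; if both pass, the problem is more likely elsewhere.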
On 3/26/07, Antony Bowesman <[EMAIL PROTECTED]> wrote:
Ryan Ackley wrote:
> Yes I do have plans for adding fast save support and support for more
> file
the rich formatting.
On 3/26/07, jafarim <[EMAIL PROTECTED]> wrote:
Good to know that your devised commercial feature is already offered by
Enhydra Snapper as an open-source feature.
Check here: http://www.enhydra.org/apps/snapper/index.html
On 3/26/07, Ryan Ackley <[EMAIL PROTECTE
so handles a greater variety of files.
Ryan, thanks for fixing your site. Do you have any plans/ideas on how to parse
the 'fast-saved' files and any ideas on Word files older than the Word 6 format?
Regards
Antony
Ryan Ackley wrote:
> As the author of both Word POI and
to on this is in the "Lucene in Action"
book.
On 3/24/07, jafarim <[EMAIL PROTECTED]> wrote:
Can anyone make a comparison between the two, namely POI API and the one
from textmining.org?
On 3/24/07, Ryan Ackley <[EMAIL PROTECTED]> wrote:
>
> The site is down but you c
The site is down but you can download the word extractor library direct here:
http://www.textmining.org/textmining.zip
Going to fix the site this weekend.
On 3/24/07, Sami Siren <[EMAIL PROTECTED]> wrote:
Antony Bowesman wrote:
>> Are there other solutions?
There's also antiword [1] which c
[EMAIL PROTECTED]> wrote:
Last I remember, it was being voted on by the Incubator committee.
Good to hear TextMining is back in action! Does that mean you are
back on POI Word again too?
-Grant
On Mar 20, 2007, at 10:35 PM, Ryan Ackley wrote:
> Someone pointed me there already. Looks interes
http://wiki.apache.org/incubator/TikaProposal
Better home for your lib, perhaps?
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
----- Original Message -----
From: Ryan Ackley <[EMAIL PROTECTED]>
T
I've been out of the loop for a while. I just saw this recent thread
and re-subscribed to the list.
In the next month or two I will be able to put some time into the
textmining library. Fast saved files are on the list of improvements
as well as other features that have been requested. I would al
Michael,
Cool, looks nice. I downloaded the distribution and noticed that you are
using several BSD-licensed libraries besides Lucene, including the
textmining.org library. I couldn't find any acknowledgement of those
libraries in your documentation. The Apache 2.0 license lets you just
inclu