This vote passes with 3 +1s from the Lucene PMC members:
Grant Ingersoll
Sami Siren
Chris Hostetter
And 4 +1s from:
Jukka Zitting
Uwe Schindler
Chris Mattmann
Rida Benjelloun
Thanks to everyone for their input and I'll complete the related tasks this
evening.
Cheers,
Dave
On Mon, Dec 8, 2008 at 9:58 PM, Christopher Corbell
<[EMAIL PROTECTED]> wrote:
> Just to add my lurker's thoughts to this thread, for what it's worth...
>
> Nearly all of the issues raised in this thread (and in the other one I've
> been following on Dublin Core) are to me appropriate to a "middlew
>
> Yep, the mime type detection system in Tika is based on the one developed
> for Nutch primarily by Jerome Charron. Jerome worked on an update to this
> mime system, with the freedesktop.org-style interface, and then I worked to
> clean this up and get the functionality into Tika.
The basic ide
Hi Stephane,
Thanks for your email.
> I didn't know Tika mime type detection was based on freedesktop.org.
> I've also developed a mimeType detection system built on top of
> freedesktop, leveraging the shared-mime-info database to be accurate. Is
> this what you guys have done as well?
Yep, the
Hi Chris,
I didn't know Tika mime type detection was based on freedesktop.org.
I've also developed a mimeType detection system built on top of
freedesktop, leveraging the shared-mime-info database to be accurate. Is
this what you guys have done as well?
In any case, the point I was trying to m
Hi Stephane,
> This is definitely good news. Besides very good parsers, Aperture also
> has strong support for MIME types. I know we also have support for
> detecting MIME types, but at some point in time we may consider using
> theirs and focus solely on writing Parsers?
I would be strongly ag
FWIW, Mahout uses Confluence, and I find the experience a whole lot
more pleasurable, but +0 on either choice.
On Dec 8, 2008, at 5:03 AM, Jukka Zitting wrote:
Hi,
On Mon, Dec 8, 2008 at 2:41 AM, Mattmann, Chris A
<[EMAIL PROTECTED]> wrote:
Since I didn't see an official Tika wiki yet, I we
I think I would let things shake out a little bit with the change to a
new license. IANAL, but I think I would at least wait for a
release. It does seem to make sense, though.
Personally, though, I really like Tika's SAX model for extraction and
the, um, lack of RDF.
2 more cents...
G
Hi,
You're definitely right that there would be a mapping between a given
document and XML, via a ContentHandler, which is kind of what Tika does
already. This also means that metadata would be extracted from the "raw"
ContentHandler.
In any case, as you pointed out Tika might not be the best
Hi,
On Tue, Dec 9, 2008 at 12:19 PM, Stephane Bastian
<[EMAIL PROTECTED]> wrote:
> Parsing goes through several fairly well-defined steps, and in the case of
> Tika it could be represented as follows:
> 1) Generate SAX events out of the stream
> 2) Extract metadata and save it in an instance of t
This is definitely good news. Besides very good parsers, Aperture also
has strong support for MIME types. I know we also have support for
detecting MIME types, but at some point in time we may consider using
theirs and focus solely on writing Parsers?
One problem though is that parsers return R
Hi Jukka,
This fix would definitely help me in the short run, since I have to
extend the HTML parser for my specific needs. However, I'm thinking
that I may run into the same problem with another parser in a month or two.
Therefore I'm leaning toward finding a solution that would work for all
Hi,
The Aperture project (http://aperture.sourceforge.net/) has relicensed
all their code to the BSD license, see
http://sourceforge.net/forum/forum.php?forum_id=891966.
They probably have some code that we could reuse, and perhaps we also
have some valuable bits to contribute to them. The BSD li
Hi,
On Tue, Dec 9, 2008 at 8:27 AM, Stephane Bastian
<[EMAIL PROTECTED]> wrote:
> So, I wanted to know 1) if other people have had trouble extending existing
> Parsers, and 2) if this is an issue we should tackle?
We're of course open to contributions on issues like this, but I'm
wondering if your use