Max - Welcome!!!

I'm literally sitting on the edge of my seat anxious for a viable Ruby Lucene, so I applaud your efforts.

I'm very keen on the GCJ/SWIG approach so that the Ruby version can stay in sync with the Java version simply by running the build process, very much like PyLucene does. I began a native Ruby port once upon a time myself (rucene and rubylucene at RubyForge, with very little code but some basic file I/O actually out there), and I dropped it once I saw PyLucene and how well it performed.

Hopefully you'd be interested in assisting with the nascent effort under way here. Or, if you come up with something on your own and would like to contribute it to Apache to live alongside Java Lucene, we'd welcome it.

    Erik


On Jul 7, 2005, at 11:48 AM, Max Nickel wrote:

Hi all,
Miles Barr and Erik Hatcher posted on my webby, and since I wanted to get
in contact with you sooner or later anyway, I'm doing it now :).
Originally I wanted to wait until I had some more quality code, but
well...
(So if you read something like "it's working" or "I ported", please don't
take this too literally ;))

So let me introduce myself first: I'm Max Nickel and I am working on a
project I called Rise, which tries to be a Ruby implementation of Lucene. I just read some of the recent posts on this mailing list, and it seems that you are concentrating your efforts on getting it done with SWIG, so
I don't know if what I did will be of much use to you.
I took a different approach and first tried a pure Ruby implementation. This was rise-0.1.1, which you can also get at rise.rubyforge.org or from my
outdated Arch repo. At that stage everything was still very buggy and
nowhere near what you could call working, but I had enough working to see that
pure Ruby is simply unacceptably slow (I expected this to happen
anyway).
So at this point I decided to port some of the parts that matter most
for performance to C. I know that this might not be the best
approach when you care about portability or deployment, but I felt that
it was necessary if you want to do anything other than index your
address book.
Right now I have ported the following classes, either completely or in
part as mixins:
FS/RAM-IO, Tokenizers up to LowerCaseTokenizer, Term, TermBuffer, Token,
QuickSort, HeapSort, TermInfosWriter#add and #write,
DocumentWriter#writePostings and #addPosition, and SegmentTermEnum, plus
some helper classes.

The C implementations don't use any headers other than ruby.h and
rubyio.h (sys/stdlib.h is needed only once, in fsio.c), so wherever
Ruby compiles, Rise should compile too.
Also, nearly all classes except the IO ones aren't pure C, but make use
of Ruby's C functions like rb_ivar_*, rb_funcall, etc.

As I wrote in an email to Miles Barr earlier, here are some very rough
indexing stats:
Indexing /usr/src/linux of a recent 2.6.12 kernel takes, on my machine:
with Lucene ~4 minutes
with Rise in pure Ruby > 60 minutes
with my current Rise/C impl ~20 minutes

The current state is unfortunately broken, since somewhere in my recent
changes I made some stupid mistake and keep getting "Docs out of
order" exceptions when merging segments. I haven't had much time on my
hands lately to hunt this bug, but I hope it will be the last major one
before the 0.1.2 release (except that the searching side is also broken,
as it hasn't been updated for the changes I made yet).

Since I was tired of GNU Arch's UI and switched to monotone, you also can't get my current sources. But once I have managed to set up my local server I'll
let you know.

kind regards,
/max


