Re: rise

Erik Hatcher Fri, 08 Jul 2005 18:33:10 -0700

Max - Welcome!!!

I'm literally sitting on the edge of my seat, anxious for a viable Ruby Lucene, so I applaud your efforts.

I'm very keen on the GCJ/SWIG approach so that the Ruby version can stay in sync with the Java version simply by running the build process, very much like PyLucene does. I began a native Ruby port once upon a time myself (rucene and rubylucene at RubyForge, with very little code but some basic file I/O actually out there) and I dropped it once I saw PyLucene and how well it performed.

Hopefully you'd be interested in assisting with the nascent effort under way here, or if you come up with something on your own and would like to contribute it to Apache to live alongside Java Lucene, we'd welcome it.


    Erik


On Jul 7, 2005, at 11:48 AM, Max Nickel wrote:

Hi all,
Miles Barr and Erik Hatcher posted on my webby and since i wanted to get
in contact with you sooner or later anyway, i'm doing it now :).
Originally i wanted to wait until i had some more quality code, but
well...
(So if you read something like "it's working" or "i ported", don't take
this too literally please ;))

So let me introduce myself first, i'm Max Nickel and am working on a
project i called Rise, which tries to be a ruby implementation of
Lucene.
I just read some of the recent posts on this mailing list, and it seems
that you are concentrating your efforts on getting it done with SWIG, so
i don't know if what i did will be of much use for you.
I took a different approach and first tried a pure ruby implementation.
This was rise-0.1.1, which you can also get on rise.rubyforge.org or on my
outdated Arch repo. At this stage everything was still very buggy and
nowhere near what you could call working, but i had enough working to see
that pure ruby simply is unacceptably slow (i expected this to happen
anyway).
So at this point i decided to port some of the more important parts in
terms of performance to C. I know that this might not be the best
approach when you care about portability or deployment, but i felt that
if you want to do something other than indexing your address book it
was necessary.
Right now i have ported the following classes, either completely or in
part, as mixins:
FS/RAM-IO, Tokenizers up to LowerCaseTokenizer, Term, TermBuffer, Token,
QuickSort, HeapSort, TermInfosWriter#add and #write,
DocumentWriter#writePostings and #addPosition, and SegmentTermEnum +
some helper classes.
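
For readers unfamiliar with the tokenizer mentioned in the list above, here is a minimal pure-Ruby sketch of what a LowerCaseTokenizer does: split text at non-letter characters and lowercase each run of letters. It is not taken from Rise. The class name LowerCaseTokenizerSketch and the Token struct are hypothetical, and only the general behaviour (letters only, lowercased, with character offsets) is assumed from Java Lucene.

```ruby
# Hypothetical sketch, not Rise's actual code.
# A token carries its text plus the character offsets where it was found.
Token = Struct.new(:text, :start_offset, :end_offset)

class LowerCaseTokenizerSketch
  include Enumerable

  def initialize(text)
    @text = text
  end

  # Yields one Token per maximal run of letters, lowercased.
  def each
    return enum_for(:each) unless block_given?

    @text.scan(/[[:alpha:]]+/) do |word|
      match = Regexp.last_match # MatchData for the current run of letters
      yield Token.new(word.downcase, match.begin(0), match.end(0))
    end
  end
end

tokens = LowerCaseTokenizerSketch.new("Hello, Ruby-Lucene 2005!").to_a
tokens.each { |t| puts "#{t.text} [#{t.start_offset}, #{t.end_offset})" }
```

A loop like this runs once per token over every file in the corpus, which is the kind of hot path that becomes a candidate for a C port when indexing large source trees.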

The C implementation doesn't use any headers other than ruby.h or
rubyio.h (only once sys/stdlib.h is needed in fsio.c), so everywhere
where ruby compiles, rise should compile also.
Also nearly all classes, except the IO ones, aren't pure C, but make use
of ruby's C functions like rb_ivar_*, rb_funcall etc.

As i wrote in an email to Miles Barr earlier, here are some very rough
indexing stats:
/usr/src/linux of a recent 2.6.12 kernel takes on my machine
with Lucene ~4 Minutes
with Rise in pure ruby > 60 Minutes
with my current Rise/C impl ~20 Minutes.
The current status is unfortunately broken, since somewhere in my recent
changes i made some stupid mistake and keep getting "Docs out of
order"-Exceptions when merging segments. I haven't had much time on my
hands lately to hunt this bug, but i hope it will be the last major one
before the 0.1.2 release (except that the searching side is broken, as it
isn't updated to the changes i made yet).
Since i was tired of GNU/Arch's UI and switched to monotone, you also
can't get my current sources. But once i manage to set up my local server
i'll let you know.
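
As background on the "Docs out of order" error mentioned above: a minimal sketch of the kind of check that can raise it, assuming Rise mirrors Java Lucene in storing document numbers as gaps (deltas) from the previous document. The method name delta_encode_docs and the error class are hypothetical, and the exact comparison used in Rise is an assumption.

```ruby
# Hypothetical sketch, not Rise's actual code.
class DocsOutOfOrderError < StandardError; end

# Turns an ascending list of document numbers into gaps from the previous
# number. A number that goes backwards cannot be written as a non-negative
# gap, so it is rejected.
def delta_encode_docs(doc_numbers)
  last_doc = 0
  doc_numbers.map do |doc|
    if doc < last_doc
      raise DocsOutOfOrderError, "docs out of order (#{doc} < #{last_doc})"
    end
    delta = doc - last_doc
    last_doc = doc
    delta
  end
end

p delta_encode_docs([3, 7, 12])
```

Under this assumption, a merge that remaps document numbers incorrectly for one segment would feed a descending number into such a writer, which is one plausible way to hit the error.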

kind regards,
/max
