Hi all,

Miles Barr and Erik Hatcher posted on my website, and since I wanted to get in contact with you sooner or later anyway, I'm doing it now :). Originally I wanted to wait until I had some more quality code, but well... (So if you read something like "it's working" or "I ported", please don't take it too literally ;))

So let me introduce myself first: I'm Max Nickel, and I'm working on a project I called Rise, which tries to be a Ruby implementation of Lucene. I just read some of the recent posts on this mailing list, and it seems that you are concentrating your efforts on getting it done with SWIG, so I don't know if what I did will be of much use to you.

I took a different approach and first tried a pure Ruby implementation. This was rise-0.1.1, which you can still get on rise.rubyforge.org or from my outdated Arch repo. At that stage everything was still very buggy and nowhere near what you could call working, but I had enough working to see that pure Ruby is simply unacceptably slow (I expected this anyway). So at that point I decided to port some of the more performance-critical parts to C. I know that this might not be the best approach when you care about portability or deployment, but I felt it was necessary if you want to do something other than indexing your address book.

Right now I have ported the following classes, either completely or in part as mixins: FS/RAM-IO, the Tokenizers up to LowerCaseTokenizer, Term, TermBuffer, Token, QuickSort, HeapSort, TermInfosWriter#add and #write, DocumentWriter#writePostings and #addPosition, and SegmentTermEnum, plus some helper classes. The C implementations don't use any headers other than ruby.h and rubyio.h (only fsio.c additionally needs sys/stdlib.h), so Rise should compile everywhere Ruby compiles. Also, nearly all classes except the IO ones aren't pure C but make use of Ruby's C functions like rb_ivar_*, rb_funcall etc.

As I wrote in an email to Miles Barr earlier, here are some very rough indexing stats. Indexing /usr/src/linux of a recent 2.6.12 kernel on my machine takes:

  Lucene:                   ~4 minutes
  Rise in pure Ruby:        > 60 minutes
  my current Rise/C impl:   ~20 minutes

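To make the class list above a bit more concrete for anyone who doesn't know the Lucene side: below is a minimal pure-Ruby sketch of what a LowerCaseTokenizer does, going by Lucene's behaviour (split the text at non-letters, lowercase every token). It is only an illustration and not the code from Rise; the names SketchToken, SketchLowerCaseTokenizer and sketch_tokenize are made up for this mail.

```ruby
# Hypothetical sketch, NOT taken from Rise: a pure-Ruby tokenizer in the
# spirit of Lucene's LowerCaseTokenizer. All names here are invented.

# A token carries its text plus the character offsets in the source string.
SketchToken = Struct.new(:text, :start_offset, :end_offset)

class SketchLowerCaseTokenizer
  include Enumerable

  def initialize(text)
    @text = text
  end

  # Yields one SketchToken per maximal run of letters, lowercased.
  # Digits, punctuation and whitespace only act as separators.
  def each
    return enum_for(:each) unless block_given?

    @text.scan(/[[:alpha:]]+/) do |word|
      match = Regexp.last_match
      yield SketchToken.new(word.downcase, match.begin(0), match.end(0))
    end
  end
end

# Convenience helper (also hypothetical): return only the token texts.
def sketch_tokenize(text)
  SketchLowerCaseTokenizer.new(text).map(&:text)
end
```

For example, sketch_tokenize("Hello, World 2.6") gives ["hello", "world"]. Code like this runs once per token of every indexed file, which is the kind of hot path that is worth moving to C.
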
The current state is unfortunately broken, since somewhere in my recent changes I made some stupid mistake and keep getting "Docs out of order" exceptions when merging segments. I haven't had much time lately to hunt this bug down, but I hope it will be the last major one before the 0.1.2 release (except that the searching side is also broken, as it hasn't been updated to the changes I made yet).

Since I was tired of GNU Arch's UI and switched to monotone, you also can't get my current sources. But once I have managed to set up my local server, I'll let you know.

kind regards,
/max
