Thanks Marvin and Peter for your comments.

I tried to make the library work for Lexicons, but I am getting a
segfault. I believe I am not initializing the LexiconReader correctly,
and I could not find any samples anywhere. My code snippet is at the end
of the message. As I mentioned earlier, I simply would like to have
access to the tokens for a specific field. Those fields do not have any
stemmers.

I have a few questions about the following code snippet. In order to create a
LexiconReader*, I create an IndexReader, then I initialize the LexReader. I am
really suspicious of what I am doing here, because the LexReader_init method
takes a LexReader (self) as an argument and returns the LexReader. To do this
I had to malloc a LexiconReader pointer, otherwise LexReader_init fails. Since
sizeof(lucy_LexiconReader) fails to compile, I malloc 10000 bytes. The program
crashes at this line:

    lucy_Lexicon *lexicon = LexReader_Lexicon(lexiconReader, field_str,
                                              (cfish_Obj*)term_str);

The lldb output for this crash is below.

Is anyone able to see what I am doing wrong here?

Thanks in advance,
Serkan

--------------------------------------------------- LLDB Output ---------------------------------------------------
    frame #0: 0x00000001000036ab
testSuggester`LUCY_LexReader_Lexicon(self=0x0000000103000000,
field=0x0000000100508910, term=0x00000001005086e0) + 27 at
LexiconReader.h:275
   272 extern LUCY_VISIBLE uint32_t LUCY_LexReader_Lexicon_OFFSET;
   273 static CFISH_INLINE lucy_Lexicon*
   274 LUCY_LexReader_Lexicon(lucy_LexiconReader* self, cfish_String*
field, cfish_Obj* term) {
-> 275    const LUCY_LexReader_Lexicon_t method =
(LUCY_LexReader_Lexicon_t)cfish_obj_method(self,
LUCY_LexReader_Lexicon_OFFSET);
   276    return method(self, field, term);
   277 }
   278
(lldb) n
Process 68332 stopped
* thread #1: tid = 0x17272f5, 0x000000010000549f
testSuggester`cfish_method(klass=0x0000000000000000, offset=192) + 31 at
cfish_parcel.h:108, queue = 'com.apple.main-thread', stop reason =
EXC_BAD_ACCESS (code=1, address=0xc0)
    frame #0: 0x000000010000549f
testSuggester`cfish_method(klass=0x0000000000000000, offset=192) + 31 at
cfish_parcel.h:108
   105 cfish_method(const void *klass, uint32_t offset) {
   106    union { char *cptr; cfish_method_t *fptr; } ptr;
   107    ptr.cptr = (char*)klass + offset;
-> 108    return ptr.fptr[0];
   109 }
   110


----------------------------------------------- Code Snippet -----------------------------------------------
lucy_FSFolder *folder = lucy_FSFolder_new(folder_str);
lucy_IndexReader *indexReader = lucy_IxReader_open((cfish_Obj *)
folder_str, NULL, NULL);
cfish_Vector *segments = IxReader_Get_Segments(indexReader);
lucy_Snapshot *snapshot = IxReader_Get_Snapshot(indexReader);
int32_t seg_tick = IxReader_Get_Seg_Tick(indexReader);

// sizeof does not work for LexiconReader or for DataReader; use 10000 for testing
lucy_LexiconReader *lexiconReader = (lucy_LexiconReader*)malloc(10000);
lucy_LexReader_init(lexiconReader, schema, (lucy_Folder*)folder, snapshot,
                    segments, seg_tick);

cfish_String *field_str = Str_newf(field);
cfish_String *term_str = Str_newf(term);
lucy_Lexicon *lexicon = LexReader_Lexicon(lexiconReader, field_str,
                                          (cfish_Obj*)term_str);

char *out;

cfish_Obj *out_str = Lex_Get_Term(lexicon);
out = Str_To_Utf8((cfish_String*) out_str);
DECREF(out_str);
printf("%s\n", out);
free(out);
while (Lex_Next(lexicon)) {
    cfish_Obj *out_str = Lex_Get_Term(lexicon);
    out = Str_To_Utf8((cfish_String*)out_str);
    DECREF(out_str);
    printf("%s\n", out);
    free(out);
}
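
For reference, this is the alternative I was considering instead of
constructing the reader myself: asking the IndexReader for the LexiconReader
component it already aggregates, via Obtain(). I am not sure this is the
supported way, and the API name string is my guess from the docs:

```c
/* Sketch only -- assumes IxReader_Obtain() is the right way to fetch a
 * component reader, and that "Lucy::Index::LexiconReader" is the right
 * API name. Obtain() throws if the component is missing; Fetch() would
 * return NULL instead, allowing an explicit check. */
cfish_String *api = Str_newf("Lucy::Index::LexiconReader");
lucy_LexiconReader *lexReader
    = (lucy_LexiconReader*)IxReader_Obtain(indexReader, api);
DECREF(api);

lucy_Lexicon *lex
    = LexReader_Lexicon(lexReader, field_str, (cfish_Obj*)term_str);
```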


On Wed, May 3, 2017 at 2:30 PM, Peter Karman <pe...@peknet.com> wrote:

> You might find this Perl implementation a helpful reference.
>
> https://metacpan.org/pod/LucyX::Suggester
>
> On Wed, May 3, 2017 at 3:06 PM, Serkan Mulayim <serkanmula...@gmail.com>
> wrote:
>
> > Thank you very much Marvin,
> >
> > When I type hell, I would like to get tokens starting with hell, e.g.
> > {"hell","hello","helix"}. I do not want to get documents which contain the
> > hell token in the title. So it seems like it should be working on the
> > tokens.
> >
> > What I need is basically to be able to iterate over all tokens, which are
> > lexicographically ordered. Also, I would need to sort them based on their
> > frequencies when returning the results. I guess the Lexicon class,
> > https://lucy.apache.org/docs/c/Lucy/Index/Lexicon.html, is designed for
> > this. Can you please confirm? I hope the returned results from
> > lucy_Lex_seek contain the frequency of the terms as well.
> >
> > Thanks again,
> > Serkan
> >
> >
> >
> >
> >
> > On Tue, May 2, 2017 at 4:22 PM, Marvin Humphrey <mar...@rectangular.com>
> > wrote:
> >
> > > On Mon, May 1, 2017 at 3:55 PM, Serkan Mulayim <serkanmula...@gmail.com>
> > > wrote:
> > >
> > > > I am using the C library. I would like to get the suggester or
> > > > autocomplete functionality in my library. It needs to return {"hello",
> > > > "hell", "hellx"} when your query is "hell". I feel like I need to be
> > > > able to read all the tokens in the whole index, and return the results
> > > > based on it. I looked at the indexReader for this, but I could not
> > > > find any useful information. Do you think this is possible?
> > >
> > > Autosuggestion functionality will need tuning, just like search
> > > results. In fact, autosuggestion is really a specialized form of search
> > > application. It could be implemented with a separate index or separate
> > > fields.
> > >
> > > Say that we only wanted to offer suggestions derived from the `title`
> > > field. Split each title into an array of words. Then for each word,
> > > index starting at some letter, say the third. For the title `hello
> > > world`, you'd get the following tokens:
> > >
> > >     hello -> hel hell hello
> > >     world -> wor worl world
> > >
> > > Then at search time, perform a search query with every keystroke.
> > >
> > >     h -> (no result)
> > >     he -> (no result)
> > >     hel -> "hello world"
> > >
> > > Once you've got basic functionality running, experiment with minimum
> > > token length, adding Soundex/Metaphone, performing character
> > > normalization, etc.
> > >
> > > Marvin Humphrey
> > >
> >
>
>
>
> --
> Peter Karman . https://peknet.com/ . https://keybase.io/peterkarman
>
