Thanks, Marvin and Peter, for your comments. I tried to make the library work with Lexicons, but I am getting a segfault. I believe I am not initializing the LexiconReader correctly, and I could not find any samples anywhere. My code snippet is at the end of this message. As I mentioned earlier, I simply want access to the tokens for a specific field. Those fields do not use any stemmers.

I have a few questions about the code snippet below. In order to create a LexiconReader*, I create an IndexReader and then initialize the LexReader. I am unsure about what I am doing, because the LexReader_init method takes a LexReader (self) as an argument and returns the LexReader. To do this I had to malloc a LexiconReader pointer; otherwise LexReader_init fails. Since sizeof(lucy_LexiconReader) fails, I malloc 10000 bytes.

The program crashes at this line:

    lucy_Lexicon * lexicon = LexReader_Lexicon(lexiconReader, field_str, (cfish_Obj*) term_str);

The lldb output for this crash is below. Is anyone able to see what I am doing wrong here?

Thanks in advance,
Serkan

---------------------------------------------------LLDB Output------------------------------------------------------
frame #0: 0x00000001000036ab testSuggester`LUCY_LexReader_Lexicon(self=0x0000000103000000, field=0x0000000100508910, term=0x00000001005086e0) + 27 at LexiconReader.h:275
   272  extern LUCY_VISIBLE uint32_t LUCY_LexReader_Lexicon_OFFSET;
   273  static CFISH_INLINE lucy_Lexicon*
   274  LUCY_LexReader_Lexicon(lucy_LexiconReader* self, cfish_String* field, cfish_Obj* term) {
-> 275      const LUCY_LexReader_Lexicon_t method = (LUCY_LexReader_Lexicon_t)cfish_obj_method(self, LUCY_LexReader_Lexicon_OFFSET);
   276      return method(self, field, term);
   277  }
   278
(lldb) n
Process 68332 stopped
* thread #1: tid = 0x17272f5, 0x000000010000549f testSuggester`cfish_method(klass=0x0000000000000000, offset=192) + 31 at cfish_parcel.h:108, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xc0)
    frame #0: 0x000000010000549f testSuggester`cfish_method(klass=0x0000000000000000, offset=192) + 31 at cfish_parcel.h:108
   105  cfish_method(const void *klass, uint32_t offset) {
   106      union { char *cptr; cfish_method_t *fptr; } ptr;
   107      ptr.cptr = (char*)klass + offset;
-> 108      return ptr.fptr[0];
   109  }
   110

-----------------------------------------------Code Snippet--------------------------------------------------------
lucy_FSFolder *folder = lucy_FSFolder_new(folder_str);
lucy_IndexReader *indexReader = lucy_IxReader_open((cfish_Obj *) folder_str, NULL, NULL);
cfish_Vector *segments = IxReader_Get_Segments(indexReader);
lucy_Snapshot *snapshot = IxReader_Get_Snapshot(indexReader);
int32_t seg_tick = IxReader_Get_Seg_Tick(indexReader);

//sizeof does not work for lexiconreader or for datareader. Put 10000 for testing
lucy_LexiconReader * lexiconReader = (lucy_LexiconReader*) malloc(10000);
lucy_LexReader_init(lexiconReader, schema, (lucy_Folder*) folder, snapshot, segments, seg_tick);

cfish_String *field_str = Str_newf(field);
cfish_String *term_str = Str_newf(term);
lucy_Lexicon * lexicon = LexReader_Lexicon(lexiconReader, field_str, (cfish_Obj*) term_str);

char *out;
cfish_Obj *out_str = Lex_Get_Term(lexicon);
out = Str_To_Utf8((cfish_String*) out_str);
DECREF(out_str);
printf("%s\n", out);
free(out);

while(Lex_Next(lexicon)) {
    cfish_Obj *out_str = Lex_Get_Term(lexicon);
    out = Str_To_Utf8((cfish_String*) out_str);
    DECREF(out_str);
    printf("%s\n", out);
    free(out);
}

On Wed, May 3, 2017 at 2:30 PM, Peter Karman <pe...@peknet.com> wrote:

> You might find this Perl implementation a helpful reference.
>
> https://metacpan.org/pod/LucyX::Suggester
>
> On Wed, May 3, 2017 at 3:06 PM, Serkan Mulayim <serkanmula...@gmail.com> wrote:
>
> > Thank you very much Marvin,
> >
> > When I type hell, I would like to get tokens starting with hell, e.g.
> > {"hell","hello","helix"}. I do not want to get documents which contain hell
> > token in the title. So it seems like it should be working on the tokens.
> >
> > What I need is basically to be able to iterate over all tokens which are
> > lexicographically ordered. Also I would need to sort them based on their
> > frequencies when returning the results. I guess Lexicon class,
> > https://lucy.apache.org/docs/c/Lucy/Index/Lexicon.html, is designed for
> > this.
> > Can you please confirm? I hope the returned results in the
> > lucy_Lex_seek contains the frequency of the terms as well.
> >
> > Thanks again,
> > Serkan
> >
> > On Tue, May 2, 2017 at 4:22 PM, Marvin Humphrey <mar...@rectangular.com> wrote:
> >
> > > On Mon, May 1, 2017 at 3:55 PM, Serkan Mulayim <serkanmula...@gmail.com> wrote:
> > >
> > > > I am using the C library. I would like to get the suggester or autocomplete
> > > > functionality in my library. It needs to return {"hello", "hell", "hellx"}
> > > > when your query is "hell". I feel like I need to be able to read all the
> > > > tokens in the whole index, and return the results based on it. I looked at
> > > > the indexReader for this, but I could not find any useful information. Do
> > > > you think this is possible?
> > >
> > > Autosuggestion functionality will need tuning, just like search results. In
> > > fact, autosuggestion is really a specialized form of search application. It
> > > could be implemented with a separate index or separate fields.
> > >
> > > Say that we only wanted to offer suggestions derived from the `title` field.
> > > Split each title into an array of words. Then for each word, index starting
> > > at some letter, say the third. For the title `hello world`, you'd get the
> > > following tokens:
> > >
> > >     hello -> hel hell hello
> > >     world -> wor worl world
> > >
> > > Then at search time, perform a search query with every keystroke.
> > >
> > >     h -> (no result)
> > >     he -> (no result)
> > >     hel -> "hello world"
> > >
> > > Once you've got basic functionality running, experiment with minimum token
> > > length, adding Soundex/Metaphone, performing character normalization, etc.
> > >
> > > Marvin Humphrey
>
> --
> Peter Karman . https://peknet.com/ <http://peknet.com/> . https://keybase.io/peterkarman
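
For reference, the prefix-token scheme Marvin describes in the quoted message (index each word's prefixes, starting at the third letter) can be sketched in plain C. This is only an illustration under stated assumptions: `emit_prefixes` and `MIN_PREFIX_LEN` are hypothetical names, the code does not call any Lucy API, and it treats each word as single-byte ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimum prefix length ("say the third" letter). */
#define MIN_PREFIX_LEN 3

/* Write every prefix of `word` that has at least MIN_PREFIX_LEN characters
 * into `out`, separated by single spaces, and return how many prefixes were
 * written. Assumes single-byte (ASCII) characters. Stops early if the next
 * prefix would not fit in `out_size` bytes (including the terminating NUL). */
size_t
emit_prefixes(const char *word, char *out, size_t out_size) {
    assert(word != NULL && out != NULL);
    if (out_size == 0) {
        return 0;
    }

    size_t len   = strlen(word);
    size_t used  = 0;   /* bytes written so far, excluding the NUL */
    size_t count = 0;   /* prefixes written so far */
    out[0] = '\0';

    for (size_t n = MIN_PREFIX_LEN; n <= len; n++) {
        size_t need = n + (count > 0 ? 1 : 0);   /* prefix plus separator */
        if (used + need + 1 > out_size) {
            break;                               /* not enough room left */
        }
        if (count > 0) {
            out[used++] = ' ';
        }
        memcpy(out + used, word, n);
        used += n;
        out[used] = '\0';
        count++;
    }
    return count;
}
```

With a sufficiently large buffer, calling `emit_prefixes("hello", buf, sizeof(buf))` leaves `hel hell hello` in `buf`, matching the tokens listed in the quoted message. Words shorter than the minimum length produce no tokens, which is why `h` and `he` return no result at search time.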