[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED
[ https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363118#comment-16363118 ]

Serkan Mulayim commented on LUCY-326:
-------------------------------------

Hi [~nwellnhof], yes, you are right that there is no memory leak when I call the destroy method. But there might be cases where the destroy code is not called, such as crashes. I just wanted to make sure that in those cases I only see an increase in the "still reachable" block in the valgrind output. I believe the OS will not be able to resolve this in such cases. Thanks anyway.

> C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCY-326
>                 URL: https://issues.apache.org/jira/browse/LUCY-326
>             Project: Lucy
>          Issue Type: Bug
>          Components: C bindings
>    Affects Versions: 0.6.1
>         Environment: linux
>            Reporter: Serkan Mulayim
>            Priority: Major
>
> In my C library I create a static global struct (which contains some runtime variables as well as a lucy_Schema pointer) when the program is loaded. There is also a destroy function which cleans up the runtime data (and also DECREFs the schema). When I index some documents by providing this schema to the indexer, and call the destroy function before the program (using the lib) exits, I do not see any memory leaks in the valgrind output. I only see non-zero "still reachable" values, which are due to the lucy_bootstrap_parcel function.
> On the other hand, if I do not call the destroy function before the exit, I would expect to see only an increase in the "still reachable" block in the valgrind output, but I also see "possibly lost" as follows:
> ---
> ==16942== 70 bytes in 1 blocks are possibly lost in loss record 147 of 178
> ==16942==    at 0x4C29B78: realloc (vg_replace_malloc.c:785)
> ==16942==    by 0x4F86CC4: increase_size (utilities.c:332)
> ==16942==    by 0x4F87865: replace_s (utilities.c:360)
> ==16942==    by 0x4EF4195: SN_set_current (api.c:62)
> ==16942==    by 0x4F44644: sb_stemmer_stem (libstemmer_utf8.c:80)
> ==16942==    by 0x4F65723: LUCY_SnowStemmer_Transform_IMP (SnowballStemmer.c:80)
> ==16942==    by 0x4F4FA69: LUCY_Analyzer_Transform (Analyzer.h:197)
> ==16942==    by 0x4F4FA69: LUCY_PolyAnalyzer_Transform_Text_IMP (PolyAnalyzer.c:110)
> ==16942==    by 0x4F15368: LUCY_Analyzer_Transform_Text (Analyzer.h:204)
> ==16942==    by 0x4F15368: LUCY_Inverter_Add_Field_IMP (Inverter.c:181)
> ==16942==    by 0x4F14E91: LUCY_Inverter_Add_Field (Inverter.h:296)
> ==16942==    by 0x4F14E91: LUCY_Inverter_Invert_Doc_IMP (Inverter.c:109)
> ==16942==    by 0x4F63164: LUCY_Inverter_Invert_Doc (Inverter.h:275)
> ==16942==    by 0x4F63164: LUCY_SegWriter_Add_Doc_IMP (SegWriter.c:109)
> ==16942==    by 0x4F7E069: LUCY_Indexer_Add_Doc (Indexer.h:260)
> ==16942==    by 0x4F7F23F: index_messages_json (Search.c:432)
> ==16942==
> ==16942== LEAK SUMMARY:
> ==16942==    definitely lost: 0 bytes in 0 blocks
> ==16942==    indirectly lost: 0 bytes in 0 blocks
> ==16942==      possibly lost: 70 bytes in 1 blocks
> ==16942==    still reachable: 246,683 bytes in 5,077 blocks
> ==16942==         suppressed: 0 bytes in 0 blocks
> ---
> Similarly, for another program where I only search (no indexing), I see similar behaviour.
> Valgrind output is below for that one:
> ---
> ==16949==
> ==16949== HEAP SUMMARY:
> ==16949==     in use at exit: 229,312 bytes in 5,061 blocks
> ==16949==   total heap usage: 34,993 allocs, 29,932 frees, 1,791,083 bytes allocated
> ==16949==
> ==16949== 37 bytes in 1 blocks are possibly lost in loss record 96 of 177
> ==16949==    at 0x4C29B78: realloc (vg_replace_malloc.c:785)
> ==16949==    by 0x4F86CC4: increase_size (utilities.c:332)
> ==16949==    by 0x4F87865: replace_s (utilities.c:360)
> ==16949==    by 0x4EF4195: SN_set_current (api.c:62)
> ==16949==    by 0x4F44644: sb_stemmer_stem (libstemmer_utf8.c:80)
> ==16949==    by 0x4F65723: LUCY_SnowStemmer_Transform_IMP (SnowballStemmer.c:80)
> ==16949==    by 0x4F4FA69: LUCY_Analyzer_Transform (Analyzer.h:197)
> ==16949==    by 0x4F4FA69: LUCY_PolyAnalyzer_Transform_Text_IMP (PolyAnalyzer.c:110)
> ==16949==    by 0x4EF35F3: LUCY_Analyzer_Transform_Text (Analyzer.h:204)
> ==16949==    by 0x4EF35F3: LUCY_Analyzer_Split_IMP (Analyzer.c:48)
> ==16949==    by 0x4F5AAC8: LUCY_Analyzer_Split (Analyzer.h:211)
> ==16949==    by 0x4F5AAC8: LUCY_QParser_Expand_Leaf_IMP (QueryParser.c:916)
> ==16949==    by 0x4F59ECA: LUCY_QParser_Expand (QueryParser.h:298)
> ==16949==    by
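
The setup described in the issue (a static global struct holding a schema pointer, plus a destroy function that DECREFs it) can be sketched roughly as follows. This is a minimal illustration that compiles without Lucy: `FakeSchema`, `fake_schema_new`, `fake_decref`, `runtime_init` and `runtime_destroy` are hypothetical stand-ins, not Lucy or Clownfish APIs. Registering the destroy function with the standard `atexit` is an added suggestion rather than something taken from the thread; it covers normal exits only, not crashes.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a refcounted object such as lucy_Schema (hypothetical type). */
typedef struct {
    int refcount;
} FakeSchema;

static int live_objects = 0;  /* number of objects allocated and not yet freed */

static FakeSchema *fake_schema_new(void) {
    FakeSchema *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->refcount = 1;
    live_objects++;
    return s;
}

/* Drop one reference; free the object when the count reaches zero. */
static void fake_decref(FakeSchema *s) {
    if (s != NULL && --s->refcount == 0) {
        free(s);
        live_objects--;
    }
}

/* Static global runtime state, as described in the issue. */
static struct {
    FakeSchema *schema;
    int initialized;
} g_runtime;

/* Idempotent cleanup: safe to call explicitly and again from atexit. */
static void runtime_destroy(void) {
    if (!g_runtime.initialized) return;
    fake_decref(g_runtime.schema);
    g_runtime.schema = NULL;
    g_runtime.initialized = 0;
}

/* Returns 0 on success, -1 on allocation or registration failure. */
static int runtime_init(void) {
    if (g_runtime.initialized) return 0;
    g_runtime.schema = fake_schema_new();
    if (g_runtime.schema == NULL) return -1;
    g_runtime.initialized = 1;
    /* Runs on normal exit only; it does not run after a crash or abort(). */
    return atexit(runtime_destroy) == 0 ? 0 : -1;
}
```

With real Lucy objects, the destroy function would DECREF the schema instead of calling `fake_decref`. Because `runtime_destroy` is idempotent, calling it explicitly before exit and having it registered with `atexit` do not conflict.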
[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED
[ https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362311#comment-16362311 ]

Nick Wellnhofer commented on LUCY-326:
--------------------------------------

OK, the patch I mentioned was just a guess. You said that if you call your own "destroy" function, which DECREFs all Lucy objects, you don't see any memory leaks. So everything seems OK.

If you're only seeing "possibly lost" blocks with some live, intentionally unfreed libstemmer data structures, this probably has to do with interior pointers used in the libstemmer code. It doesn't mean that there's a leak. When the SnowballStemmer object is eventually destroyed, the "possibly lost" blocks should disappear.

If you want to test your own code for memory leaks, you should always call your own "destroy" function and DECREF all Lucy objects.

Does this explanation clear things up for you?
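
To illustrate the point about interior pointers: Valgrind reports a block as "possibly lost" when, at exit, it can only find pointers into the middle of the block and none to its start. The sketch below shows one common way this arises, a buffer with a small header stored in front of the data, where callers only ever hold a pointer to the data. It is an illustration only, not libstemmer's code; whether libstemmer uses exactly this layout is an assumption, and all names (`BUF_HEAD`, `buf_create`, `buf_set`, `buf_destroy`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: two ints (capacity, size) stored in front of the data.
 * Callers only ever see a pointer to the data, i.e. an interior pointer. */
#define BUF_HEAD (2 * sizeof(int))
#define BUF_CAPACITY(p) (((int *)(p))[-2])
#define BUF_SIZE(p)     (((int *)(p))[-1])

/* Allocate a buffer and return a pointer just past its header. */
static char *buf_create(int capacity) {
    void *mem = malloc(BUF_HEAD + (size_t)capacity);
    if (mem == NULL) return NULL;
    char *p = (char *)mem + BUF_HEAD;
    BUF_CAPACITY(p) = capacity;
    BUF_SIZE(p) = 0;
    return p;
}

/* Copy len bytes into the buffer; returns 0 on success, -1 if it won't fit. */
static int buf_set(char *p, const char *src, int len) {
    if (len < 0 || len > BUF_CAPACITY(p)) return -1;
    memcpy(p, src, (size_t)len);
    BUF_SIZE(p) = len;
    return 0;
}

/* Free by stepping back to the true start of the malloc'd block. */
static void buf_destroy(char *p) {
    if (p != NULL) free(p - BUF_HEAD);
}
```

If a program stored the result of `buf_create` in a global and exited without calling `buf_destroy`, the only surviving reference would point `BUF_HEAD` bytes into the block, which is the situation Valgrind classifies as "possibly lost" rather than "still reachable". Once `buf_destroy` runs, the block is freed and the report goes away, consistent with the explanation above.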
[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED
[ https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361308#comment-16361308 ]

Serkan Mulayim commented on LUCY-326:
-------------------------------------

Thanks [~nwellnhof], my apologies for the delay in getting back to you. Can you help me understand the steps to update the library? I could not find the exact same filename, but I found /modules/analysis/snowstem/source/libstemmer/libstemmer_utf8.c. I believe this is the one. So do you suggest that I apply the patch to this file?
[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED
[ https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353978#comment-16353978 ]

Nick Wellnhofer commented on LUCY-326:
--------------------------------------

The offending code is in the Snowball stemmer (third-party code). Can you check whether the following patch helps?

https://github.com/snowballstem/snowball/commit/9ea5add413942d0aa2335cd8133c682263325ed8#diff-33107391b35633e4879a7a1a4bae2931

OTOH, "possibly lost" reports can be false positives if there are interior pointers to data structures. Maybe libstemmer uses interior pointers somewhere.