[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED

2018-02-13 Thread Serkan Mulayim (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363118#comment-16363118
 ] 

Serkan Mulayim commented on LUCY-326:
-

Hi [~nwellnhof], yes, you are right that there is no memory leak when I call the 
destroy method. But there might be cases where the destroy code is not called, 
such as crashes. I just wanted to make sure that in those cases I only see an 
increase in the "Still Reachable" block in the valgrind output. I believe the OS 
will not be able to resolve this in such cases. 

Thanks anyways.

> C lib: Possible memory leak in SnowStemmer when provided schema for the 
> indexer is not DECREFFED
> 
>
> Key: LUCY-326
> URL: https://issues.apache.org/jira/browse/LUCY-326
> Project: Lucy
> Issue Type: Bug
> Components: C bindings
> Affects Versions: 0.6.1
> Environment: linux
> Reporter: Serkan Mulayim
> Priority: Major
>
> In my C library I create a static global struct (containing some runtime 
> variables as well as a lucy_Schema pointer) when the program is loaded. 
> There is also a destroy function which cleans up the runtime data (and also 
> DECREFs the schema). When I index some documents by providing this schema to 
> the indexer, and call the destroy function before the program (using the 
> lib) exits, I do not see any memory leaks in the valgrind output. I only see 
> some non-zero "still reachable" values, due to the lucy_bootstrap_parcel 
> function.
> On the other hand, if I do not call the destroy function before exit, I 
> would expect to see only an increase in the "still reachable" block in the 
> valgrind output, but I also see "possibly lost", as follows:
> ---
> ==16942== 70 bytes in 1 blocks are possibly lost in loss record 147 of 178
> ==16942==    at 0x4C29B78: realloc (vg_replace_malloc.c:785)
> ==16942==    by 0x4F86CC4: increase_size (utilities.c:332)
> ==16942==    by 0x4F87865: replace_s (utilities.c:360)
> ==16942==    by 0x4EF4195: SN_set_current (api.c:62)
> ==16942==    by 0x4F44644: sb_stemmer_stem (libstemmer_utf8.c:80)
> ==16942==    by 0x4F65723: LUCY_SnowStemmer_Transform_IMP (SnowballStemmer.c:80)
> ==16942==    by 0x4F4FA69: LUCY_Analyzer_Transform (Analyzer.h:197)
> ==16942==    by 0x4F4FA69: LUCY_PolyAnalyzer_Transform_Text_IMP (PolyAnalyzer.c:110)
> ==16942==    by 0x4F15368: LUCY_Analyzer_Transform_Text (Analyzer.h:204)
> ==16942==    by 0x4F15368: LUCY_Inverter_Add_Field_IMP (Inverter.c:181)
> ==16942==    by 0x4F14E91: LUCY_Inverter_Add_Field (Inverter.h:296)
> ==16942==    by 0x4F14E91: LUCY_Inverter_Invert_Doc_IMP (Inverter.c:109)
> ==16942==    by 0x4F63164: LUCY_Inverter_Invert_Doc (Inverter.h:275)
> ==16942==    by 0x4F63164: LUCY_SegWriter_Add_Doc_IMP (SegWriter.c:109)
> ==16942==    by 0x4F7E069: LUCY_Indexer_Add_Doc (Indexer.h:260)
> ==16942==    by 0x4F7F23F: index_messages_json (Search.c:432)
> ==16942==
> ==16942== LEAK SUMMARY:
> ==16942==    definitely lost: 0 bytes in 0 blocks
> ==16942==    indirectly lost: 0 bytes in 0 blocks
> ==16942==      possibly lost: 70 bytes in 1 blocks
> ==16942==    still reachable: 246,683 bytes in 5,077 blocks
> ==16942==         suppressed: 0 bytes in 0 blocks
> ---
> Similarly, for another program where I only search (no indexing), I see 
> similar behaviour. The valgrind output for that one is below:
> -
> ==16949==
> ==16949== HEAP SUMMARY:
> ==16949==     in use at exit: 229,312 bytes in 5,061 blocks
> ==16949==   total heap usage: 34,993 allocs, 29,932 frees, 1,791,083 bytes allocated
> ==16949==
> ==16949== 37 bytes in 1 blocks are possibly lost in loss record 96 of 177
> ==16949==    at 0x4C29B78: realloc (vg_replace_malloc.c:785)
> ==16949==    by 0x4F86CC4: increase_size (utilities.c:332)
> ==16949==    by 0x4F87865: replace_s (utilities.c:360)
> ==16949==    by 0x4EF4195: SN_set_current (api.c:62)
> ==16949==    by 0x4F44644: sb_stemmer_stem (libstemmer_utf8.c:80)
> ==16949==    by 0x4F65723: LUCY_SnowStemmer_Transform_IMP (SnowballStemmer.c:80)
> ==16949==    by 0x4F4FA69: LUCY_Analyzer_Transform (Analyzer.h:197)
> ==16949==    by 0x4F4FA69: LUCY_PolyAnalyzer_Transform_Text_IMP (PolyAnalyzer.c:110)
> ==16949==    by 0x4EF35F3: LUCY_Analyzer_Transform_Text (Analyzer.h:204)
> ==16949==    by 0x4EF35F3: LUCY_Analyzer_Split_IMP (Analyzer.c:48)
> ==16949==    by 0x4F5AAC8: LUCY_Analyzer_Split (Analyzer.h:211)
> ==16949==    by 0x4F5AAC8: LUCY_QParser_Expand_Leaf_IMP (QueryParser.c:916)
> ==16949==    by 0x4F59ECA: LUCY_QParser_Expand (QueryParser.h:298)
> ==16949==    by 0x4F59ECA: LUCY_QParser_Parse_IMP 

[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED

2018-02-13 Thread Nick Wellnhofer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362311#comment-16362311
 ] 

Nick Wellnhofer commented on LUCY-326:
--

OK, the patch I mentioned was just a guess.

You said that if you call your own "destroy" function which DECREFs all Lucy 
objects, you don't see any memory leaks. So everything seems OK. If you're only 
seeing "possibly lost" blocks for libstemmer data structures that are still 
alive and intentionally unfreed, this probably has to do with interior pointers 
used in the libstemmer code. It doesn't mean that there's a leak. When the 
SnowballStemmer object is eventually destroyed, the "possibly lost" blocks 
should disappear.

If you want to test your own code for memory leaks, you should always call your 
own "destroy" function and DECREF all Lucy objects.
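
One way to make sure the "destroy" function runs on every normal exit is to 
register it with atexit(). The sketch below uses a stand-in refcounted 
object instead of the real Lucy API: FakeObj, fake_new, fake_decref, 
g_runtime, runtime_init and runtime_destroy are all hypothetical names. In 
real code the DECREF of the schema would go where fake_decref is called. 
Note that atexit handlers do not run when the process crashes or is killed 
by a signal, so leak checks only make sense for runs that exit normally.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a refcounted object such as the schema (hypothetical). */
typedef struct { int refcount; } FakeObj;

FakeObj *fake_new(void) {
    FakeObj *obj = malloc(sizeof(FakeObj));
    if (obj != NULL) {
        obj->refcount = 1;
    }
    return obj;
}

/* Drop one reference and free the object when the count reaches zero. */
void fake_decref(FakeObj *obj) {
    if (obj != NULL && --obj->refcount == 0) {
        free(obj);
    }
}

/* Static global runtime data, as described in the issue. */
static struct {
    FakeObj *schema;
    int      initialized;
} g_runtime;

/* Release everything the runtime holds. Safe to call more than once. */
void runtime_destroy(void) {
    if (!g_runtime.initialized) {
        return;
    }
    fake_decref(g_runtime.schema);
    g_runtime.schema      = NULL;
    g_runtime.initialized = 0;
}

/* Create the runtime data and register the cleanup handler.
 * Returns 0 on success, non-zero on failure. */
int runtime_init(void) {
    if (g_runtime.initialized) {
        return 0;
    }
    g_runtime.schema = fake_new();
    if (g_runtime.schema == NULL) {
        return -1;
    }
    g_runtime.initialized = 1;
    return atexit(runtime_destroy);  /* atexit returns 0 on success */
}
```

Because runtime_destroy checks the initialized flag, calling it explicitly 
and then letting the atexit handler run again is harmless.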

Does this explanation clear things up for you?

 


[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED

2018-02-12 Thread Serkan Mulayim (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361308#comment-16361308
 ] 

Serkan Mulayim commented on LUCY-326:
-

Thanks [~nwellnhof], my apologies for the delay in getting back to you. Can 
you help me understand the steps to update the library? I could not find the 
exact same filename, but I found 
/modules/analysis/snowstem/source/libstemmer/libstemmer_utf8.c . I believe this 
is the one. So do you suggest that I apply the patch to this file?

 


[lucy-issues] [jira] [Commented] (LUCY-326) C lib: Possible memory leak in SnowStemmer when provided schema for the indexer is not DECREFFED

2018-02-06 Thread Nick Wellnhofer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCY-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353978#comment-16353978
 ] 

Nick Wellnhofer commented on LUCY-326:
--

The offending code is in the Snowball stemmer (third-party code). Can you 
check whether the following patch helps?

https://github.com/snowballstem/snowball/commit/9ea5add413942d0aa2335cd8133c682263325ed8#diff-33107391b35633e4879a7a1a4bae2931

OTOH, "possibly lost" reports can be false positives if there are interior 
pointers to data structures. Maybe libstemmer uses interior pointers somewhere.
