[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Digy updated LUCENENET-414:
---------------------------

    Fix Version/s:     (was: Lucene.Net 2.9.2)
                   Lucene.Net 2.9.4g
                   Lucene.Net 2.9.4

> The definition of CharArraySet is dangerously confusing and leads to bugs when used.
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-414
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-414
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 2.9.2
>         Environment: Irrelevant
>            Reporter: Vincent Van Den Berghe
>            Priority: Minor
>             Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
>
>
> Right now, CharArraySet derives from System.Collections.Hashtable, but
> doesn't actually use this base type for storing elements.
> However, StandardAnalyzer.STOP_WORDS_SET is exposed as a
> System.Collections.Hashtable. The obvious code to build your own stopword
> set from StandardAnalyzer.STOP_WORDS_SET plus your own stopwords:
>
>     CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, ignoreCase: false);
>     foreach (string domainSpecificStopWord in DomainSpecificStopWords)
>         myStopWords.Add(domainSpecificStopWord);
>
> ... will fail. The CharArraySet constructor accepts an ICollection, and
> STOP_WORDS_SET is handed to it as a Hashtable, so it is enumerated through
> the (empty) base Hashtable rather than through CharArraySet's own storage.
> The resulting myStopWords will only contain the DomainSpecificStopWords,
> and none of those from STOP_WORDS_SET.
>
> One workaround is to replace the first line with this:
>
>     CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + DomainSpecificStopWords.Length, ignoreCase: false);
>     foreach (string stopWord in (CharArraySet)StandardAnalyzer.STOP_WORDS_SET)
>         myStopWords.Add(stopWord);
>
> ... but this relies on an implementation detail (STOP_WORDS_SET is really
> an UnmodifiableCharArraySet, which is itself a CharArraySet).
> It works because the cast forces the foreach to use CharArraySet's own
> GetEnumerator(), which is declared with the "new" modifier: it hides
> Hashtable's enumerator instead of replacing it, so it is only called when
> the static type is CharArraySet (this has a bad code smell to it).
>
> At least 2 possibilities exist to solve this problem:
> - Make CharArraySet use the Hashtable instance with a custom comparer,
>   instead of its own implementation.
> - Make CharArraySet use HashSet<char[]> (available since .NET 3.5) with a
>   custom IEqualityComparer<char[]>, since arrays otherwise compare by
>   reference.
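The following is a minimal, self-contained C# sketch (not Lucene.Net's actual code) of both the pitfall and the HashSet<char[]> option. `HidingSet` is a hypothetical stand-in for the old CharArraySet design: it derives from Hashtable, stores words in its own list, and hides GetEnumerator() with `new`, so a copy made through ICollection sees nothing. `CharArrayComparer`, `Demo.CopyCount` and `Demo.BuildStopWords` are likewise hypothetical names, and case folding via char.ToLowerInvariant is an assumption, not necessarily how Lucene.Net folds case.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the old design: derives from Hashtable but keeps
// its elements in a private list, leaving the base Hashtable empty.
class HidingSet : Hashtable
{
    private readonly List<string> items = new List<string>();

    public void AddWord(string word) { items.Add(word); }

    // "new" hides Hashtable.GetEnumerator(); code holding the object as a
    // Hashtable, ICollection or IEnumerable still enumerates the empty base table.
    public new IEnumerator GetEnumerator() { return items.GetEnumerator(); }
}

// Value equality for char[] keys (assumed case folding: ToLowerInvariant).
sealed class CharArrayComparer : IEqualityComparer<char[]>
{
    private readonly bool ignoreCase;

    public CharArrayComparer(bool ignoreCase) { this.ignoreCase = ignoreCase; }

    private char Fold(char c) { return ignoreCase ? char.ToLowerInvariant(c) : c; }

    public bool Equals(char[] x, char[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (Fold(x[i]) != Fold(y[i])) return false;
        return true;
    }

    public int GetHashCode(char[] chars)
    {
        if (chars == null) return 0;
        int hash = 0;
        foreach (char c in chars)
            hash = unchecked(31 * hash + Fold(c)); // must agree with Equals
        return hash;
    }
}

public static class Demo
{
    // Mimics a set constructor that copies from an untyped ICollection.
    public static int CopyCount(ICollection source)
    {
        int count = 0;
        foreach (object item in source) count++;
        return count;
    }

    public static int HiddenCopyCount()
    {
        var set = new HidingSet();
        set.AddWord("the");
        set.AddWord("and");
        // Goes through IEnumerable.GetEnumerator(), i.e. the empty base Hashtable.
        return CopyCount(set);
    }

    // Sketch of the HashSet<char[]> option: the comparer supplies value equality.
    public static HashSet<char[]> BuildStopWords(IEnumerable<string> defaults, IEnumerable<string> extra, bool ignoreCase)
    {
        var set = new HashSet<char[]>(new CharArrayComparer(ignoreCase));
        foreach (string word in defaults) set.Add(word.ToCharArray());
        foreach (string word in extra) set.Add(word.ToCharArray());
        return set;
    }

    public static void Main()
    {
        Console.WriteLine(HiddenCopyCount()); // prints 0

        var stopWords = BuildStopWords(new[] { "a", "the" }, new[] { "lucene" }, ignoreCase: true);
        Console.WriteLine(stopWords.Count); // prints 3
        Console.WriteLine(stopWords.Contains("THE".ToCharArray())); // prints True
    }
}
```

Passing the comparer to HashSet<char[]> is what makes two distinct arrays holding the same characters count as the same word; without it, HashSet would compare the arrays by reference and every lookup with a freshly built char[] would miss.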