Hi, I'm not involved in the design of the SWORD and JSword engines, but as I understand from the discussion quoted below, we depend on a library that gets frequent updates in Java but hardly any for its C port. If I'm right, can we look for other libraries to use?
The root of the problem is that Xiphos and other front ends can't find Arabic text when the source uses diacritics but the search term doesn't contain any.

Thanks for your interest in solving the problem,
Pola

> Date: Mon, 26 Nov 2012 08:42:33 -0600
> From: greg.helli...@gmail.com
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] Search bug & New Arabic Bible, Not Shaped SVD Version
>
> On Mon, Nov 26, 2012 at 8:12 AM, DM Smith <dmsm...@crosswire.org> wrote:
> > Correct. JSword uses Lucene's filter for the language, which does more
> > normalization than the StandardAnalyzer, which SWORD uses exclusively. The
> > StandardAnalyzer should only be used for "unaccented" Latinate text. Same
> > with the SimpleAnalyzer. (In Lucene, an analyzer is a filter chain which
> > normalizes text. Rule of thumb: the same analyzer should be used for both
> > index construction and searching.)
> >
> > Each release of Lucene adds and/or improves the filters for non-Latin text.
> >
> > The biggest problem with using a new version of Lucene is that it
> > invalidates, without notice, prior indexes. An analyzer may change from
> > release to release. This has been true of the StandardAnalyzer. The impact
> > is that the number of search hits may be reduced, perhaps to 0.
> >
>
> (Un?)Fortunately for SWORD, it will rarely encounter this problem, as
> CLucene is extremely rarely updated. It has seen exactly two commits
> over the past 20 months (since the tagging of the 2.3.3.4 release,
> which is current head) and neither has been an update to the
> Analyzers. This has the benefit of not invalidating search indexes
> very often but has the drawback of almost never seeing updates to the
> analyzers and any bugs they may carry.
>
> It seems like we could have a set of Analyzers that we build on a
> per-language basis. The CLucene contrib libraries include analyzers
> specifically for German and CJK as examples. I doubt that the upstream
> maintainers would object to including additional analyzers if we
> developed them. That is, if we can even get in contact with them and
> they're not completely dormant.
>
> > Both SWORD and JSword need a mechanism to record the version of Lucene that
> > is used in constructing an index and to refuse to search an index unless
> > the versions of Lucene used for searching and indexing match.
> >
>
> Much noise has been made about this. But no one has been willing to
> actually implement it, or people have been rebuffed when they proposed
> how the version might be stored. Nearly any change made would still
> lead to invalidation of existing indexes, against which there has been
> much friction in the past. Storing the value in a file next to the
> indexes is a near-trivial change, but no one has done so.
>
> To avoid this current issue, though, would it be better to track the
> Lucene version or the Analyzer version used? From what I know of
> Lucene, some sort of hybrid of the two might be best. My understanding
> is that some versions of Lucene break compatibility with indexes made
> in previous versions, while the current issue would be addressed by
> filter changes which should be applied to both the index and incoming
> search terms.
>
> Again, implementing this is a near-trivial task (although
> compatibility between the indexes created in C and those in Java would
> probably not be possible because the Java Lucene library is much more
> active than CLucene). It's simply never been a priority for anyone to
> do.
>
> --Greg
>
> > Also of note, there have been some substantial changes to Unicode from
> > release to release. So, if the version of Unicode used by the OS, Java,
> > ICU, .... changes, the index may no longer be valid. From what I can tell,
> > this will mostly affect minority languages.
> >
> > In Him,
> > DM Smith
> >
> >
> > On Nov 26, 2012, at 7:22 AM, Peter von Kaehne <ref...@gmx.net> wrote:
> >
> >>
> >>> From: David Haslam <dfh...@googlemail.com>
> >>
> >>> So a similar patch would be necessary in principle to JSword ???
> >>
> >> No. If And Bible does not have a problem, then JSword does its job
> >> correctly.
> >>
> >> Peter
> >>
> >> _______________________________________________
> >> sword-devel mailing list: sword-devel@crosswire.org
> >> http://www.crosswire.org/mailman/listinfo/sword-devel
> >> Instructions to unsubscribe/change your settings at above page
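Pola's underlying problem can be made concrete. If SWORD had a per-language analyzer for Arabic, as Greg suggests in the thread, one of its steps would fold away the short-vowel marks so that vocalized verse text and an unvocalized search term reduce to the same token. What follows is a minimal sketch of only that folding step, assuming the text has already been decoded to UTF-32. The function names are hypothetical (not part of CLucene or SWORD), and the sketch covers only the harakat U+064B..U+0652 plus the tatweel U+0640, not the fuller normalization a real Arabic filter may apply.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper, not a CLucene or SWORD API. Returns true for the
// Arabic harakat U+064B..U+0652 (fathatan through sukun) and for the
// tatweel U+0640, which are the only marks this sketch folds away.
bool isFoldedArabicMark(char32_t c) {
    return (c >= 0x064B && c <= 0x0652) || c == 0x0640;
}

// Strip those marks from UTF-32 text. To help with searching, the same
// function has to run on the text being indexed and on the user's search
// term (one analyzer for both sides, per the rule of thumb quoted above).
std::u32string foldArabicDiacritics(const std::u32string& text) {
    std::u32string folded;
    folded.reserve(text.size());
    std::copy_if(text.begin(), text.end(), std::back_inserter(folded),
                 [](char32_t c) { return !isFoldedArabicMark(c); });
    return folded;
}
```

For example, folding U"\u0643\u064E\u062A\u064E\u0628\u064E" (kataba, with a fatha on each letter) yields U"\u0643\u062A\u0628", the same string a user typing without diacritics would search for. In a real engine this would live inside a token filter of the analyzer chain; note that, as DM Smith points out, changing the analyzer this way invalidates indexes built with the old one.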
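The version check that DM Smith and Greg discuss is, as Greg says, small. A minimal sketch, assuming C++17 `<filesystem>`: the stamp file name `index.version`, the stamp format, and all function names here are hypothetical (neither SWORD nor JSword defines them). Following Greg's "hybrid" idea, the stamp combines a search-library version and an analyzer version, so a change in either one makes the index unusable.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stamp file stored next to the index files.
const char* const kStampFile = "index.version";

// Combine the search-library version and an analyzer version into one
// stamp; either one changing should invalidate the index.
std::string makeStamp(const std::string& libraryVersion,
                      const std::string& analyzerVersion) {
    return libraryVersion + ";" + analyzerVersion;
}

// Write the stamp next to the index after a successful build.
bool writeStamp(const fs::path& indexDir, const std::string& stamp) {
    std::ofstream out(indexDir / kStampFile, std::ios::trunc);
    out << stamp << '\n';
    out.flush();
    return out.good();
}

// Refuse to search unless the stored stamp matches the running engine.
// A missing or unreadable stamp (e.g. an index built before stamping
// existed) counts as a mismatch, so the caller should rebuild the index.
bool indexIsUsable(const fs::path& indexDir, const std::string& expected) {
    std::ifstream in(indexDir / kStampFile);
    std::string stored;
    if (!in || !std::getline(in, stored)) return false;
    return stored == expected;
}
```

Because an unstamped index is treated as unusable, adopting a scheme like this would itself invalidate every existing index once, which is the friction Greg mentions.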