Hi,
I'm not involved in the design of the SWORD and JSword engines, but as I 
understand from the discussion above, we depend on a library that is updated 
very frequently in Java but whose C++ port (CLucene) gets almost no updates.
If I'm right, can we look for other libraries to use?

The root of the problem is that Xiphos and other frontends can't find matches 
in Arabic text when the source uses diacritics and the search term doesn't 
contain any.
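
To make the symptom concrete, here is a minimal Python sketch of the kind of 
normalization that would have to be applied to both the indexed text and the 
search term. This is not SWORD or CLucene code: the function names are 
hypothetical, and it only strips combining marks (which include the Arabic 
vowel marks) rather than doing full Arabic normalization.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), e.g. the Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def matches(query: str, verse: str) -> bool:
    """Hypothetical search check: normalize both sides the same way first."""
    return strip_diacritics(query) in strip_diacritics(verse)


# The same word with and without vowel marks (kasra U+0650, fatha U+064E).
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628"
plain = "\u0643\u062A\u0627\u0628"

# A raw substring test fails, which is the behaviour described above.
raw_hit = plain in vocalized
# Normalizing both the source and the search term makes the match succeed.
normalized_hit = matches(plain, vocalized)
```

The point of the sketch is only that the same filter has to run at index time 
and at search time; a real fix would live in the analyzer used by the engine.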

Thanks for your interest in solving the problem.

Pola

> Date: Mon, 26 Nov 2012 08:42:33 -0600
> From: greg.helli...@gmail.com
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] Search bug & New Arabic Bible, Not Shaped SVD Version
> 
> On Mon, Nov 26, 2012 at 8:12 AM, DM Smith <dmsm...@crosswire.org> wrote:
> > Correct. JSword uses Lucene's filter for the language, which does more 
> > normalization than the StandardAnalyzer which SWORD uses exclusively. The 
> > StandardAnalyzer should only be used for "unaccented" latinate text. Same 
> > with the SimpleAnalyzer. (In Lucene, an analyzer is a filter chain which 
> > normalizes text. Rule-of-Thumb: the same should be used for both index 
> > construction and searching.)
> >
> > Each release of Lucene adds and/or improves the filters for non-latin text.
> >
> > The biggest problem with using a new version of Lucene is that it 
> > invalidates, without notice, prior indexes. An analyzer may change from 
> > release to release. It has been true of the StandardAnalyzer. The impact is 
> > that the number of search hits may be reduced, perhaps to 0.
> >
> 
> (Un?)Fortunately for SWORD it rarely will encounter this problem, as
> CLucene is extremely rarely updated. It has seen exactly two commits
> over the past 20 months (since the tagging of the 2.3.3.4 release,
> which is current head) and neither has been an update to the
> Analyzers. This has the benefit of not invalidating search indexes
> very often but has the drawback of almost never seeing updates to the
> analyzers and any bugs they may carry.
> 
> It seems like we could have a set of Analyzers that we build on a
> per-language basis. The CLucene contrib libraries include analyzers
> specifically for German and CJK as examples. I doubt that the upstream
> maintainers would object to including additional analyzers if we
> developed them. That is, if we can even get in contact with them and
> they're not completely dormant.
> 
> > Both SWORD and JSword need a mechanism to record the version of Lucene that 
> > is used in constructing an index and to refuse to search an index unless 
> > the version of Lucene for searching and indexing match.
> >
> 
> Much noise has been made about this, but either no one has been willing
> to actually implement it, or those who proposed a way to store it have
> been rebuffed. Nearly any change made would still lead to invalidation
> of existing indexes, against which there has been much friction in the
> past. Storing the value in a file next to the indexes is a near-trivial
> change, but no one has done so.
> 
> To avoid this current issue, though, would it be better to track the
> Lucene version or the Analyzer version used? From what I know of
> Lucene, some sort of hybrid of the two might be best. My understanding
> is that some versions of Lucene break compatibility with indexes made
> in previous versions, while the current issue would be addressed by
> filter changes which should be applied to both the index and incoming
> search terms.
> 
> Again, implementing this is a near trivial task (although
> compatibility between the indexes created in C and those in Java would
> probably not be possible because the Java Lucene library is much more
> active than CLucene). It's simply never been a priority for anyone to
> do.
> 
> --Greg
> 
> > Also of note, there have been some substantial changes to Unicode from 
> > release to release. So, if the version of Unicode used by the OS, Java, ICU, 
> > ... changes, the index may no longer be valid. From what I can tell, this 
> > will mostly affect minority languages.
> >
> > In Him,
> >         DM Smith
> >
> >
> > On Nov 26, 2012, at 7:22 AM, Peter von Kaehne <ref...@gmx.net> wrote:
> >
> >>
> >>> From: David Haslam <dfh...@googlemail.com>
> >>
> >>> So a similar patch would be necessary in principle to JSword ???
> >>
> >> No. If And Bible does not have a problem, then JSword does its job 
> >> correctly.
> >>
> >> Peter
> >>
> >> _______________________________________________
> >> sword-devel mailing list: sword-devel@crosswire.org
> >> http://www.crosswire.org/mailman/listinfo/sword-devel
> >> Instructions to unsubscribe/change your settings at above page
> >
> >
> 
                                          
