There are some languages in which the apostrophe is used as a letter of the
alphabet rather than as an item of punctuation.
e.g. Somali, in which the apostrophe represents the /Alef/.
See http://en.wikipedia.org/wiki/Somali_alphabet
Guessing that our Lucene indexing method generally strips out such
IIRC, the StandardAnalyzer that SWORD uses doesn't allow for that. It has its
own fixed handling of punctuation. As I've said before, the analyzer
is only good for English-like languages.
In Him,
DM
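To illustrate DM's point about fixed punctuation handling, here is a minimal Python sketch of two regex tokenizers: one that, like many English-oriented analyzers, treats the apostrophe as punctuation, and one that keeps it as part of a word. It is not Lucene's StandardAnalyzer grammar; the patterns, the `tokenize` helper and the sample string (an illustrative Somali-style phrase) are assumptions for demonstration only.

```python
import re

# Treats the apostrophe as punctuation: tokens are runs of word characters only.
PUNCT_SPLIT = re.compile(r"\w+")
# Keeps apostrophes that follow letters, so a word-internal or word-final
# apostrophe stays part of the token.
APOSTROPHE_AWARE = re.compile(r"\w+(?:'\w*)*")

def tokenize(text: str, pattern: re.Pattern) -> list:
    """Lower-case the text and return every match of `pattern` as a token."""
    return pattern.findall(text.lower())

sample = "shaqo la'aan"  # illustrative Somali-style text with an apostrophe
print(tokenize(sample, PUNCT_SPLIT))       # ['shaqo', 'la', 'aan']
print(tokenize(sample, APOSTROPHE_AWARE))  # ['shaqo', "la'aan"]
```

Unicode classifies U+02BC MODIFIER LETTER APOSTROPHE as a letter (category Lm), so even the naive `\w+` pattern keeps it inside a word; whether a particular Lucene analyzer treats it the same way would need to be checked.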
On Dec 10, 2012, at 11:17 AM, David Haslam dfh...@googlemail.com wrote:
Developers' Collaboration Forum sword-devel@crosswire.org
Subject: Re: [sword-devel] Search bug New Arabic Bible, Not Shaped SVD Version
On Mon, Nov 26, 2012 at 11:15 PM, Nic Carter niccar...@mac.com wrote:
My understanding is that we are currently locked into a really old version
You're talking about vowels, not shaping. Shaping in Arabic changes the
shape of the letter according to its context in the word (initial,
medial, final, or isolated). I imagine unshaped Arabic would be very
difficult to read. Arabic without vowel marks, on the other hand, is
standard.
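Chris's distinction between shaping and vowels can be seen directly in Unicode. A minimal Python sketch, assuming the module text uses the ordinary Arabic block (U+0600 to U+06FF) rather than legacy presentation forms: a shaped glyph such as U+FEF3 folds back to the plain letter under NFKC normalization, while a vowel mark such as FATHA is a separate combining code point that normalization leaves in place.

```python
import unicodedata

# U+FEF3 ARABIC LETTER YEH INITIAL FORM is a legacy presentation form:
# it encodes one contextual glyph shape of YEH as its own code point.
shaped_glyph = "\uFEF3"
print(unicodedata.name(shaped_glyph))
# NFKC folds the presentation form back to the plain letter U+064A YEH.
print(unicodedata.normalize("NFKC", shaped_glyph) == "\u064A")  # True

# U+064E ARABIC FATHA is a vowel mark stored as a separate combining code point.
vowelled = "\u064A\u064E"  # YEH + FATHA
print(unicodedata.category("\u064E"))  # Mn (nonspacing mark)
# Normalization keeps it: the vowel mark is text content, not glyph shaping.
print(unicodedata.normalize("NFKC", vowelled) == vowelled)  # True
```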
I …
someone on this list tell them about all this discussion :)
So now we know the problem and the solution.
Date: Mon, 26 Nov 2012 01:05:16 -0800
From: chris...@crosswire.org
To: sword-devel@crosswire.org
Subject: Re: [sword-devel] Search bug New Arabic Bible, Not Shaped SVD Version
Sorry for choosing the wrong word.
This Wikipedia article covers the topic:
https://en.wikipedia.org/wiki/Arabic_diacritics
Thanks, Chris, for your reply about the filter. Actually, I don't
Which (I suppose) would have been a patch to the SWORD API?
So a similar patch would be necessary in principle to JSword ???
David
--
View this message in context:
http://sword-dev.350566.n4.nabble.com/Re-Search-bug-New-Arabic-Bible-Not-Shaped-SVD-Version-tp4651330p4651336.html
Sent from
From: David Haslam dfh...@googlemail.com
So a similar patch would be necessary in principle to JSword ???
No. If And Bible does not have a problem, then JSword does its job correctly.
Peter
___
sword-devel mailing list: sword-devel@crosswire.org
On Mon, Nov 26, 2012 at 6:22 AM, Peter von Kaehne ref...@gmx.net wrote:
However, BibleTime would require
Correct. JSword uses Lucene's filter for the language, which does more
normalization than the StandardAnalyzer that SWORD uses exclusively. The
StandardAnalyzer should only be used for unaccented Latinate text. The same goes
for the SimpleAnalyzer. (In Lucene, an analyzer is a filter chain which …)
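DM's parenthetical describes a Lucene analyzer as a filter chain. The following Python sketch mimics that structure: a tokenizer followed by filters applied in order, with an Arabic filter whose character rules are modelled on those documented for Lucene's `ArabicNormalizer` (alef variants folded to bare alef, teh marbuta to heh, alef maksura to yeh, tatweel and harakat removed). It is not Lucene, JSword or SWORD code; all function names here are hypothetical.

```python
import re
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

def whitespace_tokenizer(text: str) -> List[str]:
    # Deliberately naive: split on whitespace only.
    return text.split()

def lowercase_filter(tokens: List[str]) -> List[str]:
    # Folds case for scripts that have it; Arabic tokens pass through unchanged.
    return [t.lower() for t in tokens]

# Character mappings modelled on the rules documented for Lucene's ArabicNormalizer.
_FOLD = str.maketrans({
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH
    "\u0649": "\u064A",  # ALEF MAKSURA (dotless yeh) -> YEH
})
# TATWEEL plus the harakat FATHATAN (U+064B) through SUKUN (U+0652).
_DROP = re.compile("[\u0640\u064B-\u0652]")

def arabic_normalization_filter(tokens: List[str]) -> List[str]:
    return [_DROP.sub("", t.translate(_FOLD)) for t in tokens]

def analyze(text: str, tokenizer: Tokenizer, filters: List[TokenFilter]) -> List[str]:
    # An "analyzer": one tokenizer followed by filters applied in order.
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return [t for t in tokens if t]  # drop tokens emptied by filtering

ARABIC_CHAIN = [lowercase_filter, arabic_normalization_filter]
text = "\u064a\u064e\u0633\u064f\u0648\u0639\u064e \u0627\u0644\u0645\u0633\u064a\u062d"  # يَسُوعَ المسيح
print(analyze(text, whitespace_tokenizer, ARABIC_CHAIN))
```

The design point is that each filter is a small, replaceable step, so a language-specific chain can be swapped in without touching the tokenizer.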
On Mon, Nov 26, 2012 at 8:12 AM, DM Smith dmsm...@crosswire.org wrote:
My understanding is that we are currently locked into a really old version of
the C library, which is no longer being maintained. Instead, we need to port SWORD
to use the current version of the library, which is actively being maintained...
I gather some work has been done on this, but I'm not sure
I think Arabic shapes add extra Unicode characters; that's why the two forms of
the same word I mentioned before don't give the same results.
--
Any Arabic search problem is unconnected to shaping.
Modules are routinely created and stored in a normalised format, user entries,
e.g. for
Using a comparison tool from ICU, I found that the two strings have different
character codes.
Words to compare:
يَسُوعَ
يسوع
This is the name of JESUS Christ in Arabic, but one is shaped and the other
isn't.
Words converted to hex:
\u064a \u064e \u0633 \u064f \u0648 \u0639 \u064e
\u064a \u0633 \u0648 \u0639
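The hex listing above can be reproduced with Python's standard `unicodedata` module. This sketch (the helper `code_points` is hypothetical) prints both words and names the code points that appear only in the vowelled form: they are the harakat FATHA and DAMMA, category Mn (nonspacing mark), i.e. vowel marks rather than shaping.

```python
import unicodedata

shaped = "\u064a\u064e\u0633\u064f\u0648\u0639\u064e"  # يَسُوعَ (with vowel marks)
plain = "\u064a\u0633\u0648\u0639"                      # يسوع (without)

def code_points(s: str) -> list:
    """Return each character as a \\uXXXX escape, like the listing above."""
    return [f"\\u{ord(c):04x}" for c in s]

print(" ".join(code_points(shaped)))
print(" ".join(code_points(plain)))

# Name the code points that occur only in the vowelled form.
extra = [c for c in shaped if c not in plain]
for c in extra:
    print(f"U+{ord(c):04X} {unicodedata.name(c)} ({unicodedata.category(c)})")

# Dropping nonspacing marks (category Mn) leaves exactly the unvowelled word.
stripped = "".join(c for c in shaped if unicodedata.category(c) != "Mn")
print(stripped == plain)  # True
```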
Hi,
Sorry for posting a lot these days :)
I've just been doing a lot of searches, reading and experiments with CrossWire
programs and modules.
I found that I can't search the SVD Bible, since all its words are shaped while
I type unshaped search words.
For example, searching for يسوع does not match يَسُوعَ
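Whatever normalization is chosen, it has to be applied both when the index is built and when the query is analyzed; otherwise an unvowelled query still cannot match vowelled text. A minimal in-memory sketch, assuming whitespace tokenization and stripping only the harakat U+064B to U+0652; `build_index`, `search` and the `Example.N` references are hypothetical and do not reflect how SWORD's index is actually stored.

```python
import re
from collections import defaultdict
from typing import Dict, Set

HARAKAT = re.compile("[\u064B-\u0652]")  # FATHATAN through SUKUN

def normalize(token: str) -> str:
    return HARAKAT.sub("", token)

def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = defaultdict(set)
    for ref, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(ref)  # index-time normalization
    return index

def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    return set(index.get(normalize(query), set()))  # query-time normalization

docs = {
    "Example.1": "\u064a\u064e\u0633\u064f\u0648\u0639\u064e",  # يَسُوعَ (vowelled)
    "Example.2": "\u0627\u0644\u0645\u0633\u064a\u062d",        # المسيح
}
index = build_index(docs)
print(search(index, "\u064a\u0633\u0648\u0639"))  # {'Example.1'}
```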
Pola,
For several very valid reasons we never copy e-Sword source texts to make
SWORD modules hosted by CrossWire.
/Please do not go down that route/.
David
--
Pola wrote,
Currently I'm thinking of using mod2osis to extract the OSIS source text, then
using any program that can remove all Arabic shapes, and then packaging it again
using osis2mod.
Please understand that a round trip using mod2osis and osis2mod is highly
deprecated.
Information will always be lost due to
Pola wrote, "The permanent solution is to make search indexes ignore all
Arabic shapes."
Indeed, this would be true for all similar scripts that use glyph shaping,
not just those in the Arabic/Persian family.
The fundamental problem has been identified and described.
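A script-agnostic variant of the same idea, as a hedged sketch: decompose with NFD, drop every nonspacing mark (Unicode category Mn), then recompose with NFC. This removes Arabic harakat, Hebrew points and Greek accents alike. `strip_marks` is a hypothetical helper, and the approach may fold too aggressively for some scripts (in Hebrew, for example, dropping the shin/sin dot merges shin and sin), so a real index would likely want per-language rules.

```python
import unicodedata

def strip_marks(text: str) -> str:
    # Decompose so precomposed letters expose their marks (e.g. Greek U+03CC).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing mark (category Mn): harakat, Hebrew points, accents.
    kept = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Recompose whatever remains into canonical (NFC) form.
    return unicodedata.normalize("NFC", kept)

samples = {
    "Arabic": "\u064a\u064e\u0633\u064f\u0648\u0639\u064e",  # يَسُوعَ
    "Hebrew": "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd",  # שָׁלוֹם
    "Greek": "\u03bb\u03cc\u03b3\u03bf\u03c2",               # λόγος
}
for script, word in samples.items():
    print(script, word, "->", strip_marks(word))
```

Because NFD splits U+0623 ALEF WITH HAMZA ABOVE into ALEF plus HAMZA ABOVE (itself category Mn), this helper also folds أ to ا.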
We really do need a
for accuracy of some
words
Date: Sat, 24 Nov 2012 09:12:25 -0800
From: dfh...@googlemail.com
To: sword-devel@crosswire.org
Subject: Re: [sword-devel] Search bug New Arabic Bible, Not Shaped SVD Version