Re: [sword-devel] RC - odd bug, diatheke, lookup, probably engine rather than utilities.

2017-11-02 Thread Troy A. Griffitts
OK, I can't blame Peter anymore since Greg also found the '2's.  :) So, I hunted this one down.  Odd problem... We 'fixed' a bug which was incorrectly checking for file open errors.  We were already sending the error code to logError when there was an error. Well, we now detect file open errors

Re: [sword-devel] Soft hyphens

2017-11-02 Thread Michael H
If you have a list of words with valid hyphenation points, it will be very valuable to someone someday if that list is documented as a spelling dictionary, even if it is incomplete and known to be so. Finding valid hyphenation points is the biggest chunk of time in preparing for publication, and in many

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
Since my 9:26pm reply, I've been a busy bee, and generated a counted list of the Lingala words that contain a soft hyphen. i.e. After I removed the multiple and "useless" occurrences. There are 4584 such words, though one escapee has just "ambushed" me. 001 ­Israel This one begins with a

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
I didn't ignore it, but you may have missed my reply when you started to compose yours. The ZWNJ is indeed the proper character to use. This is a semantic matter, nothing to do with hyphenated word-wrap at line end, which is solely presentational. David

Re: [sword-devel] Soft hyphens

2017-11-02 Thread ref...@gmx.net
Hi David, I think Michael has made a point which you ignored in your response - Indic and other scripts. The correct character in most of these places though is likely a zero width non joiner space character, at least it would be in Arabic derived scripts. I think the correct solution is that if

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
What would be of interest as a practical benefit for future typesetters is to prepare a comprehensive replace list for all the longer words in the LinVB source text. The search column would contain the word without a soft hyphen. The replace column would contain the same word with a soft hyphen
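The replace list described above could be derived from the existing text along these lines. This is only a sketch in Python: the function name, the `min_length` cut-off for "longer words", and the choice to keep the most frequent hyphenated spelling when a word appears with several are all assumptions, not anything specified in the thread.

```python
import re
from collections import Counter, defaultdict

SOFT_HYPHEN = "\u00ad"

# A word here is a run of letters with at least one internal soft hyphen.
# [^\W\d_] matches any Unicode letter; the soft hyphen itself is not a letter.
HYPHENATED_WORD = re.compile(r"[^\W\d_]+(?:\u00ad[^\W\d_]+)+")


def build_replace_list(text, min_length=8):
    """Return sorted (search, replace) pairs: plain word -> hyphenated word.

    min_length is a hypothetical threshold for what counts as a longer word.
    """
    variants = defaultdict(Counter)
    for word in HYPHENATED_WORD.findall(text):
        plain = word.replace(SOFT_HYPHEN, "")
        if len(plain) >= min_length:
            variants[plain][word] += 1
    # Assumption: when a word has several hyphenated spellings, keep the
    # most frequent one.
    return sorted(
        (plain, counts.most_common(1)[0][0])
        for plain, counts in variants.items()
    )
```

The resulting pairs can be written out as a two-column file, with the search column first and the replace column second.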

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
Regexp `([ [:punct:]]\xAD|\xAD[ [:punct:]])` is a reasonable definition for a "useless soft hyphen", unless in the language there is a punctuation mark that is used as part of a word. The inventors of some alphabets chose more wisely than others by allocating for the glottal stop the character
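A minimal Python sketch of the "useless soft hyphen" rule quoted above. Python's `re` module does not support the POSIX class `[:punct:]`, so this assumes ASCII punctuation from `string.punctuation` as a stand-in; a language whose words contain a punctuation mark would need that mark excluded. Lookarounds are used so that only the soft hyphen is removed, not the neighbouring space or punctuation mark. The function name is illustrative.

```python
import re
import string

SOFT_HYPHEN = "\u00ad"

# Assumption: a space or any ASCII punctuation character counts as a word edge.
_EDGE = "[ " + re.escape(string.punctuation) + "]"

# A soft hyphen directly after an edge character, or directly before one.
USELESS_SOFT_HYPHEN = re.compile(
    f"(?<={_EDGE}){SOFT_HYPHEN}|{SOFT_HYPHEN}(?={_EDGE})"
)


def strip_useless_soft_hyphens(text):
    """Remove soft hyphens that touch a space or punctuation mark."""
    return USELESS_SOFT_HYPHEN.sub("", text)
```

Soft hyphens inside a word are left alone; one at the very start or end of the text is not covered by this pattern.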

Re: [sword-devel] Soft hyphens

2017-11-02 Thread DM Smith
I see your point. For them to be useful, every word should have soft hyphens between syllables (or intra-word semantic breaks). Not just some. It is just as likely in a dynamic word wrap of a browser (or other etext viewer) whose width can change that any word but the first few on a line will

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
They should use the ZWNJ rather than the soft hyphen. ZWNJ = Zero Width Non Joiner U+200C. The caution should not have been necessary. David

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
Having soft hyphens to improve readability on small hand-held devices is fine in theory, but not in practice. The more I've thought about soft hyphens, the more I've understood that their use was a kludge for a particular typesetting task at one time for publishing a printed Bible from

Re: [sword-devel] Soft hyphens

2017-11-02 Thread Cyrille
On 02/11/2017 at 15:28, DM Smith wrote: > I don’t think they should be removed upstream except to fix errors. David > classified these as multiple and useless. Regarding useless, I’m not sure > that “punctuation” is such a universal language construct that it can be > included in such a

Re: [sword-devel] Soft hyphens

2017-11-02 Thread DM Smith
I don’t think they should be removed upstream except to fix errors. David classified these as multiple and useless. Regarding useless, I’m not sure that “punctuation” is such a universal language construct that it can be included in such a determination. E.g. An apostrophe is often used as a

Re: [sword-devel] RC - odd bug, diatheke, lookup, probably engine rather than utilities.

2017-11-02 Thread Greg Hellings
Correction, it's the "installmgr -ri CrossWire KJV" command that generates a wall of "2" output. --Greg On Thu, Nov 2, 2017 at 9:19 AM, Greg Hellings wrote: > I should note that this is in SVN HEAD in my case, not the RC. > > --Greg > > On Thu, Nov 2, 2017 at 9:19 AM,

Re: [sword-devel] RC - odd bug, diatheke, lookup, probably engine rather than utilities.

2017-11-02 Thread Greg Hellings
I should note that this is in SVN HEAD in my case, not the RC. --Greg On Thu, Nov 2, 2017 at 9:19 AM, Greg Hellings wrote: > > > On Thu, Nov 2, 2017 at 7:15 AM, Peter Von Kaehne wrote: > >> I noticed this first on svn head in my normal source directory

Re: [sword-devel] RC - odd bug, diatheke, lookup, probably engine rather than utilities.

2017-11-02 Thread Greg Hellings
On Thu, Nov 2, 2017 at 7:15 AM, Peter Von Kaehne wrote: > I noticed this first on svn head in my normal source directory where I > also work - so I suspected that it was my fault from something I did not > remember I had done at some point somewhere. So I downloaded your RC tar >

Re: [sword-devel] Soft hyphens

2017-11-02 Thread Michael H
CAUTION: The soft hyphen is sometimes used in Indian and East Asian language scripts to prevent two adjacent characters from becoming a combined ligature. This is more common in minor languages. It is commonly used when the font in use while being typed is designed for another language using the

Re: [sword-devel] Soft hyphens

2017-11-02 Thread Cyrille
On 02/11/2017 at 13:25, David Haslam wrote: > It is a much simpler task to remove ALL soft hyphens rather than removing > only the delinquent ones! My proposal is to remove them in the OSIS file, maybe during the conversion from USFM to OSIS with o2u.py. Maybe Ryan would agree to add this in

Re: [sword-devel] Soft hyphens

2017-11-02 Thread Cyrille
On 02/11/2017 at 10:36, ref...@gmx.net wrote: > Leaving aside the module you are working on, how many other modules > have the same problem? konnym is affected by this problem. > If it is a few only, we might as well reissue them and worry about > engine enhancement later. > > Peter > > Peter

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
It is a much simpler task to remove ALL soft hyphens rather than removing only the delinquent ones! - multiple soft hyphens at the same position in a word - useless soft hyphens (before or after a space or punctuation mark) Delinquent ones were quite a common occurrence in the Lingala source
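To illustrate why blanket removal is the simpler task, here is a short Python sketch of both approaches for the first kind of delinquent soft hyphen (several at the same position). The function names are illustrative only, and neither is taken from any existing CrossWire tool.

```python
import re

SOFT_HYPHEN = "\u00ad"


def collapse_repeated_soft_hyphens(text):
    """Selective fix: reduce a run of two or more soft hyphens to one."""
    return re.sub(SOFT_HYPHEN + "{2,}", SOFT_HYPHEN, text)


def remove_all_soft_hyphens(text):
    """Blanket fix: drop every soft hyphen, delinquent or not."""
    return text.replace(SOFT_HYPHEN, "")
```

The selective fix needs a pattern for each kind of delinquent occurrence, whereas blanket removal is a single string replacement with no language-specific rules.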

Re: [sword-devel] RC - odd bug, diatheke, lookup, probably engine rather than utilities.

2017-11-02 Thread Peter Von Kaehne
I noticed this first on svn head in my normal source directory where I also work - so I suspected that it was my fault from something I did not remember I had done at some point somewhere. So I downloaded your RC tar ball and compiled that in a separate location + run it in place (see the paths

Re: [sword-devel] Text preparation for searching in SWORD [was: Soft hyphens]

2017-11-02 Thread Peter Von Kaehne
I will check, hopefully later today. Peter > Sent: Thursday, 02 November 2017 at 12:11 > From: "David Haslam" > To: sword-devel@crosswire.org > Subject: Re: [sword-devel] Text preparation for searching in SWORD [was: Soft > hyphens] > > Thanks Troy, > > I

Re: [sword-devel] Text preparation for searching in SWORD [was: Soft hyphens]

2017-11-02 Thread David Haslam
Thanks Troy, I probably won't have a chance to test - I'm a text & module developer, not a code developer who builds SWORD from source together with a suitable front-end. I was already aware of the concept of strip filters, and these are to some extent mentioned in our wiki.

[sword-devel] Text preparation for searching in SWORD [was: Soft hyphens]

2017-11-02 Thread Troy A. Griffitts
SWORD has a number of filtering stages which occur at different places and events. Specifically interesting for this discussion are "strip filters".  These are called immediately before searching and should be called on the search string before passing it to search: ListKey results =

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
I am recommending the complete removal of soft hyphens because their use is a typographical kludge, not a semantic construction. See https://crosswire.org/wiki/Converting_SFM_Bibles_to_OSIS#Soft_hyphens Being a kludge, there could never be any possibility that any particular word would always have

Re: [sword-devel] Soft hyphens

2017-11-02 Thread ref...@gmx.net
Leaving aside the module you are working on, how many other modules have the same problem? If it is a few only, we might as well reissue them and worry about engine enhancement later. Peter

Re: [sword-devel] Soft hyphens

2017-11-02 Thread David Haslam
Update: Research results of SWORD search for soft hyphens: In Xiphos there is a problem with the exact search. If the same word occurs in the text both with and without a soft hyphen, - A search for the word with a soft hyphen will find only those instances - A search for the word without a
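The mismatch reported above can be reproduced with plain substring matching. The Python sketch below is not Xiphos or SWORD code; the verse references and texts are made-up sample data, and the function names are illustrative. It shows an exact search missing the other spelling, and the same search succeeding once soft hyphens are stripped from both the text and the query.

```python
SOFT_HYPHEN = "\u00ad"


def exact_search(verses, query):
    """Return references whose text contains the query exactly as typed."""
    return [ref for ref, text in verses.items() if query in text]


def normalised_search(verses, query):
    """Strip soft hyphens from both sides before comparing."""
    plain_query = query.replace(SOFT_HYPHEN, "")
    return [
        ref
        for ref, text in verses.items()
        if plain_query in text.replace(SOFT_HYPHEN, "")
    ]


# Hypothetical sample data: the same word with and without a soft hyphen.
verses = {"v1": "mo\u00adto", "v2": "moto"}
```

With this data, `exact_search` returns a different single verse for each spelling of the query, while `normalised_search` returns both verses for either spelling.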