Couple of corrections on the list. David pointed out some problems:
Hazar–maveth should not be on the list. It is without the hyphen in the text.
These should be on the list:
Abi–ezrites
Beth–elite
Beth–lehemite
Rab–mag

The following are programmatic "typos":
KirhereKir–heres (should be Kir–heres)
RabsariRab–saris (should be Rab–saris)


Also, if you are looking for Gibeah–haaraloth, you'll find it in a note in
Joshua 5:3.


Thanks,
        DM

On Mar 2, 2013, at 11:42 AM, DM Smith <dmsm...@crosswire.org> wrote:

> I see two different questions being posed:
> a) The correctness of using an ndash within a word.
> b) The ability to search for words containing ndash or any kind of dash, 
> including a simple hyphen.
> 
> I'll start with my conclusion: Changing the ndash to a simple hyphen does not 
> really address the questions.
> 
> Regarding correctness:
> The usage of ndash in the KJV is within names only. At the bottom, I've 
> included a list of the names having an ndash. In the 2003 version of the 1769 
> KJV, these words were not hyphenated. They were hyphenated with an ndash in 
> the 2006 cleanup. As an interesting aside, I looked at some of the non-name 
> words that are hyphenated in the 1769 KJV and compared them to a photocopy of 
> the 1611. These are words such as God-ward, us-ward, thee-ward, joint-heirs,
> .... My search was not exhaustive, but the 1611 didn't have hyphens; it
> either concatenated the words, as with the -ward suffixes, or used a space, as
> in joint heirs. The other thing I noticed was that in each case where the KJV
> (either 1769 or 1611) had a hyphenated name, it was a Hebrew transliteration 
> of some sort and had an attached note to at least one of the instances.
> 
> One question is whether they should be taken as a whole or as parts. So, is
> Beth–el equivalent to Beth el or to Bethel? Another question: does a dash
> (hyphen, ndash, mdash, ...) have the same meaning today as it did hundreds of 
> years ago? Same question but regarding different languages: Do different 
> languages use a dash with different semantics than modern English?
> 
> Regarding search:
> This raises several questions:
> How does Lucene handle these different characters?
> What does an end user want/expect?
> Can we leverage that to meet user expectation?
> 
> Lucene's handling:
> Lucene uses an Analyzer to split text into words on punctuation for indexing 
> and for search. JSword uses SimpleAnalyzer because it makes no further 
> assumptions on the text. SWORD lib uses StandardAnalyzer which does. I think 
> the StandardAnalyzer has special rules for hyphens. In Lucene 3.6 the 
> StandardAnalyzer behavior changes to use UAX 29 rules for splitting the text. 
> This is a huge step forward. I don't know whether it handles '-' differently 
> than other punctuation. (JSword switched from the StandardAnalyzer to the 
> SimpleAnalyzer very early on because of the extra assumptions that 
> StandardAnalyzer makes about what the user wants to index and not index and 
> because it was significantly slower.)
> 
> With the SimpleAnalyzer, any dash (hyphen, ndash, mdash) is used to create
> phrases. As such, Beth–el, Beth-el and "Beth el" are equivalent. (This is with
> Lucene 3.0.3; earlier versions may differ.) Note, it really doesn't matter
> that it's a dash; any punctuation will do. I don't think this is the case
> with the StandardAnalyzer.
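
A minimal sketch of the splitting behaviour described above, written in Python rather than against Lucene itself. It assumes the analyzer simply splits at every non-letter and lowercases the pieces; the name `simple_tokens` is made up for this illustration and is not part of Lucene or JSword.

```python
import re

# Runs of letters only; digits, underscores and all punctuation
# (including hyphen, ndash and mdash) act as separators.
_LETTERS = re.compile(r"[^\W\d_]+")


def simple_tokens(text):
    """Approximate a letter-based analyzer: split at non-letters, lowercase."""
    return [t.lower() for t in _LETTERS.findall(text)]


# The three spellings produce the same token sequence, so a phrase
# search for any one of them would match the others.
for form in ("Beth\u2013el", "Beth-el", "Beth el"):
    print(form, simple_tokens(form))

# The concatenated spelling is a single, different token.
print("Bethel", simple_tokens("Bethel"))
```

Under this assumption the dash never reaches the index at all, which is why the kind of dash does not matter, and also why the concatenated form is not found.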
> 
> One of the impacts of having hyphenated words is that searching for Bethlehem
> won't find Beth–lehem. (The NT and OT differ on the spelling in the KJV.) It 
> doesn't matter what kind of dash is used. The user cannot omit the hyphen to 
> concatenate the words.
> 
> Another impact of hyphenated words is that it is much harder to do a wild 
> card search. It doesn't matter what kind of dash is used. If the search 
> request has a dash a * cannot be used.
> 
> So Lucene can do the right thing wrt the ndash and hyphen. They are identical 
> wrt indexing and searching. The user does not have to know the form that is 
> used in the file and match that.
> 
> The other feature that Lucene offers out of the box is Fuzzy Searching. It
> will find close approximations to the word that you are requesting. All that
> needs to be done is append a ~ to the end of the word. For example, Abimelek~ 
> finds Abimael, Abimelech, Abiezer and Ahimelech. This is not a Soundex 
> search, so the results are often surprising. Bethelham~ finds Meshullam and 
> Bethlehem~ finds betrothed but not Bethlehem.
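
Lucene's fuzzy matching is based on edit distance (counting character insertions, deletions and substitutions), not on how a word sounds, which goes some way to explaining the surprises. A minimal Python sketch of that idea follows. The names `edit_distance` and `closest`, the `max_edits` cutoff and the sample name list are all illustrative assumptions; Lucene's own scoring and threshold differ in detail.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def closest(query, vocabulary, max_edits=3):
    """Return vocabulary words within max_edits of query, nearest first."""
    scored = [(edit_distance(query.lower(), w.lower()), w) for w in vocabulary]
    return [w for d, w in sorted(scored) if d <= max_edits]


# Hypothetical vocabulary, hard-coded for the example.
names = ["Abimael", "Abimelech", "Abiezer", "Ahimelech", "Bethel"]
print(closest("Abimelek", names))
```

Because only spelling distance counts, a name that sounds nothing like the query can outrank one that sounds nearly identical.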
> 
> Some front-ends don't use Lucene for indexing. Some use an older version. So 
> the behavior can differ.
> Also, SWORD doesn't require indexing for "slow" search. I don't know whether
> the SWORD "slow" search treats the various dashes the same or differently. (I
> think this is the Multi-word search mentioned by David.)
> 
> User expectation:
> The hyphenation of these names is not common in other translations. I think 
> that most users would expect Bethel and not Beth–el or Beth-el. Together this 
> makes searching multiple Bibles at the same time very difficult.
> 
> I think that a user can reasonably be expected not to know the proper
> spelling of more than a few of them, let alone that they are hyphenated.
> 
> Leveraging:
> I think that if StandardAnalyzer does not give expected behavior then 
> SimpleAnalyzer should be used.
> 
> I think that hyphenated words should also be indexed as unhyphenated.
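
One way to picture indexing a hyphenated word in both forms, sketched in Python rather than as a Lucene filter. The function name `index_terms` and the set of dash characters are assumptions for illustration only; a real implementation would also have to handle token positions.

```python
import re

# Dash characters treated as joiners inside a word (an assumed set):
# hyphen-minus, hyphen, non-breaking hyphen, figure dash, ndash, mdash.
_DASH = "[-\u2010\u2011\u2012\u2013\u2014]"
_LETTERS = r"[^\W\d_]+"
# A word is a run of letters, optionally continued by dash + letters.
_WORD = re.compile(_LETTERS + "(?:" + _DASH + _LETTERS + ")*")


def index_terms(text):
    """Return each dash-separated part plus, for hyphenated words, the joined form."""
    terms = []
    for match in _WORD.finditer(text):
        parts = re.split(_DASH, match.group().lower())
        terms.extend(parts)
        if len(parts) > 1:
            terms.append("".join(parts))  # unhyphenated form
    return terms


print(index_terms("Beth\u2013lehem"))
print(index_terms("went up to Beth-el"))
```

With the joined form in the index, a search for Bethlehem can match Beth–lehem while the phrase-style match on the parts keeps working.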
> 
> Adding a simple filter to change the different forms of dashes into a single
> form for both search and index is a good solution, but it would break
> backward compatibility with existing indexes. Changing from StandardAnalyzer
> to SimpleAnalyzer would be as much of a pain and a better solution (at least
> until 3.6, which I have not evaluated to see whether it changes the behavior
> sufficiently).
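
The "simple filter" idea can be sketched as follows, in Python for brevity. The function name `normalize_dashes` and the exact set of characters folded are assumptions; the point is only that the same mapping is applied to the text being indexed and to the user's query.

```python
# Dash-like characters folded to the ASCII hyphen-minus.
# Which characters belong in this table is an assumption.
_DASH_MAP = str.maketrans({
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash (ndash)
    "\u2014": "-",  # em dash (mdash)
    "\u2212": "-",  # minus sign
})


def normalize_dashes(text):
    """Replace every dash variant with a plain hyphen."""
    return text.translate(_DASH_MAP)


# Apply the same normalization on both sides of the search.
indexed = normalize_dashes("Maher\u2013shalal\u2013hash\u2013baz")  # module text
query = normalize_dashes("Maher-shalal-hash-baz")                   # user input
print(indexed == query)
```

An index built before such a filter was introduced would still hold the unnormalized text, which is the backward-compatibility cost mentioned above.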
> 
> Conclusion: Changing the ndash to a simple hyphen does not really address the 
> problems.
> 
> In Him,
>       DM
> 
> Abed–nego
> Abel–beth–maachah
> Abel–maim
> Abel–meholah
> Abel–mizraim
> Abel–shittim
> Abi–albon
> Abi–ezer
> Abi–ezrite
> Adoni–bezek
> Adoni–zedek
> Allon–bachuth
> Almon–diblathaim
> Ashdoth–pisgah
> Ataroth–adar
> Ataroth–addar
> Aznoth–tabor
> Baalath–beer
> Baal–berith
> Baal–gad
> Baal–hamon
> Baal–hanan
> Baal–hazor
> Baal–hermon
> Baal–meon
> Baal–peor
> Baal–perazim
> Baal–shalisha
> Baal–tamar
> Baal–zebub
> Baal–zephon
> Bamoth–baal
> Bashan–havoth–jair
> Bath–rabbim
> Bath–sheba
> Bath–shua
> Beer–elim
> Beer–lahai–roi
> Beer–sheba
> Beesh–terah
> Ben–ammi
> Bene–berak
> Bene–jaakan
> Ben–hadad
> Ben–hail
> Ben–hanan
> Ben–oni
> Ben–zoheth
> Berodach–baladan
> Beth–anath
> Beth–anoth
> Beth–arabah
> Beth–aram
> Beth–arbel
> Beth–aven
> Beth–azmaveth
> Beth–baal–meon
> Beth–barah
> Beth–birei
> Beth–car
> Beth–dagon
> Beth–diblathaim
> Beth–el
> Beth–emek
> Beth–ezel
> Beth–gader
> Beth–gamul
> Beth–haccerem
> Beth–haran
> Beth–hoglah
> Beth–hogla
> Beth–horon
> Beth–jeshimoth
> Beth–jesimoth
> Beth–lebaoth
> Beth–lehem–judah
> Beth–lehem
> Beth–maachah
> Beth–marcaboth
> Beth–meon
> Beth–nimrah
> Beth–palet
> Beth–pazzez
> Beth–peor
> Beth–phelet
> Beth–rapha
> Beth–rehob
> Beth–shan
> Beth–shean
> Beth–shemesh
> Beth–shemite
> Beth–shittah
> Beth–tappuah
> Beth–zur
> Caleb–ephratah
> Chephar–haammonai
> Chisloth–tabor
> Chor–ashan
> Chushan–rishathaim
> Col–hozeh
> Dan–jaan
> Dibon–gad
> Ebed–melech
> Eben–ezer
> El–beth–el
> El–elohe–Israel
> Elon–beth–hanan
> El–paran
> En–eglaim
> En–gannim
> En–gedi
> En–haddah
> En–hakkore
> En–hazor
> En–mishpat
> En–rimmon
> En–rogel
> En–shemesh
> En–tappuah
> Ephes–dammim
> Esar–haddon
> Esh–baal
> Evil–merodach
> Ezion–gaber
> Ezion–geber
> Gath–hepher
> Gath–rimmon
> Gibeah–haaraloth
> Gittah–hepher
> Gur–baal
> Hamath–zobah
> Hammoth–dor
> Hamon–gog
> Havoth–jair
> Hazar–addar
> Hazar–enan
> Hazar–gaddah
> Hazar–hatticon
> Hazar–maveth
> Hazar–shual
> Hazar–susah
> Hazar–susim
> Hazazon–tamar
> Hazezon–tamar
> Helkath–hazzurim
> Hephzi–bah
> Hor–hagidgad
> I–chabod
> Ije–abarim
> Ir–nahash
> Ir–shemesh
> Ishbi–benob
> Ish–bosheth
> Ish–tob
> Ittah–kazin
> Jaare–oregim
> Jabesh–gilead
> Jashubi–lehem
> Jegar–sahadutha
> Jehovah–jireh
> Jehovah–nissi
> Jehovah–shalom
> Jiphthah–el
> Jushab–hesed
> Kadesh–barnea
> Kedesh–naphtali
> Keren–happuch
> Kibroth–hattaavah
> Kir–haraseth
> Kir–hareseth
> Kir–haresh
> KirhereKir–heres
> Kirjath–arba
> Kirjath–arim
> Kirjath–baal
> Kirjath–huzoth
> Kirjath–jearim
> Kirjath–sannah
> Kirjath–sepher
> Lahai–roi
> Lo–ammi
> Lo–debar
> Lo–ruhamah
> Maaleh–acrabbim
> Magor–missabib
> Mahaneh–dan
> Maher–shalal–hash–baz
> Malchi–shua
> Me–jarkon
> Melchi–shua
> Meribah–Kadesh
> Merib–baal
> Merodach–baladan
> Metheg–ammah
> Migdal–el
> Migdal–gad
> Misrephoth–maim
> Moresheth–gath
> Nathan–melech
> Nebuzar–adan
> Nergal–sharezer
> Obed–edom
> Padan–aram
> Pahath–moab
> Pas–dammim
> Perez–uzzah
> Perez–uzza
> Pharaoh–hophra
> Pharaoh–nechoh
> Pharaoh–necho
> Pi–beseth
> Pi–hahiroth
> Poti–pherah
> RabsariRab–saris
> Rab–shakeh
> Ramathaim–zophim
> Ramath–lehi
> Ramath–mizpeh
> Ramoth–gilead
> Regem–melech
> Remmon–methoar
> Rimmon–parez
> Romamti–ezer
> Ru–hamah
> Samgar–nebo
> Sela–hammahlekoth
> Shear–jashub
> Shethar–boznai
> Shihor–libnath
> Shimron–meron
> Succoth–benoth
> Syria–damascus
> Syria–maachah
> Taanath–shiloh
> Tahtim–hodshi
> Tel–abib
> Tel–haresha
> Tel–harsa
> Tel–melah
> Tiglath–pileser
> Tilgath–pilneser
> Timnath–heres
> Timnath–serah
> Tob–adonijah
> Tubal–cain
> Uzzen–sherah
> Zareth–shahar
> Zaphnath–paaneah
> 
> 
> On Mar 2, 2013, at 6:01 AM, Chris Burrell <ch...@burrell.me.uk> wrote:
> 
>> Can't this be done with a simple filter, i.e. always change the '-' to one 
>> kind regardless of the length. And when the user input comes in, do the same.
>> Chris
>> 
>> 
>> On 2 March 2013 02:36, Nic Carter <niccar...@mac.com> wrote:
>> 
>> Do you have a proposed solution to this, David?
>> 
>> I know that on my iPhone it is very simple to use a proper ndash & so I will 
>> always use the correct type of dash according to what I am writing. (same 
>> with on a Mac!)
>> However, the more significant issue is simply that people don't know there 
>> is a difference (or why they are different lengths, etc)...  ;)
>> 
>> On 25/02/2013, at 2:48 AM, David Haslam <dfh...@googlemail.com> wrote:
>> 
>> > In the KJV module, if you want to search for [say] the hyphenated name
>> > "Maher–shalal–hash–baz", you first have to be aware that this module uses
>> > the ndash in place of the hyphen.
>> >
>> > btw.  It's not so easy to enter the ndash from a keyboard, and probably 
>> > even
>> > harder in an Android tablet or mobile.
>> >
>> > If you use ordinary hyphen/minus for the search key hyphen for this module,
>> > you don't find anything with "Exact phrase".
>> > If you use "Multi-word", you do find "Maher" highlighted in the found 
>> > verse.
>> > (e.g. using Xiphos).
>> >
>> > For modules in general, however, the user cannot usually know in advance
>> > whether hyphenated words use the ndash, the hyphen or something else.
>> >
>> > Has anyone else looked into this aspect of the search feature?
>> >
>> > David
>> >
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context: 
>> > http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016.html
>> > Sent from the SWORD Dev mailing list archive at Nabble.com.
>> >
>> > _______________________________________________
>> > sword-devel mailing list: sword-devel@crosswire.org
>> > http://www.crosswire.org/mailman/listinfo/sword-devel
>> > Instructions to unsubscribe/change your settings at above page