Couple of corrections on the list. David pointed out some problems:

Hazar–maveth should not be on the list. It is without the hyphen in the text.

These should be on the list:
Abi–ezrites
Beth–elite
Beth–lehemite
Rab–mag

The following are programmatic "typos":
KirhereKir–heres (should read Kir–heres)
RabsariRab–saris (should read Rab–saris)

Also, if you are looking for Gibeah–haaraloth, you'll find it in a note in
Joshua 5:3.

Thanks,
DM

On Mar 2, 2013, at 11:42 AM, DM Smith <dmsm...@crosswire.org> wrote:

> I see two different questions being posed:
> a) The correctness of using an ndash within a word.
> b) The ability to search for words containing ndash or any kind of dash,
> including a simple hyphen.
>
> I'll start with my conclusion: Changing the ndash to a simple hyphen does not
> really address the questions.
>
> Regarding correctness:
> The usage of ndash in the KJV is within names only. At the bottom, I've
> included a list of the names having an ndash. In the 2003 version of the 1769
> KJV, these words were not hyphenated. They were hyphenated with an ndash in
> the 2006 cleanup. As an interesting aside, I looked at some of the non-name
> words that are hyphenated in the 1769 KJV and compared them to a photocopy of
> the 1611. These are words such as God-ward, us-ward, thee-ward, joint-heirs,
> .... My search was not exhaustive, but the 1611 didn't have hyphens; it
> either concatenated the words, as with the -ward suffixes, or separated them
> with a space, as in joint heirs. The other thing I noticed was that in each
> case where the KJV (either 1769 or 1611) had a hyphenated name, it was a
> Hebrew transliteration of some sort and had an attached note to at least one
> of the instances.
>
> One question is whether they should be taken as a whole or as parts. So, is
> Beth–el equivalent to Beth el or to Bethel? Another question: does a dash
> (hyphen, ndash, mdash, ...) have the same meaning today as it did hundreds of
> years ago? The same question applies across languages: do different
> languages use a dash with different semantics than modern English?
>
> Regarding search:
> This involves several issues:
> How does Lucene handle these different characters?
> What does an end user want/expect?
> Can we leverage that to meet user expectation?
>
> Lucene's handling:
> Lucene uses an Analyzer to split text into words on punctuation for indexing
> and for search. JSword uses SimpleAnalyzer because it makes no further
> assumptions on the text. SWORD lib uses StandardAnalyzer, which does. I think
> the StandardAnalyzer has special rules for hyphens. In Lucene 3.6 the
> StandardAnalyzer behavior changes to use UAX 29 rules for splitting the text.
> This is a huge step forward. I don't know whether it handles '-' differently
> than other punctuation. (JSword switched from the StandardAnalyzer to the
> SimpleAnalyzer very early on because of the extra assumptions that
> StandardAnalyzer makes about what the user wants to index and not index and
> because it was significantly slower.)
>
> With the SimpleAnalyzer, a dash (hyphen, ndash, mdash) is used to create
> phrases. As such Beth–el, Beth-el and "Beth el" are equivalent. (This is with
> Lucene 3.0.3; earlier versions may differ.) Note, it really doesn't matter
> that it's a dash; any punctuation will do. I don't think this is the case
> with the StandardAnalyzer.
>
> One of the impacts of having hyphenated words is that searching for Bethlehem
> won't find Beth–lehem. (The NT and OT differ on the spelling in the KJV.) It
> doesn't matter what kind of dash is used. The user cannot omit the hyphen to
> concatenate the words.
>
> Another impact of hyphenated words is that it is much harder to do a wild
> card search. It doesn't matter what kind of dash is used. If the search
> request has a dash, a * cannot be used.
>
> So Lucene can do the right thing wrt the ndash and hyphen. They are identical
> wrt indexing and searching. The user does not have to know the form that is
> used in the file and match that.
>
> The other feature that Lucene offers out of the box is Fuzzy Searching. It
> will find close approximations to the word that you are requesting.
> All that needs to be done is to append a ~ to the end of the word. For
> example, Abimelek~ finds Abimael, Abimelech, Abiezer and Ahimelech. This is
> not a Soundex search, so the results are often surprising. Bethelham~ finds
> Meshullam, and Bethlehem~ finds betrothed but not Bethlehem.
>
> Some front-ends don't use Lucene for indexing. Some use an older version. So
> the behavior can differ.
> Also, SWORD doesn't require indexing for "slow" search. I don't know if the
> SWORD "slow" search treats the various dashes the same or differently. (I
> think this is the Multi-word search mentioned by David.)
>
> User expectation:
> The hyphenation of these names is not common in other translations. I think
> that most users would expect Bethel and not Beth–el or Beth-el. Together this
> makes searching multiple Bibles at the same time very difficult.
>
> I think that a user could reasonably be expected not to know the proper
> spelling of more than a few of them, let alone that they are hyphenated.
>
> Leveraging:
> I think that if StandardAnalyzer does not give expected behavior then
> SimpleAnalyzer should be used.
>
> I think that hyphenated words should also be indexed as unhyphenated.
>
> Adding a simple filter to change different forms of dashes into a single form
> for both search and index is a good solution, but it would break backward
> compatibility with existing indexes. Changing from StandardAnalyzer to
> SimpleAnalyzer would be as much of a pain and a better solution (at least
> until 3.6, which I have not evaluated to see if it changes the behavior
> sufficiently).
>
> Conclusion: Changing the ndash to a simple hyphen does not really address the
> problems.
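
To make the two ideas under "Leveraging" concrete, here is a rough sketch in
Python rather than Java/Lucene. It is not JSword or SWORD code. The function
names (normalize_dashes, letter_tokens, index_terms) and the set of dash
characters are made up for illustration, and letter_tokens only approximates
an analyzer that splits on anything that is not a letter.

```python
import re

# Dash-like characters folded into one form. This set is an assumption for
# illustration, not an exhaustive list: hyphen-minus, hyphen, non-breaking
# hyphen, figure dash, en dash, em dash.
DASHES = "\u002d\u2010\u2011\u2012\u2013\u2014"
_DASH_RE = re.compile("[" + re.escape(DASHES) + "]")


def normalize_dashes(text):
    """Replace every dash variant with a plain hyphen-minus."""
    return _DASH_RE.sub("-", text)


def letter_tokens(text):
    """Approximate a letter-based analyzer: lowercased runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def index_terms(word):
    """Terms to index for one word: its parts plus the unhyphenated form."""
    parts = letter_tokens(normalize_dashes(word))
    terms = list(parts)
    if len(parts) > 1:
        # Also index the concatenation so a query for "Bethel" can match.
        terms.append("".join(parts))
    return terms


if __name__ == "__main__":
    print(letter_tokens("Beth\u2013el"))   # en dash
    print(letter_tokens("Beth-el"))        # plain hyphen
    print(index_terms("Beth\u2013el"))
```

With letter-based splitting the kind of dash stops mattering, which matches
the point above that any punctuation will do. The extra concatenated term is
what would let an unhyphenated query match a hyphenated name. The same
normalization would have to run on the query side as well, and existing
indexes would need rebuilding.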
>
> In Him,
> DM
>
> Abed–nego
> Abel–beth–maachah
> Abel–maim
> Abel–meholah
> Abel–mizraim
> Abel–shittim
> Abi–albon
> Abi–ezer
> Abi–ezrite
> Adoni–bezek
> Adoni–zedek
> Allon–bachuth
> Almon–diblathaim
> Ashdoth–pisgah
> Ataroth–adar
> Ataroth–addar
> Aznoth–tabor
> Baalath–beer
> Baal–berith
> Baal–gad
> Baal–hamon
> Baal–hanan
> Baal–hazor
> Baal–hermon
> Baal–meon
> Baal–peor
> Baal–perazim
> Baal–shalisha
> Baal–tamar
> Baal–zebub
> Baal–zephon
> Bamoth–baal
> Bashan–havoth–jair
> Bath–rabbim
> Bath–sheba
> Bath–shua
> Beer–elim
> Beer–lahai–roi
> Beer–sheba
> Beesh–terah
> Ben–ammi
> Bene–berak
> Bene–jaakan
> Ben–hadad
> Ben–hail
> Ben–hanan
> Ben–oni
> Ben–zoheth
> Berodach–baladan
> Beth–anath
> Beth–anoth
> Beth–arabah
> Beth–aram
> Beth–arbel
> Beth–aven
> Beth–azmaveth
> Beth–baal–meon
> Beth–barah
> Beth–birei
> Beth–car
> Beth–dagon
> Beth–diblathaim
> Beth–el
> Beth–emek
> Beth–ezel
> Beth–gader
> Beth–gamul
> Beth–haccerem
> Beth–haran
> Beth–hoglah
> Beth–hogla
> Beth–horon
> Beth–jeshimoth
> Beth–jesimoth
> Beth–lebaoth
> Beth–lehem–judah
> Beth–lehem
> Beth–maachah
> Beth–marcaboth
> Beth–meon
> Beth–nimrah
> Beth–palet
> Beth–pazzez
> Beth–peor
> Beth–phelet
> Beth–rapha
> Beth–rehob
> Beth–shan
> Beth–shean
> Beth–shemesh
> Beth–shemite
> Beth–shittah
> Beth–tappuah
> Beth–zur
> Caleb–ephratah
> Chephar–haammonai
> Chisloth–tabor
> Chor–ashan
> Chushan–rishathaim
> Col–hozeh
> Dan–jaan
> Dibon–gad
> Ebed–melech
> Eben–ezer
> El–beth–el
> El–elohe–Israel
> El–elohe–Israel
> Elon–beth–hanan
> El–paran
> En–eglaim
> En–gannim
> En–gedi
> En–haddah
> En–hakkore
> En–hazor
> En–mishpat
> En–rimmon
> En–rogel
> En–shemesh
> En–tappuah
> Ephes–dammim
> Esar–haddon
> Esh–baal
> Evil–merodach
> Ezion–gaber
> Ezion–geber
> Gath–hepher
> Gath–rimmon
> Gibeah–haaraloth
> Gittah–hepher
> Gur–baal
> Hamath–zobah
> Hammoth–dor
> Hamon–gog
> Havoth–jair
> Hazar–addar
> Hazar–enan
> Hazar–gaddah
> Hazar–hatticon
> Hazar–maveth
> Hazar–shual
> Hazar–susah
> Hazar–susim
> Hazazon–tamar
> Hazezon–tamar
> Helkath–hazzurim
> Hephzi–bah
> Hor–hagidgad
> I–chabod
> Ije–abarim
> Ir–nahash
> Ir–shemesh
> Ishbi–benob
> Ish–bosheth
> Ish–tob
> Ittah–kazin
> Jaare–oregim
> Jabesh–gilead
> Jashubi–lehem
> Jegar–sahadutha
> Jehovah–jireh
> Jehovah–nissi
> Jehovah–shalom
> Jiphthah–el
> Jushab–hesed
> Kadesh–barnea
> Kedesh–naphtali
> Keren–happuch
> Kibroth–hattaavah
> Kir–haraseth
> Kir–hareseth
> Kir–haresh
> KirhereKir–heres
> Kirjath–arba
> Kirjath–arim
> Kirjath–baal
> Kirjath–huzoth
> Kirjath–jearim
> Kirjath–sannah
> Kirjath–sepher
> Lahai–roi
> Lo–ammi
> Lo–debar
> Lo–ruhamah
> Maaleh–acrabbim
> Magor–missabib
> Mahaneh–dan
> Maher–shalal–hash–baz
> Malchi–shua
> Me–jarkon
> Melchi–shua
> Meribah–Kadesh
> Merib–baal
> Merodach–baladan
> Metheg–ammah
> Migdal–el
> Migdal–gad
> Misrephoth–maim
> Moresheth–gath
> Nathan–melech
> Nebuzar–adan
> Nergal–sharezer
> Obed–edom
> Padan–aram
> Pahath–moab
> Pas–dammim
> Perez–uzzah
> Perez–uzza
> Pharaoh–hophra
> Pharaoh–nechoh
> Pharaoh–necho
> Pi–beseth
> Pi–hahiroth
> Poti–pherah
> RabsariRab–saris
> Rab–shakeh
> Ramathaim–zophim
> Ramath–lehi
> Ramath–mizpeh
> Ramoth–gilead
> Regem–melech
> Remmon–methoar
> Rimmon–parez
> Romamti–ezer
> Ru–hamah
> Samgar–nebo
> Sela–hammahlekoth
> Shear–jashub
> Shethar–boznai
> Shihor–libnath
> Shimron–meron
> Succoth–benoth
> Syria–damascus
> Syria–maachah
> Taanath–shiloh
> Tahtim–hodshi
> Tel–abib
> Tel–haresha
> Tel–harsa
> Tel–melah
> Tiglath–pileser
> Tilgath–pilneser
> Timnath–heres
> Timnath–serah
> Tob–adonijah
> Tubal–cain
> Uzzen–sherah
> Zareth–shahar
> Zaphnath–paaneah
>
>
> On Mar 2, 2013, at 6:01 AM, Chris Burrell <ch...@burrell.me.uk> wrote:
>
>> Can't this be done with a simple filter, i.e. always change the '-' to one
>> kind regardless of the length. And when the user input comes in, do the same.
>> Chris
>>
>>
>> On 2 March 2013 02:36, Nic Carter <niccar...@mac.com> wrote:
>>
>> Do you have a proposed solution to this, David?
>>
>> I know that on my iPhone it is very simple to use a proper ndash & so I will
>> always use the correct type of dash according to what I am writing. (same
>> with on a Mac!)
>> However, the more significant issue is simply that people don't know there
>> is a difference (or why they are different lengths, etc)... ;)
>>
>> On 25/02/2013, at 2:48 AM, David Haslam <dfh...@googlemail.com> wrote:
>>
>> > In the KJV module, if you want to search for [say] the hyphenated name
>> > "Maher–shalal–hash–baz", you first have to be aware that this module uses
>> > the ndash in place of the hyphen.
>> >
>> > btw. It's not so easy to enter the ndash from a keyboard, and probably
>> > even harder in an Android tablet or mobile.
>> >
>> > If you use ordinary hyphen/minus for the search key hyphen for this module,
>> > you don't find anything with "Exact phrase".
>> > If you use "Multi-word", you do find "Maher" highlighted in the found
>> > verse.
>> > (e.g. using Xiphos).
>> >
>> > For modules in general, however, the user cannot usually know in advance
>> > whether hyphenated words use the ndash, the hyphen or something else.
>> >
>> > Has anyone else looked into this aspect of the search feature?
>> >
>> > David
>> >
>> > --
>> > View this message in context:
>> > http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016.html
>> > Sent from the SWORD Dev mailing list archive at Nabble.com.
_______________________________________________ sword-devel mailing list: sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page