I see two different questions being posed: a) The correctness of using an ndash within a word. b) The ability to search for words containing ndash or any kind of dash, including a simple hyphen.
I'll start with my conclusion: Changing the ndash to a simple hyphen does not really address the questions.

Regarding correctness:
The usage of ndash in the KJV is within names only. At the bottom, I've included a list of the names having an ndash. In the 2003 version of the 1769 KJV, these words were not hyphenated. They were hyphenated with an ndash in the 2006 cleanup.

As an interesting aside, I looked at some of the non-name words that are hyphenated in the 1769 KJV and compared them to a photocopy of the 1611. These are words such as God-ward, us-ward, thee-ward, joint-heirs, .... My search was not exhaustive, but the 1611 didn't have hyphens; it either concatenated the words, as with the -ward suffixes, or separated them with a space, as in joint heirs.

The other thing I noticed was that in each case where the KJV (either 1769 or 1611) had a hyphenated name, it was a Hebrew transliteration of some sort and had a note attached to at least one of the instances.

One question is whether they should be taken as a whole or as parts. So, is Beth–el equivalent to Beth el or to Bethel?

Another question: does a dash (hyphen, ndash, mdash, ...) have the same meaning today as it did hundreds of years ago? The same question applies across languages: do different languages use a dash with different semantics than modern English?

Regarding search:
This involves several issues: How does Lucene handle these different characters? What does an end user want/expect? Can we leverage that to meet user expectation?

Lucene's handling:
Lucene uses an Analyzer to split text into words on punctuation for indexing and for search. JSword uses SimpleAnalyzer because it makes no further assumptions about the text. SWORD lib uses StandardAnalyzer, which does. I think the StandardAnalyzer has special rules for hyphens. In Lucene 3.6 the StandardAnalyzer behavior changes to use UAX 29 rules for splitting the text. This is a huge step forward.
I don't know whether it handles '-' differently than other punctuation. (JSword switched from the StandardAnalyzer to the SimpleAnalyzer very early on because of the extra assumptions that StandardAnalyzer makes about what the user wants to index and not index, and because it was significantly slower.)

With the SimpleAnalyzer, any dash (hyphen, ndash, mdash) is used to create a phrase. As such, Beth–el, Beth-el and "Beth el" are equivalent. (This is with Lucene 3.0.3; earlier versions may differ.) Note, it really doesn't matter that it's a dash; any punctuation will do. I don't think this is the case with the StandardAnalyzer.

One of the impacts of having hyphenated words is that searching for Bethlehem won't find Beth–lehem. (The NT and OT differ on the spelling in the KJV.) It doesn't matter what kind of dash is used. The user cannot omit the hyphen to concatenate the words.

Another impact of hyphenated words is that it is much harder to do a wild card search. It doesn't matter what kind of dash is used. If the search request has a dash, a * cannot be used.

So Lucene can do the right thing wrt the ndash and hyphen. They are identical wrt indexing and searching. The user does not have to know the form that is used in the file and match that.

The other feature that Lucene offers out of the box is Fuzzy Searching. It will find close approximations to the word that you are requesting. All that needs to be done is append a ~ to the end of the word. For example, Abimelek~ finds Abimael, Abimelech, Abiezer and Ahimelech. This is not a Soundex search, so the results are often surprising. Bethelham~ finds Meshullam, and Bethlehem~ finds betrothed but not Bethlehem.

Some front-ends don't use Lucene for indexing. Some use an older version. So the behavior can differ. Also, SWORD doesn't require indexing for "slow" search. I don't know whether the SWORD "slow" search treats the various dashes the same or differently.
(I think this is the Multi-word search mentioned by David.)

User expectation:
The hyphenation of these names is not common in other translations. I think that most users would expect Bethel and not Beth–el or Beth-el. Together this makes searching multiple Bibles at the same time very difficult. I think that a user can reasonably be expected not to know the proper spelling of more than a few of them, let alone that they are hyphenated.

Leveraging:
I think that if StandardAnalyzer does not give the expected behavior then SimpleAnalyzer should be used. I think that hyphenated words should also be indexed as unhyphenated.

Adding a simple filter to change the different forms of dashes into a single form for both search and index is a good solution, but it would break backward compatibility with existing indexes. Changing from StandardAnalyzer to SimpleAnalyzer would be as much of a pain and is a better solution (at least until 3.6, which I have not evaluated to see whether it changes the behavior sufficiently).

Conclusion:
Changing the ndash to a simple hyphen does not really address the problems.
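To make the two ideas above concrete (folding every dash into one form, and indexing hyphenated names in unhyphenated form as well), here is a minimal Python sketch. It is not Lucene or JSword code: `normalize_dashes`, `letter_tokens` and `index_terms` are hypothetical names, and `letter_tokens` only approximates what I understand SimpleAnalyzer to do (split on non-letters, then lowercase).

```python
import re

# Dash-like characters folded into a plain hyphen-minus (an assumed,
# non-exhaustive set: hyphen, non-breaking hyphen, figure dash, ndash,
# mdash, minus sign).
DASH_RE = re.compile("[-\u2010\u2011\u2012\u2013\u2014\u2212]")


def normalize_dashes(text):
    """Replace every dash variant with a plain '-'."""
    return DASH_RE.sub("-", text)


def letter_tokens(text):
    """Rough stand-in for a letter tokenizer plus lowercasing.

    Any non-letter (dash, space, comma, ...) ends a token.
    """
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def index_terms(text):
    """Terms to index: each part of a word, plus the joined form of
    any dashed compound (hypothetical behaviour, per the proposal above)."""
    terms = []
    for word in normalize_dashes(text).split():
        parts = letter_tokens(word)
        terms.extend(parts)
        if "-" in word and len(parts) > 1:
            terms.append("".join(parts))
    return terms


# The three spellings tokenize identically, whichever dash is used.
print(letter_tokens("Beth\u2013el"), letter_tokens("Beth-el"), letter_tokens("Beth el"))
# With the extra joined term, a search for "bethlehem" could match.
print(index_terms("Beth\u2013lehem"))
```

The same `normalize_dashes` step would have to be applied to the user's search request, and adding the joined term changes what is stored, so existing indexes would need to be rebuilt.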
In Him,
	DM

Abed–nego
Abel–beth–maachah
Abel–maim
Abel–meholah
Abel–mizraim
Abel–shittim
Abi–albon
Abi–ezer
Abi–ezrite
Adoni–bezek
Adoni–zedek
Allon–bachuth
Almon–diblathaim
Ashdoth–pisgah
Ataroth–adar
Ataroth–addar
Aznoth–tabor
Baalath–beer
Baal–berith
Baal–gad
Baal–hamon
Baal–hanan
Baal–hazor
Baal–hermon
Baal–meon
Baal–peor
Baal–perazim
Baal–shalisha
Baal–tamar
Baal–zebub
Baal–zephon
Bamoth–baal
Bashan–havoth–jair
Bath–rabbim
Bath–sheba
Bath–shua
Beer–elim
Beer–lahai–roi
Beer–sheba
Beesh–terah
Ben–ammi
Bene–berak
Bene–jaakan
Ben–hadad
Ben–hail
Ben–hanan
Ben–oni
Ben–zoheth
Berodach–baladan
Beth–anath
Beth–anoth
Beth–arabah
Beth–aram
Beth–arbel
Beth–aven
Beth–azmaveth
Beth–baal–meon
Beth–barah
Beth–birei
Beth–car
Beth–dagon
Beth–diblathaim
Beth–el
Beth–emek
Beth–ezel
Beth–gader
Beth–gamul
Beth–haccerem
Beth–haran
Beth–hoglah
Beth–hogla
Beth–horon
Beth–jeshimoth
Beth–jesimoth
Beth–lebaoth
Beth–lehem–judah
Beth–lehem
Beth–maachah
Beth–marcaboth
Beth–meon
Beth–nimrah
Beth–palet
Beth–pazzez
Beth–peor
Beth–phelet
Beth–rapha
Beth–rehob
Beth–shan
Beth–shean
Beth–shemesh
Beth–shemite
Beth–shittah
Beth–tappuah
Beth–zur
Caleb–ephratah
Chephar–haammonai
Chisloth–tabor
Chor–ashan
Chushan–rishathaim
Col–hozeh
Dan–jaan
Dibon–gad
Ebed–melech
Eben–ezer
El–beth–el
El–elohe–Israel
El–elohe–Israel
Elon–beth–hanan
El–paran
En–eglaim
En–gannim
En–gedi
En–haddah
En–hakkore
En–hazor
En–mishpat
En–rimmon
En–rogel
En–shemesh
En–tappuah
Ephes–dammim
Esar–haddon
Esh–baal
Evil–merodach
Ezion–gaber
Ezion–geber
Gath–hepher
Gath–rimmon
Gibeah–haaraloth
Gittah–hepher
Gur–baal
Hamath–zobah
Hammoth–dor
Hamon–gog
Havoth–jair
Hazar–addar
Hazar–enan
Hazar–gaddah
Hazar–hatticon
Hazar–maveth
Hazar–shual
Hazar–susah
Hazar–susim
Hazazon–tamar
Hazezon–tamar
Helkath–hazzurim
Hephzi–bah
Hor–hagidgad
I–chabod
Ije–abarim
Ir–nahash
Ir–shemesh
Ishbi–benob
Ish–bosheth
Ish–tob
Ittah–kazin
Jaare–oregim
Jabesh–gilead
Jashubi–lehem
Jegar–sahadutha
Jehovah–jireh
Jehovah–nissi
Jehovah–shalom
Jiphthah–el
Jushab–hesed
Kadesh–barnea
Kedesh–naphtali
Keren–happuch
Kibroth–hattaavah
Kir–haraseth
Kir–hareseth
Kir–haresh
Kir–heres
Kirjath–arba
Kirjath–arim
Kirjath–baal
Kirjath–huzoth
Kirjath–jearim
Kirjath–sannah
Kirjath–sepher
Lahai–roi
Lo–ammi
Lo–debar
Lo–ruhamah
Maaleh–acrabbim
Magor–missabib
Mahaneh–dan
Maher–shalal–hash–baz
Malchi–shua
Me–jarkon
Melchi–shua
Meribah–Kadesh
Merib–baal
Merodach–baladan
Metheg–ammah
Migdal–el
Migdal–gad
Misrephoth–maim
Moresheth–gath
Nathan–melech
Nebuzar–adan
Nergal–sharezer
Obed–edom
Padan–aram
Pahath–moab
Pas–dammim
Perez–uzzah
Perez–uzza
Pharaoh–hophra
Pharaoh–nechoh
Pharaoh–necho
Pi–beseth
Pi–hahiroth
Poti–pherah
Rab–saris
Rab–shakeh
Ramathaim–zophim
Ramath–lehi
Ramath–mizpeh
Ramoth–gilead
Regem–melech
Remmon–methoar
Rimmon–parez
Romamti–ezer
Ru–hamah
Samgar–nebo
Sela–hammahlekoth
Shear–jashub
Shethar–boznai
Shihor–libnath
Shimron–meron
Succoth–benoth
Syria–damascus
Syria–maachah
Taanath–shiloh
Tahtim–hodshi
Tel–abib
Tel–haresha
Tel–harsa
Tel–melah
Tiglath–pileser
Tilgath–pilneser
Timnath–heres
Timnath–serah
Tob–adonijah
Tubal–cain
Uzzen–sherah
Zareth–shahar
Zaphnath–paaneah

On Mar 2, 2013, at 6:01 AM, Chris Burrell <ch...@burrell.me.uk> wrote:

> Can't this be done with a simple filter, i.e. always change the '-' to one
> kind regardless of the length. And when the user input comes in, do the same.
> Chris
>
>
> On 2 March 2013 02:36, Nic Carter <niccar...@mac.com> wrote:
>
> Do you have a proposed solution to this, David?
>
> I know that on my iPhone it is very simple to use a proper ndash & so I will
> always use the correct type of dash according to what I am writing. (same
> with on a Mac!)
> However, the more significant issue is simply that people don't know there is
> a difference (or why they are different lengths, etc)... ;)
>
> On 25/02/2013, at 2:48 AM, David Haslam <dfh...@googlemail.com> wrote:
>
> > In the KJV module, if you want to search for [say] the hyphenated name
> > "Maher–shalal–hash–baz", you first have to be aware that this module uses
> > the ndash in place of the hyphen.
> >
> > btw. It's not so easy to enter the ndash from a keyboard, and probably even
> > harder in an Android tablet or mobile.
> >
> > If you use ordinary hyphen/minus for the search key hyphen for this module,
> > you don't find anything with "Exact phrase".
> > If you use "Multi-word", you do find "Maher" highlighted in the found verse.
> > (e.g. using Xiphos).
> >
> > For modules in general, however, the user cannot usually know in advance
> > whether hyphenated words use the ndash, the hyphen or something else.
> >
> > Has anyone else looked into this aspect of the search feature?
> >
> > David
> >
> > --
> > View this message in context:
> > http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016.html
> > Sent from the SWORD Dev mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel@crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page