[Haskell-cafe] bug in Prelude.words?
Does anyone else think it odd that Prelude.words will break a string at a non-breaking space?

Prelude> words "abc def\xA0ghi"
["abc","def","ghi"]

I would have expected this to be the obvious behaviour:

Prelude> words "abc def\xA0ghi"
["abc","def\160ghi"]

Regards,
Malcolm
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
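The behaviour Malcolm expected can be sketched as a variant of words that splits only at white space that permits a line break. This is a minimal sketch, not anything provided by base. The names wordsBy, isBreakingSpace and wordsKeepingNbsp are hypothetical. Treating U+00A0, U+2007 and U+202F as the no-break spaces to exclude is also an assumption.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper with the same shape as the Haskell Report's
-- definition of words, but taking the separator predicate as an argument.
wordsBy :: (Char -> Bool) -> String -> [String]
wordsBy isSep s = case dropWhile isSep s of
  "" -> []
  s' -> let (w, rest) = break isSep s' in w : wordsBy isSep rest

-- Hypothetical predicate: white space that allows a break.
-- Assumes the three Unicode no-break spaces are the only ones to exclude.
isBreakingSpace :: Char -> Bool
isBreakingSpace c = isSpace c && c `notElem` "\x00A0\x2007\x202F"

-- Variant of words that keeps text joined by a no-break space together.
wordsKeepingNbsp :: String -> [String]
wordsKeepingNbsp = wordsBy isBreakingSpace

main :: IO ()
main = do
  print (words "abc def\xA0ghi")            -- ["abc","def","ghi"]
  print (wordsKeepingNbsp "abc def\xA0ghi") -- ["abc","def\160ghi"]
```

Because wordsBy isSpace behaves like Prelude's words, the only difference between the two results comes from the separator predicate.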
Re: [Haskell-cafe] bug in Prelude.words?
It doesn't seem odd to me. Consider an HTML page with that "sentence" displayed on it. If you ask the viewer of the page how many words are in the sentence, then surely you will get the answer 3?

On 28 March 2011 16:55, malcolm.wallace <malcolm.wall...@me.com> wrote:
> Does anyone else think it odd that Prelude.words will break a string at a non-breaking space?

--
Colin Adams
Preston, Lancashire, ENGLAND
Re: [Haskell-cafe] bug in Prelude.words?
On 28 March 2011 17:55, malcolm.wallace <malcolm.wall...@me.com> wrote:
> Does anyone else think it odd that Prelude.words will break a string at a non-breaking space?
>
> Prelude> words "abc def\xA0ghi"
> ["abc","def","ghi"]

I think it's predictable, isSpace (which words is based on) is based on generalCategory, which returns the proper Unicode category:

λ generalCategory '\xa0'
Space

So:

-- | Selects white-space characters in the Latin-1 range.
-- (In Unicode terms, this includes spaces and some control characters.)
isSpace :: Char -> Bool
-- isSpace includes non-breaking space
-- Done with explicit equalities both for efficiency, and to avoid a tiresome
-- recursion with GHC.List elem
isSpace c = c == ' '    ||
            c == '\t'   ||
            c == '\n'   ||
            c == '\r'   ||
            c == '\f'   ||
            c == '\v'   ||
            c == '\xa0' ||
            iswspace (fromIntegral (ord c)) /= 0
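To check this on concrete input, the sketch below prints isSpace and generalCategory side by side for the characters that come up in this thread. The names samples and report are hypothetical. The results assume a GHC whose Unicode tables cover these long-assigned code points. The pasted source shows that isSpace is not purely category-based: tab is matched explicitly even though its general category is Control.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- Characters mentioned in the thread: ASCII space, tab, no-break space,
-- word joiner, narrow no-break space, thin space.
samples :: [Char]
samples = "\x20\t\xA0\x2060\x202F\x2009"

-- Pair each character's code point with isSpace and its general category.
report :: Char -> (Int, Bool, GeneralCategory)
report c = (fromEnum c, isSpace c, generalCategory c)

main :: IO ()
main = mapM_ (print . report) samples
```

U+00A0 comes out as (160, True, Space), so words splits on it exactly as on an ordinary space. The word joiner U+2060 is in category Format and is not treated as a space at all.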
Re: [Haskell-cafe] bug in Prelude.words?
> Consider an HTML page with that "sentence" displayed on it. If you ask the viewer of the page how many words are in the sentence, then surely you will get the answer 3?

But what about the author? Surely there is no reason to use a non-breaking space unless they intend it to mean that the characters before and after it belong to the same logical unit-of-comprehension?

Regards,
Malcolm
Re: [Haskell-cafe] bug in Prelude.words?
On Mar 28, 2011, at 12:05 PM, Christopher Done wrote:
> On 28 March 2011 17:55, malcolm.wallace <malcolm.wall...@me.com> wrote:
>> Does anyone else think it odd that Prelude.words will break a string at a non-breaking space?
>>
>> Prelude> words "abc def\xA0ghi"
>> ["abc","def","ghi"]
>
> I think it's predictable, isSpace (which words is based on) is based on generalCategory, which returns the proper Unicode category:
>
> λ generalCategory '\xa0'
> Space

I agree, and I also agree that it would make sense the other way (not breaking on non-breaking spaces). Perhaps it would be a good idea to add a remark to the documentation which specifies the treatment of non-breaking spaces.

-- James
Re: [Haskell-cafe] bug in Prelude.words?
>> I think it's predictable, isSpace (which words is based on) is based on generalCategory, which returns the proper Unicode category:
>>
>> λ generalCategory '\xa0'
>> Space
>
> I agree, and I also agree that it would make sense the other way (not breaking on non-breaking spaces). Perhaps it would be a good idea to add a remark to the documentation which specifies the treatment of non-breaking spaces.

I note that Java has two distinct properties concerning whitespace:

Character.isSpaceChar('\u00A0') == true
Character.isWhitespace('\u00A0') == false

Contrast with

// U+0020 is the ASCII space
Character.isSpaceChar('\u0020') == true
Character.isWhitespace('\u0020') == true

// U+2060 is the word joiner (zero-width non-breaking space)
Character.isSpaceChar('\u2060') == false
Character.isWhitespace('\u2060') == false

// U+202F is the narrow non-breaking space
Character.isSpaceChar('\u202F') == true
Character.isWhitespace('\u202F') == false

// U+2009 is the thin space
Character.isSpaceChar('\u2009') == true
Character.isWhitespace('\u2009') == true
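For comparison, Java's two predicates can be approximated in Haskell with Data.Char.generalCategory. This sketch follows the rules documented for Character.isSpaceChar and Character.isWhitespace. It is not a port of Java's implementation. The names javaIsSpaceChar, javaIsWhitespace and javaWords are hypothetical.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Approximates Java's Character.isSpaceChar: true for the Unicode
-- categories Zs (Space), Zl (LineSeparator) and Zp (ParagraphSeparator).
javaIsSpaceChar :: Char -> Bool
javaIsSpaceChar c =
  generalCategory c `elem` [Space, LineSeparator, ParagraphSeparator]

-- Approximates Java's Character.isWhitespace: a space character that is
-- not a no-break space (U+00A0, U+2007, U+202F), or one of the control
-- characters TAB, LF, VT, FF, CR and U+001C..U+001F.
javaIsWhitespace :: Char -> Bool
javaIsWhitespace c =
  (javaIsSpaceChar c && c `notElem` "\x00A0\x2007\x202F")
    || c `elem` "\t\n\x0B\f\r\x1C\x1D\x1E\x1F"

-- words, but splitting with the Java-style whitespace predicate.
javaWords :: String -> [String]
javaWords s = case dropWhile javaIsWhitespace s of
  "" -> []
  s' -> let (w, rest) = break javaIsWhitespace s' in w : javaWords rest

main :: IO ()
main = do
  let cs = "\xA0\x20\x2060\x202F\x2009"
  print (map javaIsSpaceChar cs)   -- [True,True,False,True,True]
  print (map javaIsWhitespace cs)  -- [False,True,False,False,True]
  print (javaWords "abc def\xA0ghi")
```

Splitting with javaIsWhitespace produces ["abc","def\160ghi"], the result Malcolm originally expected. Haskell's isSpace corresponds more closely to Java's isSpaceChar plus the ASCII control characters.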
Re: [Haskell-cafe] bug in Prelude.words?
On 2011-03-28 16:20 +0000, malcolm.wallace wrote:
> But what about the author? Surely there is no reason to use a non-breaking space unless they intend it to mean that the characters before and after it belong to the same logical unit-of-comprehension?

The "non-breaking" part of "non-breaking space" refers to breaking text into lines. In other words, if two words are separated by a non-breaking space, then a line break will not be put between those words.

A non-breaking space does *not* make two words into one word.

--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
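The distinction can be illustrated with a toy greedy line-wrapper. The wrapper treats text joined by a no-break space as unbreakable, while word counting still uses words. This is an illustrative sketch, not a real typesetting algorithm. The names isNoBreak, isBreak, chunks and wrap are hypothetical. The set of no-break characters (U+00A0, U+2007, U+202F) is an assumption.

```haskell
import Data.Char (isSpace)

-- Assumed set of no-break spaces.
isNoBreak :: Char -> Bool
isNoBreak c = c `elem` "\x00A0\x2007\x202F"

-- Break opportunities: white space that is not a no-break space.
isBreak :: Char -> Bool
isBreak c = isSpace c && not (isNoBreak c)

-- Split text into unbreakable chunks at break opportunities.
chunks :: String -> [String]
chunks s = case dropWhile isBreak s of
  "" -> []
  s' -> let (w, rest) = break isBreak s' in w : chunks rest

-- Greedy line filling that never breaks inside a chunk, so text joined by
-- a no-break space always stays on one line. A chunk longer than the
-- width gets a line of its own.
wrap :: Int -> String -> [String]
wrap width = go [] . chunks
  where
    go acc [] = [unwords (reverse acc) | not (null acc)]
    go [] (c : cs) = go [c] cs
    go acc (c : cs)
      | length (unwords acc) + 1 + length c <= width = go (c : acc) cs
      | otherwise = unwords (reverse acc) : go [c] cs

main :: IO ()
main = do
  let s = "abc def\xA0ghi"
  print (length (words s)) -- 3
  print (wrap 8 s)         -- ["abc","def\160ghi"]
```

With a width of 8, "def\xA0ghi" moves to the second line as a single unit, yet words still reports three words. The no-break space constrains line breaking without merging words.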
Re: [Haskell-cafe] bug in Prelude.words?
On 28 Mar 2011, at 17:20, malcolm.wallace wrote:
>> Consider an HTML page with that "sentence" displayed on it. If you ask the viewer of the page how many words are in the sentence, then surely you will get the answer 3?
>
> But what about the author? Surely there is no reason to use a non-breaking space unless they intend it to mean that the characters before and after it belong to the same logical unit-of-comprehension?

I'm not sure that a "logical unit-of-comprehension" is the same as a word, though.

As an aside: in publishing, non-breaking spaces are commonly used for other purposes too, for example forcing a word onto a certain line to stop a "river" of white space appearing in a paragraph.

Bob