Re: isSpace is too slow

2007-05-27 Thread Wagner Ferenc
Bulat Ziganshin <[EMAIL PROTECTED]> writes:

> moreover, if we know platforms where iswpace checks the whole range,
> we can speed up code on these platforms by omitting isspace check

As I see it, glibc does exactly this, see
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/wctype/wcfuncs.c?rev=1.20&content-type=text/x-cvsweb-markup&cvsroot=glibc
-- 
Feri.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: isSpace is too slow

2007-05-21 Thread Simon Marlow

Neil Mitchell wrote:

Hi


Want to try also the
Data.ByteString.Base.isSpaceWord8 :: Word8 -> Bool


isspace: 0.375
iswspace: 0.400
ByteString: 0.460
Char: 0.672

Not as fast as isspace/iswspace, but quite a bit faster than Char.
Perhaps someone needs to take a peek at the generated ASM for each of
these routines and see where the Haskell one is falling down.


Depending on the platform, iswspace might not be checking the full Unicode 
character space here: on Linux it doesn't unless you have LANG set, I think. 
The Data.Char version does support full Unicode.


That's not to say we couldn't improve things here, I'm sure we can.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: isSpace is too slow

2007-05-21 Thread Ketil Malde
On Sun, 2007-05-20 at 16:59 +0100, Duncan Coutts wrote:

> isSpace c   =  c == ' ' ||
>c == '\t'||
>c == '\n'||
>c == '\r'||
>c == '\f'||
>c == '\v'||
>c == '\xa0'  ||
>iswspace (fromIntegral (ord c)) /= 0
> 
> iswspace does a generic lookup in the unicode property database I think.

So there's little hope of beating iswspace unless your input contains a
lot of spaces, I guess - for all non-space, we call iswspace, which
presumably repeats the tests for ASCII space.

Wouldn't something along these lines be more efficient?

isSpace :: Char -> Bool
isSpace = isSp . ord
isSp c | c <= 13= c >= 8  -- \b..\r
   | c <= 127   = c == 32 -- ' '
   | c <= 255   = c == 0xa0 -- nbsp
   | otherwise  = iswspace(..)

A quick test shows about a factor two improvement
on /usr/share/dict/words, but that will of course only trig the first
match.

-k




___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: isSpace is too slow

2007-05-21 Thread Yitzchak Gale

Duncan Coutts wrote:

iswspace... We could short-cut that for ascii characters.


Also, '\t', '\n', '\r', '\f', and '\v' are contiguous. So

isSpace c =c == ' '
   || c <= '\r' && c >= '\t'
   || c == '\xa0'
   || c > '\xff' && iswspace (fromIntegral (ord c)) /= 0

That makes 4 comparisons or less in the common
cases.

If we assume that ascii characters greater than '\xa0'
occur much less often than average, we can short-cut
those also and cut it down to 3 or less.

Note that the first space character above 255 is '\x1680'
(according to isSpace).

-Yitzchak
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: isSpace is too slow

2007-05-20 Thread Duncan Coutts
On Sun, 2007-05-20 at 13:42 +0100, Neil Mitchell wrote:
> Hi
> 
> > Want to try also the
> > Data.ByteString.Base.isSpaceWord8 :: Word8 -> Bool
> 
> isspace: 0.375
> iswspace: 0.400
> ByteString: 0.460
> Char: 0.672
> 
> Not as fast as isspace/iswspace, but quite a bit faster than Char.
> Perhaps someone needs to take a peek at the generated ASM for each of
> these routines and see where the Haskell one is falling down.

When I looked at this before I think it was mainly because they're doing
a lot of Unicode stuff. Here's the code

-- | Selects white-space characters in the Latin-1 range.
-- (In Unicode terms, this includes spaces and some control characters.)
isSpace :: Char -> Bool
-- isSpace includes non-breaking space
-- Done with explicit equalities both for efficiency, and to avoid a tiresome
-- recursion with GHC.List elem
isSpace c   =  c == ' ' ||
   c == '\t'||
   c == '\n'||
   c == '\r'||
   c == '\f'||
   c == '\v'||
   c == '\xa0'  ||
   iswspace (fromIntegral (ord c)) /= 0

iswspace does a generic lookup in the unicode property database I think.
We could short-cut that for ascii characters. If it's less than 255 and
not one of the listed space chars then we know it's not a space char
without having to consult iswspace. In fact it might be best to check if
it's less than 255 initially and then go straight for the generic
unicode iswspace or do the specialised ascii test.

It's also worth testing if on the ascii subset if a bitvector test is
faster than several comparisons (in the average case of a non-space).

Note also that the space chars in the ascii subset are within the first
33 code points (ord ' ' = 32). I'm not sure if that fact actually helps
us to optimise a bitvector test.

Duncan

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: isSpace is too slow

2007-05-20 Thread Neil Mitchell

Hi


Want to try also the
Data.ByteString.Base.isSpaceWord8 :: Word8 -> Bool


isspace: 0.375
iswspace: 0.400
ByteString: 0.460
Char: 0.672

Not as fast as isspace/iswspace, but quite a bit faster than Char.
Perhaps someone needs to take a peek at the generated ASM for each of
these routines and see where the Haskell one is falling down.

Thanks

Neil
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: isSpace is too slow

2007-05-20 Thread Donald Bruce Stewart
ndmitchell:
> Hi,
> 
> I'm running a particular benchmark which calls isSpace a lot
> (basically wc -w). There are three ways to do the underlying space
> comparison - using the Haskell Data.Char.isSpace, using the C isspace,
> or using the C iswspace:
> 
> isspace: 0.375
> iswspace: 0.400
> Char.isSpace: 0.672
> 
> Any chance someone could speed this up?  Perhaps just replacing
> isSpace with a direct call to iswspace?

Want to try also the 
Data.ByteString.Base.isSpaceWord8 :: Word8 -> Bool

We added that originally I think because isSpace was too slow.

-- Don
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users