[Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread Bas van Dijk
Hello,

I recently added default generic implementations of toJSON and
parseJSON to the aeson package. Now I'm optimizing them. Here are some
benchmark results that compare:

* th: toJSON and fromJSON generated by template-haskell. Can be
compared to hand-written code. Should be the fastest of all.

* syb: toJSON and fromJSON from the Data.Aeson.Generic module. Uses
the Data type class.

* generic: my toJSON and fromJSON using GHC Generics.

The benchmark itself can be found here:
https://github.com/basvandijk/aeson/blob/optimizations/benchmarks/AesonCompareAutoInstances.hs

toJSON
==

D/toJSON/th 3.631734 us
D/toJSON/syb32.66679 us
D/toJSON/generic3.371868 us

BigRecord/toJSON/th 8.982990 us
BigRecord/toJSON/syb48.90737 us
BigRecord/toJSON/generic8.971597 us

BigProduct/toJSON/th1.578259 us
BigProduct/toJSON/syb   29.21153 us
BigProduct/toJSON/generic   1.623115 us

BigSum/toJSON/th51.81214 ns
BigSum/toJSON/syb   1.256708 us
BigSum/toJSON/generic   71.32851 ns


fromJSON


D/fromJSON/th   7.017204 us
D/fromJSON/syb  23.46567 us
D/fromJSON/generic  7.968974 us

BigRecord/fromJSON/th   8.513789 us
BigRecord/fromJSON/syb  36.64501 us
BigRecord/fromJSON/generic  10.07809 us

BigProduct/fromJSON/th  2.430677 us
BigProduct/fromJSON/syb 17.97764 us
BigProduct/fromJSON/generic 2.201130 us

BigSum/fromJSON/th  414.8699 ns
BigSum/fromJSON/syb 4.113170 us
BigSum/fromJSON/generic 13.62614 us !!!


As can be seen, in most cases the GHC Generics implementation is much
faster than SYB and just as fast as TH. I'm impressed by how well GHC
optimizes the code!

Unfortunately the last benchmark, generically parsing a big sum type,
is much slower. The code for parsing sums, which can be found here:

https://github.com/basvandijk/aeson/blob/optimizations/Data/Aeson/Types/Internal.hs#L1059

is basically this:


instance (GFromSum a, GFromSum b) = GFromJSON (a :+: b) where
gParseJSON (Object (M.toList - [keyVal])) = gParseSum keyVal
gParseJSON v = typeMismatch sum (:+:) v
{-# INLINE gParseJSON #-}


class GFromSum f where
gParseSum :: Pair - Parser (f a)

instance (GFromSum a, GFromSum b) = GFromSum (a :+: b) where
gParseSum keyVal = (L1 $ gParseSum keyVal) |
   (R1 $ gParseSum keyVal)
{-# INLINE gParseSum #-}

instance (Constructor c, GFromJSON a, ConsFromJSON a) =
GFromSum (C1 c a) where
gParseSum (key, value)
| key == pack (conName (undefined :: t c a p)) =
gParseJSON value
| otherwise = notFound $ unpack key
{-# INLINE gParseSum #-}


notFound :: String - Parser a
notFound key = fail $ The key \ ++ key ++ \ was not found
{-# INLINE notFound #-}


Any idea how to make it faster?

Regards,

Bas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread José Pedro Magalhães
Hi Bas,

First of all, thanks for these numbers. I have previously compared the
performance of GP libs [1] and your results confirm what I would expect,
except for that last one, BigSum/fromJSON/generic.

It's good that you're using INLINE pragmas on the generic function already.
What I would also try:
- Compile with -O2 and -fno-spec-constr-count (this last one is
particularly important)
- Add {-# INLINE [1] #-} pragmas to the to/from methods of your Generic
instances.

To add these INLINE pragmas you will have to give your own instances of
Generic (so you can't derive them). I'd suggest you get hold of them with
-ddump-deriv, copy-paste and add the pragmas, just for testing purposes.
The phase is important: you first want to make sure you inline the generic
function definition, and only then the from/to.

Please keep me posted on the effects of these suggestions. In particular,
if the INLINE pragmas on the from/to methods are essential, I'll be happy
to add them to the derived instances.


Cheers,
Pedro

[1] http://dreixel.net/research/pdf/ogie.pdf

2011/11/3 Bas van Dijk v.dijk@gmail.com

 Hello,

 I recently added default generic implementations of toJSON and
 parseJSON to the aeson package. Now I'm optimizing them. Here are some
 benchmark results that compare:

 * th: toJSON and fromJSON generated by template-haskell. Can be
 compared to hand-written code. Should be the fastest of all.

 * syb: toJSON and fromJSON from the Data.Aeson.Generic module. Uses
 the Data type class.

 * generic: my toJSON and fromJSON using GHC Generics.

 The benchmark itself can be found here:

 https://github.com/basvandijk/aeson/blob/optimizations/benchmarks/AesonCompareAutoInstances.hs

 toJSON
 ==

 D/toJSON/th 3.631734 us
 D/toJSON/syb32.66679 us
 D/toJSON/generic3.371868 us

 BigRecord/toJSON/th 8.982990 us
 BigRecord/toJSON/syb48.90737 us
 BigRecord/toJSON/generic8.971597 us

 BigProduct/toJSON/th1.578259 us
 BigProduct/toJSON/syb   29.21153 us
 BigProduct/toJSON/generic   1.623115 us

 BigSum/toJSON/th51.81214 ns
 BigSum/toJSON/syb   1.256708 us
 BigSum/toJSON/generic   71.32851 ns


 fromJSON
 

 D/fromJSON/th   7.017204 us
 D/fromJSON/syb  23.46567 us
 D/fromJSON/generic  7.968974 us

 BigRecord/fromJSON/th   8.513789 us
 BigRecord/fromJSON/syb  36.64501 us
 BigRecord/fromJSON/generic  10.07809 us

 BigProduct/fromJSON/th  2.430677 us
 BigProduct/fromJSON/syb 17.97764 us
 BigProduct/fromJSON/generic 2.201130 us

 BigSum/fromJSON/th  414.8699 ns
 BigSum/fromJSON/syb 4.113170 us
 BigSum/fromJSON/generic 13.62614 us !!!


 As can be seen, in most cases the GHC Generics implementation is much
 faster than SYB and just as fast as TH. I'm impressed by how well GHC
 optimizes the code!

 Unfortunately the last benchmark, generically parsing a big sum type,
 is much slower. The code for parsing sums, which can be found here:


 https://github.com/basvandijk/aeson/blob/optimizations/Data/Aeson/Types/Internal.hs#L1059

 is basically this:


 instance (GFromSum a, GFromSum b) = GFromJSON (a :+: b) where
gParseJSON (Object (M.toList - [keyVal])) = gParseSum keyVal
gParseJSON v = typeMismatch sum (:+:) v
{-# INLINE gParseJSON #-}


 class GFromSum f where
gParseSum :: Pair - Parser (f a)

 instance (GFromSum a, GFromSum b) = GFromSum (a :+: b) where
gParseSum keyVal = (L1 $ gParseSum keyVal) |
   (R1 $ gParseSum keyVal)
{-# INLINE gParseSum #-}

 instance (Constructor c, GFromJSON a, ConsFromJSON a) =
GFromSum (C1 c a) where
gParseSum (key, value)
| key == pack (conName (undefined :: t c a p)) =
gParseJSON value
| otherwise = notFound $ unpack key
{-# INLINE gParseSum #-}


 notFound :: String - Parser a
 notFound key = fail $ The key \ ++ key ++ \ was not found
 {-# INLINE notFound #-}


 Any idea how to make it faster?

 Regards,

 Bas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread Bas van Dijk
2011/11/3 José Pedro Magalhães j...@cs.uu.nl:
 - Compile with -O2 and -fno-spec-constr-count (this last one is particularly
 important)

I already compiled with -O2. Adding -fno-spec-constr-count does not
change the results.

 - Add {-# INLINE [1] #-} pragmas to the to/from methods of your Generic
 instances.

I tried:

BigSum/toJSON/generic goes from 70 ns to 52 ns! So inlining 'from' is
an improvement.

Unfortunately BigSum/fromJSON/generic stays at 13 us.

Bas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread Bas van Dijk
For those who find this interesting. Here's the code of the BigSum benchmark
with a manual Generic instance with inlined 'from' and 'to':
https://gist.github.com/1336426

José, I was thinking about the following idea. Say GHC generates the
following instance for BigSum:

instance Generic BigSum where
  type Rep BigSum = D1 D1BigSum SumOfBigSum
  ...

type SumOfBigSum =
   (   (   (C1 C1_0BigSum U1
   :+: (C1 C1_1BigSum U1 :+: C1 C1_2BigSum U1)
   )
   :+: (C1 C1_3BigSum U1
   :+: (C1 C1_4BigSum U1 :+: C1 C1_5BigSum U1)
   )
   )
   :+: (   (C1 C1_6BigSum U1
   :+: (C1 C1_7BigSum U1 :+: C1 C1_8BigSum U1)
   )
   :+: (C1 C1_9BigSum  U1
   :+: (C1 C1_10BigSum U1 :+: C1 C1_11BigSum U1)
   )
   )
   )
  :+: (   (   (C1 C1_12BigSum U1
  :+: (C1 C1_13BigSum U1 :+: C1 C1_14BigSum U1)
  )
  :+: (C1 C1_15BigSum U1
  :+: (C1 C1_16BigSum U1 :+: C1 C1_17BigSum U1)
  )
  )
  :+: (   (C1 C1_18BigSum U1
  :+: (C1 C1_19BigSum U1 :+: C1 C1_20BigSum U1)
  )
  :+: (   (C1 C1_21BigSum U1 :+: C1 C1_22BigSum U1)
  :+: (C1 C1_23BigSum U1 :+: C1 C1_24BigSum U1)
  )
  )
  )

It also generates the following function (or method): (I haven't
figured out the correct type yet. A correct version might need to use
type families or functional dependencies)

conPath :: String - Maybe (C1 ? ? ? - SumOfBigSum)
conPath F01 = Just $ L1 . L1 . L1 . L1
conPath F02 = Just $ L1 . L1 . L1 . R1 . L1
conPath F03 = Just $ L1 . L1 . L1 . R1 . R1
conPath F04 = Just $ L1 . L1 . R1 . L1
conPath F05 = Just $ L1 . L1 . R1 . R1 . L1
conPath F06 = Just $ L1 . L1 . R1 . R1 . R1
conPath F07 = Just $ L1 . R1 . L1 . L1
conPath F08 = Just $ L1 . R1 . L1 . R1 . L1
conPath F09 = Just $ L1 . R1 . L1 . R1 . R1
conPath F10 = Just $ L1 . R1 . R1 . L1
conPath F11 = Just $ L1 . R1 . R1 . R1 . L1
conPath F12 = Just $ L1 . R1 . R1 . R1 . R1
conPath F13 = Just $ R1 . L1 . L1 . L1
conPath F14 = Just $ R1 . L1 . L1 . R1 . L1
conPath F15 = Just $ R1 . L1 . L1 . R1 . R1
conPath F16 = Just $ R1 . L1 . R1 . L1
conPath F17 = Just $ R1 . L1 . R1 . R1 . L1
conPath F18 = Just $ R1 . L1 . R1 . R1 . R1
conPath F19 = Just $ R1 . R1 . L1 . L1
conPath F20 = Just $ R1 . R1 . L1 . R1 . L1
conPath F21 = Just $ R1 . R1 . L1 . R1 . R1
conPath F22 = Just $ R1 . R1 . R1 . L1 . L1
conPath F23 = Just $ R1 . R1 . R1 . L1 . R1
conPath F24 = Just $ R1 . R1 . R1 . R1 . L1
conPath F25 = Just $ R1 . R1 . R1 . R1 . R1
conPath _ = Nothing

conPath is given the name of the constructor. If it's a valid name it
will return a function that constructs a SumOfBigSum given the
corresponding constructor. Of course, since the types of the
constructors can vary (not in this case) coming up with a correct
implementation is a challenge.

Using conPath in my gParseJSON is easy:

instance (GFromJSON a, GFromJSON b) = GFromJSON (a :+: b) where
gParseJSON (Object (M.toList - [(key, val)])) =
case conPath key of
  Nothing   - mzero
  Just path - path $ gParseJSON val

gParseJSON v = typeMismatch sum (:+:) v
{-# INLINE gParseJSON #-}

I suspect this to be much more efficient.

Bas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread Claus Reinke

* syb: toJSON and fromJSON from the Data.Aeson.Generic module. Uses
the Data type class.
..
As can be seen, in most cases the GHC Generics implementation is much
faster than SYB and just as fast as TH. I'm impressed by how well GHC
optimizes the code!


Not that it matters much if you're going with other tools, but your 
SYB code has a long linear chain of type rep comparisons, at every

level of the recursive traversals. That is partially inherent in the SYB
design (reducing everything to cast), but could probably be improved?

Claus



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread Bas van Dijk
2011/11/3 Claus Reinke claus.rei...@talk21.com:
 Not that it matters much if you're going with other tools, but your SYB code
 has a long linear chain of type rep comparisons, at every
 level of the recursive traversals. That is partially inherent in the SYB
 design (reducing everything to cast), but could probably be improved?

I'm not an expert on the SYB code. So if you have an idea how to
improve it please file a ticket or even better send a pull request.

Cheers,

Bas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread Twan van Laarhoven

On 03/11/11 11:16, Bas van Dijk wrote:

...
instance (Constructor c, GFromJSON a, ConsFromJSON a) = GFromSum (C1 c a) where
 gParseSum (key, value)
 | key == pack (conName (undefined :: t c a p)) =
 gParseJSON value
 | otherwise = notFound $ unpack key
 {-# INLINE gParseSum #-}


notFound :: String -  Parser a
notFound key = fail $ The key \ ++ key ++ \ was not found
{-# INLINE notFound #-}


Perhaps relying on Attoparsec backtracking for picking out the right 
alternative from the sum is the problem. You could try it with Maybe:



class GFromSum f where
gParseSum :: Pair - Maybe (Parser (f a))

instance (Constructor c, GFromJSON a, ConsFromJSON a)
= GFromSum (C1 c a) where
gParseSum (key, value)
| key == pack (conName (undefined :: t c a p))
= Just (gParseJSON value)
| otherwise = Nothing

instance (GFromSum a, GFromSum b) = GFromSum (a :+: b) where
gParseSum keyVal = (fmap L1 $ gParseSum keyVal)
   | (fmap R1 $ gParseSum keyVal)
{-# INLINE gParseSum #-}

instance (GFromSum a, GFromSum b) = GFromJSON (a :+: b) where
gParseJSON (Object (M.toList - [keyVal]))
| Just p - gParseSum keyVal - p
gParseJSON v = typeMismatch sum (:+:) v



Twan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to speedup generically parsing sum types?

2011-11-03 Thread Bas van Dijk
On 3 November 2011 17:38, Twan van Laarhoven twa...@gmail.com wrote:
 Perhaps relying on Attoparsec backtracking for picking out the right
 alternative from the sum is the problem. You could try it with Maybe:

Good idea. I implemented and committed it and the
BigSum/fromJSON/generic benchmark goes from 13.6 us to 11.3 us. Still
a lot slower than the 4.1 us of SYB or the 414.9 ns of TH but an
improvement nonetheless.

Thanks for the idea!

Bas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe