[Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread Malcolm Wallace
There are lots of Haskell CSV parsers out there.  Most have poor 
error-reporting, and do not scale to large inputs.  I am pleased to announce an 
industrial-strength library that is robust, fast, space-efficient, lazy, and 
scales to gigantic inputs with no loss of performance.

http://code.haskell.org/lazy-csv/

Downloads from Hackage:

http://hackage.haskell.org/package/lazy-csv

This library has been in industrial use for several years now, but this is the 
first public release.  No doubt the API is not as general as it could be, but 
it already serves many purposes very well.  I'm happy to receive bug reports 
and suggestions for improvements.

Regards,
Malcolm


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread Oliver Charles

On 02/25/2013 10:47 AM, Malcolm Wallace wrote:

There are lots of Haskell CSV parsers out there.  Most have poor 
error-reporting, and do not scale to large inputs.  I am pleased to announce an 
industrial-strength library that is robust, fast, space-efficient, lazy, and 
scales to gigantic inputs with no loss of performance.

 http://code.haskell.org/lazy-csv/

Downloads from Hackage:

 http://hackage.haskell.org/package/lazy-csv

This library has been in industrial use for several years now, but this is the 
first public release.  No doubt the API is not as general as it could be, but 
it already serves many purposes very well.  I'm happy to receive bug reports 
and suggestions for improvements.

Regards,
 Malcolm


Obvious question: How does this compare to cassava? Especially cassava's 
Data.Csv.Incremental module? I specifically ask because you mention that 
it is "lazier, faster, more space-efficient, and more flexible in its 
treatment of errors, than any other extant Haskell CSV library on 
Hackage", but there is no mention of cassava on the website.


- Ollie


Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread Malcolm Wallace

On 25 Feb 2013, at 11:14, Oliver Charles wrote:

 Obvious question: How does this compare to cassava? Especially cassava's 
 Data.Csv.Incremental module? I specifically ask because you mention that it is 
 "lazier, faster, more space-efficient, and more flexible in its treatment of 
 errors, than any other extant Haskell CSV library on Hackage", but there is 
 no mention of cassava on the website.

Simple answer - I have never heard of cassava, and suspect it did not exist 
when I first did the benchmarking. I'd be happy to re-do my performance 
comparison, including cassava and any other recent-ish CSV libraries, if I can 
find them.

Regards,
Malcolm


Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread Ivan Lazar Miljenovic
On 25 February 2013 21:47, Malcolm Wallace malcolm.wall...@me.com wrote:
 There are lots of Haskell CSV parsers out there.  Most have poor 
 error-reporting, and do not scale to large inputs.  I am pleased to announce 
 an industrial-strength library that is robust, fast, space-efficient, lazy, 
 and scales to gigantic inputs with no loss of performance.

 http://code.haskell.org/lazy-csv/

 Downloads from Hackage:

 http://hackage.haskell.org/package/lazy-csv

Note that on your website, you list the Hackage URL as having
"packages" rather than "package"...


 This library has been in industrial use for several years now, but this is 
 the first public release.  No doubt the API is not as general as it could be, 
 but it already serves many purposes very well.  I'm happy to receive bug 
 reports and suggestions for improvements.

 Regards,
 Malcolm





-- 
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
http://IvanMiljenovic.wordpress.com



Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread John Wiegley
 Malcolm Wallace malcolm.wall...@me.com writes:

 Simple answer - I have never heard of cassava, and suspect it did not exist
 when I first did the benchmarking. I'd be happy to re-do my performance
 comparison, including cassava and any other recent-ish CSV libraries, if I
 can find them.

I would be very interested in those results, Malcolm.

Thanks,
-- 
John Wiegley
FP Complete Haskell tools, training and consulting
http://fpcomplete.com   johnw on #haskell/irc.freenode.net



Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread Ozgun Ataman
I'd also like to point to a couple of CSV libraries I released a long time ago 
and have been maintaining that both target constant-space operation and try 
(and hope) for the best in terms of speed. I'd be very interested to know how 
they fare in terms of performance benchmarking:

Latest, based on conduit: http://hackage.haskell.org/package/csv-conduit (just 
released the latest version)

Older, based on enumerator: http://hackage.haskell.org/package/csv-enumerator

Notice how both are based on well-known I/O streaming libraries to achieve both 
constant-space operation AND nice interoperability with their habitat. I have 
found this to be especially true in the case of conduit.
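As a rough illustration of the constant-space idea only (this is not conduit or enumerator, whose APIs are not shown here), the same effect can be sketched with nothing but base: a lazily produced list of rows consumed by a strict left fold, so each row can be garbage-collected once it has been added in. The column-summing task, the `sumColumn` and `splitOn` helpers and the generated input are all hypothetical, and the splitter ignores CSV quoting.

```haskell
import Data.List (foldl')
import Text.Read (readMaybe)

-- Hypothetical helper: split one line on a separator (no quoting support).
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn c rest

-- Hypothetical task: sum the integer in column `col` (0-based) of every row.
-- `lines` yields rows on demand and foldl' forces the running total at each
-- step, so rows that have already been summed can be garbage-collected.
sumColumn :: Int -> String -> Int
sumColumn col = foldl' step 0 . lines
  where
    step acc row = case drop col (splitOn ',' row) of
      (field : _) | Just n <- readMaybe field -> acc + n
      _ -> acc  -- rows without a usable integer in that column are skipped

main :: IO ()
main = do
  -- One million generated rows of the form "rowN,D" where D = N `mod` 10.
  let input = concatMap (\i -> "row" ++ show i ++ "," ++ show (i `mod` 10) ++ "\n")
                        [1 .. 1000000 :: Int]
  print (sumColumn 1 input)  -- prints 4500000
```

Because nothing else holds on to the list of rows, a program shaped like this should run in roughly constant memory regardless of input size; streaming libraries such as conduit give the same guarantee while also interleaving the reading of input with I/O.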

If you end up designing a benchmark, I'd be happy to get it working with my 
library.

- Oz


On Monday, February 25, 2013 at 5:16 PM, John Wiegley wrote:

  Malcolm Wallace malcolm.wall...@me.com writes:

  Simple answer - I have never heard of cassava, and suspect it did not exist
  when I first did the benchmarking. I'd be happy to re-do my performance
  comparison, including cassava and any other recent-ish CSV libraries, if I
  can find them.

 I would be very interested in those results, Malcolm.

 Thanks,
 -- 
 John Wiegley
 FP Complete Haskell tools, training and consulting
 http://fpcomplete.com johnw on #haskell/irc.freenode.net




Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread Don Stewart
Cassava is quite new, but has the same goals as lazy-csv.

It's about a year old now -
http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html

I know Johan has been working on the benchmarks of late - it would be very
good to know how the two compare in features.
On Feb 25, 2013 11:23 AM, Malcolm Wallace malcolm.wall...@me.com wrote:


 On 25 Feb 2013, at 11:14, Oliver Charles wrote:

  Obvious question: How does this compare to cassava? Especially cassava's
 Data.Csv.Incremental module? I specifically ask because you mention that
 it is "lazier, faster, more space-efficient, and more flexible in its
 treatment of errors, than any other extant Haskell CSV library on Hackage",
 but there is no mention of cassava on the website.

 Simple answer - I have never heard of cassava, and suspect it did not
 exist when I first did the benchmarking. I'd be happy to re-do my
 performance comparison, including cassava and any other recent-ish CSV
 libraries, if I can find them.

 Regards,
 Malcolm



Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV

2013-02-25 Thread Johan Tibell
On Mon, Feb 25, 2013 at 2:32 PM, Don Stewart don...@gmail.com wrote:

 Cassava is quite new, but has the same goals as lazy-csv.

 It's about a year old now -
 http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html

 I know Johan has been working on the benchmarks of late - it would be very
 good to know how the two compare in features

I whipped together a quick benchmark:
https://github.com/tibbe/cassava/blob/master/benchmarks/Benchmarks.hs

To run, check out the cassava repo on GitHub and run:

    cabal configure --enable-benchmarks && cabal build && cabal bench

Here are the results (all the normal caveats for benchmarking apply):

benchmarking positional/decode/presidents/without conversion
mean: 62.85965 us, lb 62.56705 us, ub 63.26101 us, ci 0.950
std dev: 1.751446 us, lb 1.371323 us, ub 2.295576 us, ci 0.950

benchmarking positional/decode/streaming/presidents/without conversion
mean: 93.81925 us, lb 91.14701 us, ub 98.19217 us, ci 0.950
std dev: 17.20842 us, lb 11.58690 us, ub 23.41786 us, ci 0.950

benchmarking comparison/lazy-csv
mean: 133.2609 us, lb 132.4415 us, ub 135.3085 us, ci 0.950
std dev: 6.193178 us, lb 3.123661 us, ub 12.83148 us, ci 0.950

The first two sets of numbers are for cassava (all-at-once and streaming
mode, respectively). The last set is for lazy-csv.
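To put the three means on a common scale, here is a small calculation using the mean times copied from the criterion output above (the binding names are only for illustration). On this single, small benchmark, lazy-csv's mean is roughly 2.1 times cassava's all-at-once mean and roughly 1.4 times its streaming mean; the usual benchmarking caveats apply, and the streaming run has a comparatively large standard deviation.

```haskell
-- Mean run times in microseconds, copied from the criterion output above.
cassavaAllAtOnce, cassavaStreaming, lazyCsv :: Double
cassavaAllAtOnce = 62.85965
cassavaStreaming = 93.81925
lazyCsv          = 133.2609

-- Ratio of lazy-csv's mean time to another mean time (> 1 means lazy-csv is slower).
slowdownVersus :: Double -> Double
slowdownVersus other = lazyCsv / other

-- Round to two decimal places for display.
round2 :: Double -> Double
round2 x = fromIntegral (round (x * 100) :: Integer) / 100

main :: IO ()
main = do
  putStrLn ("lazy-csv / cassava all-at-once: " ++ show (round2 (slowdownVersus cassavaAllAtOnce)))
  putStrLn ("lazy-csv / cassava streaming:   " ++ show (round2 (slowdownVersus cassavaStreaming)))
```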

The feature sets of the two libraries are quite different. Both do basic
CSV parsing (with some extensions).

 * lazy-csv parses CSV data to something akin to [[ByteString]], but with a
heavy focus on error recovery and precise error messages.
 * cassava parses CSV data to [a], where a is a user-defined type that
represents a CSV record. There are options to recover from *type
conversion* errors, but not from malformed CSV. cassava has several parsing
modes: incremental for parsing interleaved with I/O, streaming for lazy
parsing (with or without I/O), and all-at-once parsing for when you want to
hold all the data in memory.
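A toy sketch of the two result shapes described above, using only base. It is not the API of either library: `parseRows`, `RowError`, `FromRow` and `President` are hypothetical names, the sample data is made up, and the field splitter ignores CSV quoting. Rows come out lazily as `Either RowError [String]`, so a malformed row is reported and parsing carries on (in the spirit of lazy-csv's error recovery), while `FromRow` turns a structurally valid row into a user-defined record and can fail per record (in the spirit of cassava's type-conversion errors).

```haskell
import Text.Read (readMaybe)

-- Hypothetical error type: which line was malformed, and why.
data RowError = RowError { errLine :: Int, errMsg :: String }
  deriving (Show, Eq)

-- Split one line on commas (deliberately ignores CSV quoting).
splitFields :: String -> [String]
splitFields s = case break (== ',') s of
  (f, [])       -> [f]
  (f, _ : rest) -> f : splitFields rest

-- Shape 1: untyped rows of string fields, produced lazily. A row with the
-- wrong number of fields becomes a Left, and the rows after it are still parsed.
parseRows :: Int -> String -> [Either RowError [String]]
parseRows width input = zipWith check [1 ..] (lines input)
  where
    check n l
      | length fs == width = Right fs
      | otherwise          = Left (RowError n msg)
      where
        fs  = splitFields l
        msg = "expected " ++ show width ++ " fields, got " ++ show (length fs)

-- Shape 2: a user-defined record type plus a conversion that can fail per row.
data President = President { name :: String, tookOffice :: Int }
  deriving (Show, Eq)

-- Hypothetical class standing in for a library-provided conversion class.
class FromRow a where
  fromRow :: [String] -> Either String a

instance FromRow President where
  fromRow [n, y] = case readMaybe y of
    Just year -> Right (President n year)
    Nothing   -> Left ("not a year: " ++ show y)
  fromRow fs = Left ("expected 2 fields, got " ++ show (length fs))

-- Made-up sample data with one conversion error and one structural error.
sample :: String
sample = unlines
  [ "George Washington,1789"
  , "John Adams,oops"
  , "Thomas Jefferson,1801,extra"
  , "James Madison,1809"
  ]

main :: IO ()
main = do
  -- Structural errors are reported per row; parsing carries on afterwards.
  mapM_ print (parseRows 2 sample)
  -- Only structurally valid rows are handed to the typed conversion.
  let typed = [fromRow fs | Right fs <- parseRows 2 sample] :: [Either String President]
  mapM_ print typed
  -- Laziness: only the demanded prefix of an infinite input is ever parsed.
  print (take 2 (parseRows 2 (cycle "a,b\n")))
```

Keeping the untyped row layer separate from the typed conversion means a malformed line and a badly typed field surface as different kinds of error, which mirrors the distinction drawn between the two libraries above.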

-- Johan