[Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
There are lots of Haskell CSV parsers out there. Most have poor error-reporting, and do not scale to large inputs. I am pleased to announce an industrial-strength library that is robust, fast, space-efficient, lazy, and scales to gigantic inputs with no loss of performance.

http://code.haskell.org/lazy-csv/

Downloads from Hackage: http://hackage.haskell.org/package/lazy-csv

This library has been in industrial use for several years now, but this is the first public release. No doubt the API is not as general as it could be, but it already serves many purposes very well. I'm happy to receive bug reports and suggestions for improvements.

Regards,
Malcolm

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
On 02/25/2013 10:47 AM, Malcolm Wallace wrote:
> I am pleased to announce an industrial-strength library that is robust, fast, space-efficient, lazy, and scales to gigantic inputs with no loss of performance.

Obvious question: how does this compare to cassava? Especially cassava's Data.Csv.Incremental module?

I specifically ask because the website says

> It is lazier, faster, more space-efficient, and more flexible in its treatment of errors, than any other extant Haskell CSV library on Hackage

but there is no mention of cassava on the website.

- Ollie
Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
On 25 Feb 2013, at 11:14, Oliver Charles wrote:
> Obvious question: How does this compare to cassava?

Simple answer - I have never heard of cassava, and suspect it did not exist when I first did the benchmarking. I'd be happy to re-do my performance comparison, including cassava and any other recent-ish CSV libraries, if I can find them.

Regards,
Malcolm
Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
On 25 February 2013 21:47, Malcolm Wallace malcolm.wall...@me.com wrote:
> Downloads from Hackage: http://hackage.haskell.org/package/lazy-csv

Note that on your website, the Hackage URL is written with "packages" rather than "package"...

--
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
http://IvanMiljenovic.wordpress.com
Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
Malcolm Wallace malcolm.wall...@me.com writes:
> I'd be happy to re-do my performance comparison, including cassava and any other recent-ish CSV libraries, if I can find them.

I would be very interested in those results, Malcolm.

Thanks,
--
John Wiegley
FP Complete
Haskell tools, training and consulting
http://fpcomplete.com
johnw on #haskell/irc.freenode.net
Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
I'd also like to point to a couple of CSV libraries that I released a long time ago and have been maintaining. Both target constant-space operation and try (and hope) for the best in terms of speed. I'd be very interested to know how they fare in performance benchmarks:

Latest, based on conduit (just released the latest version): http://hackage.haskell.org/package/csv-conduit
Older, based on enumerator: http://hackage.haskell.org/package/csv-enumerator

Note that both are built on well-known IO streaming libraries, which gives them both constant-space operation and good interoperability with the rest of those libraries' ecosystems. I have found this to be especially true in the case of conduit.

If you end up designing a benchmark, I'd be happy to get it working with my library.

- Oz

On Monday, February 25, 2013 at 5:16 PM, John Wiegley wrote:
> I would be very interested in those results, Malcolm.
Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
Cassava is quite new, but has the same goals as lazy-csv. It's about a year old now:

http://blog.johantibell.com/2012/08/a-new-fast-and-easy-to-use-csv-library.html

I know Johan has been working on the benchmarks of late. It would be very good to know how the two compare in features.

On Feb 25, 2013 11:23 AM, Malcolm Wallace malcolm.wall...@me.com wrote:
> Simple answer - I have never heard of cassava, and suspect it did not exist when I first did the benchmarking.
Re: [Haskell-cafe] ANN: lazy-csv - the fastest and most space-efficient parser for CSV
On Mon, Feb 25, 2013 at 2:32 PM, Don Stewart don...@gmail.com wrote:
> I know Johan has been working on the benchmarks of late - it would be very good to know how the two compare in features

I whipped together a quick benchmark:

https://github.com/tibbe/cassava/blob/master/benchmarks/Benchmarks.hs

To run, check out the cassava repo on GitHub and run:

    cabal configure --enable-benchmarks
    cabal build
    cabal bench

Here are the results (all the normal caveats for benchmarking apply):

    benchmarking positional/decode/presidents/without conversion
    mean: 62.85965 us, lb 62.56705 us, ub 63.26101 us, ci 0.950
    std dev: 1.751446 us, lb 1.371323 us, ub 2.295576 us, ci 0.950

    benchmarking positional/decode/streaming/presidents/without conversion
    mean: 93.81925 us, lb 91.14701 us, ub 98.19217 us, ci 0.950
    std dev: 17.20842 us, lb 11.58690 us, ub 23.41786 us, ci 0.950

    benchmarking comparison/lazy-csv
    mean: 133.2609 us, lb 132.4415 us, ub 135.3085 us, ci 0.950
    std dev: 6.193178 us, lb 3.123661 us, ub 12.83148 us, ci 0.950

The first two sets of numbers are for cassava (in all-at-once and streaming mode, respectively). The last set is for lazy-csv.

The feature sets of the two libraries are quite different. Both do basic CSV parsing (with some extensions).

* lazy-csv parses CSV data to something akin to [[ByteString]], but with a heavy focus on error recovery and precise error messages.

* cassava parses CSV data to [a], where a is a user-defined type that represents a CSV record. There are options to recover from *type conversion* errors, but not from malformed CSV. cassava has several parsing modes: incremental for parsing interleaved with I/O, streaming for lazy parsing (with or without I/O), and all-at-once parsing for when you want to hold all the data in memory.
-- Johan
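To make the "parses CSV data to [a]" point concrete, here is a minimal sketch of cassava's all-at-once mode decoding into a user-defined record type. The `President` type, its field names, the `parsePresidents` helper, and the two-row sample input are all hypothetical, chosen only to echo the "presidents" benchmark name; they are not taken from the benchmark itself. It assumes the cassava, vector, and bytestring packages are installed.

```haskell
import Control.Monad (mzero)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (FromRecord (..), HasHeader (NoHeader), decode, (.!))
import qualified Data.Vector as V

-- Hypothetical record type standing in for one CSV row.
data President = President
  { presNumber :: Int
  , presName   :: String
  } deriving (Show, Eq)

-- Positional conversion: field 0 becomes the number, field 1 the name.
-- Rows that do not have exactly two fields are rejected.
instance FromRecord President where
  parseRecord v
    | V.length v == 2 = President <$> v .! 0 <*> v .! 1
    | otherwise       = mzero

-- Hypothetical sample input (no header row).
sample :: BL.ByteString
sample = BL.pack "1,George Washington\n2,John Adams\n"

-- All-at-once decoding: the whole input is parsed and converted,
-- and any parse or conversion error is reported as a Left.
parsePresidents :: BL.ByteString -> Either String [President]
parsePresidents = fmap V.toList . decode NoHeader

main :: IO ()
main =
  either (putStrLn . ("decode failed: " ++)) (mapM_ print)
         (parsePresidents sample)
```

In this mode a single bad field fails the whole decode, which is the trade-off against the streaming and incremental modes mentioned above.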