Re: [Haskell-cafe] Mathematics and Statistics libraries
Hi Heinrich,

If we compare the GHCi experience with R or IPython, leaving aside any GUIs, the help system they have at the repl level is a lot more intuitive and easier to use, and you get access to the full manual entries. For example, compare what you see if you type :info sort into GHCi versus ?sort in R. R shows you the full docs for the function, whereas in GHCi you just get the type signature. I usually define a command that calls out to ":!hoogle --info %", which gives what you would expect :info to give.

So, as is usually the case, there's a solution in Haskell that matches the features in other systems, but it's not the default and you have to invest effort getting it set up right. This is fine for Haskell devs who do some stats work, but it represents an offputtingly steep learning curve for quants who are willing to learn a little Haskell but expect (reasonably) some basic stuff like inline help to Just Work.

Tom

On 25 March 2012 08:26, Heinrich Apfelmus wrote:
> Tom Doris wrote:
>>
>> If you're interested in UI work, ideally we'd have something similar
>> to RStudio as an environment, a simple set of windows encapsulating an
>> editor, a repl, a plotting panel and help/history, this sounds
>> superficial but it really has an impact when you're exploring a data
>> set and trying stuff out.
>
> Concerning UI, the following project suggestion aims to give GHCi a web GUI
>
> http://hackage.haskell.org/trac/summer-of-code/ticket/1609
>
> But one of your criteria is that a good UI should come with a help system,
> too, right?
>
> Best regards,
> Heinrich Apfelmus
>
> --
> http://apfelmus.nfshost.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
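For anyone wanting to try the hoogle approach described in this message, here is one way such a command might be set up. It is a sketch, not the exact definition used above: the function name hoogleInfoCommand and the command name :doc are made up for illustration, and it assumes a hoogle executable on the PATH that accepts --info.

```haskell
-- GHCi's :def expects a function of type String -> IO String: it receives the
-- text typed after the command and returns the GHCi command line to run next.
-- `hoogleInfoCommand` is a name chosen for this sketch.
hoogleInfoCommand :: String -> IO String
hoogleInfoCommand name = return (":!hoogle --info " ++ show name)

-- To register it, put a line like the following in ~/.ghci (":doc" is an
-- arbitrary command name picked here):
--   :def doc \name -> return (":!hoogle --info " ++ show name)
-- Typing ":doc sort" at the prompt then shells out to hoogle for "sort".
```

Using show to quote the argument keeps names containing shell-sensitive characters in a single argument, although operators with backslashes may still need care.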
Re: [Haskell-cafe] Mathematics and Statistics libraries
If the goal is to help Haskell be a more acceptable choice for general statistical analysis tasks, then hmatrix, statistics, and the various gsl wrappers already provide the majority of the functionality needed. I think the bigger problem is that there is no guidance on which libraries are industrial strength, there's no glue layer making it easier to use the APIs you'd want to, and GHCi isn't always ideal as a repl for this workflow.

If you're interested in UI work, ideally we'd have something similar to RStudio as an environment, a simple set of windows encapsulating an editor, a repl, a plotting panel and help/history, this sounds superficial but it really has an impact when you're exploring a data set and trying stuff out.

However, it would be a bigger contribution to get us to the point where we are able to just "import Quant.Prelude" to bring into scope all the standard functionality assumed in an environment like R or Matlab. In my experience most of this can come from re-exporting existing libraries while occasionally wrapping functions to simplify the interfaces and make them more consistent (e.g., a quant doesn't particularly need to know why Statistics.Sample.KernelDensity.kde uses unboxed vectors when the rest of that lib uses Generic, and they certainly won't want to spend their time remembering that they need to convert to call that function).

As an exercise, in GHCi, try loading a few arbitrary CSV files of tables including floating point columns, do a linear regression of one such column on another, and then display a scatterplot with the regression line; maybe throw in a check for the normality of the residuals. Assume you'll need to be able to handle large data sets, so you need to use bytestring, attoparsec etc.; beware that there's a known bug that will cause a segfault/bus error if you use some hmatrix/gsl functions from GHCi on x86_64, which is kind of a blocker in itself.
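To make the regression step of that exercise concrete, here is a minimal sketch using only base and plain lists, so it loads in GHCi without extra packages. It is not how hmatrix or statistics expose regression, and it ignores the large-data concerns mentioned above; the names linearFit and residuals are invented for this illustration.

```haskell
-- Ordinary least squares fit of y = intercept + slope * x over paired samples.
-- Returns Nothing when fewer than two points are given or x has no variance.
linearFit :: [Double] -> [Double] -> Maybe (Double, Double)
linearFit xs ys
  | n < 2 || sxx == 0 = Nothing
  | otherwise         = Just (intercept, slope)
  where
    pairs     = zip xs ys            -- extra elements of the longer list are dropped
    n         = length pairs
    n'        = fromIntegral n
    meanX     = sum (map fst pairs) / n'
    meanY     = sum (map snd pairs) / n'
    sxx       = sum [ (x - meanX) ^ (2 :: Int) | (x, _) <- pairs ]
    sxy       = sum [ (x - meanX) * (y - meanY) | (x, y) <- pairs ]
    slope     = sxy / sxx
    intercept = meanY - slope * meanX

-- Residuals y - (intercept + slope * x), e.g. as input to a normality check.
residuals :: (Double, Double) -> [Double] -> [Double] -> [Double]
residuals (intercept, slope) xs ys =
  [ y - (intercept + slope * x) | (x, y) <- zip xs ys ]
```

In GHCi, linearFit [1,2,3] [3,5,7] evaluates to Just (1.0,2.0), i.e. intercept 1 and slope 2. Loading the columns from CSV, plotting, and the normality test itself are left out, since those are exactly the parts that depend on which packages one settles on.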
Maybe I missed something obvious but it took me a looong time to figure out which containers, persistence + parsing, stats and plotting packages I should choose.

I really disagree that we need a data frame type structure; they're an abomination in R, they try to accommodate event records and time series, and do neither well. Haskell records are fine for inhomogeneous event series, and for homogeneous time series parallel Vectors or Matrices are better, as they can be passed to BLAS and LAPACK with consequent performance and clarity advantages - column oriented storage rocks, and Haskell is already a good fit.

Having used C++, Matlab and R (the latter for quite a while) I now use Haskell for all of my statistical analysis work; despite the many shortcomings it's definitely worth it for the code clarity and type checking, to say nothing of the pre-optimization performance and robustness.

Best of luck, happy to share some preliminary code with you directly if you're interested!

Tom

On 21 March 2012 17:24, Ben Jones wrote:
> I am a student currently interested in participating in Google Summer of
> Code. I have a strong interest in Haskell, and a semester's worth of coding
> experience in the language. I am a mathematics and cs double major with only
> a semester left and I am looking for information regarding what the
> community is lacking as far as mathematics and statistics libraries are
> concerned. If there is enough interest I would like to put together a
> project with this. I understand that such libraries are probably low
> priority, but if anyone has anything I would love to hear it.
>
> Thanks for reading,
> -Benjamin
Re: [Haskell-cafe] empty fields are dropped in bytestring csv
Hacky patch to fix this, for future reference, against bytestring-csv-0.1.2. Cost center annotations were used to anecdotally verify that the change doesn't significantly impact performance. (Interestingly, the Alex lexer in bytestring-csv appears to allocate 1.5GB while lexing a 1.6MB csv file!?)

Text/CSV/ByteString.hs
65c65
< fields = [ unquote s | Item s <- line ]
---
> fields = [ unquote s | Item s <- pline line]
76a77,86
>
>
> pline fs@(Item x : []) = fs
> pline (Item x : Comma : []) = {-# SCC "plinea" #-} Item x : Comma : Item S.empty : []
> pline (Item x : Comma : rs) = {-# SCC "plineb" #-} Item x : Comma : pline rs
> pline (Comma : []) = {-# SCC "plinec" #-} Comma : Item S.empty : Comma : Item S.empty : []
> pline (Comma : rs) = {-# SCC "plined" #-} Item S.empty : Comma : pline rs
> pline (Newline : rs ) = []
> pline [] = []
>

On 17 February 2012 23:16, Tom Doris wrote:
> the bytestring-csv package appears to have a bug whereby empty fields are
> dropped completely from the row, which is different to Text.CSV , which
> will return an empty field in the parse result. I'd argue this is a bug in
> bytestring-csv, anyone know whether this has been raised before, or know of
> a workaround?
>
> Prelude Data.Maybe Data.List Text.CSV.ByteString Data.ByteString.Char8>
> parseCSV $ pack "a,b,c\n1,2,3\n1,,9\n"
> Just [["a","b","c"],["1","2","3"],["1","9"]]
>
> -- the last row has two fields ^
>
> Prelude Text.CSV> parseCSV "/tmp/err" "a,b,c\n1,2,3\n1,,9\n"
> Right [["a","b","c"],["1","2","3"],["1","","9"],[""]]
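The idea behind the patch, re-inserting an empty item wherever the lexer saw two separators in a row, can be tried in isolation. The sketch below is not the patch itself and does not use the package's types: Token is a hypothetical stand-in over String (the real lexer works on ByteStrings), and rowFields is a name invented here. It also folds the padding and the field extraction into one pass rather than rewriting the token list.

```haskell
-- Hypothetical stand-in for the lexer's token type.
data Token = Item String | Comma | Newline
  deriving (Eq, Show)

-- Collect the fields of one row, emitting "" wherever a field is missing:
-- between two adjacent commas, before a leading comma, after a trailing one.
rowFields :: [Token] -> [String]
rowFields = go True
  where
    -- The Bool records whether a field is still expected at this position.
    go expecting []              = [ "" | expecting ]
    go expecting (Newline : _)   = [ "" | expecting ]   -- stop at the end of the row
    go _         (Item s : rest) = s : go False rest
    go expecting (Comma : rest)  = [ "" | expecting ] ++ go True rest
```

With the tokens for the row 1,,9 from the bug report, rowFields [Item "1", Comma, Comma, Item "9"] evaluates to ["1","","9"], matching what Text.CSV returns for that row.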
[Haskell-cafe] empty fields are dropped in bytestring csv
The bytestring-csv package appears to have a bug whereby empty fields are dropped completely from the row, which is different to Text.CSV, which will return an empty field in the parse result. I'd argue this is a bug in bytestring-csv. Does anyone know whether this has been raised before, or know of a workaround?

Prelude Data.Maybe Data.List Text.CSV.ByteString Data.ByteString.Char8> parseCSV $ pack "a,b,c\n1,2,3\n1,,9\n"
Just [["a","b","c"],["1","2","3"],["1","9"]]

-- the last row has two fields ^

Prelude Text.CSV> parseCSV "/tmp/err" "a,b,c\n1,2,3\n1,,9\n"
Right [["a","b","c"],["1","2","3"],["1","","9"],[""]]
[Haskell-cafe] hmatrix under ghci on x86_64
I'm using ghci + hmatrix and a few other packages as a Haskell based replacement for Matlab, and everything works well so far in terms of available functionality. However, I have encountered an issue when running in ghci on x86_64 systems: calls into functions that in turn call gsl functions result in a bus error, e.g.:

Prelude> :m +Numeric.Container
Prelude Numeric.Container> randomVector 10 Gaussian 10
fromList Bus error (core dumped)

I attached gdb and found the bus error was happening in gsl_rng_alloc(), but some investigation indicates that the problem is probably due to this bug in ghci:

http://hackage.haskell.org/trac/ghc/ticket/2912

which has been marked as a duplicate of

http://hackage.haskell.org/trac/ghc/ticket/781

and a recent update to 781 indicates that it won't be addressed until at least v 7.6.1 (781 also references http://hackage.haskell.org/trac/ghc/ticket/3658 and it seems that this is a pretty large piece of work - moving to fully dynamically linked ghci, which has been around for a while and pushed back a few times).

Does anyone know of a workaround that would allow ghci to use wrapped gsl functionality on x86_64 systems in the meantime? Most linux boxes used by quants are x86_64 now, so this issue will impact many people who would like to use Haskell instead of Matlab.

Thanks
[Haskell-cafe] library-profiling default
Hi,

Is there a good reason that the default for library-profiling in .cabal/config is set to False? It seems a lot of people hit the problem of trying to profile for the first time, finding it doesn't work because profiling libraries haven't been installed, and then having to walk the dependencies reinstalling everything. Is there a major cost or problem with just defaulting this to True?

Apologies if this is answered elsewhere; I saw various discussions on why it is difficult to automatically build required libs with profiling on demand, but nothing that discussed changing the default so that they are always built.

Tom
[Haskell-cafe] Data.Judy
Hi,

Are there any plans to extend the current Data.Judy package to include bindings to JudySL and JudyHS? There's a standalone binding to JudySL by Andrew Choi that is usable, but it would of course be better to have the functionality in the Data.Judy package proper.

Thanks
Tom
[Haskell-cafe] Re: chart "broken" under 6.12 according to criterion
According to the criterion.cabal file shipped with the latest (0.5.0.1) version of criterion, the Chart package is broken under GHC 6.12:

flag Chart
  description: enable use of the Chart package
  -- Broken under GHC 6.12 so far

Does anyone know the status of this problem? It's been a little frustrating getting Criterion up and running - it didn't work at all under 6.10 due to a compiler bug ("The impossible happened" error on uvector install) and now it works under 6.12 but without the nice charts that are so useful.

Appreciate any insight or workarounds for this, thanks

(Apologies, previous email sent prematurely!)

Tom

On 1 July 2010 10:16, Tom Doris wrote:
> According to the criterion.cabal file shipped with the latest (0.5.0.1)
> version of criterion, the Chart package is broken under GHC 6.12:
>
> flag Chart
[Haskell-cafe] chart "broken" under 6.12 according to criterion
According to the criterion.cabal file shipped with the latest (0.5.0.1) version of criterion, the Chart package is broken under GHC 6.12:

flag Chart