[Haskell-cafe] The RTSOPTS -qm flag's impact on runtime

2013-09-30 Thread Iustin Pop
Hi all,

I found an interesting case where the rtsopts -qm flag makes a
significant difference in runtime (~50x). This is using GHC 7.6.3, llvm 3.4, 
program
compiled with -threaded -O2 -fllvm and a couple of language extensions.
Source is at
http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=chameneosredux&lang=ghc&id=4&data=u64q
on the language shootout benchmarks.

Running the code without -N results (on my computer) in around 4 seconds
of runtime:
$ time ./orig 600
…
real    0m3.919s
user    0m3.903s
sys     0m0.010s

This is reasonably consistent. Running -N4 (this is an 8-core machine)
results in the surprising:

$ time ./orig 600 +RTS -N4
…
real    1m15.154s
user    1m38.790s
sys     2m7.947s

The cores are all used very erratically (utilisation continuously bouncing
between 5%, 20% and 40%) and the overall CPU usage is ~27-28%. Note the
surprising 2m7s of sys time, which means the kernel is involved a lot…

Note that removing the explicit forkOn and running with -N4 results in
somewhat worse performance:

real    2m6.548s
user    2m13.470s
sys     2m3.043s

So in that sense the forkOn itself is not at fault. What I have found is
that -qm is a life saver here:

$ time ./orig 600 +RTS -N4 -qm
real    0m2.773s
user    0m5.610s
sys     0m0.123s

Adding -qa doesn't make a big difference. To summarise more runs (in
terms of CPU used, user+sys):

with forkOn:
  - -N4: 228s
  - -N4 -qa: 110s
  - -N4 -qm:   6s
  - -N4 -qm -qa:   6s

without forkOn:
  - -N4: 253s
  - -N4 -qa: 252s
  - -N4 -qm:   5s
  - -N4 -qm -qa:   5s

(Note that the version without forkOn is a bit slower in terms of
wall-clock time, as the forkOn version distributes the work a bit better,
even if it uses a tiny bit more CPU overall.)
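
For readers without the shootout source at hand, here is a minimal,
hypothetical sketch of the forkOn pattern under discussion (it is not the
chameneos-redux code, just an illustration of pinning one worker per
capability; with plain forkIO the scheduler is free to migrate the
threads, which is the kind of movement -qm turns off):

  {-# LANGUAGE BangPatterns #-}
  import Control.Concurrent
  import Control.Monad (forM_, replicateM)

  -- stand-in for the real per-thread work loop
  worker :: MVar Int -> Int -> IO ()
  worker done n = do
    let !s = sum [1 .. n]
    putMVar done s

  main :: IO ()
  main = do
    caps <- getNumCapabilities
    dones <- replicateM caps newEmptyMVar
    -- pin one thread per capability, as the shootout entry does with forkOn
    forM_ (zip [0 ..] dones) $ \(i, d) -> forkOn i (worker d (5 * 1000 * 1000))
    mapM_ takeMVar dones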

So the question is, what does -qm actually do that it affects this
benchmark so much (~50x)? (The docs are not very clear on it)

And furthermore, could there be a heuristic inside the runtime such
that automatic thread migration is suspended if threads are
over-migrated (which is what I suppose happens here)?

thanks for any explanations,
iustin


Re: [Haskell-cafe] GSoC Project Proposal: Markdown support for Haddock

2013-04-04 Thread Iustin Pop
On Thu, Apr 04, 2013 at 06:41:19PM +0100, Edsko de Vries wrote:
 Yes please!

+1 as well. I find the current syntax too restrictive…

iustin

 On Thu, Apr 4, 2013 at 5:49 PM, Johan Tibell johan.tib...@gmail.com wrote:
 
  Hi all,
 
  Haddock's current markup language leaves something to be desired once
  you want to write more serious documentation (e.g. several paragraphs
  of introductory text at the top of the module doc). Several features
  are lacking (bold text, links that render as text instead of URLs,
  inline HTML).
 
  I suggest that we implement an alternative haddock syntax that's a
  superset of Markdown. It's a superset in the sense that we still want
  to support linkifying Haskell identifiers, etc. Modules that want to
  use the new syntax (which will probably be incompatible with the
  current syntax) can set:
 
  {-# HADDOCK Markdown #-}
 
  on top of the source file.
 
  Ticket: http://trac.haskell.org/haddock/ticket/244
 
  -- Johan
 




Re: [Haskell-cafe] How far compilers are allowed to go with optimizations?

2013-02-09 Thread Iustin Pop
On Sat, Feb 09, 2013 at 09:56:12AM +0100, Johan Holmquist wrote:
 As a software developer, who typically inherits code to work on rather
 than simply writing new, I see a potential of aggressive compiler
 optimizations causing trouble. It goes like this:
 
 Programmer P inherits some application/system to improve upon. Someday
 he spots some piece of rather badly written code. So he sets out and
 rewrites that piece, happy with the improvement it brings to clarity
 and likely also to efficiency.
 
 The code goes into production and, disaster. The new improved
 version runs 3 times slower than the old, making it practically
 unusable. The new version has to be rolled back with loss of uptime
 and functionality and  management is not happy with P.
 
 It just so happened that the old code triggered some aggressive
 optimization unbeknownst to everyone, **including the original
 developer**, while the new code did not. (This optimization maybe even
 was triggered only on a certain version of the compiler, the one that
 happened to be in use at the time.)
 
 I fear being P some day.
 
 Maybe this is something that would never happen in practice, but how
 to be sure...

An interesting point, but I think here P is still at fault. If we're
talking about important software, won't there be regression tests, both
for correctness and for performance? Surely there will be a canary
period, parallel running of the old and new systems, etc.?

regards,
iustin



[Haskell-cafe] Is it possible to have constant-space JSON decoding?

2012-12-04 Thread Iustin Pop
Hi,

I'm trying to parse a rather simple but big JSON message, but it turns
out that memory consumption is a problem, and I'm not sure what the
actual cause is.

Let's say we have a simple JSON message: an array of 5 million numbers.
I would like to parse this in constant space, such that if I only need
the last element, overall memory usage is low (yes, unrealistic use, but
please bear with me for a moment).

Using aeson, I thought the following program would be nicely behaved:

 import Data.Aeson
 import qualified Data.Attoparsec.ByteString.Lazy as AL
 import qualified Data.ByteString.Lazy as L

 main = do
   r <- L.readFile "numbers"
   case AL.parse json r :: AL.Result Value of
     AL.Fail _ context errs -> do
       print context
       print errs
     AL.Done _ d -> case fromJSON d :: Result [Value] of
       Error x -> putStrLn x
       Success d -> print $ last d

However, this uses (according to +RTS -s) ~1150 MB of memory. I've tried
switching from json to json', but while that uses slightly less memory
(~1020 MB) it clearly can't be fully lazy, since it forces conversion to
actual types.

Looking at the implementation of FromJSON [a], it seems we could
optimise the code by not forcing to a list. New (partial) version does:

 AL.Done _ d -> case d of
   Array v -> print $ V.last v

And this indeed reduces the memory, when using json', to about ~700MB.
Better, but still a lot.

It seems that the Array constructor holds a vector, and this results in
too much strictness?

Looking at the memory profiles (with json and Array), things are
quite interesting - lots of VOID, very small USE, all generated from
Data.Aeson.Parser.Internal:array. Using -hd, we have a reasonably equal
split between the various attoparsec combinators, Data.Aeson.Parser.Internal
expressions, etc.

So, am I doing something wrong, or is it simply not feasible to get
constant-space JSON decoding?

Using the 'json' library instead of 'aeson' is no better, since that
wants the input as a String, which consumes even more memory (and dies,
when compiled with -O2, with an out-of-stack error even for a 64MB stack).
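
For completeness: one way to get close to constant space for this
particular shape of input is to bypass the Value tree entirely and fold
over the array with a strict accumulator in a hand-written attoparsec
parser. This is only a sketch, assuming the input really is a flat JSON
array of numbers; the names below are mine, not aeson's:

  {-# LANGUAGE BangPatterns #-}
  import Data.Attoparsec.ByteString.Char8 (Parser, anyChar, char, double, skipSpace)
  import qualified Data.Attoparsec.ByteString.Lazy as AL
  import qualified Data.ByteString.Lazy as L

  -- return only the last element of a non-empty JSON array of numbers,
  -- keeping just one accumulator alive while the input streams past
  lastElem :: Parser Double
  lastElem = do
    skipSpace >> char '[' >> skipSpace
    double >>= go
    where
      go !prev = do
        skipSpace
        c <- anyChar
        case c of
          ',' -> skipSpace >> double >>= go
          ']' -> return prev
          _   -> fail "malformed array"

  main :: IO ()
  main = do
    r <- L.readFile "numbers"
    case AL.parse lastElem r of
      AL.Done _ x   -> print x
      AL.Fail _ _ e -> putStrLn e

Of course, this gives up the generality of aeson's Value completely.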

thanks,
iustin



Re: [Haskell-cafe] Is it possible to have constant-space JSON decoding?

2012-12-04 Thread Iustin Pop
On Tue, Dec 04, 2012 at 12:23:19PM -0200, Felipe Almeida Lessa wrote:
 Aeson doesn't have an incremental parser so it'll be
 difficult/impossible to do what you want.  I guess you want an
 event-based JSON parser, such as yajl [1].  I've never used this
 library, though.

Ah, I see. Thanks, I wasn't aware of that library.

So it seems that using either 'aeson' or 'json', we should be prepared
to pay the full cost of the input message (String/ByteString) plus the
cost of the converted data structures.

thanks!
iustin



Re: [Haskell-cafe] Is it possible to have constant-space JSON decoding?

2012-12-04 Thread Iustin Pop
On Tue, Dec 04, 2012 at 09:47:52AM -0500, Clark Gaebel wrote:
 Aeson is used for the very common usecase of short messages that need to be
 parsed as quickly as possible into a static structure. A lot of things are
 sacrificed to make this work, such as incremental parsing and good error
 messages. It works great for web APIs like twitter's.

I see, good to know.

 I didn't even know people used JSON to store millions of integers. It
 sounds like fun.

Actually, that's not the real use case :), this was just an example to
show memory use with trivial data structures (where strictness/laziness
is easier to reason about).

In my case, I have reasonably-sized messages of complex objects, which
result in the same memory profile: the cost of the input message (as
String/ByteString) plus the cost of the converted objects. What bothers me
is that I don't seem to be able to at least remove the cost of the input
data after parsing, due to the non-strict types I convert to.
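
As a side note, one technique for at least dropping the input after the
conversion is to force the converted value completely before returning
it, e.g. with deepseq. A sketch, assuming the target type has (or is
given) an NFData instance:

  import Control.DeepSeq (NFData, force)
  import Control.Exception (evaluate)
  import Data.Aeson (FromJSON, Result (..), Value, fromJSON)

  -- fully evaluate the decoded value, so that neither the intermediate
  -- Value nor the input ByteString needs to stay reachable afterwards
  strictFromJSON :: (FromJSON a, NFData a) => Value -> IO (Either String a)
  strictFromJSON v = case fromJSON v of
    Error e   -> return (Left e)
    Success x -> fmap Right (evaluate (force x))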

thanks,
iustin



Re: [Haskell-cafe] Is it possible to have constant-space JSON decoding?

2012-12-04 Thread Iustin Pop
On Tue, Dec 04, 2012 at 03:58:24PM +0100, Herbert Valerio Riedel wrote:
 Iustin Pop ius...@google.com writes:
 
 [...]
 
  Let's say we have a simple JSON message: an array of 5 million numbers.
  I would like to parse this in constant space, such that if I only need
  the last element, overall memory usage is low (yes, unrealistic use, but
  please bear with me for a moment).
 
  Using aeson, I thought the following program will be nicely-behaved:
 
 part of the problem is that aeson builds an intermediate JSON parse-tree
 which has quite an overhead for representing a list of numbers on the
 heap as each numbers requires multiple heap objects (see also [1]). This
 is an area where e.g. Python has a significantly smaller footprint
 (mostly due to a more efficient heap representation).

Ah, I see. Thanks for the link; so that's where the 'S' constructor in
the -hd output was coming from.

And indeed, I was surprised as well that Python has a more efficient
representation for this.

 [...]
 
  It seems that the Array constructor holds a vector, and this results in
  too much strictness?
 
 btw, using a list on the other hand would add an overhead of 2 words
 (due to the ':' constructor) for representing each JSON array element in
 the parse-tree, that's probably why aeson uses vectors instead of lists.

Ack.

thanks,
iustin



Re: [Haskell-cafe] curl package broken in Windows

2012-11-13 Thread Iustin Pop
On Mon, Nov 12, 2012 at 08:23:21PM -0500, Michael Orlitzky wrote:
 On 11/12/12 17:43, Iavor Diatchki wrote:
  Hi,
  
  Ok, there were only minor differences between the repo and the version
  on hackage so I imported the changes into the repo, which should now be
  the same as version 1.3.7 on hackage.
   Please feel free to submit merge requests; all the folks I know who
  worked on this originally are busy with other stuff, so we really need
  someone who's using the library to help.
  
 
 I reported this a while ago, and Iustin gave an awesome explanation of
 the problem:
 
 http://haskell.1045720.n5.nabble.com/Network-Curl-cookie-jar-madness-td5716344.html
 
 I've since switched to http-conduit for that project but it would be
 nice to have curl working anyway because it requires less thinking. If
 someone's going to maintain it, then consider this a bug report!

I will file that as a bug report, thanks for the reminder!

iustin



Re: [Haskell-cafe] curl package broken in Windows

2012-11-13 Thread Iustin Pop
On Mon, Nov 12, 2012 at 02:43:32PM -0800, Iavor Diatchki wrote:
 Hi,
 
 Ok, there were only minor differences between the repo and the version on
 hackage so I imported the changes into the repo, which should now be the
 same as version 1.3.7 on hackage.
  Please feel free to submit merge requests; all the folks I know who
 worked on this originally are busy with other stuff, so we really need
 someone who's using the library to help.

Hi,

So I started by filing a couple of bug reports (yay for a bug tracker,
finally). However, it seems that I can't set labels on the issues (bugs
vs. enhancements), probably due to GitHub permissions.

Will try to send merge requests as well, time/skills permitting.

thanks,
iustin



Re: [Haskell-cafe] curl package broken in Windows

2012-11-12 Thread Iustin Pop
On Mon, Nov 12, 2012 at 01:48:23PM -0800, Iavor Diatchki wrote:
 Hi,
 
 the curl binding certainly needs some love---if anyone has the time to fix
 it up and maintain it, help would be most appreciated.  There is a repo for
  it over here: https://github.com/GaloisInc/curl which is the most
  up-to-date version I know of, but since the last commit there seems to be from 4
 years ago, I'm not going to bet that there aren't any additional fixes
 floating around.  (cc-ing Don, who is listed as the maintainer, but I'm not
 sure if he has time to deal with curl right now)

I've tried to contact Don multiple times over the past month with offers
of whatever help I can give, but I heard nothing back.

I didn't know about the github repo (it's not listed on the hackage
page), so thanks a lot for that info, I'll try to send some merge
requests and file bugs (there is at least one critical bug w.r.t. SSL
usage on Linux and another small-impact bug with cookie jar usage).

iustin, who uses curl and _really_ wants to see it improved



Re: [Haskell-cafe] Quickcheck

2012-11-12 Thread Iustin Pop
On Mon, Nov 12, 2012 at 10:14:30PM +0100, Simon Hengel wrote:
 On Mon, Nov 12, 2012 at 07:21:06PM +, gra...@fatlazycat.com wrote:
  Hi, 
  
  Trying to find some good docs on QuickCheck, if anyone has one ?
  
  Been scanning what I can find, but a question.
  
  What would be the best way to generate two different/distinct integers ?
 
 I would use Quickcheck's implication operator here:
 
 quickCheck $ \x y -> x /= (y :: Int) ==> ...

That's good, but it only eliminates test cases after they have been
generated. A slightly better (IMHO) version is to generate correct
values in the first place:

prop_Test :: Property
prop_Test =
  forAll (arbitrary :: Gen Int) $ \x ->
  forAll (arbitrary `suchThat` (/= x)) $ \y ->
  …
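
A complete, self-contained version of both approaches, with a trivial
placeholder property since the real one is elided above, would be:

  import Test.QuickCheck

  -- filter after generation, via the implication operator
  prop_implication :: Int -> Int -> Property
  prop_implication x y = x /= y ==> abs (x - y) >= 1

  -- generate a valid pair directly
  prop_suchThat :: Property
  prop_suchThat =
    forAll (arbitrary :: Gen Int) $ \x ->
    forAll (arbitrary `suchThat` (/= x)) $ \y ->
    abs (x - y) >= 1

  main :: IO ()
  main = mapM_ quickCheck [property prop_implication, prop_suchThat]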

regards,
iustin



Re: [Haskell-cafe] curl package broken in Windows

2012-11-12 Thread Iustin Pop
On Mon, Nov 12, 2012 at 10:57:25PM +0100, Iustin Pop wrote:
 On Mon, Nov 12, 2012 at 01:48:23PM -0800, Iavor Diatchki wrote:
  Hi,
  
  the curl binding certainly needs some love---if anyone has the time to fix
  it up and maintain it, help would be most appreciated.  There is a repo for
   it over here: https://github.com/GaloisInc/curl which is the most
   up-to-date version I know of, but since the last commit there seems to be from 4
  years ago, I'm not going to bet that there aren't any additional fixes
  floating around.  (cc-ing Don, who is listed as the maintainer, but I'm not
  sure if he has time to deal with curl right now)
 
 I've tried to contact Don multiple times over the past month with offers
 of whatever help I can give, but I heard nothing back.
 
 I didn't know about the github repo (it's not listed on the hackage
 page), so thanks a lot for that info, I'll try to send some merge
  requests and file bugs (there is at least one critical bug w.r.t. SSL
  usage on Linux and another small-impact bug with cookie jar usage).

Hmm, checking again, the github repo is at version 1.3.5 (April 2009),
whereas hackage is at version 1.3.7 (uploaded in May 2011).

Still hunting for a correct upstream project page or tracker…

regards,
iustin



Re: [Haskell-cafe] Defining a Strict language pragma

2012-11-05 Thread Iustin Pop
On Mon, Nov 05, 2012 at 02:52:56PM -0800, Johan Tibell wrote:
 Hi all,
 
 I would like to experiment with writing some modules (e.g. low-level
 modules that do a lot of bit twiddling) in a strict subset of Haskell. The
 idea is to remove boilerplate bangs (!) and instead declare the whole
 module strict. I believe this would both make code that needs to be strict
 more declarative (you say what you want once, instead of putting bangs
 everywhere), less noisy, and more predictable (no need to reason about
 laziness in places where you know you don't want it). The idea is to
 introduce a new language pragma
 
 {-# LANGUAGE Strict #-}
 
 that has the above described effect.
 
 The tricky part is to define the semantics of this pragma in terms of
 Haskell, instead of in terms of Core. While we also need the latter, we
 cannot describe the feature to users in terms of Core. The hard part is to
 precisely define the semantics, especially in the presence of separate
 compilation (i.e. we might import lazy functions).
 
 I'd like to get the Haskell communities input on this. Here's a strawman:
 
  * Every function application f _|_ = _|_, if f is defined in this module
 [1]. This also applies to data type constructors (i.e. the code acts as if
 all fields are preceded by a bang).
 
  * lets and where clauses act like (strict) case statements.
 
  * It's still possible to define strict arguments, using ~. In essence
 the Haskell lazy-by-default with opt-out via ! is replaced with
 strict-by-default with opt-out via ~.

Did you mean here it's still possible to define _lazy_ arguments? The
duality of !/~ makes sense, indeed.

I personally have no idea what implications this might have, but I would
be very interested to see how existing code (which doesn't require
laziness) would behave when run under this new pragma.
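
To make the boilerplate concrete, here is a small example of my own (not
from the proposal) of the bangs that a module-wide Strict pragma would
make implicit:

  import Data.List (foldl')

  -- today, strictness has to be requested field by field
  data Acc = Acc !Int !Int   -- i.e. "as if all fields are preceded by a bang"

  step :: Acc -> Int -> Acc
  step (Acc s c) x = Acc (s + x) (c + 1)

  mean :: [Int] -> Double
  mean xs = let Acc s c = foldl' step (Acc 0 0) xs
            in fromIntegral s / fromIntegral c

Under the proposed pragma, the fields of Acc (and the local bindings)
would be strict without any annotations.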

regards,
iustin



Re: [Haskell-cafe] Optimal line length for haskell

2012-10-29 Thread Iustin Pop
On Mon, Oct 29, 2012 at 05:20:20PM +0530, Rustom Mody wrote:
 There was a recent discussion on the python list regarding maximum line
 length.
 It occurred to me that beautiful Haskell programs tend to be plump (i.e. have
 long lines) compared to other languages whose programs are 'skinnier'.
 My thoughts on this are at
 http://blog.languager.org/2012/10/layout-imperative-in-functional.html.
 
 Are there more striking examples than the lexer from the standard prelude?
 [Or any other thoughts/opinions :-) ]

For what it's worth, in our project (Ganeti), which has a mixed
Python/Haskell codebase, we're using the same maximum line length
(80-but-really-79) in both languages, without any (real) issues.

regards,
iustin



Re: [Haskell-cafe] Optimal line length for haskell

2012-10-29 Thread Iustin Pop
On Mon, Oct 29, 2012 at 03:50:57PM +, Niklas Hambüchen wrote:
 I would prefer to completely ignore line lengths when writing Haskell.
 
 In general, giving good names to things in where-clauses automatically
 keeps my code short enough.
 
 My opinion is that different people like different code layouts, and
 when formatting code in certain ways, we will always have to make
 compromises.
 
 I would like if there was a layout normal form for storing Haskell code
 - all code presented to humans should be shown just as that human likes
 it best.
 
 In the future, I would like to work on a personalizable real-time
 formatter that editors can hook into, using haskell-src-exts.

+1 to that; I know that it would indeed increase my productivity…

iustin



Re: [Haskell-cafe] [Security] Put haskell.org on https

2012-10-28 Thread Iustin Pop
On Sun, Oct 28, 2012 at 01:38:46PM +0100, Petr P wrote:
   Erik,
 
 does cabal need to do any authenticated stuff? For downloading
 packages I think HTTP is perfectly fine. So we could have HTTP for
 cabal download only and HTTPS for everything else.

I kindly disagree here. Ensuring that packages are downloaded
safely/correctly, without MITM attacks, is also important. Even if only
as an option.

regards,
iustin



Re: [Haskell-cafe] [Security] Put haskell.org on https

2012-10-28 Thread Iustin Pop
On Sun, Oct 28, 2012 at 03:53:04PM +0100, Petr P wrote:
 2012/10/28 Iustin Pop iu...@k1024.org:
  On Sun, Oct 28, 2012 at 01:38:46PM +0100, Petr P wrote:
  does cabal need to do any authenticated stuff? For downloading
  packages I think HTTP is perfectly fine. So we could have HTTP for
  cabal download only and HTTPS for everything else.
 
  Kindly disagree here. Ensuring that packages are downloaded
  safely/correctly without MITM attacks is also important. Even if as an
  option.
 
 Good point. But if cabal+https is a problem, this could be solved by
 other means too, for example by signing the packages.

Well, I agree, but then the same could be applied to uploads too, like
Debian does - instead of user+password, register a GPG key.

iustin



Re: [Haskell-cafe] [Security] Put haskell.org on https

2012-10-28 Thread Iustin Pop
On Sun, Oct 28, 2012 at 04:26:07PM +0100, Changaco wrote:
 On Sun, 28 Oct 2012 14:45:02 +0100 Iustin Pop wrote:
  Kindly disagree here. Ensuring that packages are downloaded
  safely/correctly without MITM attacks is also important. Even if as an
  option.
 
 HTTPS doesn't fully protect against a MITM since there is no shared
 secret between client and server prior to the connection.
 
 The MITM can use a self-signed certificate, or possibly a certificate
 signed by a compromised CA.

Sure, but I was talking about a proper certificate signed by a
well-known registrar, at which point the https client would default to
verify the signature against the system certificate store.

Yes, I'm fully aware that this is not fully safe, but I hope you agree
that https with a proper certificate is much better than plain http.

regards,
iustin



Re: [Haskell-cafe] [Security] Put haskell.org on https

2012-10-28 Thread Iustin Pop
On Sun, Oct 28, 2012 at 05:10:39PM +0100, Changaco wrote:
 On Sun, 28 Oct 2012 16:39:10 +0100 Iustin Pop wrote:
  Sure, but I was talking about a proper certificate signed by a
  well-known registrar, at which point the https client would default to
  verify the signature against the system certificate store.
 
 It doesn't matter what kind of certificate the server uses since the
 client generally doesn't know about it, especially on first connection.
 Some programs remember the certificate between uses and inform you
 when it changes, but that's not perfect either.

The client doesn't have to know about it, if it can verify a chain of
trust via the system cert store, as I said above.

regards,
iustin



Re: [Haskell-cafe] serialize an unknown type

2012-10-21 Thread Iustin Pop
On Sun, Oct 21, 2012 at 07:20:10PM +0200, Corentin Dupont wrote:
 Hi,
 Sorry if it was not enough explicit.
 I want to write functions like this:
 
 serialize :: (Show a) => Event a -> IO ()
 deserialize :: (Read a) => IO () -> Event a
 
 The functions would write and read the data in a file, storing/retrieving
 also the type a I suppose...

Can't you simply, when defining the type event, add a deriving (Show,
Read)? Then the standard (but slow) read/show serialisation would work
for you.

If you're asking to de-serialise an unknown type (i.e. you don't know
what type it should restore a-priori, but you want to do that based on
the contents of the file), things become a little more complex. Unless
you can further restrict the type 'a', really complex.

Maybe stating your actual problem, rather than the implementation
question, would be better?

regards,
iustin



Re: [Haskell-cafe] serialize an unknown type

2012-10-21 Thread Iustin Pop
On Sun, Oct 21, 2012 at 08:11:50PM +0200, Corentin Dupont wrote:
 Hi Iustin,
 yes I want to deserialize an unknown type based on the content of the file
 (in this example).
 Let's say I can reduce the spectrum of types to: Strings, all the types in
 Enum, Ints. Is it possible?

Maybe :) I don't know how to do all the types in Enum. For Strings,
Ints and Floats, you could have something like: serialise a (kind, value)
pair instead of just the value, and decode:

data Event = EventString String | EventInt Int | EventFloat Float
…
  (kind, val) <- read in_data
  return $ case kind of
    "string" -> EventString val
    "int"    -> EventInt (read val)
    "float"  -> EventFloat (read val)

But this is only for a very few specific types.
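
For the simple deriving (Show, Read) route from my previous mail, a
complete sketch (with a closed Event type; the names are mine) would be:

  data Event = EventString String | EventInt Int | EventFloat Float
    deriving (Show, Read, Eq)

  serialize :: FilePath -> Event -> IO ()
  serialize path = writeFile path . show

  deserialize :: FilePath -> IO Event
  deserialize path = fmap read (readFile path)

  main :: IO ()
  main = do
    serialize "/tmp/event.txt" (EventInt 42)
    ev <- deserialize "/tmp/event.txt"
    print (ev == EventInt 42)   -- True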

 My real problem is that on my web interface I want to use web routes to
 allow a user to pass an Event back to the engine.
 The problem is that it seems that web-routes only accepts Strings (or some
 known types) to be passed on the web route, whereas I need to pass random
 types.

Are you sure you need to pass random types? Can't you define a (large)
set of known types, for example?

regards,
iustin

 On Sun, Oct 21, 2012 at 8:00 PM, Iustin Pop iu...@k1024.org wrote:
 
  On Sun, Oct 21, 2012 at 07:20:10PM +0200, Corentin Dupont wrote:
   Hi,
   Sorry if it was not enough explicit.
   I want to write functions like this:
  
    serialize :: (Show a) => Event a -> IO ()
    deserialize :: (Read a) => IO () -> Event a
  
   The functions would write and read the data in a file, storing/retrieving
   also the type a I suppose...
 
  Can't you simply, when defining the type event, add a deriving (Show,
  Read)? Then the standard (but slow) read/show serialisation would work
  for you.
 
  If you're asking to de-serialise an unknown type (i.e. you don't know
  what type it should restore a-priori, but you want to do that based on
  the contents of the file), things become a little more complex. Unless
  you can further restrict the type 'a', really complex.
 
  Maybe stating your actual problem, rather than the implementation
  question, would be better?
 
  regards,
  iustin
 



Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-19 Thread Iustin Pop
On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote:
 On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
  I'm one bug away from a working program and need some help. I wrote a
  little utility that logs into LWN.net, retrieves an article, and creates
  an epub out of it.
 
 I've created two pages where anyone can test this. The first just takes
 any username and password via post and sets a session variable. The
 second prints "Success." if the session variable is set, and "Failure."
 if it isn't. The bash script,

[…]

 The attached haskell program using Network.Curl, doesn't:
 
   $ runghc haskell-test.hs
   Logged in...
   Failure.
 
 Any help is appreciated =)

So, take this with a grain of salt: I've been bitten by curl (the
haskell bindings, I mean) before, and I don't hold the quality of the
library in great regard.

The libcurl documentation says: "When you set a file name with
CURLOPT_COOKIEJAR, that file name will be created and all received
cookies will be stored in it when curl_easy_cleanup(3) is called (i.e.
at the end of a curl handle session)." But even though the curl bindings
seem to run easy_cleanup on handles (initialize → mkCurl →
mkCurlWithCleanup), they don't do this correctly:

DEBUG: ALLOC: CURL
DEBUG: ALLOC: /tmp/network-curl-test-haskell20417.txt
DEBUG: ALLOC: username=foo&password=bar
DEBUG: ALLOC: http://michael.orlitzky.com/tmp/network-curl-test1.php
DEBUG: ALLOC: WRITER
DEBUG: ALLOC: WRITER

Note there's no DEBUG: FREE: CURL as the code seems to imply there
should be. Hence, the handle is never cleaned up (do the curl bindings
leak handles?), so the cookie file is never written.

Side note: by running the same program multiple times, sometimes you see
DEBUG: FREE: CURL, sometimes no FREE actions. I believe there's
something very wrong in the curl bindings with regard to cleanups.

If I modify curl to export a force cleanup function, I can make the
program work (but not always; my patch is a hack).

Alternatively, as the curl library doesn't need a cookie jar to use
cookies within the same handle, modifying your code to reuse the same
curl handle (returning it from log_in and reusing it in get_page)
gives me a success code. But the cookie file is still not filled, since
the curl handle is never properly terminated.
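
The handle-reuse variant looks roughly like this (a sketch from memory of
the Network.Curl API, untested, with made-up URLs):

  import Network.Curl

  -- log in and fetch with one and the same handle, so that the cookies
  -- libcurl keeps in memory for that handle are reused by the second request
  fetchWithLogin :: URLString -> URLString -> [String] -> IO String
  fetchWithLogin loginUrl pageUrl postFields = do
    h <- initialize
    r1 <- do_curl_ h loginUrl [CurlPostFields postFields] :: IO CurlResponse
    putStrLn ("login status: " ++ show (respStatus r1))
    r2 <- do_curl_ h pageUrl [] :: IO CurlResponse
    return (respBody r2)

  main :: IO ()
  main = withCurlDo $
    fetchWithLogin "http://example.com/login" "http://example.com/page"
                   ["username=foo", "password=bar"] >>= putStrLn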

Since the curl bindings also have problems in multi-threaded programs
when SSL is enabled (they don't actually set up the curl library
correctly with regard to multi-threaded memory allocation), I would
suggest you try the http-conduit library instead, since that's a pure
Haskell library that should work as well, if not better.

Happy to be proved wrong, if I'm just biased against curl :)

regards,
iustin



Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-19 Thread Iustin Pop
On Sun, Aug 19, 2012 at 06:06:53PM +0200, Iustin Pop wrote:
 On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote:
  On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
   I'm one bug away from a working program and need some help. I wrote a
   little utility that logs into LWN.net, retrieves an article, and creates
   an epub out of it.
  
  I've created two pages where anyone can test this. The first just takes
  any username and password via post and sets a session variable. The
  second prints Success. if the session variable is set, and Failure.
  if it isn't. The bash script,
 
 […]
 
  The attached haskell program using Network.Curl, doesn't:
  
$ runghc haskell-test.hs
Logged in...
Failure.
  
  Any help is appreciated =)
 
 So, take this with a grain of salt: I've been bitten by curl (the
 haskell bindings, I mean) before, and I don't hold the quality of the
 library in great regard.
 
 The libcurl documentation says: When you set a file name with
 CURLOPT_COOKIEJAR, that file name will be created and all received
 cookies will be stored in it when curl_easy_cleanup(3) is called (i.e.
 at the end of a curl handle session). But even though the curl bindings
 seem to run easy_cleanup on handles (initialize → mkCurl →
 mkCurlWithCleanup), they don't do this correctly:
 
 DEBUG: ALLOC: CURL
 DEBUG: ALLOC: /tmp/network-curl-test-haskell20417.txt
 DEBUG: ALLOC: username=foopassword=bar
 DEBUG: ALLOC: http://michael.orlitzky.com/tmp/network-curl-test1.php
 DEBUG: ALLOC: WRITER
 DEBUG: ALLOC: WRITER
 
 Note there's no DEBUG: FREE: CURL as the code seems to imply there
 should be. Hence, the handle is never cleaned up (do the curl bindings
 leak handles?), so the cookie file is never written.
 
 Side note: by running the same program multiple times, sometimes you see
 DEBUG: FREE: CURL, sometimes no FREE actions. I believe there's
 something very wrong in the curl bindings with regard to cleanups.

On further investigation, this seems to be due to the somewhat careless use
of Foreign.Concurrent; from the docs:

  “The finalizer will be executed after the last reference to the
  foreign object is dropped. There is no guarantee of promptness, and in
  fact there is no guarantee that the finalizer will eventually run at
  all.”

Also, see http://hackage.haskell.org/trac/ghc/ticket/1364.

So it seems that the intended way of cleaning up curl handles is all
fine and dandy if one doesn't require timely cleanup; in most cases,
this is not needed, but for cookies it is broken.

I don't know what the proper solution is; either way, it seems that
there should be a way to force the cleanup to be run, via
finalizeForeignPtr, or requiring full manual handling of curl handles
(instead of via finalizers).

Gah, native libs++.

regards,
iustin



Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-18 Thread Iustin Pop
On Sat, Aug 18, 2012 at 08:52:00PM -0400, Michael Orlitzky wrote:
 I'm one bug away from a working program and need some help. I wrote a
 little utility that logs into LWN.net, retrieves an article, and creates
 an epub out of it. Full code here:
 
   git clone http://michael.orlitzky.com/git/lwn-epub.git
 
 This is the code that gets the login cookie:
 
   cj <- make_cookie_jar
   li_result <- log_in cj uname pword

   case li_result of
     Left err -> do
       let msg = "Failed to log in. " ++ err
       hPutStrLn stderr msg
     Right response_body -> do
       hPutStrLn stderr response_body

   return $ cfg { C.cookie_jar = Just cj }
 
 Curl is making the request, but if I remove the (hPutStrLn stderr
 response_body), it doesn't work! What's even more insane is, this works:
 
   hPutStrLn stderr response_body
 
 and this doesn't:
 
   hPutStrLn stdout response_body
 
 whaaa? I really don't want to dump the response body to
 stderr, but I can't even begin to imagine what's going on here. Has
 anyone got Network.Curl working with a cookie jar?

Is this perchance due to laziness? And the fact that stderr is not
buffered by default, so all output is forced right then (forcing the
evaluation), whereas stdout is buffered, so the output might only happen
later (or even only after you do an hFlush).

I'd try to make sure that response_body is fully evaluated before
returning from the function.
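
Something along these lines (a small sketch) should force the body
without having to print it anywhere:

  import Control.Exception (evaluate)

  -- force a lazy String completely before handing it to code that may
  -- never look at it (evaluating its length walks the whole string)
  forceString :: String -> IO String
  forceString s = evaluate (length s) >> return s

i.e. replace the hPutStrLn stderr response_body call with something like
_ <- forceString response_body.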

Or I might be totally wrong, in which case sorry :)

regards,
iustin



Re: [Haskell-cafe] Windows: openFile gives permission denied when file in use

2011-12-29 Thread Iustin Pop
On Thu, Dec 29, 2011 at 11:49:11AM +, Andrew Coppin wrote:
 On 29/12/2011 04:29 AM, Antoine Latter wrote:
  On Wed, Dec 28, 2011 at 3:52 PM, Michael Snoyman mich...@snoyman.com wrote:
 Hi all,
 
 I just received a bug report from a client that, when an input file is
 open in FrameMaker, my program gives a permission denied error. This
 bug is reproducible with a simple Haskell program:
 
 
 This bug and its discussion is similar, but not identical:
 http://hackage.haskell.org/trac/ghc/ticket/4363
 
 This one has been rumbling on for ages. As others have said, the
 Report demands that locking occur, which is probably a mistake. The
 daft thing is, apparently Linux doesn't really support locking, so
 on that platform these types of thing all work fine, and only on
  Windows, which does and always has supported proper locking, people
 get these errors. And yet, many people seem surprised to hear that
 Windows can actually turn off the locking; they seem completely
 unaware of the extensive and highly flexible locking facilities that
  Windows provides. Every time I hear "oh, I don't think Windows can
  handle that", I sigh with resignation.

Sorry to say, but it seems you yourself are unaware of the extensive
and highly flexible locking facilities on Linux :) The defaults on
Linux are advisory locking, not mandatory, but claiming Linux doesn't
support locking is plain wrong.

regards,
iustin



Re: [Haskell-cafe] Windows: openFile gives permission denied when file in use

2011-12-29 Thread Iustin Pop
On Thu, Dec 29, 2011 at 12:20:18PM +, Andrew Coppin wrote:
 Every time I hear oh, I don't think Windows can
 handle that, I sigh with resignation.
 
 Sorry to say, but it seems you yourself are unaware of the extensive
 and highly flexible locking facilities on Linux :) The defaults on
 Linux are advisory locking, not mandatory, but claiming Linux doesn't
 support locking is plain wrong.
 
 I would suggest that advisory locking isn't particularly useful.

In my experience (as an application writer) it is very useful; it's just
a different paradigm, not a weaker one. Off-hand I don't remember a case
where having mandatory locking would have improved things.

 I
 gather that Linux does now support real locking though. (And file
 update notifications, and ACLs, and lots of other things that
 Windows has had for far longer.)

Hrmm: "Mandatory File Locking For The Linux Operating System", 15 April
1996 :)

 Either way, I have no interest in starting a Windows vs Linux
 flamewar. 

Me neither - I just wanted to point out that your email sounded _too_
eager to blame the current state of affairs on Linux not supporting
proper locking. Whereas the problem is really just one of taking
platform differences into account better.

 I'm just saying it would be nice if Haskell could support
 more of what these two OSes have to offer.

Totally agreed!

all the best,
iustin



Re: [Haskell-cafe] On the purity of Haskell

2011-12-29 Thread Iustin Pop
On Thu, Dec 29, 2011 at 05:51:57PM +0100, Jerzy Karczmarczuk wrote:
 Iustin Pop::
 In practice too:
 
  bar _ = do
      s <- readFile "/tmp/x.txt"
      return (read s)
 
 Once you're in a monad that has 'state', the return value doesn't
 strictly depend anymore on the function arguments.
 Nice example. PLEASE, show us the trace of its execution. Then, the
 discussion might be more fruitful

Sorry?

I made the same mistake of misreading the grand-parent's IO Int vs.
Int, if that's what you're referring to.

Otherwise, I'm confused as what you mean.

iustin



Re: [Haskell-cafe] On the purity of Haskell

2011-12-29 Thread Iustin Pop
On Thu, Dec 29, 2011 at 05:55:24PM +0100, Iustin Pop wrote:
 On Thu, Dec 29, 2011 at 05:51:57PM +0100, Jerzy Karczmarczuk wrote:
  Iustin Pop::
  In practice too:
  
   bar _ = do
       s <- readFile "/tmp/x.txt"
       return (read s)
  
  Once you're in a monad that has 'state', the return value doesn't
  strictly depend anymore on the function arguments.
  Nice example. PLEASE, show us the trace of its execution. Then, the
  discussion might be more fruitful
 
 Sorry?
 
  I made the same mistake of misreading the grand-parent's IO Int vs.
  Int, if that's what you're referring to.
 
 Otherwise, I'm confused as what you mean.

And to clarify my original email better: yes, (bar x) always gives you
back the same IO action; but the results of said IO action are/can be
different when it is executed.
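
A tiny, self-contained illustration of that distinction (assuming
/tmp/x.txt is writable):

  bar :: Int -> IO Int
  bar _ = do
    s <- readFile "/tmp/x.txt"
    return (read s)

  main :: IO ()
  main = do
    writeFile "/tmp/x.txt" "1"
    let action = bar 2      -- 'bar 2' denotes the same IO action every time
    r1 <- action
    print r1                -- prints 1
    writeFile "/tmp/x.txt" "2"
    r2 <- action            -- running the very same action again...
    print r2                -- ...prints 2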

iustin



Re: [Haskell-cafe] On the purity of Haskell

2011-12-29 Thread Iustin Pop
On Thu, Dec 29, 2011 at 07:19:17AM -0600, Gregg Reynolds wrote:
 On Wed, Dec 28, 2011 at 2:44 PM, Heinrich Apfelmus
 apfel...@quantentunnel.de wrote:
 
  The beauty of the IO monad is that it doesn't change anything about purity.
  Applying the function
 
    bar :: Int -> IO Int
 
  to the value 2 will always give the same result:
 
    bar 2 = bar (1+1) = bar (5-3)
 
 Strictly speaking, that doesn't sound right.  The result of an IO
 operation is outside of the control (and semantics) of the Haskell
 program, so Haskell has no idea what it will be.  Within the program,
 there is no result.  So Int -> IO Int is not really a function - it
 does not map a determinate input to a determinate output.  The IO
 monad just makes it look and act like a function, sort of, but what it
 really does is provide reliable ordering of non-functional operations
 - invariant order, not invariant results.

Not only strictly speaking. In practice too:

bar _ = do
   s <- readFile "/tmp/x.txt"
   return (read s)

Once you're in a monad that has 'state', the return value doesn't
strictly depend anymore on the function arguments.

At least that's my understanding.

regards,
iustin



Re: [Haskell-cafe] Generating Code

2011-12-10 Thread Iustin Pop
On Fri, Dec 09, 2011 at 10:30:18PM +0100, L Corbijn wrote:
 The major set of problems for using template haskell is that it
 doesn't have the correct features, or better said it tries to solve
 another problem. Template haskell generates code into an existing
 module, while for this problem there is no module yet to generate it
 into. Of course I could generate those modules and let template
 haskell make the FFI imports, but then the problem remains how to
 generate those modules. So template haskell seems (as I see it) to
 solve the problem of writing almost the same code twice by generating
 it from some parameters coded in some source file. Another problem is
 that the export and import lists of the modules need to be generated
 too and this seems not an option for TH.

On Fri, Dec 09, 2011 at 04:27:31PM -0600, Antoine Latter wrote:
 For my case, template haskell can't create modules, and template
 haskell solves a different problem - I've not interested in creating
 Haskell declarations from Haskell declarations - I'm interested in
 creating Haskell modules from an external, formal,  specification. In
 a sense I'm compiling to Haskell.

This answer is for both the above quotes. While TH is not perfect (and
sometimes tedious/difficult to write in), it's not restricted to simply
generating code based on some parameters in an existing Haskell file.

It cannot generate modules, true, but other than that you could have a
module simply like this:

  module Foo where

  import …

  $(myBuilder)

Where myBuilder doesn't take any parameters, but just reads some external
source (XML, a text file, whatever) and builds the code from scratch.

I might misunderstand the problem, but I think that you _could_ use TH
for compiling to Haskell, as long as you have a Haskell parser for the
external/formal spec.
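
As a small, hypothetical sketch of that approach (the spec format and the
names are made up), a builder that reads a text file at compile time and
emits one top-level constant per line could look like:

  {-# LANGUAGE TemplateHaskell #-}
  module Builder (myBuilder) where

  import Language.Haskell.TH

  -- each line of "spec.txt" is expected to be "<name> <integer>"; e.g.
  -- the line "answer 42" generates:  answer :: Integer ; answer = 42
  myBuilder :: Q [Dec]
  myBuilder = do
      spec <- runIO (readFile "spec.txt")
      fmap concat (mapM build (map words (lines spec)))
    where
      build [name, value] = do
        let n = mkName name
        sig <- sigD n (conT ''Integer)
        def <- valD (varP n) (normalB (litE (integerL (read value)))) []
        return [sig, def]
      build _ = return []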

regards,
iustin



Re: [Haskell-cafe] ANN: Monad.Reader Issue 19

2011-10-30 Thread Iustin Pop
On Wed, Oct 26, 2011 at 03:17:47PM -0400, Brent Yorgey wrote:
 I am pleased to announce that Issue 19 of The Monad.Reader, a special
 issue on parallelism and concurrency, is now available:
 
   http://themonadreader.files.wordpress.com/2011/10/issue19.pdf

Thanks a lot for the TMR, it's a pleasure to read, as always.

If, like me, there are people who (would like to) read this on an ebook
reader, these are the changes that finally gave me good results on a
Sony Reader:

\usepackage[papersize={90mm,120mm},margin=2mm]{geometry}
\usepackage[kerning=true]{microtype}
\usepackage[T1]{fontenc}
\usepackage[charter]{mathdesign}
\usepackage{hyperref}
\hypersetup{pdftitle={The Monad Reader 19}}
\sloppy

Additionally, all the \includegraphics commands need changing from
absolute measurements (e.g. width=6cm) to something relative; in my
case, I just used \textwidth. And finally, the verbatim sections need a
slightly smaller font, as they can't be reflowed nicely; alternatively,
one can insert manual line wraps (keeping the code consistent).

regards,
iustin



Re: [Haskell-cafe] Function composition in run-time?

2011-08-24 Thread Iustin Pop
On Wed, Aug 24, 2011 at 04:35:42PM +0400, dokondr wrote:
 Hi,
 What is the Haskell way to compose functions in run-time?
 Depending on configuration parameters I need to be able to compose function
 in several ways without recompilation.
 When program starts it reads configuration parameters from a text file. For
 example, I have three functions, f1, f2, f3,  each doing some string
 processing. I need to support two configurations of string processors :
 
 if param1
then sp = f1 . f2 . f3
else sp = f1 . f3
 
 I'd like to avoid 'if' somehow and instead use some declarative way to
 specify code to run in external configuration file. In other words I need
 some easy tools to create mini DSLs without all the efforts usually involved
 with implementing full-blown DSL.

A simple alternative to if would be:

  options = [ ("foo", f1 . f2 . f3)
            , ("bar", f1 . f3) ]

and then lookup param options. I don't know if this is what you're
looking for, though.
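
A self-contained sketch, with made-up f1/f2/f3 and a hypothetical config
file whose first line is either foo or bar:

  import Data.Char (toUpper)
  import Data.Maybe (fromMaybe)

  f1, f2, f3 :: String -> String
  f1 = map toUpper
  f2 = reverse
  f3 = filter (/= ' ')

  options :: [(String, String -> String)]
  options = [ ("foo", f1 . f2 . f3)
            , ("bar", f1 . f3) ]

  main :: IO ()
  main = do
    param <- fmap (head . lines) (readFile "config.txt")
    -- all entries share the type String -> String, so the lookup result
    -- can be applied directly; fall back to 'id' for unknown parameters
    let sp = fromMaybe id (lookup param options)
    putStrLn (sp "some input text")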

regards,
iustin



Re: [Haskell-cafe] Function composition in run-time?

2011-08-24 Thread Iustin Pop
On Wed, Aug 24, 2011 at 04:57:19PM +0400, dokondr wrote:
 On Wed, Aug 24, 2011 at 4:44 PM, Iustin Pop ius...@google.com wrote:
 
  On Wed, Aug 24, 2011 at 04:35:42PM +0400, dokondr wrote:
   Hi,
   What is the Haskell way to compose functions in run-time?
   Depending on configuration parameters I need to be able to compose
  function
   in several ways without recompilation.
 
 
 A simple alternative to if would be:
 
    options = [ ("foo", f1 . f2 . f3)
              , ("bar", f1 . f3) ]
 
  and then lookup param options. I don't know if this is what you're
  looking for, though.
 
 
 Thanks!
 Yes, this is what I need - simple and easy. Yet, how function application
 will work in this case ?
 I mean after lookup returns me a composition ... need to check what type
 will it be.

Well, as with your 'if', all compositions must have the same type, since
lists are homogeneous.

regards,
iustin



Re: [Haskell-cafe] how to read CPU time vs wall time report from GHC?

2011-08-14 Thread Iustin Pop
On Sun, Aug 14, 2011 at 08:11:36PM +0200, Wishnu Prasetya wrote:
 Hi guys,
 
 I'm new in parallel programming with Haskell. I made a simple test
 program using that par combinator etc, and was a bit unhappy that it
 turns out to be  slower than its sequential version. But firstly, I
 dont fully understand how to read the runtime report produced by GHC
 with -s option:
 
   SPARKS: 5 (5 converted, 0 pruned)
 
   INIT  time    0.02s  (  0.01s elapsed)
   MUT   time    3.46s  (  0.89s elapsed)
   GC    time    5.49s  (  1.46s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time    8.97s  (  2.36s elapsed)
 
 As I understand it from the documentation, the left time-column is
 the CPU time, whereas the right one is elapsed wall time. But how
 come that the wall time is less than the CPU time? Isn't wall time =
 user's perspective of time; so that is CPU time + IO + etc?

Yes, but if you have multiple CPUs, then CPU time accumulates faster
than wall-clock time.

Based on the above example, I guess you have (or you ran the program on)
4 cores (2.36 * 4 = 9.44, which means you got a very nice ~95%
efficiency).

regards,
iustin



Re: [Haskell-cafe] how to read CPU time vs wall time report from GHC?

2011-08-14 Thread Iustin Pop
On Sun, Aug 14, 2011 at 08:32:36PM +0200, Wishnu Prasetya wrote:
 On 14-8-2011 20:25, Iustin Pop wrote:
 On Sun, Aug 14, 2011 at 08:11:36PM +0200, Wishnu Prasetya wrote:
 Hi guys,
 
 I'm new in parallel programming with Haskell. I made a simple test
 program using that par combinator etc, and was a bit unhappy that it
 turns out to be  slower than its sequential version. But firstly, I
 dont fully understand how to read the runtime report produced by GHC
 with -s option:
 
SPARKS: 5 (5 converted, 0 pruned)
 
    INIT  time    0.02s  (  0.01s elapsed)
    MUT   time    3.46s  (  0.89s elapsed)
    GC    time    5.49s  (  1.46s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    8.97s  (  2.36s elapsed)
 
 As I understand it from the documentation, the left time-column is
 the CPU time, whereas the right one is elapses wall time. But how
 come that the wall time is less than the CPU time? Isn't wall time =
 user's perspective of time; so that is CPU time + IO + etc?
 Yes, but if you have multiple CPUs, then CPU time accumulates faster
 than wall-clock time.
 
 Based on the above example, I guess you have or you run the program on 4
 cores (2.36 * 4 = 9.44, which means you got a very nice ~95%
 efficiency).
 
 regards,
 iustin

 That makes sense... But are you sure that's how I should read this?

As far as I know, this is correct.

 I don't want to jump happy too early.

Well, your algorithm does work in parallel, but if you look at the GC/MUT
time, ~60% of the total runtime is spent in GC, so you have a space leak
or an otherwise inefficient algorithm. The final speedup is just
3.46s/2.36s, i.e. 1.46x instead of ~4x, so you still have some work to
do to make this better.

At least, this is how I read those numbers.

regards,
iustin



Re: [Haskell-cafe] Haskell Weekly News: Issue 187

2011-06-23 Thread Iustin Pop
On Thu, Jun 23, 2011 at 01:26:58PM -0400, Daniel Santa Cruz wrote:
 Lyndon,
 
 The links are minimized in hopes of making the plain text version
 somewhat readable. It is purely for aesthetical reasons. If you view
 the web version
 http://contemplatecode.blogspot.com/2011/06/haskell-weekly-news-issue-187.html
 you'll see that they are not minimized there.

FYI, a regular link (though longer) seems more appropriate to me.
Don't know if other people feel the same though.

But anyway, thanks for the HWN (in either short or the long form)!

regards,
iustin



Re: [Haskell-cafe] Why is there no splitSeperator function inData.List

2011-02-14 Thread Iustin Pop
On Sun, Feb 13, 2011 at 11:33:16PM -0800, Donn Cave wrote:
  It is curious though that the Python community managed to agree on a
  single implementation and include that in the standard library
 
 To me, it's more like 2 implementations, overloaded on the same
 function name.
 
 Python 2.6.2 (r262:71600, Aug 30 2009, 15:41:32) 
 [GCC 2.95.3-haiku-090629] on haiku1
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import string
 >>> string.split(' ho ho ')
 ['ho', 'ho']
 >>> string.split(' ho ho ', ' ')
 ['', 'ho', 'ho', '']
 >>>
 
 I.e., let the separator parameter default (to whitespace), and you
 get what we have with Prelude.words, but specify a split character
 and you get a reversible split.  It wasn't a new idea, the Bourne
 shell for example has a similar dual semantics depending on whether
 the separator is white space or not.  Somehow doesn't seem right
 for Haskell, though.

Agreed, but I don't think that lacking a generic split function (as in a
reversible split) is good either.

iustin



Re: [Haskell-cafe] Why is there no splitSeperator function in Data.List

2011-02-13 Thread Iustin Pop
On Sat, Feb 12, 2011 at 11:21:37AM -0500, Gwern Branwen wrote:
 On Sat, Feb 12, 2011 at 11:00 AM, Robert Clausecker fuz...@gmail.com wrote:
  Is there any reason, that one can't find a function that splits a list
  at a seperator in the standard library? I imagined something like this:
 
 
     splitSeperator :: Eq a => a -> [a] -> [[a]]

     splitSeperator ',' "foo,bar,baz"
       -- ["foo","bar","baz"]
 
  Or something similar? This is needed so often, even if I can implement
  it in one line, is there any reason why it's not in the libs?
 
 See http://hackage.haskell.org/package/split
 
 The reason it's not in Data.List is because there are a bazillion
 different splits one might want (when I was pondering the issue before
 Brent released it, I had collected something like 8 different proposed
 splits), so no agreement could ever be reached.

It is curious though that the Python community managed to agree on a
single implementation and include that in the standard library… So it is
possible :)

I also needed a split function and ended up coding one that behaves
like the Python one for my project.
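
For reference, a version along these lines (my own reconstruction,
splitting on a single element, with the same edge cases as Python's
str.split with an explicit separator):

  splitOn :: Eq a => a -> [a] -> [[a]]
  splitOn sep xs = case break (== sep) xs of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOn sep rest

  -- splitOn ',' "foo,bar,baz" == ["foo","bar","baz"]
  -- splitOn ' ' " ho ho "     == ["","ho","ho",""]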

regards,
iustin



Re: [Haskell-cafe] Why is there no splitSeperator function in Data.List

2011-02-13 Thread Iustin Pop
On Sun, Feb 13, 2011 at 06:01:01PM +0800, Lyndon Maydwell wrote:
 Does the Python implementation operate on Strings, or all lists?

Of course, just on strings.

 I think this could be quite important as many split implementations
 take regular expressions as arguments. This could be quite challenging
 for general lists.

Agreed. But (in Python at least), split via re and split via (static)
element are two separate functions, and split via element can be nicely
replicated in Haskell.

regards,
iustin

 That said, I would like to see some of these features in the split package.
 
 On Sun, Feb 13, 2011 at 5:50 PM, Iustin Pop iu...@k1024.org wrote:
  On Sat, Feb 12, 2011 at 11:21:37AM -0500, Gwern Branwen wrote:
  On Sat, Feb 12, 2011 at 11:00 AM, Robert Clausecker fuz...@gmail.com 
  wrote:
   Is there any reason, that one can't find a function that splits a list
   at a seperator in the standard library? I imagined something like this:
  
  
      splitSeperator :: Eq a => a -> [a] -> [[a]]

      splitSeperator ',' "foo,bar,baz"
        -- ["foo","bar","baz"]
  
   Or something similar? This is needed so often, even if I can implement
   it in one line, is there any reason why it's not in the libs?
 
  See http://hackage.haskell.org/package/split
 
  The reason it's not in Data.List is because there are a bazillion
  different splits one might want (when I was pondering the issue before
  Brent released it, I had collected something like 8 different proposed
  splits), so no agreement could ever be reached.
 
  It is curious though that the Python community managed to agree on a
  single implementation and include that in the standard library… So it is
  possible :)
 
  I also needed a split function and ended up with coding one that behaves
  like the Python one for my project.
 
  regards,
  iustin
 



Re: [Haskell-cafe] Why is there no splitSeperator function in Data.List

2011-02-13 Thread Iustin Pop
On Sun, Feb 13, 2011 at 11:21:42AM +0100, Henning Thielemann wrote:
 
 On Sun, 13 Feb 2011, Iustin Pop wrote:
 
 On Sat, Feb 12, 2011 at 11:21:37AM -0500, Gwern Branwen wrote:
 
 See http://hackage.haskell.org/package/split
 
 The reason it's not in Data.List is because there are a bazillion
 different splits one might want (when I was pondering the issue before
 Brent released it, I had collected something like 8 different proposed
 splits), so no agreement could ever be reached.
 
 It is curious though that the Python community managed to agree on a
 single implementation and include that in the standard library… So it is
 possible :)
 
 It was not the implementation, that was discussed in length, but it
 was the question, what 'split' shall actually do.

Doh, of course I meant they managed to agree on a single definition of
what split means. Sorry for bad wording.

 If you are satisfied with a simple Haskell 98 implementation of a
 'split' operation you might like 'chop' in
   
 http://hackage.haskell.org/packages/archive/utility-ht/0.0.5.1/doc/html/Data-List-HT.html

Probably, but when I can have my own version in ~4 lines of Haskell, I'd
rather not have another dependency (that might or might not be packaged
in my distro).

regards,
iustin



[Haskell-cafe] HDBC, postgresql, bytestrings and embedded NULLs

2011-01-07 Thread Iustin Pop
Hi all,

It seems that (at least) the postgresql bindings do not allow pure
binary data.

I have a simple table:

  debug=# create table test (name bytea);

bytea seems to be the backing type on the DB side for bytestrings.

and then I run this:

  import Database.HDBC.PostgreSQL
  import Database.HDBC
  import Data.ByteString

  main = do
    db <- connectPostgreSQL "dbname=debug"
    stmt <- prepare db "INSERT INTO test (name) VALUES($1)"
    execute stmt [toSql $ pack [0]]
    execute stmt [toSql $ pack [65, 0, 66]]
    commit db


What happens is that the inserted string is cut off at the first NULL
value: the first row is empty, and the second row contains just "A".

http://www.postgresql.org/docs/8.4/static/datatype-binary.html says:

“When entering bytea values, octets of certain values must be escaped
(but all octet values can be escaped) when used as part of a string
literal in an SQL statement. In general, to escape an octet, convert it
into its three-digit octal value and precede it by two backslashes”, and
continues to list that NULL should be quoted as E'\\000'. However, I
find no such quoting in the HDBC.PostgreSQL sources.
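
For illustration, the kind of escaping I would have expected somewhere in
the binding — just a sketch of the rule quoted above, not actual HDBC
code:

  import Data.Char (ord)
  import Text.Printf (printf)

  -- Sketch only: escape one octet for a bytea literal following the rule
  -- quoted above (two backslashes plus the three-digit octal value).
  escapeByteaChar :: Char -> String
  escapeByteaChar c
    | n < 32 || n >= 127 || c == '\\' || c == '\'' = printf "\\\\%03o" n
    | otherwise                                    = [c]
    where n = ord c

  escapeBytea :: String -> String
  escapeBytea = concatMap escapeByteaChar
  -- escapeBytea "A\NULB" gives A, two backslashes, 000, B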

Anyone else stumbled on this?

thanks,
iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] HDBC, postgresql, bytestrings and embedded NULLs

2011-01-07 Thread Iustin Pop
On Fri, Jan 07, 2011 at 09:49:35AM -0600, John Goerzen wrote:
 On 01/07/2011 05:24 AM, Michael Snoyman wrote:
 On Fri, Jan 7, 2011 at 11:44 AM, Iustin Pop ius...@google.com wrote:
 Yes, I had a bug reported in persistent-postgresql that I traced back
 to this bug. I reported the bug, but never heard a response. Frankly,
 if I had time, I would write a low-level PostgreSQL binding so I could
 skip HDBC entirely.
 
 I'm not seeing an open issue at
 https://github.com/jgoerzen/hdbc-postgresql/issues -- did you report
 it somewhere else?

Ah, I didn't know it's hosted there. I'm going to file my report,
thanks!

iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Network.Curl and thread safety

2011-01-05 Thread Iustin Pop
Hi all,

I'm not able to find out how one can use Network.Curl with the
threaded runtime safely.

I have this simple example:

import Network.Curl
import Control.Concurrent
import Control.Concurrent.MVar

getUrl :: (Monad m) => String -> IO (m String)
getUrl url = do
  (code, body) <- curlGetString url [CurlSSLVerifyPeer False,
                                     CurlSSLVerifyHost 0,
                                     CurlTimeout 15,
                                     CurlConnectTimeout 5]
  return $ case code of
             CurlOK -> return body
             _ -> fail $ "Curl error for " ++ url ++ " error " ++ show code

runGet :: String -> MVar (Maybe String) -> IO ()
runGet url mv = do
  body <- getUrl url
  putMVar mv body

main = withCurlDo $ do
  let urls = replicate 10 "https://google.com/"
  mvs <- mapM (\_ -> newEmptyMVar) urls
  mapM_ (\(mv, url) -> forkIO (runGet url mv)) $ zip mvs urls
  mapM (\mv -> takeMVar mv >>= print) mvs
  threadDelay 1000


When using curl linked with GnuTLS and running it with the
multi-threaded runtime, it fails immediately with:
  $ ./main 
  main: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock ==
  ((ath_mutex_t) 0)' failed.
  Aborted (core dumped)

Reading the Network.Curl docs, it seems that using withCurlDo should be
enough to make this work (although I'm not sure about the description of
the function and "no forking or lazy returns").

Using Network.Curl built against the OpenSSL headers does successfully
retrieve all URLs, but fails a bit later (during the threadDelay,
probably in GC) again with segmentation fault in libcurl.4.so.

I've tried changing forkIO to forkOS, but still no luck, with either SSL
library.
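
The only other thing I can think of trying (purely as a sanity check,
since it defeats the point of concurrency) is to serialise every libcurl
call through one global lock, roughly:

  import System.IO.Unsafe (unsafePerformIO)

  -- Hypothetical helper, just for testing whether concurrency is the issue.
  curlLock :: MVar ()
  curlLock = unsafePerformIO (newMVar ())
  {-# NOINLINE curlLock #-}

  getUrlLocked :: (Monad m) => String -> IO (m String)
  getUrlLocked url = withMVar curlLock (\_ -> getUrl url)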

Any ideas what I'm doing wrong?

regards,
iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Type System vs Test Driven Development

2011-01-05 Thread Iustin Pop
On Wed, Jan 05, 2011 at 10:27:29PM +0100, Gregory Collins wrote:
 Once I had written the test harness, I spent literally less than a
 half-hour setting this up. Highly recommended, even if it is a (blech)
 Java program. Testing is one of the few areas where I think our
 software engineering tooling is on par with or exceeds that which is
 available in other languages.

Indeed, I have found this to be true as well, and have been trying to
explain it to non-Haskellers. Though I would also rank the memory/space
profiler very high compared to what is available for some other
languages.

And note, it's also easy to integrate with the Python-based buildbot, if
one doesn't want to run Java :)

regards,
iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: HaLVM 1.0: the Haskell Lightweight Virtual Machine

2010-11-30 Thread Iustin Pop
On Tue, Nov 30, 2010 at 02:16:07PM -0800, Adam Wick wrote:
 Galois, Inc. is pleased to announce the immediate release of the
 Haskell Lightweight Virtual Machine (or HaLVM), version 1.0. The
 HaLVM is a port of the GHC runtime system to the Xen hypervisor,
 allowing programmers to create Haskell programs that run directly on
 Xen's bare metal. Internally, Galois has used this system in
 several projects with much success, and we hope y'all will have an
 equally great time with it.
 
 What might you do with a HaLVM? Pretty much anything you want. :)
 Explore designs for operating system decomposition, examine new
 notions of mobile computation with the HaLVM and Xen migration, or
 find interesting network services and lock them inside small, cheap,
 single-purpose VMs.

As someone who deals daily with Xen and often with Haskell too,
this is very interesting. Thanks for releasing and looking forward to
playing with it! (And hopefully more than just playing :)

thanks,
iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] An interesting paper from Google

2010-10-15 Thread Iustin Pop
On Fri, Oct 15, 2010 at 09:28:09PM +0100, Andrew Coppin wrote:
  http://k1024.org/~iusty/papers/icfp10-haskell-reagent.pdf
 
 I'm sure some of you have seen this already. For those who lack the
 time or inclination to read through the (six) pages of this report,
 here's the summary...

Nice summary, I hope you found the paper interesting!

 We [i.e., the report authors] took a production Python system and
 rewrote bits of it in Haskell, some of which is now in production
 use. We conclude the following:
 
 - Python lets you just do whatever the hell you want, but Haskell
 demands that you actually have a *plan* before you start churning
 out code and running it. The result is generally cleaner and more
 consistent when you get there.
 
 - Haskell's much-criticised immutable data is actually an
 *advantage* for backtracking search problems.
 
 - Haskell wins for thread-safety.
 
 - ADTs are nicer than exceptions.
 
 - The parts of Haskell stolen by Python aren't as nice in Python as
 they are in Haskell. [Well, duh.]

I'd say unfortunately, not just duh…

 - We like what GHC provides for profiling.
 
 - We are dissappointed by what GHC provides for debugging.
 
 - String is too slow. None of the alternatives seem to be widely
 supported. If your library consumes Strings and returns Strings, the
 fact that ByteString exists doesn't help you.
 
 - Recent changes to GHC broke our code. (Specifically, extensible
 exceptions.) We were quite surprised that such a stable and
 mature system as GHC would do this to us.
 
 - Haskell has quite a high barrier to entry. [Again, duh.]
 
 The paper also contains an interesting section that basically says "we
 tried porting the Python implementation of XYZ into Haskell, but there
 wasn't really any advantage because it's all I/O". In my humble
 opinion, "it's all I/O" is a common beginner's mistake.
 Reading between the lines, it sounds like they wrote the whole thing
 in the IO monad, and then decided it looked just like the existing
 Python code so there wasn't much point in continuing.

Not quite (not all of it was in the I/O monad). It doesn't make sense to
rewrite 40K lines from language A into language B just for fun. But the
advantages were not as strong as for the balancing algorithms, so they
did not justify any potential conversion. They were strong, just not
strong enough.

 I guess it's
 one of the bad habits that imperative programmers get into. With a
 little more experience, you eventually figure out that you can limit
 the stuff actually in the IO monad to a surprisingly small section,
 and do almost everything else in pure code, no matter how much the
 problem looks like "it's all I/O". But anyway, I'm only guessing
 from what I can actually see with my own eyes in the report itself.

That's not how I would describe it (if I had to write it in a single
paragraph).

Basically, if you take a random, numerical/algorithmic problem, and you
write it in FP/Haskell, it's easy to show to most non-FP programmers why
Haskell wins on many accounts. But if you take a heavy I/O problem
(networking code, etc.), while Haskell is as good as Python, it is less
easy to show the strengths of the language. Yes, all the nice bits are
still there, but when you marshal data between the network and your
internal structures, the type system is less useful than when you just
write algorithms that process the internal data. The same goes for the
other nice parts.

Now, if I were to start from scratch… :)

 I'm surprised about the profiler. They seem really, really impressed
 with it. Which is interesting to me, since I can never seen to get
 anything sensible out of it. It always seems to claim that my
 program is spending 80% of its runtime executing zipWith or
 something equally absurd.

I'm surprised that you're surprised :) The profiler is indeed awesome,
and in general I can manage to get an order of magnitude speedup on my
initial algorithms, if not more.

Even if it just tells me that zipWith is the slow part, that's enough.
I'd even say it's a very good hint where to start.

 I'm unsurprised which their
 dissappointment with debugging. I'm still quite surprised that
 there's no tool anywhere which will trivially print out the
 reduction sequence for executing an expression. You'd think this
 would be laughably easy, and yet nobody has done it yet.

Indeed.

Thanks again for the summary :)

regards,
iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] An interesting paper from Google

2010-10-15 Thread Iustin Pop
On Fri, Oct 15, 2010 at 11:08:14PM +0100, Andrew Coppin wrote:
  On 15/10/2010 10:43 PM, Iustin Pop wrote:
 On Fri, Oct 15, 2010 at 09:28:09PM +0100, Andrew Coppin wrote:
 I'm surprised about the profiler. They seem really, really impressed
 with it. Which is interesting to me, since I can never seen to get
 anything sensible out of it. It always seems to claim that my
 program is spending 80% of its runtime executing zipWith or
 something equally absurd.
 I'm surprised that you're surprised :) The profiler is indeed awesome,
 and in general I can manage to get one factor of magnitude speedup on my
 initial algorithms, if not more.
 
 Even if it just tells me that zipWith is the slow part, that's enough.
 I'd even say it's a very good hint where to start.
 
 zipWith is a generic library function which always takes exactly the
 same amount of time. Unless you're using it so extensively that it's
 allocating huge amounts of memory or something, it would seem
 infinitely more likely that whatever function zipWith is *applying*
 should be the actual culprit, not zipWith itself.

I know about zipWith. And if the profile tells me I spend too much time
in zipWith, it means a few things:

- zipWith might have to force evaluation of the results, hence the
  incorrect attribution of costs
- if even after that zipWith is the culprit, it might be the way the
  lists are consumed (are they lazily built?), and that might mean you
  just have to work around that via a different algorithm

Again, the fact that it tells you time is being spent in a library
function is not bad, not at all.
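
And when the attribution looks really off, one can always add explicit
cost centres around the applied function, so the report separates the
zipWith plumbing from the actual work — a small sketch (compiled with
-prof, plus -auto-all or manual SCCs like these):

  -- Sketch: manual cost centres separating zipWith from the work it applies.
  expensive :: Double -> Double -> Double
  expensive x y = {-# SCC "expensive" #-} sqrt (x * x + y * y)

  combine :: [Double] -> [Double] -> [Double]
  combine = {-# SCC "combine" #-} zipWith expensive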

 Of course, I'm talking about profiling in time. GHC also enables you
 to profile in space as well. I'm not actually sure to which one
 you're referring.

In general, time profiling. Although the space profiling is useful too,
it gives you hints on what the (lazy) program does, as opposed to what
you think it does. The retainer graphs are cool, e.g. you might see that
some code hangs on to data longer than you thought, and you can save some
heap and GC time due to that.

 I haven't had much success with either. It's just
 too hard to figure out what the sea of numbers actually represent.
 (Since it's quite new, I'm assuming it's not the new ThreadScope
 functionallity - which I haven't tried yet, but looks extremely
 cool...)

I haven't used ThreadScope yet, but it's on my todo list.

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Does anyone know the reason for differeng measurements in profiling under -threaded?

2010-04-22 Thread Iustin Pop
On Fri, Apr 23, 2010 at 12:14:29AM +0200, Jesper Louis Andersen wrote:
 Hi,
 
 I am asking if anyone has seen the following behaviour from GHC under
 -threaded and heavy use of File I/O, network I/O and STM use. After
 having run Combinatorrent for a while and then terminating it, we get
 the following output from the RTS in GC statistics:

[snip]

I don't know GHC internals, but from the description of the program (and
the fact that you don't use more than one core), I wonder why you use
-threaded?

How does the behaviour change if you use a single-threaded runtime?

(I'm asking this as the single-threaded runtime can be much faster in
certain use-cases.)

regards,
iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] What is the consensus about -fwarn-unused-do-bind ?

2010-04-09 Thread Iustin Pop
On Fri, Apr 09, 2010 at 09:07:29AM -0700, Bryan O'Sullivan wrote:
 On Fri, Apr 9, 2010 at 6:44 AM, Ivan Lazar Miljenovic 
 ivan.miljeno...@gmail.com wrote:
 
 
  As of 6.12.1, the new -fwarn-unused-do-bind warning is activated with
  -Wall.  This is based off a bug report by Neil Mitchell:
  http://hackage.haskell.org/trac/ghc/ticket/3263 .
 
  However, does it make sense for this to be turned on with -Wall?
 
 
 Personally, I find it to be tremendously noisy and unhelpful, and I always
 edit my .cabal files to turn it off. I think of it as a usability
 regression.

Well, I would say it could be helpful, but given that even
Text.Printf.printf triggers this warning in harmless statements, it is
indeed a regression.
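
For example (as far as I can tell, printf's return type resolves to IO a
rather than IO (), so even trivial statements need silencing under
-Wall) — a sketch:

  import Text.Printf (printf)

  main :: IO ()
  main = do
    printf "%d items\n" (3 :: Int)           -- warned about under -Wall
    _ <- printf "%d items\n" (3 :: Int)      -- silenced with an explicit bind
    printf "%d items\n" (3 :: Int) :: IO ()  -- or with a type annotation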

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] OT: the format checker for ICFP 2010 papers, help!

2010-04-01 Thread Iustin Pop
This is off-topic, apologies in advance, but I hope people here have
experience with this.

I submitted a paper for ICFP but the paper checker says: “Margins too
small: text block bigger than maximum 7in x 9in on pages 1–6 by 4–5% in
at least one dimension”.

Now, I've used the standard class file and template, didn't alter any of
the margins/columns spacing, my paper size is set to letter, and
pdflatex doesn't give me any overfull hboxes. Does anyone know why the
error happens in this case?

Also, if the format checker is available somewhere for download so that
I can pre-check my paper, that'd be great.

thanks,
iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] OT: the format checker for ICFP 2010 papers, help!

2010-04-01 Thread Iustin Pop
On Thu, Apr 01, 2010 at 05:25:44PM +0100, Thomas Schilling wrote:
 Do you perhaps have some text that runs into the margins?  If I have
 references of the form Longname~\emph{et~al.}~\cite{foobar}, LaTeX
 does not know how to split this up and the text extends into the margins.
 A similar problem might occur for verbatim sections.  I submitted a
 paper based on the standard stylesheet earlier today and did not
 encounter any problems.

No, it was the wrong template, as it turned out. I did check for the
Overfull hbox message from latex and had none.

Interesting that your paper didn't trigger the error though…

thanks,
iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop
On Tue, Mar 23, 2010 at 08:51:16AM -0700, John Millikin wrote:
 On Tue, Mar 23, 2010 at 00:27, Johann Höchtl johann.hoec...@gmail.com wrote:
  How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I
  read bytestrings over a socket which happens to be UTF16-LE encoded and
  identify a fitting function in Data.Text, I guess I have to transcode them
  with Data.Text.Encoding to make the type System happy?
 
 There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is
 just a more efficient encoding of [Word8], just as Text is a more
 efficient encoding of [Char]. If the file format you're parsing
 specifies that some series of bytes is text encoded as UTF16-LE, then
 you can use the Text decoders to convert to Text.
 
 Poor separation between bytes and characters has caused problems in
 many major languages (C, C++, PHP, Ruby, Python) -- lets not abandon
 the advantages of correctness to chase a few percentage points of
 performance.

I agree with the principle of correctness, but let's be honest - it's
(many) orders of magnitude between ByteString and String and Text, not
just a few percentage points…

I've been struggling with this problem too and it's not nice. Every time
one uses the system readFile & friends (anything that doesn't read via
ByteStrings), it is hellishly slow.

Test: read a file and compute its size in chars. Input text file is
~40MB in size, has one non-ASCII char. The test might seem stupid but it
is a simple one. ghc 6.12.1.

Data.ByteString.Lazy (bytestring readFile + length) - <10 milliseconds,
incorrect length (as expected).

Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
seconds, correct length.

Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.

String (system readFile + length) - ~1 second, correct length.

For the record:

python2.6 (str type) -  ~60ms, incorrect length.
python3.1 (unicode)  - ~310ms, correct length.

If anyone has a solution for doing fast text (Unicode) transformations
(but not a 1:1 pipeline where fusion can work nicely), I'd be glad to
hear it.
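
For reference, each variant of the test is essentially a three-liner
along these lines (the lazy ByteString one shown; the others just swap
the readFile/pack/length calls):

  import qualified Data.ByteString.Lazy as BL
  import System.Environment (getArgs)

  -- Counts bytes, not characters, hence the "incorrect" length above.
  main :: IO ()
  main = do
    [path] <- getArgs
    contents <- BL.readFile path
    print (BL.length contents)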

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop
On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote:
 On 18:11 Tue 23 Mar , Iustin Pop wrote:
  I agree with the principle of correctness, but let's be honest - it's
  (many) orders of magnitude between ByteString and String and Text, not
  just a few percentage points…
  
  I've been struggling with this problem too and it's not nice. Every time
  one uses the system readFile  friends (anything that doesn't read via
  ByteStrings), it hell slow.
  
  Test: read a file and compute its size in chars. Input text file is
  ~40MB in size, has one non-ASCII char. The test might seem stupid but it
  is a simple one. ghc 6.12.1.
  
  Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
  incorrect length (as expected).
  
  Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
  seconds, correct length.
  
  Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
  
  String (system readfile + length) - ~1 second, correct length.
 
 Is this a mistake?  Your own report shows String  readFile being an
 order of magnitude faster than everything else, contrary to your earlier
 claim.

No, it's not a mistake. String is faster than packing to Text and taking
the length, but it's 100 times slower than ByteString.

My whole point is that the difference between byte processing and char
processing in Haskell is not a few percentage points, but orders of
magnitude. I would really like to have only the 6x penalty that Python
shows, for example.

regards,
iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop
On Tue, Mar 23, 2010 at 05:53:00PM +, Vincent Hanquez wrote:
 On 23/03/10 17:11, Iustin Pop wrote:
 Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
 incorrect length (as expected).
 
 Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
 seconds, correct length.
 
 Why would you use system readFile and not bytestring readFile in the
 second case ?
 
 using the latter I got a ~48x slowdown, not a 1100x slowdown, and
 the length is correct too.

Hmm… because I didn't think of that :)

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop
On Tue, Mar 23, 2010 at 03:31:33PM -0400, Nick Bowler wrote:
 On 18:25 Tue 23 Mar , Iustin Pop wrote:
  On Tue, Mar 23, 2010 at 01:21:49PM -0400, Nick Bowler wrote:
   On 18:11 Tue 23 Mar , Iustin Pop wrote:
I agree with the principle of correctness, but let's be honest - it's
(many) orders of magnitude between ByteString and String and Text, not
just a few percentage points…

I've been struggling with this problem too and it's not nice. Every time
one uses the system readFile  friends (anything that doesn't read via
ByteStrings), it hell slow.

Test: read a file and compute its size in chars. Input text file is
~40MB in size, has one non-ASCII char. The test might seem stupid but it
is a simple one. ghc 6.12.1.

Data.ByteString.Lazy (bytestring readFile + length) -  10 miliseconds,
incorrect length (as expected).

Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
seconds, correct length.

Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.

String (system readfile + length) - ~1 second, correct length.
   
   Is this a mistake?  Your own report shows String  readFile being an
   order of magnitude faster than everything else, contrary to your earlier
   claim.
  
  No, it's not a mistake. String is faster than pack to Text and length, but 
  it's
  100 times slower than ByteString.
 
 Only if you don't care about obtaining the correct answer, in which case
 you may as well just say const 42 or somesuch, which is even faster.
 
  My whole point is that difference between byte processing and char 
  processing
  in Haskell is not a few percentage points, but order of magnitude. I would
  really like to have only the 6x penalty that Python shows, for example.
 
 Hang on a second... less than 10 milliseconds to read 40 megabytes from
 disk?  Something's fishy here.

Of course I don't want to benchmark the disk, and therefore the source file is
on tmpfs.

 I ran my own tests with a 400M file (419430400 bytes) consisting almost
 exclusively of the letter 'a' with two Japanese characters placed at
 every multiple of 40 megabytes (UTF-8 encoded).
 
 With Prelude.readFile/length and 5 runs, I see
 
   10145ms, 10087ms, 10223ms, 10321ms, 10216ms.
 
 with approximately 10% of that time spent performing GC each run.
 
 With Data.Bytestring.Lazy.readFile/length and 5 runs, I see
 
   8223ms, 8192ms, 8077ms, 8091ms, 8174ms.
 
 with approximately 20% of that time spent performing GC each run.
 Maybe there's some magic command line options to tune the GC for our
 purposes, but I only managed to make things slower.  Thus, I'll handwave
 a bit and just shave off the GC time from each result.
 
 Prelude: 9178ms mean with a standard deviation of 159ms.
 Data.ByteString.Lazy: 6521ms mean with a standard deviation of 103ms.
 
 Therefore, we managed a throughput of 43 MB/s with the Prelude (and got
 the right answer), while we managed 61 MB/s with lazy ByteStrings (and
 got the wrong answer).  My disk won't go much, if at all, faster than
 the second result, so that's good.

I'll bet that for a 400MB file, if you have more than 2GB of RAM, most of
it will be cached. If you want to check Haskell performance, just copy it to
a tmpfs filesystem so that the disk is out of the equation.

 So that's a 30% reduction in throughput.  I'd say that's a lot worse
 than a few percentage points, but certainly not orders of magnitude.

Because you're possibly benchmarking the disk also. With a 400MB file on tmpfs,
lazy bytestring readFile + length takes on my machine ~150ms, which is way
faster than 8 seconds…

 On the other hand, using Data.ByteString.Lazy.readFile and
 Data.ByteString.Lazy.UTF8.length, we get results of around 12000ms with
 approximately 5% of that time spent in GC, which is rather worse than
 the Prelude.  Data.Text.Lazy.IO.readFile and Data.Text.Lazy.length are
 even worse, with results of around 25 *seconds* (!!) and 2% of that time
 spent in GC.
 
 GNU wc computes the correct answer as quickly as lazy bytestrings
 compute the wrong answer.  With perl 5.8, slurping the entire file as
 UTF-8 computes the correct answer just as slowly as Prelude.  In my
 first ever Python program (with python 2.6), I tried to read the entire
 file as a unicode string and it quickly crashes due to running out of
 memory (yikes!), so it earns a DNF.
 
 So, for computing the right answer with this simple test, it looks like
 the Prelude is the best option.  We tie with Perl and lose only to GNU
 wc (which is written in C).  Really, though, it would be nice to close
 that gap.

Totally agreed :)

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Bytestrings and [Char]

2010-03-23 Thread Iustin Pop
On Tue, Mar 23, 2010 at 11:22:23AM -0700, Bryan O'Sullivan wrote:
 2010/3/23 Iustin Pop iu...@k1024.org
 
  I agree with the principle of correctness, but let's be honest - it's
  (many) orders of magnitude between ByteString and String and Text, not
  just a few percentage points…
 
 
 Well, your benchmarks are highly suspect. See below.
 
 
  Data.ByteString.Lazy.UTF8 (system readFile + fromString + length) - 11
  seconds, correct length.
 
 
 You should be using bytestring I/O for this.

As I was told, indeed :)

  Data.Text.Lazy (system readFile + pack + length) - 26s, correct length.
 
 
 You should be using text I/O for this.

I didn't realize Data.Text.IO exists. Thanks for the hint!
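
(Presumably that means something along these lines — I'll go and re-run
the numbers; the file name is just a placeholder:)

  import qualified Data.Text.Lazy as TL
  import qualified Data.Text.Lazy.IO as TLIO

  main :: IO ()
  main = do
    txt <- TLIO.readFile "input.txt"  -- text I/O, no separate pack step
    print (TL.length txt)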

iustin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe