Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-19 Thread Brandon Allbery
On Sat, Aug 18, 2012 at 8:52 PM, Michael Orlitzky mich...@orlitzky.comwrote:

 Curl is making the request, but if I remove the (hPutStrLn stderr
 response_body), it doesn't work! What's even more insane is, this works:

   hPutStrLn stderr response_body

 and this doesn't:

   hPutStrLn stdout response_body

 whaaa? I really don't want to dump the response body to


At a guess, this is laziness and buffering interacting:  stderr is usually
unbuffered since it's error or log output that one usually wants to see
immediately; stdout is usually line buffered unless redirected, in which
case it's block buffered.

The real issue is that you (or perhaps Curl) is being too lazy and not
running the log_in until the result is actually needed; hPutStrLn is
forcing it, but incompletely when it's buffered.  (Which strikes me as
weird unless Curl is using unsafeInterleaveIO somewhere)  You will need
to force it or hold the handle open until the content is fully evaluated;
if it's in the half-closed state that hGetContents sets, it's usually best
to not close the handle explicitly but let the implicit lazy close do it.

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-19 Thread Iustin Pop
On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote:
 On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
  I'm one bug away from a working program and need some help. I wrote a
  little utility that logs into LWN.net, retrieves an article, and creates
  an epub out of it.
 
 I've created two pages where anyone can test this. The first just takes
 any username and password via post and sets a session variable. The
 second prints Success. if the session variable is set, and Failure.
 if it isn't. The bash script,

[…]

 The attached haskell program using Network.Curl, doesn't:
 
   $ runghc haskell-test.hs
   Logged in...
   Failure.
 
 Any help is appreciated =)

So, take this with a grain of salt: I've been bitten by curl (the
haskell bindings, I mean) before, and I don't hold the quality of the
library in great regard.

The libcurl documentation says: When you set a file name with
CURLOPT_COOKIEJAR, that file name will be created and all received
cookies will be stored in it when curl_easy_cleanup(3) is called (i.e.
at the end of a curl handle session). But even though the curl bindings
seem to run easy_cleanup on handles (initialize → mkCurl →
mkCurlWithCleanup), they don't do this correctly:

DEBUG: ALLOC: CURL
DEBUG: ALLOC: /tmp/network-curl-test-haskell20417.txt
DEBUG: ALLOC: username=foopassword=bar
DEBUG: ALLOC: http://michael.orlitzky.com/tmp/network-curl-test1.php
DEBUG: ALLOC: WRITER
DEBUG: ALLOC: WRITER

Note there's no DEBUG: FREE: CURL as the code seems to imply there
should be. Hence, the handle is never cleaned up (do the curl bindings
leak handles?), so the cookie file is never written.

Side note: by running the same program multiple times, sometimes you see
DEBUG: FREE: CURL, sometimes no FREE actions. I believe there's
something very wrong in the curl bindings with regard to cleanups.

If I modify curl to export a force cleanup function, I can make the
program work (but not always; my patch is a hack).

Alternatively, as the curl library doesn't need a cookie jar to use
cookies in the same handle, by modifying your code to reuse the same
curl handle (returning it from log_in and reusing the same in get_page)
gives me a success code. But the cookie file is still not filled, since
the curl handle is never properly terminated.

Since the curl bindings also have problems in multi-threaded programs
when SSL is enabled (as it doesn't actually setup the curl library
correctly with regards to multi-threaded memory allocation), I would
suggest you try to use the http conduit library, since that's a pure
haskell library that should work as well, if not better.

Happy to be proved wrong, if I'm just biased against curl :)

regards,
iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-19 Thread Iustin Pop
On Sun, Aug 19, 2012 at 06:06:53PM +0200, Iustin Pop wrote:
 On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote:
  On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
   I'm one bug away from a working program and need some help. I wrote a
   little utility that logs into LWN.net, retrieves an article, and creates
   an epub out of it.
  
  I've created two pages where anyone can test this. The first just takes
  any username and password via post and sets a session variable. The
  second prints Success. if the session variable is set, and Failure.
  if it isn't. The bash script,
 
 […]
 
  The attached haskell program using Network.Curl, doesn't:
  
$ runghc haskell-test.hs
Logged in...
Failure.
  
  Any help is appreciated =)
 
 So, take this with a grain of salt: I've been bitten by curl (the
 haskell bindings, I mean) before, and I don't hold the quality of the
 library in great regard.
 
 The libcurl documentation says: When you set a file name with
 CURLOPT_COOKIEJAR, that file name will be created and all received
 cookies will be stored in it when curl_easy_cleanup(3) is called (i.e.
 at the end of a curl handle session). But even though the curl bindings
 seem to run easy_cleanup on handles (initialize → mkCurl →
 mkCurlWithCleanup), they don't do this correctly:
 
 DEBUG: ALLOC: CURL
 DEBUG: ALLOC: /tmp/network-curl-test-haskell20417.txt
 DEBUG: ALLOC: username=foopassword=bar
 DEBUG: ALLOC: http://michael.orlitzky.com/tmp/network-curl-test1.php
 DEBUG: ALLOC: WRITER
 DEBUG: ALLOC: WRITER
 
 Note there's no DEBUG: FREE: CURL as the code seems to imply there
 should be. Hence, the handle is never cleaned up (do the curl bindings
 leak handles?), so the cookie file is never written.
 
 Side note: by running the same program multiple times, sometimes you see
 DEBUG: FREE: CURL, sometimes no FREE actions. I believe there's
 something very wrong in the curl bindings with regard to cleanups.

On more investigation, this seems to be due to the somewhat careless use
of Foreign.Concurrent; from the docs:

  “The finalizer will be executed after the last reference to the
  foreign object is dropped. There is no guarantee of promptness, and in
  fact there is no guarantee that the finalizer will eventually run at
  all.”

Also, see http://hackage.haskell.org/trac/ghc/ticket/1364.

So it seems that the intended way of cleaning up curl handles is all
fine and dandy if one doesn't require timely cleanup; in most cases,
this is not needed, but for cookies it is broken.

I don't know what the proper solution is; either way, it seems that
there should be a way to force the cleanup to be run, via
finalizeForeignPtr, or requiring full manual handling of curl handles
(instead of via finalizers).

Gah, native libs++.

regards,
iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-19 Thread Michael Orlitzky
On 08/19/2012 12:58 PM, Iustin Pop wrote:
 
 On more investigation, this seems to be due to the somewhat careless use
 of Foreign.Concurrent; from the docs:
 
   “The finalizer will be executed after the last reference to the
   foreign object is dropped. There is no guarantee of promptness, and in
   fact there is no guarantee that the finalizer will eventually run at
   all.”
 
 Also, see http://hackage.haskell.org/trac/ghc/ticket/1364.
 
 So it seems that the intended way of cleaning up curl handles is all
 fine and dandy if one doesn't require timely cleanup; in most cases,
 this is not needed, but for cookies it is broken.
 
 I don't know what the proper solution is; either way, it seems that
 there should be a way to force the cleanup to be run, via
 finalizeForeignPtr, or requiring full manual handling of curl handles
 (instead of via finalizers).
 
 Gah, native libs++.
 

Wow, thanks for the in-depth analysis. I'll just switch to
Network.Browser or its conduit counterpart.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Network.Curl cookie jar madness

2012-08-18 Thread Michael Orlitzky
I'm one bug away from a working program and need some help. I wrote a
little utility that logs into LWN.net, retrieves an article, and creates
an epub out of it. Full code here:

  git clone http://michael.orlitzky.com/git/lwn-epub.git

This is the code that gets the login cookie:

  cj - make_cookie_jar
  li_result - log_in cj uname pword

  case li_result of
Left err - do
  let msg = Failed to log in.  ++ err
  hPutStrLn stderr msg
Right response_body - do
  hPutStrLn stderr response_body

  return $ cfg { C.cookie_jar = Just cj }

Curl is making the request, but if I remove the (hPutStrLn stderr
response_body), it doesn't work! What's even more insane is, this works:

  hPutStrLn stderr response_body

and this doesn't:

  hPutStrLn stdout response_body

whaaa? I really don't want to dump the response body to
stderr, but I can't even begin to imagine what's going on here. Has
anyone got Network.Curl working with a cookie jar?

For anyone with an LWN account: you can set,

  username = foo
  password = bar

in ~/.lwn-epub/lwn-epub.conf and attempt to run with,

  lwn-epub -o current.epub current/bigpage

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-18 Thread Iustin Pop
On Sat, Aug 18, 2012 at 08:52:00PM -0400, Michael Orlitzky wrote:
 I'm one bug away from a working program and need some help. I wrote a
 little utility that logs into LWN.net, retrieves an article, and creates
 an epub out of it. Full code here:
 
   git clone http://michael.orlitzky.com/git/lwn-epub.git
 
 This is the code that gets the login cookie:
 
   cj - make_cookie_jar
   li_result - log_in cj uname pword
 
   case li_result of
 Left err - do
   let msg = Failed to log in.  ++ err
   hPutStrLn stderr msg
 Right response_body - do
   hPutStrLn stderr response_body
 
   return $ cfg { C.cookie_jar = Just cj }
 
 Curl is making the request, but if I remove the (hPutStrLn stderr
 response_body), it doesn't work! What's even more insane is, this works:
 
   hPutStrLn stderr response_body
 
 and this doesn't:
 
   hPutStrLn stdout response_body
 
 whaaa? I really don't want to dump the response body to
 stderr, but I can't even begin to imagine what's going on here. Has
 anyone got Network.Curl working with a cookie jar?

Is this perchance due to laziness? And the fact that stderr is not
buffered by default, so all output is forced right then (forcing the
evaluation), whereas stdout is buffered, so the output might only be
made later (or even after you to an hFlush).

I'd try to make sure that response_body is fully evaluated before
returning from the function.

Or I might be totally wrong, in which case sorry :)

regards,
iustin

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-18 Thread Michael Orlitzky
On 08/18/2012 09:00 PM, Iustin Pop wrote:
 On Sat, Aug 18, 2012 at 08:52:00PM -0400, Michael Orlitzky wrote:

 Curl is making the request, but if I remove the (hPutStrLn stderr
 response_body), it doesn't work! What's even more insane is, this works:

   hPutStrLn stderr response_body

 and this doesn't:

   hPutStrLn stdout response_body

 whaaa? I really don't want to dump the response body to
 stderr, but I can't even begin to imagine what's going on here. Has
 anyone got Network.Curl working with a cookie jar?
 
 Is this perchance due to laziness? And the fact that stderr is not
 buffered by default, so all output is forced right then (forcing the
 evaluation), whereas stdout is buffered, so the output might only be
 made later (or even after you to an hFlush).
 
 I'd try to make sure that response_body is fully evaluated before
 returning from the function.
 
 Or I might be totally wrong, in which case sorry :)
 

I thought so at first, but I've tried every trick I know to avoid it. If
I add an hFlush to the stdout version, it still fails. If I deepseq the
response_body (it's just a string, after all), it still fails.

In the case statement, we've already made the request and received a
response (the Left/Right is based on the response code). I'm thinking
the cookie jar should be full at that point whether or not I use the
response body.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Network.Curl cookie jar madness

2012-08-18 Thread Michael Orlitzky
On 08/18/2012 08:52 PM, Michael Orlitzky wrote:
 I'm one bug away from a working program and need some help. I wrote a
 little utility that logs into LWN.net, retrieves an article, and creates
 an epub out of it.

I've created two pages where anyone can test this. The first just takes
any username and password via post and sets a session variable. The
second prints Success. if the session variable is set, and Failure.
if it isn't. The bash script,

  #!/bin/bash

  COOKIE_JAR='/tmp/network-curl-test-bash.txt'
  POST_DATA='username=foopassword=bar'
  URL1='http://michael.orlitzky.com/tmp/network-curl-test1.php'
  URL2='http://michael.orlitzky.com/tmp/network-curl-test2.php'

  echo 'Logging in...'
  curl --cookie-jar ${COOKIE_JAR} \
   --data ${POST_DATA} \
   ${URL1}

  echo 'Retrieving second page...'
  curl --cookie ${COOKIE_JAR} \
   ${URL2}

works:

  $ ./bash-test.sh
  Logging in...
  Retrieving second page...
  Success.

The attached haskell program using Network.Curl, doesn't:

  $ runghc haskell-test.hs
  Logged in...
  Failure.

Any help is appreciated =)
module Main
where

import Network.Curl
import System.IO (hClose)
import System.Directory (getTemporaryDirectory)
import System.IO.Temp (openTempFile)

default_curl_opts :: [CurlOption]
default_curl_opts =
  [ CurlDNSUseGlobalCache False,
CurlDNSCacheTimeout 0,
CurlFollowLocation True,
CurlTimeout 45 ]


make_cookie_jar :: IO FilePath
make_cookie_jar = do
  temp_dir - getTemporaryDirectory
  let file_name_template = network-curl-test-haskell.txt
  (out_path, out_handle) - openTempFile temp_dir file_name_template
  hClose out_handle -- We just want to create it for now.
  return out_path


-- | Log in using curl. Store the resulting session cookies in the
--   supplied file.
log_in :: FilePath - String - IO (Either String String)
log_in cookie_jar url =
  withCurlDo $ do
curl - initialize
resp - do_curl_ curl url curl_opts :: IO CurlResponse
let code = respCurlCode resp

return $
  case code of
CurlOK - Right (respBody resp)
error_code - Left $ HTTP Error:  ++ (show error_code)
  where
post_data :: [String]
post_data = [username=foo, password=bar]

post_opts :: [CurlOption]
post_opts =
  [ CurlCookieSession True,
CurlCookieJar cookie_jar,
CurlPost True,
CurlPostFields post_data ]

curl_opts :: [CurlOption]
curl_opts = default_curl_opts ++ post_opts



get_page :: FilePath - String - IO String
get_page cookie_jar url =
  withCurlDo $ do
curl - initialize
resp - do_curl_ curl url curl_opts :: IO CurlResponse
let code = respCurlCode resp

return $
  case code of
CurlOK - respBody resp
error_code - HTTP Error:  ++ (show error_code)

  where
get_opts = [ CurlCookieFile cookie_jar ]
curl_opts = default_curl_opts ++ get_opts



main :: IO ()
main = do
  cj - make_cookie_jar
  li_result - log_in cj page1_url
  case li_result of
Left err - do
  let msg = Failed to log in.  ++ err
  putStrLn msg
Right _ - do
  putStrLn Logged in...

  page2_result - get_page cj page2_url
  putStrLn page2_result

  where
page1_url = http://michael.orlitzky.com/tmp/network-curl-test1.php;
page2_url = http://michael.orlitzky.com/tmp/network-curl-test2.php;
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe