Re: [Haskell-cafe] Network.Curl cookie jar madness
On Sat, Aug 18, 2012 at 8:52 PM, Michael Orlitzky mich...@orlitzky.comwrote: Curl is making the request, but if I remove the (hPutStrLn stderr response_body), it doesn't work! What's even more insane is, this works: hPutStrLn stderr response_body and this doesn't: hPutStrLn stdout response_body whaaa? I really don't want to dump the response body to At a guess, this is laziness and buffering interacting: stderr is usually unbuffered since it's error or log output that one usually wants to see immediately; stdout is usually line buffered unless redirected, in which case it's block buffered. The real issue is that you (or perhaps Curl) is being too lazy and not running the log_in until the result is actually needed; hPutStrLn is forcing it, but incompletely when it's buffered. (Which strikes me as weird unless Curl is using unsafeInterleaveIO somewhere) You will need to force it or hold the handle open until the content is fully evaluated; if it's in the half-closed state that hGetContents sets, it's usually best to not close the handle explicitly but let the implicit lazy close do it. -- brandon s allbery allber...@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Network.Curl cookie jar madness
On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote: On 08/18/2012 08:52 PM, Michael Orlitzky wrote: I'm one bug away from a working program and need some help. I wrote a little utility that logs into LWN.net, retrieves an article, and creates an epub out of it. I've created two pages where anyone can test this. The first just takes any username and password via post and sets a session variable. The second prints Success. if the session variable is set, and Failure. if it isn't. The bash script, […] The attached haskell program using Network.Curl, doesn't: $ runghc haskell-test.hs Logged in... Failure. Any help is appreciated =) So, take this with a grain of salt: I've been bitten by curl (the haskell bindings, I mean) before, and I don't hold the quality of the library in great regard. The libcurl documentation says: When you set a file name with CURLOPT_COOKIEJAR, that file name will be created and all received cookies will be stored in it when curl_easy_cleanup(3) is called (i.e. at the end of a curl handle session). But even though the curl bindings seem to run easy_cleanup on handles (initialize → mkCurl → mkCurlWithCleanup), they don't do this correctly: DEBUG: ALLOC: CURL DEBUG: ALLOC: /tmp/network-curl-test-haskell20417.txt DEBUG: ALLOC: username=foopassword=bar DEBUG: ALLOC: http://michael.orlitzky.com/tmp/network-curl-test1.php DEBUG: ALLOC: WRITER DEBUG: ALLOC: WRITER Note there's no DEBUG: FREE: CURL as the code seems to imply there should be. Hence, the handle is never cleaned up (do the curl bindings leak handles?), so the cookie file is never written. Side note: by running the same program multiple times, sometimes you see DEBUG: FREE: CURL, sometimes no FREE actions. I believe there's something very wrong in the curl bindings with regard to cleanups. If I modify curl to export a force cleanup function, I can make the program work (but not always; my patch is a hack). Alternatively, as the curl library doesn't need a cookie jar to use cookies in the same handle, by modifying your code to reuse the same curl handle (returning it from log_in and reusing the same in get_page) gives me a success code. But the cookie file is still not filled, since the curl handle is never properly terminated. Since the curl bindings also have problems in multi-threaded programs when SSL is enabled (as it doesn't actually setup the curl library correctly with regards to multi-threaded memory allocation), I would suggest you try to use the http conduit library, since that's a pure haskell library that should work as well, if not better. Happy to be proved wrong, if I'm just biased against curl :) regards, iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Network.Curl cookie jar madness
On Sun, Aug 19, 2012 at 06:06:53PM +0200, Iustin Pop wrote: On Sun, Aug 19, 2012 at 12:45:47AM -0400, Michael Orlitzky wrote: On 08/18/2012 08:52 PM, Michael Orlitzky wrote: I'm one bug away from a working program and need some help. I wrote a little utility that logs into LWN.net, retrieves an article, and creates an epub out of it. I've created two pages where anyone can test this. The first just takes any username and password via post and sets a session variable. The second prints Success. if the session variable is set, and Failure. if it isn't. The bash script, […] The attached haskell program using Network.Curl, doesn't: $ runghc haskell-test.hs Logged in... Failure. Any help is appreciated =) So, take this with a grain of salt: I've been bitten by curl (the haskell bindings, I mean) before, and I don't hold the quality of the library in great regard. The libcurl documentation says: When you set a file name with CURLOPT_COOKIEJAR, that file name will be created and all received cookies will be stored in it when curl_easy_cleanup(3) is called (i.e. at the end of a curl handle session). But even though the curl bindings seem to run easy_cleanup on handles (initialize → mkCurl → mkCurlWithCleanup), they don't do this correctly: DEBUG: ALLOC: CURL DEBUG: ALLOC: /tmp/network-curl-test-haskell20417.txt DEBUG: ALLOC: username=foopassword=bar DEBUG: ALLOC: http://michael.orlitzky.com/tmp/network-curl-test1.php DEBUG: ALLOC: WRITER DEBUG: ALLOC: WRITER Note there's no DEBUG: FREE: CURL as the code seems to imply there should be. Hence, the handle is never cleaned up (do the curl bindings leak handles?), so the cookie file is never written. Side note: by running the same program multiple times, sometimes you see DEBUG: FREE: CURL, sometimes no FREE actions. I believe there's something very wrong in the curl bindings with regard to cleanups. On more investigation, this seems to be due to the somewhat careless use of Foreign.Concurrent; from the docs: “The finalizer will be executed after the last reference to the foreign object is dropped. There is no guarantee of promptness, and in fact there is no guarantee that the finalizer will eventually run at all.” Also, see http://hackage.haskell.org/trac/ghc/ticket/1364. So it seems that the intended way of cleaning up curl handles is all fine and dandy if one doesn't require timely cleanup; in most cases, this is not needed, but for cookies it is broken. I don't know what the proper solution is; either way, it seems that there should be a way to force the cleanup to be run, via finalizeForeignPtr, or requiring full manual handling of curl handles (instead of via finalizers). Gah, native libs++. regards, iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Network.Curl cookie jar madness
On 08/19/2012 12:58 PM, Iustin Pop wrote: On more investigation, this seems to be due to the somewhat careless use of Foreign.Concurrent; from the docs: “The finalizer will be executed after the last reference to the foreign object is dropped. There is no guarantee of promptness, and in fact there is no guarantee that the finalizer will eventually run at all.” Also, see http://hackage.haskell.org/trac/ghc/ticket/1364. So it seems that the intended way of cleaning up curl handles is all fine and dandy if one doesn't require timely cleanup; in most cases, this is not needed, but for cookies it is broken. I don't know what the proper solution is; either way, it seems that there should be a way to force the cleanup to be run, via finalizeForeignPtr, or requiring full manual handling of curl handles (instead of via finalizers). Gah, native libs++. Wow, thanks for the in-depth analysis. I'll just switch to Network.Browser or its conduit counterpart. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Network.Curl cookie jar madness
I'm one bug away from a working program and need some help. I wrote a little utility that logs into LWN.net, retrieves an article, and creates an epub out of it. Full code here: git clone http://michael.orlitzky.com/git/lwn-epub.git This is the code that gets the login cookie: cj - make_cookie_jar li_result - log_in cj uname pword case li_result of Left err - do let msg = Failed to log in. ++ err hPutStrLn stderr msg Right response_body - do hPutStrLn stderr response_body return $ cfg { C.cookie_jar = Just cj } Curl is making the request, but if I remove the (hPutStrLn stderr response_body), it doesn't work! What's even more insane is, this works: hPutStrLn stderr response_body and this doesn't: hPutStrLn stdout response_body whaaa? I really don't want to dump the response body to stderr, but I can't even begin to imagine what's going on here. Has anyone got Network.Curl working with a cookie jar? For anyone with an LWN account: you can set, username = foo password = bar in ~/.lwn-epub/lwn-epub.conf and attempt to run with, lwn-epub -o current.epub current/bigpage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Network.Curl cookie jar madness
On Sat, Aug 18, 2012 at 08:52:00PM -0400, Michael Orlitzky wrote: I'm one bug away from a working program and need some help. I wrote a little utility that logs into LWN.net, retrieves an article, and creates an epub out of it. Full code here: git clone http://michael.orlitzky.com/git/lwn-epub.git This is the code that gets the login cookie: cj - make_cookie_jar li_result - log_in cj uname pword case li_result of Left err - do let msg = Failed to log in. ++ err hPutStrLn stderr msg Right response_body - do hPutStrLn stderr response_body return $ cfg { C.cookie_jar = Just cj } Curl is making the request, but if I remove the (hPutStrLn stderr response_body), it doesn't work! What's even more insane is, this works: hPutStrLn stderr response_body and this doesn't: hPutStrLn stdout response_body whaaa? I really don't want to dump the response body to stderr, but I can't even begin to imagine what's going on here. Has anyone got Network.Curl working with a cookie jar? Is this perchance due to laziness? And the fact that stderr is not buffered by default, so all output is forced right then (forcing the evaluation), whereas stdout is buffered, so the output might only be made later (or even after you to an hFlush). I'd try to make sure that response_body is fully evaluated before returning from the function. Or I might be totally wrong, in which case sorry :) regards, iustin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Network.Curl cookie jar madness
On 08/18/2012 09:00 PM, Iustin Pop wrote: On Sat, Aug 18, 2012 at 08:52:00PM -0400, Michael Orlitzky wrote: Curl is making the request, but if I remove the (hPutStrLn stderr response_body), it doesn't work! What's even more insane is, this works: hPutStrLn stderr response_body and this doesn't: hPutStrLn stdout response_body whaaa? I really don't want to dump the response body to stderr, but I can't even begin to imagine what's going on here. Has anyone got Network.Curl working with a cookie jar? Is this perchance due to laziness? And the fact that stderr is not buffered by default, so all output is forced right then (forcing the evaluation), whereas stdout is buffered, so the output might only be made later (or even after you to an hFlush). I'd try to make sure that response_body is fully evaluated before returning from the function. Or I might be totally wrong, in which case sorry :) I thought so at first, but I've tried every trick I know to avoid it. If I add an hFlush to the stdout version, it still fails. If I deepseq the response_body (it's just a string, after all), it still fails. In the case statement, we've already made the request and received a response (the Left/Right is based on the response code). I'm thinking the cookie jar should be full at that point whether or not I use the response body. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Network.Curl cookie jar madness
On 08/18/2012 08:52 PM, Michael Orlitzky wrote: I'm one bug away from a working program and need some help. I wrote a little utility that logs into LWN.net, retrieves an article, and creates an epub out of it. I've created two pages where anyone can test this. The first just takes any username and password via post and sets a session variable. The second prints Success. if the session variable is set, and Failure. if it isn't. The bash script, #!/bin/bash COOKIE_JAR='/tmp/network-curl-test-bash.txt' POST_DATA='username=foopassword=bar' URL1='http://michael.orlitzky.com/tmp/network-curl-test1.php' URL2='http://michael.orlitzky.com/tmp/network-curl-test2.php' echo 'Logging in...' curl --cookie-jar ${COOKIE_JAR} \ --data ${POST_DATA} \ ${URL1} echo 'Retrieving second page...' curl --cookie ${COOKIE_JAR} \ ${URL2} works: $ ./bash-test.sh Logging in... Retrieving second page... Success. The attached haskell program using Network.Curl, doesn't: $ runghc haskell-test.hs Logged in... Failure. Any help is appreciated =) module Main where import Network.Curl import System.IO (hClose) import System.Directory (getTemporaryDirectory) import System.IO.Temp (openTempFile) default_curl_opts :: [CurlOption] default_curl_opts = [ CurlDNSUseGlobalCache False, CurlDNSCacheTimeout 0, CurlFollowLocation True, CurlTimeout 45 ] make_cookie_jar :: IO FilePath make_cookie_jar = do temp_dir - getTemporaryDirectory let file_name_template = network-curl-test-haskell.txt (out_path, out_handle) - openTempFile temp_dir file_name_template hClose out_handle -- We just want to create it for now. return out_path -- | Log in using curl. Store the resulting session cookies in the -- supplied file. log_in :: FilePath - String - IO (Either String String) log_in cookie_jar url = withCurlDo $ do curl - initialize resp - do_curl_ curl url curl_opts :: IO CurlResponse let code = respCurlCode resp return $ case code of CurlOK - Right (respBody resp) error_code - Left $ HTTP Error: ++ (show error_code) where post_data :: [String] post_data = [username=foo, password=bar] post_opts :: [CurlOption] post_opts = [ CurlCookieSession True, CurlCookieJar cookie_jar, CurlPost True, CurlPostFields post_data ] curl_opts :: [CurlOption] curl_opts = default_curl_opts ++ post_opts get_page :: FilePath - String - IO String get_page cookie_jar url = withCurlDo $ do curl - initialize resp - do_curl_ curl url curl_opts :: IO CurlResponse let code = respCurlCode resp return $ case code of CurlOK - respBody resp error_code - HTTP Error: ++ (show error_code) where get_opts = [ CurlCookieFile cookie_jar ] curl_opts = default_curl_opts ++ get_opts main :: IO () main = do cj - make_cookie_jar li_result - log_in cj page1_url case li_result of Left err - do let msg = Failed to log in. ++ err putStrLn msg Right _ - do putStrLn Logged in... page2_result - get_page cj page2_url putStrLn page2_result where page1_url = http://michael.orlitzky.com/tmp/network-curl-test1.php; page2_url = http://michael.orlitzky.com/tmp/network-curl-test2.php; ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe