[Haskell-cafe] Parallel Data.Vector.generate?
Hello all.

I'm using the Data.Vector.generate function with a complicated creation function to create a long vector. Is it possible to parallelize the creation of each element? Alternatively, if there were something like parMap for vectors, I suppose I could pass id to Data.Vector.generate and use parMap. Does something like this exist?

Thanks,
Myles C. Maxfield

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Parallel Data.Vector.generate?
Thanks for this! I didn't know about Repa, and it sounds like it's exactly what the doctor ordered. I think I'll port my entire program to it!

--Myles

On Thursday, March 28, 2013, Dmitry Dzhus wrote:

> 28.03.2013, 10:38, Myles C. Maxfield <myles.maxfi...@gmail.com>:
>
>> Hello all. I'm using the Data.Vector.generate function with a complicated creation function to create a long vector. Is it possible to parallelize the creation of each element? Alternatively, if there was something like parMap for vectors, I suppose I could pass id to Data.Vector.generate and use parMap. Does something like this exist?
>
> You may use computeP + fromFunction from Repa. Wrapping of vectors to Repa arrays (and vice versa) is O(1).
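[For the archives: a minimal sketch of the computeP + fromFunction approach described above. The wrapper name parGenerate is made up here, and the sketch assumes the repa and vector packages; computeUnboxedP forces each element of the delayed array in parallel, and toUnboxed is the O(1) unwrap back to a Data.Vector.Unboxed vector.]

```haskell
import Data.Array.Repa (Z (..), (:.) (..))
import qualified Data.Array.Repa as R
import qualified Data.Vector.Unboxed as VU

-- Hypothetical parallel analogue of Data.Vector.generate:
-- build a delayed Repa array from the index function, force it
-- in parallel, then unwrap the result as an unboxed vector.
parGenerate :: (VU.Unbox a, Monad m) => Int -> (Int -> a) -> m (VU.Vector a)
parGenerate n f =
  fmap R.toUnboxed (R.computeUnboxedP (R.fromFunction (Z :. n) (\(Z :. i) -> f i)))
```

[Compile with -threaded and run with +RTS -N to actually use multiple cores.]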
Re: [Haskell-cafe] Safe 'chr' function?
Thank you both for your answers. Consider this issue closed now :-)

--Myles

On Thu, Jan 3, 2013 at 12:05 AM, Michael Snoyman <mich...@snoyman.com> wrote:

> You could wrap chr with a call to spoon [1]. It's not the most elegant solution, but it works.
>
> [1] http://hackage.haskell.org/packages/archive/spoon/0.3/doc/html/Control-Spoon.html#v:spoon
>
> On Thu, Jan 3, 2013 at 9:50 AM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote:
>
>> [...]
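[A sketch of the spoon suggestion, assuming the spoon package linked above; spoon forces a pure value and returns Nothing if evaluation throws.]

```haskell
import Control.Spoon (spoon)
import Data.Char (chr)

-- chr throws on out-of-range code points; spoon catches the
-- exception from pure code and turns it into Nothing.
safeChr :: Int -> Maybe Char
safeChr = spoon . chr
```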
[Haskell-cafe] Safe 'chr' function?
Hello,

I'm working on a general text-processing library [1], and one of my quickcheck tests is designed to make sure that my library doesn't throw exceptions (it returns an Either type on failure). However, there are some inputs that cause me to pass bogus values to the 'chr' function (such as 1208914), which causes it to throw an exception. Is there a version of that function that is safe? (I'm hoping for something like Int -> Maybe Char.) Alternatively, is there a way to know ahead of time whether or not an Int will cause 'chr' to throw an exception?

Thanks,
Myles C. Maxfield

[1] http://hackage.haskell.org/package/punycode
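[To the second question: yes, the valid domain of chr is fixed, so a bounds check avoids the exception. A sketch; chrMaybe is just an illustrative name.]

```haskell
import Data.Char (chr)

-- chr is only defined for code points 0..0x10FFFF (i.e. up to
-- maxBound :: Char); check the bounds before calling it.
chrMaybe :: Int -> Maybe Char
chrMaybe i
  | i >= 0 && i <= 0x10FFFF = Just (chr i)
  | otherwise               = Nothing
```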
Re: [Haskell-cafe] Does anyone know where George Pollard is?
False alarm. He got back to me.

Thanks,
Myles

On Thu, Nov 8, 2012 at 12:20 AM, Ketil Malde <ke...@malde.org> wrote:

> Myles C. Maxfield <myles.maxfi...@gmail.com> writes:
>
>> Does anyone know where he is?
>
> On GitHub? https://github.com/Porges
>
> One of the repos was apparently updated less than a week ago.
>
>> If not, is there an accepted practice to resolve this situation? Should I upload my own 'idna2' package?
>
> You can always upload a fork, but unless you have a lot of new functionality that won't fit naturally in the old package, you can perhaps try a bit more to contact the original author.
>
> -k
[Haskell-cafe] Does anyone know where George Pollard is?
Hello,

I sent a message to George Pollard (por...@porg.es) about his 'idna' package [1] a couple days ago, but he hasn't responded. I'd like to depend on his package for something that I'm working on, but his package fails to build (the email I sent him includes a patch that should fix up the build problems). The package hasn't been updated for 3 years and he hasn't listed a source control repository.

Does anyone know where he is? If not, is there an accepted practice to resolve this situation? Should I upload my own 'idna2' package?

Thanks,
Myles

[1] http://hackage.haskell.org/package/idna
Re: [Haskell-cafe] Auto-termination and leftovers in Conduits
Cool! Thanks so much!

--Myles

On Sat, Oct 27, 2012 at 8:35 PM, Michael Snoyman <mich...@snoyman.com> wrote:

> The important issue here is that, when using =$, $=, and =$=, leftovers will be discarded. To see this more clearly, realize that the first line of sink is equivalent to:
>
>     out1 <- C.injectLeftovers CT.lines C.>+> CL.head
>
> So any leftovers from lines are lost once you move past that line. In order to get this to work, stick the consume inside the same composition:
>
>     sink = C.injectLeftovers CT.lines C.>+> do
>         out1 <- CL.head
>         out2 <- CL.consume
>         return (out1, T.unlines out2)
>
> Or:
>
>     sink = CT.lines C.=$ do
>         out1 <- CL.head
>         out2 <- CL.consume
>         return (out1, T.unlines out2)
>
> Michael
>
> On Sat, Oct 27, 2012 at 9:20 PM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote:
>
>> [...]
[Haskell-cafe] Auto-termination and leftovers in Conduits
Hey,

Say I have a stream of Data.Text.Text objects flowing through a conduit, where the divisions between successive Data.Text.Text items occur at arbitrary boundaries (maybe the source is sourceFile $= decode utf8). I'd like to create a Sink that returns a tuple of (the first line, the rest of the input). My first attempt at this looks like this:

    sink = do
        out1 <- CT.lines C.=$ CL.head
        out2 <- CL.consume
        return (out1, T.concat out2)

However, the following input provides:

    runIdentity $ CL.sourceList ["abc\nde", "f\nghi"] C.$$ sink
    (Just "abc","f\nghi")

But what I really want is (Just "abc", "\ndef\nghi").

I think this is due to the auto-termination you mention in [1]. My guess is that when CT.lines yields the first value (and CL.head then also yields it), execution is auto-terminated before CT.lines gets a chance to specify any leftovers.

How can I write this sink? (I know I can just use CL.consume and T.break (== '\n'), but I'm not interested in that. I'm trying to figure out how to get the behavior I'm looking for with conduits.)

Thanks,
Myles

[1] http://hackage.haskell.org/packages/archive/conduit/0.5.2.7/doc/html/Data-Conduit.html
Re: [Haskell-cafe] Hackage Package Discoverability
The last revision of the encoding package (0.6.7.1) was uploaded 6 days ago, so it's certainly not old. The package is also not unwieldy: the functions (runPut . encode punycode) and (runGet (decode punycode)) are equivalent to my 'encode' and 'decode' functions. In addition, it supports many more kinds of encodings and is much more general than my little library, and it is much more flexible because of its use of ByteSource and ByteSink. It seems like a hands-down win to me.

I've CC'ed the maintainer of the encoding package; maybe he can better reply about the encoding library.

On Mon, Oct 22, 2012 at 11:14 PM, Bryan O'Sullivan <b...@serpentine.com> wrote:

> On Tue, Oct 23, 2012 at 5:53 AM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote:
>
>> I am the author/maintainer of the 'punycode' hackage package. After 4 months, I just found that punycode conversion already exists in the Data.Encoding.BootString module inside the 'encoding' package. I'd like to deprecate my package in favor of the 'encoding' package.
>
> Please don't plan to do that. The encoding package may have filled a gap at some point, but now it looks old, unwieldy, inefficient (String), and weird (implicit parameters?) to me, and it's mostly obsolete (the standard I/O library has supported Unicode and encodings for a while now). I would not use the encoding package myself, for instance.
>
> Your punycode package, in contrast, has a simple API and looks easy to use. I'd suggest that you support the Text type as well as String, but otherwise please keep it around and maintain it.
[Haskell-cafe] Hackage Package Discoverability
Hello,

I am the author/maintainer of the 'punycode' hackage package. After 4 months, I just found that punycode conversion already exists in the Data.Encoding.BootString module inside the 'encoding' package. I'd like to deprecate my package in favor of the 'encoding' package. However, I would also like to solve the discoverability problem of people not knowing to look in the 'encoding' package when they're looking for the punycode algorithm. (I certainly didn't look there, and as a result, I re-implemented the algorithm.) My initial thought is to keep my package in the hackage database, but put a big label on it saying "DEPRECATED: Use Data.Encoding.BootString instead". Is there a better way to make this algorithm discoverable?

Thanks,
Myles
Re: [Haskell-cafe] Dynamic Programming with Data.Vector
Aha, there it is! Thanks so much. I didn't see it because it's under the Unfolding section instead of the Construction section.

--Myles

On Mon, Sep 17, 2012 at 6:07 AM, Roman Leshchinskiy <r...@cse.unsw.edu.au> wrote:

> Myles C. Maxfield wrote:
>
>> Overall, I'm looking for a function, similar to Data.Vector's 'generate' function, but instead of the generation function taking the destination index, I'd like it to take the elements that have previously been constructed. Is there such a function? If there isn't one, is this kind of function feasible to write? If such a function doesn't exist and is feasible to write, I'd be happy to try to write and contribute it.
>
> Indeed there is, it's called constructN (or constructrN if you want to construct it right to left).
>
> Roman
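[For later readers: constructN builds a vector left to right, handing the generation function the immutable prefix constructed so far, which is exactly the shape of a dynamic-programming fill. A small illustrative example, computing Fibonacci numbers:]

```haskell
import qualified Data.Vector as V

-- constructN n f builds a vector of length n; at each step f is
-- given the prefix of elements already constructed, so each new
-- element can be computed from earlier results in O(1).
fibs :: Int -> V.Vector Integer
fibs n = V.constructN n step
  where
    step prefix
      | k < 2     = 1
      | otherwise = prefix V.! (k - 1) + prefix V.! (k - 2)
      where k = V.length prefix
```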
[Haskell-cafe] Dynamic Programming with Data.Vector
Hello,

I've been writing dynamic programming (DP) algorithms in imperative languages for years, and I was thinking recently about how to use it in a Haskell context. In particular, I want to write a function that takes an ordered collection of items and produces a new item to insert into the ordered collection. The most straightforward way to do this would be to use a list, something like the following:

    recurse :: [Integer] -> [Integer]
    recurse l = newValue : recurse (take (length l + 1) infiniteList)
      where newValue = ...

    infiniteList :: [Integer]
    infiniteList = initialList ++ recurse initialList
      where initialList = ...

    solution :: Integer
    solution = infiniteList !! 5

I'm assuming that this can run fast because I'm assuming the 'take' function won't actually duplicate the list ([1] doesn't actually list the running time of 'take'). Is this a correct assumption to make?

Secondarily, and most importantly for me, I'm curious about how to make this fast when the computation of 'newValue' requires random access to the inputted list. I'm assuming that I would use Vectors instead of lists for this kind of computation, and [2] describes how I can use the O(1) 'slice' instead of 'take' above. However, both of Vector's cons and snoc functions are O(n), which defeats the purpose of using this kind of algorithm. Obviously, I can solve this problem with mutable vectors, but that's quite inelegant.

Overall, I'm looking for a function, similar to Data.Vector's 'generate' function, but instead of the generation function taking the destination index, I'd like it to take the elements that have previously been constructed. Is there such a function? If there isn't one, is this kind of function feasible to write? If such a function doesn't exist and is feasible to write, I'd be happy to try to write and contribute it.
[1] http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-List.html#g:11
[2] http://hackage.haskell.org/packages/archive/vector/0.9.1/doc/html/Data-Vector.html#g:6
Re: [Haskell-cafe] Dynamic Programming with Data.Vector
Someone replied saying that I could use a HashMap and a fold to do this, and that solution should work quite well. Bonus points if there's a solution without the space overhead of a hashmap :-) I'm hoping for an unboxed vector.

--Myles

On Sun, Sep 16, 2012 at 12:40 PM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote:

> [...]
Re: [Haskell-cafe] Re-exports of resourcet in conduit
Yep, that was my problem. Thanks so much! GHC is smart enough to realize that Data.Conduit.ResourceT is the same as Control.Monad.Trans.Resource.ResourceT. Yay!

--Myles

On Sun, Jun 3, 2012 at 2:00 AM, Michael Snoyman <mich...@snoyman.com> wrote:

> The easiest thing to do is just build your code with cabal, which will ensure you're using consistent versions. (Similar questions came up twice recently on Stack Overflow [1][2].) Wiping out your ~/.ghc and installing from scratch should work also, but it's like using a tactical nuke instead of a scalpel.
>
> As for checking versions of dependencies, try `ghc-pkg describe conduit`.
>
> Michael
>
> [1] http://stackoverflow.com/questions/10729291/lifting-trouble-with-resourcet/10730909#10730909
> [2] http://stackoverflow.com/questions/10843547/snap-monad-liftio-and-ghc-7-4-1/10847401#10847401
>
> On Sun, Jun 3, 2012 at 3:01 AM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote:
>
>> [...]
[Haskell-cafe] Re-exports of resourcet in conduit
To: Michael Snoyman
CC: haskell-cafe

Hello,

I'm having a problem working with the conduit library, and was hoping you could help me out.

Data.Conduit re-exports ResourceT, MonadResource, and MonadThrow (but not ExceptionT) from Control.Monad.Trans.Resource. I have a conduit which operates on a monad in the MonadThrow class. I am trying to figure out which MonadThrow class this should be (the Data.Conduit.MonadThrow class, or the Control.Monad.Trans.Resource.MonadThrow class, since apparently GHC doesn't recognize them as the same, even though one is just a re-export of the other).

If a user of this conduit wants to chain this conduit up with something like sourceFile, the underlying monad has to be a member of Data.Conduit.MonadResource and whatever MonadThrow class I chose to use. I would like to be able to use an existing instance to lift the class of the inner monad to the class of the entire monad stack (so I don't have to tell the user of my conduit that they have to define their own instances), and the only rule that I can find that does that is the following from Data.Conduit:

    Data.Conduit.MonadThrow m => Data.Conduit.MonadThrow (Data.Conduit.ResourceT m)

However, GHC doesn't seem to think that Control.Monad.Trans.Resource.ExceptionT is an instance of Data.Conduit.MonadThrow:

    No instance for (Data.Conduit.MonadThrow (ExceptionT IO)) arising from a use of `.'

Control.Monad.Trans.Resource has a similar instance:

    Control.Monad.Trans.Resource.MonadThrow m => Control.Monad.Trans.Resource.MonadThrow (Control.Monad.Trans.Resource.ResourceT m)

but because sourceFile operates in the Data.Conduit.MonadResource class, and Control.Monad.Trans.Resource.ResourceT isn't a member of that class (it's only a member of Control.Monad.Trans.Resource.MonadResource), that doesn't help:

    No instance for (Data.Conduit.MonadResource (Control.Monad.Trans.Resource.ResourceT (ExceptionT IO))) arising from a use of `.'

It should be noted that neither module defines anything like the following:

    MonadResource m => MonadResource (ExceptionT m)

It seems like the underlying problem here is that:

1) I am required to use the Control.Monad.Trans.Resource.ExceptionT type, because Data.Conduit doesn't re-export it
2) I am required to use the Data.Conduit.MonadResource class, because sourceFile and others require it
3) There doesn't seem to be an existing instance that bridges between the two.

This seems like a fundamental flaw with re-exporting; it can only work if you re-export every single last thing from the original module. This doesn't seem tenable because the original module might not be under your control, so its author can add new symbols whenever he/she wants to.

I see three options to solve this problem:

1) Re-export Control.Monad.Trans.Resource.ExceptionT in Data.Conduit. This will work until someone adds something to the resourcet package and someone wants to use the new addition and Data.Conduit.ResourceT in the same stack.
2) Don't re-export anything in Data.Conduit; make sourceFile and others explicitly depend on types in another module, but this might break compatibility with existing programs if they use fully-qualified symbol names.
3) Make anyone who wants to use a monad stack in MonadThrow and MonadResource define their own instances. This is probably no good because it means that many different modules will implement the same instance in potentially many different ways.

I feel like option 2) is probably the best solution here. I'm perfectly happy to issue a pull request for whichever option you think is best, but I don't know which solution you think is best for your project. What do you think?

--Myles
Re: [Haskell-cafe] Re-exports of resourcet in conduit
It could be. Do you know how I can check which versions of packages other packages have built against with Cabal? Will it help if I remove all the relevant packages and then re-install only a single version?

Thanks,
Myles

On Saturday, June 2, 2012, Antoine Latter wrote:

> Is it possible that you are pulling in two different versions of the package that defines the MonadThrow class? That is, package a was built against version 1, but package b was built against version 2? This would make GHC think the type-classes were incompatible.
>
> This is just a guess - I have not tried what you are trying.
>
> On Jun 2, 2012 6:35 PM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote:
>
>> [...]
Re: [Haskell-cafe] Conduit Best Practices for leftover data
Thanks for responding to this. Some responses are inline.

On Sat, Apr 14, 2012 at 8:30 PM, Michael Snoyman <mich...@snoyman.com> wrote:

> On Thu, Apr 12, 2012 at 9:25 AM, Myles C. Maxfield <myles.maxfi...@gmail.com> wrote:
>
>> Hello, I am interested in the argument to Done, namely, leftover data. More specifically, when implementing a conduit/sink, what should the conduit specify for the (Maybe i) argument to Done in the following scenarios (please note that these scenarios only make sense if the type of 'i' is something in Monoid):
>>
>> 1) The conduit outputted the last thing that it felt like outputting, and exited willfully. There seem to be two options here - a) the conduit/sink should greedily gather up all the remaining input in the stream and mconcat them, or b) return the part of the last thing that never got represented in any part of anything outputted. Option b seems to make the most sense here.
>
> Yes, option (b) is definitely what's intended.
>
>> 2) Something upstream produced Done, so the second argument to NeedInput gets run. This is guaranteed to be run at the boundary of an item, so should it always return Nothing? Instead, should it remember all the input it has consumed for the current (yet-to-be-outputted) element, so it can let Data.Conduit know that, even though the conduit appeared to consume the past few items, it actually didn't (because it needs more input items to make an output)? Remembering this sequence could potentially have disastrous memory usage. On the other hand, it could also greedily gather everything remaining in the stream.
>
> No, nothing so complicated is intended. Most likely you'll never return any leftovers from the second field of NeedInput. One other minor point: it's also possible that the second field will be used if the *downstream* pipe returns Done.

Just to help me understand, what is a case when you want to specify something in this field? I can't think of a case when a Conduit would specify anything in this case.

>> 3) The conduit/sink encountered an error mid-item. In general, is there a commonly-accepted way to deal with this? If a conduit fails in the middle of an item, it might not be clear where it should pick up processing, so the conduit probably shouldn't even attempt to continue. It would probably be good to return some notion of where it was in the input when it failed. It could return (Done (???) (Left errcode)), but this requires that everything downstream in the pipeline be aware of Errcode, which is not ideal. I could use MonadError along with PipeM, but this approach completely abandons the part of the stream that has been processed successfully. I'd like to avoid using Exceptions if at all possible.
>
> Why avoid Exceptions? It's the right fit for the job. You can still keep your conduit pure by setting up an `ExceptionT Identity` stack, which is exactly how you can use the Data.Conduit.Text functions from pure code. Really, what you need to be asking is: is there any logical way to recover from an exception here?

I suppose this is a little off-topic, but do you prefer ExceptionT or ErrorT? Any exception/error that I'd be throwing is just a container around a String, so both of them will work fine for my purposes.

>> It doesn't seem that a user application even has any way to access leftover data anyway, so perhaps this discussion will only be relevant in a future version of Conduit. At any rate, any feedback you could give me on this issue would be greatly appreciated.
>
> Leftover data is definitely used:
>
> 1. If you compose together two `Sink`s with monadic bind, the leftovers from the first will be passed to the second. You can do that.

That's so cool! I never realized that Pipes are members of Monad.

> 2. If you use connect-and-resume ($$+), the leftovers are returned as part of the `Source`, and provided downstream.

This too is really neat :] I didn't realize how this worked.
Michael ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
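[Editorial note: the monadic-bind behavior Michael describes above can be seen directly with today's conduit API, which has evolved well past the 0.4-era Pipe constructors in this thread. This is a hedged sketch, not code from the thread; `peekThenConsume` is an illustrative name of mine:]

```haskell
import Data.Conduit (ConduitT, await, leftover, runConduitPure, (.|))
import qualified Data.Conduit.List as CL

-- A sink that pulls one element, pushes it back with `leftover`, and
-- then lets the monadically-bound CL.consume see that element again --
-- i.e. leftovers flow from one part of a bound sink to the next.
peekThenConsume :: Monad m => ConduitT Int o m (Maybe Int, [Int])
peekThenConsume = do
  mx <- await
  case mx of
    Nothing -> return (Nothing, [])
    Just x  -> do
      leftover x            -- return x to the stream as leftover data
      rest <- CL.consume    -- the next consumer sees x again
      return (Just x, rest)

-- runConduitPure (CL.sourceList [1,2,3] .| peekThenConsume)
--   == (Just 1, [1,2,3])
```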
Re: [Haskell-cafe] Conduit Best Practices for leftover data
2. If you use connect-and-resume ($$+), the leftovers are returned as part of the `Source`, and provided downstream.

I'm trying to figure out how to use this, but I'm getting a little bit confused. In particular, here is a conduit that produces an output for every 'i' inputs. I'm returning partial data when the input stream hits an EOF (and I verified that the partial data is correct with Debug.Trace), yet the output of 'partial' is ([[1,2,3,4,5]],[]) instead of ([[1,2,3,4,5]],[6,7,8]). Can you help me understand what's going on? Thanks, Myles

import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL

-- functionally the same as concatenating all the inputs, then
-- repeatedly running splitAt on the concatenation.
takeConduit :: (Num a, Monad m) => a -> C.Pipe [a1] [a1] m ()
takeConduit i = takeConduitHelper i [] []
  where
    takeConduitHelper x lout lin
      | x == 0    = C.HaveOutput (takeConduitHelper i [] lin) (return ()) $ reverse lout
      | null lin  = C.NeedInput (takeConduitHelper x lout) (C.Done (Just $ reverse lout) ())
      | otherwise = takeConduitHelper (x - 1) (head lin : lout) $ tail lin

partial :: (Num t, Monad m, Enum t) => m ([[t]], [[t]])
partial = do
  (source, output) <- CL.sourceList [[1..8]] C.$$+ (takeConduit 5 C.=$ CL.consume)
  output' <- source C.$$ CL.consume
  return (output, output')
Re: [Haskell-cafe] Conduit Best Practices for leftover data
Sorry for the spam. A similar matter is the following program, where something downstream reaches EOF right after a conduit outputs a HaveOutput. Because the type of the early-close function is just 'r' or 'm r', there is no way for the conduit to return any partial output. This means that any extra values in the chunk the conduit read are lost. Is there some way around this?

-- takeConduit as in previous email
-- partial2 outputs ([[1,2,3,4,5]],[]) instead of ([[1,2,3,4,5]],[6,7,8])
monadSink :: Monad m => CI.Sink [a1] m ([[a1]], [[a1]])
monadSink = do
  output  <- takeConduit 5 C.=$ CL.take 1
  output' <- CL.consume
  return (output, output')

partial2 :: (Num t, Monad m, Enum t) => m ([[t]], [[t]])
partial2 = CL.sourceList [[1..8]] C.$$ monadSink

Thanks, Myles
[Haskell-cafe] Conduit Best Practices for leftover data
Hello, I am interested in the argument to Done, namely, leftover data. More specifically, when implementing a conduit/sink, what should the conduit specify for the (Maybe i) argument to Done in the following scenarios? (Please note that these scenarios only make sense if the type of 'i' is something in Monoid.)

1) The conduit outputted the last thing that it felt like outputting, and exited willfully. There seem to be two options here: a) the conduit/sink should greedily gather up all the remaining input in the stream and mconcat them, or b) return the part of the last thing that never got represented in any part of anything outputted. Option b seems to make the most sense here.

2) Something upstream produced Done, so the second argument to NeedInput gets run. This is guaranteed to be run at the boundary of an item, so should it always return Nothing? Instead, should it remember all the input it has consumed for the current (yet-to-be-outputted) element, so it can let Data.Conduit know that, even though the conduit appeared to consume the past few items, it actually didn't (because it needs more input items to make an output)? Remembering this sequence could potentially have disastrous memory usage. On the other hand, it could also greedily gather everything remaining in the stream.

3) The conduit/sink encountered an error mid-item. In general, is there a commonly-accepted way to deal with this? If a conduit fails in the middle of an item, it might not be clear where it should pick up processing, so the conduit probably shouldn't even attempt to continue. It would probably be good to return some notion of where it was in the input when it failed. It could return (Done (???) (Left errcode)), but this requires that everything downstream in the pipeline be aware of Errcode, which is not ideal. I could use MonadError along with PipeM, but this approach completely abandons the part of the stream that has been processed successfully.
I'd like to avoid using Exceptions if at all possible. It doesn't seem that a user application even has any way to access leftover data anyway, so perhaps this discussion will only be relevant in a future version of Conduit. At any rate, any feedback you could give me on this issue would be greatly appreciated. Thanks, Myles ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
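[Editorial note: to make option (b) above concrete, here is a pure, list-based model of the leftover convention being asked about. Ordinary lists stand in for the stream, and the `Maybe` mirrors the `Maybe i` argument to Done; `takeChunked` is an illustrative name of mine, not the conduit API:]

```haskell
-- Take n elements from a chunked stream.  Returns (elements taken,
-- leftover tail of the last chunk touched, untouched chunks) -- i.e.
-- option (b): only the unrepresented part of the last item is leftover.
takeChunked :: Int -> [[a]] -> ([a], Maybe [a], [[a]])
takeChunked _ []     = ([], Nothing, [])
takeChunked 0 cs     = ([], Nothing, cs)
takeChunked n (c:cs)
  | length c <= n    = let (out, lo, rest) = takeChunked (n - length c) cs
                       in (c ++ out, lo, rest)
  | otherwise        = (take n c, Just (drop n c), cs)

-- takeChunked 5 [[1,2,3],[4,5,6,7,8]]
--   == ([1,2,3,4,5], Just [6,7,8], [])
```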
Re: [Haskell-cafe] Mixing Unboxed Mutable Vectors and Parsers
It's a JPEG parser. Progressive JPEG is set up where there's a vector of Word8s, and some of the entries in the vector may be 0. The JPEG has a stream of bits, and the decoder is supposed to shift in one bit to each successive element in the vector, skipping over 0s, and stop when it reaches some specified number of 0s. So if your partially decoded vector is 2, 8, 0, 12, 0, 10, 6, 0, 2, 10 and the JPEG has this bit stream 1, 1, 0, 1, 0, 0, 1, 0, ... and the JPEG says "shift in until the 3rd zero is found", that would result in the partially decoded vector being 3, 9, 0, 12, 0, 11, 6, 0, 2, 10 with the leftover part of the stream being 0, 1, 0, ... The JPEG parser has to keep track of where it is in the partially decoded vector to know how many bits to shift in, and where they belong, so the next iteration is aligned to the right place. It would be possible to keep track of this stuff throughout the parsing, and have the result of the parse be a second delta framebuffer and apply it to the original after each scan is parsed, but that's fairly ugly and I'd like to avoid doing that. If that's what I have to do, though, I guess I have to do it. Isn't there a better way? --Myles

On Sat, Apr 7, 2012 at 11:56 PM, Stephen Tetley stephen.tet...@gmail.com wrote: Hi Myles It seems odd to mix parsing (consuming input) with mutation. What problem are you trying to solve, and are you sure you can't get better phase separation than this paragraph suggests? My first idea was to simply parse all the deltas, and later apply them to the input list. However, I can't do that because the value of the deltas depend on the value they're modifying. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
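[Editorial note: the refinement pass described above can be sketched as a small pure function. This is a hedged illustration of the email's description only, not code from any real JPEG decoder; `refine` is an illustrative name of mine, and "shift in one bit" is modeled as OR-ing the bit into the coefficient, which reproduces the example values given (2→3, 8→9, 10→11):]

```haskell
import Data.Bits ((.|.))
import Data.Word (Word8)

-- refine nZeros bits coeffs: OR one bit from the stream into each
-- nonzero coefficient, skip zeros without consuming bits, and stop
-- once the nZeros-th zero is reached.  Returns the updated vector
-- (as a list) and the leftover bits.
refine :: Int -> [Word8] -> [Word8] -> ([Word8], [Word8])
refine = go
  where
    go _ bs []           = ([], bs)                   -- vector exhausted
    go n bs (0:rest)
      | n <= 1           = (0 : rest, bs)             -- reached last zero: stop
      | otherwise        = let (rest', bs') = go (n - 1) bs rest
                           in (0 : rest', bs')        -- skip zero, count it
    go _ [] rest         = (rest, [])                 -- ran out of bits
    go n (b:bs) (c:rest) = let (rest', bs') = go n bs rest
                           in (c .|. b : rest', bs')  -- shift bit into coefficient

-- refine 3 [1,1,0,1,0,0,1,0] [2,8,0,12,0,10,6,0,2,10]
--   == ([3,9,0,12,0,11,6,0,2,10], [0,1,0])
```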
[Haskell-cafe] Mixing Unboxed Mutable Vectors and Parsers
CC: Maintainers of STMonadTrans, Vector, and JuicyPixels Hello, I am writing a Haskell Attoparsec parser which will modify 2-d arrays of small values (Word8, Int8, etc.). My first idea was to simply parse all the deltas, and later apply them to the input list. However, I can't do that because the value of the deltas depend on the value they're modifying. My first pass at this program used a function of the form:

p :: [[Word8]] -> Parser [[Word8]]

This approach works; however, the program uses far too much memory. Some quick testing shows that lists of Word8s are ~52.6x larger than unboxed vectors of Word8s, and boxed vectors of Word8s are ~7.5x larger than unboxed vectors of Word8s. A better approach would be to use Data.Vector.Unboxed.Mutable and do the mutations inline with the parser. Because mutable vectors require a monad in PrimMonad to do the mutations inside of, I'd have to use a monad transformer to combine Parser and something in PrimMonad. Attoparsec doesn't support being used as a monad transformer, so I can't say something like

p :: (PrimMonad m, UM.Unbox a) => M.MVector (PrimState m) (UM.MVector (PrimState m) a) -> ParserT m ()

I can't use Parsec (instead of Attoparsec) because I require streaming semantics -- eventually I'm going to hook this up to Data.Conduit and parse directly from the net. There is STT (in the package STMonadTrans), however, so I could potentially make the function result in STT Parser (). However, STT doesn't work with Data.Vector.Mutable or Data.Vector.Unboxed.Mutable, because STT isn't a member of the PrimMonad class (as far as I can tell). STT itself doesn't define unboxed mutable vectors (only boxed mutable vectors), but I feel that giving up unboxing isn't really an option because of the memory footprint. As a general observation, it seems silly to have two different mutable vector implementations, one for STT and the other for PrimMonad. So here are my questions: 1. Is it possible to implement PrimMonad with STT?
I looked around for a little while, but couldn't find anything that did this. 2. Otherwise, is it reasonable to try to implement unboxed mutable vectors in STT? I feel this is probably going down the wrong path. 3. Are there any parsers that support streaming semantics and being used as a monad transformer? This would require rewriting my whole program to use this new parser, but if that's what I have to do, then so be it. Thanks, Myles C. Maxfield ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
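[Editorial note: the in-place mutation itself (with the parser factored out) is straightforward once inside ST. A minimal sketch with hypothetical deltas, assuming the standard vector package API; `applyDeltas` is an illustrative name of mine:]

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM
import Data.Word (Word8)

-- Apply (index, delta) pairs to an unboxed vector in place.  Each
-- delta depends on the current value at its index, which is why the
-- deltas can't simply be computed up front and merged later.
applyDeltas :: [(Int, Word8)] -> U.Vector Word8 -> U.Vector Word8
applyDeltas deltas v = runST $ do
  mv <- U.thaw v                    -- mutable working copy
  mapM_ (\(i, d) -> do
           x <- UM.read mv i        -- read the value being modified
           UM.write mv i (x + d))   -- mutate it in place
        deltas
  U.freeze mv

-- applyDeltas [(0,1),(2,3)] (U.fromList [10,20,30])
--   == U.fromList [11,20,33]
```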
Re: [Haskell-cafe] Streaming to JuicyPixels
So I started working on moving JuicyPixels to a streaming interface, and have some observations. This is going to be a pretty major change, touching pretty much every function, and the end result will end up looking very close to the code that I have already written. I'm not nearly as close to the code as the author is, and I've already made some mistakes due to not understanding how the original code is structured and laid out. Because this is essentially turning out to be a rewrite, I think it makes more sense for me to just work on my own library, and release it as a streaming alternative to JuicyPixels. How do you feel about this, Vincent? Thanks, Myles

On Wed, Feb 22, 2012 at 12:30 PM, Vincent Berthoux vincent.berth...@gmail.com wrote: Hi, Please go ahead, and github is the perfect medium for code sharing :) Regards Vincent Berthoux
Re: [Haskell-cafe] Streaming to JuicyPixels
Let's put aside the issue of getting access to the pixels before the stream is complete. How would you feel if I implemented use of the STT monad transformer on top of Get in JuicyPixels, in order to get rid of the (remaining getBytes) call, and then exposed the underlying Get interface to callers? This would allow use for streaming. Is this something that you feel that I should pursue? I can send you a GitHub pull request when I'm done. Thanks, Myles

On Wed, Feb 22, 2012 at 5:01 AM, Vincent Berthoux vincent.berth...@gmail.com wrote: Hi, I can understand your performance problems; I bumped into them before the first release of Juicy Pixels, and it took a long time to get 'correct' performance out of the box. The IDCT is not the only 'hot point'; I got problems with intermediate data structures as well. Lists have proven a bit problematic performance-wise, so I rewrote with manual recursion some things which would have been easily implemented with forM_ or mapM_. I didn't know STT existed, so it opens a new area of reflection for the streaming Jpeg decoder: instead of using the (remaining getBytes) combo, staying in the Get monad might help. The reason to expose the ST monad is that I use it internally to mutate the final image directly, and I'd prefer to avoid freezing/unfreezing the underlying array. So in order to give access to the intermediate representation, a type could be (STT s (StateT JpegDecodingState Get) (MutableImage s PixelYCbCr8)) (the Jpeg decoder only produces Image PixelYCbCr8 internally). This should allow a freeze, then a color conversion. As it would induce a performance loss, this version should exist alongside the current implementation. This is not trivial, but it's far from impossible. For the IDCT implementation, I don't think a package makes much sense; if you want to use it, just grab it and customize the interface to your needs :). Regards Vincent Berthoux
[Haskell-cafe] Streaming to JuicyPixels
Hello, I am interested in the possibility of using JuicyPixels for streaming images from the web. It doesn't appear to expose any of its internally-used Serialize.Get functionality, which is problematic for streaming - I would not like to have to stream the whole image into a buffer before the decoder can start decoding. Are there any plans on exposing this API, so I can use the runGetPartial function to facilitate streaming? In addition, does the library do much readahead? There's no point in exposing a Get interface if it's just going to wait until the stream is done to start decoding anyway. Thanks, Myles C. Maxfield ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
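[Editorial note: for background, runGetPartial drives exactly this kind of incremental decoding: it returns a Result that is Done, Fail, or Partial (a continuation awaiting more input). A hedged sketch of a chunk-feeding driver, assuming the Result constructors of recent cereal versions (where Fail also carries the unconsumed input); `feedChunks` and `getPair` are illustrative names of mine, not library functions:]

```haskell
import qualified Data.ByteString as B
import Data.Serialize.Get (Get, Result (..), getWord8, runGetPartial)
import Data.Word (Word8)

-- Drive a Get parser with a list of chunks, supplying more input each
-- time the decoder suspends with Partial -- no up-front buffering.
feedChunks :: Get a -> [B.ByteString] -> Either String a
feedChunks _ []     = Left "no input"
feedChunks g (c:cs) = go (runGetPartial g c) cs
  where
    go (Done r _)  _      = Right r
    go (Fail e _)  _      = Left e
    go (Partial k) (x:xs) = go (k x) xs   -- decoder wants more: feed next chunk
    go (Partial _) []     = Left "ran out of input"

-- A toy parser with a fixed size, so it finishes without needing EOF.
getPair :: Get (Word8, Word8)
getPair = (,) <$> getWord8 <*> getWord8

-- feedChunks getPair [B.singleton 1, B.singleton 2] == Right (1, 2)
```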
Re: [Haskell-cafe] Streaming to JuicyPixels
Hello, and thanks for the quick reply. You're right that using (remaining getBytes) won't work for streaming, as it would pull the rest of the stream into a buffer, thereby delaying further processing until the image is completely downloaded. :-( I'm a little unclear about what you mean about the use of the ST monad. There is an STT monad transformer (http://hackage.haskell.org/packages/archive/STMonadTrans/0.2/doc/html/Control-Monad-ST-Trans.html), so you could wrap that around Get. Is that what you're referring to? As an aside: I didn't realize that JuicyPixels existed, so I wrote a JPEG decoder specifically designed for streaming - it doesn't request a byte of input unless it has to, uses a StateT (wrapped around Attoparsec) to keep track of which bit in the current byte is next, and does the Huffman decoding in-line. However, I didn't use ST for the IDCT, so my own decoder has serious performance problems. This prompted me to start searching around for a solution, and I came across JuicyPixels, which already exists and is much faster than my own implementation. I'm hoping to get rid of my own decoder and just use JuicyPixels. If you're curious, my own code is here: https://github.com/litherum/jpeg. Is it reasonable to extend JuicyPixels to fit my use case? It sounds like JuicyPixels wouldn't work so well as it stands. I'd be happy to do whatever work is necessary to help out and get JuicyPixels usable for me. However, if that would require a full (or near-full) rewrite, it might make more sense for me to use my own implementation with your IDCT. Is there a way we can share just the IDCT between our two repositories? Perhaps making a new IDCT8 library that we can both depend on? As for what API I'd like to be able to use, just a Get DynamicImage should suffice (assuming it has streaming semantics as described above). It would be really nice if it was possible to get at the incomplete image before the stream is completed (so the image could slowly update as more data arrives from the network), but I'm not quite sure how to elegantly express that. Do you have any ideas? I think that having 2 native jpeg decoders (actually 3, because of this package: http://hackage.haskell.org/package/jpeg) is detrimental to the Haskell community, and I would really like to use JuicyPixels :D Thanks, Myles C. Maxfield

On Mon, Feb 20, 2012 at 3:01 PM, Vincent Berthoux vincent.berth...@gmail.com wrote: Hi, I can expose the low level parsing, but you would only get the chunks/frames/sections of the image; Cereal is mainly used to parse the structure of the image, not to do the raw processing. For the raw processing, I rely on `remaining getBytes` to be able to manipulate data at bit level or to feed it to zlib, and the documentation clearly states that remaining doesn't work well with runGetPartial, so no read ahead, but even worse for streaming :). To be fair, I never thought of this use case, and exposing a partially decoded image would impose the use of the ST Monad somehow, and Serialize is not a monad transformer, making it a bit hard to implement. Out of curiosity, what kind of API would you hope for, for this kind of functionality? Regards Vincent Berthoux

___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Contributing to http-conduit
After all these commits have been flying around, I have yet another question: the 'HTTP' package defines Network.Browser which is a State monad which keeps state about a browser (i.e. a cookie jar, a proxy, redirection parameters, etc.) It would be pretty straightforward to implement this kind of functionality on top of http-conduit. I was originally going to do it and release it as its own package, but it may be beneficial to add such a module to the existing http-conduit package. Should I add it in to the existing package, or release it as its own package? --Myles On Mon, Feb 6, 2012 at 12:15 AM, Michael Snoyman mich...@snoyman.comwrote: Just an FYI for everyone: Myles sent an (incredibly thorough) pull request to handle cookies: https://github.com/snoyberg/http-conduit/pull/13 Thanks! On Sun, Feb 5, 2012 at 8:20 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: 1. The spec defines a grammar for the attributes. They're in uppercase. 2. Yes - 1.3 is the first version that lists DiffTime as an instance of RealFrac (so I can use the 'floor' function to pull out the number of seconds to render it) 3. I'll see what I can do. --Myles On Sat, Feb 4, 2012 at 9:06 PM, Michael Snoyman mich...@snoyman.com wrote: Looks good, a few questions/requests: 1. Is there a reason to upper-case all the attributes? 2. Is the time = 1.3 a requirements? Because that can cause a lot of trouble for people. 3. Can you send the patch as a Github pull request? It's easier to track that way. Michael On Sat, Feb 4, 2012 at 1:21 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Here is the patch to Web.Cookie. I didn't modify the tests at all because they were already broken - they looked like they hadn't been updated since SetCookie only had 5 parameters. I did verify by hand that the patch works, though. Thanks, Myles On Thu, Feb 2, 2012 at 11:26 PM, Myles C. 
Maxfield myles.maxfi...@gmail.com wrote: Alright, I'll make a small patch that adds 2 fields to SetCookie: setCookieMaxAge :: Maybe DiffTime setCookieSecureOnly :: Bool I've also gotten started on those cookie functions. I'm currently writing tests for them. @Chris: The best advice I can give is that Chrome (what I'm using as a source on all this) has the data baked into a .cc file. However, they have directions in a README and a script which will parse the list and generate that source file. I recommend doing this. That way, the Haskell module would have 2 source files: one file that reads the list and generates the second file, which is a very large source file that contains each element in the list. The list should export `elem`-type queries. I'm not quite sure how to handle wildcards that appear in the list - that part is up to you. Thanks for helping out with this :] --Myles On Thu, Feb 2, 2012 at 10:53 PM, Michael Snoyman mich...@snoyman.com wrote: Looks good to me too. I agree with Aristid: let's make the change to cookie itself. Do you want to send a pull request? I'm also considering making the SetCookie constructor hidden like we have for Request, so that if in the future we realize we need to add some other settings, it doesn't break the API. Chris: I would recommend compiling it into the module. Best bet would likely be converting the source file to Haskell source. Michael On Fri, Feb 3, 2012 at 6:32 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright. After reading the spec, I have these questions / concerns: The spec supports the Max-Age cookie attribute, which Web.Cookies doesn't. I see two possible solutions to this. The first is to have parseSetCookie take a UTCTime as an argument which will represent the current time so it can populate the setCookieExpires field by adding the Max-Age attribute to the current time.
Alternatively, that function can return an IO SetCookie so it can ask for the current time by itself (which I think is inferior to taking the current time as an argument). Note that the spec says to prefer Max-Age over Expires. Add a field to SetCookie of type Maybe DiffTime which represents the Max-Age attribute Cookie code should be aware of the Public Suffix List as a part of its domain verification. The cookie code only needs to be able to tell if a specific string is in the list (W.Ascii -> Bool) I propose making an entirely unrelated package, public-suffix-list, with a module Network.PublicSuffixList, which will expose this function, as well as functions about parsing the list itself. Thoughts? Web.Cookie doesn't have a secure-only attribute. Adding one in is straightforward enough. The spec describes cookies as a property
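The "take the current time as an argument" option discussed above, together with the spec's rule that Max-Age wins over Expires, can be sketched as a pure function. Everything here is a hypothetical simplification: plain Integer seconds stand in for UTCTime/DiffTime, and `resolveExpiry` is a made-up name, not part of Web.Cookie.

```haskell
-- Sketch: resolve a parsed Max-Age and Expires into one absolute expiry
-- time, preferring Max-Age as RFC 6265 requires. Integer seconds stand
-- in for UTCTime/DiffTime; all names here are hypothetical.
resolveExpiry :: Integer          -- current time, in seconds
              -> Maybe Integer    -- parsed Max-Age (relative seconds)
              -> Maybe Integer    -- parsed Expires (absolute seconds)
              -> Maybe Integer    -- resulting absolute expiry time
resolveExpiry now (Just maxAge) _ = Just (now + maxAge)  -- Max-Age wins
resolveExpiry _   Nothing (Just t) = Just t              -- fall back to Expires
resolveExpiry _   Nothing Nothing  = Nothing             -- session cookie
```

Because the current time is an ordinary argument, the function stays pure and is trivial to test, which is the advantage Myles cites over returning IO SetCookie.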
Re: [Haskell-cafe] Contributing to http-conduit
That's a pretty reasonable thing to do. Didn't you say that I should keep the 'prefer max-age to expires' logic out of Web.Cookie? What do you think, Michael? --Myles On Sat, Feb 4, 2012 at 4:03 AM, Aristid Breitkreuz arist...@googlemail.com wrote: Is it possible to have both an Expires and a Max-age? If not, maybe you should make a type like data Expiry = NeverExpires | ExpiresAt UTCTime | ExpiresIn DiffTime 2012/2/4 Myles C. Maxfield myles.maxfi...@gmail.com: Here is the patch to Web.Cookie. I didn't modify the tests at all because they were already broken - they looked like they hadn't been updated since SetCookie only had 5 parameters. I did verify by hand that the patch works, though. Thanks, Myles On Thu, Feb 2, 2012 at 11:26 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright, I'll make a small patch that adds 2 fields to SetCookie: setCookieMaxAge :: Maybe DiffTime setCookieSecureOnly :: Bool I've also gotten started on those cookie functions. I'm currently writing tests for them. @Chris: The best advice I can give is that Chrome (what I'm using as a source on all this) has the data baked into a .cc file. However, they have directions in a README and a script which will parse the list and generate that source file. I recommend doing this. That way, the Haskell module would have 2 source files: one file that reads the list and generates the second file, which is a very large source file that contains each element in the list. The list should export `elem`-type queries. I'm not quite sure how to handle wildcards that appear in the list - that part is up to you. Thanks for helping out with this :] --Myles On Thu, Feb 2, 2012 at 10:53 PM, Michael Snoyman mich...@snoyman.com wrote: Looks good to me too. I agree with Aristid: let's make the change to cookie itself. Do you want to send a pull request?
I'm also considering making the SetCookie constructor hidden like we have for Request, so that if in the future we realize we need to add some other settings, it doesn't break the API. Chris: I would recommend compiling it into the module. Best bet would likely be converting the source file to Haskell source. Michael On Fri, Feb 3, 2012 at 6:32 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright. After reading the spec, I have these questions / concerns: The spec supports the Max-Age cookie attribute, which Web.Cookies doesn't. I see two possible solutions to this. The first is to have parseSetCookie take a UTCTime as an argument which will represent the current time so it can populate the setCookieExpires field by adding the Max-Age attribute to the current time. Alternatively, that function can return an IO SetCookie so it can ask for the current time by itself (which I think is inferior to taking the current time as an argument). Note that the spec says to prefer Max-Age over Expires. Add a field to SetCookie of type Maybe DiffTime which represents the Max-Age attribute Cookie code should be aware of the Public Suffix List as a part of its domain verification. The cookie code only needs to be able to tell if a specific string is in the list (W.Ascii -> Bool) I propose making an entirely unrelated package, public-suffix-list, with a module Network.PublicSuffixList, which will expose this function, as well as functions about parsing the list itself. Thoughts? Web.Cookie doesn't have a secure-only attribute. Adding one in is straightforward enough. The spec describes cookies as a property of HTTP, not of the World Wide Web. Perhaps Web.Cookie should be renamed? Just a thought; it doesn't really matter to me. As for Network.HTTP.Conduit.Cookie, the spec describes in section 5.3 Storage Model what fields a Cookie has.
Here is my proposal for the functions it will expose: receiveSetCookie :: SetCookie -> Req.Request m -> UTCTime -> Bool -> CookieJar -> CookieJar Runs the algorithm described in section 5.3 Storage Model The UTCTime is the current-time, the Bool is whether or not the caller is an HTTP-based API (as opposed to JavaScript or anything else) updateCookieJar :: Res.Response a -> Req.Request m -> UTCTime -> CookieJar -> (CookieJar, Res.Response a) Applies receiveSetCookie to a Response. The output CookieJar is stripped of any Set-Cookie headers. Specifies True for the Bool in receiveSetCookie computeCookieString :: Req.Request m -> CookieJar -> UTCTime -> Bool -> (W.Ascii, CookieJar) Runs the algorithm described in section 5.4 The Cookie Header The UTCTime and Bool are the same as in receiveSetCookie insertCookiesIntoRequest :: Req.Request m -> CookieJar -> UTCTime -> (Req.Request m, CookieJar) Applies computeCookieString to a Request. The output cookie jar has
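Aristid's suggested Expiry type from this message could be fleshed out roughly as below. This is only a sketch: Integer seconds stand in for UTCTime and DiffTime so the example stays dependency-free, and `absoluteExpiry` is an invented helper, not part of any package.

```haskell
-- Sketch of Aristid's single-field alternative to separate Expires and
-- Max-Age fields. Integer seconds are a hypothetical stand-in for
-- UTCTime (ExpiresAt) and DiffTime (ExpiresIn).
data Expiry
  = NeverExpires
  | ExpiresAt Integer   -- absolute time
  | ExpiresIn Integer   -- relative Max-Age
  deriving (Eq, Show)

-- With one Expiry field, the "prefer Max-Age over Expires" decision is
-- made once by the parser; consumers just resolve it against "now".
absoluteExpiry :: Integer -> Expiry -> Maybe Integer
absoluteExpiry _   NeverExpires  = Nothing
absoluteExpiry _   (ExpiresAt t) = Just t
absoluteExpiry now (ExpiresIn d) = Just (now + d)
```

The design appeal is exactly the point raised in the thread: the type makes it impossible to carry both an Expires and a Max-Age at once, so no downstream code has to remember the preference rule.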
Re: [Haskell-cafe] Contributing to http-conduit
1. The spec defines a grammar for the attributes. They're in uppercase. 2. Yes - 1.3 is the first version that lists DiffTime as an instance of RealFrac (so I can use the 'floor' function to pull out the number of seconds to render it) 3. I'll see what I can do. --Myles On Sat, Feb 4, 2012 at 9:06 PM, Michael Snoyman mich...@snoyman.com wrote: Looks good, a few questions/requests: 1. Is there a reason to upper-case all the attributes? 2. Is the time >= 1.3 a requirement? Because that can cause a lot of trouble for people. 3. Can you send the patch as a Github pull request? It's easier to track that way. Michael On Sat, Feb 4, 2012 at 1:21 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Here is the patch to Web.Cookie. I didn't modify the tests at all because they were already broken - they looked like they hadn't been updated since SetCookie only had 5 parameters. I did verify by hand that the patch works, though. Thanks, Myles On Thu, Feb 2, 2012 at 11:26 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright, I'll make a small patch that adds 2 fields to SetCookie: setCookieMaxAge :: Maybe DiffTime setCookieSecureOnly :: Bool I've also gotten started on those cookie functions. I'm currently writing tests for them. @Chris: The best advice I can give is that Chrome (what I'm using as a source on all this) has the data baked into a .cc file. However, they have directions in a README and a script which will parse the list and generate that source file. I recommend doing this. That way, the Haskell module would have 2 source files: one file that reads the list and generates the second file, which is a very large source file that contains each element in the list. The list should export `elem`-type queries. I'm not quite sure how to handle wildcards that appear in the list - that part is up to you. Thanks for helping out with this :] --Myles On Thu, Feb 2, 2012 at 10:53 PM, Michael Snoyman mich...@snoyman.com wrote: Looks good to me too.
I agree with Aristid: let's make the change to cookie itself. Do you want to send a pull request? I'm also considering making the SetCookie constructor hidden like we have for Request, so that if in the future we realize we need to add some other settings, it doesn't break the API. Chris: I would recommend compiling it into the module. Best bet would likely be converting the source file to Haskell source. Michael On Fri, Feb 3, 2012 at 6:32 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright. After reading the spec, I have these questions / concerns: The spec supports the Max-Age cookie attribute, which Web.Cookies doesn't. I see two possible solutions to this. The first is to have parseSetCookie take a UTCTime as an argument which will represent the current time so it can populate the setCookieExpires field by adding the Max-Age attribute to the current time. Alternatively, that function can return an IO SetCookie so it can ask for the current time by itself (which I think is inferior to taking the current time as an argument). Note that the spec says to prefer Max-Age over Expires. Add a field to SetCookie of type Maybe DiffTime which represents the Max-Age attribute Cookie code should be aware of the Public Suffix List as a part of its domain verification. The cookie code only needs to be able to tell if a specific string is in the list (W.Ascii -> Bool) I propose making an entirely unrelated package, public-suffix-list, with a module Network.PublicSuffixList, which will expose this function, as well as functions about parsing the list itself. Thoughts? Web.Cookie doesn't have a secure-only attribute. Adding one in is straightforward enough. The spec describes cookies as a property of HTTP, not of the World Wide Web. Perhaps Web.Cookie should be renamed? Just a thought; it doesn't really matter to me. As for Network.HTTP.Conduit.Cookie, the spec describes in section 5.3 Storage Model what fields a Cookie has.
Here is my proposal for the functions it will expose: receiveSetCookie :: SetCookie -> Req.Request m -> UTCTime -> Bool -> CookieJar -> CookieJar Runs the algorithm described in section 5.3 Storage Model The UTCTime is the current-time, the Bool is whether or not the caller is an HTTP-based API (as opposed to JavaScript or anything else) updateCookieJar :: Res.Response a -> Req.Request m -> UTCTime -> CookieJar -> (CookieJar, Res.Response a) Applies receiveSetCookie to a Response. The output CookieJar is stripped of any Set-Cookie headers. Specifies True for the Bool in receiveSetCookie computeCookieString :: Req.Request m -> CookieJar -> UTCTime -> Bool -> (W.Ascii, CookieJar) Runs the algorithm described in section 5.4 The Cookie Header
Re: [Haskell-cafe] Contributing to http-conduit
Here is the patch to Web.Cookie. I didn't modify the tests at all because they were already broken - they looked like they hadn't been updated since SetCookie only had 5 parameters. I did verify by hand that the patch works, though. Thanks, Myles On Thu, Feb 2, 2012 at 11:26 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright, I'll make a small patch that adds 2 fields to SetCookie: setCookieMaxAge :: Maybe DiffTime setCookieSecureOnly :: Bool I've also gotten started on those cookie functions. I'm currently writing tests for them. @Chris: The best advice I can give is that Chrome (what I'm using as a source on all this) has the data baked into a .cc file. However, they have directions in a README and a script which will parse the list and generate that source file. I recommend doing this. That way, the Haskell module would have 2 source files: one file that reads the list and generates the second file, which is a very large source file that contains each element in the list. The list should export `elem`-type queries. I'm not quite sure how to handle wildcards that appear in the list - that part is up to you. Thanks for helping out with this :] --Myles On Thu, Feb 2, 2012 at 10:53 PM, Michael Snoyman mich...@snoyman.com wrote: Looks good to me too. I agree with Aristid: let's make the change to cookie itself. Do you want to send a pull request? I'm also considering making the SetCookie constructor hidden like we have for Request, so that if in the future we realize we need to add some other settings, it doesn't break the API. Chris: I would recommend compiling it into the module. Best bet would likely be converting the source file to Haskell source. Michael On Fri, Feb 3, 2012 at 6:32 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright. After reading the spec, I have these questions / concerns: The spec supports the Max-Age cookie attribute, which Web.Cookies doesn't. I see two possible solutions to this.
The first is to have parseSetCookie take a UTCTime as an argument which will represent the current time so it can populate the setCookieExpires field by adding the Max-Age attribute to the current time. Alternatively, that function can return an IO SetCookie so it can ask for the current time by itself (which I think is inferior to taking the current time as an argument). Note that the spec says to prefer Max-Age over Expires. Add a field to SetCookie of type Maybe DiffTime which represents the Max-Age attribute Cookie code should be aware of the Public Suffix List as a part of its domain verification. The cookie code only needs to be able to tell if a specific string is in the list (W.Ascii -> Bool) I propose making an entirely unrelated package, public-suffix-list, with a module Network.PublicSuffixList, which will expose this function, as well as functions about parsing the list itself. Thoughts? Web.Cookie doesn't have a secure-only attribute. Adding one in is straightforward enough. The spec describes cookies as a property of HTTP, not of the World Wide Web. Perhaps Web.Cookie should be renamed? Just a thought; it doesn't really matter to me. As for Network.HTTP.Conduit.Cookie, the spec describes in section 5.3 Storage Model what fields a Cookie has. Here is my proposal for the functions it will expose: receiveSetCookie :: SetCookie -> Req.Request m -> UTCTime -> Bool -> CookieJar -> CookieJar Runs the algorithm described in section 5.3 Storage Model The UTCTime is the current-time, the Bool is whether or not the caller is an HTTP-based API (as opposed to JavaScript or anything else) updateCookieJar :: Res.Response a -> Req.Request m -> UTCTime -> CookieJar -> (CookieJar, Res.Response a) Applies receiveSetCookie to a Response. The output CookieJar is stripped of any Set-Cookie headers.
Specifies True for the Bool in receiveSetCookie computeCookieString :: Req.Request m -> CookieJar -> UTCTime -> Bool -> (W.Ascii, CookieJar) Runs the algorithm described in section 5.4 The Cookie Header The UTCTime and Bool are the same as in receiveSetCookie insertCookiesIntoRequest :: Req.Request m -> CookieJar -> UTCTime -> (Req.Request m, CookieJar) Applies computeCookieString to a Request. The output cookie jar has updated last-accessed-times. Specifies True for the Bool in computeCookieString evictExpiredCookies :: CookieJar -> UTCTime -> CookieJar Runs the algorithm described in the last part of section 5.3 Storage Model This will make the relevant part of 'http' look like: go count req'' cookie_jar'' = do now <- liftIO $ getCurrentTime let (req', cookie_jar') = insertCookiesIntoRequest req'' (evictExpiredCookies cookie_jar'' now) now res' <- httpRaw req' manager let (cookie_jar, res) = updateCookieJar res' req' now cookie_jar' case getRedirectedRequest req
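The shape of the 'go' loop quoted above, threading the cookie jar through each hop and stopping when there is no redirect or the count runs out, can be sketched with toy types. All names here (Response, CookieJar, followRedirects, the fetch function) are hypothetical stand-ins, not http-conduit's real API, and the loop is pure rather than living in a monad.

```haskell
-- Toy stand-ins for http-conduit's types, to show the control flow only.
data Response = Response
  { redirectTo :: Maybe String  -- where a 3xx response points, if anywhere
  , respBody   :: String
  }

type CookieJar = [String]       -- placeholder for the real jar

-- Thread the jar through each hop; stop at the redirect limit or when
-- the response carries no redirect, mirroring the quoted 'go' loop.
followRedirects :: Int
                -> (String -> CookieJar -> (Response, CookieJar))  -- "httpRaw" stand-in
                -> String -> CookieJar -> (Response, CookieJar)
followRedirects count fetch url jar =
  let (res, jar') = fetch url jar   -- perform one request, updating the jar
  in case redirectTo res of
       Just url' | count > 0 -> followRedirects (count - 1) fetch url' jar'
       _                     -> (res, jar')
```

The key property, as in the proposal, is that cookie updates from every intermediate response survive into the final result because the jar is an explicit loop argument.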
Re: [Haskell-cafe] Contributing to http-conduit
Alright. After reading the spec, I have these questions / concerns: - The spec supports the Max-Age cookie attribute, which Web.Cookies doesn't. - I see two possible solutions to this. The first is to have parseSetCookie take a UTCTime as an argument which will represent the current time so it can populate the setCookieExpires field by adding the Max-Age attribute to the current time. Alternatively, that function can return an IO SetCookie so it can ask for the current time by itself (which I think is inferior to taking the current time as an argument). Note that the spec says to prefer Max-Age over Expires. - Add a field to SetCookie of type Maybe DiffTime which represents the Max-Age attribute - Cookie code should be aware of the Public Suffix List (http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat) as a part of its domain verification. The cookie code only needs to be able to tell if a specific string is in the list (W.Ascii -> Bool) - I propose making an entirely unrelated package, public-suffix-list, with a module Network.PublicSuffixList, which will expose this function, as well as functions about parsing the list itself. Thoughts? - Web.Cookie doesn't have a secure-only attribute. Adding one in is straightforward enough. - The spec describes cookies as a property of HTTP, not of the World Wide Web. Perhaps Web.Cookie should be renamed? Just a thought; it doesn't really matter to me. As for Network.HTTP.Conduit.Cookie, the spec describes in section 5.3 Storage Model what fields a Cookie has.
Here is my proposal for the functions it will expose: - receiveSetCookie :: SetCookie -> Req.Request m -> UTCTime -> Bool -> CookieJar -> CookieJar - Runs the algorithm described in section 5.3 Storage Model - The UTCTime is the current-time, the Bool is whether or not the caller is an HTTP-based API (as opposed to JavaScript or anything else) - updateCookieJar :: Res.Response a -> Req.Request m -> UTCTime -> CookieJar -> (CookieJar, Res.Response a) - Applies receiveSetCookie to a Response. The output CookieJar is stripped of any Set-Cookie headers. - Specifies True for the Bool in receiveSetCookie - computeCookieString :: Req.Request m -> CookieJar -> UTCTime -> Bool -> (W.Ascii, CookieJar) - Runs the algorithm described in section 5.4 The Cookie Header - The UTCTime and Bool are the same as in receiveSetCookie - insertCookiesIntoRequest :: Req.Request m -> CookieJar -> UTCTime -> (Req.Request m, CookieJar) - Applies computeCookieString to a Request. The output cookie jar has updated last-accessed-times. - Specifies True for the Bool in computeCookieString - evictExpiredCookies :: CookieJar -> UTCTime -> CookieJar - Runs the algorithm described in the last part of section 5.3 Storage Model This will make the relevant part of 'http' look like: go count req'' cookie_jar'' = do now <- liftIO $ getCurrentTime let (req', cookie_jar') = insertCookiesIntoRequest req'' (evictExpiredCookies cookie_jar'' now) now res' <- httpRaw req' manager let (cookie_jar, res) = updateCookieJar res' req' now cookie_jar' case getRedirectedRequest req' (responseHeaders res) (W.statusCode (statusCode res)) of Just req -> go (count - 1) req cookie_jar Nothing -> return res I plan to not allow for a user-supplied cookieFilter function. If they want that functionality, they can re-implement the redirection-following logic. Any thoughts on any of this? Thanks, Myles On Wed, Feb 1, 2012 at 5:19 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Nope. I'm not. The RFC is very explicit about how to handle cookies.
As soon as I'm finished making sense of it (in terms of Haskell) I'll send another proposal email. On Feb 1, 2012 3:25 AM, Michael Snoyman mich...@snoyman.com wrote: You mean you're *not* making this proposal? On Wed, Feb 1, 2012 at 7:30 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Well, this is embarrassing. Please disregard my previous email. I should learn to read the RFC *before* submitting proposals. --Myles On Tue, Jan 31, 2012 at 6:37 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Here are my initial ideas about supporting cookies. Note that I'm using Chrome for ideas since it's open source. Network/HTTP/Conduit/Cookies.hs file Exporting the following symbols: type StuffedCookie = SetCookie A regular SetCookie can have Nothing for its Domain and Path attributes. A StuffedCookie has to have these fields set. type CookieJar = [StuffedCookie] Chrome's cookie jar is implemented as (the C++ equivalent of) Map W.Ascii StuffedCookie. The key is the eTLD+1 of the domain, so lookups for all cookies for a given domain are fast. I think I'll stay with just a list
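The list-based jar with expiry eviction that getRelevantCookies is described as performing above might look like this minimal sketch. The types are hypothetical simplifications (Integer seconds stand in for UTCTime, and a cookie is reduced to name/value/expiry); `evictExpired` and `storeCookie` are invented names covering only a tiny slice of RFC 6265 section 5.3.

```haskell
-- Hypothetical, simplified model: the jar is a plain list, as proposed.
data Cookie = Cookie
  { cName    :: String
  , cValue   :: String
  , cExpires :: Integer   -- absolute expiry time, in seconds
  } deriving (Eq, Show)

type CookieJar = [Cookie]

-- Drop every cookie whose expiry time is not after "now" (the eviction
-- half of getRelevantCookies / evictExpiredCookies).
evictExpired :: CookieJar -> Integer -> CookieJar
evictExpired jar now = filter (\c -> cExpires c > now) jar

-- Store a freshly received cookie, replacing any same-named cookie.
storeCookie :: Cookie -> Integer -> CookieJar -> CookieJar
storeCookie c now jar
  | cExpires c <= now = others          -- expired on arrival: only evict
  | otherwise         = c : others
  where others = filter (\c' -> cName c' /= cName c) jar
```

Passing `now` as an argument keeps both functions pure and testable, which is exactly the rationale given in the message for pulling the time out of getRelevantCookies.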
Re: [Haskell-cafe] Contributing to http-conduit
Alright, I'll make a small patch that adds 2 fields to SetCookie: setCookieMaxAge :: Maybe DiffTime setCookieSecureOnly :: Bool I've also gotten started on those cookie functions. I'm currently writing tests for them. @Chris: The best advice I can give is that Chrome (what I'm using as a source on all this) has the data baked into a .cc file. However, they have directions in a README and a script which will parse the list and generate that source file. I recommend doing this. That way, the Haskell module would have 2 source files: one file that reads the list and generates the second file, which is a very large source file that contains each element in the list. The list should export `elem`-type queries. I'm not quite sure how to handle wildcards that appear in the list - that part is up to you. Thanks for helping out with this :] --Myles On Thu, Feb 2, 2012 at 10:53 PM, Michael Snoyman mich...@snoyman.com wrote: Looks good to me too. I agree with Aristid: let's make the change to cookie itself. Do you want to send a pull request? I'm also considering making the SetCookie constructor hidden like we have for Request, so that if in the future we realize we need to add some other settings, it doesn't break the API. Chris: I would recommend compiling it into the module. Best bet would likely be converting the source file to Haskell source. Michael On Fri, Feb 3, 2012 at 6:32 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright. After reading the spec, I have these questions / concerns: The spec supports the Max-Age cookie attribute, which Web.Cookies doesn't. I see two possible solutions to this. The first is to have parseSetCookie take a UTCTime as an argument which will represent the current time so it can populate the setCookieExpires field by adding the Max-Age attribute to the current time.
Alternatively, that function can return an IO SetCookie so it can ask for the current time by itself (which I think is inferior to taking the current time as an argument). Note that the spec says to prefer Max-Age over Expires. Add a field to SetCookie of type Maybe DiffTime which represents the Max-Age attribute Cookie code should be aware of the Public Suffix List as a part of its domain verification. The cookie code only needs to be able to tell if a specific string is in the list (W.Ascii -> Bool) I propose making an entirely unrelated package, public-suffix-list, with a module Network.PublicSuffixList, which will expose this function, as well as functions about parsing the list itself. Thoughts? Web.Cookie doesn't have a secure-only attribute. Adding one in is straightforward enough. The spec describes cookies as a property of HTTP, not of the World Wide Web. Perhaps Web.Cookie should be renamed? Just a thought; it doesn't really matter to me. As for Network.HTTP.Conduit.Cookie, the spec describes in section 5.3 Storage Model what fields a Cookie has. Here is my proposal for the functions it will expose: receiveSetCookie :: SetCookie -> Req.Request m -> UTCTime -> Bool -> CookieJar -> CookieJar Runs the algorithm described in section 5.3 Storage Model The UTCTime is the current-time, the Bool is whether or not the caller is an HTTP-based API (as opposed to JavaScript or anything else) updateCookieJar :: Res.Response a -> Req.Request m -> UTCTime -> CookieJar -> (CookieJar, Res.Response a) Applies receiveSetCookie to a Response. The output CookieJar is stripped of any Set-Cookie headers. Specifies True for the Bool in receiveSetCookie computeCookieString :: Req.Request m -> CookieJar -> UTCTime -> Bool -> (W.Ascii, CookieJar) Runs the algorithm described in section 5.4 The Cookie Header The UTCTime and Bool are the same as in receiveSetCookie insertCookiesIntoRequest :: Req.Request m -> CookieJar -> UTCTime -> (Req.Request m, CookieJar) Applies computeCookieString to a Request.
The output cookie jar has updated last-accessed-times. Specifies True for the Bool in computeCookieString evictExpiredCookies :: CookieJar -> UTCTime -> CookieJar Runs the algorithm described in the last part of section 5.3 Storage Model This will make the relevant part of 'http' look like: go count req'' cookie_jar'' = do now <- liftIO $ getCurrentTime let (req', cookie_jar') = insertCookiesIntoRequest req'' (evictExpiredCookies cookie_jar'' now) now res' <- httpRaw req' manager let (cookie_jar, res) = updateCookieJar res' req' now cookie_jar' case getRedirectedRequest req' (responseHeaders res) (W.statusCode (statusCode res)) of Just req -> go (count - 1) req cookie_jar Nothing -> return res I plan to not allow for a user-supplied cookieFilter function. If they want that functionality, they can re-implement the redirection-following logic. Any thoughts on any of this? Thanks, Myles
Re: [Haskell-cafe] Contributing to http-conduit
Nope. I'm not. The RFC is very explicit about how to handle cookies. As soon as I'm finished making sense of it (in terms of Haskell) I'll send another proposal email. On Feb 1, 2012 3:25 AM, Michael Snoyman mich...@snoyman.com wrote: You mean you're *not* making this proposal? On Wed, Feb 1, 2012 at 7:30 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Well, this is embarrassing. Please disregard my previous email. I should learn to read the RFC *before* submitting proposals. --Myles On Tue, Jan 31, 2012 at 6:37 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Here are my initial ideas about supporting cookies. Note that I'm using Chrome for ideas since it's open source. Network/HTTP/Conduit/Cookies.hs file Exporting the following symbols: type StuffedCookie = SetCookie A regular SetCookie can have Nothing for its Domain and Path attributes. A StuffedCookie has to have these fields set. type CookieJar = [StuffedCookie] Chrome's cookie jar is implemented as (the C++ equivalent of) Map W.Ascii StuffedCookie. The key is the eTLD+1 of the domain, so lookups for all cookies for a given domain are fast. I think I'll stay with just a list of StuffedCookies just to keep it simple. Perhaps a later revision can implement the faster map. getRelevantCookies :: Request m -> CookieJar -> UTCTime -> (CookieJar, Cookies) Gets all the cookies from the cookie jar that should be set for the given Request. The time argument is whatever now is (it's pulled out of the function so the function can remain pure and easily testable) The function will also remove expired cookies from the cookie jar (given what now is) and return the filtered cookie jar putRelevantCookies :: Request m -> CookieJar -> [StuffedCookie] -> CookieJar Insert cookies from a server response into the cookie jar.
The first argument is only used for checking to see which cookies are valid (which cookies match the requested domain, etc, so site1.com can't set a cookie for site2.com) stuffCookie :: Request m -> SetCookie -> StuffedCookie If the SetCookie's fields are Nothing, fill them in given the Request from which it originated getCookies :: Response a -> ([SetCookie], Response a) Pull cookies out of a server response. Return the response with the Set-Cookie headers filtered out putCookies :: Request a -> Cookies -> Request a A wrapper around renderCookies. Inserts some cookies into a request. Doesn't overwrite cookies that are already set in the request These functions will be exported from Network.HTTP.Conduit as well, so callers can use them to re-implement redirection chains I won't implement a cookie filtering function (like what Network.Browser has) If you want to have arbitrary handling of cookies, re-implement redirection following. It's not very difficult if you use the API provided, and the 'http' function is open source so you can use that as a reference. I will implement the functions according to RFC 6265 I will also need to write the following functions. Should they also be exported? canonicalizeDomain :: W.Ascii -> W.Ascii turns ..a.b.c..d.com... to a.b.c.d.com Technically necessary for domain matching (Chrome does it) Perhaps unnecessary for a first pass? Perhaps we can trust users for now? domainMatches :: W.Ascii -> W.Ascii -> Maybe W.Ascii Does the first domain match against the second domain? If so, return the prefix of the first that isn't in the second pathMatches :: W.Ascii -> W.Ascii -> Bool Do the paths match? In order to implement domain matching, I have to have knowledge of the Public Suffix List so I know that sub1.sub2.pvt.k12.wy.us can set a cookie for sub2.pvt.k12.wy.us but not for k12.wy.us (because pvt.k12.wy.us is a suffix). There are a variety of ways to implement this.
As far as I can tell, Chrome does it by using a script (which a human periodically runs) which parses the list and creates a .cc file that is included in the build. I might be wrong about the execution of the script; it might be a build step. If it is a build step, however, it is suspicious that a build target would try to download a file... Any more elegant ideas? Feedback on any/all of the above would be very helpful before I go off into the weeds on this project. Thanks, Myles C. Maxfield On Sat, Jan 28, 2012 at 8:17 PM, Michael Snoyman mich...@snoyman.com wrote: Thanks, looks great! I've merged it into the Github tree. On Sat, Jan 28, 2012 at 8:36 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Ah, yes, you're completely right. I completely agree that moving the function into the Maybe monad increases readability. This kind of function is what the Maybe monad was designed for. Here is a revised patch. On Sat, Jan 28, 2012 at 8:28 AM, Michael Snoyman mich...@snoyman.com wrote
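The domainMatches and pathMatches functions proposed in this message could be sketched roughly as below, using String in place of W.Ascii. This is a loose reading of RFC 6265's matching rules for illustration, not the real implementation, and it deliberately ignores canonicalization and the Public Suffix List check discussed above.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- Does the request domain match the cookie domain? If so, return the
-- prefix of the first that isn't in the second, as the proposal asks.
domainMatches :: String -> String -> Maybe String
domainMatches reqDomain cookieDomain
  | reqDomain == cookieDomain = Just ""
  | ('.' : cookieDomain) `isSuffixOf` reqDomain =
      Just (take (length reqDomain - length cookieDomain) reqDomain)
  | otherwise = Nothing   -- e.g. "badexample.com" must not match "example.com"

-- Path matching in the spirit of RFC 6265 section 5.1.4: the cookie
-- path must be the whole path, or a prefix ending at a "/" boundary.
pathMatches :: String -> String -> Bool
pathMatches reqPath cookiePath
  | reqPath == cookiePath = True
  | cookiePath `isPrefixOf` reqPath =
      "/" `isSuffixOf` cookiePath || reqPath !! length cookiePath == '/'
  | otherwise = False
```

Note how the suffix check is done against "." ++ cookieDomain, so that "badexample.com" does not accidentally match "example.com"; that boundary condition is the classic bug in naive domain matching.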
Re: [Haskell-cafe] Contributing to http-conduit
Here are my initial ideas about supporting cookies. Note that I'm using Chrome for ideas since it's open source. - Network/HTTP/Conduit/Cookies.hs file - Exporting the following symbols: - type StuffedCookie = SetCookie - A regular SetCookie can have Nothing for its Domain and Path attributes. A StuffedCookie has to have these fields set. - type CookieJar = [StuffedCookie] - Chrome's cookie jar is implemented as (the C++ equivalent of) Map W.Ascii StuffedCookie. The key is the eTLD+1 of the domain, so lookups for all cookies for a given domain are fast. - I think I'll stay with just a list of StuffedCookies just to keep it simple. Perhaps a later revision can implement the faster map. - getRelevantCookies :: Request m -> CookieJar -> UTCTime -> (CookieJar, Cookies) - Gets all the cookies from the cookie jar that should be set for the given Request. - The time argument is whatever now is (it's pulled out of the function so the function can remain pure and easily testable) - The function will also remove expired cookies from the cookie jar (given what now is) and return the filtered cookie jar - putRelevantCookies :: Request m -> CookieJar -> [StuffedCookie] -> CookieJar - Insert cookies from a server response into the cookie jar. - The first argument is only used for checking to see which cookies are valid (which cookies match the requested domain, etc, so site1.com can't set a cookie for site2.com) - stuffCookie :: Request m -> SetCookie -> StuffedCookie - If the SetCookie's fields are Nothing, fill them in given the Request from which it originated - getCookies :: Response a -> ([SetCookie], Response a) - Pull cookies out of a server response. Return the response with the Set-Cookie headers filtered out - putCookies :: Request a -> Cookies -> Request a - A wrapper around renderCookies. Inserts some cookies into a request.
- Doesn't overwrite cookies that are already set in the request - These functions will be exported from Network.HTTP.Conduit as well, so callers can use them to re-implement redirection chains - I won't implement a cookie filtering function (like what Network.Browser has) - If you want to have arbitrary handling of cookies, re-implement redirection following. It's not very difficult if you use the API provided, and the 'http' function is open source so you can use that as a reference. - I will implement the functions according to RFC 6265 - I will also need to write the following functions. Should they also be exported? - canonicalizeDomain :: W.Ascii - W.Ascii - turns ..a.b.c..d.com... to a.b.c.d.com - Technically necessary for domain matching (Chrome does it) - Perhaps unnecessary for a first pass? Perhaps we can trust users for now? - domainMatches :: W.Ascii - W.Ascii - Maybe W.Ascii - Does the first domain match against the second domain? - If so, return the prefix of the first that isn't in the second - pathMatches :: W.Ascii - W.Ascii - Bool - Do the paths match? - In order to implement domain matching, I have to have knowledge of the Public Suffix Listhttp://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat so I know that sub1.sub2.pvt.k12.wy.us can set a cookie for sub2.pvt.k12.wy.us but not for k12.wy.us (because pvt.k12.wy.us is a suffix). There are a variety of ways to implement this. - As far as I can tell, Chrome does it by using a script (which a human periodically runs) which parses the list at creates a .cc file that is included in the build. - I might be wrong about the execution of the script; it might be a build step. If it is a build step, however, it is suspicious that a build target would try to download a file... - Any more elegant ideas? Feedback on any/all of the above would be very helpful before I go off into the weeds on this project. Thanks, Myles C. 
Maxfield On Sat, Jan 28, 2012 at 8:17 PM, Michael Snoyman mich...@snoyman.comwrote: Thanks, looks great! I've merged it into the Github tree. On Sat, Jan 28, 2012 at 8:36 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Ah, yes, you're completely right. I completely agree that moving the function into the Maybe monad increases readability. This kind of function is what the Maybe monad was designed for. Here is a revised patch. On Sat, Jan 28, 2012 at 8:28 AM, Michael Snoyman mich...@snoyman.com wrote: On Sat, Jan 28, 2012 at 1:20 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: the fromJust should never fail, beceause of the guard statement: | 300 = code code 400 isJust l
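The canonicalizeDomain and domainMatches helpers proposed above can be sketched in plain Haskell. This is a hypothetical illustration over String rather than W.Ascii, and it deliberately ignores the Public Suffix List question discussed in the thread; nothing here is the library's actual implementation:

```haskell
import Data.List (intercalate, isSuffixOf)

type Domain = String  -- stand-in for W.Ascii

-- Split on dots, keeping empty labels so we can drop them afterwards.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
    (a, [])       -> [a]
    (a, _ : rest) -> a : splitDots rest

-- canonicalizeDomain: turns "..a.b.c..d.com..." into "a.b.c.d.com"
canonicalizeDomain :: Domain -> Domain
canonicalizeDomain = intercalate "." . filter (not . null) . splitDots

-- domainMatches: does the first domain match against the second?
-- If so, return the prefix of the first that isn't in the second.
domainMatches :: Domain -> Domain -> Maybe Domain
domainMatches d1 d2
    | d1 == d2                   = Just ""
    | ('.' : d2) `isSuffixOf` d1 = Just (take (length d1 - length d2 - 1) d1)
    | otherwise                  = Nothing
```

For example, domainMatches "sub1.sub2.pvt.k12.wy.us" "pvt.k12.wy.us" yields Just "sub1.sub2"; a real implementation would additionally consult the Public Suffix List before allowing the cookie to be set.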
Re: [Haskell-cafe] Contributing to http-conduit
Well, this is embarrassing. Please disregard my previous email. I should learn to read the RFC *before* submitting proposals. --Myles On Tue, Jan 31, 2012 at 6:37 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Here are my initial ideas about supporting cookies. Note that I'm using Chrome for ideas since it's open source. [snip: the quoted proposal is identical to the message above] On Sat, Jan 28, 2012 at 8:17 PM, Michael Snoyman mich...@snoyman.com wrote: Thanks, looks great! I've merged it into the GitHub tree. On Sat, Jan 28, 2012 at 8:36 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Ah, yes, you're completely right. I completely agree that moving the function into the Maybe monad increases readability. This kind of function is what the Maybe monad was designed for. Here
Re: [Haskell-cafe] Contributing to http-conduit
Ah, yes, you're completely right. I completely agree that moving the function into the Maybe monad increases readability. This kind of function is what the Maybe monad was designed for. Here is a revised patch. On Sat, Jan 28, 2012 at 8:28 AM, Michael Snoyman mich...@snoyman.com wrote: On Sat, Jan 28, 2012 at 1:20 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: the fromJust should never fail, because of the guard statement: | 300 <= code && code < 400 && isJust l'' && isJust l' = Just $ req Because of the order of the operators, it will only evaluate fromJust after it makes sure that the argument isJust. That function in particular shouldn't throw any exceptions - it should only return Nothing. Knowing that, I don't quite think I understand what your concern is. Can you elaborate? You're right, but I had to squint really hard to prove to myself that you're right. That's the kind of code that could easily be broken in future updates by an unwitting maintainer (e.g., me). To protect the world from me, I'd prefer if the code didn't have the fromJust. This might be a good place to leverage the Monad instance of Maybe. Michael getRedirectedRequest.2.patch Description: Binary data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
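Michael's suggestion of leveraging the Monad instance of Maybe can be illustrated with a toy version of the guard in question. The types here are simplified stand-ins, not http-conduit's real Request and Response:

```haskell
import Control.Monad (guard)
import Data.Maybe (fromJust, isJust)

data Resp = Resp { respCode :: Int, respLocation :: Maybe String }

-- Guard-plus-fromJust style, as in the original patch: safe only
-- because the guard checks isJust first, which is fragile.
redirectGuard :: Resp -> Maybe String
redirectGuard res
    | 300 <= code && code < 400 && isJust loc = Just (fromJust loc)
    | otherwise                               = Nothing
  where
    code = respCode res
    loc  = respLocation res

-- The same logic in the Maybe monad: no fromJust anywhere, so a future
-- maintainer cannot break the invariant by reordering the conditions.
redirectMonadic :: Resp -> Maybe String
redirectMonadic res = do
    guard (300 <= respCode res && respCode res < 400)
    respLocation res
```

Both versions return Nothing for non-redirect statuses or a missing location header, but the monadic version makes the absence of partial functions obvious at a glance.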
Re: [Haskell-cafe] Contributing to http-conduit
the fromJust should never fail, because of the guard statement: | 300 <= code && code < 400 && isJust l'' && isJust l' = Just $ req Because of the order of the operators, it will only evaluate fromJust after it makes sure that the argument isJust. That function in particular shouldn't throw any exceptions - it should only return Nothing. Knowing that, I don't quite think I understand what your concern is. Can you elaborate? Thanks, Myles On Thu, Jan 26, 2012 at 12:59 AM, Michael Snoyman mich...@snoyman.com wrote: I'm a little worried about the use of `fromJust`, it will give users a very confusing error message, and the error might be triggered at the wrong point in the computation. I'd feel better if checkRedirect lived in either some Failure, an Either, or maybe even in IO itself. IO might make sense if we want to implement some cookie jar functionality in the future via mutable references. Michael On Thu, Jan 26, 2012 at 10:29 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Here is a patch regarding getRedirectedRequest. Comments are very welcome. --Myles C. Maxfield On Wed, Jan 25, 2012 at 10:21 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: I was planning on making the caller deal with keeping track of cookies between requests. My cookie idea only solves the problem of cookies persisting through a redirect chain - not between subsequent request chains. Do you think that Network.HTTP.Conduit should have a persistent cookie jar between caller's requests? I don't really think so. --Myles On Wed, Jan 25, 2012 at 9:28 PM, Michael Snoyman mich...@snoyman.com wrote: On Wed, Jan 25, 2012 at 8:18 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright, that's fine. I just wanted to be explicit about the interface we'd be providing. Taking the Request construction code out of 'http' and putting it into its own function should be a quick change - I'll have it to you soon.
One possible wrench - The existing code copies some fields (like the proxy) from the original request. In order to keep this functionality, the signature would have to be: checkRedirect :: Request m -> Response -> Maybe (Request m) Is that okay with you? I think I'd also like to call the function something different, perhaps 'getRedirectedRequest'. Is that okay? I'll also add an example to the documentation about how a caller would get the redirection chain by re-implementing redirection (by using the example in your previous email). Sounds great. As for cookie handling - I think Network.Browser has a pretty elegant solution to this. They allow a CookieFilter which has type of URI -> Cookie -> IO Bool. Cookies are only put in the cookie jar if the function returns True. There is a default CookieFilter, which behaves as we would expect, but the user can override this function. That way, if you don't want to support cookies, you can just pass in (\ _ _ -> return False). Also sounds good. If we're already expecting people that want specific functionality to re-implement the redirect-following code, this solution might be unnecessary. Do you think that such a concept would be beneficial for Network.HTTP.Conduit to implement? Yes, I can imagine that some people would want more fine-grained control of which cookies are accepted. Either way, I'll probably end up making a solution similar to your checkRedirect function that will just allow people to take SetCookies out of a Response and put Cookies into a Request. I'll probably also provide a default function which converts a SetCookie into a cookie by looking up the current time, inspecting the Request, etc. This will allow me to not have to change the type of Request or Response - the functions I'll be writing can deal with the raw Headers that are already in Requests and Responses. Modifying 'http' to use these functions will be straightforward. How does this sound to you? Sounds like a good plan to me.
I'm not entirely certain how you're planning on implementing the cookie jar itself. In other words, if I make a request, have a cookie set, and then make another request later, where will the cookie be stored in the interim, and how will the second request know to use it? Michael ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
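The shape being agreed on here, a pure function from the original request plus the response to a possible redirected request, can be sketched with simplified types. The real signature under discussion is getRedirectedRequest :: Request m -> Response -> Maybe (Request m); the field names below are hypothetical stand-ins:

```haskell
import Control.Monad (guard)

data Req = Req { reqUrl :: String, reqProxy :: Maybe String }
    deriving (Eq, Show)

data Resp = Resp { respCode :: Int, respLocation :: Maybe String }

-- The original request is needed so fields like the proxy carry over
-- into the redirected request via a record update.
getRedirectedRequest :: Req -> Resp -> Maybe Req
getRedirectedRequest req res = do
    guard (300 <= respCode res && respCode res < 400)
    loc <- respLocation res
    return req { reqUrl = loc }  -- proxy (and any other copied fields) preserved
```

This is exactly why the plain Response -> Maybe Request signature wasn't enough: without the original request, there is nothing to copy the proxy from.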
Re: [Haskell-cafe] Contributing to http-conduit
Here is a patch regarding getRedirectedRequest. Comments are very welcome. --Myles C. Maxfield On Wed, Jan 25, 2012 at 10:21 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: I was planning on making the caller deal with keeping track of cookies between requests. My cookie idea only solves the problem of cookies persisting through a redirect chain - not between subsequent request chains. Do you think that Network.HTTP.Conduit should have a persistent cookie jar between caller's requests? I don't really think so. --Myles On Wed, Jan 25, 2012 at 9:28 PM, Michael Snoyman mich...@snoyman.com wrote: On Wed, Jan 25, 2012 at 8:18 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright, that's fine. I just wanted to be explicit about the interface we'd be providing. Taking the Request construction code out of 'http' and putting it into its own function should be a quick change - I'll have it to you soon. One possible wrench - The existing code copies some fields (like the proxy) from the original request. In order to keep this functionality, the signature would have to be: checkRedirect :: Request m -> Response -> Maybe (Request m) Is that okay with you? I think I'd also like to call the function something different, perhaps 'getRedirectedRequest'. Is that okay? I'll also add an example to the documentation about how a caller would get the redirection chain by re-implementing redirection (by using the example in your previous email). Sounds great. As for cookie handling - I think Network.Browser has a pretty elegant solution to this. They allow a CookieFilter which has type of URI -> Cookie -> IO Bool. Cookies are only put in the cookie jar if the function returns True. There is a default CookieFilter, which behaves as we would expect, but the user can override this function. That way, if you don't want to support cookies, you can just pass in (\ _ _ -> return False). Also sounds good.
If we're already expecting people that want specific functionality to re-implement the redirect-following code, this solution might be unnecessary. Do you think that such a concept would be beneficial for Network.HTTP.Conduit to implement? Yes, I can imagine that some people would want more fine-grained control of which cookies are accepted. Either way, I'll probably end up making a solution similar to your checkRedirect function that will just allow people to take SetCookies out of a Response and put Cookies into a Request. I'll probably also provide a default function which converts a SetCookie into a cookie by looking up the current time, inspecting the Request, etc. This will allow me to not have to change the type of Request or Response - the functions I'll be writing can deal with the raw Headers that are already in Requests and Responses. Modifying 'http' to use these functions will be straightforward. How does this sound to you? Sounds like a good plan to me. I'm not entirely certain how you're planning on implementing the cookie jar itself. In other words, if I make a request, have a cookie set, and then make another request later, where will the cookie be stored in the interim, and how will the second request know to use it? Michael getRedirectedRequest.patch Description: Binary data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Contributing to http-conduit
Alright, that's fine. I just wanted to be explicit about the interface we'd be providing. Taking the Request construction code out of 'http' and putting it into its own function should be a quick change - I'll have it to you soon. One possible wrench - The existing code copies some fields (like the proxy) from the original request. In order to keep this functionality, the signature would have to be: checkRedirect :: Request m -> Response -> Maybe (Request m) Is that okay with you? I think I'd also like to call the function something different, perhaps 'getRedirectedRequest'. Is that okay? I'll also add an example to the documentation about how a caller would get the redirection chain by re-implementing redirection (by using the example in your previous email). As for cookie handling - I think Network.Browser has a pretty elegant solution to this. They allow a CookieFilter which has type of URI -> Cookie -> IO Bool. Cookies are only put in the cookie jar if the function returns True. There is a default CookieFilter, which behaves as we would expect, but the user can override this function. That way, if you don't want to support cookies, you can just pass in (\ _ _ -> return False). If we're already expecting people that want specific functionality to re-implement the redirect-following code, this solution might be unnecessary. Do you think that such a concept would be beneficial for Network.HTTP.Conduit to implement? Either way, I'll probably end up making a solution similar to your checkRedirect function that will just allow people to take SetCookies out of a Response and put Cookies into a Request.
I'll probably also provide a default function which converts a SetCookie into a cookie by looking up the current time, inspecting the Request, etc. This will allow me to not have to change the type of Request or Response - the functions I'll be writing can deal with the raw Headers that are already in Requests and Responses. Modifying 'http' to use these functions will be straightforward. How does this sound to you? Thanks, Myles C. Maxfield On Wed, Jan 25, 2012 at 5:10 AM, Aristid Breitkreuz arist...@googlemail.com wrote: The nice thing is that this way, nobody can force me to handle cookies. ;-) Might be that usage patterns emerge, which we can then codify into functions later. On 25.01.2012 at 08:09, Michael Snoyman mich...@snoyman.com wrote: On Wed, Jan 25, 2012 at 9:01 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Sorry, I think I'm still a little confused about this. From the point of view of a library user, if I use the 'http' function, but want to know what final URL I ended up at, I would have to set redirects to 0, call http, call checkRedirect, and recurse until checkRedirect returns Nothing (or a count runs out). I would be handling the recursion of redirects myself. On one hand, this solution is lightweight and easy to implement in the library. On the other hand, the caller has to run each individual request themselves, keeping track of the number of requests (so there isn't an infinite loop). The loop is already implemented in the http function - I think it's reasonable to modify the existing loop rather than expect the caller to re-implement that logic. However, it's probably just as reasonable to say if you want to know what URL you end up at, you have to re-implement your own redirection-following logic. I do agree, however, that including a (possibly long, though explicitly bounded) [Ascii] along with every request is arbitrary, and probably not the best solution.
Can you think of a solution which allows the caller to know the url chain (or possibly just the last URL - that's the important one) without having to re-implement the redirection-following logic themselves? It sounds like if you had to choose, you would rather force a caller to re-implement redirection-following rather than include a URL chain in every Response. Is this correct? That's correct. I think knowing the final URL is a fairly arbitrary requirement, in the same boat as wanting redirect handling without supporting cookies. These to me fall well within the 20%: most people won't need them, so the API should not be optimized for them. There's also the fact that [Ascii] isn't nearly enough information to properly follow the chain. Next someone's going to want to know if a request was GET or POST, or whether it was a permanent or temporary redirect, or the exact text of the location header, or a million other things involved. If someone wants this stuff
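The Network.Browser-style filter mentioned in this exchange is easy to sketch. The CookieFilter type mirrors the one discussed; the cookie type and jar-filtering helper below are hypothetical stand-ins, not the real Network.Browser API:

```haskell
import Control.Monad (filterM)

type URI = String  -- stand-in for Network.URI's URI type

data Cookie = Cookie { cookieName :: String, cookieValue :: String }
    deriving (Eq, Show)

-- Cookies are only put in the jar if the filter returns True.
type CookieFilter = URI -> Cookie -> IO Bool

-- Default filter: accept everything, the behavior we would expect.
defaultCookieFilter :: CookieFilter
defaultCookieFilter _ _ = return True

-- Opting out of cookie handling entirely, as in (\ _ _ -> return False).
rejectAllCookies :: CookieFilter
rejectAllCookies _ _ = return False

-- Hypothetical helper: keep only the incoming cookies the filter accepts.
acceptCookies :: CookieFilter -> URI -> [Cookie] -> IO [Cookie]
acceptCookies f uri = filterM (f uri)
```

The filter lives in IO so a user-supplied implementation can consult mutable state or even prompt the user, which is what makes the design flexible.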
Re: [Haskell-cafe] Contributing to http-conduit
I was planning on making the caller deal with keeping track of cookies between requests. My cookie idea only solves the problem of cookies persisting through a redirect chain - not between subsequent request chains. Do you think that Network.HTTP.Conduit should have a persistent cookie jar between caller's requests? I don't really think so. --Myles On Wed, Jan 25, 2012 at 9:28 PM, Michael Snoyman mich...@snoyman.com wrote: On Wed, Jan 25, 2012 at 8:18 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Alright, that's fine. I just wanted to be explicit about the interface we'd be providing. Taking the Request construction code out of 'http' and putting it into its own function should be a quick change - I'll have it to you soon. One possible wrench - The existing code copies some fields (like the proxy) from the original request. In order to keep this functionality, the signature would have to be: checkRedirect :: Request m -> Response -> Maybe (Request m) Is that okay with you? I think I'd also like to call the function something different, perhaps 'getRedirectedRequest'. Is that okay? I'll also add an example to the documentation about how a caller would get the redirection chain by re-implementing redirection (by using the example in your previous email). Sounds great. As for cookie handling - I think Network.Browser has a pretty elegant solution to this. They allow a CookieFilter which has type of URI -> Cookie -> IO Bool. Cookies are only put in the cookie jar if the function returns True. There is a default CookieFilter, which behaves as we would expect, but the user can override this function. That way, if you don't want to support cookies, you can just pass in (\ _ _ -> return False). Also sounds good. If we're already expecting people that want specific functionality to re-implement the redirect-following code, this solution might be unnecessary. Do you think that such a concept would be beneficial for Network.HTTP.Conduit to implement?
Yes, I can imagine that some people would want more fine-grained control of which cookies are accepted. Either way, I'll probably end up making a solution similar to your checkRedirect function that will just allow people to take SetCookies out of a Response and put Cookies into a Request. I'll probably also provide a default function which converts a SetCookie into a cookie by looking up the current time, inspecting the Request, etc. This will allow me to not have to change the type of Request or Response - the functions I'll be writing can deal with the raw Headers that are already in Requests and Responses. Modifying 'http' to use these functions will be straightforward. How does this sound to you? Sounds like a good plan to me. I'm not entirely certain how you're planning on implementing the cookie jar itself. In other words, if I make a request, have a cookie set, and then make another request later, where will the cookie be stored in the interim, and how will the second request know to use it? Michael ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Contributing to http-conduit
On Mon, Jan 23, 2012 at 10:43 PM, Michael Snoyman mich...@snoyman.com wrote: On Tue, Jan 24, 2012 at 8:37 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: I have attached a patch to add a redirect chain to the Response datatype. Comments on this patch are very welcome. I thought that this isn't necessary since a client wanting to track all the redirects could just handle them manually by setting the redirect count to 0. It seems like a lot of work to re-implement the redirection-following code, just to know which URL the bytes are coming from. I feel that adding this field makes the library easier to use, but it's your call. I was originally going to include the entire Request object in the redirection chain, but Request objects are parameterized with a type 'm', so including a 'Request m' field would force the Response type to be parameterized as well. I felt that would be too large a change, so I made the type of the redirection chain W.Ascii. Perhaps it's worth using the 'forall' keyword to get rid of the pesky 'm' type parameter for Requests?

data RequestBody
    = RequestBodyLBS L.ByteString
    | RequestBodyBS S.ByteString
    | RequestBodyBuilder Int64 Blaze.Builder
    | forall m. RequestBodySource Int64 (C.Source m Blaze.Builder)
    | forall m. RequestBodySourceChunked (C.Source m Blaze.Builder)

There'd be no way to run the request body then (try compiling the code after that change). Yeah, I never actually tried this change to see if it works. I'll try it tonight after work. Michael Thanks, Myles ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
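Michael's objection, that an existentially quantified m leaves no way to run the body, can be demonstrated with toy stand-ins. Everything below is hypothetical (a fake Source, an Int payload instead of Blaze.Builder), but the typing problem is the same one the patch would hit:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Toy stand-in for C.Source: just an action in some monad m.
newtype Source m a = Source { unSource :: m a }

data RequestBody
    = RequestBodyInt Int
    | forall m. Monad m => RequestBodySource (Source m Int)

-- Once m is hidden by the forall, a consumer cannot assume m ~ IO (or
-- any other particular monad), so a hypothetical
--   runBody :: RequestBody -> IO Int
-- would not typecheck for RequestBodySource. All we can do is things
-- that work for *every* monad, like inspecting which constructor we have:
describe :: RequestBody -> String
describe (RequestBodyInt n)    = "plain body: " ++ show n
describe (RequestBodySource _) = "source body in some unknown monad"
```

This is why the real RequestBody keeps m as a parameter: the caller, not the constructor, must choose the monad the source runs in.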
Re: [Haskell-cafe] Contributing to http-conduit
Sorry, I don't think I'm following. What would the meaning of the value returned from checkRedirect be? --Myles On Tue, Jan 24, 2012 at 10:47 AM, Michael Snoyman mich...@snoyman.com wrote: On Tue, Jan 24, 2012 at 6:57 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: On Mon, Jan 23, 2012 at 10:43 PM, Michael Snoyman mich...@snoyman.com wrote: On Tue, Jan 24, 2012 at 8:37 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: I have attached a patch to add a redirect chain to the Response datatype. Comments on this patch are very welcome. I thought that this isn't necessary since a client wanting to track all the redirects could just handle them manually by setting the redirect count to 0. It seems like a lot of work to re-implement the redirection-following code, just to know which URL the bytes are coming from. I feel that adding this field makes the library easier to use, but it's your call. If that's the concern, I'd much rather just expose a function to help with dealing with redirects, rather than sticking a rather arbitrary [Ascii] in everyone's Response. I think a function along the lines of: checkRedirect :: Response -> Maybe Request would fit the bill, and could be extracted from the current `http` function. Michael ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Contributing to http-conduit
Sorry, I think I'm still a little confused about this. From the point of view of a library user, if I use the 'http' function, but want to know what final URL I ended up at, I would have to set redirects to 0, call http, call checkRedirect, and recurse until checkRedirect returns Nothing (or a count runs out). I would be handling the recursion of redirects myself. On one hand, this solution is lightweight and easy to implement in the library. On the other hand, the caller has to run each individual request themselves, keeping track of the number of requests (so there isn't an infinite loop). The loop is already implemented in the http function - I think it's reasonable to modify the existing loop rather than expect the caller to re-implement that logic. However, it's probably just as reasonable to say if you want to know what URL you end up at, you have to re-implement your own redirection-following logic. I do agree, however, that including a (possibly long, though explicitly bounded) [Ascii] along with every request is arbitrary, and probably not the best solution. Can you think of a solution which allows the caller to know the URL chain (or possibly just the last URL - that's the important one) without having to re-implement the redirection-following logic themselves? It sounds like if you had to choose, you would rather force a caller to re-implement redirection-following rather than include a URL chain in every Response. Is this correct? Thanks for helping me out with this, Myles C. Maxfield On Tue, Jan 24, 2012 at 8:05 PM, Michael Snoyman mich...@snoyman.com wrote: It would be the new request indicated by the server response, if the server gave a redirect response. On Tue, Jan 24, 2012 at 9:05 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: Sorry, I don't think I'm following. What would the meaning of the value returned from checkRedirect be?
--Myles On Tue, Jan 24, 2012 at 10:47 AM, Michael Snoyman mich...@snoyman.com wrote: On Tue, Jan 24, 2012 at 6:57 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: On Mon, Jan 23, 2012 at 10:43 PM, Michael Snoyman mich...@snoyman.com wrote: On Tue, Jan 24, 2012 at 8:37 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote: I have attached a patch to add a redirect chain to the Response datatype. Comments on this patch are very welcome. I thought that this isn't necessary since a client wanting to track all the redirects could just handle them manually by setting the redirect count to 0. It seems like a lot of work to re-implement the redirection-following code, just to know which URL the bytes are coming from. I feel that adding this field makes the library easier to use, but it's your call. If that's the concern, I'd much rather just expose a function to help with dealing with redirects, rather than sticking a rather arbitrary [Ascii] in everyone's Response. I think a function along the lines of: checkRedirect :: Response -> Maybe Request would fit the bill, and could be extracted from the current `http` function. Michael ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
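The caller-side loop Myles describes (redirects set to 0, one request at a time, checkRedirect deciding whether to continue) looks roughly like this. It is written against abstract stand-ins, with perform playing the role of a single non-redirecting http call; it is not the real http-conduit API:

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Follow redirects by hand, collecting the chain of requests, with an
-- explicit bound on the number of hops so there is no infinite loop.
followRedirects
    :: Monad m
    => Int                 -- remaining redirects allowed
    -> (req -> m res)      -- perform one request, redirects disabled
    -> (res -> Maybe req)  -- checkRedirect: next request, if any
    -> req
    -> m ([req], res)      -- every request made, plus the final response
followRedirects n perform checkRedirect req = do
    res <- perform req
    case checkRedirect res of
        Just req' | n > 0 -> do
            (chain, final) <- followRedirects (n - 1) perform checkRedirect req'
            return (req : chain, final)
        _ -> return ([req], res)
```

The last element of the returned chain is the final URL the caller wanted to know. As a pure demonstration, with requests and responses as plain Ints and a fake "server" that redirects anything below 3, runIdentity (followRedirects 10 pure (\r -> if r < 3 then Just (r + 1) else Nothing) 0) gives ([0,1,2,3], 3).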
Re: [Haskell-cafe] Contributing to http-conduit
I have attached a patch to add a redirect chain to the Response datatype. Comments on this patch are very welcome. I was originally going to include the entire Request object in the redirection chain, but Request objects are parameterized with a type 'm', so including a 'Request m' field would force the Response type to be parameterized as well. I felt that would be too large a change, so I made the type of the redirection chain W.Ascii. Perhaps its worth using the 'forall' keyword to get rid of the pesky 'm' type parameter for Requests? data RequestBody = RequestBodyLBS L.ByteString | RequestBodyBS S.ByteString | RequestBodyBuilder Int64 Blaze.Builder | forall m. RequestBodySource Int64 (C.Source m Blaze.Builder) | forall m. RequestBodySourceChunked (C.Source m Blaze.Builder) --Myles On Mon, Jan 23, 2012 at 3:31 AM, Michael Snoyman mich...@snoyman.comwrote: On Mon, Jan 23, 2012 at 1:20 PM, Aristid Breitkreuz arist...@googlemail.com wrote: Rejecting cookies is not without precedent. If you must force cookie handling upon us, at least make it possible to selectively reject them. Aristid If you turn off automatic redirects, then you won't have cookie handling. I'd be interested to hear of a use case where you would want to avoid passing cookies after a redirect. Michael From d60bc1adf4af5a038432c35cde222654dfabf6dd Mon Sep 17 00:00:00 2001 From: Myles C. 
Maxfield lithe...@gmail.com Date: Mon, 23 Jan 2012 21:44:12 -0800 Subject: [PATCH] Adding a redirection chain field to Responses --- Network/HTTP/Conduit.hs |7 --- Network/HTTP/Conduit/Request.hs | 24 +++- Network/HTTP/Conduit/Response.hs |7 --- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/Network/HTTP/Conduit.hs b/Network/HTTP/Conduit.hs index 794a62a..879d5a8 100644 --- a/Network/HTTP/Conduit.hs +++ b/Network/HTTP/Conduit.hs @@ -147,7 +147,7 @@ http - Manager - ResourceT m (Response (C.Source m S.ByteString)) http req0 manager = do -res@(Response status hs body) - +res@(Response _ status hs body) - if redirectCount req0 == 0 then httpRaw req0 manager else go (redirectCount req0) req0 @@ -160,7 +160,7 @@ http req0 manager = do where go 0 _ = liftBase $ throwIO TooManyRedirects go count req = do -res@(Response (W.Status code _) hs _) - httpRaw req manager +res@(Response uri (W.Status code _) hs _) - httpRaw req manager case (300 = code code 400, lookup location hs) of (True, Just l'') - do -- Prepend scheme, host and port if missing @@ -192,7 +192,8 @@ http req0 manager = do then GET else method l } -go (count - 1) req' +response - go (count - 1) req' +return $ response {requestChain = (head uri) : (requestChain response)} _ - return res -- | Get a 'Response' without any redirect following. diff --git a/Network/HTTP/Conduit/Request.hs b/Network/HTTP/Conduit/Request.hs index e6e8876..a777285 100644 --- a/Network/HTTP/Conduit/Request.hs +++ b/Network/HTTP/Conduit/Request.hs @@ -7,6 +7,7 @@ module Network.HTTP.Conduit.Request , ContentType , Proxy (..) , parseUrl +, unParseUrl , browserDecompress , HttpException (..) 
     , alwaysDecompress
@@ -39,7 +40,7 @@
 import qualified Network.HTTP.Types as W
 import Control.Exception (Exception, SomeException, toException)
 import Control.Failure (Failure (failure))
-import Codec.Binary.UTF8.String (encodeString)
+import Codec.Binary.UTF8.String (encode, encodeString)
 import qualified Data.CaseInsensitive as CI
 import qualified Data.ByteString.Base64 as B64
@@ -207,6 +208,27 @@ parseUrl2 full sec s = do
             (readDec rest)
     x -> error $ "parseUrl1: this should never happen: " ++ show x
+
+unParseUrl :: Request m -> W.Ascii
+unParseUrl Request { secure = secure'
+                   , host = host'
+                   , port = port'
+                   , path = path'
+                   , queryString = querystring'
+                   } = S.concat
+    [ "http"
+    , if secure' then "s" else S.empty
+    , "://"
+    , host'
+    , case (secure', port') of
+        (True, 443) -> S.empty
+        (True, p)   -> S.pack $ encode $ ":" ++ show p
+        (False, 80) -> S.empty
+        (False, p)  -> S.pack $ encode $ ":" ++ show p
+    , path'
+    , "?"
+    , querystring'
+    ]
+
 data HttpException = StatusCodeException W.Status W.ResponseHeaders
     | InvalidUrlException String String
     | TooManyRedirects
diff --git a/Network/HTTP/Conduit/Response.hs b/Network/HTTP/Conduit/Response.hs
index 5c6fd23..c183e34 100644
--- a/Network/HTTP/Conduit/Response.hs
+++ b/Network/HTTP/Conduit/Response.hs
@@ -33,7 +33,8 @@
 import Network.HTTP.Conduit.Chunk

 -- | A simple representation of the HTTP
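For readers following along, the reassembly logic in unParseUrl can be sketched as a standalone pure function. This is a simplified sketch, not http-conduit's actual API: SimpleRequest and its fields are hypothetical stand-ins for Request, and plain String stands in for W.Ascii.

```haskell
-- Hypothetical stand-in for http-conduit's Request; the field names
-- here are illustrative only.
data SimpleRequest = SimpleRequest
    { reqSecure :: Bool    -- True for https
    , reqHost   :: String
    , reqPort   :: Int
    , reqPath   :: String
    , reqQuery  :: String  -- including the leading '?', or empty
    }

-- Rebuild the URL a request was parsed from, omitting default ports.
unparse :: SimpleRequest -> String
unparse r = concat
    [ "http"
    , if reqSecure r then "s" else ""
    , "://"
    , reqHost r
    , case (reqSecure r, reqPort r) of
        (True, 443) -> ""           -- default https port: omit
        (False, 80) -> ""           -- default http port: omit
        (_, p)      -> ':' : show p
    , reqPath r
    , reqQuery r
    ]
```

Unlike the patch, this sketch folds the query string (with its '?') into a single field, so a request with no query produces no trailing '?'.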
Re: [Haskell-cafe] Contributing to http-conduit
Replies are inline. Thanks for the quick and thoughtful response!

On Sat, Jan 21, 2012 at 8:56 AM, Michael Snoyman mich...@snoyman.com wrote:

Hi Myles,

These sound like two solid features, and I'd be happy to merge in code to support it. Some comments below.

On Sat, Jan 21, 2012 at 8:38 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote:

To: Michael Snoyman, author and maintainer of http-conduit
CC: haskell-cafe

Hello! I am interested in contributing to the http-conduit library. I've been using it for a little while and reading through its source, but have felt that it could be improved with two features:

- Allowing the caller to know the final URL that ultimately resulted in the HTTP Source. Because httpRaw is not exported, the caller can't even re-implement the redirect-following code themselves. Ideally, the caller would be able to know not only the final URL, but also the entire chain of URLs that led to the final request. I was thinking that it would be even cooler if the caller could be notified of these redirects as they happen in another thread. There are a couple ways to implement this that I have been thinking about:
- A straightforward way would be to add a [W.Ascii] to the type of Response, and getResponse can fill in this extra field. getResponse already knows about the Request so it can tell if the response should be gunzipped.

What would be in the [W.Ascii], a list of all paths redirected to? Also, I'm not sure what gunzipping has to do with here, can you clarify?

Yes; my idea was to make the [W.Ascii] represent the list of all URLs redirected to, in order. My comment about gunzipping is only tangentially related. I meant that in the latest version of the code on GitHub, the getResponse function already takes a Request as an argument. This means that the getResponse function already knows what URL its data is coming from, so modifying the getResponse function to return that URL is simple.
(I mentioned gunzip because, as far as I can tell, the reason that getResponse *already* takes a Request is so that the function can tell if the request should be gunzipped.)

- It would be nice for the caller to be able to know in real time what URLs the request is being redirected to. A possible way to do this would be for the 'http' function to take an extra argument of type (Maybe (Control.Concurrent.Chan W.Ascii)) which httpRaw can push URLs into. If the caller doesn't want to use this variable, they can simply pass Nothing. Otherwise, the caller can create an IO thread which reads the Chan until some termination condition is met (Perhaps this will change the type of the extra argument to (Maybe (Chan (Maybe W.Ascii)))).

I like this solution, though I can see how it could be considered too heavyweight.

I do think it's too heavyweight. I think if people really want lower-level control of the redirects, they should turn off automatic redirect and allow 3xx responses.

Yeah, that totally makes more sense. As it stands, however, httpRaw isn't exported, so a caller has no way of knowing about each individual HTTP transaction. Exporting httpRaw solves the problem I'm trying to solve. If we export httpRaw, should we *also* make 'http' return the URL chain? Doing both is probably the best solution, IMHO.

- Making the redirection aware of cookies. There are redirects around the web where the first URL returns a Set-Cookie header and a 3xx code which redirects to another site that expects the cookie that the first HTTP transaction set. I propose to add an (IORef to a Data.Set of Cookies) to the Manager datatype, letting the Manager act as a cookie store as well as a repository of available TCP connections. httpRaw could deal with the cookie store. Network.HTTP.Types does not declare a Cookie datatype, so I would probably be adding one. I would probably take it directly from Network.HTTP.Cookie.

Actually, we already have the cookie package for this.
I'm not sure if putting the cookie store in the manager is necessarily the right approach, since I can imagine wanting to have separate sessions while reusing the same connections. A different approach could be adding a list of Cookies to both the Request and Response.

Ah, looks like you're the maintainer of that package as well! I didn't realize it existed. I should have, though; Yesod must need to know about cookies somehow.

As the http-conduit package stands, the headers of the original Request can be set, and the headers of the last Response can be read. Because cookies are implemented on top of headers, the caller knows about the cookies before and after the redirection chain. I'm more interested in the preservation of cookies *within* the redirection chain. As discussed earlier, exposing the httpRaw function allows the entire redirection chain
Re: [Haskell-cafe] Contributing to http-conduit
1. Oops - I overlooked the fact that the redirectCount attribute of a Request is exported (it isn't listed on the documentation at http://hackage.haskell.org/packages/archive/http-conduit/1.2.0/doc/html/Network-HTTP-Conduit.html probably because the constructor itself isn't exported. This seems like a flaw in Haddock...). Silly me. No need to export httpRaw.

2. I think that stuffing many arguments into the 'http' function is ugly. However, I'm not sure that the number of arguments to 'http' could ever reach an unreasonably large amount. Perhaps I have bad foresight, but I personally feel that adding cookies to the http request will be the last thing that we will need to add. Putting a bound on this growth of arguments makes me more willing to think about this option. On the other hand, using a BrowserAction to modify internal state is very elegant. Which approach do you think is best? I think I'm leaning toward the upper-level Browser module idea.

If there was to be a higher-level HTTP library, I would argue that the redirection code should be moved into it, and the only high-level function that the Network.HTTP.Conduit module would export is 'http' (or httpRaw). What do you think about this?

Thanks for helping me out with this,
Myles C. Maxfield

On Sun, Jan 22, 2012 at 9:56 PM, Michael Snoyman mich...@snoyman.com wrote:

On Sun, Jan 22, 2012 at 11:07 PM, Myles C. Maxfield myles.maxfi...@gmail.com wrote:

Replies are inline. Thanks for the quick and thoughtful response!

On Sat, Jan 21, 2012 at 8:56 AM, Michael Snoyman mich...@snoyman.com wrote:

Hi Myles,

These sound like two solid features, and I'd be happy to merge in code to support it. Some comments below.

On Sat, Jan 21, 2012 at 8:38 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote:

To: Michael Snoyman, author and maintainer of http-conduit
CC: haskell-cafe

Hello! I am interested in contributing to the http-conduit library.
I've been using it for a little while and reading through its source, but have felt that it could be improved with two features:

Allowing the caller to know the final URL that ultimately resulted in the HTTP Source. Because httpRaw is not exported, the caller can't even re-implement the redirect-following code themselves. Ideally, the caller would be able to know not only the final URL, but also the entire chain of URLs that led to the final request. I was thinking that it would be even cooler if the caller could be notified of these redirects as they happen in another thread. There are a couple ways to implement this that I have been thinking about:

A straightforward way would be to add a [W.Ascii] to the type of Response, and getResponse can fill in this extra field. getResponse already knows about the Request so it can tell if the response should be gunzipped.

What would be in the [W.Ascii], a list of all paths redirected to? Also, I'm not sure what gunzipping has to do with here, can you clarify?

Yes; my idea was to make the [W.Ascii] represent the list of all URLs redirected to, in order. My comment about gunzipping is only tangentially related. I meant that in the latest version of the code on GitHub, the getResponse function already takes a Request as an argument. This means that the getResponse function already knows what URL its data is coming from, so modifying the getResponse function to return that URL is simple. (I mentioned gunzip because, as far as I can tell, the reason that getResponse already takes a Request is so that the function can tell if the request should be gunzipped.)

It would be nice for the caller to be able to know in real time what URLs the request is being redirected to. A possible way to do this would be for the 'http' function to take an extra argument of type (Maybe (Control.Concurrent.Chan W.Ascii)) which httpRaw can push URLs into. If the caller doesn't want to use this variable, they can simply pass Nothing.
Otherwise, the caller can create an IO thread which reads the Chan until some termination condition is met (Perhaps this will change the type of the extra argument to (Maybe (Chan (Maybe W.Ascii)))).

I like this solution, though I can see how it could be considered too heavyweight.

I do think it's too heavyweight. I think if people really want lower-level control of the redirects, they should turn off automatic redirect and allow 3xx responses.

Yeah, that totally makes more sense. As it stands, however, httpRaw isn't exported, so a caller has no way of knowing about each individual HTTP transaction. Exporting httpRaw solves the problem I'm trying to solve. If we export httpRaw, should we also make 'http' return the URL chain? Doing both is probably the best solution, IMHO.

What's the difference between calling httpRaw and calling http with redirections turned off?

Making the redirection aware of cookies
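Since the thread keeps returning to what following (or not following) redirects means, here is a tiny pure model of the redirect loop being discussed. All the types are hypothetical; the real loop is the `go` helper inside http-conduit's `http`, which throws TooManyRedirects.

```haskell
-- A server is modeled as a pure function from a URL to an outcome:
-- either a redirect to another URL, or a final response body.
data Outcome = Redirect String | Done String

-- Follow redirects up to a limit, accumulating the chain of URLs
-- visited. With a limit of 0, the first redirect already fails,
-- which is one way to model "no redirects allowed".
follow :: Int -> (String -> Outcome) -> String -> Either String ([String], String)
follow limit server = go 0 []
  where
    go n chain url = case server url of
        Done body -> Right (reverse (url : chain), body)
        Redirect next
            | n >= limit -> Left "TooManyRedirects"
            | otherwise  -> go (n + 1) (url : chain) next
```

The accumulated first component is exactly the "URL chain" Myles wants `http` to return alongside the final body.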
Re: [Haskell-cafe] Contributing to http-conduit
Alright, that sounds good to me. I'll get started on it (the IORef idea). Thanks for the insight!

--Myles

On Sun, Jan 22, 2012 at 10:42 PM, Michael Snoyman mich...@snoyman.com wrote:

On Mon, Jan 23, 2012 at 8:31 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote:

1. Oops - I overlooked the fact that the redirectCount attribute of a Request is exported (it isn't listed on the documentation probably because the constructor itself isn't exported. This seems like a flaw in Haddock...). Silly me. No need to export httpRaw.

2. I think that stuffing many arguments into the 'http' function is ugly. However, I'm not sure that the number of arguments to 'http' could ever reach an unreasonably large amount. Perhaps I have bad foresight, but I personally feel that adding cookies to the http request will be the last thing that we will need to add. Putting a bound on this growth of arguments

I completely disagree here. If we'd followed this approach, rawBody, decompress, redirectCount, and checkStatus all would have been arguments. There's a reason we use a settings data type[1] here.

[1] http://www.yesodweb.com/blog/2011/10/settings-types

makes me more willing to think about this option. On the other hand, using a BrowserAction to modify internal state is very elegant. Which approach do you think is best? I think I'm leaning toward the upper-level Browser module idea.

If there was to be a higher-level HTTP library, I would argue that the redirection code should be moved into it, and the only high-level function that the Network.HTTP.Conduit module would export is 'http' (or httpRaw). What do you think about this?

I actually don't want to move the redirection code out from where it is right now. I think that redirection *is* a basic part of HTTP. I'd be more in favor of just bundling cookies in with the current API, possibly with the IORef approach I'd mentioned (unless someone wants to give a different idea).
Having a single API that provides both high-level and low-level approaches seems like a win to me.

Michael

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
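The settings-type pattern Michael cites boils down to an exported record of defaults that callers override with record-update syntax, so new options can be added without breaking existing call sites. A minimal sketch (field names are illustrative, not http-conduit's actual fields):

```haskell
-- Illustrative settings record in the style of the yesodweb
-- "settings types" post; not http-conduit's real API.
data HttpSettings = HttpSettings
    { redirectLimit  :: Int   -- maximum redirects to follow
    , decompressBody :: Bool  -- gunzip compressed responses?
    }

-- The library exports sensible defaults...
defaultSettings :: HttpSettings
defaultSettings = HttpSettings
    { redirectLimit  = 10
    , decompressBody = True
    }

-- ...and callers override only the fields they care about. Adding a
-- cookie-related field later would not break this call site.
noRedirects :: HttpSettings
noRedirects = defaultSettings { redirectLimit = 0 }
```

This is why the constructor stays unexported: callers must go through defaultSettings, which is what keeps future field additions backward compatible.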
Re: [Haskell-cafe] Contributing to http-conduit
I'm a little confused as to what you mean by 'cookie handling'. Do you mean cookies being set inside redirects for future requests inside the same redirect chain, or users being able to supply cookies to the first HTTP request and pull them out of the last HTTP response?

Clearly, making the original request specify 0 cookies is (will be) trivial. It is up to the caller to determine if he/she wants to pull cookies out of the last server response.

As for cookies getting set inside a redirect chain - I believe that the Internet is 'broken' without this. I believe a client which does not set cookies inside a redirect chain is a misbehaving client. Are you suggesting that we have a 'do not obey cookies inside a redirection chain, instead always blindly send this arbitrary (possibly empty) set of cookies' setting? That's fine with me, but we should at least put a big disclaimer around that option saying that its use leads to technically misbehaving client behavior.

Comments?

--Myles

On Sun, Jan 22, 2012 at 11:16 PM, Michael Snoyman mich...@snoyman.com wrote:

The only times cookies would be used would be:

1. If you explicitly use it.
2. If you have redirects turned on, and a page that redirects you also sets a cookie.

I would think that we would want (2) to be on regardless of user setting, do you disagree?

Michael

On Mon, Jan 23, 2012 at 8:46 AM, Aristid Breitkreuz arist...@googlemail.com wrote:

Just make sure Cookie handling can be disabled completely.

Aristid

On 23.01.2012 at 07:44, Michael Snoyman mich...@snoyman.com wrote:

On Mon, Jan 23, 2012 at 8:31 AM, Myles C. Maxfield myles.maxfi...@gmail.com wrote:

1. Oops - I overlooked the fact that the redirectCount attribute of a Request is exported (it isn't listed on the documentation probably because the constructor itself isn't exported. This seems like a flaw in Haddock...). Silly me. No need to export httpRaw.

2. I think that stuffing many arguments into the 'http' function is ugly.
However, I'm not sure that the number of arguments to 'http' could ever reach an unreasonably large amount. Perhaps I have bad foresight, but I personally feel that adding cookies to the http request will be the last thing that we will need to add. Putting a bound on this growth of arguments

I completely disagree here. If we'd followed this approach, rawBody, decompress, redirectCount, and checkStatus all would have been arguments. There's a reason we use a settings data type[1] here.

[1] http://www.yesodweb.com/blog/2011/10/settings-types

makes me more willing to think about this option. On the other hand, using a BrowserAction to modify internal state is very elegant. Which approach do you think is best? I think I'm leaning toward the upper-level Browser module idea.

If there was to be a higher-level HTTP library, I would argue that the redirection code should be moved into it, and the only high-level function that the Network.HTTP.Conduit module would export is 'http' (or httpRaw). What do you think about this?

I actually don't want to move the redirection code out from where it is right now. I think that redirection *is* a basic part of HTTP. I'd be more in favor of just bundling cookies in with the current API, possibly with the IORef approach I'd mentioned (unless someone wants to give a different idea). Having a single API that provides both high-level and low-level approaches seems like a win to me.

Michael
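For concreteness, the IORef-based cookie store being discussed could look something like the following sketch. All names here are hypothetical, and a Map from cookie name to value stands in for a real cookie type (which would also carry domain, path, and expiry):

```haskell
import Data.IORef
import qualified Data.Map as Map

-- Hypothetical cookie store: a mutable map from cookie name to value.
-- In the real proposal this IORef would live inside the Manager.
type CookieJar = IORef (Map.Map String String)

newCookieJar :: IO CookieJar
newCookieJar = newIORef Map.empty

-- Record a (name, value) pair from a Set-Cookie header seen on a
-- response partway through a redirect chain.
storeCookie :: CookieJar -> (String, String) -> IO ()
storeCookie jar (k, v) = modifyIORef jar (Map.insert k v)

-- Cookies to attach to the next request in the redirect chain.
currentCookies :: CookieJar -> IO [(String, String)]
currentCookies jar = fmap Map.toList (readIORef jar)
```

Michael's session objection is visible even in this sketch: if the jar lives in the Manager, two logical sessions sharing one Manager would also share cookies, whereas a jar passed per request (or per Request/Response, as he suggests) would not.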
[Haskell-cafe] Contributing to http-conduit
To: Michael Snoyman, author and maintainer of http-conduit
CC: haskell-cafe

Hello! I am interested in contributing to the http-conduit library. I've been using it for a little while and reading through its source, but have felt that it could be improved with two features:

- Allowing the caller to know the final URL that ultimately resulted in the HTTP Source. Because httpRaw is not exported, the caller can't even re-implement the redirect-following code themselves. Ideally, the caller would be able to know not only the final URL, but also the entire chain of URLs that led to the final request. I was thinking that it would be even cooler if the caller could be notified of these redirects as they happen in another thread. There are a couple ways to implement this that I have been thinking about:
- A straightforward way would be to add a [W.Ascii] to the type of Response, and getResponse can fill in this extra field. getResponse already knows about the Request so it can tell if the response should be gunzipped.
- It would be nice for the caller to be able to know in real time what URLs the request is being redirected to. A possible way to do this would be for the 'http' function to take an extra argument of type (Maybe (Control.Concurrent.Chan W.Ascii)) which httpRaw can push URLs into. If the caller doesn't want to use this variable, they can simply pass Nothing. Otherwise, the caller can create an IO thread which reads the Chan until some termination condition is met (Perhaps this will change the type of the extra argument to (Maybe (Chan (Maybe W.Ascii)))). I like this solution, though I can see how it could be considered too heavyweight.
- Making the redirection aware of cookies. There are redirects around the web where the first URL returns a Set-Cookie header and a 3xx code which redirects to another site that expects the cookie that the first HTTP transaction set.
I propose to add an (IORef to a Data.Set of Cookies) to the Manager datatype, letting the Manager act as a cookie store as well as a repository of available TCP connections. httpRaw could deal with the cookie store. Network.HTTP.Types does not declare a Cookie datatype, so I would probably be adding one. I would probably take it directly from Network.HTTP.Cookie.

I'd be happy to do both of these things, but I'm hoping for your input on how to go about this endeavor. Are these features even good to be pursuing? Should I be going about this entirely differently?

Thanks,
Myles C. Maxfield

P.S. I'm curious about the lack of Network.URI throughout Network.HTTP.Conduit. Is there a particular design decision that led you to use raw ascii strings?
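The Chan-based notification idea from the proposal above, in miniature: the redirect follower pushes each URL it visits into the channel and signals the end of the chain with Nothing, exactly the (Maybe (Chan (Maybe W.Ascii))) shape suggested. This is a sketch with String standing in for W.Ascii, not http-conduit code:

```haskell
import Control.Concurrent.Chan

-- Simulated redirect follower: announce each URL visited, then
-- signal the end of the chain with the Nothing terminator.
followWithLog :: Chan (Maybe String) -> [String] -> IO ()
followWithLog chan urls = do
    mapM_ (writeChan chan . Just) urls
    writeChan chan Nothing

-- Consumer: drain the channel until the Nothing terminator arrives.
-- In the proposal this would run in a separate forkIO'd thread while
-- the request is in flight.
collectUrls :: Chan (Maybe String) -> IO [String]
collectUrls chan = do
    m <- readChan chan
    case m of
        Nothing  -> return []
        Just url -> fmap (url :) (collectUrls chan)
```

Because Chan is unbounded, the follower never blocks on a slow consumer, which is part of why the thread considered this design heavyweight but workable.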
[Haskell-cafe] Network.Browser and Network.TLS
Hello! I am interested in extending the Network.HTTP code in the HTTP package to support HTTPS via TLS. A clear candidate is to use the Network.TLS module in the TLS library (because its TLS logic is written in pure Haskell, rather than any of the FFI libraries like Network.Curl or the OpenSSL package).

It's simple enough to provide an implementation of the Network.Stream.Stream typeclass around a TLSCtx, and this works for the Network.HTTP.Stream functions. However, I am interested in using the functionality in the Network.Browser module. This module uses the Network.HTTP.HandleStream interface, which is implemented directly on top of the Handle datatype and the Network.BufferType.BufferOp functions. HandleStreams seem to be for allowing the user to pull out an arbitrary data type out of an HTTP stream, not for doing any stream processing the way TLS does. As far as I can tell, the current typeclass system does not allow a TLSCtx to piggyback off of a HandleStream.

My assumption is that this interface is used for speed so the user doesn't have to convert some canonical type into the type that he/she desires in client code. TLS, however, must use a specific type to decode the bytes that it pulls out of the stream. I don't think it's reasonable to try to modify the TLS library to decode bytes from an arbitrary type. Decoding necessarily needs byte-level access to its input (and therefore output) streams in a manner extremely similar to the functions that ByteString provides. Perhaps I'm wrong about this, but the conclusion I've reached is that it doesn't make sense for TLS to use an arbitrary typeclass because the interface it requires is so similar to the existing ByteString datatype. If an application wants a specific type out of a TLS stream, it must necessarily convert the type in software. Any speed that might be gained by pulling your native type out of a network connection will be dwarfed anyway by the cost of decryption.
The Network.Stream functions allow this by using the String type for all data transfers (which is counterintuitive for binary data transfers). An implementation of Network.Stream.Stream using TLS would convert TLS's output ByteString into a String (possibly by doing something like ((map (toEnum . fromIntegral)) . unpack), which doesn't make a whole lot of sense and is fairly wasteful). A client program might even convert it back to a ByteString, so the client program must have knowledge about how the bytes are packed into the String.

Network.Browser only seems to have one function which isn't simply a state accessor/mutator: 'request'. This function gives the connection a type of HStream ty => HandleStream ty. As stated before, the HandleStream directly uses the Handle type. This means that, as far as I can tell, there is no way to fit TLS into the Network.Browser module as it stands because the types don't allow for it. Supporting TLS in Network.Browser would have to change the type of 'request' and therefore break every program out there which uses the Network.Browser module. It would be possible to create something like a 'requestHTTPS' function which returns a different type, but this is quite inelegant - there should be one function that inspects the scheme of the URI it is handed.

I am left with the conclusion that it is impossible to support TLS in Network.Browser without breaking many Haskell programs. It is obviously possible to fork the HTTP library, but I'd like to improve the state of the existing libraries. Likewise, it is possible to create a new module that supports HTTPS but has different typed functions and lots of code duplication with Network.Browser, but that is quite inelegant.

I suppose this is mostly directed at the maintainers of the HTTP and TLS libraries, Sigbjorn Finne and Vincent Hanquez, but I'd be grateful for your input on what I can do to contribute to the Haskell community regarding Network.Browser and Network.TLS.
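To make the criticized conversion concrete, here it is spelled out over [Word8] instead of ByteString, so the sketch needs no extra packages. Every byte round-trips, but the resulting String is just bytes widened to Chars in the 0-255 range, not decoded text, which is exactly the "knowledge about how the bytes are packed" burden described above:

```haskell
import Data.Word (Word8)
import Data.Char (chr, ord)

-- The ((map (toEnum . fromIntegral)) . unpack) trick from the message
-- above, with [Word8] in place of ByteString: widen each byte to a
-- Char. No decoding happens; code points stay in 0..255.
bytesToString :: [Word8] -> String
bytesToString = map (chr . fromIntegral)

-- The reverse direction a client would need in order to recover the
-- raw bytes from such a String.
stringToBytes :: String -> [Word8]
stringToBytes = map (fromIntegral . ord)
```

The round trip is lossless, but each 1-byte Word8 is carried as a multi-byte boxed Char, which is the waste the message complains about.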
Perhaps I should just use the Network.HTTP.Enumerator module and not deal with Network.Browser? Maybe I'm going about this in entirely the wrong way.

Thanks,
Myles C. Maxfield