Re: transducer (+ closures) efficiency question
Correction: The result in the example should be [1 2 3] of course. On Wednesday, June 24, 2015 at 6:34:20 PM UTC+2, Leon Grapenthin wrote: Using a set as filter predicate is idiomatic Clojure. If you don't know whether your filter coll is a set, you can write your filter-contains as (def filter-contains (comp filter set)) E. g. use as (into [] (filter-contains [1 2 3]) [1 2 3 4]) ;- [4] I am not up to date about optimizations in the PersistentHashSet world, but I'd assume for very small filter colls you can gain speed with a linear scan instead of contains?. If you are ok with input, output and filter being a set, I'd recommend looking at clojure.set/intersection for that task. It optimizes by reducing the smaller set. Depending on the sizes of the collections you are dealing with, the set conversion cost may be worth the speed up you can gain. On Wednesday, June 24, 2015 at 12:07:06 AM UTC+2, Sam Raker wrote: Let's say that, as part of an xf, I want to filter out everything in a sequence that's also in some other sequence. Here are some ways of doing that: (defn filter-contains1 [edn-file] (remove (partial contains? (set (read- edn-file edn-file) (defn filter-contains2 [coll] (remove (partial contains? (set coll (def filter-contains3 [coll] (let [coll-as-set (set coll)] (remove ( partial contains? (set coll) I have the strong suspicion that `filter-contains3` is the best of the 3, and `filter-contains1` the worst. The internal mechanics of transduce are a bit of a mystery to me, however: if `filter-contains2` were to be used on a collection of, say, a million items, would `coll` be cast to a set a million times, or is Clojure/the JVM smarter than that? I'm also wondering if anyone has any best practices (or whatever) they can share relating to this kind of intersection of transducers/xfs and closures. It seems to me, for example, that something like (defn my-thing [coll stuff] (let [s (set coll)] ... (comp ... (map foo) (filter bar) (remove (partial contains? s)) ... is awkward, but that a lot of limited-use transducer factory functions (like the ones above) aren't exactly optimal, either. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
2015-06-24 18:19 GMT+02:00 Ben Wolfson wolf...@gmail.com: I meant a difference in speed. A var-root fetch is a volatile read, which can have several adverse performance effects. Most importantly, it hinders inlining, since the JIT has to place a guard for every inlining site of a volatile read, thus it won't so readily inline. Even when inlined, there must be a memory barrier for the guard, which can lead to pipeline flushes, bus communication and other stalls in execution. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
On Wed, Jun 24, 2015 at 8:29 AM, Herwig Hochleitner hhochleit...@gmail.com wrote: 2015-06-24 1:25 GMT+02:00 Ben Wolfson wolf...@gmail.com: On Tue, Jun 23, 2015 at 4:17 PM, Ghadi Shayban gshay...@gmail.com wrote: Tangentially: (remove even?) Will be faster than (remove (fn [i] (even? i))) because in the first case the dereference of the var 'even?' happens only once and the value inside the var will be passed to `remove` at the outset. In the second example the var dereference happens for every single item (though it's very cheap). The second example is equivalent to writing (remove #'even?) I can't seem to observe a difference between the two cases. A simple var reference like even? compiles to a fetch of its root val: @#'even? In the first case, that happens only once, passing the resulting fn object to remove. In the second case, the deref happens every time the fn is called, thus allowing for redefinition of even?. I meant a difference in speed. -- Ben Wolfson Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure. [Larousse, Drink entry] -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
Using a set as filter predicate is idiomatic Clojure. If you don't know whether your filter coll is a set, you can write your filter-contains as (def filter-contains (comp filter set)) E. g. use as (into [] (filter-contains [1 2 3]) [1 2 3 4]) ;- [4] I am not up to date about optimizations in the PersistentHashSet world, but I'd assume for very small filter colls you can gain speed with a linear scan instead of contains?. If you are ok with input, output and filter being a set, I'd recommend looking at clojure.set/intersection for that task. It optimizes by reducing the smaller set. Depending on the sizes of the collections you are dealing with, the set conversion cost may be worth the speed up you can gain. On Wednesday, June 24, 2015 at 12:07:06 AM UTC+2, Sam Raker wrote: Let's say that, as part of an xf, I want to filter out everything in a sequence that's also in some other sequence. Here are some ways of doing that: (defn filter-contains1 [edn-file] (remove (partial contains? (set (read- edn-file edn-file) (defn filter-contains2 [coll] (remove (partial contains? (set coll (def filter-contains3 [coll] (let [coll-as-set (set coll)] (remove ( partial contains? (set coll) I have the strong suspicion that `filter-contains3` is the best of the 3, and `filter-contains1` the worst. The internal mechanics of transduce are a bit of a mystery to me, however: if `filter-contains2` were to be used on a collection of, say, a million items, would `coll` be cast to a set a million times, or is Clojure/the JVM smarter than that? I'm also wondering if anyone has any best practices (or whatever) they can share relating to this kind of intersection of transducers/xfs and closures. It seems to me, for example, that something like (defn my-thing [coll stuff] (let [s (set coll)] ... (comp ... (map foo) (filter bar) (remove (partial contains? s)) ... is awkward, but that a lot of limited-use transducer factory functions (like the ones above) aren't exactly optimal, either. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
2015-06-24 1:25 GMT+02:00 Ben Wolfson wolf...@gmail.com: On Tue, Jun 23, 2015 at 4:17 PM, Ghadi Shayban gshay...@gmail.com wrote: Tangentially: (remove even?) Will be faster than (remove (fn [i] (even? i))) because in the first case the dereference of the var 'even?' happens only once and the value inside the var will be passed to `remove` at the outset. In the second example the var dereference happens for every single item (though it's very cheap). The second example is equivalent to writing (remove #'even?) I can't seem to observe a difference between the two cases. A simple var reference like even? compiles to a fetch of its root val: @#'even? In the first case, that happens only once, passing the resulting fn object to remove. In the second case, the deref happens every time the fn is called, thus allowing for redefinition of even?. Have you ever passed passed #'handler to a server-start function (e.g. in a ring app), so that you might redefine the handler fn during run time of the server? Ad. thread topic: As detailed, clojure's evaluation semantics already help with transducers. Even better: Built transducer stacks should be pretty accessible to the JIT, since they contain no more var references (in the wiring). Also, since transducers are a kind of lexically closed pipe from input effect to output effect, hotspot's escape analysis should also kick in nicely. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
As with all discussions involving interoperability, I'd love to see the same questions answered for the JS target too. (This would of course apply to books such as the new Clojure Applied. ) On Jun 24, 2015 9:48 AM, Herwig Hochleitner hhochleit...@gmail.com wrote: 2015-06-24 18:19 GMT+02:00 Ben Wolfson wolf...@gmail.com: I meant a difference in speed. A var-root fetch is a volatile read, which can have several adverse performance effects. Most importantly, it hinders inlining, since the JIT has to place a guard for every inlining site of a volatile read, thus it won't so readily inline. Even when inlined, there must be a memory barrier for the guard, which can lead to pipeline flushes, bus communication and other stalls in execution. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
Oh fantastic! I was 100% wrong in literally the best way. Thanks! On Tue, Jun 23, 2015 at 7:17 PM, Ghadi Shayban gshay...@gmail.com wrote: Good question. Clojure's evaluation semantics dictate that the arguments are evaluated (computed) *before* calling the function. So(set coll) is computed before being passed to `partial`. Partial receives a function (a value) and arguments (also values) and returns back a new function that saves those original arguments (which happen to be stuffed away in Java final fields). All three of your filters return a transducer. All three of the inputs to `remove` are a partial function. Each of the arguments to the call to partial is a set. They are all essentially equivalent, and should perform the same. (Except filter3 happens to create the set twice; once in the let, and once as the arg to partial). So over a billion item collection, the set in your examples will only be computed once, once, and twice respectively. Note however that sets *are* functions that evaluate whether the argument is in the set. This means you could remove the call to partial and shorten to: (defn filter-contains1 [edn-file] (remove (set (read-edn-file edn-file Tangentially: (remove even?) Will be faster than (remove (fn [i] (even? i))) because in the first case the dereference of the var 'even?' happens only once and the value inside the var will be passed to `remove` at the outset. In the second example the var dereference happens for every single item (though it's very cheap). The second example is equivalent to writing (remove #'even?) On Tuesday, June 23, 2015 at 6:07:06 PM UTC-4, Sam Raker wrote: Let's say that, as part of an xf, I want to filter out everything in a sequence that's also in some other sequence. Here are some ways of doing that: (defn filter-contains1 [edn-file] (remove (partial contains? (set (read- edn-file edn-file) (defn filter-contains2 [coll] (remove (partial contains? (set coll (def filter-contains3 [coll] (let [coll-as-set (set coll)] (remove ( partial contains? (set coll) I have the strong suspicion that `filter-contains3` is the best of the 3, and `filter-contains1` the worst. The internal mechanics of transduce are a bit of a mystery to me, however: if `filter-contains2` were to be used on a collection of, say, a million items, would `coll` be cast to a set a million times, or is Clojure/the JVM smarter than that? I'm also wondering if anyone has any best practices (or whatever) they can share relating to this kind of intersection of transducers/xfs and closures. It seems to me, for example, that something like (defn my-thing [coll stuff] (let [s (set coll)] ... (comp ... (map foo) (filter bar) (remove (partial contains? s)) ... is awkward, but that a lot of limited-use transducer factory functions (like the ones above) aren't exactly optimal, either. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to a topic in the Google Groups Clojure group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/H0zoF6jlY-c/unsubscribe. To unsubscribe from this group and all its topics, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- . -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
transducer (+ closures) efficiency question
Let's say that, as part of an xf, I want to filter out everything in a sequence that's also in some other sequence. Here are some ways of doing that: (defn filter-contains1 [edn-file] (remove (partial contains? (set (read-edn-file edn-file) (defn filter-contains2 [coll] (remove (partial contains? (set coll (def filter-contains3 [coll] (let [coll-as-set (set coll)] (remove (partial contains? (set coll) I have the strong suspicion that `filter-contains3` is the best of the 3, and `filter-contains1` the worst. The internal mechanics of transduce are a bit of a mystery to me, however: if `filter-contains2` were to be used on a collection of, say, a million items, would `coll` be cast to a set a million times, or is Clojure/the JVM smarter than that? I'm also wondering if anyone has any best practices (or whatever) they can share relating to this kind of intersection of transducers/xfs and closures. It seems to me, for example, that something like (defn my-thing [coll stuff] (let [s (set coll)] ... (comp ... (map foo) (filter bar) (remove (partial contains? s)) ... is awkward, but that a lot of limited-use transducer factory functions (like the ones above) aren't exactly optimal, either. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
Good question. Clojure's evaluation semantics dictate that the arguments are evaluated (computed) *before* calling the function. So(set coll) is computed before being passed to `partial`. Partial receives a function (a value) and arguments (also values) and returns back a new function that saves those original arguments (which happen to be stuffed away in Java final fields). All three of your filters return a transducer. All three of the inputs to `remove` are a partial function. Each of the arguments to the call to partial is a set. They are all essentially equivalent, and should perform the same. (Except filter3 happens to create the set twice; once in the let, and once as the arg to partial). So over a billion item collection, the set in your examples will only be computed once, once, and twice respectively. Note however that sets *are* functions that evaluate whether the argument is in the set. This means you could remove the call to partial and shorten to: (defn filter-contains1 [edn-file] (remove (set (read-edn-file edn-file Tangentially: (remove even?) Will be faster than (remove (fn [i] (even? i))) because in the first case the dereference of the var 'even?' happens only once and the value inside the var will be passed to `remove` at the outset. In the second example the var dereference happens for every single item (though it's very cheap). The second example is equivalent to writing (remove #'even?) On Tuesday, June 23, 2015 at 6:07:06 PM UTC-4, Sam Raker wrote: Let's say that, as part of an xf, I want to filter out everything in a sequence that's also in some other sequence. Here are some ways of doing that: (defn filter-contains1 [edn-file] (remove (partial contains? (set (read- edn-file edn-file) (defn filter-contains2 [coll] (remove (partial contains? (set coll (def filter-contains3 [coll] (let [coll-as-set (set coll)] (remove ( partial contains? (set coll) I have the strong suspicion that `filter-contains3` is the best of the 3, and `filter-contains1` the worst. The internal mechanics of transduce are a bit of a mystery to me, however: if `filter-contains2` were to be used on a collection of, say, a million items, would `coll` be cast to a set a million times, or is Clojure/the JVM smarter than that? I'm also wondering if anyone has any best practices (or whatever) they can share relating to this kind of intersection of transducers/xfs and closures. It seems to me, for example, that something like (defn my-thing [coll stuff] (let [s (set coll)] ... (comp ... (map foo) (filter bar) (remove (partial contains? s)) ... is awkward, but that a lot of limited-use transducer factory functions (like the ones above) aren't exactly optimal, either. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: transducer (+ closures) efficiency question
On Tue, Jun 23, 2015 at 4:17 PM, Ghadi Shayban gshay...@gmail.com wrote: Tangentially: (remove even?) Will be faster than (remove (fn [i] (even? i))) because in the first case the dereference of the var 'even?' happens only once and the value inside the var will be passed to `remove` at the outset. In the second example the var dereference happens for every single item (though it's very cheap). The second example is equivalent to writing (remove #'even?) I can't seem to observe a difference between the two cases. -- Ben Wolfson Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure. [Larousse, Drink entry] -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.