Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure
Great stuff! Just as a note, Cascalog 2.0 has a lower-level DSL that lets you write Cascading in idiomatic Clojure. Here are some test examples:

https://github.com/nathanmarz/cascalog/blob/develop/cascalog-core/test/cascalog/cascading/operations_test.clj

Marshall Bockrath-Vandegrift wrote:
> ronen nark...@gmail.com writes:
>
>> Thanks for releasing this, I personally had to re-invent such functionality over clojure-hadoop
>
> Glad to do so. If you’ve been exploring a similar software space, would be very interested in additional specific feedback. And PRs :-).
>
>> Did you happen to test this over AWS EMR?
>
> I have not run it live on EMR, but the unit test matrix includes Hadoop versions 0.20.205, 1.0.3, and 2.2.0, which are the sufficiently-recent Hadoop releases EMR’s documentation claims are supported.

--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie

--
--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure
Sam Ritchie sritchi...@gmail.com writes:

> Great stuff!

Thanks!

> Just as a note, Cascalog 2.0 has a lower-level DSL that lets you write Cascading in idiomatic Clojure. Here are some test examples:
>
> https://github.com/nathanmarz/cascalog/blob/develop/cascalog-core/test/cascalog/cascading/operations_test.clj

Cool. I did not know about that part of the API, which does look nifty.

I’m working on a blog post digging into this some, and I’m hoping to snag one of the lightning talk spots at the Conj, but I do think there’s a big difference between writing job-flows which use a `map`-like `map*` function and literally calling `map` in a plain function[1]. Want a state-bearing sequence-mapping transformation? With Parkour, you can just grab bbloom’s `transduce` library[2] and it works just as well in a remote task as it does in local code, because it does in fact do literally the same thing in both scenarios. You can get similar results in Cascalog/Cascading, but you first need to re-express the functionality in terms of Cascalog/Cascading’s abstractions instead of leaning directly on Clojure’s.

The algebraic execution planners backing Cascading- and FlumeJava-likes allow powerful optimization of cross-task operations, but do require all transformations to be expressed in terms of primitives the planners understand. Parkour loses the cross-task awareness, but allows MapReduce tasks to do anything which can be expressed as operations on a Clojure reducible collection. This can include repeated partial reductions (even map-side), full task-partition reductions, and arbitrary numbers of disjoint task outputs.
It’s not a perfect example of what I’m talking about, but Parkour does include an example implementation of the MapReduce algorithm for transforming a graph into a sparse matrix of absolute-indexed cells:

https://github.atl.damballa/rnd/parkour/blob/master/examples/parkour/examples/matrixify.clj

I’ll see if I can distill out a more compelling example from some real jobs prior to the Conj :-).

[1] It admittedly hurts this point a bit that Parkour exclusively uses reducers instead of lazy sequences, but I’m hoping shortly to add the necessary glue to allow tasks to work via seqs too when desired.

[2] https://github.com/brandonbloom/transduce

--
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D
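
To make the point about state-bearing transformations on reducible collections concrete, here is a minimal sketch using only `clojure.core.reducers`. It is a hand-rolled illustration, not the API of the `transduce` library mentioned above and not Parkour code; the namespace and the name `dedupe-consecutive` are assumptions made up for this example. The idea is that a function like this takes any reducible collection, so the same definition could in principle be applied to a local vector or to a task's input.

```clojure
(ns example.stateful-reducible
  (:require [clojure.core.reducers :as r]))

(defn dedupe-consecutive
  "Hypothetical helper: returns a reducible view of coll that drops
  consecutive duplicate elements. The state (the previous element) is
  created fresh each time the result is reduced."
  [coll]
  (r/reducer coll
             (fn [rf]
               (let [prev (atom ::none)]
                 (fn [acc x]
                   (if (= x @prev)
                     acc                      ; same as last element: skip it
                     (do (reset! prev x)
                         (rf acc x))))))))

;; The result composes with the ordinary reducers functions.
(def deduped (into [] (dedupe-consecutive [1 1 2 2 2 3 1 1])))
(def doubled (into [] (r/map #(* 2 %) (dedupe-consecutive [1 1 2 2 2 3 1 1]))))
```

Because the helper is built with `r/reducer` rather than `r/folder`, the result is reducible but not parallel-foldable, which is what a transformation with sequential state needs.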
[ANN] Parkour: Hadoop MapReduce in idiomatic Clojure
I’m pleased to announce the first public release of Parkour, a library for writing Hadoop MapReduce applications in idiomatic Clojure. Parkour takes your Clojure code’s functional gymnastics and sends it free-running across the urban environment of your Hadoop cluster.

https://github.com/damballa/parkour/

Parkour aims to provide deep Clojure integration for Hadoop. Programs using Parkour are normal Clojure programs, using standard Clojure functions instead of new framework abstractions. Programs using Parkour are also full Hadoop programs, with complete access to absolutely everything possible in raw Java Hadoop MapReduce. If you know Clojure, and you know Hadoop, then you’re most of the way to knowing Parkour.

Here is the core of the obligatory “word count” MapReduce program, written using Parkour:

    (defn mapper
      [conf]
      (fn [context input]
        (->> (mr/vals input)
             (r/mapcat #(str/split % #"\s+"))
             (r/map #(-> [% 1])))))

    (defn reducer
      [conf]
      (fn [context input]
        (->> (mr/keyvalgroups input)
             (r/map (fn [[word counts]]
                      [word (r/reduce + 0 counts)])))))

    (defn word-count
      [dseq dsink]
      (-> (pg/input dseq)
          (pg/map #'mapper)
          (pg/partition [Text LongWritable])
          (pg/combine #'reducer)
          (pg/reduce #'reducer)
          (pg/output dsink)))

Parkour includes detailed documentation, ranging from a quickstart introduction through detailed discussions of several specific aspects:

https://github.com/damballa/parkour/#documentation

Although this is the first public release of Parkour, the Damballa R&D team has been using it extensively since beginning serious development earlier this year. We do also use and will continue to use Cascalog, but we’ve found that Parkour’s simpler model and more direct Hadoop integration is a better fit for many problems.

I am personally incredibly excited about this release. I will be at this year’s Clojure/conj, and will be more than happy to discuss Parkour in detail with those interested. Questions and pull requests welcome!

--
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D
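
For readers who want to follow the word-count data flow without a cluster, the following is a minimal local sketch using only `clojure.core.reducers` and `clojure.string`. It is an approximation under stated assumptions, not Parkour itself: `map-lines` and `reduce-groups` mirror the bodies of the `mapper` and `reducer` functions from the announcement with the Parkour-specific `mr/vals` and `mr/keyvalgroups` calls removed, and `group-pairs` is a hypothetical in-memory stand-in for the Hadoop shuffle, assumed to hand the reducer `[key values]` groups in the shape the announcement's reducer destructures. All names and the namespace here are made up for illustration.

```clojure
(ns example.word-count-local
  (:require [clojure.core.reducers :as r]
            [clojure.string :as str]))

(defn map-lines
  "Mapper logic: reducible of text lines -> reducible of [word 1] pairs."
  [lines]
  (->> lines
       (r/mapcat #(str/split % #"\s+"))
       (r/map #(-> [% 1]))))

(defn group-pairs
  "Hypothetical stand-in for the shuffle: collects [word n] pairs into
  [word (n ...)] groups, one group per distinct word."
  [pairs]
  (->> (into [] pairs)
       (group-by first)
       (map (fn [[word kvs]] [word (map second kvs)]))))

(defn reduce-groups
  "Reducer logic: reducible of [word counts] groups -> [word total] pairs."
  [groups]
  (->> groups
       (r/map (fn [[word counts]]
                [word (r/reduce + 0 counts)]))))

(defn word-count-local
  "Runs map, group, and reduce in-process and returns a map of totals."
  [lines]
  (into {} (reduce-groups (group-pairs (map-lines lines)))))

(def sample-counts (word-count-local ["a b a" "b c"]))
```

With this sample input, `sample-counts` comes out as a map with "a" and "b" at 2 and "c" at 1. The real job differs in that grouping happens across tasks and the same reducer function is also registered as a combiner.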
Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure
Thanks for releasing this, I personally had to re-invent such functionality over clojure-hadoop.

Did you happen to test this over AWS EMR?

On Monday, November 4, 2013 3:55:23 PM UTC+2, Marshall Bockrath-Vandegrift wrote:
> I’m pleased to announce the first public release of Parkour, a library for writing Hadoop MapReduce applications in idiomatic Clojure. [...]
Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure
ronen nark...@gmail.com writes:

> Thanks for releasing this, I personally had to re-invent such functionality over clojure-hadoop

Glad to do so. If you’ve been exploring a similar software space, would be very interested in additional specific feedback. And PRs :-).

> Did you happen to test this over AWS EMR?

I have not run it live on EMR, but the unit test matrix includes Hadoop versions 0.20.205, 1.0.3, and 2.2.0, which are the sufficiently-recent Hadoop releases EMR’s documentation claims are supported.

--
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D
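
One way to run a test suite against several Hadoop versions, as the reply above describes, is with Leiningen profiles. The fragment below is only a sketch for a hypothetical job project; it is not Parkour's actual build file, and the project name and profile names are assumptions. The artifact coordinates are the ones published on Maven Central for these Hadoop releases (`hadoop-core` for the 0.20 and 1.x lines, `hadoop-client` for 2.x).

```clojure
;; project.clj for a hypothetical project (not Parkour's own build file).
;; Each profile pins one Hadoop release so tests can run against it.
(defproject example/hadoop-job "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  :profiles
  {:hadoop-0-20-205 {:dependencies
                     [[org.apache.hadoop/hadoop-core "0.20.205.0"]]}
   :hadoop-1-0-3    {:dependencies
                     [[org.apache.hadoop/hadoop-core "1.0.3"]]}
   :hadoop-2-2-0    {:dependencies
                     [[org.apache.hadoop/hadoop-client "2.2.0"]]}})
```

Under these assumptions, `lein with-profile +hadoop-1-0-3 test` would run the tests against Hadoop 1.0.3, and repeating the command with each profile name covers the whole matrix. Passing tests on matching versions is still not the same as a live run on EMR, which the reply says had not been done.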