Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-12 Thread Sam Ritchie
Great stuff! Just as a note, Cascalog 2.0 has a lower-level DSL that
lets you write Cascading in idiomatic Clojure. Here are some test examples:


https://github.com/nathanmarz/cascalog/blob/develop/cascalog-core/test/cascalog/cascading/operations_test.clj

Marshall Bockrath-Vandegrift wrote:

> ronen <nark...@gmail.com> writes:
>
>> Thanks for releasing this, I personally had to re-invent such
>> functionality over clojure-hadoop
>
> Glad to do so.  If you’ve been exploring a similar software space, would
> be very interested in additional specific feedback.  And PRs :-).
>
>> Did you happen to test this over AWS EMR?
>
> I have not run it live on EMR, but the unit test matrix includes Hadoop
> versions 0.20.205, 1.0.3, and 2.2.0, which are the sufficiently-recent
> Hadoop releases EMR’s documentation claims are supported.



--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie

--
--
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups Clojure group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-12 Thread Marshall Bockrath-Vandegrift
Sam Ritchie <sritchi...@gmail.com> writes:

> Great stuff!

Thanks!

> Just as a note, Cascalog 2.0 has a lower-level DSL that lets you
> write Cascading in idiomatic Clojure. Here are some test examples:
>
> https://github.com/nathanmarz/cascalog/blob/develop/cascalog-core/test/cascalog/cascading/operations_test.clj

Cool.  I did not know about that part of the API, which does look nifty.
I’m working on a blog post digging into this some, and I’m hoping to
snag one of the lightning talk spots at the Conj, but I do think
there’s a big difference between writing job-flows which use a
`map`-like `map*` function and actually calling `map` itself in a
plain Clojure function[1].

Want a state-bearing sequence-mapping transformation?  With Parkour, you
can just grab bbloom’s `transduce` library[2] and it works just as well
in a remote task as it does in local code, because it does in fact do
literally the same thing in both scenarios.  You can get similar results
in Cascalog/Cascading, but need to first re-express the functionality in
terms of Cascalog/Cascading’s abstractions vs just leaning directly on
Clojure’s.
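
To make that concrete, here is a minimal sketch of a state-bearing transformation written as a plain Clojure function over any reducible collection.  It is hand-rolled with `reduce` and does not use the `transduce` library[2]; the name `running-totals` is hypothetical and only for illustration.  The assumption is that the same function could be handed a task's input just as it is handed a local vector here.

```clojure
(require '[clojure.core.reducers :as r])

(defn running-totals
  "Return a vector of running sums over any reducible collection `coll`.
  The accumulator carries state (the sum so far) alongside the output."
  [coll]
  (->> coll
       (reduce (fn [[sum out] x]
                 (let [sum (+ sum x)]
                   [sum (conj out sum)]))
               [0 []])
       second))

;; Works on a plain vector...
(running-totals [1 2 3 4])
;; => [1 3 6 10]

;; ...and equally on a reducer pipeline, with no change to the function.
(running-totals (r/map inc [1 2 3 4]))
;; => [2 5 9 14]
```

Nothing in the function knows where its input comes from, which is the property being argued for.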

The algebraic execution planners backing Cascading- and FlumeJava-likes
allow powerful optimization of cross-task operations, but do require all
transformations to be expressed in terms of primitives the planners
understand.  Parkour loses the cross-task awareness, but allows
MapReduce tasks to do anything which can be expressed as operations on a
Clojure reducible collection.  This can include repeated partial
reductions (even map-side), full task-partition reductions, and
arbitrary numbers of disjoint task outputs.
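
As a rough illustration of a repeated partial reduction, the sketch below pre-aggregates word counts in fixed-size chunks before a final merge.  It runs on an in-memory vector with ordinary seq functions for brevity, and is not Parkour API; `partial-counts` and `merge-counts` are hypothetical names.

```clojure
(defn partial-counts
  "Pre-aggregate `words` in chunks of `chunk-size`, emitting [word n]
  pairs per chunk (a stand-in for a map-side partial reduction)."
  [chunk-size words]
  (->> (partition-all chunk-size words)
       (mapcat frequencies)))

(defn merge-counts
  "Final reduction: sum the partial counts for each word."
  [pairs]
  (reduce (fn [totals [word n]]
            (update totals word (fnil + 0) n))
          {}
          pairs))

(def words ["a" "b" "a" "c" "a" "b"])

;; Six input words become five partial pairs, then three totals.
(count (partial-counts 3 words))
;; => 5
(merge-counts (partial-counts 3 words))
;; => {"a" 3, "b" 2, "c" 1}
```

The partial step shrinks the data before it would need to cross a task boundary, and the final step gives the same answer as counting everything at once.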

It’s not a perfect example of what I’m talking about, but Parkour does
include an example implementation of the MapReduce algorithm for
transforming a graph into a sparse matrix of absolute-indexed cells:


https://github.com/damballa/parkour/blob/master/examples/parkour/examples/matrixify.clj

I’ll see if I can distill out a more compelling example from some real
jobs prior to the Conj :-).

[1] It admittedly hurts this point a bit that Parkour exclusively uses
reducers instead of lazy sequences, but I’m hoping shortly to add the
necessary glue to allow tasks to work via seqs too when desired.

[2] https://github.com/brandonbloom/transduce

-- 
Marshall Bockrath-Vandegrift <llas...@damballa.com>
Principal Software Engineer, Damballa R&D



[ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-04 Thread Marshall Bockrath-Vandegrift
I’m pleased to announce the first public release of Parkour, a library
for writing Hadoop MapReduce applications in idiomatic Clojure.  Parkour
takes your Clojure code’s functional gymnastics and sends it
free-running across the urban environment of your Hadoop cluster.

https://github.com/damballa/parkour/

Parkour aims to provide deep Clojure integration for Hadoop.  Programs
using Parkour are normal Clojure programs, using standard Clojure
functions instead of new framework abstractions.  Programs using Parkour
are also full Hadoop programs, with complete access to absolutely
everything possible in raw Java Hadoop MapReduce.  If you know Clojure,
and you know Hadoop, then you’re most of the way to knowing Parkour.

Here is the core of the obligatory “word count” MapReduce program,
written using Parkour:

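;; Aliases: r = clojure.core.reducers, str = clojure.string;
;; mr and pg refer to Parkour's own namespaces.
(defn mapper
  [conf]
  (fn [context input]
    ;; Split each input value on whitespace; emit a [word 1] pair per word.
    (->> (mr/vals input)
         (r/mapcat #(str/split % #"\s+"))
         (r/map #(-> [% 1])))))

(defn reducer
  [conf]
  (fn [context input]
    ;; Sum the counts grouped under each word.
    (->> (mr/keyvalgroups input)
         (r/map (fn [[word counts]]
                  [word (r/reduce + 0 counts)])))))

(defn word-count
  [dseq dsink]
  ;; Describe the job: input, map, shuffle, combine, reduce, output.
  (-> (pg/input dseq)
      (pg/map #'mapper)
      (pg/partition [Text LongWritable])
      (pg/combine #'reducer)
      (pg/reduce #'reducer)
      (pg/output dsink)))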
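
For readers without a cluster at hand, the following is a minimal local sketch of the same map and reduce logic run on in-memory data.  It uses only `clojure.core.reducers` and `clojure.string`, with no Parkour or Hadoop calls; the names `line-pairs`, `sum-groups`, and `local-word-count` are hypothetical, and the in-memory `group-by` is only a stand-in for the shuffle.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

(defn line-pairs
  "Map-side logic: split each line on whitespace and emit [word 1] pairs."
  [lines]
  (->> lines
       (r/mapcat #(str/split % #"\s+"))
       (r/map #(-> [% 1]))))

(defn sum-groups
  "Reduce-side logic: sum the counts in each [word counts] group."
  [groups]
  (->> groups
       (r/map (fn [[word counts]]
                [word (r/reduce + 0 counts)]))))

(defn local-word-count
  "Group pairs by word in memory (standing in for the shuffle), then sum."
  [lines]
  (->> (line-pairs lines)
       (into [])
       (group-by first)
       (map (fn [[word pairs]] [word (map second pairs)]))
       sum-groups
       (into {})))

(local-word-count ["a b a" "b c"])
;; => {"a" 2, "b" 2, "c" 1}
```

The bodies of `line-pairs` and `sum-groups` are the same reducer pipelines as in the task functions, minus the Parkour input accessors.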

Parkour includes detailed documentation, ranging from a quickstart
introduction through detailed discussions of several specific aspects:

https://github.com/damballa/parkour/#documentation

Although this is the first public release of Parkour, the Damballa R&D
team has been using it extensively since beginning serious development
earlier this year.  We do also use and will continue to use Cascalog,
but we’ve found that Parkour’s simpler model and more direct Hadoop
integration are a better fit for many problems.

I am personally incredibly excited about this release.  I will be at
this year’s Clojure/conj, and will be more than happy to discuss Parkour
in detail with those interested.

Questions and pull requests welcome!

-- 
Marshall Bockrath-Vandegrift <llas...@damballa.com>
Principal Software Engineer, Damballa R&D



Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-04 Thread ronen
Thanks for releasing this, I personally had to re-invent such functionality 
over clojure-hadoop 

Did you happen to test this over AWS EMR?





Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-04 Thread Marshall Bockrath-Vandegrift
ronen <nark...@gmail.com> writes:

> Thanks for releasing this, I personally had to re-invent such
> functionality over clojure-hadoop

Glad to do so.  If you’ve been exploring a similar software space, would
be very interested in additional specific feedback.  And PRs :-).

> Did you happen to test this over AWS EMR?

I have not run it live on EMR, but the unit test matrix includes Hadoop
versions 0.20.205, 1.0.3, and 2.2.0, which are the sufficiently-recent
Hadoop releases EMR’s documentation claims are supported.

-- 
Marshall Bockrath-Vandegrift <llas...@damballa.com>
Principal Software Engineer, Damballa R&D
