Re: Recommendations for a schema-based data language for use in Hadoop?

2015-08-05 Thread Marshall Bockrath-Vandegrift
Ryan Schmitt rschm...@u.rochester.edu writes:

 I'm currently working on some problems in the big data space, and I'm
 more or less starting from scratch with the Hadoop ecosystem. I was
 looking at ways to work with data in Hadoop, and I realized that
 (because of how InputFormat splitting works) this is a use case where
 it's actually pretty important to use a data language with an external
 schema.

At Damballa we extensively use Avro for these sorts of problems.  We’ve
written a set of Clojure bindings for Avro named “abracad” [1].  Abracad
exposes Avro data as native Clojure data (persistent vectors, maps,
etc.), supports protocol-based de/serialization of custom types, and
includes explicit support for defining “EDN-in-Avro” schemas which can
include arbitrary Clojure data.

We’ve implemented support in the mainline Java Avro project (merged in
1.7.5) for specifying configurable “data models” for MapReduce jobs,
which allows Avro MapReduce input to directly produce Clojure data and
output to consume Clojure data.  We’ve also implemented fairly automatic
configuration of these data models in the Avro dseqs of our “parkour”
Clojure-Hadoop/MR integration library [2].

[1] https://github.com/damballa/abracad

[2] https://github.com/damballa/parkour

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Lein :provided profile and uberjar not working

2015-06-03 Thread Marshall Bockrath-Vandegrift
Scott Klarenbach doyouunderst...@gmail.com writes:

 I'd like to exclude certain dependencies from my uberjar, by using the
 :provided profile, but the jars are always included.

I believe the problem you’re seeing is due to the fact that Leiningen
doesn’t quite think about dependency `scope` the same way Maven does.
For a dependency that only your project itself pulls in, adding it to the
`:provided` profile does mark it as `provided` `scope` in any resulting
`pom.xml`, and it does keep it out of the uberjar.  But if one of your
normal dependencies *also* depends on the one you’ve added to
`:provided`, then the “provided-ness” isn’t propagated, and it will get
pulled into the uberjar just like any other indirect dependency.

The work-around is first to add the dependency to the `:provided`
profile as you’ve already done, and then also to add an `:exclusions`
entry for that same artifact, either at the top level of `project.clj`
or on each of the other direct dependencies that pull it in.
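The interaction can be pictured with a toy model.  The Java sketch below is hypothetical and deliberately simplified.  It is not Leiningen's or Aether's actual resolution algorithm, and the artifact names (`other-lib`, `provided-lib`, `provided-lib-dep`) are made up.  Direct dependencies marked provided are skipped.  Nothing stops the same artifact from being reached transitively, though, unless it is explicitly excluded, and an exclusion also prunes whatever sits beneath it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model (hypothetical; not Leiningen's real resolver) of why a
 * :provided dependency can still land in an uberjar.
 */
public class UberjarModel {
    /**
     * @param graph      artifact -> artifacts it depends on (transitive edges)
     * @param directDeps the project's own direct dependencies
     * @param provided   direct dependencies placed in the :provided profile
     * @param exclusions artifacts excluded everywhere (pruning their subtrees)
     */
    public static Set<String> uberjarContents(Map<String, List<String>> graph,
                                              List<String> directDeps,
                                              Set<String> provided,
                                              Set<String> exclusions) {
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        for (String dep : directDeps) {
            // Provided-ness only affects the direct dependency itself.
            if (!provided.contains(dep)) {
                work.push(dep);
            }
        }
        while (!work.isEmpty()) {
            String dep = work.pop();
            // Only an explicit exclusion stops a transitive copy of the artifact.
            if (exclusions.contains(dep) || !result.add(dep)) {
                continue;
            }
            for (String child : graph.getOrDefault(dep, List.of())) {
                work.push(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "other-lib", List.of("provided-lib"),
                "provided-lib", List.of("provided-lib-dep"));
        List<String> direct = List.of("other-lib", "provided-lib");
        Set<String> provided = Set.of("provided-lib");

        // Without an exclusion, provided-lib comes back in via other-lib:
        // [other-lib, provided-lib, provided-lib-dep]
        System.out.println(uberjarContents(graph, direct, provided, Set.of()));
        // With an exclusion on provided-lib, only other-lib remains: [other-lib]
        System.out.println(uberjarContents(graph, direct, provided, Set.of("provided-lib")));
    }
}
```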

HTH,

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D



Re: Identifying objects that cannot be read

2015-06-03 Thread Marshall Bockrath-Vandegrift
Richard Möhn richard.mo...@posteo.de writes:

 How do I remove from a nested datastructure the objects whose (pr obj)
 representation doesn't comply with the EDN specification? For example,
 {:a 1 :b (find-ns 'user)} → {:a 1}
 Or, easier, how do I just not print these objects?

In addition to (or instead of) the 1.7.0 changes mentioned by Alex
Miller, check out:

https://github.com/llasram/letterpress

If you can stomach the (very gentle) monkey-patching, it provides the
ability to arbitrarily redefine and hook printing within a particular
dynamic scope.

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D | 518.859.4559m



Re: Interop: mutability vs. inheritance

2015-05-27 Thread Marshall Bockrath-Vandegrift
Mars0i marsh...@logical.net writes:

 I believe these statements are correct, but gen-class is complex and
 deftype use has some nuances, so I want to make sure I haven't missed
 something.

You have missed one possible approach: write a small integration class
in Java that delegates its behavior to something passed in as a
parameter and more easily expressed in idiomatic Clojure (e.g., an `IFn`
function or a `reify`d interface).

I’ve made this joke before, but: Java is a pretty solid DSL for writing
Java classes.  When you need a Java class that conforms to some
combination of Java-side abstract signatures (extending a base class,
being AOTed for discovery by name via reflection, annotations, etc.), in
my experience it is frequently, indeed nearly always, easier to adapt
from that interface to Clojure in Java than it is to adapt to that
interface from Clojure in Clojure.

Some examples:

https://github.com/damballa/parkour/blob/master/src/java/parkour/hadoop/Mappers.java
https://github.com/damballa/abracad/blob/master/src/java/abracad/avro/ClojureDatumReader.java
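A minimal sketch of that shape follows, under stated assumptions.  `RecordHandler` is a hypothetical stand-in for whatever Java base class a framework requires (for Hadoop, something like `Mapper`).  The static `setBehavior` slot is a made-up registration mechanism.  `java.util.function.Function` stands in for `clojure.lang.IFn` so the example compiles without Clojure on the classpath.  In a real project the adapter would presumably hold an `IFn`, obtained for example via `clojure.java.api.Clojure.var(ns, name)`, and call `.invoke` on it.

```java
import java.util.function.Function;

/** Hypothetical framework base class (stand-in for something like a Hadoop Mapper). */
abstract class RecordHandler {
    public abstract String handle(String record);
}

/**
 * Thin Java adapter: satisfies the Java-side contract (extends the base class,
 * has a public no-arg constructor for reflective instantiation by name) and
 * delegates all behavior to a function supplied from elsewhere.
 */
public class DelegatingHandler extends RecordHandler {
    // Hypothetical registry slot that a reflectively created instance reads from.
    // In a Clojure project this would hold a clojure.lang.IFn instead of a Function.
    private static volatile Function<String, String> behavior = Function.identity();

    public static void setBehavior(Function<String, String> f) {
        behavior = f;
    }

    private final Function<String, String> f;

    public DelegatingHandler() {
        this.f = behavior;
    }

    @Override
    public String handle(String record) {
        return f.apply(record);
    }

    public static void main(String[] args) throws Exception {
        setBehavior(s -> s.toUpperCase());
        // Frameworks typically instantiate such classes by name via reflection.
        RecordHandler h = (RecordHandler) Class.forName(DelegatingHandler.class.getName())
                .getDeclaredConstructor()
                .newInstance();
        System.out.println(h.handle("hello")); // HELLO
    }
}
```

The design point is that all of the Java-side ceremony (base class, no-arg constructor, discoverability by class name) lives in a few lines of Java, while the actual behavior stays an ordinary function.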

HTH,

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D



Re: [ANN] Clojure 1.7.0-RC1 now available

2015-05-27 Thread Marshall Bockrath-Vandegrift
On Tue, May 26, 2015 at 11:29 PM Alex Miller a...@puredanger.com wrote:


 The point I was getting at is really whether you should consider this to
 be broken with the old behavior too.


Such APIs are tricky to use correctly from Clojure via seqs, but it is
possible to do so with the normal automatic clojure.lang.RT seq adapters
in Clojure <=1.6.  My point is that existing Clojure code can and does
depend on non-chunked realization of iterator seqs, and that for such code
this is a breaking change.


 Can you point to code where the original behavior allowed room to transform
 the mutated object into an object which *could* be safely cached in a
 'downstream' seq? By what means does this transformation occur? It sounds
 to me like you are starting with an Iterator, creating a seq, then walking
 the seq exactly once, one element at a time, and producing a new
 transformed seq or other output.


Exactly.  The (unfortunately Java <=1.6-only) snippet I posted earlier is
just such an example.


 If you did reuse that IteratorSeq, all of the elements of the sequence
 would point to the same object which would be in the last state like the
 java 1.6 example you gave. Thus, the caching capability of the seq can't
 possibly be something you're using. And if that's true, then why are you
 paying the allocation and synchronization costs of making the seq at all?
 Why not just use the iterator directly, thus skipping all the extra
 allocation that these object-reusing high-performance iterators are working
 so hard to avoid in the first place? In 1.7, transducers would give you
 exactly the capability to walk the source iterator, apply a transducer
 version of your transformation, and output to a collection (via into), a
 value (via transduce), or a lazy sequence (via sequence). I think you would
 find this faster as well due to reduced allocation (possibly greatly
 reduced depending on the transformation).


I've personally used reducers wherever possible since they were introduced,
and for the Hadoop case Parkour's primary recommended API is in terms of
reducers [1].  For new code, the transducer-based facilities in Clojure 1.7
will certainly provide more options for functional-safe handling of the
iterators in question.  But to repeat my main point, those don't help
existing code that depends on one-at-a-time realization of an Iterator
being reflected as one-at-a-time realization of the seq built on top of it.

[1] https://github.com/damballa/parkour/blob/master/doc/reducers-vs-seqs.md

Thanks,

-Marshall



Re: [ANN] Clojure 1.7.0-RC1 now available

2015-05-26 Thread Marshall Bockrath-Vandegrift
Ugh, it looks like the value-reuse behavior of the EnumMap entrySet
iterator was fixed after Java 1.6 (my example was run under Java 1.6, which
I believe is still a Clojure-supported version of Java?).  I can throw
together a synthetic example, but I think the people following this thread
get what's happening.  The point isn't whether this pattern is a good idea
(it certainly isn't) but whether existing Java APIs people want to interop
with use it (they certainly do).

I presently depend on no fewer than three separate Java library APIs that I
know for a fact rely on this behavior:
- Hadoop ReduceContextImpl$ValueIterator
- Mahout DenseVector$AllIterator/NonDefaultIterator
- LensKit FastIterators

Explicitly constructing `IteratorSeq` instances is an option (I actually
verified that approach this afternoon for the Hadoop API in Parkour), but
I'm not happy about it.  It places a definite burden on integration-library
maintainers to implement the change, and on end users to realize they need
to upgrade for Clojure 1.7 compatibility.  The `Iterator` interface
fundamentally does not guarantee that the values yielded by `next()` are
functional-safe in the sense necessary to support chunking.  I understand
the desire to increase performance, but I don't think it's worth the
potential for silent and bewildering interop breakage.
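To make the "functional-safe" requirement concrete: one way an integration layer can protect downstream consumers, chunked or otherwise, is to turn each reused object into an independent value at the moment `next()` yields it.  That has to happen before the underlying iterator gets a chance to mutate the object again.  The Java sketch below assumes the caller can supply such a snapshot function.  The class name `CopyingIterator` and the `int[]`-cell source are hypothetical illustrations, not part of Parkour or of any of the libraries listed above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Wraps an iterator that may re-yield a mutable object and converts each
 * element into an independent value as soon as it is yielded.
 */
public final class CopyingIterator<T, R> implements Iterator<R> {
    private final Iterator<T> source;
    private final Function<? super T, ? extends R> snapshot;

    public CopyingIterator(Iterator<T> source, Function<? super T, ? extends R> snapshot) {
        this.source = source;
        this.snapshot = snapshot;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public R next() {
        // Snapshot now, before the source's next call to next() can mutate the object.
        return snapshot.apply(source.next());
    }

    public static void main(String[] args) {
        // Hypothetical reusing source: one int[] cell re-yielded for 0..4.
        int[] cell = new int[1];
        Iterator<int[]> reusing = new Iterator<int[]>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < 5;
            }

            @Override
            public int[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                cell[0] = i++;
                return cell;
            }
        };

        // Buffering is now safe: each buffered element is its own Integer.
        List<Integer> buffered = new ArrayList<>();
        new CopyingIterator<int[], Integer>(reusing, a -> a[0]).forEachRemaining(buffered::add);
        System.out.println(buffered); // [0, 1, 2, 3, 4]
    }
}
```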


On Tue, May 26, 2015 at 9:18 PM Alex Miller a...@puredanger.com wrote:

 That's not what I see with 1.7.0-RC1 (or any of the betas). I tried with
 both Java 1.7.0_25 and 1.8.0-b132.

 user=> *clojure-version*
 {:major 1, :minor 7, :incremental 0, :qualifier "RC1"}
 user=> (->> (map vector (java.util.EnumSet/allOf
 java.util.concurrent.TimeUnit) (range)) (into {}) (java.util.EnumMap.)
 (.entrySet) (map str) (into []))
 ["NANOSECONDS=0" "MICROSECONDS=1" "MILLISECONDS=2" "SECONDS=3" "MINUTES=4"
 "HOURS=5" "DAYS=6"]

 Re implementing the not-uncommon Java pattern of mutating and re-yielding
 the same object on each `next()` invocation. I'm assuming that you're
 somehow expecting to traverse one seq node, then having an opportunity to
 mutate something (the source, the iterator, the return object) in between
 each new advancement of the seq node? That seems a) not common at all, b) a
 bad idea even in Java and c) dangerous even before this change. In either
 case you end up with a seq that points to a succession of the same repeated
 (mutable and mutating) object - this violates most expectations we as
 Clojure users have of sequences. Any sort of chunking (map, filter, etc)
 over the top of that seq would force realization up to 32 elements beyond
 the head causing the same issue.

 The original one-at-a-time IteratorSeq is still there (for now) and you
 can still make one if you want via (clojure.lang.IteratorSeq/create iter)
 but I would consider it deprecated. I think a custom lazy-seq or a
 loop-recur would be a better way to handle this case, which in my opinion
 is highly unusual. That said, my ears are open if this is an issue for a
 large number of people.


 On Tuesday, May 26, 2015 at 6:24:54 PM UTC-5, Marshall Bockrath-Vandegrift
 wrote:

 The difference is that the original behavior allowed room to transform
 the mutated object into an object which *could* be safely cached in a
 downstream seq, while the new behavior pumps the iterator through 32
 mutations before user-level code has a chance to see it.  Contrived example
 using the Java standard library:

 Clojure 1.6.0:
 (->> (map vector (java.util.EnumSet/allOf java.util.concurrent.TimeUnit)
 (range)) (into {}) (java.util.EnumMap.) (.entrySet) (map str) (into []))
 #=> ["NANOSECONDS=0" "MICROSECONDS=1" "MILLISECONDS=2" "SECONDS=3"
 "MINUTES=4" "HOURS=5" "DAYS=6"]

 Clojure 1.7.0-RC1:
 (->> (map vector (java.util.EnumSet/allOf java.util.concurrent.TimeUnit)
 (range)) (into {}) (java.util.EnumMap.) (.entrySet) (map str) (into []))
 #=> ["DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6"]

 IMHO the latter behavior demonstrates a mismatch where chunked seqs and
 iterators are simply incompatible.

 On Tue, May 26, 2015 at 5:33 PM Alex Miller a...@puredanger.com wrote:

 In what way is it broken? Both before and after wrapped a mutable
 iterator into a caching seq. The new one is different in that it chunks so
 reads 32 at a time instead of 1. However combining either with other
 chunking sequence operations would have the same effect which is to say
 that using that mutable iterator with anything else, or having expectations
 about its rate of consumption was as dubious before as it is now.

 Unless of course I misunderstand your intent, which is possible because I
 am on a phone without easy access to look further at the commit and am
 going by memory.



 On May 26, 2015, at 2:17 PM, Marshall Bockrath-Vandegrift 
 llas...@gmail.com wrote:

 Some of my code is broken by
 commit c47e1bbcfa227723df28d1c9e0a6df2bcb0fecc1, which landed in
 1.7.0-alpha6 (I last tested with -alpha5 and have unfortunately been busy
 since).  The culprit is the switch

Re: [ANN] Clojure 1.7.0-RC1 now available

2015-05-26 Thread Marshall Bockrath-Vandegrift
To further explain "bewildering": I only figured out what was happening
because I was explicitly testing a library against 1.7.0-RC1, and even then
I had to `git bisect` *Clojure* to find the offending commit.  Otherwise,
the only visible symptom is that values generated deep in the guts of a
complex Java API suddenly become nonsensical.


Re: [ANN] Clojure 1.7.0-RC1 now available

2015-05-26 Thread Marshall Bockrath-Vandegrift
Some of my code is broken by
commit c47e1bbcfa227723df28d1c9e0a6df2bcb0fecc1, which landed in
1.7.0-alpha6 (I last tested with -alpha5 and have unfortunately been busy
since).  The culprit is the switch to producing seqs over iterators as
chunked seqs.  This would appear to break seq-based traversal of any
iterator implementing the not-uncommon Java pattern of mutating and
re-yielding the same object on each `next()` invocation.

I'm unable to find an existing ticket for this apparent regression.  Should
I create one, or did I miss the existing ticket, or is there some
mitigating issue which makes this a non-problem?

Thanks.

-Marshall

On Thu, May 21, 2015 at 12:31 PM Alex Miller a...@puredanger.com wrote:

 Clojure 1.7.0-RC1 is now available.

 Try it via
 - Download: https://repo1.maven.org/maven2/org/clojure/clojure/1.7.0-RC1/
 - Leiningen: [org.clojure/clojure "1.7.0-RC1"]

 The only change since 1.7.0-beta3 is CLJ-1706, which makes reader
 conditional splicing an error at the top level (previously it would
 silently drop all but the first spliced element).

 For a full list of changes since 1.6.0, see:
 https://github.com/clojure/clojure/blob/master/changes.md

 Please give it a try and let us know if things are working (or not). The
 more and quicker feedback we get, the sooner we can release 1.7.0 final!

 - Alex

 --
 You received this message because you are subscribed to the Google Groups
 Clojure Dev group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to clojure-dev+unsubscr...@googlegroups.com.
 To post to this group, send email to clojure-...@googlegroups.com.
 Visit this group at http://groups.google.com/group/clojure-dev.
 For more options, visit https://groups.google.com/d/optout.




Re: [ANN] Clojure 1.7.0-RC1 now available

2015-05-26 Thread Marshall Bockrath-Vandegrift
The difference is that the original behavior allowed room to transform the
mutated object into an object which *could* be safely cached in a
downstream seq, while the new behavior pumps the iterator through 32
mutations before user-level code has a chance to see it.  Contrived example
using the Java standard library:

Clojure 1.6.0:
(->> (map vector (java.util.EnumSet/allOf java.util.concurrent.TimeUnit)
(range)) (into {}) (java.util.EnumMap.) (.entrySet) (map str) (into []))
#=> ["NANOSECONDS=0" "MICROSECONDS=1" "MILLISECONDS=2" "SECONDS=3"
"MINUTES=4" "HOURS=5" "DAYS=6"]

Clojure 1.7.0-RC1:
(->> (map vector (java.util.EnumSet/allOf java.util.concurrent.TimeUnit)
(range)) (into {}) (java.util.EnumMap.) (.entrySet) (map str) (into []))
#=> ["DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6"]

IMHO the latter behavior demonstrates a mismatch where chunked seqs and
iterators are simply incompatible.
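The mechanism can be reproduced without EnumMap or Java 1.6.  The hypothetical Java sketch below (class and method names are illustrative only) models an iterator that mutates and re-yields one object.  It compares two strategies:

- transforming each element as soon as it is yielded, which is roughly the old one-at-a-time `IteratorSeq` behavior;
- buffering a chunk of up to 32 references before transforming them, which is roughly what a chunked seq does.

It models the effect, not Clojure's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class ReuseDemo {
    /** Mutable holder that the iterator below re-yields on every next(). */
    static final class Cell {
        int value;

        @Override
        public String toString() {
            return "v" + value;
        }
    }

    /** Iterator over 0..n-1 that mutates and returns the same Cell each time. */
    static Iterator<Cell> reusing(int n) {
        Cell cell = new Cell();
        return new Iterator<Cell>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < n;
            }

            @Override
            public Cell next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                cell.value = i++;
                return cell;
            }
        };
    }

    /** Transform each element before asking the iterator for the next one. */
    static <T, R> List<R> oneAtATime(Iterator<T> it, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(f.apply(it.next()));
        }
        return out;
    }

    /** Pull up to chunkSize references first, then transform the whole chunk. */
    static <T, R> List<R> chunked(Iterator<T> it, int chunkSize, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        while (it.hasNext()) {
            List<T> chunk = new ArrayList<>(chunkSize);
            while (it.hasNext() && chunk.size() < chunkSize) {
                chunk.add(it.next());
            }
            for (T x : chunk) {
                out.add(f.apply(x));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [v0, v1, v2, v3, v4, v5, v6]
        System.out.println(oneAtATime(reusing(7), Cell::toString));
        // Prints [v6, v6, v6, v6, v6, v6, v6]: every buffered reference is the same Cell
        System.out.println(chunked(reusing(7), 32, Cell::toString));
    }
}
```

Setting `chunkSize` to 1 restores the correct output.  That is the sense in which such iterators require one-at-a-time realization.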





Re: are there any real examples on Github of how to use reducers?

2015-05-22 Thread Marshall Bockrath-Vandegrift
piastkra...@gmail.com writes:

 Can anyone point me to a project on Github that is using Clojure 1.5
 reducers?

The Parkour project provides an interface to Hadoop MapReduce primarily
in terms of tasks written as functions over their input chunks, with those
chunks exposed as clojure.core.reducers reducible collections:

https://github.com/damballa/parkour

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D



Re: Help sought on issue with AOT, hadoop, classloaders, and consistency of Clojure fn classes

2015-02-02 Thread Marshall Bockrath-Vandegrift
On Friday, January 30, 2015 at 5:00:31 PM UTC-5, Jason Wolfe wrote:

Thanks for the recommendation.  For now we're looking for a simple 
 low-level interface to MR, but we're also keeping an eye on parkour and 
 pigpen for more complex tasks down the road.  Can you explain why I might 
 prefer parkour to pigpen or vice-versa? 


Parkour actually *is* a low-level interface to MR.  It just exposes that
interface through relatively Clojure-idiomatic and composable abstractions,
which can make it look higher-level than it really is.  At Parkour's core
is the support required for MR tasks to invoke a regular Clojure var-bound
function in place of the `.run` method of a `Mapper` or `Reducer` class.
Everything else in Parkour is built to make using that primitive,
low-level interface more composable, convenient, and pleasant; but
ultimately nothing *replaces* that interface.  Your Parkour MR Clojure
task code runs in exactly the way equivalent raw Hadoop MR Java task code
would.

Parkour's documentation includes a document explaining the project's
motivation in light of the Clojure-Hadoop integration projects that
existed when I started Parkour (including clojure-hadoop):
https://github.com/damballa/parkour/blob/master/doc/motivation.md
It doesn't yet cover PigPen, although I certainly should add a section.  I
honestly haven't evaluated PigPen in detail, but the approach of compiling
Clojure code to Pig seems excessively complex to me, to the point of only
being worth it for organizations that have already made a significant
investment in Pig.

-Marshall
 



Re: Help sought on issue with AOT, hadoop, classloaders, and consistency of Clojure fn classes

2015-01-30 Thread Marshall Bockrath-Vandegrift
Not a solution to your immediate problem, but if this is for 
new development (not an existing mass of clojure-hadoop code), I'd suggest 
looking at Parkour instead.  As the main Parkour developer I'm obviously 
biased, but Parkour exists in part because the compilation model used by 
clojure-hadoop in order to meet Hadoop's expectations is very much at odds 
with typical Clojure development. In particular, Parkour does not require 
AOT compilation.

https://github.com/damballa/parkour

On Thursday, January 29, 2015 at 2:39:54 AM UTC-5, Jason Wolfe wrote:

 First off, I apologize in advance for not having a reduced test case, and 
 express my sincere gratitude in advance for any assistance.  I've been 
 tearing my hair out for a day or so and not making headway, and figured 
 someone here might recognize some keywords and have a pointer in the right 
 direction. (I'm admittedly pretty green when it comes to class loading, and 
 have largely exhausted my google fu).  

 *Problem: *

 I'm submitting a hadoop job using clojure-hadoop.  All is well with a 
 simple job, but once I require something that transitively depends on 
 Schema, I end up with: 

 clojure.lang.Compiler$CompilerException: 
 java.lang.IllegalArgumentException: No implementation of method: :walker of 
 protocol: #'schema.core/Schema found for class: clojure.core$long, 
 compiling:(crane/config.clj:33:4)

 It works fine when run in-process with hadoop-mapreduce-client-jobclient, 
 but not with bin/hadoop -jar.  This stunk of a classloader issue, and after 
 digging in it seems that there are multiple versions of clojure.core$long 
 floating around.  The version on which the protocol is extended is not the 
 same class for the fn that the symbol 'long resolves to in client code.

 *Context: *

 clojure-hadoop is AOT-compiled, and after being loaded by hadoop it 
 dynamically loads the target namespace (not AOT-compiled, nor any other of 
 the code in question) using 
 https://github.com/alexott/clojure-hadoop/blob/master/src/clojure_hadoop/load.clj#L3

 From here, schema is transitively required, and then client namespaces 
 attempt to use the Schema protocol to generate validators, and when the 
 schema 'long is used (which resolves to the fn with class 
 clojure.core$long), it fails to find the appropriate method.  

 After repeated head-bashing, I've determined that there are (at least two) 
 versions of the clojure.core$long class floating around -- the one used to 
 extend the protocol, which stems from a DynamicClassLoader, and the one 
 that 'long resolves to in client code, which stems from a URLClassLoader. 
  The URLClassLoader is the loader of the current thread and Compiler, but 
 not @(clojure.lang.Compiler/LOADER).

 *Attempts:*

 I've tried wrapping the clojure-hadoop loading code with 
 .setContextClassLoader on some obvious candidates and binding 
 *use-context-classloader* around the code doing the loading, to no avail. 
  I've tried changing the schema code to reference the class in different 
 ways (class (resolve 'long)), (class 'long), etc and that hasn't made a 
 difference.  I've checked and the clojure-hadoop jar doesn't contain any 
 .class files for clojure, schema, or other offending code.  

 *Plea:*

 I suspect there's something obvious I'm missing.  (In retrospect it seems 
 like the design of Schema may be suboptimal in light of this, but if 
 possible I'd like to figure out a workaround without changing that 
 substantially). Thanks in advance for your help -- any and all pointers are 
 welcome.  

 -Jason



  




Re: Switched map for record and got slower

2015-01-28 Thread Marshall Bockrath-Vandegrift
At the very least, all the places where you switched to method/property 
syntax now require runtime reflection. Try leaving those as they were, or 
switching to using the keyword as the function (which is somewhat more 
idiomatic).  You could instead hint the appropriate object type in each 
function, but part of the benefit of records is that most code should be 
agnostic as to whether it has a plain map or a record.
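A minimal sketch of the difference. It assumes a hypothetical `Unit` record with `id` and `attack` fields standing in for the poster's `UnitRecord`, because the real fields aren't shown in the thread. Keyword lookup works unchanged on plain maps and on records. Unhinted `.field` access, by contrast, falls back to reflection:

```clojure
;; Hypothetical record standing in for the poster's UnitRecord.
(defrecord Unit [id attack])

(def plain-unit {:id "rain" :attack 200})
(def record-unit (map->Unit plain-unit))

;; Keyword-as-function: no reflection, and works on maps and records alike.
(defn attack-of [unit]
  (:attack unit))

;; Unhinted field syntax: the compiler can't know the type, so the field is
;; looked up via reflection on every call (and warns if *warn-on-reflection*).
(defn attack-reflective [unit]
  (.attack unit))

;; Hinted field syntax: a direct field read, but it now only accepts Unit
;; instances, so it no longer works on plain maps.
(defn attack-hinted [^Unit unit]
  (.attack unit))
```

The keyword version is the one that keeps code agnostic to map-vs-record.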

On Wednesday, January 28, 2015 at 10:35:12 AM UTC-5, Pedro Pereira Santos 
wrote:

 Hello,

 I have a vector with ~20 elements that don't change. They are basically 
 metadata information that are used on the application. Those 20 elements 
 are maps, and I gathered them like:

 (defn units []
   [unit1 unit2 ...])

 I created a record and changed the gathering to:

 (defn units []
   (map #(map->UnitRecord %) [unit1 unit2 ...]))

 I have autotest running. Pressing enter gives me all the suites run time. 
 From my n-tries-check-time benchmark, I noticed that on the original 
 version the suite took ~500ms, but with the record it takes about 800ms.

 What can explain this? I don't change these units anywhere, just read from 
 them. I access the properties via functions on the namespace.

 I was expecting the opposite result. Any ideas on what I am doing wrong?

 Thanks

 PS: The PR on my project with this change:
 https://github.com/orionsbelt-battlegrounds/obb-rules/pull/12/files




Re: Resources don't work in uberjar

2015-01-27 Thread Marshall Bockrath-Vandegrift
On Monday, January 26, 2015 at 9:24:28 PM UTC-5, Dan Harbin wrote:


   io/file


Just delete that line.  The `io/resource` function returns a URL which all 
the Clojure IO functions can handle just fine as-is.  When running in 
development the URL happens to be a `file://` URL, and thus something 
`io/file` can handle.  Once the resource is in a JAR that is no longer the 
case, hence the exceptions.  Just don't require a file when any URL will do 
and you'll be fine.
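A minimal sketch of the URL-based approach. `read-resource` and `resource-as-file` are hypothetical helper names. The sketch uses `clojure/core.clj` only because that resource ships inside the Clojure JAR itself, so it exercises the `jar:` URL case:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: read a classpath resource as a string without ever
;; converting it to a java.io.File. io/resource returns a java.net.URL (or nil
;; when the resource is missing), and slurp accepts file: and jar: URLs alike.
(defn read-resource
  [resource-name]
  (when-let [url (io/resource resource-name)]
    (slurp url)))

;; By contrast, io/file only accepts file: URLs, so this throws an
;; IllegalArgumentException once the resource lives inside a JAR.
(defn resource-as-file
  [resource-name]
  (io/file (io/resource resource-name)))
```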

-Marshall



[ANN] letterpress: locally-configurable Clojure data printers

2014-12-06 Thread Marshall Bockrath-Vandegrift
Clojure's tagged literals and `clojure.edn` namespace functions allow 
hyper-local (per call) specification of tagged-literal data readers during 
deserialization.  The letterpress library provides a similar capability for 
printing, allowing per-call specification of print methods and general 
recursive printing middleware.

Code and documentation on github:

https://github.com/llasram/letterpress

Example, round-tripping a JVM Class object through EDN:

(require '[letterpress.core :as lp]
         '[clojure.edn :as edn])
(import 'java.io.Writer)

(defn class-print [^Class x ^Writer w]
  (doto w (.write "#java.lang/class ") (.write (.getName x))))
(defn class-read [x] (Class/forName (name x)))

(->> String
     (lp/pr-str {:printers {Class class-print}})
     (edn/read-string {:readers {'java.lang/class class-read}}))
;;=> java.lang.String
(class *1)
;;=> java.lang.Class


Comments and pull requests gladly accepted.

-Marshall



Re: Weird data reader issue (clj-time)

2014-08-20 Thread Marshall Bockrath-Vandegrift
What does your `print-dup` method for instants print them as?  The way 
compilation for these expressions is going to work is:

(a) The initial form will be read using the configured *data-readers*, 
handing the compiler a form with a literal instance object.
(b) The compiler will generate code to create that literal; when the 
literal value isn't of a type the compiler knows how to emit directly, it 
emits code to round-trip back through the reader at run-time, embedding the 
`print-dup` representation of the object as a string.

If `print-dup` prints in such a way as to discard the offset, away it goes.
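Here is a minimal sketch of a `print-dup` method that does keep the offset. It uses `java.time.OffsetDateTime` (Java 8+) as a dependency-free stand-in for the thread's Joda-Time instants. `round-trip` is a hypothetical helper that only approximates step (b): it prints the value with `*print-dup*` bound, then reads the string back.

```clojure
(import '(java.io Writer)
        '(java.time OffsetDateTime))

;; Sketch of an offset-preserving print-dup method: it emits a #= (read-eval)
;; form that re-parses the full ISO-8601 string, offset included.
(defmethod print-dup OffsetDateTime
  [^OffsetDateTime x ^Writer w]
  (.write w "#=(java.time.OffsetDateTime/parse \"")
  (.write w (str x))
  (.write w "\")"))

;; Hypothetical helper approximating step (b): print with *print-dup* bound,
;; then read the resulting string back (which requires *read-eval*).
(defn round-trip
  [x]
  (let [printed (binding [*print-dup* true] (pr-str x))]
    (binding [*read-eval* true]
      (read-string printed))))
```

Suppose the method instead printed something offset-free, such as the UTC instant. The value would still read back, but as a different offset, and that is the kind of loss described above.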

-Marshall

On Wednesday, August 20, 2014 10:29:15 AM UTC-4, dan.sto...@gmail.com wrote:

 Maybe I am missing something obvious - 

 I am using custom data readers for joda-time instants. time/inst strings 
 are coerced into utc date times, time/insto keep the offset around. 

 Using the exact same function to parse the string via the data-reader, and 
 just calling the function - I get different results. The function is 
 pure... 

 *data-readers*
 => {time/insto (var corp-pure.time/parse-with-offset),
     time/inst (var corp-pure.time/parse)}

 (.getChronology (corp-pure.time/parse-with-offset "2014-05-03T23:00:00+0100"))
 (.getChronology #time/insto "2014-05-03T23:00:00+0100")

 => #<ISOChronology ISOChronology[+01:00]>
 => #<ISOChronology ISOChronology[UTC]>

 To prove there are no obvious side-effects here:

 (.getChronology (corp-pure.time/parse-with-offset "2014-05-03T23:00:00+0100"))
 (.getChronology (corp-pure.time/parse-with-offset "2014-05-03T23:00:00+0100"))

 => #<ISOChronology ISOChronology[+01:00]>
 => #<ISOChronology ISOChronology[+01:00]>

 (.getChronology #time/insto "2014-05-03T23:00:00+0100")
 (.getChronology #time/insto "2014-05-03T23:00:00+0100")

 => #<ISOChronology ISOChronology[UTC]>
 => #<ISOChronology ISOChronology[UTC]>

 Has anyone seen anything like this?




[ANN] Parkour 0.6.0: Hadoop integration in idiomatic Clojure

2014-08-17 Thread Marshall Bockrath-Vandegrift
Hi all,

I’m pleased to announce the release of version 0.6.0 of Parkour.  Parkour 
is a Clojure Hadoop integration library, focused on support for writing 
Hadoop MapReduce applications in plain Clojure.  Code, documentation, and 
artifact dependency information are available on Github:

https://github.com/damballa/parkour

This release includes some minor breaking changes and a major new feature: 
dvals, a value-oriented interface to passing data to jobs through the 
Hadoop distributed cache.  A complete list of user-visible changes is 
available in the project changelog:

https://github.com/damballa/parkour/blob/master/NEWS.md

Parkour provides:

- I/O integration (clojure.java.io functions acting on HDFS files).
- A map-like API for Hadoop configurations.
- MapReduce integration (plain Clojure functions as MapReduce tasks).
- MapReduce job construction and execution.
- Unified local/remote access to Hadoop datasets as reducible collections.
- And much more!

See the (significantly updated) documentation for details, or feel free to 
e-mail me or the Parkour mailing list (park...@librelist.com) with any 
questions.

Thanks!

-Marshall



[ANN] Parkour 0.5.4, Hadoop REPL edition

2014-02-08 Thread Marshall Bockrath-Vandegrift
Parkour is a Clojure library for writing distributed programs in the
MapReduce pattern which run on the Hadoop MapReduce platform:

https://github.com/damballa/parkour

Release 0.5.4 adds significant new features for REPL integration.
Parkour now supports connecting to a live cluster, then running
local-mode jobs, mixed-mode jobs, and remote jobs, all from the same
REPL process.  This brings Clojure’s standard REPL-based workflow to
Hadoop, enabling rapid iterative development of MapReduce applications.

Complete release notes:

https://github.com/damballa/parkour/blob/master/NEWS.md#054--2014-02-08

And a blog post discussing the new features in context:

http://blog.platypope.org/2014/2/8/interactive-hadoop-with-parkour/

Artifacts are available on Clojars.

-Marshall



Re: Regarding Clojure's license

2013-11-13 Thread Marshall Bockrath-Vandegrift
phillip.l...@newcastle.ac.uk (Phillip Lord) writes:

 I did consider the possibility that it just wasn't funny!

Oh, no – it was hilarious. :-)

-Marshall



Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-12 Thread Marshall Bockrath-Vandegrift
Sam Ritchie sritchi...@gmail.com writes:

 Great stuff!

Thanks!

 Just as a note, Cascalog 2.0 has a lower-level DSL that lets you
 write Cascading in idiomatic clojure. Here are some test examples:

 https://github.com/nathanmarz/cascalog/blob/develop/cascalog-core/test/cascalog/cascading/operations_test.clj

Cool.  I did not know about that part of the API, which does look nifty.
I’m working on a blog post digging into this some, and I’m hoping to
snag one of the lightning talk spots at the Conj, but I do think
there’s a big difference between writing job-flows which use a
`map`-like `map*` function and literally calling `map` in a plain
function[1].

Want a state-bearing sequence-mapping transformation?  With Parkour, you
can just grab bbloom’s `transduce` library[2] and it works just as well
in a remote task as it does in local code, because it does in fact do
literally the same thing in both scenarios.  You can get similar results
in Cascalog/Cascading, but need to first re-express the functionality in
terms of Cascalog/Cascading’s abstractions vs just leaning directly on
Clojure’s.

The algebraic execution planners backing Cascading- and FlumeJava-likes
allow powerful optimization of cross-task operations, but do require all
transformations to be expressed in terms of primitives the planners
understand.  Parkour loses the cross-task awareness, but allows
MapReduce tasks to do anything which can be expressed as operations on a
Clojure reducible collection.  This can include repeated partial
reductions (even map-side), full task-partition reductions, and
arbitrary numbers of disjoint task outputs.
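Here is a rough illustration of the kind of task body meant here, written as a sketch in plain Clojure reducers. It is deliberately not Parkour's API. `partial-word-counts` is a hypothetical name, and the sketch leaves out how lines arrive and how pairs are emitted. It performs a map-side partial reduction inside the task:

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical task body in plain Clojure (not Parkour's API).
;; It performs a local ("map-side") partial reduction, so each word is emitted
;; once per task partition with a partial count, instead of once per occurrence.
(defn partial-word-counts
  [lines]
  (->> lines
       (r/mapcat #(str/split % #"\s+"))
       (r/remove str/blank?)
       (r/reduce (fn [counts word]
                   (update-in counts [word] (fnil inc 0)))
                 {})
       (seq)))
```

Because this is an ordinary reduction over a reducible collection, the same function works unchanged on a local vector of strings.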

It’s not a perfect example of what I’m talking about, but Parkour does
include an example implementation of the MapReduce algorithm for
transforming a graph into a sparse matrix of absolute-indexed cells:


https://github.atl.damballa/rnd/parkour/blob/master/examples/parkour/examples/matrixify.clj

I’ll see if I can distill out a more compelling example from some real
jobs prior to the Conj :-).

[1] It admittedly hurts this point a bit that Parkour exclusively uses
reducers instead of lazy sequences, but I’m hoping shortly to add the
necessary glue to allow tasks to work via seqs too when desired.

[2] https://github.com/brandonbloom/transduce

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D



[ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-04 Thread Marshall Bockrath-Vandegrift
I’m pleased to announce the first public release of Parkour, a library
for writing Hadoop MapReduce applications in idiomatic Clojure.  Parkour
takes your Clojure code’s functional gymnastics and sends it
free-running across the urban environment of your Hadoop cluster.

https://github.com/damballa/parkour/

Parkour aims to provide deep Clojure integration for Hadoop.  Programs
using Parkour are normal Clojure programs, using standard Clojure
functions instead of new framework abstractions.  Programs using Parkour
are also full Hadoop programs, with complete access to absolutely
everything possible in raw Java Hadoop MapReduce.  If you know Clojure,
and you know Hadoop, then you’re most of the way to knowing Parkour.

Here is the core of the obligatory “word count” MapReduce program,
written using Parkour:

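;; Assumed namespace setup (aliases as used in Parkour's own examples);
;; not part of the original excerpt.
(ns example.word-count
  (:require [clojure.string :as str]
            [clojure.core.reducers :as r]
            [parkour.mapreduce :as mr]
            [parkour.graph :as pg])
  (:import [org.apache.hadoop.io Text LongWritable]))

(defn mapper
  [conf]
  (fn [context input]
    (->> (mr/vals input)
         (r/mapcat #(str/split % #"\s+"))
         (r/map #(-> [% 1])))))

(defn reducer
  [conf]
  (fn [context input]
    (->> (mr/keyvalgroups input)
         (r/map (fn [[word counts]]
                  [word (r/reduce + 0 counts)])))))

(defn word-count
  [dseq dsink]
  (-> (pg/input dseq)
      (pg/map #'mapper)
      (pg/partition [Text LongWritable])
      (pg/combine #'reducer)
      (pg/reduce #'reducer)
      (pg/output dsink)))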

Parkour includes detailed documentation, ranging from a quickstart
introduction through detailed discussions of several specific aspects:

https://github.com/damballa/parkour/#documentation

Although this is the first public release of Parkour, the Damballa R&D
team has been using it extensively since beginning serious development
earlier this year.  We do also use and will continue to use Cascalog,
but we’ve found that Parkour’s simpler model and more direct Hadoop
integration is a better fit for many problems.

I am personally incredibly excited about this release.  I will be at
this year’s Clojure/conj, and will be more than happy to discuss Parkour
in detail with those interested.

Questions and pull requests welcome!

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D



Re: [ANN] Parkour: Hadoop MapReduce in idiomatic Clojure

2013-11-04 Thread Marshall Bockrath-Vandegrift
ronen nark...@gmail.com writes:

 Thanks for releasing this, I personally had to re-invent such
 functionality over clojure-hadoop

Glad to do so.  If you’ve been exploring a similar software space, I would
be very interested in additional specific feedback.  And PRs :-).

 Did you happen to test this over AWS EMR?

I have not run it live on EMR, but the unit test matrix includes Hadoop
versions 0.20.205, 1.0.3, and 2.2.0, which are the sufficiently-recent
Hadoop releases EMR’s documentation claims are supported.

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Principal Software Engineer, Damballa R&D



Re: Scala interop (or, aliasing imported Java classes)

2013-11-03 Thread Marshall Bockrath-Vandegrift
Mark mjt0...@gmail.com writes:

 I think my preferred solution would be to allow imported Java classes
 to be aliased, so I could do this:

 (import '(org.fooinstitute.team.library.foo package :as foop))
 => org.fooinstitute.team.library.foo.package
 (foop/isFoo foop)
 => false

 But to the best of my knowledge (and searching), that doesn't exist in
 Clojure.

Clojure doesn’t provide it as a part of the `ns` or `import` forms, but
the underlying `Namespace` objects support it just fine, and then almost
everything works as you’d expect w/ the arbitrary alias:

user> (.importClass *ns* 'Q java.util.concurrent.LinkedBlockingQueue)
java.util.concurrent.LinkedBlockingQueue
user> Q
java.util.concurrent.LinkedBlockingQueue
user> (Q.)
#<LinkedBlockingQueue []>
user> (fn [q x] (.offer q x))
Reflection warning, /tmp/form-init7252328014986537096.clj:1:11 - call to 
offer can't be resolved.
#<user$eval1302$fn__1303 user$eval1302$fn__1303@ece88d2>
user> (fn [q x] (.offer ^Q q x))
#<user$eval1308$fn__1309 user$eval1308$fn__1309@251c4123>
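If you need this often, you could wrap the interop call in a small macro. This is a minimal sketch. `import-as` is a hypothetical name and is not something `clojure.core` provides. It relies on the same `Namespace.importClass(Symbol, Class)` method used in the REPL session:

```clojure
;; Hypothetical helper: import a class into the current namespace under an
;; arbitrary alias, e.g. (import-as Q java.util.concurrent.LinkedBlockingQueue).
;; The hint on the gensym'd local avoids a reflective call to importClass.
(defmacro import-as
  [alias class-sym]
  `(let [^clojure.lang.Namespace ns# *ns*]
     (.importClass ns# '~alias ~class-sym)))
```

As with the raw interop call, this needs to run as its own top-level form before any later forms that use the alias are compiled.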

HTH,

-Marshall



Re: What's the -dup in print-dup?

2013-11-03 Thread Marshall Bockrath-Vandegrift
John Mastro john.b.mas...@gmail.com writes:

 This isn't a very deep question, but I wonder every time I come across
 it: to what does -dup in `print-dup` and `*print-dup*` refer?

I don’t have any special knowledge in this regard, but I’ve always
thought of it as “duplicate,” which makes some sense when you think of
how it’s used.  AFAIK, `print-dup` exists to provide objects which
doen’t normally print to `read`able form an alternative form which is
`read`able.  The compiler can then use the `print-dup` form to embed
instance objects in code, by generating code which produces duplicates
via round-tripping through the reader.

user> (print-method (fn []) *out*)
#<user$eval1328$fn__1329 user$eval1328$fn__1329@6dc8f3cd>
nil
user> (print-dup (fn []) *out*)
#=(user$eval1332$fn__1333. )
nil

-Marshall



Re: reflection warning with threading macro

2013-10-15 Thread Marshall Bockrath-Vandegrift
Brian Craft craft.br...@gmail.com writes:

 What's going on? Is there some other way to type hint the case with
 the threading macro?

I’m pretty sure this is CLJ-865 “Macroexpansion discards form metadata”:

   http://dev.clojure.org/jira/browse/CLJ-865

In that case you’ll need to restructure the code so the type-hint is
attached to something else, e.g. a local, at least until 1.6...
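A minimal sketch of the local-binding workaround, using a hypothetical `shout-name` function because the original code isn't quoted. It assumes the lost hint is one placed on the threading form itself:

```clojure
;; Hypothetical example. Per CLJ-865, a hint on the threading form itself,
;; e.g. (.toUpperCase ^String (-> m :user :name)), can be lost during
;; macroexpansion, leaving a reflective call.

;; Workaround: bind the threaded result to a type-hinted local instead.
(defn shout-name
  [m]
  (let [^String n (-> m :user :name)]
    (.toUpperCase n)))
```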

-Marshall



Re: How to suppress warnings?

2013-10-07 Thread Marshall Bockrath-Vandegrift
Gary Zhao garyz...@gmail.com writes:

 I'm using core.async, but always see the following warnings. How do I
 suppress them? 

It looks like you are literally `:use`ing `core.async`, or doing a
`:require ... :refer :all`.  This is generally not what you want,
because you have no control over the vars now referred into your
namespace.  You’ll fix the problem causing the warnings if you just
`:require` `core.async` in your `ns` declaration and only `:refer` the
specific vars you want to use without a namespace prefix.
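A sketch of such a declaration, using a hypothetical namespace name and an illustrative choice of referred vars. It needs core.async on the classpath to load:

```clojure
;; Hypothetical namespace. core.async is required under an alias, and only
;; the few vars used unqualified are referred, so core.async vars whose names
;; clash with clojure.core (such as its reduce or merge) stay behind the
;; alias and trigger no "already refers to" warnings.
(ns example.channels
  (:require [clojure.core.async :as async :refer [chan go <! >!]]))
```

Everything else is then reached through the alias, e.g. `async/close!`, while unqualified `map`, `reduce`, etc. keep resolving to `clojure.core`.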

-Marshall



Re: Integration with Java dependency injection frameworks?

2013-10-02 Thread Marshall Bockrath-Vandegrift
Cedric Greevey cgree...@gmail.com writes:

 Erm ... a MOOC?

Massive Open Online Course – e.g., Coursera, edX, Udacity, etc.  This is
for the Coursera-offered version of the University of Minnesota course
on recommendation systems, but that didn’t seem particularly relevant to
the problem at hand.

-Marshall



Integration with Java dependency injection frameworks?

2013-10-01 Thread Marshall Bockrath-Vandegrift
Hi all:

I was wondering – does anyone have any experience with or patterns for
integrating Clojure with existing Java dependency injection frameworks?
I’m working with the LensKit framework [1] for a MOOC and it uses a
JSR-330/javax.inject dependency-injection framework named `grapht` [2]
for, well, everything.  In some examples dependency-injection is even
used to provide input data file paths (!).  As far as I can tell, the
injection happens entirely via invoking some sort of concrete,
user-provided, annotated constructor which accepts injected instances on
(potentially annotated) constructor parameters.

My thoughts thus far have centered on two basic potential approaches:

1. Implement small Java stubs with appropriately-annotated constructors
for every injection point.  The downside is that these stubs are
(probably) not very reusable.  Injected instances are only injected at
the constructor parameters, so needing a different set of injected
instances requires a different class/constructor.

2. Dynamically generate stub classes exposing constructors which accept
some arbitrary per-class set of inject-able parameters.  There might be
an easier way to do it, but that’s the approach I’ve tried out thus far
using ASM and the Clojure dynamic class loaders to produce “Provider”
factory classes:

https://github.com/llasram/esfj

The largest problem with this approach is that the actual factory
implementation still needs to be injected somehow.  I thought about
stashing a function in a Var linked to the class, but have instead just
made the implementation function itself another inject-able IFn
parameter.  This avoids namespace-abuse, but also means injecting
Clojure implementations involves doing *more* dependency injection than
with plain Java, which doesn’t seem right.
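
For comparison, the shape of approach 2 can be sketched in plain Clojure.
This is a hypothetical sketch, not grapht's API and not the classes esfj
actually generates: `java.util.function.Supplier` (Java 8+) stands in for
`javax.inject.Provider` so it runs without extra dependencies, there are no
injection annotations, and `FnProvider` is an invented name.  It only shows
the Clojure implementation fn arriving as one more constructor argument
alongside the other injected values.

```clojure
(import '(java.util.function Supplier))

;; Hypothetical stand-in for a generated "Provider" stub.  In a DI setting
;; both fields would be injected; here they are passed in by hand.
;;   impl-fn - the Clojure function implementing the factory
;;   deps    - the other injected values, passed to impl-fn in order
(deftype FnProvider [impl-fn deps]
  Supplier
  (get [_] (apply impl-fn deps)))

;; Hypothetical usage: a factory needing a data file path and a count.
(def provider
  (->FnProvider (fn [path n] (str path " x" n)) ["ratings.csv" 10]))

(.get ^Supplier provider) ;;=> "ratings.csv x10"
```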

Thoughts?

[1] http://lenskit.grouplens.org/
[2] https://github.com/grouplens/grapht

-Marshall


Re: Typedef-like functionality for Clojure records?

2013-09-26 Thread Marshall Bockrath-Vandegrift
Vincent Chen noodle...@gmail.com writes:

 - Use something else than records to model structs (suggestions welcome)?

Maps.

Records have concrete Java types, which allows them to implement
interfaces and participate in protocols.  Fields defined on a record
type are backed by JVM object fields, which can increase performance.
But there are no strictness benefits – a record may have any number of
additional keys associated with values:

(defrecord Foo [bar])
;;=> user.Foo
(map->Foo {:bar 1, :baz 2})
;;=> #user.Foo{:bar 1, :baz 2}
(class (map->Foo {:bar 1, :baz 2}))
;;=> user.Foo

So my suggestion would be to instead turn your `struct` definitions into
functions validating that the expected fields are present within plain
maps.  (Assuming some sort of strictness/validation is the goal.)
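
A minimal sketch of that suggestion, assuming the goal is only to check
that required keys are present; `struct-validator`, `point?` and `->point`
are hypothetical names, not library functions.

```clojure
(defn struct-validator
  "Hypothetical helper: returns a predicate that is true when its argument
  is a map containing every one of required-keys."
  [& required-keys]
  (fn [m]
    (and (map? m)
         (every? #(contains? m %) required-keys))))

(def point? (struct-validator :x :y))

(point? {:x 1 :y 2})      ;;=> true
(point? {:x 1 :y 2 :z 3}) ;;=> true, extra keys are allowed, as with records
(point? {:x 1})           ;;=> false

;; Optionally, a constructor that enforces the check via a precondition.
(defn ->point [m]
  {:pre [(point? m)]}
  m)
```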

-Marshall


Re: Question about using extend-type and multiple implementations of the same protocol on nil.

2013-09-26 Thread Marshall Bockrath-Vandegrift
Dustin Conrad dustinacon...@gmail.com writes:

 Now, in the tests for heap1.clj where I am not requiring the heap2
 namespace (or even know about it), my tests are failing because
 (insert nil 1) now returns a heap2 record.

Different namespaces aren’t isolated from one another.  There is a
single global “namespace of namespaces,” you could say.  The tests for
the `heap1` namespace may not have `require`d the `heap2` namespace, but
e.g. under the normal Leiningen clojure.test test runner, they will both
end up being loaded in the same process.  Namespaces provide a way to
prevent collisions between names, but all references to the same
fully-qualified name refer to the same object identity.

 At this point I decided I obviously did not understand what
 extend-type actually does, and tried to do a bit of research, but I
 couldn't really find anything that went very in-depth.

The `extend-type` form mutates the collection of implementations for one
or more protocols to include implementations for the extended type,
replacing any existing implementations for that type.  Because the
protocols themselves are extended, all such extensions are globally
visible to all users of the protocol.
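
A small sketch reproducing the behaviour from the question inside a single
namespace; the protocol and record names are hypothetical stand-ins for the
`heap1`/`heap2` code.  Whichever `extend-type nil` runs last wins for every
caller of the protocol.

```clojure
(defprotocol Heap
  (insert [h x]))

(defrecord Heap1 [xs])
(defrecord Heap2 [xs])

;; What the heap1 namespace might do:
(extend-type nil Heap
  (insert [_ x] (->Heap1 [x])))
(def from-first (insert nil 1))   ; a Heap1

;; What the heap2 namespace might do, once loaded into the same JVM.
;; This replaces the nil implementation for *all* users of Heap:
(extend-type nil Heap
  (insert [_ x] (->Heap2 [x])))
(def from-second (insert nil 1))  ; now a Heap2
```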

 If someone could provide an alternative to what I am trying to do,
 that would be great. Alternatively, it would be useful for someone to
 explain what extend-type does under the covers a bit.

It’s not entirely clear what you’re trying to do, but one approach would
be for each namespace to have separate nil-heap `reify`-cations of the
`Heap` protocol.  All local invocations of heap protocol functions would
then go through proxy functions which would replace `nil` with the local
nil-heap implementation.
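
A minimal sketch of that approach, under stated assumptions: the protocol
shape, the sorted-vector `VecHeap` (illustrative, not an efficient heap),
and the names `empty-heap`, `local-heap`, `insert` and `heap-elements` are
all hypothetical.  Nothing extends `Heap` to `nil`, so another namespace's
choices can't leak in.

```clojure
(defprotocol Heap
  (insert* [h x] "Return a new heap with x added.")
  (elements [h] "Return the heap's elements in ascending order."))

;; Illustrative concrete heap: a sorted vector.
(defrecord VecHeap [xs])

(extend-protocol Heap
  VecHeap
  (insert* [h x] (->VecHeap (vec (sort (conj (:xs h) x)))))
  (elements [h] (:xs h)))

;; This namespace's own "empty heap", used instead of extending Heap to nil.
(def empty-heap
  (reify Heap
    (insert* [_ x] (->VecHeap [x]))
    (elements [_] [])))

(defn- local-heap [h] (if (nil? h) empty-heap h))

;; Proxy fns used locally: substitute the local empty heap for nil.
(defn insert [h x] (insert* (local-heap h) x))
(defn heap-elements [h] (elements (local-heap h)))

(heap-elements (-> nil (insert 3) (insert 1) (insert 2))) ;;=> [1 2 3]
```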

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Damballa Staff Software Engineer | 518.859.4559m


Re: Clojure newbie code review

2013-09-10 Thread Marshall Bockrath-Vandegrift
Philipp Meier phme...@gmail.com writes:

 (alter-var-root #'*read-eval* (constantly false))
 = why do you think this is necessary?

Some versions of the Leiningen `app` template put this in the skeleton
initial source file.  I assume that’s where this came from.
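
For context on why a template might do this: with `*read-eval*` at its
default of true, the reader evaluates `#=(...)` forms, so `read-string` on
untrusted input can run code.  A minimal sketch of the difference, with
`clojure.edn/read-string` (Clojure 1.5+) as the usual choice for plain
data; `read-untrusted` is a hypothetical helper name.

```clojure
(require '[clojure.edn :as edn])

;; With *read-eval* true, #= forms are evaluated at read time.
(read-string "#=(clojure.core/+ 1 2)") ;;=> 3

;; Hypothetical helper: disable read-time evaluation for this call only,
;; rather than changing the root binding as the template does.
(defn read-untrusted [s]
  (binding [*read-eval* false]
    (read-string s)))

;; The EDN reader never evaluates anything.
(edn/read-string "{:a [1 2 3]}") ;;=> {:a [1 2 3]}
```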

-Marshall


ANN: abracad 0.4.2 – Avro de/serialization for Clojure

2013-08-26 Thread Marshall Bockrath-Vandegrift
Abracad is a library for de/serializing Clojure data structures 
as Avro [1], leveraging the Java Avro implementation.

Avro is a schema-based binary serialization system similar to Thrift or
Protocol Buffers, but with some philosophical differences and better
support for dynamic languages.  In particular, and unlike Thrift or
Protocol Buffers, Avro does not require pre-generated schema-specific
de/serialization code.

Abracad provides Clojure integration for Avro.  Unlike previous Clojure
Avro libraries, Abracad works directly in terms of native Clojure data.
Abracad supports: a generic mapping between Avro and Clojure data for
arbitrary schemas; customized protocol-based mappings between Avro
records and any JVM types; and “schema-less” EDN-in-Avro serialization
of arbitrary Clojure data.

Source code is on Github:

  https://github.com/damballa/abracad

Release artifacts are on Clojars.

[1] http://avro.apache.org/

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Damballa Staff Software Engineer


Re: too circular?

2013-08-26 Thread Marshall Bockrath-Vandegrift
Dennis Haupt d.haup...@gmail.com writes:

 (defn fib-n [n]
 (let [fib (fn [a b] (cons a (lazy-seq (fib b (+ b a)))))]
 (take n (fib 1 1))))

 can't i do a recursion here? how can i achieve this without doing an
 outer defn?

You just need to give the anonymous function a name it can use to refer
to itself for (non-tail) recursion:

(defn fib-n [n]
  (let [fib (fn fib [a b] (cons a (lazy-seq (fib b (+ b a)))))]
    (take n (fib 1 1))))
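
An equivalent sketch using `letfn`, which exists for exactly this purpose:
it binds local, named functions that can refer to themselves (and to each
other).

```clojure
(defn fib-n [n]
  (letfn [(fib [a b] (cons a (lazy-seq (fib b (+ b a)))))]
    (take n (fib 1 1))))

(fib-n 10) ;;=> (1 1 2 3 5 8 13 21 34 55)
```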

-Marshall


Re: apply inc

2013-08-11 Thread Marshall Bockrath-Vandegrift
drclj deepikaro...@gmail.com writes:

 Thanks everyone, in the apply function source code I see

 ([^clojure.lang.IFn f args]
  (. f (applyTo (seq args))))

 Seems the (applyTo (seq args)) returns arg parameters,

 And the f is invoked only once:

 (. f args)

I think you’re missing that `.` is a special form with special
evaluation rules.  The following forms are all equivalent:

(. f (applyTo (seq args)))
(. f applyTo (seq args))
(.applyTo f (seq args))

With the last being syntactic sugar converted during macro-expansion to
the middle form.

So there’s no `applyTo` *function*, just the `applyTo` *method* of IFn
instance `f`.

OOC, do you have a background using R or similar languages?  I was
confused myself learning R, because what R calls `apply` is nothing like
what Lisps call `apply` and (as others in this thread pointed out) is
more similar to what Clojure calls `map`.
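
A small sketch of the point in plain Clojure: `apply` invokes `f` once with
the collection spread into its argument list, the `applyTo` interop forms
are equivalent ways of calling the same method, and the element-wise
behaviour R calls `apply` is `map`.

```clojure
;; apply calls f exactly once, spreading the last argument's elements:
(apply + [1 2 3])             ;;=> 6, same as (+ 1 2 3)
(apply + 1 2 [3 4])           ;;=> 10, same as (+ 1 2 3 4)

;; Equivalent interop forms calling the IFn.applyTo method directly:
(. + (applyTo (seq [1 2 3]))) ;;=> 6
(. + applyTo (seq [1 2 3]))   ;;=> 6
(.applyTo + (seq [1 2 3]))    ;;=> 6

;; "Call f on each element", which R names apply, is map in Clojure:
(map inc [1 2 3])             ;;=> (2 3 4)
```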

-Marshall


[ANN] Method-fn: augmented Java methods as functions

2013-08-10 Thread Marshall Bockrath-Vandegrift
Do Clojure’s built-in methods for bridging the distinction between
functions and host platform methods seem clunky to you?  Then the
method-fn library may hold the solution!

(require 'method.fn)
(map #mf/i String/trim [" a " " b "]) ;; Look ma, no reflection!
(map #mf/s Math/log (range 1 5))      ;; Static methods too

Artifacts in Clojars, source code and README on github:

https://github.com/llasram/method-fn
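
For comparison, a sketch of the built-in ways to get the same results
without the library (plain Clojure, not method-fn's API): a type-hinted
anonymous fn or `memfn` for instance methods, and a wrapping fn for static
methods.

```clojure
;; Instance method, reflection-free thanks to the ^String hint:
(map #(.trim ^String %) [" a " " b "])   ;;=> ("a" "b")

;; memfn accepts a receiver-type hint on the method name:
(map (memfn ^String trim) [" a " " b "]) ;;=> ("a" "b")

;; Static methods also need a wrapping fn:
(map #(Math/log %) (range 1 5))
```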

-Marshall


Re: [ANN] Method-fn: augmented Java methods as functions

2013-08-10 Thread Marshall Bockrath-Vandegrift
Shantanu Kumar kumar.shant...@gmail.com writes:

 Wow! This is neat. Congratulations on the release.

Thank you!

 A minor observation: It may help some readers if you mention on the
 README that it may not work with lein-try (as I found) and that the
 user must `require` the ns first: (require '[method.fn])

Good point.  I’ve updated the README accordingly.

-Marshall


Re: Symbol.intern doesnt return already interned symbols?

2013-07-26 Thread Marshall Bockrath-Vandegrift
Jürgen Hötzel juer...@hoetzel.info writes:

 If a symbol X is interned twice, shouldn't the second Symbol.intern(X)
 return the previous interned symbol object?

Symbols in Clojure can have metadata, and so can’t have pure value-based
identity.  Keywords fill that role instead, which is why keywords can’t
hold metadata, and why `Keyword/intern` *does* act as you had expected
`Symbol/intern` to act.  Or at least this is my understanding – I hope
it helps.
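
A quick REPL-style sketch of the difference described above: equal
keywords are the same object, while each symbol is a fresh object that can
carry its own metadata, which `=` ignores.

```clojure
;; Keywords are interned: equal keywords are identical.
(identical? :foo (keyword "foo"))          ;;=> true

;; Symbols are not: each construction yields a new object...
(identical? (symbol "foo") (symbol "foo")) ;;=> false

;; ...because any given symbol may carry its own metadata.
(def tagged (with-meta 'foo {:tag 'String}))
(= 'foo tagged)                            ;;=> true, = ignores metadata
(meta tagged)                              ;;=> {:tag String}
```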

-Marshall


Re: Symbol.intern doesnt return already interned symbols?

2013-07-26 Thread Marshall Bockrath-Vandegrift
Jürgen Hötzel juer...@hoetzel.info writes:

 My Question was about the interning. AFAIK interning shoult only
 return a new Symbol, when the Symbol wasn't interned already.

We may already be on the same page.  I was just pointing out that there
aren’t any semantic benefits to that form of interning for Symbols.  The
way keywords are interned guarantees that any Keywords which are `=` are
the same object and thus `identical?`.  Symbols with different metadata
being separate objects precludes that.  You could have `Symbol/intern`
maintain a (weak-reference) cache to the nil-metadata versions of
Symbols, but it would *just* be a cache.  If you had profiling results
which showed that maintaining such a cache made compilation x% faster,
that might be interesting, but otherwise I’m not clear on what the
benefit would be.
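
Purely to illustrate what such a cache would (and wouldn't) buy, a
hypothetical sketch in Clojure rather than inside `Symbol/intern` itself;
`sym-cache` and `cached-symbol` are invented names.  It hands back an
existing nil-metadata Symbol while one is still reachable, but `=` on
symbols behaves exactly as before.

```clojure
(import '(java.util WeakHashMap)
        '(java.lang.ref WeakReference))

;; Hypothetical cache: weak keys, and values that are weak references back
;; to the key, so an entry never keeps its own symbol alive.
(def ^:private ^WeakHashMap sym-cache (WeakHashMap.))

(defn cached-symbol
  "Like (symbol s), but returns a previously created nil-metadata Symbol
  for the same string if one is still reachable."
  [^String s]
  (let [probe (symbol s)]
    (locking sym-cache
      (let [^WeakReference wref (.get sym-cache probe)
            cached (when wref (.get wref))]
        (or cached
            (do (.put sym-cache probe (WeakReference. probe))
                probe))))))
```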

-Marshall


Damballa is hiring

2013-07-10 Thread Marshall Bockrath-Vandegrift
Damballa is currently hiring software engineers and scientists for our
Research & Development (R&D) team.  Damballa’s network appliances
analyze network traffic for evidence of malware and advanced threat
infections which have circumvented our customers’ preventative
solutions.  On the R&D team, we use Clojure and Hadoop to build machine
learning systems automating the discovery and classification of new
threats in Internet-scale data, and the systems to deliver that
knowledge to our appliances.

Damballa is based in Atlanta, GA, USA.  We would prefer local
candidates, but are completely open to remote work for the right fit.

  Corporate web site :: https://www.damballa.com/
  R&D engineer listing :: https://www.damballa.com/careers/RD_Developer
  R&D scientist listing :: https://www.damballa.com/careers/RD_Data_Scientist

If you have any questions about the positions or working at Damballa,
please feel free to contact me directly at llas...@damballa.com.

-- 
Marshall Bockrath-Vandegrift llas...@damballa.com
Damballa Staff Software Engineer


Re: bug in 'extend-protocol' ???

2013-06-13 Thread Marshall Bockrath-Vandegrift
Jim - FooBar(); jimpil1...@gmail.com writes:

 CompilerException java.lang.UnsupportedOperationException: nth not
 supported on this type: Character, compiling:(NO_SOURCE_PATH:1:1)

If you examine the implementation of `extend-protocol`, and in particular how it
distinguishes between additional functions being defined for a type and
new types to which to extend the protocol, I think you’ll see what’s
going on here.
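
A simplified sketch of the grouping rule, as I understand
`extend-protocol`'s implementation: every form that is not a list starts a
new type, and the list forms following it are that type's method
definitions.  `split-impls` is a hypothetical re-statement, not the actual
clojure.core code, and it only illustrates the rule; it does not reproduce
the exact error above.

```clojure
(defn split-impls
  "Group an extend-protocol-style spec list into [type method-forms] pairs,
  treating every form that is not a list as the start of a new type."
  [specs]
  (loop [ret [] s specs]
    (if (seq s)
      (recur (conj ret [(first s) (take-while seq? (next s))])
             (drop-while seq? (next s)))
      ret)))

(map first (split-impls '[String (foo [x] 1) Long (foo [x] 2)]))
;;=> (String Long)

;; A stray non-list form (here a character literal) becomes a "type":
(map first (split-impls '[String (foo [x] 1) \a (foo [x] 2)]))
;;=> (String \a)
```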

-Marshall


Re: Clojure in production

2013-06-12 Thread Marshall Bockrath-Vandegrift
Plínio Balduino pbaldu...@gmail.com writes:

 I'm writing a talk about Clojure in the real world and I would like to
 know, if possible, which companies are using Clojure for production or
 to make internal tools.

At Damballa we’re using Clojure and Cascalog to do all of our
Hadoop-based backend data processing.  As we’ve moved more of our
infrastructure to Hadoop- and JVM-based technologies (such as HBase),
our use of Clojure has expanded to encompass most new server-side
development.

We have a small amount of Clojure code released as open source, with
more to come in the future:

https://github.com/damballa

And obligatory company Web site:

https://www.damballa.com/

-Marshall


Re: importing a java class which 'requires' your namespace in a static block

2013-06-05 Thread Marshall Bockrath-Vandegrift
Jim - FooBar(); jimpil1...@gmail.com writes:

 Now, the first time I (load-file xxx.core.clj) everything is
 perfectly fine. The minute I make a change and re-load I get:

 NoClassDefFoundError Could not initialize class yyy.Foo

This confuses me, because the JVM should only be loading and
initializing the Java class once.  Re-loading the Clojure namespace
shouldn’t report an error unless initialization actually failed the
first time around.

That said, you will definitely get errors if you have a Java class
`require` a namespace in a static initializer which then `import`s the
class.  Clojure `import` causes referenced classes to be initialized,
which runs static initializers, which means a circular namespace ->
initializer -> namespace dependency effectively reduces to a circular
namespace dependency.

The solution I’ve been using lately is to push the Java-side
namespace-loading into a private static inner class of the original Java
class.  This provides the benefits of JVM-managed single-initialization,
but defers execution of that initialization code until something
actually needs one of the imported Vars.  Example:


https://github.com/damballa/abracad/blob/master/src/java/abracad/avro/ClojureDatumReader.java

HTH,

-Marshall


Re: pmap thread oversubscription OSX

2013-05-01 Thread Marshall Bockrath-Vandegrift
kuba roth kuba.r...@gmail.com writes:

 I've got more examples for OSX which clearly shows that as soon as the
 number of tasks exceeds number of cores pmap performance suffers. It
 seems to me like there is no blocking taking place on threads and all
 the tasks are started at the same time.

There’s a still-open bug with `pmap` and chunked seqs which sounds like
it may be what you’re seeing:

http://dev.clojure.org/jira/browse/CLJ-862

You can get `pmap` to work mostly-reasonably by unchunking input seqs
first, although I’ve found it doesn’t work well for many problems anyway
(no control over level of parallelism, FIFO ordering, etc).
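
A common way to unchunk a seq, as a minimal sketch (`unchunk` is a
hypothetical helper, not part of clojure.core): rebuild it one element at
a time from plain conses, so lazy consumers like `map` and `pmap` realize
elements individually instead of a whole chunk (typically 32) at a time.

```clojure
(defn unchunk
  "Hypothetical helper: a lazy seq of the same elements as s, built from
  plain conses so it is never chunked."
  [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; Usage with pmap (expensive-fn and inputs stand for your own code):
;; (pmap expensive-fn (unchunk inputs))
```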

-Marshall


atl-clj: Atlanta, GA, USA Clojure meetup

2013-02-04 Thread Marshall Bockrath-Vandegrift
Hi all:

Since I myself missed it when this group got started, I’d like to
belatedly announce the existence of atl-clj, a meetup group for Clojure
users in the Atlanta, GA area:

http://www.meetup.com/Atl-Clj/

Current plan is to meet ~monthly on second Tuesdays.  The next scheduled
meetup is for February 12th, where I’ll be giving a talk + live-coding
demo on Cascalog.

If you’re in the Atlanta area, I hope you can make it!

-Marshall


Re: multicore list processing (was Re: abysmal multicore performance, especially on AMD processors)

2013-01-31 Thread Marshall Bockrath-Vandegrift
Chas Emerick c...@cemerick.com writes:

 Keeping the discussion here would make sense, esp. in light of
 meetup.com's horrible discussion board.

Excellent.  Solves the problem of deciding the etiquette of jumping on
the meetup board for a meetup one has never been involved in.  :-)

 The nature of the `burn` program is such that I'm skeptical of the
 ability of any garbage-collected runtime (lispy or not) to scale its
 operation across multiple threads.

Bringing you up to speed on this very long thread, I don’t believe the
original `burn` benchmark is a GC issue, due to results Cameron first
reported here:

https://groups.google.com/d/msg/clojure/48W2eff3caU/83uXZjLi3iAJ

I then narrowed that down to what I still believe to be the explanation here:

https://groups.google.com/d/msg/clojure/48W2eff3caU/jd8dpmzEtEYJ

And have more compact demonstration benchmark results here:

https://groups.google.com/d/msg/clojure/48W2eff3caU/tCCkjXxTUMEJ

I haven’t been able to produce those results in a minimal Java-only test
case, though.

Then Wm. Josiah posted a full-application benchmark, which appears to
have entirely different performance problems from the synthetic `burn`
benchmark.  I’d rejected GC as the cause for the slowdown there too, but
ATM can’t recall why or what I tested, so GC is definitely a
candidate to re-examine:

https://groups.google.com/d/msg/clojure/48W2eff3caU/K224Aqwkn5YJ

Quite looking forward to additional insight...

-Marshall


Re: strange error: : Could not find or load main class –jar

2013-01-31 Thread Marshall Bockrath-Vandegrift
larry google groups lawrencecloj...@gmail.com writes:

 Any suggestion, no matter how far fetched, will be welcome. I am
 ignorant about the JVM so I am having trouble debugging this problem. 

 java -jar kiosk.clj 3

These are weird (.clj vs .jar), but since you say whatever you actually
ran worked...

 Then I gave the app to the sysadmin, and he tried to spin
 it up on another server, and on startup he got the error:
 
 Could not find or load main class –jar

I’m guessing an encoding error.  In fact, if that error is a direct
copy-paste from either the exact error or what you sent the sysadmin, it
completely explains it.  The dash in `–jar` in that error is in fact
`–` (an en-dash, U+2013) rather than an ASCII `-`, which is causing
`java` to search the classpath for a class named `–jar`.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2013-01-30 Thread Marshall Bockrath-Vandegrift
Wm. Josiah Erikson wmjos...@gmail.com writes:

 Am I reading this right that this is actually a Java problem, and not
 clojure-specific? Wouldn't the rest of the Java community have noticed
 this? Or maybe massive parallelism in this particular way isn't
 something commonly done with Java in the industry?

 Thanks for the patches though - it's nice to see some improvement...
 I'll be fascinated to see how this turns out in the end. Have we found
 a large Java bug?

Apologies for my very-slow reply here.  I keep thinking that I’ll have
more time to look into this issue, and keep having other things
requiring my attention.  And on top of that, I’ve temporarily lost the
many-way AMD system I was using as a test-bed.

I very much want to see if I can get my hands on an Intel system to
compare against.  My AMD system is in theory 32-way: two physical CPUs,
each with 16 cores.  However, Linux reports (via /proc/cpuinfo) the
cores in groups of 8 (“cpu cores : 8” etc.).  And something very strange
happens when extending parallelism beyond 8-way...  I ran several
experiments using a version of your whole-application benchmark that I
modified to control the level of parallelism.  At parallelism 9+, the
real time it takes to complete the benchmark hardly budges, but the
user/CPU time increases linearly with the level of parallelism!  As far
as I can tell, multi-processor AMD *is* a NUMA architecture, which might
explain things.  But enabling the JVM NUMA options doesn’t seem to
affect the benchmark.
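The modified benchmark itself isn’t shown here, so this is only a minimal sketch of how such a parallelism-controlled run could be measured, assuming a fixed thread pool stands in for it.  `run-with-parallelism` is a hypothetical helper: it reports wall-clock time alongside the summed per-task thread CPU time from `java.lang.management.ThreadMXBean`, so the “real time flat, CPU time growing” pattern described above would show up as `:cpu-ms` rising while `:wall-ms` stays put.  Per-thread CPU time excludes JIT and GC threads, so it will not match the user figure from `time` exactly.

```clojure
(import '(java.util.concurrent Callable Executors ExecutorService Future)
        '(java.lang.management ManagementFactory ThreadMXBean))

(defn run-with-parallelism
  "Hypothetical helper: applies f to each element of xs on a fixed pool of
  n threads. Returns {:results :wall-ms :cpu-ms}, where :cpu-ms sums the
  CPU time each task's thread spent inside f (ThreadMXBean may report -1
  on JVMs without thread CPU time support)."
  [n f xs]
  (let [^ThreadMXBean mx       (ManagementFactory/getThreadMXBean)
        ^ExecutorService pool  (Executors/newFixedThreadPool (int n))
        start                  (System/nanoTime)]
    (try
      (let [tasks (mapv (fn [x]
                          (reify Callable
                            (call [_]
                              ;; Measure only this task's own thread CPU time.
                              (let [t0 (.getCurrentThreadCpuTime mx)
                                    r  (f x)]
                                [r (- (.getCurrentThreadCpuTime mx) t0)]))))
                        xs)
            ;; invokeAll blocks until every task has completed.
            pairs (mapv #(.get ^Future %) (.invokeAll pool tasks))
            wall  (- (System/nanoTime) start)]
        {:results (mapv first pairs)
         :wall-ms (/ wall 1e6)
         :cpu-ms  (/ (reduce + (map second pairs)) 1e6)})
      (finally
        (.shutdown pool)))))
```

Sweeping `n` from 1 upward over the same inputs and comparing `:wall-ms` with `:cpu-ms` would reproduce the kind of table described above, under these assumptions.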

I think next steps are two-fold: (1) examine parallelism vs. real and CPU
time on an Intel system, and (2) attempt to reproduce the observed
behavior in pure Java.  I’m keeping my fingers crossed that I’ll have
some time to look at this more soon, but I’m honestly not very hopeful.

In the meantime, I hope you’ve managed to exploit multi-process
parallelism to run more efficiently?

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-21 Thread Marshall Bockrath-Vandegrift
Wm. Josiah Erikson wmjos...@gmail.com writes:

 I hope this helps people get to the bottom of things.

Not to the bottom of things yet, but I found some low-hanging fruit:
switching `push-state` from a struct-map to a record gives a flat ~2x
speedup in all configurations I tested.  So, that’s good?

I have, however, eliminated to my satisfaction the possibility that this
is something orthogonal to your system.  I do see some improvement in
speedup when I lower the level of concurrency to the number of physical
cores on my system, but each call to `error-function` still takes ~10x
as long to complete when run in parallel vs. in serial.

For a while I had some hope that atom access in the main interpreter
loop was the problem, by causing extraneous fetches from main memory.
But removing all the atoms from the system didn’t have any appreciable
impact.

Anyway, still poking at it.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-21 Thread Marshall Bockrath-Vandegrift
Lee Spector lspec...@hampshire.edu writes:

 FWIW I used records for push-states at one point but did not observe a
 speedup and it required much messier code, so I reverted to
 struct-maps. But maybe I wasn't doing the right timings. I'm curious
 about how you changed to records without the messiness. I'll include
 below my sig the way that I had to do it... maybe you can show me what
 you did instead.

I just double-checked, and I definitely see a 2x speedup on Josiah’s
benchmark.  That may still be synthetic, of course.  Here’s what I did:

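;; Define PushState with a `trace` field plus one field per push type.
(eval `(defrecord ~'PushState [~'trace ~@(map (comp symbol name) push-types)]))

;; Build the empty record once; every call returns that shared instance.
(let [empty-state (map->PushState {})]
  (defn make-push-state
    "Returns an empty push state."
    [] empty-state))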

 Still, I guess the gorilla in the room, which is eating the multicore
 performance, hasn't yet been found.

No, not yet...  I’ve become obsessed with figuring it out though, so
still slogging at it.

-Marshall


Re: Strange behaviour: nested def in proxy method body

2012-12-13 Thread Marshall Bockrath-Vandegrift
kristianlm krist...@adellica.com writes:

 I'm enjoying testing Java code with Clojure and it's been a lot of fun
 so far. Coming from Scheme, the transit is comfortable. However, I
 encountered a big surprise when I nested def's and used them with a
 proxy:

This is a common surprise for people with previous exposure to Scheme:
`def` in Clojure is always namespace-scoped.  It creates a var with the
given name and interns it into the current namespace; it does not
introduce a name into the current lexical scope.  Nested `def`s are thus
very rarely what one actually wants, and the behavior you’re seeing is
exactly what one would expect.
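A small illustration of the difference, using a plain `fn` rather than `proxy` for brevity (the scoping of `def` is the same inside a proxy method body).  Both function names are hypothetical.

```clojure
(defn make-counter-def
  "Anti-pattern: the nested def interns a single var, counter-state, in the
  current namespace. Every call replaces that var's root with a fresh atom,
  and every returned fn reads whatever the var holds when it runs."
  []
  (def counter-state (atom 0))
  (fn [] (swap! counter-state inc)))

(defn make-counter-let
  "Idiomatic: let introduces a lexical local, so each returned fn closes
  over its own atom."
  []
  (let [state (atom 0)]
    (fn [] (swap! state inc))))

(let [a (make-counter-def) b (make-counter-def)]
  (a) (a)
  (b))   ; => 3: a and b both increment the atom created by the second call

(let [a (make-counter-let) b (make-counter-let)]
  (a) (a)
  (b))   ; => 1: b has its own atom
```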

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-12 Thread Marshall Bockrath-Vandegrift
Andy Fingerhut andy.finger...@gmail.com writes:

 I'm not practiced in recognizing megamorphic call sites, so I could be
 missing some in the example code below, modified from Lee's original
 code.  It doesn't use reverse or conj, and as far as I can tell
 doesn't use PersistentList, either, only Cons.

...

 Can you try to reproduce to see if you get similar results?  If so, do
 you know why we get bad parallelism in a single JVM for this code?  If
 there are no megamorphic call sites, then it is examples like this
 that lead me to wonder about locking in memory allocation and/or GC.

I think your benchmark is a bit different from Lee’s original.  The
`reverse`-based versions allocate heavily as they repeatedly reverse a
sequence, but each thread holds a sequence of at most 10,000 elements at
any given time.  In your benchmark, each thread holds a sequence of up to
2,000,000 elements, a naive 200x increase in memory pressure and a
potential increase in the number of objects being promoted out of the
young generation.

I ran your benchmark under a version of Cameron’s criterium-based
speedup measurement wrapper, which I’ve modified to take the `pmap`
function to use as an argument.  I reduced the number of iterations in
your algorithm by a factor of 5 to get it to run in a reasonable amount
of time, and ran it using default JVM GC settings on a 32-way AMD
system.
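Cameron’s wrapper isn’t reproduced in this thread.  As a rough stand-in, here is a minimal sketch of a speedup harness that takes the parallel map function as a parameter, in the spirit of the modification described above.  It uses single `System/nanoTime` timings after one warm-up pass instead of criterium, so its numbers will be far noisier; `speedup` and `elapsed-ms` are hypothetical names.

```clojure
(defn elapsed-ms
  "Wall-clock milliseconds taken to call f and fully realize its result."
  [f]
  (let [start (System/nanoTime)]
    (doall (f))
    (/ (- (System/nanoTime) start) 1e6)))

(defn speedup
  "Compares clojure.core/map with pmap-fn (pmap, or any function with the
  same [f coll] signature) applying work-fn over inputs."
  [pmap-fn work-fn inputs]
  ;; One untimed warm-up pass of each, so the JIT has seen the code.
  (doall (map work-fn inputs))
  (doall (pmap-fn work-fn inputs))
  (let [smap-ms (elapsed-ms #(map work-fn inputs))
        pmap-ms (elapsed-ms #(pmap-fn work-fn inputs))]
    {:smap-ms smap-ms
     :pmap-ms pmap-ms
     :speedup (/ smap-ms pmap-ms)}))
```

Passing a bounded-parallelism variant as `pmap-fn` lets the same harness produce one row per parallelism level, as in the tables that follow.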

I get the following numbers for 1-32 way parallelism with a 500MB heap:

andy  1 : smap-ms 7.5, pmap-ms 7.7, speedup 0.97
andy  2 : smap-ms 7.8, pmap-ms 9.8, speedup 0.80
andy  4 : smap-ms 8.5, pmap-ms 10.6, speedup 0.80
andy  8 : smap-ms 8.6, pmap-ms 11.5, speedup 0.75
andy 16 : smap-ms 8.1, pmap-ms 12.5, speedup 0.65
andy 32 : [java.lang.OutOfMemoryError: Java heap space]

And these numbers with a 4GB heap:

andy  1 : smap-ms 3.8, pmap-ms 4.0, speedup 0.95
andy  2 : smap-ms 4.2, pmap-ms 2.1, speedup 2.02
andy  4 : smap-ms 4.2, pmap-ms 1.7, speedup 2.48
andy  8 : smap-ms 4.2, pmap-ms 1.2, speedup 3.44
andy 16 : smap-ms 4.4, pmap-ms 1.0, speedup 4.52
andy 32 : smap-ms 4.0, pmap-ms 1.6, speedup 2.55

I’m running out of time for breakfast experiments, but it seems
relatively likely to me that the larger at-once sequence size in your
benchmark is increasing the number of objects making it out of the young
generation.  This in turn increases the number of stop-the-world GCs,
which become even more frequent at smaller heap sizes.  I’ll run these
again later with GC logging and report back if the results are
unexpected.
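Besides GC logging (e.g. the `-verbose:gc` JVM flag), one rough way to test the GC hypothesis from inside the process is to diff the collector counters around a run.  A minimal sketch; `gc-totals` and `with-gc-stats` are hypothetical helpers, and because the counters are JVM-wide, collections triggered during the interval by any thread are included.

```clojure
(import '(java.lang.management ManagementFactory GarbageCollectorMXBean))

(defn gc-totals
  "Sums collection counts and accumulated collection time (ms) over every
  collector the JVM reports. Undefined values (-1) are treated as 0."
  []
  (reduce (fn [acc ^GarbageCollectorMXBean gc]
            (merge-with + acc
                        {:count   (max 0 (.getCollectionCount gc))
                         :time-ms (max 0 (.getCollectionTime gc))}))
          {:count 0 :time-ms 0}
          (ManagementFactory/getGarbageCollectorMXBeans)))

(defn with-gc-stats
  "Calls f (which should fully realize any lazy result itself) and returns
  its result with the number of collections and the collection time that
  elapsed while it ran."
  [f]
  (let [before (gc-totals)
        result (f)
        after  (gc-totals)]
    {:result     result
     :gc-count   (- (:count after) (:count before))
     :gc-time-ms (- (:time-ms after) (:time-ms before))}))
```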

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-12 Thread Marshall Bockrath-Vandegrift
cameron cdor...@gmail.com writes:

   the megamorphic call site hypothesis does sound plausible but I'm
 not sure where the following test fits in.

...

 I was toying with the idea of replacing the EmptyList class with a
 PersistentList instance to mitigate the problem
 in at least one common case, however it doesn't seem to help.
 If I replace the reverse call in burn with the following code:
   #(reduce conj (list nil) %)
 I get the same slowdown as we see if reverse (equivalent to #(reduce
 conj '() %))

Ah, but include your own copy of `conj` and try those two cases.  The
existing clojure.core/conj has already been used on multiple types, so
you need a new IFn class with a fresh call site.  Here are the numbers I
get when I do that:
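A minimal sketch of that setup, assuming “your own copy” means a fresh `defn` that delegates to `clojure.lang.RT/conj` the way the two-argument arity of `clojure.core/conj` does.  The names are hypothetical, and only the collection types involved are demonstrated here, not the timings that follow.

```clojure
;; Two fresh, identical copies of conj's two-argument arity, so neither has
;; been invoked on other collection types elsewhere in the program.
(defn conj-a [coll x] (clojure.lang.RT/conj coll x))
(defn conj-b [coll x] (clojure.lang.RT/conj coll x))

;; "w/o EmptyList": (list nil) is a one-element PersistentList, so conj-a
;; only ever receives PersistentList instances. The result ends with nil.
(defn reverse-without-empty [coll] (reduce conj-a (list nil) coll))

;; "w/ EmptyList": '() is a PersistentList$EmptyList, so conj-b receives
;; two concrete types: the empty list first, then PersistentList.
(defn reverse-with-empty [coll] (reduce conj-b '() coll))
```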

w/o EmptyList : smap-ms 6.1, pmap-ms 1.2, speedup 5.26
w/  EmptyList : smap-ms 10.4, pmap-ms 16.2, speedup 0.64
w/o EmptyList : smap-ms 10.5, pmap-ms 16.3, speedup 0.64

That said, I’m slightly less convinced than I was earlier.  I’m having
difficulty producing a minimal example demonstrating the issue, and the
results wmjosiah reported for modifying their actual code are
disheartening.  I just don’t have anything else which begins to explain
the transition from speedup to inverse speedup in the above benchmarking
sequence.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-11 Thread Marshall Bockrath-Vandegrift
nicolas.o...@gmail.com nicolas.o...@gmail.com writes:

 What happens if you run it a third time at the end?   (The question
 is related to the fact that there appear to be transition states
 between monomorphic and megamorphic call sites,  which might lead to
 an explanation.)

Same results, but your comment jogged my reading and analysis in what I
believe is the right direction.

This is a potentially grandiose claim, but I believe the inverse speedup
is due to contention caused by JVM type profiling in megamorphic call
sites.  Unfortunately -XX:-UseTypeProfile doesn’t appear to turn off
type profiling itself, so I can’t prove this claim definitively without
going even further down the rabbit hole and grubbing through the JVM
source code.

To back up this claim, I present the following modified `conj*`:

(defn conj*
  [coll x]
  ;; Busy-work: fill a scratch array five times before the real conj.
  (let [a (long-array 32)]
    (dotimes [n 5] (dotimes [i 32] (aset a i n)))
    (clojure.lang.RT/conj coll x)))

And the resulting profiling numbers:

list-conj* : map-ms: 42.0, pmap-ms 24.8, speedup 1.69
cons-conj* : map-ms: 39.7, pmap-ms 25.1, speedup 1.58

Adding busy-work (and an extra allocation!) to the loop improved the
speedup, I believe by decreasing the portion of each call’s execution
time during which the call site’s type profile information is being
updated.  If I’m correct, the `Cons` and `PersistentList` `.cons`
implementations are so tight that in their base versions the type
profiling forms a large enough portion of the total call that the
updates frequently conflict.  Adding busy-work to the `conj*` call adds
some jitter, which prevents them from contending quite as frequently.

I’m not sure what the next steps are.  Open a bug on the JVM?  This is
something one can attempt to circumvent on a case-by-case basis, but
IMHO it has significant negative implications for Clojure’s concurrency
story.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-11 Thread Marshall Bockrath-Vandegrift
Lee Spector lspec...@hampshire.edu writes:

 Is the following a fair characterization pending further developments?

 If you have a cons-intensive task then even if it can be divided into
 completely independent, long-running subtasks, there is currently no
 known way to get significant speedups by running the subtasks on
 multiple cores within a single Clojure process. 

Not quite.  If you’d been using `cons` (in the benchmark, if `reverse`
used `cons` in its implementation), then you’d be getting a perfectly
reasonable speedup.  The problem child in this instance is `conj`.

If my analysis is correct, then the issue is any megamorphic call site,
such as the one in `conj`, which is invoked in a tight loop by multiple
threads simultaneously.  Any simultaneous invocation of such call sites
introduces contention and reduces speedup, but the problem only becomes
pathological in very, very tight loops, such as when performing the
minimal work required by the `.cons` [1] implementations of `Cons` and
`PersistentList`.  In these cases the portion of the call which
introduces contention is a large enough proportion of the overall call
time that the speedup becomes inverse.

 In some cases you will be able to get significant speedups by
 separating the subtasks completely and running them in separate
 Clojure processes running on separate JVM instances.  But the speedups
 will be lost (mostly, and you might even experience slowdowns) if you
 try to run them from within a single Clojure process.

For this particular issue, splitting each task into a separate JVM
entirely negates the problem, because there is no simultaneous
invocation of the same call site.

 Or have I missed a currently-available work-around among the many
 suggestions?

You can specialize your application to avoid megamorphic call sites in
tight loops.  If you are working with `Cons`-based sequences, just use
`cons` instead of `conj`.  If you are working with vectors, create your
own private implementation of `conj` which you *only* call on vectors.
If you depend on operations which may or do use `conj` in tight loops,
create your own private re-implementations which don’t, such as any of
the faster versions of `reverse` earlier in this thread.
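A minimal sketch of the first two workarounds.  `cons-reverse` and `vector-conj` are hypothetical names, and whether they recover the speedup in a given program still needs to be confirmed by profiling.

```clojure
(defn cons-reverse
  "Reverses coll by prepending with cons instead of conj. When given a seq,
  clojure.core/cons allocates a clojure.lang.Cons directly, without
  dispatching through the accumulator's .cons method."
  [coll]
  (reduce (fn [acc x] (cons x acc)) () coll))

(defn vector-conj
  "Private conj for vectors only: the type hint compiles to a direct
  interface call on IPersistentVector, and this fn is never handed lists."
  [^clojure.lang.IPersistentVector v x]
  (.cons v x))

(cons-reverse [1 2 3])           ; => (3 2 1)
(reduce vector-conj [] [1 2 3])  ; => [1 2 3]
```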

This is suboptimal, but it’s totally possible to work around the issue
with a little bit of analysis and profiling.

[1] Possible point of confusion: the JVM interface method invoked by
the Clojure `conj` function is named `.cons`, for what I assume are
historical reasons.  The Clojure `cons` function, on the other hand,
just allocates a `Cons` object in an entirely monomorphic fashion.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-11 Thread Marshall Bockrath-Vandegrift
Lee Spector lspec...@hampshire.edu writes:

 If the application does lots of list processing but does so with a
 mix of Clojure list and sequence manipulation functions, then one
 would have to write private, list/cons-only versions of all of these
 things? That is -- overstating it a bit, to be sure, but perhaps not
 entirely unfairly -- re-implement Clojure's Lisp?

I just took a quick look over clojure/core.clj, and `reverse` is the
only function which stood out to me as hitting the most pathological
case.  Every other `conj` loop over a user-provided data structure
`conj`s into an explicit non-list/`Cons` type.

So I think if you replace your calls to `reverse` and any `conj` loops
you have in your own code, you should see a perfectly reasonable
speedup.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-10 Thread Marshall Bockrath-Vandegrift
cameron cdor...@gmail.com writes:

 There does seem to be something unusual about conj and
 clojure.lang.PersistentList in this parallel test case and I don't
 think it's related to the JVMs memory allocation.

I’ve got a few more data-points, but still no handle on what exactly is
going on.

My last benchmark showing the `conj*` speedup for `Cons` objects
degrading as soon as it was used on a `PersistentList` was incomplete.
In fact, the speedup degrades after it is used on objects of more than
one type.  The effect just appears immediately when used with
`PersistentList` because '() is in fact a different type,
`PersistentList$EmptyList`.  Using `conj*` first in the vector
implementation likewise results in the same inverse speedup on `Cons`es.
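One way to see which concrete types a `conj` reduction hands to `.cons` is to record the accumulator’s class at each step.  A minimal sketch with a hypothetical `receiver-classes` helper; it only takes the type census described above, it does not reproduce the timings.

```clojure
(defn receiver-classes
  "Reduces coll into init with conj and returns the set of concrete
  classes the accumulator had at each conj call."
  [init coll]
  (let [seen (atom #{})]
    (reduce (fn [acc x]
              (swap! seen conj (class acc))
              (conj acc x))
            init
            coll)
    @seen))

(receiver-classes '() [1 2 3])
;; => #{clojure.lang.PersistentList$EmptyList clojure.lang.PersistentList}
(receiver-classes (cons 0 '()) [1 2 3])
;; => #{clojure.lang.Cons}
(receiver-classes [] [1 2 3])
;; => #{clojure.lang.PersistentVector}
```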

Even without your near-optimal speedup using Java standard library
types, I think your earlier benchmarks are enough to demonstrate that
this isn’t an issue with allocation alone.  All of the implementations
based on `reduce` with `conj` must allocate and return a new object for
each iteration.  If parallel allocation were the sole issue, I’d expect
all of the implementations to demonstrate the same behavior.

Unfortunately I have no idea yet how to connect these facts:

  - Parallel allocation of `Cons` and `PersistentList` instances through
    a Clojure `conj` function remains fast as long as the function only
    ever returns objects of a single concrete type.

  - Parallel allocation speed for `PersistentVector` instances is
    unaffected by `conj` returning multiple types, and does not
    demonstrate the inverse speedup seen for the previous types.

At this point I believe the symptoms point to cache contention, but I
don’t know where or why.  Using OpenJDK 7 with -XX:+UseCondCardMark
didn’t appear to produce any improvement.  Creating a private copy of
`PersistentList` containing additional padding fields likewise didn’t
appear to produce any improvement.

So, Lee Spector: I think it’s possible to work around this by just not
using `conj` on lists.  It’s suboptimal, but it at least solves the
problem in your original benchmark.  Further improvements are obviously
possible, but that’s a start.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-10 Thread Marshall Bockrath-Vandegrift
Wm. Josiah Erikson wmjos...@gmail.com writes:

 Aha. Not only do I get a lot of made not entrant, I get a lot of
 made zombie. However, I get this for both runs with map and with
 pmap (and with pmapall as well)

I’m not sure this is all that enlightening.  From what I can gather,
“made not entrant” just means that a JIT-compiled version proved to be
invalid in light of later code, so new invocations won’t use the
previous version.  And “made zombie” just means all references to an
old JIT-compiled version are gone, making it available to be freed.

A copy of `conj` becomes “not entrant” after being used on both vectors
and lists, but the new version gets the same speedup when used on
vectors as a copy which has only ever been used on vectors.  Something
else is going on which specifically affects parallel calls to the
polymorphic version when applied to instances of `PersistentList` and
`Cons`.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-09 Thread Marshall Bockrath-Vandegrift
cameron cdor...@gmail.com writes:

 Interesting problem, the slowdown seems to be caused by the reverse
 call (actually the calls to conj with a list argument).

Excellent analysis, sir!  I think this points things in the right
direction.

 fast-reverse    : map-ms: 3.3, pmap-ms 0.7, speedup 4.97
 list-cons   : map-ms: 4.0, pmap-ms 0.7, speedup 6.13

The difference between these two I believe is just jitter.  Once `cons`
is called on either a list or a vector, the result is a
`clojure.lang.Cons` object.

 vec-conj    : map-ms: 4.0, pmap-ms 1.3, speedup 3.10

For the sub-goal of optimizing `reverse`, I get even better times than
the `cons`-based implementation by using transient vectors:

list-cons : map-ms: 4.0, pmap-ms 0.6, speedup 6.42
tvec-conj : map-ms: 0.9, pmap-ms 0.2, speedup 4.10

(defn tvec-conj [coll]
  (persistent! (reduce conj! (transient []) coll)))

 list-conj   : map-ms: 10.8, pmap-ms 21.2, speedup 0.51
 clojure-reverse : map-ms: 13.5, pmap-ms 26.8, speedup 0.50 (this is
 equivalent to the original code)

I add the following:

cons-conj : map-ms: 3.3, pmap-ms 16.8, speedup 0.19

(defn cons-conj [coll]
  (reduce conj (clojure.lang.Cons. (first coll) nil) (rest coll)))

I think this is the key, but I don’t understand it.

The `cons` function just immediately creates a new `Cons` object.  The
`conj` function calls the `.cons` method of the collection, and the
`.cons` implementation that `Cons` inherits from `ASeq` just creates a
new `Cons` object!

It’s like there’s a lock of some sort sneaking in on the `conj` path.
Any thoughts on what that could be?

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-09 Thread Marshall Bockrath-Vandegrift
Andy Fingerhut andy.finger...@gmail.com writes:

 My current best guess is the JVM's memory allocator, not Clojure code.

I didn’t mean to imply the problem was in Clojure itself, but I don’t
believe the issue is in the memory allocator either.  I now believe the
problem is in a class of JIT optimization HotSpot is performing which
turns into a “pessimization” in the parallel case.

I have the following code, taking the structure from Cameron’s
benchmarks, but replacing the exact functions being tested:

https://gist.github.com/4246320

Note that `conj*` is simply a copy-paste of the `conj` implementation
from clojure/core.clj.  The benchmark reduces with that `conj*` function
first over `Cons`es, then over `PersistentList`s, then over `Cons`es
again.

And here are the results:

cons-conj* : map-ms: 5.6, pmap-ms 1.1, speedup 5.08
list-conj* : map-ms: 10.1, pmap-ms 15.9, speedup 0.63
cons-conj* : map-ms: 10.0, pmap-ms 15.6, speedup 0.64

The function performs fine on `Cons` objects, but once applied to
`PersistentList` objects, `conj*` somehow becomes “tainted” and acquires
the inverse speedup factor.  When I run with -XX:+PrintCompilation I
don’t see anything particularly suspicious, but I’m not well-versed in
the art of interpreting HotSpot’s entrails.

My idea for next steps is to create a copy of the PersistentList class
and selectively modify it in an attempt to identify what aspect of it
causes the JVM to produce this behavior.

-Marshall


Re: abysmal multicore performance, especially on AMD processors

2012-12-08 Thread Marshall Bockrath-Vandegrift
Lee Spector lspec...@hampshire.edu writes:

 I'm also aware that the test that produced the data I give below,
 insofar as it uses pmap to do the distribution, may leave cores idle
 for a bit if some tasks take a lot longer than others, because of the
 way that pmap allocates cores to threads.

Although it doesn’t impact your benchmark, `pmap` may be further
adversely affecting the performance of your actual program.  There’s an
open bug regarding `pmap` and chunked seqs:

http://dev.clojure.org/jira/browse/CLJ-862

The impact is that `pmap` with chunked seq input will spawn futures for
its function applications in flights of 32, spawning as many flights as
necessary to reach or exceed #CPUS + 2.  On a 48-way system, it will
initially launch 64 futures, then spawn an additional 32 every time the
number of active unrealized futures drops below 50, leading to
significant contention for a CPU-bound application.
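A commonly used workaround is to hand `pmap` an unchunked seq, so that realizing its input realizes (and spawns a future for) one element at a time rather than a flight of 32.  A minimal sketch; `unchunk` is not part of clojure.core, and it addresses only the chunking behavior, not whether `pmap`’s #CPUS + 2 lookahead suits a given workload.

```clojure
(defn unchunk
  "Returns a lazy seq of the elements of s that is realized one element at
  a time, even if s itself is chunked."
  [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; Hypothetical usage: (pmap f (unchunk inputs)) instead of (pmap f inputs)
```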

I hope it can be made useful in a future version of Clojure, but right
now `pmap` is more of an attractive nuisance than anything else.

-Marshall
