[ANN] thurber: Clojure on Apache Beam (distributed batch/streaming)

2020-01-21 Thread atdixon
Here is thurber (https://github.com/atdixon/thurber), an early alpha 
release that enables Clojure on Apache Beam platforms like Google Dataflow.

thurber's goals include:

- Full support for Beam capabilities
- AOT-less (AOT not required; full dynamic support for serializing 
functions, including inlined functions, and proxies)
- Macro-less (very few, always optional, macros)
- Performance focus (core optimized for large volume data streaming)
- Idiomatic Clojure focus (Clojure functions are automatically 
distributable functional transforms, lazy sequences over iterative output, 
etc.)

When coming to Apache Beam and wanting to use Clojure, there are a few 
hurdles to overcome, some of which have been discussed here in the past. 
Clojure's Java interop commonly falls short in the domain of distributed 
big data Java platforms (proxies and functions are not serializable, there 
is no support for generating generic type signatures, support for method 
annotations is minimal, dynamic binding performance is suboptimal, etc.).

thurber bridges these issues internally, giving a full dynamic/Clojure 
experience on top of Apache Beam.

(For Onyx users, thurber + Beam meet the same ideals 
<http://www.onyxplatform.org/docs/user-guide/0.14.x/#what-does-onyx-offer> as 
Onyx on a well-backed platform.)

This is an early alpha release, and feedback on the API and facilities is 
welcome.

For the curious, the walkthrough covers most of thurber's capabilities: 
https://github.com/atdixon/thurber/blob/master/demo/walkthrough.clj

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/clojure/c18cc8e1-01c9-4688-bff3-6d50f128d0e4%40googlegroups.com.


Re: Inconsistent AOT classnotfoundexception

2019-09-12 Thread atdixon
Interesting! I had not seen clj-headlights, but my org is using Beam + 
Clojure and we've made some decisions similar to clj-headlights'. 
Specifically, we are avoiding AOT and its headaches.

On Thursday, September 12, 2019 at 12:37:24 AM UTC-5, Kimmo Koskinen wrote:
>
> Hi!
>
> Not a direct answer, but have you looked at clj-headlights 
> (https://github.com/logrhythm-oss/clj-headlights), an Apache Beam wrapper 
> for Clojure? It might have pointers related to Beam/AOT specifically.
>
> - Kimmo
>
>



Re: gen-class/AOT considered harmful (in libraries)???

2019-09-02 Thread atdixon
I was responding to this:

> I'm still torn on whether to actually add Clojure as a proper dependency

The question of whether to have Clojure as a proper dependency doesn't seem 
to depend on whether you AOT or not. I take "proper dependency" to mean a 
Maven/Lein dependency that ships in the dependency list of your library's POM.

I agree that you shouldn't ship AOT'd classes from outside of your 
library/its namespaces.

On Sunday, September 1, 2019 at 10:31:46 PM UTC-5, Didier wrote:
>
> AOT does matter, because AOT is transitive. Effectively, all AOT builds 
> are like Uberjars. They compile your code and the code of your dependencies 
> and theirs as well into .class files putting everything in the build 
> folder. Then the package will take all the classes and put them in the Jar.
>
> Any library that does that does it wrong. Don't AOT your libraries for 
> that reason. Or make sure if you do, you are only including .class of your 
> library code, not any of its dependencies. And even ideally, like I said, 
> only include the bare minimum, so only .classes required for interop. The 
> rest package as source.
>



Re: gen-class/AOT considered harmful (in libraries)???

2019-09-01 Thread atdixon
Hi, Dimitris - 

It looks from the Clojure source [1] that Clojure compiles to v1.8 class 
files, so this should mean you can run on any JVM 1.8 and beyond.
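
To make the "v1.8 class files" point concrete, here is a rough Java sketch of how the target version can be read from a class file header. ClassFileVersion and its methods are hypothetical helpers written for this illustration, not part of Clojure or the JDK; the only facts relied on are the class file layout (magic number, minor version, major version) and that major version 52 corresponds to Java 8.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Clojure or the JDK.
final class ClassFileVersion {

    /** Returns the major version stored in bytes 6-7 of a class file. */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // minor version, ignored here
        return buf.getShort() & 0xFFFF; // major version, read as unsigned
    }

    /** Maps a major version to its Java release number, e.g. 52 -> 8. */
    static int javaRelease(int major) {
        return major - 44;
    }

    /** A JVM can load class files compiled for its own release or an older one. */
    static boolean loadableOn(int classMajor, int jvmRelease) {
        return javaRelease(classMajor) <= jvmRelease;
    }
}
```

Feeding it the first eight bytes of any .class file from a Clojure AOT build should be enough to check which JVM releases can load it.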

You say you are deliberating whether to include Clojure as a proper 
dependency. I've noticed that some Clojure libraries list Clojure as a 
dependency in their repository POM but mark it with scope `provided`, 
which is a way of saying that the JAR is expected to be provided by the 
library consumer at runtime.

I'm not sure why AOT changes the question of whether to include the Clojure 
jar as a dependency; I think you have this question with 
non-precompiled/AOT'd libraries, no? I do understand that with AOT your lib 
will have *.class files in the JAR that directly reference Clojure classes. 
But in some IDEs -- just having the dependency as provided will suffice to 
make it available for linking/compiling.

[1] https://github.com/clojure/clojure/commit/38705b49fd3dbae11e94c576ef49ff3eb1c47395

On Friday, August 23, 2019 at 2:18:35 PM UTC-5, Jim foo.bar wrote:
>
> Ok, so after a bit of more thinking/reading, I realise now that my 
> question was somewhat misplaced. 
>
> Any AOT compiled code (not just gen-class), is an artifact of the compiler 
> used to compile it, and therefore, in some sense, implicitly tied to that 
> compiler version. However, that's not to say that the produced class 
> definitely will or won't work with some other version. If my understanding 
> is correct, it will work for future versions (which is the main point here) 
> as long as Clojure itself maintains backwards compatibility (in this case 
> wrt to gen-class). 
>
> I'm still torn on whether to actually add Clojure as a proper dependency - 
> I think I will probably end up doing so, but I'm not thrilled about it...
>
> Kind regards,
>
> Dimitris
>
>
> On 23/08/2019 08:15, dimitris wrote:
>
> Why would I write the class in Java, when this below works exactly as 
> expected?
>
> (ns foo
>   (:gen-class :name foo.Bar
>               :extends java.lang.System$LoggerFinder
>               :constructors {[] []}))
>
> (let [slm (delay (-> 'foo/system-logger-memo
>                      requiring-resolve
>                      var-get))]
>   (defn -getLogger
>     "Returns a subclass of `System$Logger` which routes logging requests
>     to the `core/*root*` logger."
>     [this name module]
>     ;; resolve it at runtime (and only once),
>     ;; in order to prevent AOT leaking out of this ns
>     (@slm)))
>
>
> My question boils down to this:
>
> Let's say that I used clojure1.10 to compile the above gen-class. Does the 
> project containing it need to specify Clojure 1.10 as a proper dependency, 
> or will the end user be able to provide any version of Clojure greater or 
> equal to 1.10? 
>
> Thanks in advance...
>
> Kind regards,
>
> Dimitris
>
>
> On 23/08/2019 02:00, Matching Socks wrote:
>
> You are considering gen-class as an alternative to writing the 
> service-provider class in Java and using only methods in Clojure's public 
> Java API to connect it with an implementation-in-Clojure?


Re: Java Interop on steroids?

2019-07-02 Thread atdixon
I'm glad someone else is thinking on this too!

#2 - For my case at the moment (Apache Beam), I believe we will always know 
the types in advance, so using a Java class is workable, but of course a 
(proxy++) would be ideal. Beam asks us to extend an abstract generic class, 
so we must use (proxy). It also asks for our instances to be Serializable 
(again, proxy explicitly refuses to help here, but I believe this, too, 
should be surmountable, without the security implications of that link). In 
any case, what we do looks like this:

// pseudo-ish code
public class MyFn extends BeamFn<Integer, String> implements Serializable {
   private final Var fn; // clojure.lang.Var
   public MyFn(Var fn) { this.fn = fn; }
   @Override public String invoke(Integer input) { return (String) fn.invoke(input); }
}

On the Clojure side:

;; pseudo-ish code

(defn my-fn [^Integer val] ...return a string...)

...
   (register-beam-step (MyFn. #'my-fn))
...

Note how we pass the Clojure function as a Var; this is because Beam wants 
to send the function over the wire. IFn is not serializable; Var is, 
however, and can be resolved back to a Clojure function on the other end 
during deserialization.
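
A minimal sketch of that idea in plain Java, assuming nothing about Clojure's or Beam's internals: FnRef and its REGISTRY are hypothetical stand-ins for a Var and the namespaces it lives in. Only the name is written to the stream, and readResolve looks the function up again on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a Var: a serializable, named reference to a function.
final class FnRef implements Serializable {
    private static final long serialVersionUID = 1L;

    // Per-JVM table of named functions (plays the role of loaded namespaces).
    static final Map<String, Function<Object, Object>> REGISTRY = new ConcurrentHashMap<>();

    private final String name;                          // the only state that is serialized
    private final transient Function<Object, Object> fn; // never written to the stream

    FnRef(String name) {
        this.name = name;
        this.fn = REGISTRY.get(name);
        if (fn == null) throw new IllegalStateException("no function named " + name);
    }

    Object invoke(Object arg) { return fn.apply(arg); }

    // Called by Java serialization after the fields are read: swap the
    // half-initialized copy for one that has re-resolved its function.
    private Object readResolve() { return new FnRef(name); }

    // Simulates sending the reference over the wire and reading it back.
    static FnRef roundTrip(FnRef ref) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(ref);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FnRef) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The design point is that the function body never has to be serializable; both ends only need to agree on the name and have the same code loaded.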

Now of course this whole dance could be eliminated with a specialized 
library that included a proxy++ function that included the ability for 
clients to specify generic type parameter values and serialization support 
(the mechanics of which would need ironing out but I think should be 
possible.)

And to your point about types not known at runtime... this proxy would 
support that use case on-the-fly, which opens a bunch of possible 
interesting options, as well.


On Tuesday, July 2, 2019 at 8:24:41 PM UTC-5, Chris Nuernberger wrote:
>
> eglue,
>
> 1.  I think this is a great idea if it is really necessary.  I would be in 
> favor of a reify++ alone to simplify things.  I find reify amazing at code 
> compression and heavily use it via type specific macros to implement 
> interfaces that for instance support a particular primitive type.
> 2.  Is a possible workaround to define java interfaces that implement the 
> type specific generic interfaces and then reify those explicitly or is the 
> set of possible interface specialization types unknown a-priori? 
> 3.  The case where something is unbounded or unknown a-priori I would 
> think would often end up with a java class as one of the specializations.  
> In this case, regardless of the cause, one answer might be an upgraded 
> reify pathway.  
> 4.  Are these perhaps cases where you can create just a little bit of java 
> as a generator somehow to generate the interface you need to reify?
>
>
> I would personally find reify a much nicer pathway than calling clojure 
> vars from java.  
>
> I also think there could be low hanging fruit (or just good unknown 
> libraries) in the pathway for calling clojure vars from java.
>
>
> Interesting problem (at least to me), thanks!
>
> On Thu, Jun 20, 2019 at 10:03 PM eglue wrote:
>
>> Don't get me wrong, I'm as much against types as the next R̶i̶c̶h̶ 
>> ̶H̶i̶c̶k̶e̶y̶  guy.
>>
>> However -- there are many popular Java frameworks that love to reflect on 
>> their annotations and their generic type signatures.
>>
>> To name a heavyweight: Spring. But also, of late: big data frameworks, 
>> many written in Java, love reflecting on generic type signatures. My org is 
>> looking at Beam and Flink, for example.
>>
>> These frameworks use types not for the static checking really but as 
>> parameters governing their own dynamic behavior. For example, Spring will 
>> use types at runtime to simply match objects to where they should be 
>> dynamically injected. Beam will look at your type signatures and do runtime 
>> validations to ensure it can process things appropriately. Of course this 
>> is unfortunate, using types this way, when it is all really just data. 
>> Clojure does -- or would do -- it better, simpler, directer, and all of 
>> that.
>>
>> Yet we would like to leverage these frameworks. Or rather, we must for 
>> various pragmatic and business reasons.
>>
>> And any time we need to "communicate" to these frameworks "through" their 
>> desired fashion of generic types and annotations, we can, of course, create 
>> the appropriate .java files to represent what is needed (and then do the 
>> invocation back to Clojure via IFn.invoke or Compiler.eval, etc). Yes, this 
>> works.
>>
>> However this is quite tedious because in these frameworks I mentioned you 
>> end up having to create these Java files quite a bit. For example, when 
>> expressing a streaming data pipeline to Beam, you may specify multiple 
>> transforms, each a function with its own type signature.
>>
>> A little searching and it seems Clojure has shied away from generating 
>> generic type information in places where it could offer this capability. 
>>
>> For example, in `proxy` ... or I suppose also in `gen-class`, `reify`, 
>> and other dynamic bytecode generation features of Clojure.

Re: Clojure is a good choice for Big Data? Which clojure/Hadoop work to use?

2019-07-02 Thread atdixon
I've found Clojure to be an excellent fit for big data processing for a few 
reasons:

- the nature of big data is that it is often unstructured or 
semi-structured, and Clojure's immutable ad hoc map-based orientation is 
well suited to this
- much of the big data ecosystem is Java or JVM-based (and continues to 
be!) and Clojure interop with Java enables using all of the tooling and 
platforms in Clojure

That said, some Clojure libs in the space (like Cascalog that you 
mentioned) seem quiet the past few years. I personally would favor more 
active Java/JVM projects and simply interop with them from Clojure.

Here are a couple of issues that I've run into in Clojure -> Java interop 
in some of these big data platforms and their solutions:

1) Some big data Java frameworks want you to extend their base classes and 
provide generic parameters as you do. Clojure's class generation tools 
(gen-class, proxy, etc.) do not support providing generic parameters when 
extending Java types. The Java compiler, on the other hand, keeps generic 
parameter values in the compiled target class as class metadata (which is 
how some of these big data systems -- like Apache Beam, for one -- are 
using them at runtime). The solution here is to write Java classes that 
delegate back to Clojure functions through vars.

2) These same frameworks often want you to serialize the functions you 
provide, so they can distribute the code throughout the cluster. Clojure 
disables serialization for the classes it generates, so the workaround is 
to make the same Java classes you wrote for the generic parameters 
implement Serializable, and to instantiate them from Clojure by passing a 
Var bound to a function. Vars in Clojure are serializable, so doing things 
this way allows (refs to) Clojure functions to be distributed across the 
cluster.
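
As a rough illustration of point 1, here is a sketch of a hand-written Java class that fixes the generic parameters and delegates everything else. Step and LengthStep are hypothetical names (Beam's real base classes look different), and the Function field merely stands in for a Clojure Var.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

// Hypothetical framework base class; real Beam base classes differ.
abstract class Step<I, O> {
    abstract O apply(I input);

    /** What a framework might do: read the subclass's type arguments at runtime. */
    final Type[] declaredTypes() {
        // Works for direct subclasses that name concrete type arguments.
        ParameterizedType superType =
            (ParameterizedType) getClass().getGenericSuperclass();
        return superType.getActualTypeArguments();
    }
}

// javac records "Step<String, Integer>" in this class's metadata.
final class LengthStep extends Step<String, Integer> {
    private final Function<Object, Object> delegate; // stands in for a Clojure Var

    LengthStep(Function<Object, Object> delegate) {
        this.delegate = delegate;
    }

    @Override
    Integer apply(String input) {
        // All real logic lives in the delegate; this class only carries the types.
        return (Integer) delegate.apply(input);
    }
}
```

Constructing it as `new LengthStep(s -> ((String) s).length())` gives the framework its String and Integer type arguments while the behavior stays wherever the delegate points.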

The key thing is that all of this is very simple to arrange in code once 
you get the basics down, but I've seen a few people stumble on these, not 
knowing the tricks. And I realize my short descriptions here may leave some 
people wanting. I may try a blog post on these when time permits.

On Tuesday, July 2, 2019 at 11:07:49 AM UTC-5, orazio wrote:
>
> Hi All,
>
> I'm a newbie to Clojure/Big Data, and I'm starting with Hadoop.
> I have installed Hortonworks HDP 3.1 
> I have to design a Big Data Layer that ingests large iot datasets and 
> social media datasets, process data with MapReduce job and produce 
> aggregation to store on HBASE tables.
>
> For now, my focus is addressed on data processing issue. My question is: 
> Is Clojure a good choice for distributed data processing on hadoop ?
> I found Cascalog as fully-featured data processing and querying library 
> for Clojure or Java. But are there any active maintainers, for this library 
> ? 
> Do you know other excellent clojure/Hadoop work in the community, about 
> data processing? 
>
> I would appreciate some help.
>
> Orazio
>



Re: Java Interop on steroids?

2019-06-22 Thread atdixon
> Do the framework you're talking about do static analysis of the types? 
> Because generic types are erased at runtime, so there wouldn't ever be a 
> way for proxy to set them in.

They aren't entirely erased. They're erased from the code, but Java 
compilers are obligated to emit generic type signatures (as metadata) in 
the compiled class file.

The frameworks I mentioned reflect on these generic type parameters *at 
runtime* to drive their dynamic behavior. 

More information here:

https://docs.oracle.com/javase/8/docs/api/java/lang/Class.html#getGenericSuperclass--
https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.9.1
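
A small Java sketch of the difference, using only java.lang.reflect and java.util; ErasureDemo and its method names are made up for this illustration. A plain ArrayList<String> instance carries no record of String, but an anonymous subclass of ArrayList<String> has the type argument written into its class file.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only standard reflection calls are used.
final class ErasureDemo {

    /** An ordinary instance: its class is ArrayList, declared once with variable E. */
    static List<String> plainList() {
        return new ArrayList<String>();
    }

    /** An anonymous subclass: javac emits "extends ArrayList<String>" as metadata. */
    static List<String> subclassedList() {
        return new ArrayList<String>() {};
    }

    /** True if obj's class records concrete classes as its superclass type arguments. */
    static boolean hasConcreteTypeArgs(Object obj) {
        Type t = obj.getClass().getGenericSuperclass();
        if (!(t instanceof ParameterizedType)) {
            return false;
        }
        for (Type arg : ((ParameterizedType) t).getActualTypeArguments()) {
            if (!(arg instanceof Class)) {
                return false; // e.g. a type variable such as E
            }
        }
        return true;
    }
}
```

Here `hasConcreteTypeArgs(plainList())` is false and `hasConcreteTypeArgs(subclassedList())` is true, which is the distinction these frameworks rely on.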

Searching back through this group, I found that in 2012 someone proposed a 
patch to Clojure supporting this in `proxy` (link: 
https://groups.google.com/d/msg/clojure/Xv1pKATfP0c/od_uwpHlNhMJ) but it 
looks like it never made it in.


On Saturday, June 22, 2019 at 11:32:36 PM UTC-5, Didier wrote:
>
> > They did cite a significant performance boost as a side effect.
>
> I think it isn't very clear from the wording. They didn't just rewrite it 
> in Java, they also changed the architecture:
>
> > Storm 2.0.0 introduces a new core featuring a leaner threading model, a 
> blazing fast messaging subsystem and a lightweight back pressure model
>
> So it is hard I think to do an apples to apples comparison. In my opinion, 
> a system like Storm will fundamentally be more limited by its architecture 
> in performance than the language it uses.
>
> That said, even if I think the performance improvements are probably 
> mostly due to architectural changes they also made. It shouldn't come as a 
> surprise that Java would be faster than Clojure in most cases. I don't want 
> to make false pretenses. Clojure and Java are not equal in semantics, and 
> Java's mutable, eager and object grouped methods semantics are almost 
> always going to be more performant. Clojure makes a trade offs of 
> performance and memory for simplicity.
>
> When people say Clojure can match Java in performance, it almost always 
> implies using escape hatches and changing the semantics back to imperative 
> code. The good news though, when you use Clojure's semantics, you benefit 
> in simplicity and the performance impact is marginal, so it is still fast 
> enough for almost all use cases.
>
> Now, back to type annotations. I really don't think Storm 2.0 performance 
> improvements were due to usage of reflection that hadn't been addressed 
> from Clojure. But who knows.
>
> Do the framework you're talking about do static analysis of the types? 
> Because generic types are erased at runtime, so there wouldn't ever be a 
> way for proxy to set them in.
>
>



Re: Java Interop on steroids?

2019-06-22 Thread atdixon
Here is my problem, distilled. This code should tell the full story:

static class Apple<A, B> {}  // two type parameters; the names are arbitrary

Apple<String, Integer> a = new Apple<String, Integer>() {};

Type[] x = ((ParameterizedType) a.getClass().getGenericSuperclass())
    .getActualTypeArguments();

// x is a Type array containing String, Integer

HOWEVER, via Clojure `proxy`, I don't have a way to tell it which type 
arguments to use--

(proxy [Apple [String Integer]] []) ;; something like this is not supported

Certain frameworks, however, consult these type arguments to govern their 
own dynamic behavior.

Having Clojure's proxy support specifying type arguments would allow for 
this not uncommon Java interop need.
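
The same gap can be seen without Clojure at all. The sketch below is only an analogy: it uses the JDK's own java.lang.reflect.Proxy rather than Clojure's `proxy`, and ProxyGenericsDemo is a made-up name. A class compiled by javac records Function<String, Integer>, while a class generated at runtime records only the raw Function interface.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Proxy;
import java.lang.reflect.Type;
import java.util.function.Function;

// Hypothetical demo class; an analogy using JDK dynamic proxies, not Clojure's proxy.
final class ProxyGenericsDemo {

    /** Compiled by javac: the class file records Function<String, Integer>. */
    static Function<String, Integer> compiled() {
        return new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
    }

    /** Generated at runtime: only the raw Function interface is recorded. */
    @SuppressWarnings("unchecked")
    static Function<String, Integer> generated() {
        return (Function<String, Integer>) Proxy.newProxyInstance(
            ProxyGenericsDemo.class.getClassLoader(),
            new Class<?>[] {Function.class},
            (proxy, method, args) -> {
                if (method.getName().equals("apply")) {
                    return ((String) args[0]).length();
                }
                throw new UnsupportedOperationException(method.getName());
            });
    }

    /** True if obj's class declares any interface with explicit type arguments. */
    static boolean recordsTypeArguments(Object obj) {
        for (Type t : obj.getClass().getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                return true;
            }
        }
        return false;
    }
}
```

Both objects behave identically when called, but only the compiled one answers a framework that asks for its type arguments.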

On Saturday, June 22, 2019 at 5:38:36 AM UTC-5, Matching Socks wrote:
>
> By "generic type information", you mean the X in List<X>?
>
>
> On Friday, June 21, 2019 at 12:03:46 AM UTC-4, atdixon wrote:
>>
>> However -- there are many popular Java frameworks that love to reflect on 
>> their annotations and their generic type signatures.
>>
>> To name a heavyweight: Spring. But also, of late: big data frameworks, 
>> many written in Java, love reflecting on generic type signatures. My org is 
>> looking at Beam and Flink, for example.
>>
>


