Re: Clojure interop with Spark

2020-07-10 Thread Dominic Parry
Another option is Apache Beam, which we use quite extensively. There are a few
options for Clojure wrappers (we use datasplash), and Beam has SDKs for a
number of popular languages.


Kind Regards,
Dom Parry
On 10 Jul 2020, 08:22 +0200, Alex Ott wrote:
> From the Spark perspective, I would strongly advise using the DataFrame API as
> much as possible, including Spark Structured Streaming instead of Spark
> Streaming. The main reason is more optimized execution of the code, thanks to
> all the optimizations that Catalyst is able to make. But I really don't see
> libraries that wrap the DataFrame API.
>
> > On Thu, Jul 9, 2020 at 11:36 PM Tim Clemons wrote:
> > > I'm putting together a big data system centered around using Spark 
> > > Streaming for data ingest and Spark SQL for querying the stored data.  
> > > I've been investigating what options there are for implementing Spark 
> > > applications using Clojure.  It's been close to a decade since either
> > > sparkling or flambo received any updates, and it doesn't look like either
> > > will accommodate recent distributions of Spark.  I've found powderkeg an
> > > interesting option, and I like how it supports remote REPLs and the use
> > > of transducers rather than wrapped Scala fns.  However, it looks like it's
> > > also seen a few years without commits and I've heard loose talk that the 
> > > developers have moved on to other pursuits.
> > >
> > > Part of the problem seems to be Spark.  The project seems unapologetic
> > > about breaking interfaces and seems willing to sacrifice third-party code 
> > > that tries to track Spark's development.
> > >
> > > So my options seem to be the following:
> > >
> > > 1. Deploy an older version of Spark that's compatible with one of the
> > > above-mentioned libraries.  While we don't need to be bleeding edge,
> > > deploying a three-year-old version just to accommodate my preferred
> > > language is hard to justify.
> > >
> > > 2. Fork one of those libraries, update it for more recent versions of
> > > Spark, and be prepared to maintain it internally for the
> > > lifespan of this project.  This may be vastly overestimating my personal 
> > > heroics.
> > >
> > > 3. Code my own solution from scratch using Java/Scala interop, sketching 
> > > out just enough of a Clojure wrapper to suit my ends.
> > >
> > > 4. Learn Scala.
> > >
> > > I realize that Spark isn't the only game in town (Onyx, for example).  
> > > However, I'm working with a team of developers who are not familiar with 
> > > Clojure (though I'm working to be an advocate). I chose Spark as an
> > > established solution that supports multiple languages and handles both 
> > > streaming and batch processing.
> > >
> > > Any insights?  Any solutions I'm overlooking?
> > >
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Clojure" group.
> > > To post to this group, send email to clojure@googlegroups.com
> > > Note that posts from new members are moderated - please be patient with 
> > > your first post.
> > > To unsubscribe from this group, send email to
> > > clojure+unsubscr...@googlegroups.com
> > > For more options, visit this group at
> > > http://groups.google.com/group/clojure?hl=en
> > > ---
> > > You received this message because you are subscribed to the Google Groups 
> > > "Clojure" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an 
> > > email to clojure+unsubscr...@googlegroups.com.
> > > To view this discussion on the web visit 
> > > https://groups.google.com/d/msgid/clojure/259f5ff6-dd66-4688-aa80-439fed88ab39o%40googlegroups.com.
>
>
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)


Re: Clojure interop with Spark

2020-07-10 Thread Alex Ott
From the Spark perspective, I would strongly advise using the DataFrame API as
much as possible, including Spark Structured Streaming instead of Spark
Streaming. The main reason is more optimized execution of the code, thanks to
all the optimizations that Catalyst is able to make. But I really don't see
libraries that wrap the DataFrame API.

On Thu, Jul 9, 2020 at 11:36 PM Tim Clemons wrote:

> I'm putting together a big data system centered around using Spark
> Streaming for data ingest and Spark SQL for querying the stored data.  I've
> been investigating what options there are for implementing Spark
> applications using Clojure.  It's been close to a decade since either
> sparkling or flambo received any updates, and it doesn't look like either
> will accommodate recent distributions of Spark.  I've found powderkeg an
> interesting option, and I like how it supports remote REPLs and the use of
> transducers rather than wrapped Scala fns.  However, it looks like it's also
> seen a few years without commits and I've heard loose talk that the
> developers have moved on to other pursuits.
>
> Part of the problem seems to be Spark.  The project seems unapologetic
> about breaking interfaces and seems willing to sacrifice third-party code
> that tries to track Spark's development.
>
> So my options seem to be the following:
>
> 1. Deploy an older version of Spark that's compatible with one of the
> above-mentioned libraries.  While we don't need to be bleeding edge,
> deploying a three-year-old version just to accommodate my preferred
> language is hard to justify.
>
> 2. Fork one of those libraries, update it for more recent versions of Spark,
> and be prepared to maintain it internally for the lifespan of this
> project.  This may be vastly overestimating my personal heroics.
>
> 3. Code my own solution from scratch using Java/Scala interop, sketching
> out just enough of a Clojure wrapper to suit my ends.
>
> 4. Learn Scala.
>
> I realize that Spark isn't the only game in town (Onyx, for example).
> However, I'm working with a team of developers who are not familiar with
> Clojure (though I'm working to be an advocate). I chose Spark as an
> established solution that supports multiple languages and handles both
> streaming and batch processing.
>
> Any insights?  Any solutions I'm overlooking?
>
>
>


-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
