Re: Clojure interop with Spark

2020-07-10 Thread Dominic Parry
Another option is Apache Beam. We use it quite extensively. There are a few 
options for Clojure wrappers (we use datasplash), and Beam has libraries for a 
number of popular languages.


Kind Regards,
Dom Parry
On 10 Jul 2020, 08:22 +0200, Alex Ott wrote:
> From a Spark perspective, I would really advise using the DataFrame API as
> much as possible, including Spark Structured Streaming instead of Spark
> Streaming. The main reason is more optimized execution of the code, thanks
> to all the optimizations that Catalyst is able to make. But I really don't
> see any libraries that wrap the DataFrame API.
>
> > On Thu, Jul 9, 2020 at 11:36 PM Tim Clemons wrote:
> > > I'm putting together a big data system centered around using Spark 
> > > Streaming for data ingest and Spark SQL for querying the stored data.  
> > > I've been investigating what options there are for implementing Spark 
> > > applications using Clojure.  It's been close to a decade since sparkling 
> > > or flambo have received any updates and it doesn't look like either will 
> > > accommodate recent distributions of Spark.  I've found powderkeg an 
> > > interesting option, and I like how it supports remote REPLs and the use 
> > > of transducers rather than wrapped Scala fns.  However, it looks like it's 
> > > also seen a few years without commits and I've heard loose talk that the 
> > > developers have moved on to other pursuits.
> > >
> > > Part of the problem seems to be Spark.  The project seems unapologetic 
> > > about breaking interfaces and seems willing to sacrifice third-party code 
> > > that tries to track Spark's development.
> > >
> > > So my options seem to be the following:
> > >
> > > 1. Deploy an older version of Spark that's compatible with one of the 
> > > above-mentioned libraries.  While we don't need to be bleeding edge, 
> > > deploying a three-year-old version just to accommodate my preferred 
> > > language is hard to justify.
> > >
> > > 2. Create a merge to update one of those libraries to more recent 
> > > versions of Spark and be prepared to maintain it internally for the 
> > > lifespan of this project.  This may be vastly overestimating my personal 
> > > heroics.
> > >
> > > 3. Code my own solution from scratch using Java/Scala interop, sketching 
> > > out just enough of a Clojure wrapper to suit my ends.
> > >
> > > 4. Learn Scala.
> > >
> > > I realize that Spark isn't the only game in town (Onyx, for example).  
> > > However, I'm working with a team of developers who are not familiar with 
> > > Clojure (though I'm working to be an advocate). I chose Spark as an 
> > > established solution that supports multiple languages and handles both 
> > > streaming and batch processing.
> > >
> > > Any insights?  Any solutions I'm overlooking?
> > >
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Clojure" group.
> > > To post to this group, send email to clojure@googlegroups.com
> > > Note that posts from new members are moderated - please be patient with 
> > > your first post.
> > > To unsubscribe from this group, send email to
> > > clojure+unsubscr...@googlegroups.com
> > > For more options, visit this group at
> > > http://groups.google.com/group/clojure?hl=en
> > > ---
> > > You received this message because you are subscribed to the Google Groups 
> > > "Clojure" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an 
> > > email to clojure+unsubscr...@googlegroups.com.
> > > To view this discussion on the web visit 
> > > https://groups.google.com/d/msgid/clojure/259f5ff6-dd66-4688-aa80-439fed88ab39o%40googlegroups.com.
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)


Re: Clojure interop with Spark

2020-07-10 Thread Alex Ott
From a Spark perspective, I would really advise using the DataFrame API as
much as possible, including Spark Structured Streaming instead of Spark
Streaming. The main reason is more optimized execution of the code, thanks
to all the optimizations that Catalyst is able to make. But I really don't
see any libraries that wrap the DataFrame API.

On Thu, Jul 9, 2020 at 11:36 PM Tim Clemons wrote:

> I'm putting together a big data system centered around using Spark
> Streaming for data ingest and Spark SQL for querying the stored data.  I've
> been investigating what options there are for implementing Spark
> applications using Clojure.  It's been close to a decade since sparkling or
> flambo have received any updates and it doesn't look like either will
> accommodate recent distributions of Spark.  I've found powderkeg an
> interesting option, and I like how it supports remote REPLs and the use of
> transducers rather than wrapped Scala fns.  However, it looks like it's also
> seen a few years without commits and I've heard loose talk that the
> developers have moved on to other pursuits.
>
> Part of the problem seems to be Spark.  The project seems unapologetic
> about breaking interfaces and seems willing to sacrifice third-party code
> that tries to track Spark's development.
>
> So my options seem to be the following:
>
> 1. Deploy an older version of Spark that's compatible with one of the
> above-mentioned libraries.  While we don't need to be bleeding edge,
> deploying a three-year-old version just to accommodate my preferred
> language is hard to justify.
>
> 2. Create a merge to update one of those libraries to more recent versions
> of Spark and be prepared to maintain it internally for the lifespan of this
> project.  This may be vastly overestimating my personal heroics.
>
> 3. Code my own solution from scratch using Java/Scala interop, sketching
> out just enough of a Clojure wrapper to suit my ends.
>
> 4. Learn Scala.
>
> I realize that Spark isn't the only game in town (Onyx, for example).
> However, I'm working with a team of developers who are not familiar with
> Clojure (though I'm working to be an advocate). I chose Spark as an
> established solution that supports multiple languages and handles both
> streaming and batch processing.
>
> Any insights?  Any solutions I'm overlooking?
>
>
>
>


-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)



Re: Clojure interop with Spark

2020-07-09 Thread Jeff Stokes
Hey Tim,

We at Amperity have used Sparkling for our Clojure Spark interop in the
past. After a few years of fighting, we eventually ended up with sparkplug
(https://github.com/amperity/sparkplug), which we now use to run all of our
production Spark jobs. There is built-in support for proper function
serialization, including wrappers around the Java RDD APIs. We also have
some basic support for REPL interaction, but this is fairly limited. We
also run on a newer version of Spark (2.4.4), and haven't had issues with
the library when upgrading or changing Spark versions.
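
For readers unfamiliar with why function serialization matters here: Spark
ships the functions you pass to it from the driver to the executors, so
they have to survive a round trip through bytes. The following is a minimal
sketch of that idea using only the Java standard library. It is not taken
from sparkplug or Spark; the class, interface, and method names
(FnSerializationSketch, SerFn, toBytes, fromBytes, roundTripAndApply) are
hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical names throughout; this is not code from sparkplug or Spark.
class FnSerializationSketch {

    // A function type that is also Serializable, similar in spirit to the
    // functional interfaces in Spark's Java API, which extend Serializable.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Turn an object into bytes, roughly what happens before a function
    // is sent from the driver to an executor.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from bytes, roughly what happens on receipt.
    @SuppressWarnings("unchecked")
    static <T> T fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    // Round-trip a function that closes over `offset`, then apply the
    // restored copy to `input`.
    static int roundTripAndApply(int input, int offset) {
        SerFn<Integer, Integer> addOffset = x -> x + offset;
        SerFn<Integer, Integer> restored = fromBytes(toBytes(addOffset));
        return restored.apply(input);
    }

    public static void main(String[] args) {
        System.out.println(roundTripAndApply(40, 2)); // prints 42
    }
}
```

The captured value (offset) travels with the function. A wrapper library
has to make the same thing work for Clojure fns and whatever they close
over, which is presumably part of what "proper function serialization"
refers to above.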

Let me know if I can help if you're interested!

-Jeff


On Thursday, July 9, 2020 at 2:36:41 PM UTC-7, Tim Clemons wrote:
>
> I'm putting together a big data system centered around using Spark 
> Streaming for data ingest and Spark SQL for querying the stored data.  I've 
> been investigating what options there are for implementing Spark 
> applications using Clojure.  It's been close to a decade since sparkling or 
> flambo have received any updates and it doesn't look like either will 
> accommodate recent distributions of Spark.  I've found powderkeg an 
> interesting option, and I like how it supports remote REPLs and the use of 
> transducers rather than wrapped Scala fns.  However, it looks like it's also 
> seen a few years without commits and I've heard loose talk that the 
> developers have moved on to other pursuits.
>
> Part of the problem seems to be Spark.  The project seems unapologetic 
> about breaking interfaces and seems willing to sacrifice third-party code 
> that tries to track Spark's development.
>
> So my options seem to be the following:
>
> 1. Deploy an older version of Spark that's compatible with one of the 
> above-mentioned libraries.  While we don't need to be bleeding edge, 
> deploying a three-year-old version just to accommodate my preferred 
> language is hard to justify.
>
> 2. Create a merge to update one of those libraries to more recent versions 
> of Spark and be prepared to maintain it internally for the lifespan of this 
> project.  This may be vastly overestimating my personal heroics.
>
> 3. Code my own solution from scratch using Java/Scala interop, sketching 
> out just enough of a Clojure wrapper to suit my ends.
>
> 4. Learn Scala.
>
> I realize that Spark isn't the only game in town (Onyx, for example).  
> However, I'm working with a team of developers who are not familiar with 
> Clojure (though I'm working to be an advocate). I chose Spark as an 
> established solution that supports multiple languages and handles both 
> streaming and batch processing.
>
> Any insights?  Any solutions I'm overlooking?
>
>
>
>



Clojure interop with Spark

2020-07-09 Thread Tim Clemons
I'm putting together a big data system centered around using Spark 
Streaming for data ingest and Spark SQL for querying the stored data.  I've 
been investigating what options there are for implementing Spark 
applications using Clojure.  It's been close to a decade since sparkling or 
flambo have received any updates and it doesn't look like either will 
accommodate recent distributions of Spark.  I've found powderkeg an 
interesting option, and I like how it supports remote REPLs and the use of 
transducers rather than wrapped Scala fns.  However, it looks like it's also 
seen a few years without commits and I've heard loose talk that the 
developers have moved on to other pursuits.

Part of the problem seems to be Spark.  The project seems unapologetic about 
breaking interfaces and seems willing to sacrifice third-party code that 
tries to track Spark's development.

So my options seem to be the following:

1. Deploy an older version of Spark that's compatible with one of the
above-mentioned libraries.  While we don't need to be bleeding edge, deploying a
three-year-old version just to accommodate my preferred language is hard to
justify.

2. Create a merge to update one of those libraries to more recent versions 
of Spark and be prepared to maintain it internally for the lifespan of this 
project.  This may be vastly overestimating my personal heroics.

3. Code my own solution from scratch using Java/Scala interop, sketching 
out just enough of a Clojure wrapper to suit my ends.

4. Learn Scala.

I realize that Spark isn't the only game in town (Onyx, for example).  
However, I'm working with a team of developers who are not familiar with 
Clojure (though I'm working to be an advocate). I chose Spark as an 
established solution that supports multiple languages and handles both 
streaming and batch processing.

Any insights?  Any solutions I'm overlooking?


