Re: Clojure is a good choice for Big Data? Which clojure/Hadoop work to use?

2019-07-07 Thread orazio
Many thanks for your clarifications.
I don't have a team of engineers. It is just me, which I think, in all 
modesty, is not nothing.
I'm not familiar with Clojure; I know the Java programming language.
The lambda architecture pipeline I want to build will not be made 
entirely with Clojure. As described above, I will use existing tools that I 
don't need to develop (NiFi, Kafka, MongoDB, Hadoop, HBase).
Let's focus only on the batch layer of the lambda architecture.
My concern is that I have not found a tool that the Big Data community 
recognizes as the best for distributed data processing (MapReduce) of 
historical data on HDFS.
The MapReduce algorithms I have to implement are word count over social 
media messages (Twitter, Facebook, Telegram) and IoT data analysis and 
aggregation (such as average values every 30 minutes, every hour, and every 
day).
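
To make the two batch jobs above concrete, here is a minimal sketch in plain Java of their map and reduce steps: a word count over messages and an average per fixed time window. It runs in a single JVM and does not use Hadoop, Cascalog, or Spark. The class name BatchSketch, the Reading type, and both method names are hypothetical and chosen only for illustration; a real job would read from HDFS and write to HBase through whichever framework is chosen.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical single-JVM illustration; none of these names come from a framework.
public class BatchSketch {

    // One IoT sample: when it was taken and the measured value (assumed shape).
    public static final class Reading {
        final Instant timestamp;
        final double value;

        public Reading(Instant timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Splits on any run of characters that are neither letters nor digits.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{Nd}]+");

    // "Map": break each message into lower-case words. "Reduce": count per word.
    public static Map<String, Long> wordCount(List<String> messages) {
        return messages.stream()
                .flatMap(m -> NON_WORD.splitAsStream(m.toLowerCase()))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // "Map": key each reading by the start of its window. "Reduce": average per key.
    public static Map<Instant, Double> averagePerWindow(List<Reading> readings,
                                                        Duration window) {
        long size = window.toMillis();
        return readings.stream()
                .collect(Collectors.groupingBy(
                        r -> Instant.ofEpochMilli(
                                Math.floorDiv(r.timestamp.toEpochMilli(), size) * size),
                        TreeMap::new,
                        Collectors.averagingDouble(r -> r.value)));
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, plans=1}
        System.out.println(wordCount(Arrays.asList("Big data, big plans", "data")));

        List<Reading> readings = Arrays.asList(
                new Reading(Instant.parse("2019-07-07T10:05:00Z"), 20.0),
                new Reading(Instant.parse("2019-07-07T10:25:00Z"), 22.0),
                new Reading(Instant.parse("2019-07-07T10:40:00Z"), 30.0));
        // prints {2019-07-07T10:00:00Z=21.0, 2019-07-07T10:30:00Z=30.0}
        System.out.println(averagePerWindow(readings, Duration.ofMinutes(30)));
    }
}
```

Passing Duration.ofHours(1) or Duration.ofDays(1) instead gives the hourly and daily averages; windows are aligned to the UTC epoch, which is an assumption of this sketch.
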
In his Big Data book, Principles and best practices of scalable realtime 
data systems, Nathan Marz suggests Clojure/Cascalog for distributed data 
processing on HDFS/Hadoop.
I'm asking whether Clojure/Cascalog would be a good choice for dataset 
processing (MapReduce) and for storing the resulting aggregations in 
HBase, or whether you would suggest another framework.
Otherwise, if you know an existing, well documented, well googleable 
Java framework for distributed data processing that can store the 
resulting aggregations in HBase, I would appreciate your advice 
about it.

Thanks again.
Orazio

On Friday, 5 July 2019 at 19:43:16 UTC+2, ri...@chartbeat.com wrote:
>
> As much as I would love to convert a new data engineer to the ways of 
> clojure, in my opinion, choosing a language to solve a problem is rarely a 
> wise move. Do you have a team of engineers ready and willing to learn 
> clojure or are you doing this yourself? We do a lot of work with all of the 
> tools you mention (in clojure) but we built a lot of the frameworks 
> ourselves or wrote wrappers around java tools. Not for the newbie... if 
> your goal is to build this pipeline for your boss and you have any sort of 
> deadline do yourself a favor and pick an existing, well documented, well 
> googleable framework in a language that your team is familiar with. There 
> are a ton of hurdles with everything you mentioned without even getting to 
> clojure. You’re jumping in the deep end of the pool with no life jacket and 
> you don’t know how to swim.
>
> That said, if you ignore my advice you will learn a lot and we will be 
> here to help, just be warned 
>
> On Jul 4, 2019, at 2:09 PM, Thad Guidry wrote:
>
> Christian writes really good tools.  Sparkling is no exception.
> I have yet to use it in production myself however, since I haven't had the 
> need to use Clojure directly to solve any "data aggregation" problems.  
> Spark and other tools do that well enough, naturally.
>
> As far as using a tool/programming language to solve "data integration" 
> problems in large enterprise environments, I will ALWAYS use Open Source 
> tools for that purpose.  Clojure is no exception.  But I do tend to choose 
> open source hammers to drive nails.  Sometimes Clojure is missing the 
> handle on its hammer, as we have all experienced, but that's on us since WE 
> have the power to make Clojure better.  But often TIME is what we lack to 
> build better APIs, libraries, tools for Clojure expansion.
>
> The Apache ecosystem offers many tools & libraries for "big data" and 
> "data integration"  which I often turn to first because I lack TIME for 
> building (long tail), but have enough TIME for learning new things (shorter 
> tail that helps the long tail).
> https://projects.apache.org/projects.html?category 
>
> Thad
> https://www.linkedin.com/in/thadguidry/
>
>
> On Thu, Jul 4, 2019 at 12:37 PM Chris Nuernberger wrote:
>
>> Thad,
>>
>> Your approach seems very promising to me for a lot of jobs.  Spark runs on 
>> top of many things.
>>
>> As far as a clojure layer on top, what do you think about sparkling?
>>
>> On Thu, Jul 4, 2019 at 8:43 AM Thad Guidry wrote:
>>
>>> "Batch" - doing things in chunks
>>> "Processing" - THE WORLD :-)  because it means so many different things 
>>> to so many folks (including your boss)
>>>
>>> Without a doubt, you will love Apache Spark for your batch processing 
>>> and writing Spark Programs to conquer any World you are building.
>>> Spend time to install Spark standalone deploy and then use its powerful 
>>> Spark Shell (the feeling of Clojure REPL!!)
>>> If you just want to jump in to a public cluster and Try Spark, then I 
>>> would suggest Databricks.
>>> Spend time reading the features under Libraries drop-down menu on Apache 
>>> Spark website.
>>>
>>> You might even be encouraged enough to write an official API in Clojure 
>>> for Apache Spark 

Re: Next Scicloj meeting: using Python from Clojure using libpython-clj

2019-07-07 Thread Daniel Slutsky
Updated Agenda for the meeting next week:
• Chris Nuernberger about Libpython-clj: using Python from Clojure
• Alan Marazzi about Panthera: using Pandas in Clojure
• Discussion of community challenges

Details and registration:
https://twitter.com/scicloj/status/1147921086832619520

On Tue, 25 Jun 2019 at 19:29, Daniel Slutsky 
wrote:

> Our next online gathering will take place on July 18th, 5pm UTC.
>
> Agenda:
> • Chris Nuernberger about using python from Clojure using libpython-clj
> (https://github.com/cnuernber/libpython-clj)
> • Discussion of community challenges
>
> More details will follow.
>
>  https://twitter.com/scicloj/status/1143555279163969537
>
>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en