Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Sean Owen
RDDs are still relevant in a few ways - there is no Dataset in Python for
example, so RDD is still the 'typed' API. They still underpin DataFrames.
And of course it's still there because there's probably still a lot of code
out there that uses it. Occasionally it's still useful to drop into that
API for certain operations.

If that's a connector to read data from HBase, you would ideally want it
to return DataFrames.
Unless you're relying on very specific APIs from very specific versions, I
wouldn't think a distro's Spark or HBase is much different?



Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Jacek Laskowski
Hi Marco,

IMHO the RDD API is only for very sophisticated use cases that very few
Spark devs would be capable of handling. I consider the RDD API a sort of
Spark assembler, and most Spark devs should stick to the Dataset API.

Speaking of HBase, see
https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/bigtable/spark
where you can find a demo that I worked on last year and made sure that:

"Apache HBase™ Spark Connector implements the DataSource API for Apache
HBase and allows executing relational queries on data stored in Cloud
Bigtable."

That makes hbase-rdd even more obsolete, but not necessarily unusable (I am
not skilled enough in the HBase space to comment on this).

I think you should consider merging your hbase-rdd project with the
official Apache HBase™ Spark Connector at
https://github.com/apache/hbase-connectors/tree/master/spark (as it seems
to lack active development IMHO).

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
"The Internals Of" Online Books 
Follow me on https://twitter.com/jaceklaskowski






Spark RDD + HBase: adoption trend

2021-01-20 Thread Marco Firrincieli
Hi, my name is Marco and I'm one of the developers behind 
https://github.com/unicredit/hbase-rdd 
a project we are currently reviewing for various reasons.

We were basically wondering if RDD "is still a thing" nowadays (we see lots of
usage of DataFrames or Datasets), and we're not sure how much of the community
still works with RDDs.

Also, for lack of time, we have mostly worked with Cloudera-flavored
Hadoop/HBase & Spark versions. We were thinking the community would then help
us organize the project in a more "generic" way, but that didn't happen.

So I figured I would ask here what the gut feeling of the Spark community is,
so as to better define the future of our little library.

Thanks

-Marco

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org