Re: This MapR-DB Spark Connector with Secondary Indexes

2019-05-04 Thread Mich Talebzadeh
I am at a loss as to why one needs Spark to load one row from the DB, as
in the example below:

val data = sparkSession
  .loadFromMapRDB("/user/mapr/tables/data", schema)
  .filter("uid = '101'")
  .select("_id")

Assuming that _id is the primary key, we are going to load just one row.
Spark, as a distributed processing engine, is designed to work on large
data sets that require a cluster.

My second point: what happens if you have a composite index, or rather,
can one create a composite index in this DB?

My third point is that you are expected to know your search pattern
already, so that you can create indexes beforehand as needed. Again, this
negates the use of Spark.

There are tools on the market that create cubes and indexes dynamically,
like Jethro <https://jethro.io/>. That kind of tool would be more
appropriate.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 27 Apr 2019 at 17:33, Mich Talebzadeh wrote:

> First, as I understand it, MapR-DB is a proprietary (not open source)
> NoSQL database that MapR offers. It is similar to HBase but with better
> performance. There are some speculative statements, such as the one below:
>
>
> https://hackernoon.com/mapr-db-spark-connector-with-secondary-indexes-df41909f28ea
>
>
> "MapR Data Platform offers significant advantages over any other tool on
> the big data space. MapR-DB is one of the core components of the platform
> and it offers state of the art capabilities that blow away most of the
> NoSQL databases out there"
>
> OK, Spark has connectors for HBase, Aerospike, MongoDB, etc., so no
> surprise there. However, as I understand it, within MapR-DB one can create
> secondary indexes, and Spark can take advantage of them when filtering to
> reduce the data loaded into an RDD.
>
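> // Spark SQL types needed to declare the schema
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>
> val schema = StructType(Seq(
>   StructField("_id", StringType),
>   StructField("uid", StringType)))
>
> // loadFromMapRDB is added by the connector's implicits (import not shown in the post)
> val data = sparkSession
>   .loadFromMapRDB("/user/mapr/tables/data", schema)
>   .filter("uid = '101'")
>   .select("_id")
>
> So apparently this load will be more efficient as long as a secondary
> index has been created in MapR-DB on the filtering column.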
>
> Also see this doc
>
> https://mapr.com/docs/51/MapROverview/c_maprdb_new.html
>
> It sounds like MapR-DB tries to be a third-party version of HBase and in
> some ways mimics HDFS as well. I just don't understand why one would
> choose it when one can use Apache Phoenix with secondary indexes on HBase,
> which provides a relational view of HBase.
>
> Has anyone used this product?
>
> There is some reference here as well
>
>
> https://stackoverflow.com/questions/30254134/difference-between-mapr-db-and-hbase
>
> Thanks


This MapR-DB Spark Connector with Secondary Indexes

2019-04-27 Thread Mich Talebzadeh
First, as I understand it, MapR-DB is a proprietary (not open source)
NoSQL database that MapR offers. It is similar to HBase but with better
performance. There are some speculative statements, such as the one below:

https://hackernoon.com/mapr-db-spark-connector-with-secondary-indexes-df41909f28ea


"MapR Data Platform offers significant advantages over any other tool on
the big data space. MapR-DB is one of the core components of the platform
and it offers state of the art capabilities that blow away most of the
NoSQL databases out there"

OK, Spark has connectors for HBase, Aerospike, MongoDB, etc., so no
surprise there. However, as I understand it, within MapR-DB one can create
secondary indexes, and Spark can take advantage of them when filtering to
reduce the data loaded into an RDD.

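// Spark SQL types needed to declare the schema
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("_id", StringType),
  StructField("uid", StringType)))

// loadFromMapRDB is added by the connector's implicits (import not shown in the post)
val data = sparkSession
  .loadFromMapRDB("/user/mapr/tables/data", schema)
  .filter("uid = '101'")
  .select("_id")

So apparently this load will be more efficient as long as a secondary
index has been created in MapR-DB on the filtering column.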
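
To make the claimed benefit concrete, here is a rough, self-contained
sketch in plain Scala. It does not use Spark or MapR-DB at all: the table,
the index and every name in it (SecondaryIndexSketch, Row, fullScan,
indexLookup) are hypothetical stand-ins. It only illustrates the general
idea that a filter answered through a secondary index touches fewer rows
than a filter applied after reading the whole table. How the real
connector pushes the filter down is not something this sketch claims to
reproduce.

```scala
// Illustrative only: an in-memory stand-in for a table with a secondary index.
// None of these names come from Spark or MapR-DB.
object SecondaryIndexSketch {
  final case class Row(id: String, uid: String)

  // The "table": a handful of rows, each with a primary key (id) and a uid.
  val table: Vector[Row] = Vector(
    Row("a1", "100"),
    Row("a2", "101"),
    Row("a3", "102"),
    Row("a4", "101")
  )

  // The "secondary index": uid -> primary keys of matching rows, built ahead of time.
  val uidIndex: Map[String, Vector[String]] =
    table.groupBy(_.uid).map { case (uid, rows) => uid -> rows.map(_.id) }

  // Without an index: every row is read, then filtered.
  // Returns (matching ids, number of rows touched).
  def fullScan(uid: String): (Vector[String], Int) = {
    val ids = table.filter(_.uid == uid).map(_.id)
    (ids, table.size)
  }

  // With the index: only the matching entries are touched.
  // Returns (matching ids, number of rows touched).
  def indexLookup(uid: String): (Vector[String], Int) = {
    val ids = uidIndex.getOrElse(uid, Vector.empty[String])
    (ids, ids.size)
  }
}
```

On this toy data, fullScan("101") and indexLookup("101") return the same
two ids, but the scan touches all 4 rows while the index lookup touches 2.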

Also see this doc

https://mapr.com/docs/51/MapROverview/c_maprdb_new.html

It sounds like MapR-DB tries to be a third-party version of HBase and in
some ways mimics HDFS as well. I just don't understand why one would
choose it when one can use Apache Phoenix with secondary indexes on HBase,
which provides a relational view of HBase.

Has anyone used this product?

There is some reference here as well

https://stackoverflow.com/questions/30254134/difference-between-mapr-db-and-hbase

Thanks


Dr Mich Talebzadeh


