Vincent,

Please see below.

> On Oct 5, 2016, at 4:31 AM, vincent gromakowski
> <[email protected]> wrote:
>
> Hi,
> thanks for your explanations. Please find more questions inline.
>
> Vincent
>
> 2016-10-05 3:33 GMT+02:00 Denis Magda <[email protected]>:
> Hi Vincent,
>
> See my answers inline
>
>> On Oct 4, 2016, at 12:54 AM, vincent gromakowski
>> <[email protected]> wrote:
>>
>> Hi,
>> I know that Ignite has SQL support but:
>> - The ODBC driver doesn't seem to provide HTTP(S) support, which is easier to
>> integrate on corporate networks with rules, firewalls, proxies
>
> Igor Sapego, what URIs are supported presently?
>
>> - The SQL engine doesn't seem to scale like Spark SQL would. For instance,
>> Spark won't generate an OOM if the dataset (source or result) doesn't fit in
>> memory. On the Ignite side, it's not clear…
>
> OOM is not related to the scalability topic at all. It is about the
> application’s logic.
>
> The Ignite SQL engine scales out along with your cluster. Moreover,
> Ignite supports indexes, which allow you to get O(log N) running time
> complexity for your SQL queries, while in the case of Spark you will face
> full scans (O(N)) all the time.
>
> However, to benefit from Ignite SQL queries you have to put all the data
> in memory. Ignite doesn’t go to a CacheStore (Cassandra, relational database,
> MongoDB, etc.) while a SQL query is executed and won’t preload anything from
> an underlying CacheStore. Automatic preloading works for key-value queries
> like cache.get(key).
>
>
> This is an issue because I will potentially have to query TBs of data. If I
> use Spark thriftserver backed by IgniteRDD, does it solve this point and can
> I get automatic preloading from C*?

IgniteRDD will load missing (key-value) tuples from Cassandra because essentially IgniteRDD is an IgniteCache and Cassandra is a CacheStore. The only thing that is left to check is whether Spark thriftserver can work with IgniteRDDs.
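
To make the difference between key-value reads and SQL-style queries concrete, here is a toy Python sketch of read-through behaviour. It is not the Ignite API: `ReadThroughCache`, `get` and `query` are hypothetical names chosen for illustration, and a plain dict stands in for Cassandra as the backing store.

```python
class ReadThroughCache:
    """Toy in-memory cache with a backing store (hypothetical, not the Ignite API)."""

    def __init__(self, store):
        self._store = store    # stands in for a CacheStore such as Cassandra
        self._memory = {}      # entries currently held in memory

    def get(self, key):
        # Key-value read: on a miss, load the entry from the backing store.
        if key not in self._memory and key in self._store:
            self._memory[key] = self._store[key]
        return self._memory.get(key)

    def query(self, predicate):
        # Query-style read: scans in-memory entries only, never the store.
        return sorted(k for k, v in self._memory.items() if predicate(v))


store = {1: "a", 2: "b", 3: "a"}
cache = ReadThroughCache(store)

before = cache.query(lambda v: v == "a")  # nothing has been loaded yet
cache.get(1)                              # read-through pulls key 1 into memory
after = cache.query(lambda v: v == "a")   # key 3 is still only in the store
print(before, after)
```

In this sketch the query only sees key 3 after something has loaded it, which mirrors the point above: data has to be in memory before a query can find it.
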
Hope you will be able to figure this out and share your feedback with us.

>
>> - Spark thrift can manage multi-tenancy: different users can connect to the
>> same SQL engine and share a cache. In Ignite it's one cache per user, so a big
>> waste of RAM.
>
> Everyone can connect to an Ignite cluster and work with the same set of
> distributed caches. I’m not sure why you need to create caches with the same
> content for every user.
>
> It's a security issue: an Ignite cache doesn't provide multiple user accounts per
> cache. I am thinking of using Spark to authenticate multiple users, and then
> Spark would use a shared account on the Ignite cache.
>

Basically, Ignite provides basic security interfaces and some implementations which you can rely on when building your secure solution. This article can be useful for your case:
http://smartkey.co.uk/development/securing-an-apache-ignite-cluster/

--
Denis

>
> If you need real multi-tenancy support, where cacheA is allowed to be
> accessed by a group of users A only and cacheB by users from group B, then you
> can take a look at GridGain, which is built on top of Ignite:
> https://gridgain.readme.io/docs/multi-tenancy
>
>
>
> OK, but I am evaluating open-source-only solutions (Kylin, Druid, Alluxio...);
> it's a constraint from my management.
>>
>> What I want to achieve is:
>> - use Cassandra for the data store as it provides idempotence (HDFS/Hive
>> doesn't), resulting in exactly-once semantics without any duplicates.
>> - use Spark SQL thriftserver in multi-tenancy for large-scale ad hoc
>> analytics queries (> TB) from an ODBC driver through HTTP(S)
>> - accelerate Cassandra reads when the data modeling of the Cassandra table
>> doesn't fit the queries. Queries would be OLAP style: target multiple C*
>> partitions, group-by or filters on lots of dimensions that aren't necessarily
>> in the C* table key.
>>
>
> As was mentioned, Ignite uses Cassandra as a CacheStore. You should keep
> this in mind. Before trying to assemble the whole chain, I would recommend
> trying to connect the Spark SQL thrift server directly to Ignite and work with
> its shared RDDs [1]. A shared RDD (basically an Ignite cache) can be backed by
> Cassandra. Probably this chain will work for you, but I can’t give more
> precise guidance on this.
>
>
> I will try to make it work and give you feedback.
>
>
> [1] https://apacheignite-fs.readme.io/docs/ignite-for-spark
>
> --
> Denis
>
>> Thanks for your advice
>>
>>
>> 2016-10-04 6:51 GMT+02:00 Jörn Franke <[email protected]>:
>> I am not sure that this will be performant. What do you want to achieve
>> here? Fast lookups? Then the Cassandra Ignite store might be the right
>> solution. If you want to do more analytic-style queries, then you can put
>> the data on HDFS/Hive and use the Ignite HDFS cache to cache certain
>> partitions/tables in Hive in memory. If you want to go to iterative machine
>> learning algorithms, you can go for Spark on top of this. You can then
>> also use the Ignite cache for Spark RDDs.
>>
>> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <[email protected]> wrote:
>>
>>> Hi, Vincent!
>>>
>>> Ignite also has SQL support (also scalable). I think it will be much faster
>>> to query directly from Ignite than to query from Spark.
>>> Also, please keep in mind that before executing queries you should load all
>>> needed data into the cache.
>>> To load data from Cassandra to Ignite you may use the Cassandra store [1].
>>>
>>> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>>>
>>> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski
>>> <[email protected]> wrote:
>>> Hi,
>>> I am evaluating the possibility of using Spark SQL (and its scalability) over
>>> an Ignite cache with a Cassandra persistent store to accelerate read workloads
>>> such as OLAP-style analytics.
>>> Is there any way to configure Spark thriftserver to load an external table
>>> in Ignite like we can do in Cassandra?
>>> Here is an example of config for Spark backed by Cassandra:
>>>
>>> CREATE EXTERNAL TABLE MyHiveTable
>>> ( id int, data string )
>>> STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>> TBLPROPERTIES ("cassandra.host" = "x.x.x.x",
>>> "cassandra.ks.name" = "test",
>>> "cassandra.cf.name" = "mytable",
>>> "cassandra.ks.repfactor" = "1",
>>> "cassandra.ks.strategy" = "org.apache.cassandra.locator.SimpleStrategy" );
>>>
>>> --
>>> Alexey Kuznetsov
