Hi,

I mean using HTTPS transport instead of the binary (Thrift?) transport.

2016-10-17 19:10 GMT+02:00 Igor Sapego <isap...@gridgain.com>:
> Hi Vincent,
>
> Can you please explain what you mean by HTTP(S) support for the ODBC?
>
> I'm not quite sure I get it.
>
> Best Regards,
> Igor
>
> On Thu, Oct 6, 2016 at 9:59 AM, vincent gromakowski
> <vincent.gromakow...@gmail.com> wrote:
>
>> Thanks
>>
>> Starting the thriftserver with IgniteRDD tables doesn't seem very hard.
>> Implementing a security layer over the Ignite cache may be harder, as I
>> need to:
>> - get the username from the thriftserver
>> - intercept each request and check permissions
>> Maybe Spark will also be able to handle permissions...
>>
>> I will keep you informed
>>
>> On 6 Oct 2016 at 00:12, "Denis Magda" <dma...@gridgain.com> wrote:
>>
>>> Vincent,
>>>
>>> Please see below
>>>
>>> On Oct 5, 2016, at 4:31 AM, vincent gromakowski
>>> <vincent.gromakow...@gmail.com> wrote:
>>>
>>> Hi
>>> thanks for your explanations. Please find more questions inline
>>>
>>> Vincent
>>>
>>> 2016-10-05 3:33 GMT+02:00 Denis Magda <dma...@gridgain.com>:
>>>
>>>> Hi Vincent,
>>>>
>>>> See my answers inline
>>>>
>>>> On Oct 4, 2016, at 12:54 AM, vincent gromakowski
>>>> <vincent.gromakow...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I know that Ignite has SQL support but:
>>>> - The ODBC driver doesn't seem to provide HTTP(S) support, which is
>>>> easier to integrate on corporate networks with rules, firewalls, proxies
>>>>
>>>>
>>>> *Igor Sapego*, what URIs are supported presently?
>>>>
>>>> - The SQL engine doesn't seem to scale like Spark SQL would. For
>>>> instance, Spark won't generate an OOM if the dataset (source or result)
>>>> doesn't fit in memory. From the Ignite side, it's not clear…
>>>>
>>>>
>>>> OOM is not related to scalability at all. It is a matter of the
>>>> application’s logic.
>>>>
>>>> The Ignite SQL engine scales out along with your cluster. Moreover,
>>>> Ignite supports indexes, which allow you to get O(log N) running time
>>>> complexity for your SQL queries, while in the case of Spark you will
>>>> face full scans (O(N)) all the time.
>>>>
>>>> However, to benefit from Ignite SQL queries you have to put all the
>>>> data in memory. Ignite doesn’t go to a CacheStore (Cassandra, relational
>>>> database, MongoDB, etc.) while a SQL query is executed and won’t preload
>>>> anything from an underlying CacheStore. Automatic preloading works for
>>>> key-value queries like cache.get(key).
>>>>
>>>
>>>
>>> This is an issue because I will potentially have to query terabytes of
>>> data. If I use the Spark thriftserver backed by IgniteRDD, does it solve
>>> this point, and can I get automatic preloading from C*?
>>>
>>>
>>> IgniteRDD will load missing (key-value) tuples from Cassandra because
>>> essentially IgniteRDD is an IgniteCache and Cassandra is a CacheStore.
>>> The only thing that is left to check is whether the Spark thriftserver
>>> can work with IgniteRDDs. Hope you will be able to figure this out and
>>> share your feedback with us.
>>>
>>>
>>>
>>>> - Spark thrift can manage multi-tenancy: different users can connect to
>>>> the same SQL engine and share the cache. In Ignite it's one cache per
>>>> user, so a big waste of RAM.
>>>>
>>>>
>>>> Everyone can connect to an Ignite cluster and work with the same set of
>>>> distributed caches. I’m not sure why you need to create caches with the
>>>> same content for every user.
>>>>
>>>
>>> It's a security issue: the Ignite cache doesn't provide multiple user
>>> accounts per cache. I am thinking of using Spark to authenticate multiple
>>> users and then having Spark use a shared account on the Ignite cache
>>>
>>>
>>> Basically, Ignite provides basic security interfaces and some
>>> implementations which you can rely on when building your secure solution.
>>> This article can be useful for your case
>>> http://smartkey.co.uk/development/securing-an-apache-ignite-cluster/
>>>
>>> --
>>> Denis
>>>
>>>
>>>> If you need real multi-tenancy support, where cacheA is allowed to be
>>>> accessed by users from group A only and cacheB by users from group B,
>>>> then you can take a look at GridGain, which is built on top of Ignite
>>>> https://gridgain.readme.io/docs/multi-tenancy
>>>>
>>>>
>>>>
>>> OK, but I am evaluating open-source-only solutions (Kylin, Druid,
>>> Alluxio...); it's a constraint from my management
>>>
>>>>
>>>> What I want to achieve is:
>>>> - use Cassandra as the data store, as it provides idempotence (HDFS/Hive
>>>> doesn't), resulting in exactly-once semantics without any duplicates.
>>>> - use the Spark SQL thriftserver in multi-tenancy for large-scale ad hoc
>>>> analytics queries (> TB) from an ODBC driver through HTTP(S)
>>>> - accelerate Cassandra reads when the data modeling of the Cassandra
>>>> table doesn't fit the queries. Queries would be OLAP style: target
>>>> multiple C* partitions, group by or filter on lots of dimensions that
>>>> aren't necessarily in the C* table key.
>>>>
>>>>
>>>> As mentioned, Ignite uses Cassandra as a CacheStore. You should keep
>>>> this in mind. Before trying to assemble the whole chain, I would
>>>> recommend that you try connecting the Spark SQL thrift server directly
>>>> to Ignite and work with its shared RDDs [1]. A shared RDD (basically an
>>>> Ignite cache) can be backed by Cassandra. This chain will probably work
>>>> for you, but I can’t give more precise guidance on this.
>>>>
>>>>
>>> I will try to make it work and give you feedback
>>>
>>>
>>>
>>>> [1] https://apacheignite-fs.readme.io/docs/ignite-for-spark
>>>>
>>>> --
>>>> Denis
>>>>
>>>> Thanks for your advice
>>>>
>>>>
>>>> 2016-10-04 6:51 GMT+02:00 Jörn Franke <jornfra...@gmail.com>:
>>>>
>>>>> I am not sure that this will be performant. What do you want to
>>>>> achieve here? Fast lookups?
>>>>> Then the Cassandra Ignite store might be the right solution. If you
>>>>> want to do more analytic-style queries, then you can put the data on
>>>>> HDFS/Hive and use the Ignite HDFS cache to cache certain
>>>>> partitions/tables in Hive in memory. If you want to go to iterative
>>>>> machine learning algorithms, you can go for Spark on top of this. You
>>>>> can then also use the Ignite cache for Spark RDDs.
>>>>>
>>>>> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <akuznet...@gridgain.com>
>>>>> wrote:
>>>>>
>>>>> Hi, Vincent!
>>>>>
>>>>> Ignite also has SQL support (also scalable). I think it will be much
>>>>> faster to query directly from Ignite than to query from Spark.
>>>>> Also, please mind that before executing queries you should load all
>>>>> needed data into the cache.
>>>>> To load data from Cassandra to Ignite you may use the Cassandra
>>>>> store [1].
>>>>>
>>>>> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>>>>>
>>>>> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski
>>>>> <vincent.gromakowsk...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am evaluating the possibility of using Spark SQL (and its
>>>>>> scalability) over an Ignite cache with a Cassandra persistent store
>>>>>> to speed up read workloads like OLAP-style analytics.
>>>>>> Is there any way to configure the Spark thriftserver to load an
>>>>>> external table in Ignite like we can do in Cassandra?
>>>>>> Here is an example of a config for Spark backed by Cassandra
>>>>>>
>>>>>> CREATE EXTERNAL TABLE MyHiveTable
>>>>>>   ( id int, data string )
>>>>>> STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>>>>> TBLPROPERTIES ("cassandra.host" = "x.x.x.x",
>>>>>>   "cassandra.ks.name" = "test",
>>>>>>   "cassandra.cf.name" = "mytable",
>>>>>>   "cassandra.ks.repfactor" = "1",
>>>>>>   "cassandra.ks.strategy" =
>>>>>>   "org.apache.cassandra.locator.SimpleStrategy" );
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Alexey Kuznetsov
>>>>>
>>>>>
>>>
>
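
For readers following the thread, the distinction Denis draws (key-value reads go through to the CacheStore, while SQL queries only see what is already in memory) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ReadThroughCache`, `get`, `query` and `load_all` are hypothetical names made up for this illustration and are not Ignite's API, and a plain dict stands in for Cassandra.

```python
from typing import Callable, Dict, List, Optional


class ReadThroughCache:
    """Toy model of a cache backed by a store (NOT Ignite's API).

    get()   : read-through, loads a missing key from the backing store.
    query() : scans only entries already in memory, never touches the store.
    """

    def __init__(self, store: Dict[int, dict]) -> None:
        self._store = store                  # stands in for Cassandra (assumption)
        self._memory: Dict[int, dict] = {}   # what the "cluster" holds in RAM

    def get(self, key: int) -> Optional[dict]:
        # On a miss, fetch the row from the backing store and keep it in memory.
        if key not in self._memory and key in self._store:
            self._memory[key] = self._store[key]
        return self._memory.get(key)

    def query(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Mirrors the thread's description of SQL queries: in-memory rows only.
        return [row for row in self._memory.values() if predicate(row)]

    def load_all(self) -> None:
        # Explicit preload, required before a query can see every row.
        self._memory.update(self._store)


store = {
    1: {"id": 1, "data": "a"},
    2: {"id": 2, "data": "b"},
    3: {"id": 3, "data": "a"},
}
cache = ReadThroughCache(store)

before = cache.query(lambda r: r["data"] == "a")   # nothing is in memory yet
cache.get(1)                                       # read-through loads key 1 only
partial = cache.query(lambda r: r["data"] == "a")  # sees just the row for key 1
cache.load_all()
full = cache.query(lambda r: r["data"] == "a")     # sees every matching row
```

In this model a query returns complete results only after `load_all()`, which is the practical consequence discussed above: a dataset that does not fit in memory cannot be fully queried this way, whatever the backing store holds.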