Vincent,

That's right, our ODBC driver does not currently support HTTP(S) as a transport.

Best Regards,
Igor

On Mon, Oct 17, 2016 at 9:40 PM, vincent gromakowski <[email protected]> wrote:

> Hi
> I mean using HTTPS transport instead of binary (thrift?) transport.
>
> 2016-10-17 19:10 GMT+02:00 Igor Sapego <[email protected]>:
>
>> Hi Vincent,
>>
>> Can you please explain what you mean by HTTP(S) support for the ODBC?
>>
>> I'm not quite sure I get it.
>>
>> Best Regards,
>> Igor
>>
>> On Thu, Oct 6, 2016 at 9:59 AM, vincent gromakowski <[email protected]> wrote:
>>
>>> Thanks
>>>
>>> Starting the thriftserver with IgniteRDD tables doesn't seem very hard.
>>> Implementing a security layer over the Ignite cache may be harder, as I need to:
>>> - get the username from the thriftserver
>>> - intercept each request and check permissions
>>> Maybe Spark will also be able to handle permissions...
>>>
>>> I will keep you informed
>>>
>>> On 6 Oct 2016 at 00:12, "Denis Magda" <[email protected]> wrote:
>>>
>>>> Vincent,
>>>>
>>>> Please see below
>>>>
>>>> On Oct 5, 2016, at 4:31 AM, vincent gromakowski <[email protected]> wrote:
>>>>
>>>> Hi
>>>> thanks for your explanations. Please find more questions inline
>>>>
>>>> Vincent
>>>>
>>>> 2016-10-05 3:33 GMT+02:00 Denis Magda <[email protected]>:
>>>>
>>>>> Hi Vincent,
>>>>>
>>>>> See my answers inline
>>>>>
>>>>> On Oct 4, 2016, at 12:54 AM, vincent gromakowski <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>> I know that Ignite has SQL support but:
>>>>> - ODBC driver doesn't seem to provide HTTP(S) support, which is easier
>>>>> to integrate on corporate networks with rules, firewalls, proxies
>>>>>
>>>>>
>>>>> *Igor Sapego*, what URIs are supported presently?
>>>>>
>>>>> - The SQL engine doesn't seem to scale like Spark SQL would. For
>>>>> instance, Spark won't generate an OOM if the dataset (source or result)
>>>>> doesn't fit in memory. From the Ignite side, it's not clear…
>>>>>
>>>>>
>>>>> OOM is not related to the scalability topic at all. This is about
>>>>> application’s logic.
>>>>>
>>>>> Ignite SQL engine perfectly scales out along with your cluster.
>>>>> Moreover, Ignite supports indexes, which allow you to get O(log N) running
>>>>> time complexity for your SQL queries, while in the case of Spark you will
>>>>> face full scans (O(N)) all the time.
>>>>>
>>>>> However, to benefit from Ignite SQL queries you have to put all the
>>>>> data in memory. Ignite doesn’t go to a CacheStore (Cassandra, relational
>>>>> database, MongoDB, etc.) while a SQL query is executed and won’t preload
>>>>> anything from an underlying CacheStore. Automatic preloading works for
>>>>> key-value queries like cache.get(key).
>>>>>
>>>>
>>>>
>>>> This is an issue because I will potentially have to query TBs of data.
>>>> If I use Spark thriftserver backed by IgniteRDD, does it solve this point
>>>> and can I get automatic preloading from C*?
>>>>
>>>>
>>>> IgniteRDD will load missing tuples (key-value pairs) from Cassandra
>>>> because essentially IgniteRDD is an IgniteCache and Cassandra is a
>>>> CacheStore. The only thing that is left to check is whether Spark
>>>> thriftserver can work with IgniteRDDs. Hope you will be able to figure this
>>>> out and share your feedback with us.
>>>>
>>>>
>>>>
>>>>> - Spark thrift can manage multi-tenancy: different users can connect
>>>>> to the same SQL engine and share a cache. In Ignite it's one cache per user,
>>>>> so a big waste of RAM.
>>>>>
>>>>>
>>>>> Everyone can connect to an Ignite cluster and work with the same set
>>>>> of distributed caches. I’m not sure why you need to create caches with the
>>>>> same content for every user.
>>>>>
>>>>
>>>> It's a security issue: an Ignite cache doesn't provide multiple user
>>>> accounts per cache. I am thinking of using Spark to authenticate multiple
>>>> users and then having Spark use a shared account on the Ignite cache
>>>>
>>>>
>>>> Basically, Ignite provides basic security interfaces and some
>>>> implementations which you can rely on when building your secure solution.
>>>> This article can be useful for your case
>>>> http://smartkey.co.uk/development/securing-an-apache-ignite-cluster/
>>>>
>>>> --
>>>> Denis
>>>>
>>>>
>>>>> If you need real multi-tenancy support where cacheA is allowed to be
>>>>> accessed by a group of users A only and cacheB by users from group B then
>>>>> you can take a look at GridGain, which is built on top of Ignite
>>>>> https://gridgain.readme.io/docs/multi-tenancy
>>>>>
>>>>>
>>>>>
>>>> OK but I am evaluating open-source-only solutions (kylin, druid,
>>>> alluxio...), it's a constraint from my hierarchy
>>>>
>>>>>
>>>>> What I want to achieve is:
>>>>> - use Cassandra as the data store as it provides idempotence (HDFS/hive
>>>>> doesn't), resulting in exactly-once semantics without any duplicates.
>>>>> - use Spark SQL thriftserver in multi-tenancy for large scale ad hoc
>>>>> analytics queries (> TB) from an ODBC driver through HTTP(S)
>>>>> - accelerate Cassandra reads when the data modeling of the Cassandra
>>>>> table doesn't fit the queries. Queries would be OLAP style: target multiple
>>>>> C* partitions, groupby or filters on lots of dimensions that aren't
>>>>> necessarily in the C* table key.
>>>>>
>>>>>
>>>>> As was mentioned, Ignite uses Cassandra as a CacheStore. You should
>>>>> keep this in mind. Before trying to assemble the whole chain I would
>>>>> recommend that you try to connect Spark SQL thrift server directly to Ignite
>>>>> and work with its shared RDDs [1]. A shared RDD (basically an Ignite cache)
>>>>> can be backed by Cassandra. Probably this chain will work for you but I
>>>>> can’t give more precise guidance on this.
>>>>>
>>>>>
>>>> I will try to make it work and give you feedback
>>>>
>>>>
>>>>
>>>>> [1] https://apacheignite-fs.readme.io/docs/ignite-for-spark
>>>>>
>>>>> --
>>>>> Denis
>>>>>
>>>>> Thanks for your advice
>>>>>
>>>>>
>>>>> 2016-10-04 6:51 GMT+02:00 Jörn Franke <[email protected]>:
>>>>>
>>>>>> I am not sure that this will be performant.
>>>>>> What do you want to achieve here? Fast lookups? Then the Cassandra
>>>>>> Ignite store might be the right solution. If you want to do more
>>>>>> analytic-style queries then you can put the data on HDFS/Hive and use
>>>>>> the Ignite HDFS cache to cache certain partitions/tables in Hive
>>>>>> in memory. If you want to go to iterative machine learning algorithms
>>>>>> you can go for Spark on top of this. You can then also use the Ignite
>>>>>> cache for Spark RDDs.
>>>>>>
>>>>>> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <[email protected]> wrote:
>>>>>>
>>>>>> Hi, Vincent!
>>>>>>
>>>>>> Ignite also has SQL support (also scalable), I think it will be much
>>>>>> faster to query directly from Ignite than to query from Spark.
>>>>>> Also please mind that before executing queries you should load all
>>>>>> needed data into the cache.
>>>>>> To load data from Cassandra to Ignite you may use Cassandra store [1].
>>>>>>
>>>>>> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>>>>>>
>>>>>> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am evaluating the possibility of using Spark SQL (and its
>>>>>>> scalability) over an Ignite cache with a Cassandra persistent store to
>>>>>>> increase read workloads like OLAP style analytics.
>>>>>>> Is there any way to configure Spark thriftserver to load an external
>>>>>>> table in Ignite like we can do in Cassandra?
>>>>>>> Here is an example of config for spark backed by cassandra
>>>>>>>
>>>>>>> CREATE EXTERNAL TABLE MyHiveTable
>>>>>>> ( id int, data string )
>>>>>>> STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>>>>>> TBLPROPERTIES ("cassandra.host" = "x.x.x.x",
>>>>>>> "cassandra.ks.name" = "test" ,
>>>>>>> "cassandra.cf.name" = "mytable" ,
>>>>>>> "cassandra.ks.repfactor" = "1" ,
>>>>>>> "cassandra.ks.strategy" =
>>>>>>> "org.apache.cassandra.locator.SimpleStrategy" );
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Alexey Kuznetsov
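
A note on the read-through behaviour described in the thread: key-value lookups such as cache.get(key) pull missing entries from the CacheStore, while SQL queries only see what is already in memory. The toy Python model below is a minimal sketch of that distinction. It is not Ignite code and calls no Ignite API; the class ReadThroughCache, its methods, and the sample data are all hypothetical and exist only for illustration.

```python
# Toy model of a read-through cache. NOT the Ignite API: every name here
# is hypothetical and only illustrates the behaviour described in the thread.


class ReadThroughCache:
    def __init__(self, store):
        self._store = store   # stands in for the CacheStore (e.g. Cassandra)
        self._memory = {}     # entries currently held in memory

    def get(self, key):
        """Key-value lookup: a miss is loaded from the backing store."""
        if key not in self._memory and key in self._store:
            self._memory[key] = self._store[key]
        return self._memory.get(key)

    def query(self, predicate):
        """Query-style scan: only in-memory entries are visible."""
        return sorted(k for k, v in self._memory.items() if predicate(v))

    def load_all(self):
        """Explicit preload, needed before queries can see every row."""
        self._memory.update(self._store)


store = {1: "a", 2: "b", 3: "a"}
cache = ReadThroughCache(store)

before = cache.query(lambda v: v == "a")      # nothing loaded yet
one = cache.get(1)                            # read-through on a miss
after_get = cache.query(lambda v: v == "a")   # sees only the entry just loaded
cache.load_all()
after_load = cache.query(lambda v: v == "a")  # sees the full data set

print(before, one, after_get, after_load)
```

In this model a query issued before any load finds nothing, a get brings in a single entry, and only an explicit preload makes the whole data set visible to queries. That is the reason for the advice in the thread to load all needed data into the cache before executing queries.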
