Re: Spark Data Frame support in Ignite

2017-08-10 Thread Valentin Kulichenko
Denis, this only lets you limit the dataset fetched from the DB into Spark. This is useful, but it does not replace a custom Strategy integration, because after you create the DF, you will use its API to do additional filtering, mapping, aggregation, etc., and this will happen within Spark. With custom

Re: Spark Data Frame support in Ignite

2017-08-10 Thread Denis Magda
>> This JDBC integration is just a Spark data source, which means that Spark
>> will fetch data into its local memory first, and only then apply filters,
>> aggregations, etc.

It seems there is a backdoor exposed via the standard SQL syntax. You can execute so-called “pushdown” queries [1] that
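A minimal sketch of what such a pushdown query can look like from the Spark side, assuming the generic Spark JDBC data source, whose `dbtable` option accepts a parenthesized subquery with an alias in place of a table name. The helper names (`pushdown_dbtable`, `jdbc_options`, `load`) are hypothetical and not part of Spark or Ignite, and the JDBC URL is left to the caller.

```python
def pushdown_dbtable(query, alias="t"):
    """Wrap a SQL query so it can be passed as the JDBC `dbtable` option.

    The database then executes the query itself, and Spark only receives
    the rows that the query returns.
    """
    return f"({query.strip().rstrip(';')}) AS {alias}"


def jdbc_options(url, query):
    """Build the options for spark.read.format("jdbc") (hypothetical helper)."""
    return {"url": url, "dbtable": pushdown_dbtable(query)}


def load(spark, url, query):
    """Load the result of `query` as a DataFrame.

    `spark` is an existing SparkSession; `url` is whatever JDBC URL the
    cluster exposes (an assumption, it is not given in this thread).
    """
    return spark.read.format("jdbc").options(**jdbc_options(url, query)).load()
```

Only the query passed to `load` is evaluated by the database. Any further filtering, mapping, or aggregation done through the DataFrame API afterwards still runs inside Spark.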

Re: Spark Data Frame support in Ignite

2017-08-04 Thread Dmitriy Setrakyan
On Thu, Aug 3, 2017 at 9:04 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> This JDBC integration is just a Spark data source, which means that Spark
> will fetch data into its local memory first, and only then apply filters,
> aggregations, etc. This is obviously slow and doesn't

Re: Spark Data Frame support in Ignite

2017-08-03 Thread Valentin Kulichenko
This JDBC integration is just a Spark data source, which means that Spark will fetch data into its local memory first, and only then apply filters, aggregations, etc. This is obviously slow and doesn't use all the advantages Ignite provides. To create a useful and valuable integration, we should create a
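The cost described above can be illustrated with a small, self-contained sketch. It does not use Spark or Ignite: an in-memory SQLite database stands in for the data store, and the table name `person`, its columns, and the function names are made up for illustration. It contrasts fetching every row and filtering on the client with letting the store apply the filter.

```python
import sqlite3


def make_source():
    """Create a stand-in data store with 1000 rows (ages cycle through 0..99)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.executemany(
        "INSERT INTO person (id, age) VALUES (?, ?)",
        [(i, i % 100) for i in range(1000)],
    )
    return conn


def fetch_then_filter(conn, min_age):
    """Data-source style: pull every row to the client, then filter locally."""
    rows = conn.execute("SELECT id, age FROM person").fetchall()
    matched = [row for row in rows if row[1] >= min_age]
    return len(rows), matched  # (rows transferred, matching rows)


def filter_at_source(conn, min_age):
    """Pushdown style: the store applies the filter and returns only matches."""
    rows = conn.execute(
        "SELECT id, age FROM person WHERE age >= ?", (min_age,)
    ).fetchall()
    return len(rows), rows  # (rows transferred, matching rows)


if __name__ == "__main__":
    conn = make_source()
    print(fetch_then_filter(conn, 90)[0], "rows moved without pushdown")
    print(filter_at_source(conn, 90)[0], "rows moved with pushdown")
```

Both functions return the same matching rows. The difference is how many rows have to leave the store first.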

Re: Spark Data Frame support in Ignite

2017-08-03 Thread Dmitriy Setrakyan
On Thu, Aug 3, 2017 at 9:04 AM, Jörn Franke wrote:
> I think the development effort would still be higher. Everything would
> have to be put via JDBC into Ignite, then checkpointing would have to be
> done via JDBC (again additional development effort), a lot of conversion

Re: Spark Data Frame support in Ignite

2017-08-03 Thread Jörn Franke
I think the development effort would still be higher. Everything would have to be put via JDBC into Ignite, then checkpointing would have to be done via JDBC (again additional development effort), with a lot of conversion from Spark's internal format to JDBC and back to Ignite's internal format.

Re: Spark Data Frame support in Ignite

2017-08-03 Thread Dmitriy Setrakyan
On Thu, Aug 3, 2017 at 8:45 AM, Jörn Franke wrote:
> I think the JDBC one is less efficient, slower, and requires too much
> development effort. You can also check the integration of Alluxio with
> Spark.
>

As far as I know, Alluxio is a file system, so it cannot use JDBC.

Re: Spark Data Frame support in Ignite

2017-08-03 Thread Jörn Franke
I think the JDBC one is less efficient, slower, and requires too much development effort. You can also check the integration of Alluxio with Spark. Also, in general I think JDBC was never designed for large data volumes. It is for executing queries and getting a small or aggregated result set

Re: Spark Data Frame support in Ignite

2017-08-03 Thread Dmitriy Setrakyan
Jorn, thanks for your feedback! Can you explain how the direct support would be different from the JDBC support?

Thanks,
D.

On Thu, Aug 3, 2017 at 7:40 AM, Jörn Franke wrote:
> These are two different things. Spark applications themselves do not use
> JDBC - it is more

Re: Spark Data Frame support in Ignite

2017-08-02 Thread Jörn Franke
These are two different things. Spark applications themselves do not use JDBC - it is more for non-Spark applications to access Spark DataFrames. Direct support by Ignite would make more sense. Although in theory you have IGFS, that only helps if the user is using HDFS, which might not be the case. It is now