Re: Spark Data Frame support in Ignite

2017-08-10 Thread Valentin Kulichenko
Denis,

This only allows limiting the dataset fetched from the DB into Spark. This is
useful, but it does not replace the custom Strategy integration. After you
create the DataFrame, you will use its API to do additional filtering, mapping,
aggregation, etc., and all of that will happen within Spark. With a custom
strategy, the whole processing is done on the Ignite side.
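To make the contrast concrete, here is a minimal sketch of what plan-to-SQL translation means. This is illustrative Python only, not Spark's actual Strategy API; the toy plan format and the `plan_to_sql` helper are invented for the example.

```python
# Illustration of the custom-Strategy idea: translate a (toy) logical plan
# into a single SQL statement so filtering/projection run on the Ignite side,
# instead of fetching all rows and applying the operations inside Spark.
# NOTE: this is NOT Spark's Strategy API - the plan format here is invented.
def plan_to_sql(table, ops):
    cols = "*"
    predicates = []
    for op, arg in ops:
        if op == "project":          # keep only the listed columns
            cols = ", ".join(arg)
        elif op == "filter":         # push the predicate into WHERE
            predicates.append(arg)
    sql = f"SELECT {cols} FROM {table}"
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql

plan = [("filter", "age > 30"), ("project", ["name", "age"])]
print(plan_to_sql("person", plan))  # SELECT name, age FROM person WHERE age > 30
```

With a real Strategy, Spark's Catalyst plan nodes would play the role of `ops`, and Ignite would execute the generated statement directly.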

-Val

On Thu, Aug 10, 2017 at 3:07 PM, Denis Magda  wrote:

> >> This JDBC integration is just a Spark data source, which means that
> Spark
> >> will fetch data in its local memory first, and only then apply filters,
> >> aggregations, etc.
>
> It seems there is a backdoor exposed via the standard SQL syntax. You
> can execute so-called “pushdown” queries [1] that are sent by Spark to a
> JDBC database right away, and the result is wrapped into a DataFrame.
>
> I could do this trick using Ignite as a JDBC-compliant data source,
> executing the query below over the data stored in the cluster:
>
> SELECT p.name as person, c.name as city FROM person p, city c WHERE p.city_id = c.id
>
> There are some limitations though because the actual query issued by Spark
> will be:
>
> SELECT * FROM (SELECT p.name as person, c.name as city FROM person p, city c WHERE p.city_id = c.id) as res
>
> Here [2] is a complete example.
>
>
> [1] https://docs.databricks.com/spark/latest/data-sources/sql-databases.html#pushdown-query-to-database-engine
> [2] https://github.com/dmagda/ignite-dataframes
>
> —
> Denis
>

Re: Spark Data Frame support in Ignite

2017-08-10 Thread Denis Magda
>> This JDBC integration is just a Spark data source, which means that Spark
>> will fetch data in its local memory first, and only then apply filters,
>> aggregations, etc. 

It seems there is a backdoor exposed via the standard SQL syntax. You can
execute so-called “pushdown” queries [1] that are sent by Spark to a JDBC
database right away, and the result is wrapped into a DataFrame.

I could do this trick using Ignite as a JDBC-compliant data source, executing
the query below over the data stored in the cluster:

SELECT p.name as person, c.name as city FROM person p, city c WHERE p.city_id = c.id

There are some limitations though because the actual query issued by Spark will 
be:

SELECT * FROM (SELECT p.name as person, c.name as city FROM person p, city c WHERE p.city_id = c.id) as res

Here [2] is a complete example.
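The wrapping behavior described above can be sketched as follows. This is a hand-written approximation of what the Spark JDBC data source does with the `dbtable` option; the exact wrapping and the `res` alias are version-dependent Spark internals, shown here only to explain the limitation.

```python
# Approximation of how Spark's JDBC source turns the "dbtable" option into
# the query it actually sends to the database (Ignite in this case). The
# alias "res" matches what was observed above; treat the exact form as a
# Spark implementation detail.
def spark_effective_query(dbtable):
    return f"SELECT * FROM {dbtable} as res"

pushdown = ("(SELECT p.name as person, c.name as city "
            "FROM person p, city c WHERE p.city_id = c.id)")
print(spark_effective_query(pushdown))
# SELECT * FROM (SELECT p.name as person, c.name as city FROM person p, city c WHERE p.city_id = c.id) as res
```

The practical consequence is that a pushdown query must be valid as a parenthesized subquery, which is exactly the limitation noted above.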


[1] https://docs.databricks.com/spark/latest/data-sources/sql-databases.html#pushdown-query-to-database-engine
[2] https://github.com/dmagda/ignite-dataframes

—
Denis


Re: Spark Data Frame support in Ignite

2017-08-04 Thread Dmitriy Setrakyan
On Thu, Aug 3, 2017 at 9:04 PM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> This JDBC integration is just a Spark data source, which means that Spark
> will fetch data in its local memory first, and only then apply filters,
> aggregations, etc. This is obviously slow and doesn't use all advantages
> Ignite provides.
>
> To create useful and valuable integration, we should create a custom
> Strategy that will convert Spark's logical plan into a SQL query and
> execute it directly on Ignite.
>

I get it, but we have been talking about Data Frame support for longer than
a year. I think we should advise our users to switch to JDBC until the
community gets someone to implement it.



Re: Spark Data Frame support in Ignite

2017-08-03 Thread Valentin Kulichenko
This JDBC integration is just a Spark data source, which means that Spark
will fetch the data into its local memory first, and only then apply filters,
aggregations, etc. This is obviously slow and doesn't use all the advantages
Ignite provides.

To create useful and valuable integration, we should create a custom
Strategy that will convert Spark's logical plan into a SQL query and
execute it directly on Ignite.

-Val



Re: Spark Data Frame support in Ignite

2017-08-03 Thread Dmitriy Setrakyan
On Thu, Aug 3, 2017 at 9:04 AM, Jörn Franke  wrote:

> I think the development effort would still be higher. Everything would
> have to be put via JDBC into Ignite, then checkpointing would have to be
> done via JDBC (again, additional development effort), plus a lot of conversion
> from the Spark internal format to JDBC and back to the Ignite internal format.
> I do not see pagination as a useful feature for managing large data volumes
> from databases - on the contrary, it is very inefficient (and one would
> have to implement logic to fetch all pages). Pagination was also never
> intended for fetching large data volumes, but for web pages showing a
> small result set over several pages, where the user can click manually for
> the next page (which they usually do not do anyway).
>
> While it might be a quick solution, I think a deeper integration than
> JDBC would be more beneficial.
>

Jorn, I completely agree. However, we have not been able to find a
contributor for this feature. You sound like you have sufficient domain
expertise in Spark and Ignite. Would you be willing to help out?




Re: Spark Data Frame support in Ignite

2017-08-03 Thread Jörn Franke
I think the development effort would still be higher. Everything would have to
be put via JDBC into Ignite, then checkpointing would have to be done via JDBC
(again, additional development effort), plus a lot of conversion from the Spark
internal format to JDBC and back to the Ignite internal format. I do not see
pagination as a useful feature for managing large data volumes from databases -
on the contrary, it is very inefficient (and one would have to implement logic
to fetch all pages). Pagination was also never intended for fetching large data
volumes, but for web pages showing a small result set over several pages, where
the user can click manually for the next page (which they usually do not do anyway).

While it might be a quick solution, I think a deeper integration than JDBC
would be more beneficial.



Re: Spark Data Frame support in Ignite

2017-08-03 Thread Dmitriy Setrakyan
On Thu, Aug 3, 2017 at 8:45 AM, Jörn Franke  wrote:

> I think the JDBC one is more inefficient and slower, and requires too much
> development effort. You can also check the integration of Alluxio with
> Spark.
>

As far as I know, Alluxio is a file system, so it cannot use JDBC. Ignite,
on the other hand, is an SQL system and works well with JDBC. As far as the
development effort goes, we are dealing with SQL, so I am not sure why JDBC
would be harder.

Generally speaking, until Ignite provides native data frame integration,
having JDBC-based integration out of the box is minimally acceptable.


> Then, in general, I think JDBC was never designed for large data volumes.
> It is for executing queries and getting a small or aggregated result set
> back. Alternatively, for inserting / updating single rows.
>

Agree in general. However, Ignite JDBC is designed to work with larger data
volumes and supports data pagination automatically.
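To illustrate what page-by-page fetching means in general (and why hand-rolled pagination is considered clumsy for bulk transfer), here is a small sketch using SQLite as a stand-in SQL source. Ignite's JDBC driver pages results internally via its fetch size; the LIMIT/OFFSET loop below is only an approximation of the concept, not how the driver is implemented.

```python
import sqlite3

# Hand-rolled pagination over a SQL source (SQLite standing in for a JDBC
# database). Ignite's JDBC driver pages results internally; this loop just
# shows what fetching "one page at a time" means.
def fetch_in_pages(conn, query, page_size):
    offset = 0
    while True:
        page = conn.execute(f"{query} LIMIT ? OFFSET ?",
                            (page_size, offset)).fetchall()
        if not page:          # no more rows: stop
            break
        yield from page
        offset += page_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person(id INTEGER, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, f"p{i}") for i in range(10)])
rows = list(fetch_in_pages(conn, "SELECT id, name FROM person ORDER BY id", 3))
print(len(rows))  # 10
```

Note the cost: each page is a separate round trip, which is exactly why pushing whole queries down to the server scales better for large volumes.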




Re: Spark Data Frame support in Ignite

2017-08-03 Thread Jörn Franke
I think the JDBC one is more inefficient and slower, and requires too much
development effort. You can also check the integration of Alluxio with Spark.

Then, in general, I think JDBC was never designed for large data volumes. It is
for executing queries and getting a small or aggregated result set back.
Alternatively, for inserting / updating single rows.



Re: Spark Data Frame support in Ignite

2017-08-03 Thread Dmitriy Setrakyan
Jorn, thanks for your feedback!

Can you explain how the direct support would be different from the JDBC
support?

Thanks,
D.

On Thu, Aug 3, 2017 at 7:40 AM, Jörn Franke  wrote:

> These are two different things. Spark applications themselves do not use
> JDBC - it is more for non-Spark applications to access Spark DataFrames.
>
> A direct support by Ignite would make more sense. Although you have in
> theory IGFS, if the user is using HDFS, which might not be the case. It is
> now also very common to use Object stores, such as S3.
> Direct support could be leveraged for interactive analysis or different
> Spark applications sharing data.
>


Re: Spark Data Frame support in Ignite

2017-08-02 Thread Jörn Franke
These are two different things. Spark applications themselves do not use JDBC -
it is more for non-Spark applications to access Spark DataFrames.

A direct support by Ignite would make more sense. You have IGFS in theory, but
only if the user is using HDFS, which might not be the case. It is now also
very common to use object stores, such as S3.
Direct support could be leveraged for interactive analysis or for different
Spark applications sharing data.

> On 3. Aug 2017, at 05:12, Dmitriy Setrakyan  wrote:
> 
> Igniters,
> 
> We have had the integration with Spark Data Frames on our roadmap for a
> while:
> https://issues.apache.org/jira/browse/IGNITE-3084
> 
> However, while browsing Spark documentation, I came across the generic JDBC
> data frame support in Spark:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
> 
> Given that Ignite has a JDBC driver, does it mean that it transitively also
> supports Spark data frames? If yes, we should document it.
> 
> D.
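For anyone who wants to try the JDBC route asked about above, here is a minimal sketch of the options one would hand to Spark's generic JDBC data source. The connection URL and driver class shown are assumptions based on Ignite's thin JDBC driver naming and should be checked against your Ignite version; the `spark.read` call itself appears only in a comment, since it needs pyspark and a running cluster.

```python
# Sketch: options for Spark's generic JDBC data source pointed at Ignite.
# ASSUMPTIONS: the thin-driver URL format and driver class name below follow
# Ignite's JDBC driver naming and may differ across versions.
def ignite_jdbc_options(table):
    return {
        "url": "jdbc:ignite:thin://127.0.0.1:10800",         # assumed URL format
        "driver": "org.apache.ignite.IgniteJdbcThinDriver",  # assumed class name
        "dbtable": table,  # a table name, or a parenthesized pushdown query
    }

# With pyspark and a live cluster, one would then do (not executed here):
#   df = spark.read.format("jdbc").options(**ignite_jdbc_options("person")).load()

opts = ignite_jdbc_options("(SELECT id, name FROM person) q")
print(sorted(opts))  # ['dbtable', 'driver', 'url']
```

This is exactly the "transitive" support discussed in the thread: functional, but with all post-read processing happening inside Spark rather than on the Ignite side.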