Denis,
This only allows limiting the dataset fetched from the DB to Spark. This is
useful, but it does not replace a custom Strategy integration, because after
you create the DataFrame, you will use its API to do additional filtering,
mapping, aggregation, etc., and this will happen within Spark. With custom
>> This JDBC integration is just a Spark data source, which means that Spark
>> will fetch data in its local memory first, and only then apply filters,
>> aggregations, etc.
It seems that there is a backdoor exposed via the standard SQL syntax. You can
execute so-called “pushdown” queries [1] that
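For context, the pushdown idea is to hand the heavy SQL (filters, joins, aggregations) to the database as a subquery in the FROM position, so the engine executes it and only the result set crosses the wire. A minimal sketch of the idea, using Python's built-in sqlite3 as a stand-in for the Ignite JDBC endpoint (the table and column names here are invented for illustration):

```python
import sqlite3

# In-memory database standing in for the remote JDBC source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT, city TEXT);
    INSERT INTO person VALUES ('a', 'NYC'), ('b', 'NYC'), ('c', 'SF');
""")

# "Pushdown" query: the aggregation runs inside the engine, so only
# one row per city is transferred to the client, not the whole table.
pushdown = "(SELECT city, COUNT(*) AS cnt FROM person GROUP BY city)"
rows = conn.execute(f"SELECT * FROM {pushdown} ORDER BY city").fetchall()
print(rows)  # [('NYC', 2), ('SF', 1)]
```

With Spark's JDBC data source, the equivalent move is passing a parenthesized subquery (with an alias, which most databases require) as the `dbtable` option, so the source engine does the work before Spark sees any rows.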
On Thu, Aug 3, 2017 at 9:04 PM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:
This JDBC integration is just a Spark data source, which means that Spark
will fetch data into its local memory first, and only then apply filters,
aggregations, etc. This is obviously slow and doesn't use all the advantages
Ignite provides.
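The fetch-first behaviour described above can be sketched in miniature with Python's built-in sqlite3 standing in for any JDBC source (schema and data invented for illustration): the data-source path pulls every row to the client and aggregates there, analogous to Spark applying operators on locally fetched data, while a deeper integration would leave the work where the data lives.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT, city TEXT);
    INSERT INTO person VALUES ('a', 'NYC'), ('b', 'NYC'), ('c', 'SF');
""")

# Data-source style: every row crosses the connection first, and the
# aggregation happens on the client (as Spark does in local memory).
all_rows = conn.execute("SELECT name, city FROM person").fetchall()
client_side = Counter(city for _, city in all_rows)

# Engine-side: the same aggregation executed where the data lives.
engine_side = dict(conn.execute(
    "SELECT city, COUNT(*) FROM person GROUP BY city"))

assert client_side == engine_side  # same answer, very different data movement
```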
To create a useful and valuable integration, we should create a
On Thu, Aug 3, 2017 at 9:04 AM, Jörn Franke wrote:
I think the development effort would still be higher. Everything would have to
be put into Ignite via JDBC, then checkpointing would have to be done via JDBC
(again, additional development effort), plus a lot of conversion from Spark's
internal format to JDBC and back to Ignite's internal format.
On Thu, Aug 3, 2017 at 8:45 AM, Jörn Franke wrote:
> I think the JDBC one is more inefficient and slower, and requires too much
> development effort. You can also check the integration of Alluxio with
> Spark.
As far as I know, Alluxio is a file system, so it cannot use JDBC.
I think the JDBC one is more inefficient and slower, and requires too much
development effort. You can also check the integration of Alluxio with Spark.
In general, I think JDBC was never designed for large data volumes. It is for
executing queries and getting a small or aggregated result set.
Jörn, thanks for your feedback!
Can you explain how the direct support would be different from the JDBC
support?
Thanks,
D.
On Thu, Aug 3, 2017 at 7:40 AM, Jörn Franke wrote:
These are two different things. Spark applications themselves do not use JDBC -
it is more for non-Spark applications to access Spark DataFrames.
Direct support by Ignite would make more sense. Although in theory you have
IGFS, that only applies if the user is using HDFS, which might not be the
case. It is now