Request for review: IGNITE-3303 Apache Flink Integration - Flink source

2018-08-26 Thread Saikat Maitra
Hi,

I have updated the PR with additional tests.

Please review and share feedback.

This PR is related to IgniteSink, but it allows streaming data from Ignite instead.

PR https://github.com/apache/ignite/pull/870/files

Review https://reviews.ignite.apache.org/ignite/review/IGNT-CR-135
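For context, the general shape of such a source -- cache-update callbacks feeding a queue that the source's run loop drains -- can be sketched without the Flink or Ignite APIs. Everything below (class and method names included) is an illustrative stand-in, not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Illustrative stand-in for the source pattern: cache-update callbacks
 * enqueue events, and the source's run loop drains the queue and emits
 * records downstream. A real Flink source would register an Ignite
 * continuous query as the callback and implement Flink's SourceFunction;
 * here a plain queue replaces both APIs.
 */
public class IgniteSourceSketch {
    private final BlockingQueue<String> buf = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    /** Stand-in for the continuous-query callback fired on cache updates. */
    public void onCacheEvent(String evt) {
        buf.offer(evt);
    }

    /** Stand-in for SourceFunction.run(ctx): collect events until cancelled
     *  or until max records have been emitted. */
    public List<String> drain(int max) {
        List<String> out = new ArrayList<>();
        while (running && out.size() < max) {
            try {
                out.add(buf.take());
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return out;
    }

    /** Stand-in for SourceFunction.cancel(). */
    public void cancel() {
        running = false;
    }
}
```

The point of the pattern is that the blocking queue decouples the Ignite callback thread from the Flink emit loop.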

Regards,
Saikat


[MTCGA]: new failures in builds [1588345] need to be handled

2018-08-26 Thread dpavlov . tasks
Hi Ignite Developer,

I am MTCGA.Bot, and I've detected some issues on TeamCity that need to be 
addressed. I hope you can help.

 * Recently contributed test failed in master:
org.apache.ignite.testsuites.IgniteIgfsLinuxAndMacOSTestSuite.initializationError
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-4034437294452110435&branch=%3Cdefault%3E&tab=testDetails
 No changes in build

- If your changes could have led to this failure(s), please create an issue 
with the label MakeTeamCityGreenAgain and assign it to yourself.
-- If you have a fix, please set the ticket to PA state and write to the dev 
list that the fix is ready.
-- If the fix will require some time, please mute the test and add the label 
Muted_Test to the issue.
- If you know which change caused the failure, please contact the change 
author directly.
- If you don't know which change caused the failure, please send a message 
to the dev list to find out.
Should you have any questions, please contact dev@ignite.apache.org 
Best Regards,
MTCGA.Bot 
Notification generated at Mon Aug 27 09:43:20 MSK 2018 


Data streaming using Apache Ignite and Flink

2018-08-26 Thread Saikat Maitra
Hello,

I recently published a blog post on how we can stream data using Apache
Ignite and Flink. It uses IgniteSink with the recently merged changes (due
for release in 2.7.0), which allow us to run IgniteSink with Apache Flink in
cluster mode.


https://samaitra.blogspot.com/2018/08/data-streaming-using-apache-flink-and.html

Please review and let me know if you have feedback.
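The mechanics the post walks through -- SinkFunction.invoke() handing each record to an IgniteDataStreamer, which batches writes into the cache -- can be sketched in a self-contained way. The class below is an illustrative stand-in (a map plays the cache, a second map plays the streamer's buffer), not the actual IgniteSink code:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Self-contained sketch of the sink pattern: records are buffered and
 * flushed to the cache in batches. The real IgniteSink delegates the
 * batching to IgniteDataStreamer from ignite-core; flushSize here models
 * its size-based flush behavior.
 */
public class IgniteSinkSketch {
    private final Map<Long, String> cache = new HashMap<>(); // stand-in for the Ignite cache
    private final Map<Long, String> batch = new HashMap<>(); // stand-in for the streamer's buffer
    private final int flushSize;

    public IgniteSinkSketch(int flushSize) {
        this.flushSize = flushSize;
    }

    /** Stand-in for SinkFunction.invoke(record): buffer, flush when full. */
    public void invoke(long key, String value) {
        batch.put(key, value);
        if (batch.size() >= flushSize)
            flush();
    }

    /** Stand-in for the streamer's flush: push the batch into the cache. */
    public void flush() {
        cache.putAll(batch);
        batch.clear();
    }

    public int cacheSize() {
        return cache.size();
    }
}
```

Batching is what makes the streamer fast: writes reach the cache in bulk rather than one network round-trip per record.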

Regards,
Saikat


Re: Table Names in Spark Catalog

2018-08-26 Thread Nikolay Izhikov
Igniters, 

Personally, I don't like the solution with database == schema name.

1. I think we should try to use the right abstractions. 
schema == database doesn't sound right to me.

Do you want to answer all of our users with something like this:

- "How can I change the Ignite SQL schema?"
- "That's obvious, just use setDatabase("MY_SCHEMA_NAME")."

2. I think we restrict the whole solution with that decision.
If Ignite supports multiple databases in the future, we just won't have a 
place for them.

I think we should do the following:

1. IgniteExternalCatalog should be able to return *ALL* tables within an 
Ignite instance. 
We shouldn't restrict the table list by schema by default.
We should return tables with the schema name: `schema.table`

2. We should introduce `OPTION_SCHEMA` for a dataframe to specify a 
schema.

There is an issue with the second step: we can't use a schema name in the 
`CREATE TABLE` clause.
This is a restriction of current Ignite SQL.

I propose to make the following:

1. For all write modes that require the creation of a table, we should 
disallow usage of tables outside of `SQL_PUBLIC`
or usage of `OPTION_SCHEMA`. We should throw a proper exception in this 
case.

2. Create a ticket to support `CREATE TABLE` with custom schema name.

3. After resolving the ticket from step 2, we can add full support of custom 
schemas to the Spark integration.

4. We should throw an exception if the user tries to use setDatabase.

Does that make sense to you?
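Step 1 and the `SQL_PUBLIC` restriction above can be sketched independently of the Spark and Ignite APIs. All names below (qualify, listAllTables, checkCreatableSchema) are illustrative, not proposed API:

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of the proposed catalog rules: list every table qualified as
 * `schema.table`, and reject table-creating write modes that target a
 * schema other than SQL_PUBLIC, since current Ignite SQL cannot
 * CREATE TABLE in a custom schema.
 */
public class CatalogRulesSketch {
    /** Qualified name in the proposed `schema.table` form. */
    public static String qualify(String schema, String table) {
        return schema + "." + table;
    }

    /** List *ALL* tables, unrestricted by schema; each entry is a
     *  {schema, table} pair. */
    public static List<String> listAllTables(List<String[]> tables) {
        return tables.stream()
            .map(t -> qualify(t[0], t[1]))
            .collect(Collectors.toList());
    }

    /** Throws for CREATE-requiring write modes outside SQL_PUBLIC. */
    public static void checkCreatableSchema(String schema) {
        if (!"SQL_PUBLIC".equals(schema))
            throw new IllegalArgumentException(
                "CREATE TABLE is only supported in SQL_PUBLIC, got: " + schema);
    }
}
```

The exception keeps the limitation explicit until the follow-up ticket adds `CREATE TABLE` with a custom schema name.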

On Sun, 26/08/2018 at 14:09 +0100, Stuart Macdonald wrote:
> I'll go ahead and make the changes to represent the schema name as the
> database name for the purposes of the Spark catalog.
> 
> If anyone knows of an existing way to list all available schemata within an
> Ignite instance please let me know, otherwise the first task will be
> creating that mechanism.
> 
> Stuart.
> 
> On Fri, Aug 24, 2018 at 6:23 PM Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
> 
> > Nikolay,
> > 
> > If there are multiple configurations in XML, IgniteContext will always use
> > only one of them. It looks like the current approach simply doesn't work. I
> > propose to report the schema name as 'database' in Spark. If there are
> > multiple clients, you would create multiple sessions and multiple catalogs.
> > 
> > Makes sense?
> > 
> > -Val
> > 
> > On Fri, Aug 24, 2018 at 12:33 AM Nikolay Izhikov wrote:
> > 
> > > Hello, Valentin.
> > > 
> > > > catalog exist in scope of a single IgniteSparkSession (and therefore
> > > > single IgniteContext and single Ignite instance)?
> > > 
> > > Yes.
> > > Actually, I was thinking about a use case where we have several Ignite
> > > configurations in one XML file.
> > > Now I see, maybe this is too rare a use case to support.
> > > 
> > > Stuart, Valentin, what is your proposal?
> > > 
> > > On Wed, 22/08/2018 at 08:56 -0700, Valentin Kulichenko wrote:
> > > > Nikolay,
> > > > 
> > > > Whatever we decide on would be right :) Basically, we need to answer
> > > > this question: does the catalog exist in scope of a single
> > > > IgniteSparkSession (and therefore single IgniteContext and single
> > > > Ignite instance)? In other words, in case of a rare use case when a
> > > > single Spark application connects to multiple Ignite clusters, would
> > > > there be a catalog created per cluster?
> > > > 
> > > > If the answer is yes, current logic doesn't make sense.
> > > > 
> > > > -Val
> > > > 
> > > > 
> > > > On Wed, Aug 22, 2018 at 1:44 AM Nikolay Izhikov wrote:
> > > > 
> > > > > Hello, Valentin.
> > > > > 
> > > > > > I believe we should get rid of this logic and use Ignite schema
> > > > > > name as database name in Spark's catalog.
> > > > > 
> > > > > When I developed the Ignite integration with Spark Data Frame, I
> > > > > used the following abstraction described by Vladimir Ozerov:
> > > > > 
> > > > > "1) Let's consider Ignite cluster as a single database ("catalog" in
> > > > > ANSI SQL'92 terms)." [1]
> > > > > 
> > > > > Was I wrong? If yes, let's fix it.
> > > > > 
> > > > > [1]
> > > > > http://apache-ignite-developers.2346864.n4.nabble.com/SQL-usability-catalogs-schemas-and-tables-td17148.html
> > > > > 
> > > > > On Wed, 22/08/2018 at 09:26 +0100, Stuart Macdonald wrote:
> > > > > > Hi Val, yes that's correct. I'd be happy to make the change to have
> > > > > > the database reference the schema if Nikolay agrees. (I'll first
> > > > > > need to do a bit of research into how to obtain the list of all
> > > > > > available schemata...)
> > > > > > 
> > > > > > Thanks,
> > > > > > Stuart.
> > > > > > 
> > > > > > On Tue, Aug 21, 2018 at 9:43 PM, Valentin Kulichenko <
> > > > > > valentin.kuliche...@gmail.com> wrote:
> > > > > > 
> > > > > > > Stuart,
> > > > > > > 
> > > > > > > Thanks for pointing this out, I was not 

Re: Table Names in Spark Catalog

2018-08-26 Thread Stuart Macdonald
I'll go ahead and make the changes to represent the schema name as the
database name for the purposes of the Spark catalog.

If anyone knows of an existing way to list all available schemata within an
Ignite instance please let me know, otherwise the first task will be
creating that mechanism.

Stuart.

On Fri, Aug 24, 2018 at 6:23 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Nikolay,
>
> If there are multiple configurations in XML, IgniteContext will always use
> only one of them. It looks like the current approach simply doesn't work. I
> propose to report the schema name as 'database' in Spark. If there are
> multiple clients, you would create multiple sessions and multiple catalogs.
>
> Makes sense?
>
> -Val
>
> On Fri, Aug 24, 2018 at 12:33 AM Nikolay Izhikov wrote:
>
> > Hello, Valentin.
> >
> > > catalog exist in scope of a single IgniteSparkSession (and therefore
> > > single IgniteContext and single Ignite instance)?
> >
> > Yes.
> > Actually, I was thinking about a use case where we have several Ignite
> > configurations in one XML file.
> > Now I see, maybe this is too rare a use case to support.
> >
> > Stuart, Valentin, what is your proposal?
> >
> > On Wed, 22/08/2018 at 08:56 -0700, Valentin Kulichenko wrote:
> > > Nikolay,
> > >
> > > Whatever we decide on would be right :) Basically, we need to answer this
> > > question: does the catalog exist in scope of a single IgniteSparkSession
> > > (and therefore single IgniteContext and single Ignite instance)? In other
> > > words, in case of a rare use case when a single Spark application connects
> > > to multiple Ignite clusters, would there be a catalog created per cluster?
> > >
> > > If the answer is yes, current logic doesn't make sense.
> > >
> > > -Val
> > >
> > >
> > > On Wed, Aug 22, 2018 at 1:44 AM Nikolay Izhikov wrote:
> > >
> > > > Hello, Valentin.
> > > >
> > > > > I believe we should get rid of this logic and use Ignite schema name
> > > > > as database name in Spark's catalog.
> > > >
> > > > When I developed the Ignite integration with Spark Data Frame, I used
> > > > the following abstraction described by Vladimir Ozerov:
> > > >
> > > > "1) Let's consider Ignite cluster as a single database ("catalog" in
> > > > ANSI SQL'92 terms)." [1]
> > > >
> > > > Was I wrong? If yes, let's fix it.
> > > >
> > > > [1]
> > > > http://apache-ignite-developers.2346864.n4.nabble.com/SQL-usability-catalogs-schemas-and-tables-td17148.html
> > > >
> > > > On Wed, 22/08/2018 at 09:26 +0100, Stuart Macdonald wrote:
> > > > > Hi Val, yes that's correct. I'd be happy to make the change to have
> > > > > the database reference the schema if Nikolay agrees. (I'll first need
> > > > > to do a bit of research into how to obtain the list of all available
> > > > > schemata...)
> > > > >
> > > > > Thanks,
> > > > > Stuart.
> > > > >
> > > > > On Tue, Aug 21, 2018 at 9:43 PM, Valentin Kulichenko <
> > > > > valentin.kuliche...@gmail.com> wrote:
> > > > >
> > > > > > Stuart,
> > > > > >
> > > > > > Thanks for pointing this out, I was not aware that we use the Spark
> > > > > > database concept this way. Actually, this confuses me a lot. As far
> > > > > > as I understand, the catalog is created in the scope of a particular
> > > > > > IgniteSparkSession, which in turn is assigned to a particular
> > > > > > IgniteContext and therefore a single Ignite client. If that's the
> > > > > > case, I don't think it should be aware of other Ignite clients that
> > > > > > are connected to other clusters. This doesn't look like correct
> > > > > > behavior to me, not to mention that with this approach having
> > > > > > multiple databases would be a very rare case. I believe we should
> > > > > > get rid of this logic and use Ignite schema name as database name in
> > > > > > Spark's catalog.
> > > > > >
> > > > > > Nikolay, what do you think?
> > > > > >
> > > > > > -Val
> > > > > >
> > > > > > On Tue, Aug 21, 2018 at 8:17 AM Stuart Macdonald <stu...@stuwee.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Nikolay, Val,
> > > > > > >
> > > > > > > The JDBC Spark datasource[1] -- as far as I can tell -- has no
> > > > > > > ExternalCatalog implementation; it just uses the database
> > > > > > > specified in the JDBC URL. So I don't believe there is any way to
> > > > > > > call listTables() or listDatabases() for the JDBC provider.
> > > > > > >
> > > > > > > The Hive ExternalCatalog[2] makes the distinction between database
> > > > > > > and table using the actual database and table mechanisms built
> > > > > > > into the catalog, which is fine because Hive has the clear
> > > > > > > distinction and hierarchy of databases and tables.
> > > > > > >
> > > > > > > *However* Ignite already uses the "database" concept in the Ignite
> > > > > > > ExternalCatalog[3] to mean the