Re: Spark SQL Table Name Resolution

2018-08-18 Thread Dmitriy Pavlov
Hi Stuart,

I see the review has already started and Nikolay has responded on GitHub.

I've added you to the contributors list, so you can now assign issues to
yourself. I also assigned
https://issues.apache.org/jira/browse/IGNITE-9228 to you so that the issue
can be correctly filtered by all committers. I hope you don't mind.

Sincerely,
Dmitriy Pavlov



Re: Spark SQL Table Name Resolution

2018-08-17 Thread Stuart Macdonald
Hi Dmitriy, thanks - that’s done now,

Stuart.



Re: Spark SQL Table Name Resolution

2018-08-16 Thread Dmitriy Setrakyan
Stuart, can you please move the ticket into the PATCH_AVAILABLE state? You
need to click the "Submit Patch" button in Jira.

D.



Re: Spark SQL Table Name Resolution

2018-08-15 Thread Stuart Macdonald
Here's the initial pull request for this issue; please review it and let me
know your feedback. I had to combine the two approaches so that this works
both for the standard .read() path, where we can add the schema option, and
for catalog-based selects, where we use schemaName.tableName. Happy to
discuss on a call if this isn't clear.

https://github.com/apache/ignite/pull/4551
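
For readers following the thread, here is a minimal, self-contained sketch of
how an explicit schema option and a schemaName.tableName name could be
combined in one resolution step. It is a plain Python model, not the code from
the pull request: the function name resolve_table, the catalog layout, and the
choice to raise an error on conflicting or ambiguous input are all assumptions
made for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def resolve_table(
    catalog: Dict[str, Set[str]],
    name: str,
    schema_option: Optional[str] = None,
) -> Tuple[str, str]:
    """Resolve a table to (schema, table) in a hypothetical catalog.

    catalog maps a schema name to the table names it contains.
    name is either "tableName" or "schemaName.tableName".
    schema_option models a separately supplied schema parameter.
    """
    # Split off a schema prefix if the name is qualified.
    prefix, dot, table = name.rpartition(".")
    qualified_schema = prefix if dot else None

    # Assumption: disagreeing schemas are an error rather than silently ignored.
    if schema_option and qualified_schema and schema_option != qualified_schema:
        raise ValueError(
            f"schema option {schema_option!r} conflicts with {qualified_schema!r}"
        )

    schema = schema_option or qualified_schema
    if schema is not None:
        if table not in catalog.get(schema, set()):
            raise LookupError(f"no table {schema}.{table}")
        return schema, table

    # No schema given: search every schema for the table name.
    matches = sorted(s for s, tables in catalog.items() if table in tables)
    if not matches:
        raise LookupError(f"no table named {table}")
    if len(matches) > 1:
        # Assumption: fail on ambiguity instead of picking the first match.
        raise LookupError(f"table {table} is ambiguous across schemas {matches}")
    return matches[0], table
```

With a catalog such as {"SCHEMA_A": {"PERSON"}, "SCHEMA_B": {"PERSON", "CITY"}},
both resolve_table(catalog, "PERSON", "SCHEMA_B") and
resolve_table(catalog, "SCHEMA_A.PERSON") select a specific PERSON table, while
an unqualified "PERSON" with no schema option is rejected as ambiguous.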



Re: Spark SQL Table Name Resolution

2018-08-09 Thread Stuart Macdonald
Hi Nikolay, yes would be happy to - will likely be early next week.
I’ll go with the approach of adding a new optional field to the Spark
data source provider unless there are any objections.

Stuart.



Re: Spark SQL Table Name Resolution

2018-08-09 Thread Nikolay Izhikov
Stuart, do you want to work on this ticket?




Re: Spark SQL Table Name Resolution

2018-08-07 Thread Stuart Macdonald
Thanks Val, here’s the ticket:

https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-9228


(Thanks for correcting my terminology - I work mostly with the traditional
CacheConfiguration interface where I believe each cache occupies its own
schema.)

Stuart.



Re: Spark SQL Table Name Resolution

2018-08-07 Thread Valentin Kulichenko
Stuart,

Two tables can have the same name only if they are located in different
schemas. That said, adding schema name support definitely makes sense to me.
We can implement this using either a separate SCHEMA_NAME parameter or
something similar to what you suggested in option 3, but with the schema name
instead of the cache name.

Please feel free to create a ticket.

-Val
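
To illustrate the point about schemas, the sketch below models a table registry
keyed by (schema, table): the same table name may appear in two schemas, but
not twice in one schema. This is a toy Python model only; the class
TableRegistry, its methods, and the schema and cache names are hypothetical and
do not correspond to any Ignite API.

```python
from typing import Dict, Tuple


class TableRegistry:
    """Hypothetical registry in which (schema, table) is the unique key."""

    def __init__(self) -> None:
        # Maps (schema, table) to the name of the cache backing that table.
        self._tables: Dict[Tuple[str, str], str] = {}

    def register(self, schema: str, table: str, cache: str) -> None:
        key = (schema, table)
        if key in self._tables:
            # Same name in the same schema is a clash.
            raise ValueError(f"table {schema}.{table} already exists")
        self._tables[key] = cache

    def cache_for(self, schema: str, table: str) -> str:
        # The schema name is what disambiguates equally named tables.
        return self._tables[(schema, table)]
```

Registering PERSON under both SCHEMA_A and SCHEMA_B succeeds and each lookup
returns its own cache, which is why a schema name (rather than a cache name) is
enough to identify a table unambiguously.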



Spark SQL Table Name Resolution

2018-08-07 Thread Stuart Macdonald
Hello Igniters,

The Ignite Spark SQL interface currently takes just “table name” as a
parameter which it uses to supply a Spark dataset with data from the
underlying Ignite SQL table with that name.

To do this it loops through each cache and finds the first one with the
given table name [1]. This causes issues if there are multiple tables
registered in different caches with the same table name as you can only
access one of those caches from Spark. Is the right thing to do here:

1. Simply not support such a scenario and note in the Spark documentation
that table names must be unique?
2. Pass an extra parameter through the Ignite Spark data source which
optionally specifies the cache name?
3. Support namespacing in the existing table name parameter, i.e.
“cacheName.tableName”?

Thanks,
Stuart.

[1]
https://github.com/apache/ignite/blob/ca973ad99c6112160a305df05be9458e29f88307/modules/spark/src/main/scala/org/apache/ignite/spark/impl/package.scala#L119
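
The shadowing problem described in this message can be reproduced with a small
stand-alone model. The Python sketch below is not the Scala code linked in [1];
it only imitates a "first cache that has this table name" lookup. The cache
names, the dictionary layout, and both function names are made up for
illustration.

```python
from typing import Dict, List, Optional

# Hypothetical model: cache name -> table names registered in that cache.
caches: Dict[str, List[str]] = {
    "personCacheA": ["PERSON"],
    "personCacheB": ["PERSON"],
    "cityCache": ["CITY"],
}


def first_cache_with_table(
    caches: Dict[str, List[str]], table_name: str
) -> Optional[str]:
    """Return the first cache (in iteration order) containing table_name."""
    for cache_name, tables in caches.items():
        if table_name in tables:
            return cache_name
    return None


def caches_with_table(caches: Dict[str, List[str]], table_name: str) -> List[str]:
    """Return every cache containing table_name, to expose duplicates."""
    return [name for name, tables in caches.items() if table_name in tables]
```

In this model a lookup for PERSON always lands on personCacheA, so the table in
personCacheB cannot be reached by table name alone, even though
caches_with_table shows that both caches hold a table with that name.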