[jira] [Updated] (SPARK-26407) For an external non-partitioned table, if add a directory named with k=v to the table path, select result will be wrong

2018-12-23 Thread Bao Yunz (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bao Yunz updated SPARK-26407:
-
Description: 
Scenario 1

Create an external non-partitioned table with schema (id, name), for example,
whose location directory contains a directory named "part=1" with some data in
it. Running desc on the table then shows that "part" has been added to the
table schema as a column. Inserting two-column data into the table throws an
exception saying that the target table has 3 columns but the inserted data has
2 columns.
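
The effect in Scenario 1 can be illustrated with a small, self-contained Python sketch. It is not Spark's code: `infer_partition_columns` and `effective_schema` are hypothetical helpers, and the `/warehouse/t` paths are made up. The sketch only assumes that `k=v` directory names below the table path are read as extra columns.

```python
from pathlib import PurePosixPath

def infer_partition_columns(table_path, file_paths):
    """Collect column names from k=v directory segments below table_path.

    Hypothetical helper: a simplified stand-in for path-based partition
    discovery, not Spark's actual implementation.
    """
    root = PurePosixPath(table_path)
    columns = []
    for file_path in file_paths:
        # Only look at directories between the table root and the file.
        relative = PurePosixPath(file_path).relative_to(root)
        for segment in relative.parts[:-1]:
            key, sep, _value = segment.partition("=")
            if sep and key and key not in columns:
                columns.append(key)
    return columns

def effective_schema(declared_columns, table_path, file_paths):
    """Declared columns plus any columns inferred from the directory layout."""
    inferred = infer_partition_columns(table_path, file_paths)
    return list(declared_columns) + [c for c in inferred if c not in declared_columns]

declared = ["id", "name"]
files = ["/warehouse/t/part=1/data-0.parquet"]
schema = effective_schema(declared, "/warehouse/t", files)
print(schema)  # ['id', 'name', 'part']

# A two-column row no longer matches the three-column schema.
row = (1, "a")
print(len(row) == len(schema))  # False
```

Under that assumption, the declared (id, name) schema grows to three columns as soon as a file appears under "part=1", which matches the column-count mismatch reported above.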

Scenario 2

Create an external non-partitioned table with schema (id, name), for example,
whose location path is empty. After several insert operations, add a directory
named "part=1" containing some data to the table location directory. Subsequent
insert and select operations show that the scan path has changed to
"tablePath/part=1", so the select returns a wrong result.
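
A toy model of the symptom in Scenario 2, again in plain Python. `scanned_files` is a hypothetical function that mimics the reported behavior (the scan narrowing to the `k=v` directory); it does not claim to reproduce Spark's real file listing, and the paths are invented for illustration.

```python
from pathlib import PurePosixPath

def is_partition_dir(name):
    """True for directory names shaped like k=v (hypothetical check)."""
    key, sep, value = name.partition("=")
    return bool(sep and key and value)

def under_partition_dir(root, path):
    """True if any directory between root and the file looks like k=v."""
    parents = PurePosixPath(path).relative_to(root).parts[:-1]
    return any(is_partition_dir(name) for name in parents)

def scanned_files(table_path, file_paths):
    """Toy model of the reported symptom, not Spark's real file listing.

    If any file sits under a k=v directory, only those files are scanned;
    files written directly under table_path are then skipped.
    """
    root = PurePosixPath(table_path)
    narrowed = [p for p in file_paths if under_partition_dir(root, p)]
    return narrowed or list(file_paths)

table = "/warehouse/t"
before = ["/warehouse/t/insert-0.parquet", "/warehouse/t/insert-1.parquet"]
after = before + ["/warehouse/t/part=1/extra.parquet"]

print(scanned_files(table, before))  # both inserted files
print(scanned_files(table, after))   # only the file under part=1
```

In this model, the rows written by the earlier inserts disappear from the select result once the "part=1" directory is added, which is the wrong result described above.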

The expected behavior is that, for a non-partitioned table, adding a
partition-like folder under tablePath changes neither its schema nor its
select results.
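
The expected behavior could be sketched as follows. This is one possible reading, not a proposed patch: `resolve_table` is a hypothetical function, and the sketch assumes that for a table declared without partition columns only files directly under the table path belong to the table.

```python
from pathlib import PurePosixPath

def resolve_table(declared_columns, partition_columns, table_path, file_paths):
    """Return (schema, files_to_scan) for a table.

    Hypothetical sketch of the requested behavior: when the table declares
    no partition columns, k=v directories are ignored for both the schema
    and the scan.
    """
    root = PurePosixPath(table_path)
    if partition_columns:
        # Partitioned tables are outside the scope of this sketch.
        raise NotImplementedError("only non-partitioned tables are modeled")
    # Assumption: only files directly under the table path belong to the table.
    files = [p for p in file_paths if PurePosixPath(p).parent == root]
    return list(declared_columns), files

files = ["/warehouse/t/insert-0.parquet", "/warehouse/t/part=1/extra.parquet"]
schema, to_scan = resolve_table(["id", "name"], [], "/warehouse/t", files)
print(schema)   # ['id', 'name']
print(to_scan)  # ['/warehouse/t/insert-0.parquet']
```

Here the declared schema and the set of scanned files are the same with or without the "part=1" directory.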

  was:
Scenario 1

Create an external non-partitioned table, in which location directory has a 
directory named with "part=1" and its schema is (id, name), for example. And 
there is some data in the "part=1" directory. Then desc the table, we will find 
the "part" is added in table scehma as table column. when insert into the table 
with two columns data, will throw a exception that  target table has 3 columns 
but the inserted data has 2 columns. 

Scenario 2

Create an external non-partitioned table, which location path is empty and its 
scema is (id, name), for example. After several times insert operation, we add 
a directory named with "part=1" in the table location directory.  And there is 
some data in the "part=1" directory.  Then do insert and select operation, we 
will find the scan path is changed to "tablePath/part=1",so that we will get a 
wrong result.

 The right logic should be that if a table is a non-partitioned table, adding a 
partition-like folder under tablePath should not change its schema and select 
result.


> For an external non-partitioned table, if add a directory named with k=v to 
> the table path, select result will be wrong
> ---
>
> Key: SPARK-26407
> URL: https://issues.apache.org/jira/browse/SPARK-26407
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Bao Yunz
> Priority: Major
> Labels: usability



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26407) For an external non-partitioned table, if add a directory named with k=v to the table path, select result will be wrong

2018-12-19 Thread Bao Yunz (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bao Yunz updated SPARK-26407:
-
Description: 
Scenario 1

Create an external non-partitioned table with schema (id, name), for example,
whose location directory contains a directory named "part=1" with some data in
it. Running desc on the table then shows that "part" has been added to the
table schema as a column. Inserting two-column data into the table throws an
exception saying that the target table has 3 columns but the inserted data has
2 columns.

Scenario 2

Create an external non-partitioned table with schema (id, name), for example,
whose location path is empty. After several insert operations, add a directory
named "part=1" containing some data to the table location directory. Subsequent
insert and select operations show that the scan path has changed to
"tablePath/part=1", so the select returns a wrong result.

The expected behavior is that, for a non-partitioned table, adding a
partition-like folder under tablePath changes neither its schema nor its
select results.

  was:
Scene 1

Create an external non-partition table, in which location directory has a 
directory named with "part=1", for example. Then desc the table, we will find 
the string "part" is showed in table column. when insert the table with data 
which has same column with target table , will throw a exception that target 
table has different column number with the inserted data. 

Scene 2

Create a external non-partition table, which location path is empty. After 
several times insert operation, we add a directory named with "part=1" in the 
table location directory. Then do insert and select operation, we will find the 
scan path is changed to "tablePath/part=1",so that we will get a wrong result.

 

It seems that the existing logic of spark will process this kind of table like 
a partition table. But when we do show partitions operation, it will throw the 
exception that the table is not partitioned, which is confusing。We believe that 
the normal logic should be that if a table is a non-partitioned table, the 
folder under tablePath should not change its basic properties.


> For an external non-partitioned table, if add a directory named with k=v to 
> the table path, select result will be wrong
> ---
>
> Key: SPARK-26407
> URL: https://issues.apache.org/jira/browse/SPARK-26407
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Bao Yunz
> Priority: Major
> Labels: usability






[jira] [Updated] (SPARK-26407) For an external non-partitioned table, if add a directory named with k=v to the table path, select result will be wrong

2018-12-19 Thread Bao Yunz (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bao Yunz updated SPARK-26407:
-
Affects Version/s: (was: 2.3.2)

> For an external non-partitioned table, if add a directory named with k=v to 
> the table path, select result will be wrong
> ---
>
> Key: SPARK-26407
> URL: https://issues.apache.org/jira/browse/SPARK-26407
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Bao Yunz
> Priority: Major
> Labels: usability
>
> Scene 1
> Create an external non-partitioned table whose location directory contains a
> directory named "part=1", for example. Running desc on the table then shows
> "part" as a table column. Inserting data that has the same columns as the
> target table throws an exception saying that the target table has a
> different number of columns from the inserted data.
> Scene 2
> Create an external non-partitioned table whose location path is empty. After
> several insert operations, add a directory named "part=1" to the table
> location directory. Subsequent insert and select operations show that the
> scan path has changed to "tablePath/part=1", so the select returns a wrong
> result.
>
> It seems that Spark's existing logic treats this kind of table as a
> partitioned table. But a show partitions operation throws an exception
> saying that the table is not partitioned, which is confusing. We believe the
> correct behavior is that, for a non-partitioned table, a folder under
> tablePath should not change the table's basic properties.






[jira] [Updated] (SPARK-26407) For an external non-partitioned table, if add a directory named with k=v to the table path, select result will be wrong

2018-12-19 Thread Bao Yunz (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bao Yunz updated SPARK-26407:
-
Description: 
Scene 1

Create an external non-partitioned table whose location directory contains a
directory named "part=1", for example. Running desc on the table then shows
"part" as a table column. Inserting data that has the same columns as the
target table throws an exception saying that the target table has a different
number of columns from the inserted data.

Scene 2

Create an external non-partitioned table whose location path is empty. After
several insert operations, add a directory named "part=1" to the table
location directory. Subsequent insert and select operations show that the scan
path has changed to "tablePath/part=1", so the select returns a wrong result.

It seems that Spark's existing logic treats this kind of table as a
partitioned table. But a show partitions operation throws an exception saying
that the table is not partitioned, which is confusing. We believe the correct
behavior is that, for a non-partitioned table, a folder under tablePath should
not change the table's basic properties.

  was:
Scene 1

Create a external non-partition table, in which location directory has a 
directory named with "part=1", for example. Then desc the table, we will find 
the string "part" is showed in table column. when insert the table with data 
which has same column with target table , will throw a exception that target 
table has different column number with the inserted data. 

Scene 2

Create a external non-partition table, which location path is empty. After 
several times insert operation, we add a directory named with "part=1" in the 
table location directory. Then do insert and select operation, we will find the 
scan path is changed to "tablePath/part=1",so that we will get a wrong result.

 

It seems that the existing logic of spark will process this kind of table like 
a partition table. But when we do show partitions operation, it will throw the 
exception that the table is not partitioned, which is confusing。We believe that 
the normal logic should be that if a table is a non-partitioned table, the 
folder under tablePath should not change its basic properties.


> For an external non-partitioned table, if add a directory named with k=v to 
> the table path, select result will be wrong
> ---
>
> Key: SPARK-26407
> URL: https://issues.apache.org/jira/browse/SPARK-26407
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Bao Yunz
> Priority: Major
> Labels: usability






[jira] [Updated] (SPARK-26407) For an external non-partitioned table, if add a directory named with k=v to the table path, select result will be wrong

2018-12-19 Thread Bao Yunz (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bao Yunz updated SPARK-26407:
-
Summary: For an external non-partitioned table, if add a directory named 
with k=v to the table path, select result will be wrong  (was: For an external 
non-partition table, if add a directory named with k=v to the table path, 
select result will be wrong)

> For an external non-partitioned table, if add a directory named with k=v to 
> the table path, select result will be wrong
> ---
>
> Key: SPARK-26407
> URL: https://issues.apache.org/jira/browse/SPARK-26407
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Bao Yunz
> Priority: Major
> Labels: usability
>
> Scene 1
> Create an external non-partitioned table whose location directory contains a
> directory named "part=1", for example. Running desc on the table then shows
> "part" as a table column. Inserting data that has the same columns as the
> target table throws an exception saying that the target table has a
> different number of columns from the inserted data.
> Scene 2
> Create an external non-partitioned table whose location path is empty. After
> several insert operations, add a directory named "part=1" to the table
> location directory. Subsequent insert and select operations show that the
> scan path has changed to "tablePath/part=1", so the select returns a wrong
> result.
>
> It seems that Spark's existing logic treats this kind of table as a
> partitioned table. But a show partitions operation throws an exception
> saying that the table is not partitioned, which is confusing. We believe the
> correct behavior is that, for a non-partitioned table, a folder under
> tablePath should not change the table's basic properties.


