[jira] [Commented] (SPARK-32560) improve exception message

2020-08-06 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172791#comment-17172791
 ] 

philipse commented on SPARK-32560:
--

 

Thanks [~maropu] for your notice. Will improve it in the future. ;)

> improve exception message
> -
>
> Key: SPARK-32560
> URL: https://issues.apache.org/jira/browse/SPARK-32560
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
> Attachments: exception.png
>
>
> Exception messages lack single quotes; we can improve them to keep the style
> consistent.
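For readers skimming the thread: the proposed change is cosmetic, wrapping identifiers in single quotes so all messages read the same way. A minimal sketch of the convention (the function and message text here are illustrative, not Spark's actual code):

```python
def type_mismatch_error(column, expected, actual):
    # Wrap the identifier in single quotes, e.g. 'age', so every
    # exception message follows the same quoting style.
    return (f"cannot resolve '{column}' due to data type mismatch: "
            f"expected {expected}, found {actual}")

print(type_mismatch_error("age", "int", "string"))
```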



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32560) improve exception message

2020-08-06 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32560:
-
Description: Exception messages lack single quotes; we can improve them to keep
the style consistent.  (was: Exception messages lack single quotes; we can
improve them to keep the style consistent.

!image-2020-08-07-08-32-35-808.png!)

> improve exception message
> -
>
> Key: SPARK-32560
> URL: https://issues.apache.org/jira/browse/SPARK-32560
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
> Attachments: exception.png
>
>
> Exception messages lack single quotes; we can improve them to keep the style
> consistent.






[jira] [Updated] (SPARK-32560) improve exception message

2020-08-06 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32560:
-
Attachment: exception.png

> improve exception message
> -
>
> Key: SPARK-32560
> URL: https://issues.apache.org/jira/browse/SPARK-32560
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
> Attachments: exception.png
>
>
> Exception messages lack single quotes; we can improve them to keep the style
> consistent.
> !image-2020-08-07-08-32-35-808.png!






[jira] [Updated] (SPARK-32560) improve exception message

2020-08-06 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32560:
-
Description: 
Exception messages lack single quotes; we can improve them to keep the style
consistent.

!image-2020-08-07-08-32-35-808.png!

  was: Exception messages have extra single quotes; we can improve them.


> improve exception message
> -
>
> Key: SPARK-32560
> URL: https://issues.apache.org/jira/browse/SPARK-32560
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Exception messages lack single quotes; we can improve them to keep the style
> consistent.
> !image-2020-08-07-08-32-35-808.png!






[jira] [Created] (SPARK-32560) improve exception message

2020-08-06 Thread philipse (Jira)
philipse created SPARK-32560:


 Summary: improve exception message
 Key: SPARK-32560
 URL: https://issues.apache.org/jira/browse/SPARK-32560
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: philipse


Exception messages have extra single quotes; we can improve them.






[jira] [Issue Comment Deleted] (SPARK-24194) HadoopFsRelation cannot overwrite a path that is also being read from

2020-07-27 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-24194:
-
Comment: was deleted

(was: Hi,

Is the issue closed? Can I try it in a production environment?

Thanks)

> HadoopFsRelation cannot overwrite a path that is also being read from
> -
>
> Key: SPARK-24194
> URL: https://issues.apache.org/jira/browse/SPARK-24194
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
> Environment: spark master
>Reporter: yangz
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When
> {code:java}
> INSERT OVERWRITE TABLE territory_count_compare select * from 
> territory_count_compare where shop_count!=real_shop_count
> {code}
> And territory_count_compare is a table stored as Parquet, there will be an
> error: Cannot overwrite a path that is also being read from.
>  
> And in the file MetastoreDataSourceSuite.scala, there is a test case:
>  
>  
> {code:java}
> table(tableName).write.mode(SaveMode.Overwrite).insertInto(tableName)
> {code}
>  
> But when the table territory_count_compare is a common Hive table, there is
> no error.
> So I think the reason is that when inserting overwrite into a HadoopFsRelation
> with a static partition, Spark first deletes the partition in the output. But
> the deletion should happen only when the job is committed.
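The root cause described above, deleting output at the start instead of at job commit, can be illustrated outside Spark with a plain-Python sketch of the stage-then-swap pattern: write the new output to a temporary location and replace the destination only once the write succeeds, so the old data stays readable throughout. All file names here are illustrative.

```python
import os
import tempfile

def overwrite_safely(path, produce_lines):
    """Write new content to a temp file, then swap it over `path` on success.

    Mirrors the commit-time-swap idea: the original file stays intact and
    readable until the new output is fully written.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            # produce_lines may still read the old file while we write here
            for line in produce_lines():
                f.write(line + "\n")
        os.replace(tmp_path, path)  # atomic swap: the "job commit"
    except BaseException:
        os.remove(tmp_path)  # failed "job": old data is untouched
        raise

# usage: filter a file in place, reading and overwriting the same path
with open("counts.txt", "w") as f:
    f.write("3\n5\n5\n")

def keep_mismatches():
    with open("counts.txt") as f:  # reads the path being overwritten
        return [line.strip() for line in f if line.strip() != "5"]

overwrite_safely("counts.txt", keep_mismatches)
print(open("counts.txt").read())  # counts.txt now holds just the "3" line
```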






[jira] [Resolved] (SPARK-32324) Fix error messages during using PIVOT and lateral view

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse resolved SPARK-32324.
--
Resolution: Not A Problem

> Fix error messages during using PIVOT and lateral view
> --
>
> Key: SPARK-32324
> URL: https://issues.apache.org/jira/browse/SPARK-32324
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Currently, when we use `lateral view` and `pivot` together in a FROM clause,
> if `lateral view` is before `pivot`, the error message is "LATERAL cannot be
> used together with PIVOT in FROM clause". If `lateral view` is after `pivot`,
> the query runs normally, so the error message "LATERAL cannot be used
> together with PIVOT in FROM clause" is not accurate; we may improve it.
>  
> Steps to reproduce:
> {code:java}
> CREATE TABLE person (id INT, name STRING, age INT, class int, address STRING);
>  INSERT INTO person VALUES
>  (100, 'John', 30, 1, 'Street 1'),
>  (200, 'Mary', NULL, 1, 'Street 2'),
>  (300, 'Mike', 80, 3, 'Street 3'),
>  (400, 'Dan', 50, 4, 'Street 4');
> {code}
>  
> Query1:
>  
> {code:java}
> SELECT * FROM person
>  lateral view outer explode(array(30,60)) tabelName as c_age
>  lateral view explode(array(40,80)) as d_age
>  PIVOT (
>  count(distinct age) as a
>  for name in ('Mary','John')
>  )
> {code}
> Result 1:
>  
> {code:java}
> Error: org.apache.spark.sql.catalyst.parser.ParseException: 
>  LATERAL cannot be used together with PIVOT in FROM clause(line 1, pos 9)
> == SQL ==
>  SELECT * FROM person
>  -^^^
>  lateral view outer explode(array(30,60)) tabelName as c_age
>  lateral view explode(array(40,80)) as d_age
>  PIVOT (
>  count(distinct age) as a
>  for name in ('Mary','John')
>  ) (state=,code=0)
> {code}
>  
>  
> Query2:
>  
> {code:java}
> SELECT * FROM person
>  PIVOT (
>  count(distinct age) as a
>  for name in ('Mary','John')
>  )
>  lateral view outer explode(array(30,60)) tabelName as c_age
>  lateral view explode(array(40,80)) as d_age
> {code}
>  
> Result 2:
> +-----+------+------+-------+-------+
> | id  | Mary | John | c_age | d_age |
> +-----+------+------+-------+-------+
> | 300 | NULL | NULL | 30    | 40    |
> | 300 | NULL | NULL | 30    | 80    |
> | 300 | NULL | NULL | 60    | 40    |
> | 300 | NULL | NULL | 60    | 80    |
> | 100 | 0    | NULL | 30    | 40    |
> | 100 | 0    | NULL | 30    | 80    |
> | 100 | 0    | NULL | 60    | 40    |
> | 100 | 0    | NULL | 60    | 80    |
> | 400 | NULL | NULL | 30    | 40    |
> | 400 | NULL | NULL | 30    | 80    |
> | 400 | NULL | NULL | 60    | 40    |
> | 400 | NULL | NULL | 60    | 80    |
> | 200 | NULL | 1    | 30    | 40    |
> | 200 | NULL | 1    | 30    | 80    |
> | 200 | NULL | 1    | 60    | 40    |
> | 200 | NULL | 1    | 60    | 80    |
> +-----+------+------+-------+-------+
>






[jira] [Resolved] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse resolved SPARK-32358.
--
Resolution: Won't Fix

> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
> working. Please correct me if I am missing something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
>  from pyspark.sql import Row
>  spark=SparkSession\
>  .builder \
>  .appName('scenary_address_1') \
>  .enableHiveSupport() \
>  .getOrCreate()
>  
> address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
>  print("create dataframe finished")
>  address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
>  print(spark.read.table('scenery_address_test1').dtypes)
>  spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +---+----+----+
> | a | b  | c  |
> +---+----+----+
> | 1 | 难 | 80 |
> | 2 | v  | 81 |
> +---+----+----+
> ```
>  
> But in 2.4.5 I only get the following, without the result rows showing:
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]






[jira] [Resolved] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse resolved SPARK-32358.
--
Resolution: Fixed

> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
> working. Please correct me if I am missing something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
>  from pyspark.sql import Row
>  spark=SparkSession\
>  .builder \
>  .appName('scenary_address_1') \
>  .enableHiveSupport() \
>  .getOrCreate()
>  
> address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
>  print("create dataframe finished")
>  address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
>  print(spark.read.table('scenery_address_test1').dtypes)
>  spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +---+----+----+
> | a | b  | c  |
> +---+----+----+
> | 1 | 难 | 80 |
> | 2 | v  | 81 |
> +---+----+----+
> ```
>  
> But in 2.4.5 I only get the following, without the result rows showing:
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]






[jira] [Reopened] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-20 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse reopened SPARK-32358:
--

> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
> working. Please correct me if I am missing something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
>  from pyspark.sql import Row
>  spark=SparkSession\
>  .builder \
>  .appName('scenary_address_1') \
>  .enableHiveSupport() \
>  .getOrCreate()
>  
> address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
>  print("create dataframe finished")
>  address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
>  print(spark.read.table('scenery_address_test1').dtypes)
>  spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +---+----+----+
> | a | b  | c  |
> +---+----+----+
> | 1 | 难 | 80 |
> | 2 | v  | 81 |
> +---+----+----+
> ```
>  
> But in 2.4.5 I only get the following, without the result rows showing:
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]






[jira] [Updated] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-19 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32358:
-
Description: 
After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
working. Please correct me if I am missing something. Thanks!

Steps to reproduce:

```

from pyspark.sql import SparkSession
 from pyspark.sql import Row
 spark=SparkSession\
 .builder \
 .appName('scenary_address_1') \
 .enableHiveSupport() \
 .getOrCreate()
 
address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
 print("create dataframe finished")
 address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
 print(spark.read.table('scenery_address_test1').dtypes)
 spark.sql("select * from scenery_address_test1").show()

```

 

In Spark 2.3.3 I can easily get the following result:

```

create dataframe finished
 [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
+---+----+----+
| a | b  | c  |
+---+----+----+
| 1 | 难 | 80 |
| 2 | v  | 81 |
+---+----+----+

```

 

But in 2.4.5 I only get the following, without the result rows showing:

create dataframe finished
 [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]

  was:
After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
working. Please correct me if I am missing something. Thanks!

Steps to reproduce:

```

from pyspark.sql import SparkSession
 from pyspark.sql import Row
 spark=SparkSession\
 .builder \
 .appName('scenary_address_1') \
 .enableHiveSupport() \
 .getOrCreate()
 
address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
 print("create dataframe finished")
 address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
 print(spark.read.table('scenery_address_test1').dtypes)
 spark.sql("select * from scenery_address_test1").show()

```

 

In Spark 2.3.3 I can easily get the following result:

```

create dataframe finished
 [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
+---+----+----+
| a | b  | c  |
+---+----+----+
| 1 | 难 | 80 |
| 2 | v  | 81 |
+---+----+----+

```

 

But in 2.4.5 I can only get:

create dataframe finished
 [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]


> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
> working. Please correct me if I am missing something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
>  from pyspark.sql import Row
>  spark=SparkSession\
>  .builder \
>  .appName('scenary_address_1') \
>  .enableHiveSupport() \
>  .getOrCreate()
>  
> address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
>  print("create dataframe finished")
>  address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
>  print(spark.read.table('scenery_address_test1').dtypes)
>  spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +---+----+----+
> | a | b  | c  |
> +---+----+----+
> | 1 | 难 | 80 |
> | 2 | v  | 81 |
> +---+----+----+
> ```
>  
> But in 2.4.5 I only get the following, without the result rows showing:
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]






[jira] [Updated] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-19 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32358:
-
Description: 
After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
working. Please correct me if I am missing something. Thanks!

Steps to reproduce:

```

from pyspark.sql import SparkSession
 from pyspark.sql import Row
 spark=SparkSession\
 .builder \
 .appName('scenary_address_1') \
 .enableHiveSupport() \
 .getOrCreate()
 
address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
 print("create dataframe finished")
 address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
 print(spark.read.table('scenery_address_test1').dtypes)
 spark.sql("select * from scenery_address_test1").show()

```

 

In Spark 2.3.3 I can easily get the following result:

```

create dataframe finished
 [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
+---+----+----+
| a | b  | c  |
+---+----+----+
| 1 | 难 | 80 |
| 2 | v  | 81 |
+---+----+----+

```

 

But in 2.4.5 I can only get:

create dataframe finished
 [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]

  was:
After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
working. I am not sure if I am missing something.

Steps to reproduce:

```

from pyspark.sql import SparkSession
from pyspark.sql import Row
spark=SparkSession\
.builder \
.appName('scenary_address_1') \
.enableHiveSupport() \
.getOrCreate()
address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
print("create dataframe finished")
address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
print(spark.read.table('scenery_address_test1').dtypes)
spark.sql("select * from scenery_address_test1").show()

```

 

In Spark 2.3.3 I can easily get the following result:

```

create dataframe finished
[('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
+---+---+---+
| a| b| c|
+---+---+---+
| 1| 难| 80|
| 2| v| 81|
+---+---+---+

```

 

But in 2.4.5 I can only get:

create dataframe finished
[('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]


> temp view not working after upgrading from 2.3.3 to 2.4.5
> -
>
> Key: SPARK-32358
> URL: https://issues.apache.org/jira/browse/SPARK-32358
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Major
>
> After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
> working. Please correct me if I am missing something. Thanks!
> Steps to reproduce:
> ```
> from pyspark.sql import SparkSession
>  from pyspark.sql import Row
>  spark=SparkSession\
>  .builder \
>  .appName('scenary_address_1') \
>  .enableHiveSupport() \
>  .getOrCreate()
>  
> address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
>  print("create dataframe finished")
>  address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
>  print(spark.read.table('scenery_address_test1').dtypes)
>  spark.sql("select * from scenery_address_test1").show()
> ```
>  
> In Spark 2.3.3 I can easily get the following result:
> ```
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
> +---+----+----+
> | a | b  | c  |
> +---+----+----+
> | 1 | 难 | 80 |
> | 2 | v  | 81 |
> +---+----+----+
> ```
>  
> But in 2.4.5 I can only get:
> create dataframe finished
>  [('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]






[jira] [Created] (SPARK-32358) temp view not working after upgrading from 2.3.3 to 2.4.5

2020-07-19 Thread philipse (Jira)
philipse created SPARK-32358:


 Summary: temp view not working after upgrading from 2.3.3 to 2.4.5
 Key: SPARK-32358
 URL: https://issues.apache.org/jira/browse/SPARK-32358
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.5
Reporter: philipse


After upgrading from 2.3.3 to Spark 2.4.5, the temp view seems not to be
working. I am not sure if I am missing something.

Steps to reproduce:

```

from pyspark.sql import SparkSession
from pyspark.sql import Row
spark=SparkSession\
.builder \
.appName('scenary_address_1') \
.enableHiveSupport() \
.getOrCreate()
address_tok_result_df=spark.createDataFrame([Row(a=1,b='难',c=80),Row(a=2,b='v',c=81)])
print("create dataframe finished")
address_tok_result_df.createOrReplaceTempView("scenery_address_test1")
print(spark.read.table('scenery_address_test1').dtypes)
spark.sql("select * from scenery_address_test1").show()

```

 

In Spark 2.3.3 I can easily get the following result:

```

create dataframe finished
[('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]
+---+---+---+
| a| b| c|
+---+---+---+
| 1| 难| 80|
| 2| v| 81|
+---+---+---+

```

 

But in 2.4.5 I can only get:

create dataframe finished
[('a', 'bigint'), ('b', 'string'), ('c', 'bigint')]






[jira] [Updated] (SPARK-32324) Fix error messages during using PIVOT and lateral view

2020-07-15 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32324:
-
Description: 
Currently, when we use `lateral view` and `pivot` together in a FROM clause, if
`lateral view` is before `pivot`, the error message is "LATERAL cannot be used
together with PIVOT in FROM clause". If `lateral view` is after `pivot`, the
query runs normally, so the error message "LATERAL cannot be used together
with PIVOT in FROM clause" is not accurate; we may improve it.

 

Steps to reproduce:
{code:java}
CREATE TABLE person (id INT, name STRING, age INT, class int, address STRING);
 INSERT INTO person VALUES
 (100, 'John', 30, 1, 'Street 1'),
 (200, 'Mary', NULL, 1, 'Street 2'),
 (300, 'Mike', 80, 3, 'Street 3'),
 (400, 'Dan', 50, 4, 'Street 4');
{code}
 

Query1:

 
{code:java}
SELECT * FROM person
 lateral view outer explode(array(30,60)) tabelName as c_age
 lateral view explode(array(40,80)) as d_age
 PIVOT (
 count(distinct age) as a
 for name in ('Mary','John')
 )
{code}
Result 1:

 
{code:java}
Error: org.apache.spark.sql.catalyst.parser.ParseException: 
 LATERAL cannot be used together with PIVOT in FROM clause(line 1, pos 9)
== SQL ==
 SELECT * FROM person
 -^^^
 lateral view outer explode(array(30,60)) tabelName as c_age
 lateral view explode(array(40,80)) as d_age
 PIVOT (
 count(distinct age) as a
 for name in ('Mary','John')
 ) (state=,code=0)
{code}
 

 

Query2:

 
{code:java}
SELECT * FROM person
 PIVOT (
 count(distinct age) as a
 for name in ('Mary','John')
 )
 lateral view outer explode(array(30,60)) tabelName as c_age
 lateral view explode(array(40,80)) as d_age
{code}
 

Result 2:

+-----+------+------+-------+-------+
| id  | Mary | John | c_age | d_age |
+-----+------+------+-------+-------+
| 300 | NULL | NULL | 30    | 40    |
| 300 | NULL | NULL | 30    | 80    |
| 300 | NULL | NULL | 60    | 40    |
| 300 | NULL | NULL | 60    | 80    |
| 100 | 0    | NULL | 30    | 40    |
| 100 | 0    | NULL | 30    | 80    |
| 100 | 0    | NULL | 60    | 40    |
| 100 | 0    | NULL | 60    | 80    |
| 400 | NULL | NULL | 30    | 40    |
| 400 | NULL | NULL | 30    | 80    |
| 400 | NULL | NULL | 60    | 40    |
| 400 | NULL | NULL | 60    | 80    |
| 200 | NULL | 1    | 30    | 40    |
| 200 | NULL | 1    | 30    | 80    |
| 200 | NULL | 1    | 60    | 40    |
| 200 | NULL | 1    | 60    | 80    |
+-----+------+------+-------+-------+

 

  was:
Currently, when we use `lateral view` and `pivot` together in a FROM clause, if
`lateral view` is before `pivot`, the error message is "LATERAL cannot be used
together with PIVOT in FROM clause". If `lateral view` is after `pivot`, the
query runs normally, so the error message "LATERAL cannot be used together
with PIVOT in FROM clause" is not accurate; we may improve it.

 

Steps to reproduce:

```

CREATE TABLE person (id INT, name STRING, age INT, class int, address STRING);
INSERT INTO person VALUES
(100, 'John', 30, 1, 'Street 1'),
(200, 'Mary', NULL, 1, 'Street 2'),
(300, 'Mike', 80, 3, 'Street 3'),
(400, 'Dan', 50, 4, 'Street 4');

```

Query1:

```

SELECT * FROM person
lateral view outer explode(array(30,60)) tabelName as c_age
lateral view explode(array(40,80)) as d_age
PIVOT (
 count(distinct age) as a
for name in ('Mary','John')
)

```

Result 1:

```

Error: org.apache.spark.sql.catalyst.parser.ParseException: 
LATERAL cannot be used together with PIVOT in FROM clause(line 1, pos 9)

== SQL ==
SELECT * FROM person
-^^^
lateral view outer explode(array(30,60)) tabelName as c_age
lateral view explode(array(40,80)) as d_age
PIVOT (
 count(distinct age) as a
for name in ('Mary','John')
) (state=,code=0)

```

 

Query2:

```

SELECT * FROM person
PIVOT (
 count(distinct age) as a
for name in ('Mary','John')
)
lateral view outer explode(array(30,60)) tabelName as c_age
lateral view explode(array(40,80)) as d_age

```

Result 2:

```

+--+---+---+++
| id | Mary | John | c_age | d_age |
+--+---+---+++
| 300 | NULL | NULL | 30 | 40 |
| 300 | NULL | NULL | 30 | 80 |
| 300 | NULL | NULL | 60 | 40 |
| 300 | NULL | NULL | 60 | 80 |
| 100 | 0 | NULL | 30 | 40 |
| 100 | 0 | NULL | 30 | 80 |
| 100 | 0 | NULL | 60 | 40 |
| 100 | 0 | NULL | 60 | 80 |
| 400 | NULL | NULL | 30 | 40 |
| 400 | NULL | NULL | 30 | 80 |
| 400 | NULL | NULL | 60 | 40 |
| 400 | NULL | NULL | 60 | 80 |
| 200 | NULL | 1 | 30 | 40 |
| 200 | NULL | 1 | 30 | 80 |
| 200 | NULL | 1 | 60 | 40 |
| 200 | NULL | 1 | 60 | 80 |
+--+---+---+++

```

 


> Fix error messages during using PIVOT and lateral view
> --
>
> Key: SPARK-32324
> URL: https://issues.apache.org/jira/browse/SPARK-32324
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Currently, when we use `lateral view` and `pivot` together in a FROM clause,
> if `lateral view` is before `pivot`, the error message is "LATERAL cannot be
> used together with PIVOT in FROM clause".

[jira] [Created] (SPARK-32324) Fix error messages during using PIVOT and lateral view

2020-07-15 Thread philipse (Jira)
philipse created SPARK-32324:


 Summary: Fix error messages during using PIVOT and lateral view
 Key: SPARK-32324
 URL: https://issues.apache.org/jira/browse/SPARK-32324
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: philipse


Currently, when we use `lateral view` and `pivot` together in a FROM clause, if
`lateral view` is before `pivot`, the error message is "LATERAL cannot be used
together with PIVOT in FROM clause". If `lateral view` is after `pivot`, the
query runs normally, so the error message "LATERAL cannot be used together
with PIVOT in FROM clause" is not accurate; we may improve it.

 

Steps to reproduce:

```

CREATE TABLE person (id INT, name STRING, age INT, class int, address STRING);
INSERT INTO person VALUES
(100, 'John', 30, 1, 'Street 1'),
(200, 'Mary', NULL, 1, 'Street 2'),
(300, 'Mike', 80, 3, 'Street 3'),
(400, 'Dan', 50, 4, 'Street 4');

```

Query1:

```

SELECT * FROM person
lateral view outer explode(array(30,60)) tabelName as c_age
lateral view explode(array(40,80)) as d_age
PIVOT (
 count(distinct age) as a
for name in ('Mary','John')
)

```

Result 1:

```

Error: org.apache.spark.sql.catalyst.parser.ParseException: 
LATERAL cannot be used together with PIVOT in FROM clause(line 1, pos 9)

== SQL ==
SELECT * FROM person
-^^^
lateral view outer explode(array(30,60)) tabelName as c_age
lateral view explode(array(40,80)) as d_age
PIVOT (
 count(distinct age) as a
for name in ('Mary','John')
) (state=,code=0)

```

 

Query2:

```

SELECT * FROM person
PIVOT (
 count(distinct age) as a
for name in ('Mary','John')
)
lateral view outer explode(array(30,60)) tabelName as c_age
lateral view explode(array(40,80)) as d_age

```

Result 2:

```

+--+---+---+++
| id | Mary | John | c_age | d_age |
+--+---+---+++
| 300 | NULL | NULL | 30 | 40 |
| 300 | NULL | NULL | 30 | 80 |
| 300 | NULL | NULL | 60 | 40 |
| 300 | NULL | NULL | 60 | 80 |
| 100 | 0 | NULL | 30 | 40 |
| 100 | 0 | NULL | 30 | 80 |
| 100 | 0 | NULL | 60 | 40 |
| 100 | 0 | NULL | 60 | 80 |
| 400 | NULL | NULL | 30 | 40 |
| 400 | NULL | NULL | 30 | 80 |
| 400 | NULL | NULL | 60 | 40 |
| 400 | NULL | NULL | 60 | 80 |
| 200 | NULL | 1 | 30 | 40 |
| 200 | NULL | 1 | 30 | 80 |
| 200 | NULL | 1 | 60 | 40 |
| 200 | NULL | 1 | 60 | 80 |
+--+---+---+++

```
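A sketch of how the reworded error could state the ordering constraint explicitly, instead of claiming the combination is forbidden outright (the message text and function are hypothetical, not the actual parser change):

```python
def lateral_pivot_error(lateral_first):
    # Hypothetical reworded message: LATERAL VIEW and PIVOT can coexist,
    # but only when PIVOT comes first in the FROM clause.
    if lateral_first:
        return ("LATERAL VIEW must appear after PIVOT in the FROM clause; "
                "move the PIVOT clause before the lateral views")
    return None  # PIVOT first: the query parses normally

print(lateral_pivot_error(lateral_first=True))
```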

 






[jira] [Resolved] (SPARK-32239) remove duplicate datatype

2020-07-09 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse resolved SPARK-32239.
--
Resolution: Won't Fix

> remove duplicate datatype
> -
>
> Key: SPARK-32239
> URL: https://issues.apache.org/jira/browse/SPARK-32239
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> remove duplicate datatype to improve code quality






[jira] [Created] (SPARK-32239) remove duplicate datatype

2020-07-09 Thread philipse (Jira)
philipse created SPARK-32239:


 Summary: remove duplicate datatype
 Key: SPARK-32239
 URL: https://issues.apache.org/jira/browse/SPARK-32239
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: philipse


remove duplicate datatype to improve code quality






[jira] [Updated] (SPARK-32193) update docs on regexp function

2020-07-08 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32193:
-
Description: 
Spark SQL supports the following usage; we may update the docs to make it known to 
more users:
{code:java}
 select 'abc'  REGEXP '([a-z]+)';{code}
 

 

  was:Hive support regexp function, Spark sql use `rlike` instead of `regexp` , 
we can update the docs to make it known to more users.


> update  docs on regexp function
> ---
>
> Key: SPARK-32193
> URL: https://issues.apache.org/jira/browse/SPARK-32193
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Spark SQL supports the following usage; we may update the docs to make it known 
> to more users:
> {code:java}
>  select 'abc'  REGEXP '([a-z]+)';{code}
>  
>  






[jira] [Updated] (SPARK-32193) update docs on regexp function

2020-07-08 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32193:
-
Summary: update  docs on regexp function  (was: update migrate guide  docs 
on regexp function)

> update  docs on regexp function
> ---
>
> Key: SPARK-32193
> URL: https://issues.apache.org/jira/browse/SPARK-32193
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: philipse
>Priority: Minor
>
> Hive supports the regexp function, while Spark SQL uses `rlike` instead of `regexp`; 
> we can update the docs to make this known to more users.






[jira] [Created] (SPARK-32193) update migrate guide docs on regexp function

2020-07-06 Thread philipse (Jira)
philipse created SPARK-32193:


 Summary: update migrate guide  docs on regexp function
 Key: SPARK-32193
 URL: https://issues.apache.org/jira/browse/SPARK-32193
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: philipse


Hive supports the regexp function, while Spark SQL uses `rlike` instead of `regexp`; 
we can update the docs to make this known to more users.
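For readers coming from Hive, the equivalence can be illustrated with an ordinary regex search. This is a sketch only: Spark actually uses Java's regex engine, which differs from Python's `re` in some corner cases.

```python
import re

def rlike(value: str, pattern: str) -> bool:
    """Model of SQL RLIKE / Hive REGEXP: true if the pattern
    matches anywhere in the string (a partial match)."""
    return re.search(pattern, value) is not None

# Equivalent to: select 'abc' REGEXP '([a-z]+)';
print(rlike("abc", "([a-z]+)"))  # True
print(rlike("123", "([a-z]+)"))  # False
```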






[jira] [Updated] (SPARK-32131) union and set operations have wrong exception information

2020-06-29 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-32131:
-
Description: 
Union and set operations can only be performed on tables with compatible column 
types. When there are more than two columns, however, the warning message reports 
the wrong column index. Steps to reproduce:

Step 1: prepare test data
{code:java}
drop table if exists test1; 
drop table if exists test2; 
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01'; 
insert into test3 select 1,3,4;
{code}
Step 2: run the queries:
{code:java}
Query1:
select * from test1 except select * from test2;
Result1:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. timestamp <> int at the second column 
of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] 
+- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- 
HiveTableRelation `default`.`test2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] 
(state=,code=0)
Query2:
select * from test1 except select * from test3;
Result2:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the 2th column of 
the second table;; 'Except false :- Project [id#632, age#633, name#634] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#632, age#633, name#634] 
+- Project [id#635, age#636, name#637] +- SubqueryAlias `default`.`test3` +- 
HiveTableRelation `default`.`test3`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#635, age#636, name#637] 
(state=,code=0)
{code}
The result of query1 is correct, while query2 reports the wrong error; it should 
be the third column.

The message contains the wrong column index:

+Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the *2th* column 
of the second table+

We may need to change it to the following:

+Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the *third* column 
of the second table+

  was:
Union and set operations can only be performed on tables with the compatible 
column types,while when we have more than two column, the warning messages will 
have wrong column index.Steps to reproduce.

Step1:prepare test data
{code:java}
drop table if exists test1; 
drop table if exists test2; 
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01'; 
insert into test3 select 1,3,4;
{code}
Step2:do query:
{code:java}
Query1:
select * from test1 except select * from test2;
Result1:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. timestamp <> int at the second column 
of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] 
+- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- 
HiveTableRelation `default`.`test2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] 
(state=,code=0)
Query2:
select * from test1 except select * from test3;
Result2:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the 2th column of 
the second table;; 'Except false :- Project [id#632, age#633, name#634] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#632, age#633, name#634] 
+- Project [id#635, age#636, name#637] +- SubqueryAlias `default`.`test3` +- 
HiveTableRelation `default`.`test3`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#635, age#636, name#637] 
(state=,code=0)
{code}
the result of query1 is correct, while query2 have the wrong errors,it should 
be the third column

[jira] [Created] (SPARK-32131) union and set operations have wrong exception information

2020-06-29 Thread philipse (Jira)
philipse created SPARK-32131:


 Summary: union and set operations have wrong exception information
 Key: SPARK-32131
 URL: https://issues.apache.org/jira/browse/SPARK-32131
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: philipse


Union and set operations can only be performed on tables with compatible column 
types. When there are more than two columns, however, the warning message reports 
the wrong column index. Steps to reproduce:

Step 1: prepare test data
{code:java}
drop table if exists test1; 
drop table if exists test2; 
drop table if exists test3;
create table if not exists test1(id int, age int, name timestamp);
create table if not exists test2(id int, age timestamp, name timestamp);
create table if not exists test3(id int, age int, name int);
insert into test1 select 1,2,'2020-01-01 01:01:01';
insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01'; 
insert into test3 select 1,3,4;
{code}
Step 2: run the queries:
{code:java}
Query1:
select * from test1 except select * from test2;
Result1:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. timestamp <> int at the second column 
of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] 
+- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- 
HiveTableRelation `default`.`test2`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] 
(state=,code=0)
Query2:
select * from test1 except select * from test3;
Result2:
Error: org.apache.spark.sql.AnalysisException: Except can only be performed on 
tables with the compatible column types. int <> timestamp at the 2th column of 
the second table;; 'Except false :- Project [id#632, age#633, name#634] : +- 
SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#632, age#633, name#634] 
+- Project [id#635, age#636, name#637] +- SubqueryAlias `default`.`test3` +- 
HiveTableRelation `default`.`test3`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#635, age#636, name#637] 
(state=,code=0)
{code}
The result of query1 is correct, while query2 reports the wrong error; it should 
be the third column.
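The fix amounts to formatting the column index as an English ordinal word instead of blindly appending "th". A minimal Python sketch of such a helper (the function name and word table are illustrative, not Spark's actual code):

```python
def ordinal(index: int) -> str:
    """Format a 0-based column index as an English ordinal word/suffix."""
    words = {0: "first", 1: "second", 2: "third", 3: "fourth", 4: "fifth"}
    # Fall back to a plain numeric suffix for larger indices.
    return words.get(index, f"{index + 1}th")

# The reported bug: the message printed "2th" for index 2,
# where "third" is expected.
print(ordinal(1))  # second
print(ordinal(2))  # third
```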






[jira] [Created] (SPARK-31954) delete duplicate test cases in hivequerysuite

2020-06-10 Thread philipse (Jira)
philipse created SPARK-31954:


 Summary: delete duplicate test cases in hivequerysuite
 Key: SPARK-31954
 URL: https://issues.apache.org/jira/browse/SPARK-31954
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.6
Reporter: philipse


remove duplicate test cases and result files in HiveQuerySuite






[jira] [Created] (SPARK-31839) delete duplicate code

2020-05-27 Thread philipse (Jira)
philipse created SPARK-31839:


 Summary: delete  duplicate code
 Key: SPARK-31839
 URL: https://issues.apache.org/jira/browse/SPARK-31839
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 2.4.5
Reporter: philipse


There is duplicate code; we can remove it to improve test quality.






[jira] [Updated] (SPARK-31790) cast scenarios may generate different results between Hive and Spark

2020-05-21 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-31790:
-
Description: 
`CAST(n,TIMESTAMPTYPE)`: if n is of Byte/Short/Int/Long type, Hive treats n as 
milliseconds while Spark SQL treats it as seconds, so the cast results differ; 
please be careful when you use it.

For example:
{code:java}
In spark
spark-sql> select cast(1586318188000 as timestamp);
52238-06-04 13:06:400.0
spark-sql> select cast(1586318188 as timestamp);
2020-04-08 11:56:28

In Hive
hive> select cast(1586318188000 as timestamp);
2020-04-08 11:56:28

hive> select cast(1586318188 as timestamp);
1970-01-19 16:38:38.188{code}
 

  was:`CAST(n,TIMESTAMPTYPE)` If n is Byte/Short/Int/Long data type, Hive treat 
n as milliseconds unit , while Spark SQL as seconds unit. so the cast result is 
different,please be care when you use it


> cast scenarios may generate different results between  Hive and Spark
> -
>
> Key: SPARK-31790
> URL: https://issues.apache.org/jira/browse/SPARK-31790
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.4.5
>Reporter: philipse
>Priority: Minor
>
> `CAST(n,TIMESTAMPTYPE)`: if n is of Byte/Short/Int/Long type, Hive treats n as 
> milliseconds while Spark SQL treats it as seconds, so the cast results differ; 
> please be careful when you use it.
> For example:
> {code:java}
> In spark
> spark-sql> select cast(1586318188000 as timestamp);
> 52238-06-04 13:06:400.0
> spark-sql> select cast(1586318188 as timestamp);
> 2020-04-08 11:56:28
> In Hive
> hive> select cast(1586318188000 as timestamp);
> 2020-04-08 11:56:28
> hive> select cast(1586318188 as timestamp);
> 1970-01-19 16:38:38.188{code}
>  
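The difference can be sketched in Python. Note that variable names here are illustrative, and the times are shown in UTC, whereas the Jira output above is in the reporter's local timezone:

```python
from datetime import datetime, timezone

n = 1586318188000

# Hive: treat n as milliseconds since the epoch -> a 2020 date.
hive_ts = datetime.fromtimestamp(n / 1000, tz=timezone.utc)

# Spark SQL: treat n as seconds since the epoch -> a date around year 52238,
# which Python's datetime cannot even represent, so approximate the year
# using the mean Gregorian year of 31,556,952 seconds.
spark_year = 1970 + int(n / 31_556_952)

print(hive_ts.date())  # 2020-04-08
print(spark_year)      # about 52238
```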






[jira] [Created] (SPARK-31790) cast scenarios may generate different results between Hive and Spark

2020-05-21 Thread philipse (Jira)
philipse created SPARK-31790:


 Summary: cast scenarios may generate different results between  
Hive and Spark
 Key: SPARK-31790
 URL: https://issues.apache.org/jira/browse/SPARK-31790
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.4.5
Reporter: philipse


`CAST(n,TIMESTAMPTYPE)`: if n is of Byte/Short/Int/Long type, Hive treats n as 
milliseconds while Spark SQL treats it as seconds, so the cast results differ; 
please be careful when you use it.






[jira] [Updated] (SPARK-31710) result is not the same when querying and executing jobs

2020-05-14 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-31710:
-
Description: 
Hi Team

Steps to reproduce.
{code:java}
create table test(id bigint);
insert into test select 1586318188000;
create table test1(id bigint) partitioned by (year string);
insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
from test;
{code}
let's check the result. 

Case 1:

*select * from test1;*

234 | 52238-06-04 13:06:400.0

--the result is wrong

Case 2:

*select 234,cast(id as TIMESTAMP) from test;*

 

java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
hh:mm:ss[.fffffffff]
 at java.sql.Timestamp.valueOf(Timestamp.java:237)
 at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
 at 
org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
 at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
 at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166)
 at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43)
 at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
 at org.apache.hive.beeline.Commands.execute(Commands.java:826)
 at org.apache.hive.beeline.Commands.sql(Commands.java:670)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
 at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
 Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)

 

I tried Hive; it works well and the conversion is correct:
{code:java}
select 234,cast(id as TIMESTAMP) from test;
 234   2020-04-08 11:56:28
{code}
Two questions:

Q1: if we forbid this conversion, should we keep all cases consistent?

Q2: if we allow the conversion in some cases, should we check the length of the 
long value? The code seems to always convert to microseconds by multiplying by 
1,000,000, no matter how large the value is; if the conversion produces an 
out-of-range timestamp, we could raise an error.
{code:java}
// converting seconds to us
private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code}
 

Thanks!

 

  was:
Hi Team 

Steps to reproduce.
{code:java}
create table test(id bigint);
insert into test select 1586318188000;
create table test1(id bigint) partitioned by (year string);
insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
from test;
{code}
let's check the result. 

Case 1:

*select * from test1;*

234 | 52238-06-04 13:06:400.0

Case 2:

*select 234,cast(id as TIMESTAMP) from test;*

java.lang.IllegalArgumentException: Timestamp format must be -mm-dd 
hh:mm:ss[.f]
 at java.sql.Timestamp.valueOf(Timestamp.java:237)
 at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
 at 
org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
 at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
 at org.apache.hive.beeline.Rows$Row.(Rows.java:166)
 at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:43)
 at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
 at org.apache.hive.beeline.Commands.execute(Commands.java:826)
 at org.apache.hive.beeline.Commands.sql(Commands.java:670)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
 at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)

 

I try hive,it works well,and the convert is correct

Two questions:

q1:

if we forbid this convert,should we keep all cases the same?

q2:

if we allow the convert in some cases, should we decide the long length, for 
the code seems to force to convert to ns with times*100 nomatter how long 
the data is,if it convert to timestamp with 

[jira] [Created] (SPARK-31710) result is not the same when querying and executing jobs

2020-05-14 Thread philipse (Jira)
philipse created SPARK-31710:


 Summary: result is not the same when querying and executing jobs
 Key: SPARK-31710
 URL: https://issues.apache.org/jira/browse/SPARK-31710
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.5
 Environment: hdp:2.7.7

spark:2.4.5
Reporter: philipse


Hi Team 

Steps to reproduce.
{code:java}
create table test(id bigint);
insert into test select 1586318188000;
create table test1(id bigint) partitioned by (year string);
insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
from test;
{code}
let's check the result. 

Case 1:

*select * from test1;*

234 | 52238-06-04 13:06:400.0

Case 2:

*select 234,cast(id as TIMESTAMP) from test;*

java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
hh:mm:ss[.fffffffff]
 at java.sql.Timestamp.valueOf(Timestamp.java:237)
 at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
 at 
org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
 at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
 at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166)
 at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43)
 at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
 at org.apache.hive.beeline.Commands.execute(Commands.java:826)
 at org.apache.hive.beeline.Commands.sql(Commands.java:670)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
 at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)

 

I tried Hive; it works well and the conversion is correct.

Two questions:

Q1: if we forbid this conversion, should we keep all cases consistent?

Q2: if we allow the conversion in some cases, should we check the length of the 
long value? The code seems to always convert to microseconds by multiplying by 
1,000,000, no matter how large the value is; if the conversion produces an 
out-of-range timestamp, we could raise an error.
{code:java}
// converting seconds to us
private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code}
 

Thanks!
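One way to implement Q2 is to range-check the value before converting. The sketch below is hypothetical: the name `long_to_timestamp_us` and the chosen upper bound (9999-12-31 23:59:59 UTC) are assumptions for illustration, not Spark's actual API.

```python
# Assumed upper bound: 9999-12-31 23:59:59 UTC expressed in microseconds.
MAX_US = 253_402_300_799 * 1_000_000

def long_to_timestamp_us(t: int) -> int:
    """Interpret t as seconds and convert to microseconds, rejecting overflow."""
    us = t * 1_000_000
    if not -MAX_US <= us <= MAX_US:
        raise ValueError(f"{t} seconds is outside the representable timestamp range")
    return us

print(long_to_timestamp_us(1586318188))  # seconds -> 1586318188000000 microseconds
# long_to_timestamp_us(1586318188000) would raise ValueError: the caller
# almost certainly passed milliseconds by mistake.
```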

 






[jira] [Commented] (SPARK-31588) merge small files may need more common setting

2020-05-12 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105530#comment-17105530
 ] 

philipse commented on SPARK-31588:
--

Thanks Hyukjin for your advice; I will reconsider it.

> merge small files may need more common setting
> --
>
> Key: SPARK-31588
> URL: https://issues.apache.org/jira/browse/SPARK-31588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: spark:2.4.5
> hdp:2.7
>Reporter: philipse
>Priority: Major
>
> Hi,
> Spark SQL now allows us to use repartition or coalesce hints to manually control 
> small files, like the following:
> /*+ REPARTITION(1) */
> /*+ COALESCE(1) */
> But this can only be tuned case by case: we need to decide whether to use 
> COALESCE or REPARTITION. Can we try a more general way that removes this 
> decision by setting a target size, as Hive does?
> *Good points:*
> 1) we will also get the new partition number automatically
> 2) with an on/off parameter provided, users can disable it if needed
> 3) the parameter can be set at the cluster level instead of on the user side, 
> making it easier to control small files
> 4) greatly reduces the pressure on the NameNode
>  
> *Not good points:*
> 1) It will add a new task to calculate the target number by collecting 
> statistics on the output files.
>  
> I don't know whether this is already planned for the future.
>  
> Thanks






[jira] [Commented] (SPARK-31588) merge small files may need more common setting

2020-05-08 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102414#comment-17102414
 ] 

philipse commented on SPARK-31588:
--

Yes, the block size can be controlled in HDFS. I mean we just take the block 
size as one of the conditions: if we can control the target size in Spark, we 
can control the real data size in HDFS, instead of using repartition as a hard 
limit.

> merge small files may need more common setting
> --
>
> Key: SPARK-31588
> URL: https://issues.apache.org/jira/browse/SPARK-31588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: spark:2.4.5
> hdp:2.7
>Reporter: philipse
>Priority: Major
>
> Hi,
> Spark SQL now allows us to use repartition or coalesce hints to manually control 
> small files, like the following:
> /*+ REPARTITION(1) */
> /*+ COALESCE(1) */
> But this can only be tuned case by case: we need to decide whether to use 
> COALESCE or REPARTITION. Can we try a more general way that removes this 
> decision by setting a target size, as Hive does?
> *Good points:*
> 1) we will also get the new partition number automatically
> 2) with an on/off parameter provided, users can disable it if needed
> 3) the parameter can be set at the cluster level instead of on the user side, 
> making it easier to control small files
> 4) greatly reduces the pressure on the NameNode
>  
> *Not good points:*
> 1) It will add a new task to calculate the target number by collecting 
> statistics on the output files.
>  
> I don't know whether this is already planned for the future.
>  
> Thanks






[jira] [Commented] (SPARK-31588) merge small files may need more common setting

2020-05-07 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101775#comment-17101775
 ] 

philipse commented on SPARK-31588:
--

For example:

Suppose we output 3 files of sizes 10M, 50M, and 200M, with a block size of 
128M. We may want to keep the file sizes close to the average, but we should 
also keep each file at least as big as the block size, in case someone sets a 
wrong parameter.

Case 1: we set the target size to 60M. The expected average file size is 
max(block size, 60M) = 128M, and the output file count used as the repartition 
number is [total_file_size / average_file_size] + 1.

The final result will be 3 files of sizes 128M, 128M, and 4M.

If we set the target size to 5120M, it will repartition into 1 file of about 
260M.

Thus, we can set the target size as a global parameter, and it will benefit all 
tasks.
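The rule sketched in this comment can be written as a small calculation. The function below is a hypothetical model of the proposal for illustration, not Spark code:

```python
def planned_partitions(file_sizes_mb, target_mb, block_mb=128):
    """Model of the proposed rule: the expected average file size is
    max(block size, target size), and the repartition number is
    total_size // average_size + 1."""
    total = sum(file_sizes_mb)
    average = max(block_mb, target_mb)
    return total // average + 1

# 3 files of 10M, 50M and 200M: a 60M target yields 3 partitions
# (128M, 128M, 4M), while a 5120M target yields 1 partition of ~260M.
print(planned_partitions([10, 50, 200], target_mb=60))    # 3
print(planned_partitions([10, 50, 200], target_mb=5120))  # 1
```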

> merge small files may need more common setting
> --
>
> Key: SPARK-31588
> URL: https://issues.apache.org/jira/browse/SPARK-31588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: spark:2.4.5
> hdp:2.7
>Reporter: philipse
>Priority: Major
>
> Hi,
> Spark SQL now allows us to use repartition or coalesce hints to manually control 
> small files, like the following:
> /*+ REPARTITION(1) */
> /*+ COALESCE(1) */
> But this can only be tuned case by case: we need to decide whether to use 
> COALESCE or REPARTITION. Can we try a more general way that removes this 
> decision by setting a target size, as Hive does?
> *Good points:*
> 1) we will also get the new partition number automatically
> 2) with an on/off parameter provided, users can disable it if needed
> 3) the parameter can be set at the cluster level instead of on the user side, 
> making it easier to control small files
> 4) greatly reduces the pressure on the NameNode
>  
> *Not good points:*
> 1) It will add a new task to calculate the target number by collecting 
> statistics on the output files.
>  
> I don't know whether this is already planned for the future.
>  
> Thanks






[jira] [Created] (SPARK-31588) merge small files may need more common setting

2020-04-27 Thread philipse (Jira)
philipse created SPARK-31588:


 Summary: merge small files may need more common setting
 Key: SPARK-31588
 URL: https://issues.apache.org/jira/browse/SPARK-31588
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.5
 Environment: spark:2.4.5

hdp:2.7
Reporter: philipse


Hi,

Spark SQL now allows us to use repartition or coalesce hints to manually control 
small files, like the following:

/*+ REPARTITION(1) */

/*+ COALESCE(1) */

But this can only be tuned case by case: we need to decide whether to use 
COALESCE or REPARTITION. Can we try a more general way that removes this 
decision by setting a target size, as Hive does?

*Good points:*

1) we will also get the new partition number automatically

2) with an on/off parameter provided, users can disable it if needed

3) the parameter can be set at the cluster level instead of on the user side, 
making it easier to control small files

4) greatly reduces the pressure on the NameNode

 

*Not good points:*

1) It will add a new task to calculate the target number by collecting 
statistics on the output files.

 

I don't know whether this is already planned for the future.

 

Thanks






[jira] [Commented] (SPARK-24194) HadoopFsRelation cannot overwrite a path that is also being read from

2020-04-27 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093055#comment-17093055
 ] 

philipse commented on SPARK-24194:
--

Hi,

Is this issue closed? Can I try it in a production environment?

Thanks

> HadoopFsRelation cannot overwrite a path that is also being read from
> -
>
> Key: SPARK-24194
> URL: https://issues.apache.org/jira/browse/SPARK-24194
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
> Environment: spark master
>Reporter: yangz
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When
> {code:java}
> INSERT OVERWRITE TABLE territory_count_compare select * from 
> territory_count_compare where shop_count!=real_shop_count
> {code}
> and territory_count_compare is a Parquet table, there will be an error: Cannot 
> overwrite a path that is also being read from.
>  
> And in the file MetastoreDataSourceSuite.scala, there is a test case:
>  
> {code:java}
> table(tableName).write.mode(SaveMode.Overwrite).insertInto(tableName)
> {code}
>  
> But when the table territory_count_compare is a plain Hive table, there is no 
> error.
> So I think the reason is that when inserting overwrite into a HadoopFsRelation 
> with a static partition, Spark first deletes the partition in the output, but 
> that should only happen when the job is committed.






[jira] [Commented] (SPARK-31508) string type compared with numeric causes inaccurate data

2020-04-22 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089663#comment-17089663
 ] 

philipse commented on SPARK-31508:
--

Haha, but normally it will be a little more complex: we will migrate many HQL 
queries to Spark SQL, so I suggest it would be better handled in the code. Can 
you help review the PR?

> string type compare with numberic cause data inaccurate
> ---
>
> Key: SPARK-31508
> URL: https://issues.apache.org/jira/browse/SPARK-31508
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: hadoop2.7
> spark2.4.5
>Reporter: philipse
>Priority: Major
> Attachments: image-2020-04-22-20-00-09-821.png
>
>
> Hi all
>  
> Spark SQL should probably convert values to double when a string type is 
> compared with a numeric type. The cases are shown below:
> 1. Create the table:
> create table test1(id string);
>  
> 2. Insert data into the table:
> insert into test1 select 'avc';
> insert into test1 select '2';
> insert into test1 select '0a';
> insert into test1 select '';
> insert into test1 select 
> '22';
> 3. Let's check what happens:
> select * from test_gf13871.test1 where id > 0
> The results are shown below:
> *2*
> **
> Surprisingly, the big number 222... cannot be selected, while when I check 
> in Hive, the 222... shows up normally.
> 4. Explaining the command shows what happened: if the value is bigger than 
> the max int value, it will not be selected; we may need to convert to 
> double instead.
> !image-2020-04-21-18-49-58-850.png!
> I want to know whether this has been fixed or is planned for 3.0 or a later 
> version. Please feel free to give any advice.
>  
> Many Thanks
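A minimal Python model of the casting difference described above may help. The two cast helpers are illustrative assumptions, not Spark's real coercion logic, and the 20-digit literal stands in for the ticket's elided value; but they reproduce the symptom: casting the string to int turns an overflowing value into NULL, so `id > 0` silently drops it, while casting to double keeps it.

```python
def spark_like_int_cast(s):
    """Rough model of CAST(string AS INT): non-numeric strings and values
    outside the 32-bit range become NULL (None). Illustrative only."""
    try:
        v = int(s)
    except ValueError:
        return None
    return v if -2**31 <= v < 2**31 else None

def hive_like_double_cast(s):
    """Rough model of Hive's behavior: compare as double instead."""
    try:
        return float(s)
    except ValueError:
        return None

# Hypothetical stand-in for the ticket's data; the 20-digit value models
# the elided big number.
ids = ["avc", "2", "0a", "", "2" * 20]
kept_as_int = [s for s in ids
               if (v := spark_like_int_cast(s)) is not None and v > 0]
kept_as_double = [s for s in ids
                  if (v := hive_like_double_cast(s)) is not None and v > 0]
print(kept_as_int)     # the 20-digit number overflows to NULL and is dropped
print(kept_as_double)  # comparing as double keeps it
```

Under this model only `'2'` survives the int-style comparison, while the big value also survives the double-style one, matching the Spark-vs-Hive difference the reporter observed.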






[jira] [Created] (SPARK-31512) make window function using order by optional

2020-04-21 Thread philipse (Jira)
philipse created SPARK-31512:


 Summary: make window function using order by optional
 Key: SPARK-31512
 URL: https://issues.apache.org/jira/browse/SPARK-31512
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.5
 Environment: spark2.4.5

hadoop2.7.7
Reporter: philipse


Hi all

In other SQL dialects, ORDER BY is not mandatory when using a window function; 
we may make it optional here as well.

The case below shows it:

*select row_number()over() from test1*

Error: org.apache.spark.sql.AnalysisException: Window function row_number() 
requires window to be ordered, please add ORDER BY clause. For example SELECT 
row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY 
window_ordering) from table; (state=,code=0)

 

So I suggest making it optional, or we will hit this error when migrating SQL 
from other dialects such as Hive.
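The semantics the ticket asks for, row_number() over a partition with no ORDER BY as Hive allows, can be sketched in plain Python: numbers simply follow whatever order the rows arrive in within each partition. The function name and row shape here are invented for illustration.

```python
from collections import defaultdict

def row_number_over(rows, partition_by=None):
    """row_number() OVER (PARTITION BY ...) with no ORDER BY clause:
    each partition gets 1, 2, 3, ... in arrival order, which is the
    (non-deterministic but legal) behavior Hive permits."""
    counters = defaultdict(int)  # running counter per partition key
    out = []
    for row in rows:
        key = row[partition_by] if partition_by else None
        counters[key] += 1
        out.append((row, counters[key]))
    return out

rows = [{"shop": "a"}, {"shop": "b"}, {"shop": "a"}]
print(row_number_over(rows, "shop"))
# [({'shop': 'a'}, 1), ({'shop': 'b'}, 1), ({'shop': 'a'}, 2)]
```

Without an ORDER BY the numbering depends on input order, which is why Spark chose to reject it; Hive instead leaves the non-determinism to the user.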






[jira] [Updated] (SPARK-31508) string type compare with numberic cause data inaccurate

2020-04-21 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-31508:
-
Summary: string type compare with numberic cause data inaccurate  (was: 
string type compare with numberic case data inaccurate)

> string type compare with numberic cause data inaccurate
> ---
>
> Key: SPARK-31508
> URL: https://issues.apache.org/jira/browse/SPARK-31508
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: hadoop2.7
> spark2.4.5
>Reporter: philipse
>Priority: Major
>
> Hi all
>  
> Spark SQL should probably convert values to double when a string type is 
> compared with a numeric type. The cases are shown below:
> 1. Create the table:
> create table test1(id string);
>  
> 2. Insert data into the table:
> insert into test1 select 'avc';
> insert into test1 select '2';
> insert into test1 select '0a';
> insert into test1 select '';
> insert into test1 select 
> '22';
> 3. Let's check what happens:
> select * from test_gf13871.test1 where id > 0
> The results are shown below:
> *2*
> **
> Surprisingly, the big number 222... cannot be selected, while when I check 
> in Hive, the 222... shows up normally.
> 4. Explaining the command shows what happened: if the value is bigger than 
> the max int value, it will not be selected; we may need to convert to 
> double instead.
> !image-2020-04-21-18-49-58-850.png!
> I want to know whether this has been fixed or is planned for 3.0 or a later 
> version. Please feel free to give any advice.
>  
> Many Thanks






[jira] [Created] (SPARK-31508) string type compare with numberic case data inaccurate

2020-04-21 Thread philipse (Jira)
philipse created SPARK-31508:


 Summary: string type compare with numberic case data inaccurate
 Key: SPARK-31508
 URL: https://issues.apache.org/jira/browse/SPARK-31508
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.5
 Environment: hadoop2.7

spark2.4.5
Reporter: philipse


Hi all

 

Spark SQL should probably convert values to double when a string type is 
compared with a numeric type. The cases are shown below:
1. Create the table:
create table test1(id string);
 
2. Insert data into the table:
insert into test1 select 'avc';
insert into test1 select '2';
insert into test1 select '0a';
insert into test1 select '';
insert into test1 select 
'22';
3. Let's check what happens:
select * from test_gf13871.test1 where id > 0
The results are shown below:
*2*
**
Surprisingly, the big number 222... cannot be selected, while when I check in 
Hive, the 222... shows up normally.
4. Explaining the command shows what happened: if the value is bigger than the 
max int value, it will not be selected; we may need to convert to double 
instead.
!image-2020-04-21-18-49-58-850.png!
I want to know whether this has been fixed or is planned for 3.0 or a later 
version. Please feel free to give any advice.
 
Many Thanks






[jira] [Commented] (SPARK-18681) Throw Filtering is supported only on partition keys of type string exception

2020-04-03 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074361#comment-17074361
 ] 

philipse commented on SPARK-18681:
--

[~michael] Any news on this issue? I hit the same issue on Spark 2.4.5.

> Throw Filtering is supported only on partition keys of type string exception
> 
>
> Key: SPARK-18681
> URL: https://issues.apache.org/jira/browse/SPARK-18681
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.1.0
>
>
> Cloudera puts 
> {{/var/run/cloudera-scm-agent/process/15000-hive-HIVEMETASTORE/hive-site.xml}}
>  in place as the configuration file for the Hive Metastore Server, where 
> {{hive.metastore.try.direct.sql=false}}. But Spark reads the gateway 
> configuration file and gets the default value 
> {{hive.metastore.try.direct.sql=true}}. We should use the {{getMetaConf}} or 
> {{getMSC.getConfigValue}} method to obtain the original configuration from 
> the Hive Metastore Server.
> {noformat}
> spark-sql> CREATE TABLE test (value INT) PARTITIONED BY (part INT);
> Time taken: 0.221 seconds
> spark-sql> select * from test where part=1 limit 10;
> 16/12/02 08:33:45 ERROR thriftserver.SparkSQLDriver: Failed in [select * from 
> test where part=1 limit 10]
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>   at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:610)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:549)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:547)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:282)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:229)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:228)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:271)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:547)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:954)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:938)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:91)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:938)
>   at 
> org.apache.spark.sql.hive.MetastoreRelation.getHiveQlPartitions(MetastoreRelation.scala:156)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:151)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:150)
>   at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2435)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:149)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
>   at 
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:295)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$4.apply(QueryExecution.scala:134)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$hiveResultString$4.apply(QueryExecution.scala:133)
>   at 
>