[jira] [Created] (SPARK-24497) Support recursive SQL query

2018-06-08 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24497:
---

 Summary: Support recursive SQL query
 Key: SPARK-24497
 URL: https://issues.apache.org/jira/browse/SPARK-24497
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


h3. *Examples*

Here is an example of {{WITH RECURSIVE}} clause usage. The table "department" 
represents the structure of an organization as an adjacency list.
{code:sql}
CREATE TABLE department (
  id INTEGER PRIMARY KEY,  -- department ID
  parent_department INTEGER REFERENCES department, -- upper department ID
  name TEXT -- department name
);

INSERT INTO department (id, parent_department, "name")
VALUES
 (0, NULL, 'ROOT'),
 (1, 0, 'A'),
 (2, 1, 'B'),
 (3, 2, 'C'),
 (4, 2, 'D'),
 (5, 0, 'E'),
 (6, 4, 'F'),
 (7, 5, 'G');

-- department structure represented here is as follows:
--
-- ROOT-+->A-+->B-+->C
--      |         |
--      |         +->D-+->F
--      +->E-+->G
{code}
 
 To extract all departments under A, you can use the following recursive query:
{code:sql}
WITH RECURSIVE subdepartment AS
(
  -- non-recursive term
  SELECT * FROM department WHERE name = 'A'

  UNION ALL

  -- recursive term
  SELECT d.*
  FROM
    department AS d
  JOIN
    subdepartment AS sd
    ON (d.parent_department = sd.id)
)
SELECT *
FROM subdepartment
ORDER BY name;
{code}
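For reference, against the sample data above this query should return the 
subtree rooted at A (psql-style output):
{noformat}
 id | parent_department | name
----+-------------------+------
  1 |                 0 | A
  2 |                 1 | B
  3 |                 2 | C
  4 |                 2 | D
  6 |                 4 | F
(5 rows)
{noformat}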
More details:

[http://wiki.postgresql.org/wiki/CTEReadme]

[https://info.teradata.com/htmlpubs/DB_TTU_16_00/index.html#page/SQL_Reference/B035-1141-160K/lqe1472241402390.html]

 






[jira] [Commented] (SPARK-24538) Decimal type support push down to the data sources

2018-06-12 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510503#comment-16510503
 ] 

Yuming Wang commented on SPARK-24538:
-

I'm working on this.

> Decimal type support push down to the data sources
> --
>
> Key: SPARK-24538
> URL: https://issues.apache.org/jira/browse/SPARK-24538
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Created] (SPARK-24538) Decimal type support push down to the data sources

2018-06-12 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24538:
---

 Summary: Decimal type support push down to the data sources
 Key: SPARK-24538
 URL: https://issues.apache.org/jira/browse/SPARK-24538
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang









[jira] [Issue Comment Deleted] (SPARK-24538) Decimal type support push down to the data sources

2018-06-12 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24538:

Comment: was deleted

(was: I'm working on this.)

> Decimal type support push down to the data sources
> --
>
> Key: SPARK-24538
> URL: https://issues.apache.org/jira/browse/SPARK-24538
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Updated] (SPARK-24538) Decimal type support push down to the data sources

2018-06-12 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24538:

Description: 
The latest Parquet supports decimal type statistics, so we can push down:
{noformat}
LM-SHC-16502798:parquet-mr yumwang$ java -jar 
./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta 
/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet

file:         
file:/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet

creator:      parquet-mr version 1.10.0 (build 
031a6654009e3b82020012a18434c582bd74c73a)

extra:        org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"d1","type":"decimal(9,0)","nullable":true,"metadata":{}},{"name":"d2","type":"decimal(9,2)","nullable":true,"metadata":{}},{"name":"d3","type":"decimal(18,0)","nullable":true,"metadata":{}},{"name":"d4","type":"decimal(18,4)","nullable":true,"metadata":{}},{"name":"d5","type":"decimal(38,0)","nullable":true,"metadata":{}},{"name":"d6","type":"decimal(38,18)","nullable":true,"metadata":{}}]}



file schema:  spark_schema



id:           REQUIRED INT64 R:0 D:0

d1:           OPTIONAL INT32 O:DECIMAL R:0 D:1

d2:           OPTIONAL INT32 O:DECIMAL R:0 D:1

d3:           OPTIONAL INT64 O:DECIMAL R:0 D:1

d4:           OPTIONAL INT64 O:DECIMAL R:0 D:1

d5:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

d6:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1



row group 1:  RC:241867 TS:15480513 OFFSET:4



id:            INT64 SNAPPY DO:0 FPO:4 SZ:968154/1935071/2.00 VC:241867 
ENC:BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]

d1:            INT32 SNAPPY DO:0 FPO:968158 SZ:967555/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]

d2:            INT32 SNAPPY DO:0 FPO:1935713 SZ:967558/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0.00, max: 241866.00, num_nulls: 0]

d3:            INT64 SNAPPY DO:0 FPO:2903271 SZ:968866/1935047/2.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]

d4:            INT64 SNAPPY DO:0 FPO:3872137 SZ:1247007/1935047/1.55 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0., max: 241866., num_nulls: 0]

d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5119144 
SZ:1266850/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 
241866, num_nulls: 0]

d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:6385994 
SZ:2198910/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0E-18, max: 
241866.00, num_nulls: 0]



row group 2:  RC:241867 TS:15480513 OFFSET:8584904



id:            INT64 SNAPPY DO:0 FPO:8584904 SZ:968131/1935071/2.00 VC:241867 
ENC:BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]

d1:            INT32 SNAPPY DO:0 FPO:9553035 SZ:967563/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]

d2:            INT32 SNAPPY DO:0 FPO:10520598 SZ:967563/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.00, max: 483733.00, num_nulls: 0]

d3:            INT64 SNAPPY DO:0 FPO:11488161 SZ:968110/1935047/2.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]

d4:            INT64 SNAPPY DO:0 FPO:12456271 SZ:1247071/1935047/1.55 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867., max: 483733., num_nulls: 0]

d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:13703342 
SZ:1270587/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, 
max: 483733, num_nulls: 0]

d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:14973929 
SZ:2197306/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 
241867.00, max: 483733.00, num_nulls: 
0]{noformat}
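
A hypothetical sketch of what the pushdown could look like, mirroring the style 
of the timestamp sketch later in this thread ({{ParquetSchemaType}}, 
{{pushDownDecimal}}, and the case-arm shape are assumed helpers, not 
necessarily the final implementation):
{code:java}
// Decimals with precision <= 9 / <= 18 are stored as INT32 / INT64 (d1-d4
// above), so an equality filter can compare the unscaled value directly.
case ParquetSchemaType(DECIMAL, INT32, decimal) if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    intColumn(n),
    Option(v).map(_.asInstanceOf[java.math.BigDecimal].unscaledValue().intValueExact()
      .asInstanceOf[java.lang.Integer]).orNull)
case ParquetSchemaType(DECIMAL, INT64, decimal) if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(_.asInstanceOf[java.math.BigDecimal].unscaledValue().longValueExact()
      .asInstanceOf[java.lang.Long]).orNull)
{code}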

> Decimal type support push down to the data sources
> --
>
> Key: SPARK-24538
> URL: https://issues.apache.org/jira/browse/SPARK-24538
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> The latest Parquet supports decimal type statistics, so we can push down:
> {noformat}
> LM-SHC-16502798:parquet-mr yumwang$ java -jar 
> ./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta 
> /tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
> file:         
> 

[jira] [Updated] (SPARK-24538) Decimal type support push down to the data sources

2018-06-12 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24538:

Description: 
The latest Parquet supports decimal type statistics, so we can push down to the 
data sources:
{noformat}
LM-SHC-16502798:parquet-mr yumwang$ java -jar 
./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta 
/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet

file:         
file:/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet

creator:      parquet-mr version 1.10.0 (build 
031a6654009e3b82020012a18434c582bd74c73a)

extra:        org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"d1","type":"decimal(9,0)","nullable":true,"metadata":{}},{"name":"d2","type":"decimal(9,2)","nullable":true,"metadata":{}},{"name":"d3","type":"decimal(18,0)","nullable":true,"metadata":{}},{"name":"d4","type":"decimal(18,4)","nullable":true,"metadata":{}},{"name":"d5","type":"decimal(38,0)","nullable":true,"metadata":{}},{"name":"d6","type":"decimal(38,18)","nullable":true,"metadata":{}}]}



file schema:  spark_schema



id:           REQUIRED INT64 R:0 D:0

d1:           OPTIONAL INT32 O:DECIMAL R:0 D:1

d2:           OPTIONAL INT32 O:DECIMAL R:0 D:1

d3:           OPTIONAL INT64 O:DECIMAL R:0 D:1

d4:           OPTIONAL INT64 O:DECIMAL R:0 D:1

d5:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

d6:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1



row group 1:  RC:241867 TS:15480513 OFFSET:4



id:            INT64 SNAPPY DO:0 FPO:4 SZ:968154/1935071/2.00 VC:241867 
ENC:BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]

d1:            INT32 SNAPPY DO:0 FPO:968158 SZ:967555/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]

d2:            INT32 SNAPPY DO:0 FPO:1935713 SZ:967558/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0.00, max: 241866.00, num_nulls: 0]

d3:            INT64 SNAPPY DO:0 FPO:2903271 SZ:968866/1935047/2.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]

d4:            INT64 SNAPPY DO:0 FPO:3872137 SZ:1247007/1935047/1.55 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0., max: 241866., num_nulls: 0]

d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5119144 
SZ:1266850/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 
241866, num_nulls: 0]

d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:6385994 
SZ:2198910/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0E-18, max: 
241866.00, num_nulls: 0]



row group 2:  RC:241867 TS:15480513 OFFSET:8584904



id:            INT64 SNAPPY DO:0 FPO:8584904 SZ:968131/1935071/2.00 VC:241867 
ENC:BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]

d1:            INT32 SNAPPY DO:0 FPO:9553035 SZ:967563/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]

d2:            INT32 SNAPPY DO:0 FPO:10520598 SZ:967563/967515/1.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.00, max: 483733.00, num_nulls: 0]

d3:            INT64 SNAPPY DO:0 FPO:11488161 SZ:968110/1935047/2.00 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]

d4:            INT64 SNAPPY DO:0 FPO:12456271 SZ:1247071/1935047/1.55 VC:241867 
ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867., max: 483733., num_nulls: 0]

d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:13703342 
SZ:1270587/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, 
max: 483733, num_nulls: 0]

d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:14973929 
SZ:2197306/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 
241867.00, max: 483733.00, num_nulls: 
0]{noformat}

  was:
The latest Parquet supports decimal type statistics, so we can push down:
{noformat}
LM-SHC-16502798:parquet-mr yumwang$ java -jar 
./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta 
/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet

file:         
file:/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet

creator:      parquet-mr version 1.10.0 (build 
031a6654009e3b82020012a18434c582bd74c73a)

extra:        org.apache.spark.sql.parquet.row.metadata = 

[jira] [Issue Comment Deleted] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources

2018-06-13 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24549:

Comment: was deleted

(was: I'm working on this)

> 32BitDecimalType and 64BitDecimalType support push down to the data sources
> ---
>
> Key: SPARK-24549
> URL: https://issues.apache.org/jira/browse/SPARK-24549
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Commented] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources

2018-06-13 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511034#comment-16511034
 ] 

Yuming Wang commented on SPARK-24549:
-

I'm working on this

> 32BitDecimalType and 64BitDecimalType support push down to the data sources
> ---
>
> Key: SPARK-24549
> URL: https://issues.apache.org/jira/browse/SPARK-24549
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Created] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources

2018-06-13 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24549:
---

 Summary: 32BitDecimalType and 64BitDecimalType support push down 
to the data sources
 Key: SPARK-24549
 URL: https://issues.apache.org/jira/browse/SPARK-24549
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang









[jira] [Updated] (SPARK-24538) ByteArrayDecimalType support push down to the data sources

2018-06-13 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24538:

Summary: ByteArrayDecimalType support push down to the data sources  (was: 
Decimal type support push down to the data sources)

> ByteArrayDecimalType support push down to the data sources
> --
>
> Key: SPARK-24538
> URL: https://issues.apache.org/jira/browse/SPARK-24538
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> The latest Parquet supports decimal type statistics, so we can push down to 
> the data sources:
> {noformat}
> LM-SHC-16502798:parquet-mr yumwang$ java -jar 
> ./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta 
> /tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
> file:         
> file:/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
> creator:      parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:        org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"d1","type":"decimal(9,0)","nullable":true,"metadata":{}},{"name":"d2","type":"decimal(9,2)","nullable":true,"metadata":{}},{"name":"d3","type":"decimal(18,0)","nullable":true,"metadata":{}},{"name":"d4","type":"decimal(18,4)","nullable":true,"metadata":{}},{"name":"d5","type":"decimal(38,0)","nullable":true,"metadata":{}},{"name":"d6","type":"decimal(38,18)","nullable":true,"metadata":{}}]}
> file schema:  spark_schema
> 
> id:           REQUIRED INT64 R:0 D:0
> d1:           OPTIONAL INT32 O:DECIMAL R:0 D:1
> d2:           OPTIONAL INT32 O:DECIMAL R:0 D:1
> d3:           OPTIONAL INT64 O:DECIMAL R:0 D:1
> d4:           OPTIONAL INT64 O:DECIMAL R:0 D:1
> d5:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> d6:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1:  RC:241867 TS:15480513 OFFSET:4
> 
> id:            INT64 SNAPPY DO:0 FPO:4 SZ:968154/1935071/2.00 VC:241867 
> ENC:BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
> d1:            INT32 SNAPPY DO:0 FPO:968158 SZ:967555/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
> d2:            INT32 SNAPPY DO:0 FPO:1935713 SZ:967558/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0.00, max: 241866.00, num_nulls: 0]
> d3:            INT64 SNAPPY DO:0 FPO:2903271 SZ:968866/1935047/2.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
> d4:            INT64 SNAPPY DO:0 FPO:3872137 SZ:1247007/1935047/1.55 
> VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0., max: 241866., 
> num_nulls: 0]
> d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5119144 
> SZ:1266850/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 
> 241866, num_nulls: 0]
> d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:6385994 
> SZ:2198910/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0E-18, 
> max: 241866.00, num_nulls: 0]
> row group 2:  RC:241867 TS:15480513 OFFSET:8584904
> 
> id:            INT64 SNAPPY DO:0 FPO:8584904 SZ:968131/1935071/2.00 VC:241867 
> ENC:BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
> d1:            INT32 SNAPPY DO:0 FPO:9553035 SZ:967563/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
> d2:            INT32 SNAPPY DO:0 FPO:10520598 SZ:967563/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.00, max: 483733.00, num_nulls: 0]
> d3:            INT64 SNAPPY DO:0 FPO:11488161 SZ:968110/1935047/2.00 
> VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
> d4:            INT64 SNAPPY DO:0 FPO:12456271 SZ:1247071/1935047/1.55 
> VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867., max: 483733., 
> num_nulls: 0]
> d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:13703342 
> SZ:1270587/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, 
> max: 483733, num_nulls: 0]
> d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:14973929 
> SZ:2197306/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 
> 241867.00, max: 483733.00, num_nulls: 
> 0]{noformat}
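
With the narrowed scope, the relevant case is decimals stored as 
FIXED_LEN_BYTE_ARRAY (d5/d6 above). A hypothetical sketch of building the 
comparison value (the helper name is illustrative, and it assumes the value 
fits in {{numBytes}}):
{code:java}
import org.apache.parquet.io.api.Binary

// Parquet stores FIXED_LEN_BYTE_ARRAY decimals as the big-endian two's
// complement unscaled value, sign-extended to the fixed width.
def decimalToBinary(numBytes: Int, d: java.math.BigDecimal): Binary = {
  val unscaled = d.unscaledValue().toByteArray  // minimal big-endian encoding
  val fixed = new Array[Byte](numBytes)
  val pad: Byte = if (unscaled.head < 0) -1 else 0  // sign byte for padding
  java.util.Arrays.fill(fixed, 0, numBytes - unscaled.length, pad)
  System.arraycopy(unscaled, 0, fixed, numBytes - unscaled.length, unscaled.length)
  Binary.fromConstantByteArray(fixed)
}
{code}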




[jira] [Updated] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources

2018-06-13 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24549:

Issue Type: Improvement  (was: New Feature)

> 32BitDecimalType and 64BitDecimalType support push down to the data sources
> ---
>
> Key: SPARK-24549
> URL: https://issues.apache.org/jira/browse/SPARK-24549
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Updated] (SPARK-24538) ByteArrayDecimalType support push down to the data sources

2018-06-13 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24538:

Issue Type: Improvement  (was: New Feature)

> ByteArrayDecimalType support push down to the data sources
> --
>
> Key: SPARK-24538
> URL: https://issues.apache.org/jira/browse/SPARK-24538
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> The latest Parquet supports decimal type statistics, so we can push down to 
> the data sources:
> {noformat}
> LM-SHC-16502798:parquet-mr yumwang$ java -jar 
> ./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta 
> /tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
> file:         
> file:/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
> creator:      parquet-mr version 1.10.0 (build 
> 031a6654009e3b82020012a18434c582bd74c73a)
> extra:        org.apache.spark.sql.parquet.row.metadata = 
> {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"d1","type":"decimal(9,0)","nullable":true,"metadata":{}},{"name":"d2","type":"decimal(9,2)","nullable":true,"metadata":{}},{"name":"d3","type":"decimal(18,0)","nullable":true,"metadata":{}},{"name":"d4","type":"decimal(18,4)","nullable":true,"metadata":{}},{"name":"d5","type":"decimal(38,0)","nullable":true,"metadata":{}},{"name":"d6","type":"decimal(38,18)","nullable":true,"metadata":{}}]}
> file schema:  spark_schema
> 
> id:           REQUIRED INT64 R:0 D:0
> d1:           OPTIONAL INT32 O:DECIMAL R:0 D:1
> d2:           OPTIONAL INT32 O:DECIMAL R:0 D:1
> d3:           OPTIONAL INT64 O:DECIMAL R:0 D:1
> d4:           OPTIONAL INT64 O:DECIMAL R:0 D:1
> d5:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> d6:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1:  RC:241867 TS:15480513 OFFSET:4
> 
> id:            INT64 SNAPPY DO:0 FPO:4 SZ:968154/1935071/2.00 VC:241867 
> ENC:BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
> d1:            INT32 SNAPPY DO:0 FPO:968158 SZ:967555/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
> d2:            INT32 SNAPPY DO:0 FPO:1935713 SZ:967558/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0.00, max: 241866.00, num_nulls: 0]
> d3:            INT64 SNAPPY DO:0 FPO:2903271 SZ:968866/1935047/2.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
> d4:            INT64 SNAPPY DO:0 FPO:3872137 SZ:1247007/1935047/1.55 
> VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0., max: 241866., 
> num_nulls: 0]
> d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5119144 
> SZ:1266850/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 
> 241866, num_nulls: 0]
> d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:6385994 
> SZ:2198910/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0E-18, 
> max: 241866.00, num_nulls: 0]
> row group 2:  RC:241867 TS:15480513 OFFSET:8584904
> 
> id:            INT64 SNAPPY DO:0 FPO:8584904 SZ:968131/1935071/2.00 VC:241867 
> ENC:BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
> d1:            INT32 SNAPPY DO:0 FPO:9553035 SZ:967563/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
> d2:            INT32 SNAPPY DO:0 FPO:10520598 SZ:967563/967515/1.00 VC:241867 
> ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.00, max: 483733.00, num_nulls: 0]
> d3:            INT64 SNAPPY DO:0 FPO:11488161 SZ:968110/1935047/2.00 
> VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
> d4:            INT64 SNAPPY DO:0 FPO:12456271 SZ:1247071/1935047/1.55 
> VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867., max: 483733., 
> num_nulls: 0]
> d5:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:13703342 
> SZ:1270587/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, 
> max: 483733, num_nulls: 0]
> d6:            FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:14973929 
> SZ:2197306/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 
> 241867.00, max: 483733.00, num_nulls: 
> 0]{noformat}




[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2018-07-01 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529013#comment-16529013
 ] 

Yuming Wang commented on SPARK-20427:
-

[~ORichard] Please try using {{customSchema}} to specify custom data types for 
the read schema: 
https://github.com/apache/spark/blob/v2.3.1/examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala#L197
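
A minimal sketch (the connection string, table, and column names are 
hypothetical):
{code:java}
// Override the inferred Oracle NUMBER mapping with an explicit read schema,
// keeping the decimal precision within Spark's 38-digit limit.
val jdbcUrl = "jdbc:oracle:thin:@//host:1521/service"  // hypothetical
val df = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "SCHEMA.TABLE_NAME")
  .option("customSchema", "ID DECIMAL(38, 10), NAME STRING")
  .load()
{code}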




> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> Oracle has the data type NUMBER. When defining a field of type NUMBER in a 
> table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s. 
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 
> 38. In our case it created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 
> 44 exceeds max precision 38...".
> The result was that a data frame read from a table on one schema could not be 
> inserted into the identical table on another schema.






[jira] [Created] (SPARK-24716) Refactor ParquetFilters

2018-07-02 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24716:
---

 Summary: Refactor ParquetFilters
 Key: SPARK-24716
 URL: https://issues.apache.org/jira/browse/SPARK-24716
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang









[jira] [Commented] (SPARK-24716) Refactor ParquetFilters

2018-07-02 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529603#comment-16529603
 ] 

Yuming Wang commented on SPARK-24716:
-

I'm working on it.

> Refactor ParquetFilters
> ---
>
> Key: SPARK-24716
> URL: https://issues.apache.org/jira/browse/SPARK-24716
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Issue Comment Deleted] (SPARK-24692) Improve FilterPushdownBenchmark

2018-06-29 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24692:

Comment: was deleted

(was: I'm working on it.)

> Improve FilterPushdownBenchmark
> ---
>
> Key: SPARK-24692
> URL: https://issues.apache.org/jira/browse/SPARK-24692
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Created] (SPARK-24658) Remove workaround for ANTLR bug

2018-06-25 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24658:
---

 Summary: Remove workaround for ANTLR bug
 Key: SPARK-24658
 URL: https://issues.apache.org/jira/browse/SPARK-24658
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


Issue [antlr/antlr4#781|https://github.com/antlr/antlr4/issues/781] has already 
been fixed, so the workaround of extracting the pattern into a separate rule is 
no longer needed. Presto has already removed it: 
https://github.com/prestodb/presto/pull/10744.






[jira] [Created] (SPARK-24638) StringStartsWith support push down

2018-06-23 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24638:
---

 Summary: StringStartsWith support push down
 Key: SPARK-24638
 URL: https://issues.apache.org/jira/browse/SPARK-24638
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang
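
A plausible shape for this feature, sketched against Parquet's public filter 
API (all names here are illustrative, not necessarily Spark's eventual 
implementation):
{code:java}
import java.util.Arrays
import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate, Statistics, UserDefinedPredicate}
import org.apache.parquet.io.api.Binary

// Row-level predicate for column.startsWith(prefix), pushed into Parquet
// as a user-defined predicate over the binary column.
def startsWithFilter(column: String, prefix: String): FilterPredicate = {
  val prefixBytes = prefix.getBytes("UTF-8")
  FilterApi.userDefined(FilterApi.binaryColumn(column),
    new UserDefinedPredicate[Binary] with Serializable {
      override def keep(value: Binary): Boolean = {
        val bytes = value.getBytes
        bytes.length >= prefixBytes.length &&
          Arrays.equals(bytes.take(prefixBytes.length), prefixBytes)
      }
      // Conservative: never prune a row group from statistics alone; a real
      // implementation would compare the prefix against the column min/max.
      override def canDrop(statistics: Statistics[Binary]): Boolean = false
      override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
    })
}
{code}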









[jira] [Created] (SPARK-24706) Support ByteType and ShortType pushdown to parquet

2018-06-30 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24706:
---

 Summary: Support ByteType and ShortType pushdown to parquet
 Key: SPARK-24706
 URL: https://issues.apache.org/jira/browse/SPARK-24706
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


Benchmark result:

{noformat}
###############[ Pushdown benchmark for tinyint ]###############
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       4307 / 4575          3.7         273.8       1.0X
Parquet Vectorized (Pushdown)             227 /  241         69.4          14.4      19.0X
Native ORC Vectorized                    3646 / 3727          4.3         231.8       1.2X
Native ORC Vectorized (Pushdown)          736 /  744         21.4          46.8       5.9X

Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       5209 / 5843          3.0         331.2       1.0X
Parquet Vectorized (Pushdown)            1296 / 1759         12.1          82.4       4.0X
Native ORC Vectorized                    4455 / 4594          3.5         283.2       1.2X
Native ORC Vectorized (Pushdown)         1736 / 1813          9.1         110.4       3.0X

Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       8362 / 8394          1.9         531.7       1.0X
Parquet Vectorized (Pushdown)            6303 / 6530          2.5         400.7       1.3X
Native ORC Vectorized                    7962 / 8113          2.0         506.2       1.1X
Native ORC Vectorized (Pushdown)         6680 / 7556          2.4         424.7       1.3X

Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                      11572 / 11715         1.4         735.7       1.0X
Parquet Vectorized (Pushdown)           11198 / 11326         1.4         712.0       1.0X
Native ORC Vectorized                   11041 / 11209         1.4         702.0       1.0X
Native ORC Vectorized (Pushdown)        11104 / 11472         1.4         706.0       1.0X
{noformat}
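
A hypothetical sketch of the change (mirroring the style of the other pushdown 
sketches in this thread; {{ParquetSchemaType}} is an assumed helper):
{code:java}
// Parquet stores tinyint/smallint as INT32, so a pushed-down equality
// filter can compare the value as a boxed java.lang.Integer.
case ParquetSchemaType(INT_8, INT32, null) | ParquetSchemaType(INT_16, INT32, null) =>
  (n: String, v: Any) => FilterApi.eq(
    intColumn(n),
    Option(v).map(_.asInstanceOf[Number].intValue()
      .asInstanceOf[java.lang.Integer]).orNull)
{code}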







[jira] [Commented] (SPARK-24706) Support ByteType and ShortType pushdown to parquet

2018-06-30 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528878#comment-16528878
 ] 

Yuming Wang commented on SPARK-24706:
-

Benchmark result:

{noformat}
###############[ Pushdown benchmark for tinyint ]###############
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       4307 / 4575          3.7         273.8       1.0X
Parquet Vectorized (Pushdown)             227 /  241         69.4          14.4      19.0X
Native ORC Vectorized                    3646 / 3727          4.3         231.8       1.2X
Native ORC Vectorized (Pushdown)          736 /  744         21.4          46.8       5.9X

Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       5209 / 5843          3.0         331.2       1.0X
Parquet Vectorized (Pushdown)            1296 / 1759         12.1          82.4       4.0X
Native ORC Vectorized                    4455 / 4594          3.5         283.2       1.2X
Native ORC Vectorized (Pushdown)         1736 / 1813          9.1         110.4       3.0X

Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       8362 / 8394          1.9         531.7       1.0X
Parquet Vectorized (Pushdown)            6303 / 6530          2.5         400.7       1.3X
Native ORC Vectorized                    7962 / 8113          2.0         506.2       1.1X
Native ORC Vectorized (Pushdown)         6680 / 7556          2.4         424.7       1.3X

Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                      11572 / 11715         1.4         735.7       1.0X
Parquet Vectorized (Pushdown)           11198 / 11326         1.4         712.0       1.0X
Native ORC Vectorized                   11041 / 11209         1.4         702.0       1.0X
Native ORC Vectorized (Pushdown)        11104 / 11472         1.4         706.0       1.0X

###############[ Pushdown benchmark for smallint ]###############
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 smallint row (value = CAST(63 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       2939 / 2966          5.4         186.9       1.0X
Parquet Vectorized (Pushdown)              85 /   91        184.9           5.4      34.6X
Native ORC Vectorized                    2927 / 3026          5.4         186.1       1.0X
Native ORC Vectorized (Pushdown)          418 /  432         37.7          26.6       7.0X

Select 10% smallint rows (value < CAST(3276 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       3735 / 3897          4.2         237.5       1.0X
Parquet Vectorized (Pushdown)            1204 / 1222         13.1          76.6       3.1X
Native ORC Vectorized                    3796 / 3831          4.1         241.4       1.0X
Native ORC Vectorized (Pushdown)         1570 / 1581         10.0          99.8       2.4X

Select 50% smallint rows (value < CAST(16383 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       7194 / 8522          2.2         457.4       1.0X
Parquet Vectorized (Pushdown)            5758 / 5806          2.7         366.1       1.2X
Native ORC Vectorized                    7311 / 7585          2.2         464.8       1.0X
Native ORC Vectorized (Pushdown)         6123 / 6342          2.6         389.3       1.2X

Select 90% smallint rows (value < CAST(29490 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative

[jira] [Updated] (SPARK-24706) Support ByteType and ShortType pushdown to parquet

2018-06-30 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24706:

Description: 
Benchmark result:

{noformat}
###############[ Pushdown benchmark for tinyint ]###############
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       4307 / 4575          3.7         273.8       1.0X
Parquet Vectorized (Pushdown)             227 /  241         69.4          14.4      19.0X
Native ORC Vectorized                    3646 / 3727          4.3         231.8       1.2X
Native ORC Vectorized (Pushdown)          736 /  744         21.4          46.8       5.9X

Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       5209 / 5843          3.0         331.2       1.0X
Parquet Vectorized (Pushdown)            1296 / 1759         12.1          82.4       4.0X
Native ORC Vectorized                    4455 / 4594          3.5         283.2       1.2X
Native ORC Vectorized (Pushdown)         1736 / 1813          9.1         110.4       3.0X

Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       8362 / 8394          1.9         531.7       1.0X
Parquet Vectorized (Pushdown)            6303 / 6530          2.5         400.7       1.3X
Native ORC Vectorized                    7962 / 8113          2.0         506.2       1.1X
Native ORC Vectorized (Pushdown)         6680 / 7556          2.4         424.7       1.3X

Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                      11572 / 11715         1.4         735.7       1.0X
Parquet Vectorized (Pushdown)           11198 / 11326         1.4         712.0       1.0X
Native ORC Vectorized                   11041 / 11209         1.4         702.0       1.0X
Native ORC Vectorized (Pushdown)        11104 / 11472         1.4         706.0       1.0X

###############[ Pushdown benchmark for smallint ]###############
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 smallint row (value = CAST(63 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       2939 / 2966          5.4         186.9       1.0X
Parquet Vectorized (Pushdown)              85 /   91        184.9           5.4      34.6X
Native ORC Vectorized                    2927 / 3026          5.4         186.1       1.0X
Native ORC Vectorized (Pushdown)          418 /  432         37.7          26.6       7.0X

Select 10% smallint rows (value < CAST(3276 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       3735 / 3897          4.2         237.5       1.0X
Parquet Vectorized (Pushdown)            1204 / 1222         13.1          76.6       3.1X
Native ORC Vectorized                    3796 / 3831          4.1         241.4       1.0X
Native ORC Vectorized (Pushdown)         1570 / 1581         10.0          99.8       2.4X

Select 50% smallint rows (value < CAST(16383 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       7194 / 8522          2.2         457.4       1.0X
Parquet Vectorized (Pushdown)            5758 / 5806          2.7         366.1       1.2X
Native ORC Vectorized                    7311 / 7585          2.2         464.8       1.0X
Native ORC Vectorized (Pushdown)         6123 / 6342          2.6         389.3       1.2X

Select 90% smallint rows (value < CAST(29490 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative

[jira] [Updated] (SPARK-24706) Support ByteType and ShortType pushdown to parquet

2018-06-30 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24706:

Description: (was: Benchmark result:

{noformat}
###############[ Pushdown benchmark for tinyint ]###############
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       4307 / 4575          3.7         273.8       1.0X
Parquet Vectorized (Pushdown)             227 /  241         69.4          14.4      19.0X
Native ORC Vectorized                    3646 / 3727          4.3         231.8       1.2X
Native ORC Vectorized (Pushdown)          736 /  744         21.4          46.8       5.9X

Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       5209 / 5843          3.0         331.2       1.0X
Parquet Vectorized (Pushdown)            1296 / 1759         12.1          82.4       4.0X
Native ORC Vectorized                    4455 / 4594          3.5         283.2       1.2X
Native ORC Vectorized (Pushdown)         1736 / 1813          9.1         110.4       3.0X

Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       8362 / 8394          1.9         531.7       1.0X
Parquet Vectorized (Pushdown)            6303 / 6530          2.5         400.7       1.3X
Native ORC Vectorized                    7962 / 8113          2.0         506.2       1.1X
Native ORC Vectorized (Pushdown)         6680 / 7556          2.4         424.7       1.3X

Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                      11572 / 11715         1.4         735.7       1.0X
Parquet Vectorized (Pushdown)           11198 / 11326         1.4         712.0       1.0X
Native ORC Vectorized                   11041 / 11209         1.4         702.0       1.0X
Native ORC Vectorized (Pushdown)        11104 / 11472         1.4         706.0       1.0X

###############[ Pushdown benchmark for smallint ]###############
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 smallint row (value = CAST(63 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       2939 / 2966          5.4         186.9       1.0X
Parquet Vectorized (Pushdown)              85 /   91        184.9           5.4      34.6X
Native ORC Vectorized                    2927 / 3026          5.4         186.1       1.0X
Native ORC Vectorized (Pushdown)          418 /  432         37.7          26.6       7.0X

Select 10% smallint rows (value < CAST(3276 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       3735 / 3897          4.2         237.5       1.0X
Parquet Vectorized (Pushdown)            1204 / 1222         13.1          76.6       3.1X
Native ORC Vectorized                    3796 / 3831          4.1         241.4       1.0X
Native ORC Vectorized (Pushdown)         1570 / 1581         10.0          99.8       2.4X

Select 50% smallint rows (value < CAST(16383 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------
Parquet Vectorized                       7194 / 8522          2.2         457.4       1.0X
Parquet Vectorized (Pushdown)            5758 / 5806          2.7         366.1       1.2X
Native ORC Vectorized                    7311 / 7585          2.2         464.8       1.0X
Native ORC Vectorized (Pushdown)         6123 / 6342          2.6         389.3       1.2X

Select 90% smallint rows (value < CAST(29490 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative

[jira] [Updated] (SPARK-24718) Timestamp support pushdown to parquet data source

2018-07-02 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24718:

Description: 
Something like this:
{code:java}
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, null)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
  .asInstanceOf[java.lang.Long]).orNull)
case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, null)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
  .asInstanceOf[java.lang.Long]).orNull)
{code}

  was:
Something like this:
{code:java}
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
  .asInstanceOf[java.lang.Long]).orNull)
case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
  .asInstanceOf[java.lang.Long]).orNull)
{code}


> Timestamp support pushdown to parquet data source
> -
>
> Key: SPARK-24718
> URL: https://issues.apache.org/jira/browse/SPARK-24718
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Something like this:
> {code:java}
> case ParquetSchemaType(TIMESTAMP_MICROS, INT64, null)
>   if pushDownDecimal =>
>   (n: String, v: Any) => FilterApi.eq(
> longColumn(n),
> Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
>   .asInstanceOf[java.lang.Long]).orNull)
> case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, null)
>   if pushDownDecimal =>
>   (n: String, v: Any) => FilterApi.eq(
> longColumn(n),
> Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
>   .asInstanceOf[java.lang.Long]).orNull)
> {code}
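
As an aside, the {{pushDownDecimal}} guard in the sketch above reads like a 
copy-paste from the decimal cases; a timestamp-specific flag is presumably 
intended. A cleaned-up sketch (with a hypothetical {{pushDownTimestamp}} flag):
{code:java}
// Same conversion logic as above, gated on a timestamp-specific flag.
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, null) if pushDownTimestamp =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
      .asInstanceOf[java.lang.Long]).orNull)
case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, null) if pushDownTimestamp =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
      .asInstanceOf[java.lang.Long]).orNull)
{code}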






[jira] [Updated] (SPARK-24718) Timestamp support pushdown to parquet data source

2018-07-02 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24718:

Description: 
Something like this:
{code:java}
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
  .asInstanceOf[java.lang.Long]).orNull)
case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
  .asInstanceOf[java.lang.Long]).orNull)
{code}

  was:
Something like this:
{code:java}
// INT96 deprecated, doesn't support pushdown, see: PARQUET-323
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
  .asInstanceOf[java.lang.Long]).orNull)
case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
  .asInstanceOf[java.lang.Long]).orNull)
{code}


> Timestamp support pushdown to parquet data source
> -
>
> Key: SPARK-24718
> URL: https://issues.apache.org/jira/browse/SPARK-24718
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Something like this:
> {code:java}
> case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
>   if pushDownDecimal =>
>   (n: String, v: Any) => FilterApi.eq(
> longColumn(n),
> Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
>   .asInstanceOf[java.lang.Long]).orNull)
> case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
>   if pushDownDecimal =>
>   (n: String, v: Any) => FilterApi.eq(
> longColumn(n),
> Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
>   .asInstanceOf[java.lang.Long]).orNull)
> {code}






[jira] [Issue Comment Deleted] (SPARK-24716) Refactor ParquetFilters

2018-07-02 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24716:

Comment: was deleted

(was: I'm working on it.)

> Refactor ParquetFilters
> ---
>
> Key: SPARK-24716
> URL: https://issues.apache.org/jira/browse/SPARK-24716
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Updated] (SPARK-24718) Timestamp support pushdown to parquet data source

2018-07-02 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24718:

Description: 
Something like this:
{code:java}
// INT96 deprecated, doesn't support pushdown, see: PARQUET-323
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
  .asInstanceOf[java.lang.Long]).orNull)
case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
  if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
longColumn(n),
Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
  .asInstanceOf[java.lang.Long]).orNull)
{code}

> Timestamp support pushdown to parquet data source
> -
>
> Key: SPARK-24718
> URL: https://issues.apache.org/jira/browse/SPARK-24718
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Something like this:
> {code:java}
> // INT96 deprecated, doesn't support pushdown, see: PARQUET-323
> case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
>   if pushDownDecimal =>
>   (n: String, v: Any) => FilterApi.eq(
> longColumn(n),
> Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
>   .asInstanceOf[java.lang.Long]).orNull)
> case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
>   if pushDownDecimal =>
>   (n: String, v: Any) => FilterApi.eq(
> longColumn(n),
> Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
>   .asInstanceOf[java.lang.Long]).orNull)
> {code}






[jira] [Commented] (SPARK-24718) Timestamp support pushdown to parquet data source

2018-07-02 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530095#comment-16530095
 ] 

Yuming Wang commented on SPARK-24718:
-

I'm working on it.

> Timestamp support pushdown to parquet data source
> -
>
> Key: SPARK-24718
> URL: https://issues.apache.org/jira/browse/SPARK-24718
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Created] (SPARK-24718) Timestamp support pushdown to parquet data source

2018-07-02 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24718:
---

 Summary: Timestamp support pushdown to parquet data source
 Key: SPARK-24718
 URL: https://issues.apache.org/jira/browse/SPARK-24718
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang









[jira] [Commented] (SPARK-24096) create table as select not using hive.default.fileformat

2018-04-26 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454463#comment-16454463
 ] 

Yuming Wang commented on SPARK-24096:
-

Another related PR: https://github.com/apache/spark/pull/14430
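
Until the settings are unified, the workaround from the description can be 
scripted (table names are hypothetical; {{spark}} is a Hive-enabled 
SparkSession):
{code:java}
// SET places the key into SQLConf, which is what
// SparkSqlParser.visitCreateHiveTable() actually consults.
spark.sql("SET hive.default.fileformat=orc")
spark.sql("CREATE TABLE t_orc AS SELECT * FROM src")
{code}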

> create table as select not using hive.default.fileformat
> 
>
> Key: SPARK-24096
> URL: https://issues.apache.org/jira/browse/SPARK-24096
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: StephenZou
>Priority: Major
>
> In my Spark conf directory, hive-site.xml has an item indicating ORC is the 
> default file format:
> <property>
>   <name>hive.default.fileformat</name>
>   <value>orc</value>
> </property>
> But when I use "create table as select ..." to create a table, the output 
> format is plain text. 
> It works only if I use "set hive.default.fileformat=orc".
> Then I walked through the Spark code and found that in 
> SparkSqlParser.visitCreateHiveTable(), 
> val defaultStorage = HiveSerDe.getDefaultStorage(conf), the conf is SQLConf.
> That explains the above observation: 
> "set hive.default.fileformat=orc" is put into the conf map, but hive-site.xml 
> is not.
> It's quite misleading. How can the settings be unified?
>  
>  






[jira] [Resolved] (SPARK-24096) create table as select not using hive.default.fileformat

2018-04-26 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-24096.
-
Resolution: Duplicate

> create table as select not using hive.default.fileformat
> 
>
> Key: SPARK-24096
> URL: https://issues.apache.org/jira/browse/SPARK-24096
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: StephenZou
>Priority: Major
>
> In my Spark conf directory, hive-site.xml has an item indicating ORC is the 
> default file format:
> <property>
>   <name>hive.default.fileformat</name>
>   <value>orc</value>
> </property>
> But when I use "create table as select ..." to create a table, the output 
> format is plain text. 
> It works only if I use "set hive.default.fileformat=orc".
> Then I walked through the Spark code and found that in 
> SparkSqlParser.visitCreateHiveTable(), 
> val defaultStorage = HiveSerDe.getDefaultStorage(conf), the conf is SQLConf.
> That explains the above observation: 
> "set hive.default.fileformat=orc" is put into the conf map, but hive-site.xml 
> is not.
> It's quite misleading. How can the settings be unified?
>  
>  






[jira] [Created] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-22977:
---

 Summary: DataFrameWriter operations do not show details in SQL tab
 Key: SPARK-22977
 URL: https://issues.apache.org/jira/browse/SPARK-22977
 Project: Spark
  Issue Type: Bug
  Components: SQL, Web UI
Affects Versions: 2.3.0
Reporter: Yuming Wang









[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-22977:

Attachment: after.png
before.png

> DataFrameWriter operations do not show details in SQL tab
> -
>
> Key: SPARK-22977
> URL: https://issues.apache.org/jira/browse/SPARK-22977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
> Attachments: after.png, before.png
>
>







[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-22977:

Attachment: after.png
before.png

> DataFrameWriter operations do not show details in SQL tab
> -
>
> Key: SPARK-22977
> URL: https://issues.apache.org/jira/browse/SPARK-22977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
> Attachments: after.png, before.png
>
>
> When create 
> !before.png!
> !after.png!






[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-22977:

Description: 
When create 
!before.png!
!after.png!

> DataFrameWriter operations do not show details in SQL tab
> -
>
> Key: SPARK-22977
> URL: https://issues.apache.org/jira/browse/SPARK-22977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
> Attachments: after.png, before.png
>
>
> When create 
> !before.png!
> !after.png!






[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-22977:

Attachment: (was: before.png)

> DataFrameWriter operations do not show details in SQL tab
> -
>
> Key: SPARK-22977
> URL: https://issues.apache.org/jira/browse/SPARK-22977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
> Attachments: after.png, before.png
>
>
> When create 
> !before.png!
> !after.png!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-22977:

Attachment: (was: after.png)

> DataFrameWriter operations do not show details in SQL tab
> -
>
> Key: SPARK-22977
> URL: https://issues.apache.org/jira/browse/SPARK-22977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
> Attachments: after.png, before.png
>
>
> When create 
> !before.png!
> !after.png!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-22977:

Description: 
When CreateHiveTableAsSelectCommand or  InsertIntoHiveTable,
!before.png!
!after.png!

  was:
When create 
!before.png!
!after.png!


> DataFrameWriter operations do not show details in SQL tab
> -
>
> Key: SPARK-22977
> URL: https://issues.apache.org/jira/browse/SPARK-22977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
> Attachments: after.png, before.png
>
>
> When CreateHiveTableAsSelectCommand or  InsertIntoHiveTable,
> !before.png!
> !after.png!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-01-05 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-22977:

Description: 
When running CreateHiveTableAsSelectCommand or InsertIntoHiveTable, the SQL tab 
doesn't show details after [SPARK-20213|https://issues.apache.org/jira/browse/SPARK-20213].

*Before*:
!before.png!

*After*:
!after.png!

  was:
When CreateHiveTableAsSelectCommand or  InsertIntoHiveTable,
!before.png!
!after.png!


> DataFrameWriter operations do not show details in SQL tab
> -
>
> Key: SPARK-22977
> URL: https://issues.apache.org/jira/browse/SPARK-22977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
> Attachments: after.png, before.png
>
>
> When running CreateHiveTableAsSelectCommand or InsertIntoHiveTable, the SQL 
> tab doesn't show details after 
> [SPARK-20213|https://issues.apache.org/jira/browse/SPARK-20213].
> *Before*:
> !before.png!
> *After*:
> !after.png!
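
A minimal way to trigger the affected write paths from the DataFrameWriter API 
(a sketch; the table name is illustrative):
{code:scala}
// CreateHiveTableAsSelectCommand runs as a command, so the SQL tab
// shows no details for the underlying SELECT
spark.range(10).write.format("hive").saveAsTable("t_ctas")

// InsertIntoHiveTable shows the same symptom
spark.range(10).write.insertInto("t_ctas")
{code}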



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22894) DateTimeOperations should accept SQL like string type

2017-12-23 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-22894:
---

 Summary: DateTimeOperations should accept SQL like string type
 Key: SPARK-22894
 URL: https://issues.apache.org/jira/browse/SPARK-22894
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Yuming Wang



{noformat}
spark-sql> SELECT '2017-12-24' + interval 2 months 2 seconds;
Error in query: cannot resolve '(CAST('2017-12-24' AS DOUBLE) + interval 2 
months 2 seconds)' due to data type mismatch: differing types in 
'(CAST('2017-12-24' AS DOUBLE) + interval 2 months 2 seconds)' (double and 
calendarinterval).; line 1 pos 7;
'Project [unresolvedalias((cast(2017-12-24 as double) + interval 2 months 2 
seconds), None)]
+- OneRowRelation
spark-sql> 

{noformat}
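
Until string types are accepted here, an explicit cast is a workaround (a sketch):
{code:sql}
-- casting the string first lets the interval arithmetic resolve
SELECT CAST('2017-12-24' AS TIMESTAMP) + interval 2 months 2 seconds;
{code}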




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22890) Basic tests for DateTimeOperations

2017-12-22 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-22890:
---

 Summary: Basic tests for DateTimeOperations
 Key: SPARK-22890
 URL: https://issues.apache.org/jira/browse/SPARK-22890
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22893) Unified the data type mismatch message

2017-12-23 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-22893:
---

 Summary: Unified the data type mismatch message
 Key: SPARK-22893
 URL: https://issues.apache.org/jira/browse/SPARK-22893
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Yuming Wang


{noformat}
spark-sql> select cast(1 as binary);
Error in query: cannot resolve 'CAST(1 AS BINARY)' due to data type mismatch: 
cannot cast IntegerType to BinaryType; line 1 pos 7;
{noformat}

We should use {{dataType.simpleString}}.
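
For reference, a quick spark-shell illustration of the difference (a sketch):
{code:scala}
import org.apache.spark.sql.types._

// toString yields the Catalyst class name used in the current message...
IntegerType.toString      // "IntegerType"
// ...while simpleString yields the SQL-style name the message should use
IntegerType.simpleString  // "int"
BinaryType.simpleString   // "binary"
{code}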



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23175) Type conversion does not make sense under case like select ’0.1’ = 0

2018-01-21 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-23175.
-
Resolution: Duplicate

> Type conversion does not make sense under case like select ’0.1’ = 0
> 
>
> Key: SPARK-23175
> URL: https://issues.apache.org/jira/browse/SPARK-23175
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Shaoquan Zhang
>Priority: Major
>
> SQL select '0.1' = 0 returns true. The result seems unreasonable.
> From the logical plan, the sql is parsed as 'Project [(cast(cast(0.1 as 
> decimal(20,0)) as int) = 0) AS #6]'. The type conversion converts the string 
> to integer, which leads to the unreasonable result.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23058) Show create table can't show non printable field delim

2018-01-12 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-23058:

Summary: Show create table can't show non printable field delim  (was: Show 
create table conn't show non printable field delim)

> Show create table can't show non printable field delim
> --
>
> Key: SPARK-23058
> URL: https://issues.apache.org/jira/browse/SPARK-23058
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23058) Show create table can't show non printable field delim

2018-01-12 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-23058:

Description: 
# create table t1:
{code:sql}
CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '\177',
  'serialization.format' = '\003'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1';
{code}
# show create table t1:

{code:java}
spark-sql> show create table t1;
CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '',
  'serialization.format' = ''
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1515766958'
)
{code}

 {{'\177'}} and {{'\003'}} are not shown correctly.
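
A possible fix sketch ({{escapeDelim}} is a hypothetical helper, not existing 
Spark code): render control characters as octal escapes before printing.
{code:scala}
// escape non-printable serde property values, so SHOW CREATE TABLE
// emits '\177' instead of the raw DEL byte
def escapeDelim(value: String): String =
  value.flatMap { c =>
    if (Character.isISOControl(c)) f"\\${c.toInt}%03o" else c.toString
  }

escapeDelim("\u007f")  // "\177"
escapeDelim("\u0003")  // "\003"
{code}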




> Show create table can't show non printable field delim
> --
>
> Key: SPARK-23058
> URL: https://issues.apache.org/jira/browse/SPARK-23058
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
>
> # create table t1:
> {code:sql}
> CREATE EXTERNAL TABLE `t1`(`col1` bigint)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'field.delim' = '\177',
>   'serialization.format' = '\003'
> )
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
> LOCATION 'file:/tmp/t1';
> {code}
> # show create table t1:
> {code:java}
> spark-sql> show create table t1;
> CREATE EXTERNAL TABLE `t1`(`col1` bigint)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'field.delim' = '',
>   'serialization.format' = ''
> )
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
> LOCATION 'file:/tmp/t1'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1515766958'
> )
> {code}
> {{'\177'}} and {{'\003'}} are not shown correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23058) Show create table conn't show non printable field delim

2018-01-12 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-23058:
---

 Summary: Show create table conn't show non printable field delim
 Key: SPARK-23058
 URL: https://issues.apache.org/jira/browse/SPARK-23058
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23297) Spark job is finished but the stage process is error

2018-02-04 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351791#comment-16351791
 ] 

Yuming Wang edited comment on SPARK-23297 at 2/4/18 1:52 PM:
-

[~KaiXinXIaoLei] Try to increase {{spark.ui.retainedTasks}}.
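
For example (the default is 100000 retained tasks; the value below is illustrative):
{noformat}
bin/spark-sql --conf spark.ui.retainedTasks=1000000
{noformat}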


was (Author: q79969786):
[~KaiXinXIaoLei] Try to increase 
{{spark.scheduler.listenerbus.eventqueue.capacity}}.

> Spark job is finished but the stage process is error
> 
>
> Key: SPARK-23297
> URL: https://issues.apache.org/jira/browse/SPARK-23297
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.2, 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
> Attachments: job finished but stage process is error.png
>
>
> I set the log level to WARN and run a Spark job using spark-sql. The job is 
> finished, but the stage progress still displays the running state:  !job finished but 
> stage process is error.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23297) Spark job is finished but the stage process is error

2018-02-04 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351791#comment-16351791
 ] 

Yuming Wang commented on SPARK-23297:
-

[~KaiXinXIaoLei] Try to increase 
{{spark.scheduler.listenerbus.eventqueue.capacity}}.

> Spark job is finished but the stage process is error
> 
>
> Key: SPARK-23297
> URL: https://issues.apache.org/jira/browse/SPARK-23297
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.2, 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
> Attachments: job finished but stage process is error.png
>
>
> I set the log level to WARN and run a Spark job using spark-sql. The job is 
> finished, but the stage progress still displays the running state:  !job finished but 
> stage process is error.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23336) Upgrade snappy-java to 1.1.4

2018-02-05 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-23336:
---

 Summary: Upgrade snappy-java to 1.1.4
 Key: SPARK-23336
 URL: https://issues.apache.org/jira/browse/SPARK-23336
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.4.0
Reporter: Yuming Wang


We should upgrade the snappy-java version to improve compression (~5%) and 
decompression (~20%) performance.

Details:
 
[https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-114-2017-05-22]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23332) Update SQLQueryTestSuite to support test both default mode and hive mode for a typeCoercion TestCase

2018-02-04 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-23332:

Summary: Update SQLQueryTestSuite to support test both default mode and 
hive mode for a typeCoercion TestCase  (was: Update SQLQueryTestSuite to 
support test hive mode)

> Update SQLQueryTestSuite to support test both default mode and hive mode for 
> a typeCoercion TestCase
> 
>
> Key: SPARK-23332
> URL: https://issues.apache.org/jira/browse/SPARK-23332
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23332) Update SQLQueryTestSuite to support test hive mode

2018-02-04 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-23332:
---

 Summary: Update SQLQueryTestSuite to support test hive mode
 Key: SPARK-23332
 URL: https://issues.apache.org/jira/browse/SPARK-23332
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23297) Spark job is finished but the stage process is error

2018-02-01 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348332#comment-16348332
 ] 

Yuming Wang commented on SPARK-23297:
-

This seems to happen because {{SparkListenerTaskEnd}} events are not consumed 
in time, so it is not a bug.

> Spark job is finished but the stage process is error
> 
>
> Key: SPARK-23297
> URL: https://issues.apache.org/jira/browse/SPARK-23297
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
> Attachments: job finished but stage process is error.png
>
>
> I set the log level to WARN and run a Spark job using spark-sql. The job is 
> finished, but the stage progress still displays the running state:  !job finished but 
> stage process is error.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23263) create table stored as parquet should update table size if automatic update table size is enabled

2018-01-29 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-23263:
---

 Summary: create table stored as parquet should update table size 
if automatic update table size is enabled
 Key: SPARK-23263
 URL: https://issues.apache.org/jira/browse/SPARK-23263
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


How to reproduce:

{noformat}
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true
{noformat}

{code:sql}
spark-sql> create table test_create_parquet stored as parquet as select 1;
spark-sql> desc extended test_create_parquet;
{code}
The table statistics will not exist.
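
A workaround sketch until this is fixed: compute the statistics explicitly.
{code:sql}
ANALYZE TABLE test_create_parquet COMPUTE STATISTICS NOSCAN;
DESC EXTENDED test_create_parquet;
{code}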


 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23336) Upgrade snappy-java to 1.1.7.1

2018-02-06 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-23336:

Summary: Upgrade snappy-java to 1.1.7.1  (was: Upgrade snappy-java to 1.1.4)

> Upgrade snappy-java to 1.1.7.1
> --
>
> Key: SPARK-23336
> URL: https://issues.apache.org/jira/browse/SPARK-23336
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Minor
>
> We should upgrade the snappy-java version to improve compression (~5%) and 
> decompression (~20%) performance.
> Details:
>  
> [https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-114-2017-05-22]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23405) The task will hang up when a small table left semi join a big table

2018-02-13 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362162#comment-16362162
 ] 

Yuming Wang commented on SPARK-23405:
-

I think it's data skew; you should broadcast the small table.
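
For example, with a broadcast hint on the query above (a sketch):
{code:sql}
SELECT /*+ BROADCAST(ls) */ ls.cs_order_number
FROM ls LEFT SEMI JOIN catalog_sales cs
ON ls.cs_order_number = cs.cs_order_number;
{code}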

> The task will hang up when a small table left semi join a big table
> ---
>
> Key: SPARK-23405
> URL: https://issues.apache.org/jira/browse/SPARK-23405
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
> Attachments: SQL.png, taskhang up.png
>
>
> I run a SQL query: `select ls.cs_order_number from ls left semi join catalog_sales 
> cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table is a small 
> table with one row. The `catalog_sales` table is a big table with 10 billion 
> rows. The task hangs:
> !taskhang up.png!
>  And the sql page is :
> !SQL.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23354) spark jdbc does not maintain length of data type when I move data from MS sql server to Oracle using spark jdbc

2018-02-08 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357966#comment-16357966
 ] 

Yuming Wang commented on SPARK-23354:
-

Do you mean a custom column type? You can find more details 
[here|https://github.com/apache/spark/pull/18266].
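
For the write side there is also the {{createTableColumnTypes}} option (a 
sketch; the column names, URL, and properties are illustrative):
{code:scala}
// keep explicit lengths when Spark creates the Oracle target table
df.write
  .option("createTableColumnTypes", "name VARCHAR(64), comments VARCHAR(1024)")
  .jdbc(oracleUrl, "target_table", connectionProperties)
{code}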

 

> spark jdbc does not maintain length of data type when I move data from MS sql 
> server to Oracle using spark jdbc
> ---
>
> Key: SPARK-23354
> URL: https://issues.apache.org/jira/browse/SPARK-23354
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.2.1
>Reporter: Lav Patel
>Priority: Major
>
> Spark JDBC does not maintain the length of a data type when I move data from 
> MS SQL Server to Oracle using Spark JDBC.
>  
> To fix this, I have written code that figures out the length of each column 
> and performs the conversion.
>  
> I can share more details with a code sample if the community is interested. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23373) Can not execute "count distinct" queries on parquet formatted table

2018-02-09 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358392#comment-16358392
 ] 

Yuming Wang commented on SPARK-23373:
-

I cannot reproduce this on current master either, as you mentioned.

> Can not execute "count distinct" queries on parquet formatted table
> ---
>
> Key: SPARK-23373
> URL: https://issues.apache.org/jira/browse/SPARK-23373
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wang, Gang
>Priority: Major
>
> I failed to run the SQL "select count(distinct n_name) from nation"; table 
> nation is stored in Parquet format. The error trace is as follows.
> _spark-sql> select count(distinct n_name) from nation;_
>  _18/02/09 03:55:28 INFO main SparkSqlParser:54 Parsing command: select 
> count(distinct n_name) from nation_
>  _Error in query: Table or view not found: nation; line 1 pos 35_
>  _spark-sql> select count(distinct n_name) from nation_parquet;_
>  _18/02/09 03:55:36 INFO main SparkSqlParser:54 Parsing command: select 
> count(distinct n_name) from nation_parquet_
>  _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: int_
>  _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: string_
>  _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: int_
>  _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: string_
>  _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: 
> array_
>  _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Pruning directories with:_
>  _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Data Filters:_
>  _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Post-Scan Filters:_
>  _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Output Data Schema: 
> struct_
>  _18/02/09 03:55:38 INFO main FileSourceScanExec:54 Pushed Filters:_
>  _18/02/09 03:55:39 INFO main CodeGenerator:54 Code generated in 295.88685 ms_
>  _18/02/09 03:55:39 INFO main HashAggregateExec:54 
> spark.sql.codegen.aggregate.map.twolevel.enable is set to true, but current 
> version of codegened fast hashmap does not support this aggregate._
>  _18/02/09 03:55:39 INFO main CodeGenerator:54 Code generated in 51.075394 ms_
>  _18/02/09 03:55:39 INFO main HashAggregateExec:54 
> spark.sql.codegen.aggregate.map.twolevel.enable is set to true, but current 
> version of codegened fast hashmap does not support this aggregate._
>  _18/02/09 03:55:39 INFO main CodeGenerator:54 Code generated in 42.819226 ms_
>  _18/02/09 03:55:39 INFO main ParquetFileFormat:54 parquetFilterPushDown is 
> true_
>  _18/02/09 03:55:39 INFO main ParquetFileFormat:54 start filter class_
>  _18/02/09 03:55:39 INFO main ParquetFileFormat:54 Pushed not defined_
>  _18/02/09 03:55:39 INFO main ParquetFileFormat:54 end filter class_
>  _18/02/09 03:55:39 INFO main MemoryStore:54 Block broadcast_0 stored as 
> values in memory (estimated size 305.0 KB, free 366.0 MB)_
>  _18/02/09 03:55:39 INFO main MemoryStore:54 Block broadcast_0_piece0 stored 
> as bytes in memory (estimated size 27.6 KB, free 366.0 MB)_
>  _18/02/09 03:55:39 INFO dispatcher-event-loop-7 BlockManagerInfo:54 Added 
> broadcast_0_piece0 in memory on 10.64.205.170:45616 (size: 27.6 KB, free: 
> 366.3 MB)_
>  _18/02/09 03:55:39 INFO main SparkContext:54 Created broadcast 0 from 
> processCmd at CliDriver.java:376_
>  _18/02/09 03:55:39 INFO main InMemoryFileIndex:54 Selected files after 
> partition pruning:_
>  _PartitionDirectory([empty 
> row],ArrayBuffer(LocatedFileStatus\{path=hdfs://**.com:8020/apps/hive/warehouse/nation_parquet/00_0;
>  isDirectory=false; length=3216; replication=3; blocksize=134217728; 
> modification_time=1516619879024; access_time=0; owner=; group=; 
> permission=rw-rw-rw-; isSymlink=false}))_
>  _18/02/09 03:55:39 INFO main FileSourceScanExec:54 Planning scan with bin 
> packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 
> bytes._
>  _18/02/09 03:55:39 ERROR main SparkSQLDriver:91 Failed in [select 
> count(distinct n_name) from nation_parquet]_
>  {color:#ff}*_org.apache.spark.SparkException: Task not 
> serializable_*{color}
>  _at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:340)_
>  _at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:330)_
>  _at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:156)_
>  _at org.apache.spark.SparkContext.clean(SparkContext.scala:2294)_
>  _at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:841)_
>  _at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:840)_
>  _at 
> 

[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-09 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358401#comment-16358401
 ] 

Yuming Wang commented on SPARK-23370:
-

Users can now configure the column type like this (the {{customSchema}} option 
is picked up from the connection properties):
{code:scala}
val props = new Properties()
props.put("customSchema", "ID decimal(38, 0), N1 int, N2 boolean")
val dfRead = spark.read.jdbc(jdbcUrl, "tableWithCustomSchema", props)
dfRead.show()
{code}
More details:
https://github.com/apache/spark/pull/18266

> Spark receives a size of 0 for an Oracle Number field and defaults the field 
> type to be BigDecimal(30,10) instead of the actual precision and scale
> ---
>
> Key: SPARK-23370
> URL: https://issues.apache.org/jira/browse/SPARK-23370
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
> Environment: Spark 2.2
> Oracle 11g
> JDBC ojdbc6.jar
>Reporter: Harleen Singh Mann
>Priority: Major
> Attachments: Oracle KB Document 1266785.pdf
>
>
> Currently, on JDBC read Spark obtains the schema of a table using 
> {{resultSet.getMetaData.getColumnType}}.
> This works 99.99% of the time, except when a column of Number type is added 
> to an Oracle table using an alter statement. This is essentially an Oracle 
> DB + JDBC bug that has been documented on Oracle KB, and patches exist. 
> [oracle 
> KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html]
> As a result of the above mentioned issue, Spark receives a size of 0 for the 
> field and defaults the field type to be BigDecimal(30,10) instead of what it 
> actually should be. This is done in OracleDialect.scala. This may cause 
> issues in the downstream application, where relevant information may be 
> missed due to the changed precision and scale.
> _The versions that are affected are:_ 
>  _JDBC - Version: 11.2.0.1 and later [Release: 11.2 and later]_ 
>  _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_ 
> _[Release: 11.1 to 11.2]_ 
> +Proposed approach:+
> There is another way of fetching the schema information in Oracle: through 
> the all_tab_columns table. If we use this table to fetch the precision and 
> scale of Number type, the above issue is mitigated.
> I can implement the changes, but require some inputs on the approach from 
> the gatekeepers here.
> PS. This is also my first Jira issue and my first fork for Spark, so I will 
> need some guidance along the way. (Yes, I am a newbie to this.) Thanks...
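
A sketch of the proposed dictionary lookup (owner and table name are illustrative):
{code:sql}
-- all_tab_columns carries the precision/scale that getMetaData loses
SELECT column_name, data_precision, data_scale
FROM all_tab_columns
WHERE owner = 'MY_SCHEMA' AND table_name = 'MY_TABLE';
{code}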



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-02-24 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-23510:
---

 Summary: Support read data from Hive 2.2 and Hive 2.3 metastore
 Key: SPARK-23510
 URL: https://issues.apache.org/jira/browse/SPARK-23510
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-02-24 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375660#comment-16375660
 ] 

Yuming Wang commented on SPARK-23510:
-

[~JPMoresmau] Can you try https://github.com/apache/spark/pull/20668?
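
Once that PR lands, the metastore version should be selectable as usual (a 
sketch; the exact supported version string may differ):
{noformat}
bin/spark-sql \
  --conf spark.sql.hive.metastore.version=2.3.2 \
  --conf spark.sql.hive.metastore.jars=maven
{noformat}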

> Support read data from Hive 2.2 and Hive 2.3 metastore
> --
>
> Key: SPARK-23510
> URL: https://issues.apache.org/jira/browse/SPARK-23510
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22722) Test Coverage for Type Coercion Compatibility

2017-12-28 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305979#comment-16305979
 ] 

Yuming Wang commented on SPARK-22722:
-

[~smilegator] All tests are added, except 
[FunctionArgumentConversion|https://github.com/apache/spark/pull/20008#issuecomment-352670852]
 and 
[StackCoercion|https://github.com/apache/spark/pull/20006#pullrequestreview-84366891].

> Test Coverage for Type Coercion Compatibility
> -
>
> Key: SPARK-22722
> URL: https://issues.apache.org/jira/browse/SPARK-22722
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Yuming Wang
>
> Hive compatibility is pretty important for the users who run or migrate both 
> Hive and Spark SQL. 
> We plan to add a SQLConf for type coercion compatibility 
> (spark.sql.typeCoercion.mode). Users can choose Spark's native mode (default) 
> or Hive mode (hive). 
> Before we deliver the Hive compatibility mode, we plan to write a set of test 
> cases that can be easily run in both Spark and Hive sides. We can easily 
> compare whether they are the same or not. When new typeCoercion rules are 
> added, we also can easily track the changes. These test cases can also be 
> backported to the previous Spark versions for determining the changes we 
> made. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20295) when spark.sql.adaptive.enabled is enabled, have conflict with Exchange Reuse

2018-06-22 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520148#comment-16520148
 ] 

Yuming Wang commented on SPARK-20295:
-

[~KevinZwx] Can you try [https://github.com/Intel-bigdata/spark-adaptive]?

 

> when  spark.sql.adaptive.enabled is enabled, have conflict with Exchange Reuse
> --
>
> Key: SPARK-20295
> URL: https://issues.apache.org/jira/browse/SPARK-20295
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, SQL
>Affects Versions: 2.1.0
>Reporter: Ruhui Wang
>Priority: Major
>
> When running tpcds-q95 with spark.sql.adaptive.enabled = true, the physical 
> plan is initially:
> Sort
> :  +- Exchange(coordinator id: 1)
> : +- Project***
> ::-Sort **
> ::  +- Exchange(coordinator id: 2)
> :: :- Project ***
> :+- Sort
> ::  +- Exchange(coordinator id: 3)
> With spark.sql.exchange.reuse enabled, the physical plan then becomes:
> Sort
> :  +- Exchange(coordinator id: 1)
> : +- Project***
> ::-Sort **
> ::  +- Exchange(coordinator id: 2)
> :: :- Project ***
> :+- Sort
> ::  +- ReusedExchange  Exchange(coordinator id: 2)
> When spark.sql.adaptive.enabled = true, the call stack is: 
> ShuffleExchange#doExecute --> postShuffleRDD --> doEstimationIfNecessary. 
> In this function, assert(exchanges.length == numExchanges) fails, because 
> the left side has only one element while the right side equals 2.
> Is this a bug in the interaction of spark.sql.adaptive.enabled and exchange 
> reuse?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24937) Datasource partition table should load empty static partitions

2018-07-30 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24937:

Summary: Datasource partition table should load empty static partitions  
(was: Datasource partition table should load empty partitions)

> Datasource partition table should load empty static partitions
> --
>
> Key: SPARK-24937
> URL: https://issues.apache.org/jira/browse/SPARK-24937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> spark-sql> CREATE TABLE tbl AS SELECT 1;
> spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
>  > USING parquet
>  > PARTITIONED BY (day, hour);
> spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> spark-sql> SHOW PARTITIONS tbl1;
> spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
>  > PARTITIONED BY (day STRING, hour STRING);
> 18/07/26 22:49:20 WARN HiveMetaStore: Location: 
> file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
> table:tbl2
> spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
> 18/07/26 22:49:36 WARN log: Updated size to 0
> spark-sql> SHOW PARTITIONS tbl2;
> day=2018-07-25/hour=01
> spark-sql> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-24937) Datasource partition table should load empty partitions

2018-07-26 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24937:

Comment: was deleted

(was: I'm working on.)

> Datasource partition table should load empty partitions
> ---
>
> Key: SPARK-24937
> URL: https://issues.apache.org/jira/browse/SPARK-24937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> spark-sql> CREATE TABLE tbl AS SELECT 1;
> spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
>  > USING parquet
>  > PARTITIONED BY (day, hour);
> spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> spark-sql> SHOW PARTITIONS tbl1;
> spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
>  > PARTITIONED BY (day STRING, hour STRING);
> 18/07/26 22:49:20 WARN HiveMetaStore: Location: 
> file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
> table:tbl2
> spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
> 18/07/26 22:49:36 WARN log: Updated size to 0
> spark-sql> SHOW PARTITIONS tbl2;
> day=2018-07-25/hour=01
> spark-sql> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24916) Fix type coercion for IN expression with subquery

2018-07-25 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-24916.
-
Resolution: Duplicate

> Fix type coercion for IN expression with subquery
> -
>
> Key: SPARK-24916
> URL: https://issues.apache.org/jira/browse/SPARK-24916
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> CREATE TEMPORARY VIEW t4 AS SELECT * FROM VALUES
>   (CAST(1 AS DOUBLE), CAST(2 AS STRING), CAST(3 AS STRING))
> AS t1(t4a, t4b, t4c);
> CREATE TEMPORARY VIEW t5 AS SELECT * FROM VALUES
>   (CAST(1 AS DECIMAL(18, 0)), CAST(2 AS STRING), CAST(3 AS BIGINT))
> AS t1(t5a, t5b, t5c);
> SELECT * FROM t4
> WHERE
> (t4a, t4b, t4c) IN (SELECT t5a,
>t5b,
>t5c
> FROM t5);
> {code}
> Will throw exception:
> {noformat}
> org.apache.spark.sql.AnalysisException
> cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', 
> t4.`t4c`) IN (listquery()))' due to data type mismatch: 
> The data type of one or more elements in the left hand side of an IN subquery
> is not compatible with the data type of the output of the subquery
> Mismatched columns:
> [(t4.`t4a`:double, t5.`t5a`:decimal(18,0)), (t4.`t4c`:string, 
> t5.`t5c`:bigint)]
> Left side:
> [double, string, string].
> Right side:
> [decimal(18,0), string, bigint].;
> {noformat}
> But it succeeds on Spark 2.1.x.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24816) SQL interface support repartitionByRange

2018-07-31 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-24816.
-
Resolution: Won't Fix

{{ORDER BY}} is implemented by {{RangePartitioning}}.

> SQL interface support repartitionByRange
> 
>
> Key: SPARK-24816
> URL: https://issues.apache.org/jira/browse/SPARK-24816
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: DISTRIBUTE_BY_SORT_BY.png, 
> RANGE_DISTRIBUTE_BY_SORT_BY.png
>
>
> The SQL interface should support {{repartitionByRange}} to improve data 
> pushdown. I have tested this feature with a big table (data size: 1.1 TB, 
> row count: 282,001,954,428).
> The test sql is:
> {code:sql}
> select * from table where id=401564838907
> {code}
> The test result:
> |Mode|Input Size|Records|Total Time|Duration|Prepare data Resource Allocation 
> MB-seconds|
> |default|959.2 GB|237624395522|11.2 h|1.3 min|6496280086|
> |DISTRIBUTE BY|970.8 GB|244642791213|11.4 h|1.3 min|10536069846|
> |SORT BY|456.3 GB|101587838784|5.4 h|31 s|8965158620|
> |DISTRIBUTE BY + SORT BY |219.0 GB |51723521593|3.3 h|54 s|12552656774|
> |RANGE PARTITION BY |38.5 GB|75355144|45 min|13 s|14525275297|
> |RANGE PARTITION BY + SORT BY|17.4 GB|14334724|45 min|12 s|16255296698|
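
The Dataset-side API that this issue proposed exposing in SQL (a sketch; names 
and path are illustrative):
{code:scala}
import spark.implicits._

// range-partition by id before writing, so files are clustered by id and
// their min/max statistics make the id=... filter prune effectively
df.repartitionByRange(1000, $"id")
  .sortWithinPartitions($"id")
  .write.parquet("/path/to/table")
{code}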



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19394) "assertion failed: Expected hostname" on macOS when self-assigned IP contains a percent sign

2018-07-31 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564532#comment-16564532
 ] 

Yuming Wang commented on SPARK-19394:
-

Try to add {{::1             localhost}} to /etc/hosts.
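
If editing /etc/hosts is not an option, pinning Spark to a resolvable address 
also works (a sketch):
{noformat}
export SPARK_LOCAL_IP=127.0.0.1
./bin/spark-shell
{noformat}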

> "assertion failed: Expected hostname" on macOS when self-assigned IP contains 
> a percent sign
> 
>
> Key: SPARK-19394
> URL: https://issues.apache.org/jira/browse/SPARK-19394
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> See [this question on 
> StackOverflow|http://stackoverflow.com/q/41914586/1305344].
> {quote}
> So when I am not connected to internet, spark shell fails to load in local 
> mode. I am running Apache Spark 2.1.0 downloaded from internet, running on my 
> Mac. So I run ./bin/spark-shell and it gives me the error below.
> So I have read the Spark code and it is using Java's 
> InetAddress.getLocalHost() to find the localhost's IP address. So when I am 
> connected to internet, I get back an IPv4 with my local hostname.
> scala> InetAddress.getLocalHost
> res9: java.net.InetAddress = AliKheyrollahis-MacBook-Pro.local/192.168.1.26
> but the key is, when disconnected, I get an IPv6 with a percentage in the 
> values (it is scoped):
> scala> InetAddress.getLocalHost
> res10: java.net.InetAddress = 
> AliKheyrollahis-MacBook-Pro.local/fe80:0:0:0:2b9a:4521:a301:e9a5%10
> And this IP is the same as the one you see in the error message. I feel my 
> problem is that it throws Spark since it cannot handle %10 in the result.
> ...
> 17/01/28 22:03:28 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
> http://fe80:0:0:0:2b9a:4521:a301:e9a5%10:4040
> 17/01/28 22:03:28 INFO Executor: Starting executor ID driver on host localhost
> 17/01/28 22:03:28 INFO Executor: Using REPL class URI: 
> spark://fe80:0:0:0:2b9a:4521:a301:e9a5%10:56107/classes
> 17/01/28 22:03:28 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: Expected hostname
> at scala.Predef$.assert(Predef.scala:170)
> at org.apache.spark.util.Utils$.checkHost(Utils.scala:931)
> at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:31)
> at org.apache.spark.executor.Executor.(Executor.scala:121)
> at 
> org.apache.spark.scheduler.local.LocalEndpoint.(LocalSchedulerBackend.scala:59)
> at 
> org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
> at org.apache.spark.SparkContext.(SparkContext.scala:509)
> at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
> at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
> at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
> at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24937) Datasource partition table should load empty partitions

2018-07-26 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24937:

Description: 
How to reproduce:
{code:sql}
spark-sql> CREATE TABLE tbl AS SELECT 1;
18/07/26 22:48:11 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external 
table:tbl
18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
 > USING parquet
 > PARTITIONED BY (day, hour);
spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
spark-sql> SHOW PARTITIONS tbl1;
spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
 > PARTITIONED BY (day STRING, hour STRING);
18/07/26 22:49:20 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
18/07/26 22:49:36 WARN log: Updated size to 0
spark-sql> SHOW PARTITIONS tbl2;
day=2018-07-25/hour=01
spark-sql> 
{code}

  was:
{code:sql}
spark-sql> CREATE TABLE tbl AS SELECT 1;
18/07/26 22:48:11 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external 
table:tbl
18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
 > USING parquet
 > PARTITIONED BY (day, hour);
spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
spark-sql> SHOW PARTITIONS tbl1;
spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
 > PARTITIONED BY (day STRING, hour STRING);
18/07/26 22:49:20 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
18/07/26 22:49:36 WARN log: Updated size to 0
spark-sql> SHOW PARTITIONS tbl2;
day=2018-07-25/hour=01
spark-sql> 
{code}


> Datasource partition table should load empty partitions
> ---
>
> Key: SPARK-24937
> URL: https://issues.apache.org/jira/browse/SPARK-24937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> spark-sql> CREATE TABLE tbl AS SELECT 1;
> 18/07/26 22:48:11 WARN HiveMetaStore: Location: 
> file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external 
> table:tbl
> 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
>  > USING parquet
>  > PARTITIONED BY (day, hour);
> spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> spark-sql> SHOW PARTITIONS tbl1;
> spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
>  > PARTITIONED BY (day STRING, hour STRING);
> 18/07/26 22:49:20 WARN HiveMetaStore: Location: 
> file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
> table:tbl2
> spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
> 18/07/26 22:49:36 WARN log: Updated size to 0
> spark-sql> SHOW PARTITIONS tbl2;
> day=2018-07-25/hour=01
> spark-sql> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24937) Datasource partition table should load empty partitions

2018-07-26 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-24937:

Description: 
How to reproduce:
{code:sql}
spark-sql> CREATE TABLE tbl AS SELECT 1;
spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
 > USING parquet
 > PARTITIONED BY (day, hour);
spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
spark-sql> SHOW PARTITIONS tbl1;
spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
 > PARTITIONED BY (day STRING, hour STRING);
18/07/26 22:49:20 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
18/07/26 22:49:36 WARN log: Updated size to 0
spark-sql> SHOW PARTITIONS tbl2;
day=2018-07-25/hour=01
spark-sql> 
{code}

  was:
How to reproduce:
{code:sql}
spark-sql> CREATE TABLE tbl AS SELECT 1;
18/07/26 22:48:11 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external 
table:tbl
18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
 > USING parquet
 > PARTITIONED BY (day, hour);
spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
spark-sql> SHOW PARTITIONS tbl1;
spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
 > PARTITIONED BY (day STRING, hour STRING);
18/07/26 22:49:20 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
18/07/26 22:49:36 WARN log: Updated size to 0
spark-sql> SHOW PARTITIONS tbl2;
day=2018-07-25/hour=01
spark-sql> 
{code}


> Datasource partition table should load empty partitions
> ---
>
> Key: SPARK-24937
> URL: https://issues.apache.org/jira/browse/SPARK-24937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> spark-sql> CREATE TABLE tbl AS SELECT 1;
> spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
>  > USING parquet
>  > PARTITIONED BY (day, hour);
> spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> spark-sql> SHOW PARTITIONS tbl1;
> spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
>  > PARTITIONED BY (day STRING, hour STRING);
> 18/07/26 22:49:20 WARN HiveMetaStore: Location: 
> file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
> table:tbl2
> spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
> 18/07/26 22:49:36 WARN log: Updated size to 0
> spark-sql> SHOW PARTITIONS tbl2;
> day=2018-07-25/hour=01
> spark-sql> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24937) Datasource partition table should load empty partitions

2018-07-26 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-24937:
---

 Summary: Datasource partition table should load empty partitions
 Key: SPARK-24937
 URL: https://issues.apache.org/jira/browse/SPARK-24937
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


{code:sql}
spark-sql> CREATE TABLE tbl AS SELECT 1;
18/07/26 22:48:11 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external 
table:tbl
18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
 > USING parquet
 > PARTITIONED BY (day, hour);
spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
spark-sql> SHOW PARTITIONS tbl1;
spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
 > PARTITIONED BY (day STRING, hour STRING);
18/07/26 22:49:20 WARN HiveMetaStore: Location: 
file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
table:tbl2
spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
SELECT * FROM tbl where 1=0;
18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
18/07/26 22:49:36 WARN log: Updated size to 0
spark-sql> SHOW PARTITIONS tbl2;
day=2018-07-25/hour=01
spark-sql> 
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24937) Datasource partition table should load empty partitions

2018-07-26 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558386#comment-16558386
 ] 

Yuming Wang commented on SPARK-24937:
-

I'm working on.

> Datasource partition table should load empty partitions
> ---
>
> Key: SPARK-24937
> URL: https://issues.apache.org/jira/browse/SPARK-24937
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> spark-sql> CREATE TABLE tbl AS SELECT 1;
> 18/07/26 22:48:11 WARN HiveMetaStore: Location: 
> file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external 
> table:tbl
> 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING)
>  > USING parquet
>  > PARTITIONED BY (day, hour);
> spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> spark-sql> SHOW PARTITIONS tbl1;
> spark-sql> CREATE TABLE tbl2 (c1 BIGINT)
>  > PARTITIONED BY (day STRING, hour STRING);
> 18/07/26 22:49:20 WARN HiveMetaStore: Location: 
> file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external 
> table:tbl2
> spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') 
> SELECT * FROM tbl where 1=0;
> 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2
> 18/07/26 22:49:36 WARN log: Updated size to 0
> spark-sql> SHOW PARTITIONS tbl2;
> day=2018-07-25/hour=01
> spark-sql> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20592) Alter table concatenate is not working as expected.

2018-08-05 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569673#comment-16569673
 ] 

Yuming Wang commented on SPARK-20592:
-

Spark doesn't support this command:

[https://github.com/apache/spark/blob/73dd6cf9b558f9d752e1f3c13584344257ad7863/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L217]
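
Since CONCATENATE is not supported, a workaround sketch is to compact by 
rewriting (the target table name is illustrative):
{code:scala}
// rewrite the table with a controlled number of output files
spark.table("flight.flight_data_pq")
  .coalesce(8)  // target file count; tune to the data size
  .write.mode("overwrite").saveAsTable("flight.flight_data_pq_compact")
{code}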

 

> Alter table concatenate is not working as expected.
> ---
>
> Key: SPARK-20592
> URL: https://issues.apache.org/jira/browse/SPARK-20592
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.0, 2.2.1, 2.3.1
>Reporter: Guru Prabhakar Reddy Marthala
>Priority: Major
>  Labels: hive, pyspark
>
> Created a table using CTAS from CSV to Parquet. The Parquet table generated 
> numerous small files. Tried ALTER TABLE CONCATENATE but it's not working as 
> expected.
> spark.sql("CREATE TABLE flight.flight_data(year INT,   month INT,   day INT,  
>  day_of_week INT,   dep_time INT,   crs_dep_time INT,   arr_time INT,   
> crs_arr_time INT,   unique_carrier STRING,   flight_num INT,   tail_num 
> STRING,   actual_elapsed_time INT,   crs_elapsed_time INT,   air_time INT,   
> arr_delay INT,   dep_delay INT,   origin STRING,   dest STRING,   distance 
> INT,   taxi_in INT,   taxi_out INT,   cancelled INT,   cancellation_code 
> STRING,   diverted INT,   carrier_delay STRING,   weather_delay STRING,   
> nas_delay STRING,   security_delay STRING,   late_aircraft_delay STRING) ROW 
> FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile")
> spark.sql("load data local INPATH 'i:/2008/2008.csv' INTO TABLE 
> flight.flight_data")
> spark.sql("create table flight.flight_data_pq stored as parquet as select * 
> from flight.flight_data")
> spark.sql("create table flight.flight_data_orc stored as orc as select * from 
> flight.flight_data")
> pyspark.sql.utils.ParseException: u'\nOperation not allowed: alter table 
> concatenate(line 1, pos 0)\n\n== SQL ==\nalter table 
> flight_data.flight_data_pq concatenate\n^^^\n'
> Tried on both ORC and Parquet formats. It's not working.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25085) Insert overwrite a non-partitioned table can delete table folder

2018-08-10 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576711#comment-16576711
 ] 

Yuming Wang commented on SPARK-25085:
-

I'm working on this.

> Insert overwrite a non-partitioned table can delete table folder
> 
>
> Key: SPARK-25085
> URL: https://issues.apache.org/jira/browse/SPARK-25085
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Rui Li
>Priority: Major
>
> When inserting overwrite a data source table, Spark firstly deletes all the 
> partitions. For non-partitioned table, it will delete the table folder, which 
> is wrong because table folder may contain information like ACL entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25039) Binary comparison behavior should refer to Teradata

2018-08-06 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-25039:
---

 Summary: Binary comparison behavior should refer to Teradata
 Key: SPARK-25039
 URL: https://issues.apache.org/jira/browse/SPARK-25039
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


The main difference is:

# When comparing a {{StringType}} value with a {{NumericType}} value, Spark 
converts the {{StringType}} data to a {{NumericType}} value. But Teradata 
converts the {{StringType}} data to a {{DoubleType}} value.
# When comparing a {{StringType}} value with a {{DateType}} value, Spark 
converts the {{DateType}} data to a {{StringType}} value. But Teradata converts 
the {{StringType}} data to a {{DateType}} value.
 

More details:
https://github.com/apache/spark/blob/65a4bc143ab5dc2ced589dc107bbafa8a7290931/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L120-L149
https://www.info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/SQL_Reference/B035-1145-160K/lrn1472241011038.html
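
For illustration, two hypothetical queries (not from the original report) that exercise the two cases:

{code:sql}
-- Case 1: StringType vs NumericType.
-- Spark casts '2' to the numeric type of the other side; Teradata would
-- cast it to DOUBLE instead.
SELECT 1 < '2';

-- Case 2: StringType vs DateType.
-- Spark casts the DateType side to StringType; Teradata would cast the
-- string side to DATE instead.
SELECT CAST('2018-08-06' AS DATE) = '2018-08-06';
{code}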




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25056) Unify the InConversion and BinaryComparison behaviour when InConversion's list only contains one datatype

2018-08-08 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-25056:
---

 Summary: Unify the InConversion and BinaryComparison behaviour 
when InConversion's list only contains one datatype
 Key: SPARK-25056
 URL: https://issues.apache.org/jira/browse/SPARK-25056
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


{code:scala}
scala> val df = spark.range(4).toDF().selectExpr("cast(id as decimal(9, 2)) as id")
df: org.apache.spark.sql.DataFrame = [id: decimal(9,2)]

scala> df.filter("id in('1', '3')").show
+---+
| id|
+---+
+---+

scala> df.filter("id = '1' or id ='3'").show
+----+
|  id|
+----+
|1.00|
|3.00|
+----+
{code}
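
Until the two behaviours are unified, a hypothetical rewrite (sketch only; {{t}} is a placeholder for a table holding the decimal column above) is to cast the IN-list literals explicitly:

{code:sql}
-- Cast the literals to the column's type so the comparison happens in
-- DECIMAL, matching the OR form above.
SELECT * FROM t
WHERE id IN (CAST('1' AS DECIMAL(9, 2)), CAST('3' AS DECIMAL(9, 2)));
{code}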



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-13 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579181#comment-16579181
 ] 

Yuming Wang edited comment on SPARK-25051 at 8/14/18 3:13 AM:
--

Yes. The bug only exists in branch-2.3. I can reproduce it with:
{code}
val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
val df2 = spark.range(3).selectExpr("id")
df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).show
{code}
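
A hypothetical way to express the same intent while avoiding the bug (a sketch, not a fix for the underlying issue) is an anti join:

{code:scala}
// left_anti keeps exactly the df1 rows with no matching id in df2, which is
// what the left_outer join plus isNull filter intends.
df1.join(df2, Seq("id"), "left_anti").show
{code}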


was (Author: q79969786):
Yes. The bug still exists. I can reproduce it with:

{code:scala}
val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
val df2 = spark.range(3).selectExpr("id")
df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).show
{code}


> where clause on dataset gives AnalysisException
> ---
>
> Key: SPARK-25051
> URL: https://issues.apache.org/jira/browse/SPARK-25051
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: MIK
>Priority: Major
>
> *schemas :*
> df1
> => id ts
> df2
> => id name country
> *code:*
> val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
> *error*:
> org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing 
> from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in 
> operator !Filter isnull(id#0). Attribute(s) with the same name appear in the 
> operation: id. Please check if the right attribute(s) are used.;;
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
>     at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
>     at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
>     at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
>     at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
>     at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)
> This works fine in spark 2.2.2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-13 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579181#comment-16579181
 ] 

Yuming Wang commented on SPARK-25051:
-

Yes. The bug still exists. I can reproduce it with:

{code:scala}
val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
val df2 = spark.range(3).selectExpr("id")
df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).show
{code}


> where clause on dataset gives AnalysisException
> ---
>
> Key: SPARK-25051
> URL: https://issues.apache.org/jira/browse/SPARK-25051
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: MIK
>Priority: Major
>
> *schemas :*
> df1
> => id ts
> df2
> => id name country
> *code:*
> val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
> *error*:
> org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing 
> from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in 
> operator !Filter isnull(id#0). Attribute(s) with the same name appear in the 
> operation: id. Please check if the right attribute(s) are used.;;
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
>     at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
>     at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
>     at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
>     at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
>     at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)
> This works fine in spark 2.2.2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-12 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577401#comment-16577401
 ] 

Yuming Wang edited comment on SPARK-25051 at 8/12/18 10:51 AM:
---

Can you verify it with Spark 
[2.3.2-rc4|https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc4-bin/]?


was (Author: q79969786):
Can you it with Spark [2.3.2-rc4 
|https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc4-bin/]?

> where clause on dataset gives AnalysisException
> ---
>
> Key: SPARK-25051
> URL: https://issues.apache.org/jira/browse/SPARK-25051
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: MIK
>Priority: Major
>
> *schemas :*
> df1
> => id ts
> df2
> => id name country
> *code:*
> val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
> *error*:
> org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing 
> from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in 
> operator !Filter isnull(id#0). Attribute(s) with the same name appear in the 
> operation: id. Please check if the right attribute(s) are used.;;
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
>     at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
>     at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
>     at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
>     at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
>     at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)
> This works fine in spark 2.2.2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24631) Cannot up cast column from bigint to smallint as it may truncate

2018-08-12 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577502#comment-16577502
 ] 

Yuming Wang edited comment on SPARK-24631 at 8/12/18 11:18 AM:
---

I hit this issue and fixed it by recreating the table/view. Can you execute 
the 2 SQLs below:

 
{code:sql}
desc testtable;
{code}
{code:sql}
show create table testtable;
{code}
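
If the two outputs disagree (for example, the stored schema says smallint while the table column is bigint), recreating the object refreshes the stored schema. A hypothetical sketch (the view name and column list are assumptions):

{code:sql}
-- Rebuild the view so its schema is re-derived from the current table.
DROP VIEW IF EXISTS testtable_view;
CREATE VIEW testtable_view AS SELECT name, id FROM testtable;
{code}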
 


was (Author: q79969786):
I hit this issue and fixed it by recreating the table/view. Can you execute 
the 2 SQLs below:

 
{code:java}
desc testtable;
{code}
{code:java}
show create table testtable;
{code}
 

> Cannot up cast column from bigint to smallint as it may truncate
> 
>
> Key: SPARK-24631
> URL: https://issues.apache.org/jira/browse/SPARK-24631
> Project: Spark
>  Issue Type: New JIRA Project
>  Components: Spark Core, Spark Submit
>Affects Versions: 2.2.1
>Reporter: Sivakumar
>Priority: Major
>
> Getting the below error when executing the simple select query,
> Sample:
> Table Description:
> name: String, id: BigInt
> val df=spark.sql("select name,id from testtable")
> ERROR: {color:#ff0000}Cannot up cast column "id" from bigint to smallint as 
> it may truncate.{color}
> I am not doing any transformations, I am just trying to query a table, but 
> I am still getting the error.
> I am getting this error only on the production cluster and only for a single 
> table; other tables are running fine.
> + more data,
> val df=spark.sql("select * from table_name")
> I am just trying to query a table. But with other tables it is running fine.
> {color:#d04437}18/06/22 01:36:29 ERROR Driver1: [] [main] Exception occurred: 
> org.apache.spark.sql.AnalysisException: Cannot up cast `column_name` from 
> bigint to column_name#2525: smallint as it may truncate.{color}
> That specific column has the Bigint datatype, but there were other tables 
> that ran fine with Bigint columns.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24631) Cannot up cast column from bigint to smallint as it may truncate

2018-08-12 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577502#comment-16577502
 ] 

Yuming Wang commented on SPARK-24631:
-

I hit this issue and fixed it by recreating the table/view. Can you execute 
the 2 SQLs below:

 
{code:java}
desc testtable;
{code}
{code:java}
show create table testtable;
{code}
 

> Cannot up cast column from bigint to smallint as it may truncate
> 
>
> Key: SPARK-24631
> URL: https://issues.apache.org/jira/browse/SPARK-24631
> Project: Spark
>  Issue Type: New JIRA Project
>  Components: Spark Core, Spark Submit
>Affects Versions: 2.2.1
>Reporter: Sivakumar
>Priority: Major
>
> Getting the below error when executing the simple select query,
> Sample:
> Table Description:
> name: String, id: BigInt
> val df=spark.sql("select name,id from testtable")
> ERROR: {color:#ff0000}Cannot up cast column "id" from bigint to smallint as 
> it may truncate.{color}
> I am not doing any transformations, I am just trying to query a table, but 
> I am still getting the error.
> I am getting this error only on the production cluster and only for a single 
> table; other tables are running fine.
> + more data,
> val df=spark.sql("select * from table_name")
> I am just trying to query a table. But with other tables it is running fine.
> {color:#d04437}18/06/22 01:36:29 ERROR Driver1: [] [main] Exception occurred: 
> org.apache.spark.sql.AnalysisException: Cannot up cast `column_name` from 
> bigint to column_name#2525: smallint as it may truncate.{color}
> That specific column has the Bigint datatype, but there were other tables 
> that ran fine with Bigint columns.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25135) insert datasource table may all null when select from view

2018-08-16 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25135:

Affects Version/s: (was: 2.4.0)
   2.3.0
   2.3.1

> insert datasource table may all null when select from view
> --
>
> Key: SPARK-25135
> URL: https://issues.apache.org/jira/browse/SPARK-25135
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:scala}
> val path = "/tmp/spark/parquet"
> val cnt = 30
> spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as bigint) 
> as col2").write.mode("overwrite").parquet(path)
> spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet 
> location '$path'")
> spark.sql("create view view1 as select col1, col2 from table1 where col1 > 
> -20")
> spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using parquet")
> spark.sql("insert overwrite table table2 select COL1, COL2 from view1")
> spark.table("table2").show
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25135) insert datasource table may all null when select from view

2018-08-16 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-25135:
---

 Summary: insert datasource table may all null when select from view
 Key: SPARK-25135
 URL: https://issues.apache.org/jira/browse/SPARK-25135
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang


How to reproduce:
{code:scala}
val path = "/tmp/spark/parquet"
val cnt = 30
spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as bigint) as col2").write.mode("overwrite").parquet(path)
spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet location '$path'")
spark.sql("create view view1 as select col1, col2 from table1 where col1 > -20")
spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using parquet")
spark.sql("insert overwrite table table2 select COL1, COL2 from view1")
spark.table("table2").show
{code}
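
A hypothetical workaround (sketch only, assuming the root cause is the COL1/col1 case mismatch) is to reference the view's columns with the letter case they were defined with:

{code:scala}
// Use the lower-case names the view was defined with, so column resolution
// against the underlying parquet files succeeds.
spark.sql("insert overwrite table table2 select col1, col2 from view1")
spark.table("table2").show  // expected to show the original values, not nulls
{code}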



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25132) Spark returns NULL for a column whose Hive metastore schema and Parquet schema are in different letter cases

2018-08-17 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583688#comment-16583688
 ] 

Yuming Wang commented on SPARK-25132:
-

This question has also been asked on Stack Overflow: [Spark SQL returns null for a column 
in HIVE table while HIVE query returns non null 
values|https://stackoverflow.com/questions/50298909/spark-sql-returns-null-for-a-column-in-hive-table-while-hive-query-returns-non-n].
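
A hypothetical way to sidestep the problem (sketch only; {{t3}} is an assumed name) is to declare the column with the same letter case as the Parquet schema written by {{t1}}:

{code:sql}
-- Match the Parquet field's letter case (`id`) so field resolution succeeds.
CREATE TABLE `t3` (`id` BIGINT)
USING parquet
LOCATION 'hdfs://localhost/user/hive/warehouse/t1';
SELECT * FROM t3;  -- expected: 0..4 rather than NULLs
{code}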

> Spark returns NULL for a column whose Hive metastore schema and Parquet 
> schema are in different letter cases
> 
>
> Key: SPARK-25132
> URL: https://issues.apache.org/jira/browse/SPARK-25132
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Chenxiao Mao
>Priority: Major
>
> Spark SQL returns NULL for a column whose Hive metastore schema and Parquet 
> schema are in different letter cases, regardless of spark.sql.caseSensitive 
> set to true or false.
> Here is a simple example to reproduce this issue:
> scala> spark.range(5).toDF.write.mode("overwrite").saveAsTable("t1")
> spark-sql> show create table t1;
> CREATE TABLE `t1` (`id` BIGINT)
> USING parquet
> OPTIONS (
>  `serialization.format` '1'
> )
> spark-sql> CREATE TABLE `t2` (`ID` BIGINT)
>  > USING parquet
>  > LOCATION 'hdfs://localhost/user/hive/warehouse/t1';
> spark-sql> select * from t1;
> 0
> 1
> 2
> 3
> 4
> spark-sql> select * from t2;
> NULL
> NULL
> NULL
> NULL
> NULL
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-25085) Insert overwrite a non-partitioned table can delete table folder

2018-08-11 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25085:

Comment: was deleted

(was: I'm working on this.)

> Insert overwrite a non-partitioned table can delete table folder
> 
>
> Key: SPARK-25085
> URL: https://issues.apache.org/jira/browse/SPARK-25085
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Rui Li
>Priority: Major
>
> When inserting overwrite a data source table, Spark firstly deletes all the 
> partitions. For non-partitioned table, it will delete the table folder, which 
> is wrong because table folder may contain information like ACL entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25085) Insert overwrite a non-partitioned table can delete table folder

2018-08-11 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577221#comment-16577221
 ] 

Yuming Wang commented on SPARK-25085:
-

[~lirui] Another issue that may interest you: 
[SPARK-24937|https://issues.apache.org/jira/browse/SPARK-24937].

> Insert overwrite a non-partitioned table can delete table folder
> 
>
> Key: SPARK-25085
> URL: https://issues.apache.org/jira/browse/SPARK-25085
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Rui Li
>Priority: Major
>
> When inserting overwrite a data source table, Spark firstly deletes all the 
> partitions. For non-partitioned table, it will delete the table folder, which 
> is wrong because table folder may contain information like ACL entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue

2018-08-09 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575771#comment-16575771
 ] 

Yuming Wang commented on SPARK-25084:
-

[~smilegator], [~jerryshao] I think it should target 2.3.2.

 

> "distribute by" on multiple columns may lead to codegen issue
> -
>
> Key: SPARK-25084
> URL: https://issues.apache.org/jira/browse/SPARK-25084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: yucai
>Priority: Major
>
> Test Query:
> {code:java}
> select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, 
> ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, 
> ss_net_profit) limit 1000;{code}
> Wrong Codegen:
> {code:java}
> /* 146 */ private int computeHashForStruct_0(InternalRow 
> mutableStateArray[0], int value1) {
> /* 147 */
> /* 148 */
> /* 149 */ if (!mutableStateArray[0].isNullAt(0)) {
> /* 150 */
> /* 151 */ final int element = mutableStateArray[0].getInt(0);
> /* 152 */ value1 = 
> org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1);
> /* 153 */
> /* 154 */ }{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue

2018-08-09 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575777#comment-16575777
 ] 

Yuming Wang commented on SPARK-25084:
-

It's a regression.

> "distribute by" on multiple columns may lead to codegen issue
> -
>
> Key: SPARK-25084
> URL: https://issues.apache.org/jira/browse/SPARK-25084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: yucai
>Priority: Major
>
> Test Query:
> {code:java}
> select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, 
> ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, 
> ss_net_profit) limit 1000;{code}
> Wrong Codegen:
> {code:java}
> /* 146 */ private int computeHashForStruct_0(InternalRow 
> mutableStateArray[0], int value1) {
> /* 147 */
> /* 148 */
> /* 149 */ if (!mutableStateArray[0].isNullAt(0)) {
> /* 150 */
> /* 151 */ final int element = mutableStateArray[0].getInt(0);
> /* 152 */ value1 = 
> org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1);
> /* 153 */
> /* 154 */ }{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25071) BuildSide is coming not as expected with join queries

2018-08-11 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577385#comment-16577385
 ] 

Yuming Wang commented on SPARK-25071:
-

I think it's correct. CBO estimates size from the row count:
{code:scala}
def getOutputSize(
    attributes: Seq[Attribute],
    outputRowCount: BigInt,
    attrStats: AttributeMap[ColumnStat] = AttributeMap(Nil)): BigInt = {
  // Output size can't be zero, or sizeInBytes of BinaryNode will also be zero
  // (simple computation of statistics returns product of children).
  if (outputRowCount > 0) outputRowCount * getSizePerRow(attributes, attrStats) else 1
}
{code}


{code:scala}
def getSizePerRow(
    attributes: Seq[Attribute],
    attrStats: AttributeMap[ColumnStat] = AttributeMap(Nil)): BigInt = {
  // We assign a generic overhead for a Row object, the actual overhead is
  // different for different Row format.
  8 + attributes.map { attr =>
    if (attrStats.get(attr).map(_.avgLen.isDefined).getOrElse(false)) {
      attr.dataType match {
        case StringType =>
          // UTF8String: base + offset + numBytes
          attrStats(attr).avgLen.get + 8 + 4
        case _ =>
          attrStats(attr).avgLen.get
      }
    } else {
      attr.dataType.defaultSize
    }
  }.sum
}
{code}

So for Scenario 2, both sides have 2 rows of a single bigint column, giving 
2 × (8 bytes row overhead + 8 bytes per bigint) = 32:
right.stats.sizeInBytes = 32
left.stats.sizeInBytes = 32

{code:scala}
private def broadcastSide(
    canBuildLeft: Boolean,
    canBuildRight: Boolean,
    left: LogicalPlan,
    right: LogicalPlan): BuildSide = {

  def smallerSide =
    if (right.stats.sizeInBytes <= left.stats.sizeInBytes) BuildRight else BuildLeft

  if (canBuildRight && canBuildLeft) {
    // Broadcast smaller side base on its estimated physical size
    // if both sides have broadcast hint
    smallerSide
  } else if (canBuildRight) {
    BuildRight
  } else if (canBuildLeft) {
    BuildLeft
  } else {
    // for the last default broadcast nested loop join
    smallerSide
  }
}
{code}

You can verify it with:

{code:scala}
spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600', 'totalSize'='80')")
spark.sql("CREATE TABLE big4 (c1 string) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')")
val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
println(buildSide)

// or

spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600', 'totalSize'='80')")
spark.sql("CREATE TABLE big4 (c1 bigint, c2 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')")
val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
println(buildSide)
{code}




> BuildSide is coming not as expected with join queries
> -
>
> Key: SPARK-25071
> URL: https://issues.apache.org/jira/browse/SPARK-25071
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
> Environment: Spark 2.3.1 
> Hadoop 2.7.3
>Reporter: Ayush Anubhava
>Priority: Major
>
> *BuildSide is not coming as expected.*
> Pre-requisites:
> *CBO is set as true &  spark.sql.cbo.joinReorder.enabled= true.*
> *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec*
> *Steps:*
> *Scenario 1:*
> spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 
> 'rawDataSize'='600','totalSize'='800')")
>  spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 
> 'rawDataSize'='6000', 'totalSize'='800')")
>  val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = 
> t2.c1)").queryExecution.executedPlan
>  val buildSide = 
> plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
>  println(buildSide)
>  
> *Result 1:*
> scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = 
> t2.c1)").queryExecution.executedPlan
>  plan: org.apache.spark.sql.execution.SparkPlan =
>  *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight
>  :- *(2) Filter isnotnull(c1#0L)
>  : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L]
>  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
> false]))
>  +- *(1) Filter isnotnull(c1#1L)
>  +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L]
> scala> val buildSide = 
> 

[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-11 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577401#comment-16577401
 ] 

Yuming Wang commented on SPARK-25051:
-

Can you verify it with Spark 
[2.3.2-rc4|https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc4-bin/]?

> where clause on dataset gives AnalysisException
> ---
>
> Key: SPARK-25051
> URL: https://issues.apache.org/jira/browse/SPARK-25051
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: MIK
>Priority: Major
>
> *schemas :*
> df1
> => id ts
> df2
> => id name country
> *code:*
> val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
> *error*:
> org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing 
> from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in 
> operator !Filter isnull(id#0). Attribute(s) with the same name appear in the 
> operation: id. Please check if the right attribute(s) are used.;;
>  at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
>     at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
>     at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
>     at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
>     at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
>     at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
>     at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
>     at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)
> This works fine in spark 2.2.2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Description: 
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}

Mainstream databases return {{HAßLER}}.
 !MySQL.png! 

  was:
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}

Mainstream databases return {{HAßLER}}.
 !Teradata.jpeg! 


> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
>  !MySQL.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Description: 
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}

Mainstream databases return {{HAßLER}}.
 !Teradata.jpeg! 

  was:
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}

Mainstream databases return {{HAßLER}}.


> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
>  !Teradata.jpeg! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Attachment: MySQL.png

> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-25230:
---

 Summary: Upper behaves incorrect for string contains "ß"
 Key: SPARK-25230
 URL: https://issues.apache.org/jira/browse/SPARK-25230
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: Yuming Wang


How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}

Mainstream databases return {{HAßLER}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Attachment: WechatIMG511.jpeg

> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Attachment: (was: WechatIMG511.jpeg)

> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Attachment: Teradata.jpeg

> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Description: 
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}
Mainstream databases return {{HAßLER}}.
 !MySQL.png!

 

This 

  was:
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}

Mainstream databases return {{HAßLER}}.
 !MySQL.png! 


> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
>  !MySQL.png!
>  
> This 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Description: 
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}
Mainstream databases return {{HAßLER}}.
 !MySQL.png!

 

This behavior may lead to data inconsistency:
{code:sql}
create temporary view SPARK_25230 as select * from values
  ("Hassler"),
  ("Haßler")
as EMPLOYEE(name);
select UPPER(name) from SPARK_25230 group by 1;
{code}

  was:
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}
Mainstream databases return {{HAßLER}}.
 !MySQL.png!

 

This behave 
{code:sql}
create temporary view SPARK_25230 as select * from values
  ("Hassler"),
  ("Haßler")
as EMPLOYEE(name);
select UPPER(name) from SPARK_25230 group by 1;
{code}


> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
>  !MySQL.png!
>  
> This behavior may lead to data inconsistency:
> {code:sql}
> create temporary view SPARK_25230 as select * from values
>   ("Hassler"),
>   ("Haßler")
> as EMPLOYEE(name);
> select UPPER(name) from SPARK_25230 group by 1;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behavior incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Summary: Upper behavior incorrect for string contains "ß"  (was: Upper 
behaves incorrect for string contains "ß")

> Upper behavior incorrect for string contains "ß"
> 
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
>  !MySQL.png!
>  
> This behavior may lead to data inconsistency:
> {code:sql}
> create temporary view SPARK_25230 as select * from values
>   ("Hassler"),
>   ("Haßler")
> as EMPLOYEE(name);
> select UPPER(name) from SPARK_25230 group by 1;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behavior incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Description: 
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}
Mainstream databases return {{HAßLER}}.
 !MySQL.png!

 

This behavior may lead to data inconsistency:
{code:sql}
create temporary view SPARK_25230 as select * from values
  ("Hassler"),
  ("Haßler")
as EMPLOYEE(name);
select UPPER(name) from SPARK_25230 group by 1;
-- result
HASSLER{code}

  was:
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}
Mainstream databases return {{HAßLER}}.
 !MySQL.png!

 

This behavior may lead to data inconsistency:
{code:sql}
create temporary view SPARK_25230 as select * from values
  ("Hassler"),
  ("Haßler")
as EMPLOYEE(name);
select UPPER(name) from SPARK_25230 group by 1;
{code}


> Upper behavior incorrect for string contains "ß"
> 
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
>  !MySQL.png!
>  
> This behavior may lead to data inconsistency:
> {code:sql}
> create temporary view SPARK_25230 as select * from values
>   ("Hassler"),
>   ("Haßler")
> as EMPLOYEE(name);
> select UPPER(name) from SPARK_25230 group by 1;
> -- result
> HASSLER{code}
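
For what it's worth, the SS result matches the JVM's default Unicode case mapping, which Spark most likely delegates to for non-ASCII input (a hypothesis, not a confirmed code path):

{code:scala}
// "ß" has no single-character uppercase form in Unicode's default case
// mapping, so Java expands it to "SS".
println("Haßler".toUpperCase)  // prints HASSLER
{code}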



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"

2018-08-24 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25230:

Description: 
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}
Mainstream databases return {{HAßLER}}.
 !MySQL.png!

 

This behave 

  was:
How to reproduce:
{code:sql}
spark-sql> SELECT upper('Haßler');
HASSLER
{code}
Mainstream databases return {{HAßLER}}.
 !MySQL.png!

 

This 


> Upper behaves incorrect for string contains "ß"
> ---
>
> Key: SPARK-25230
> URL: https://issues.apache.org/jira/browse/SPARK-25230
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Yuming Wang
>Priority: Major
> Attachments: MySQL.png, Oracle.png, Teradata.jpeg
>
>
> How to reproduce:
> {code:sql}
> spark-sql> SELECT upper('Haßler');
> HASSLER
> {code}
> Mainstream databases return {{HAßLER}}.
>  !MySQL.png!
>  
> This behave 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


