[jira] [Created] (SPARK-24497) Support recursive SQL query
Yuming Wang created SPARK-24497:
-----------------------------------

             Summary: Support recursive SQL query
                 Key: SPARK-24497
                 URL: https://issues.apache.org/jira/browse/SPARK-24497
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang


h3. *Examples*

Here is an example of {{WITH RECURSIVE}} clause usage. Table "department" represents the structure of an organization as an adjacency list.
{code:sql}
CREATE TABLE department (
  id INTEGER PRIMARY KEY,                           -- department ID
  parent_department INTEGER REFERENCES department,  -- upper department ID
  name TEXT                                         -- department name
);

INSERT INTO department (id, parent_department, "name")
VALUES
  (0, NULL, 'ROOT'),
  (1, 0, 'A'),
  (2, 1, 'B'),
  (3, 2, 'C'),
  (4, 2, 'D'),
  (5, 0, 'E'),
  (6, 4, 'F'),
  (7, 5, 'G');

-- department structure represented here is as follows:
--
--   ROOT-+->A-+->B-+->C
--        |    |
--        |    +->D-+->F
--        +->E-+->G
{code}
To extract all departments under A, you can use the following recursive query:
{code:sql}
WITH RECURSIVE subdepartment AS (
  -- non-recursive term
  SELECT * FROM department WHERE name = 'A'

  UNION ALL

  -- recursive term
  SELECT d.*
  FROM department AS d
  JOIN subdepartment AS sd ON (d.parent_department = sd.id)
)
SELECT * FROM subdepartment ORDER BY name;
{code}
More details:
[http://wiki.postgresql.org/wiki/CTEReadme]
[https://info.teradata.com/htmlpubs/DB_TTU_16_00/index.html#page/SQL_Reference/B035-1141-160K/lqe1472241402390.html]
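For reference, given the sample data above, the recursive query should return the subtree rooted at A (this follows directly from the INSERT statements: A's descendants are B, C, D, and F):
{noformat}
 id | parent_department | name
----+-------------------+------
  1 |                 0 | A
  2 |                 1 | B
  3 |                 2 | C
  4 |                 2 | D
  6 |                 4 | F
(5 rows)
{noformat}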
[jira] [Commented] (SPARK-24538) Decimal type support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510503#comment-16510503 ]

Yuming Wang commented on SPARK-24538:
-------------------------------------

I'm working on this.

> Decimal type support push down to the data sources
> ---------------------------------------------------
>
>                 Key: SPARK-24538
>                 URL: https://issues.apache.org/jira/browse/SPARK-24538
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Created] (SPARK-24538) Decimal type support push down to the data sources
Yuming Wang created SPARK-24538:
-----------------------------------

             Summary: Decimal type support push down to the data sources
                 Key: SPARK-24538
                 URL: https://issues.apache.org/jira/browse/SPARK-24538
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang
[jira] [Issue Comment Deleted] (SPARK-24538) Decimal type support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24538:
--------------------------------
    Comment: was deleted

(was: I'm working on this.)

> Decimal type support push down to the data sources
> ---------------------------------------------------
>
>                 Key: SPARK-24538
>                 URL: https://issues.apache.org/jira/browse/SPARK-24538
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Updated] (SPARK-24538) Decimal type support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24538:
--------------------------------
    Description:
The latest Parquet supports decimal type statistics, so we can push down:
{noformat}
LM-SHC-16502798:parquet-mr yumwang$ java -jar ./parquet-tools/target/parquet-tools-1.10.10-column-index-SNAPSHOT.jar meta /tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
file:         file:/tmp/spark/parquet/decimal/part-0-3880e69a-6dd1-4c2b-946c-e7dae047f65c-c000.snappy.parquet
creator:      parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra:        org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"d1","type":"decimal(9,0)","nullable":true,"metadata":{}},{"name":"d2","type":"decimal(9,2)","nullable":true,"metadata":{}},{"name":"d3","type":"decimal(18,0)","nullable":true,"metadata":{}},{"name":"d4","type":"decimal(18,4)","nullable":true,"metadata":{}},{"name":"d5","type":"decimal(38,0)","nullable":true,"metadata":{}},{"name":"d6","type":"decimal(38,18)","nullable":true,"metadata":{}}]}

file schema:  spark_schema
--------------------------------------------------------------------------------
id:           REQUIRED INT64 R:0 D:0
d1:           OPTIONAL INT32 O:DECIMAL R:0 D:1
d2:           OPTIONAL INT32 O:DECIMAL R:0 D:1
d3:           OPTIONAL INT64 O:DECIMAL R:0 D:1
d4:           OPTIONAL INT64 O:DECIMAL R:0 D:1
d5:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
d6:           OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

row group 1:  RC:241867 TS:15480513 OFFSET:4
--------------------------------------------------------------------------------
id:           INT64 SNAPPY DO:0 FPO:4 SZ:968154/1935071/2.00 VC:241867 ENC:BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
d1:           INT32 SNAPPY DO:0 FPO:968158 SZ:967555/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
d2:           INT32 SNAPPY DO:0 FPO:1935713 SZ:967558/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0.00, max: 241866.00, num_nulls: 0]
d3:           INT64 SNAPPY DO:0 FPO:2903271 SZ:968866/1935047/2.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
d4:           INT64 SNAPPY DO:0 FPO:3872137 SZ:1247007/1935047/1.55 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0., max: 241866., num_nulls: 0]
d5:           FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5119144 SZ:1266850/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0, max: 241866, num_nulls: 0]
d6:           FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:6385994 SZ:2198910/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 0E-18, max: 241866.00, num_nulls: 0]

row group 2:  RC:241867 TS:15480513 OFFSET:8584904
--------------------------------------------------------------------------------
id:           INT64 SNAPPY DO:0 FPO:8584904 SZ:968131/1935071/2.00 VC:241867 ENC:BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
d1:           INT32 SNAPPY DO:0 FPO:9553035 SZ:967563/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
d2:           INT32 SNAPPY DO:0 FPO:10520598 SZ:967563/967515/1.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.00, max: 483733.00, num_nulls: 0]
d3:           INT64 SNAPPY DO:0 FPO:11488161 SZ:968110/1935047/2.00 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
d4:           INT64 SNAPPY DO:0 FPO:12456271 SZ:1247071/1935047/1.55 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867., max: 483733., num_nulls: 0]
d5:           FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:13703342 SZ:1270587/3870159/3.05 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867, max: 483733, num_nulls: 0]
d6:           FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:14973929 SZ:2197306/3870159/1.76 VC:241867 ENC:RLE,BIT_PACKED,PLAIN ST:[min: 241867.00, max: 483733.00, num_nulls: 0]
{noformat}

> Decimal type support push down to the data sources
> ---------------------------------------------------
>
>                 Key: SPARK-24538
>                 URL: https://issues.apache.org/jira/browse/SPARK-24538
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
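A minimal sketch of what the same pushdown could look like for the INT32-backed decimals above, mirroring the style of the timestamp snippet in SPARK-24718 later in this digest. {{decimalToInt32}} is an assumed helper introduced for illustration, not an existing API:
{code:java}
// Hypothetical sketch, not a merged implementation: push an equality
// filter for a decimal stored as INT32 by comparing unscaled values.
case ParquetSchemaType(DECIMAL, INT32, decimalMeta) if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    intColumn(n),
    Option(v).map(d => decimalToInt32(d.asInstanceOf[java.math.BigDecimal]))
      .orNull)

// Assumed helper: a decimal(9,2) value such as 123.45 has unscaled value 12345.
private def decimalToInt32(d: java.math.BigDecimal): java.lang.Integer =
  d.unscaledValue().intValueExact()
{code}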
[jira] [Updated] (SPARK-24538) Decimal type support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24538:
--------------------------------
    Description:
The latest Parquet supports decimal type statistics, so we can push down to the data sources (the same parquet-tools output as in the update above).
[jira] [Issue Comment Deleted] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24549:
--------------------------------
    Comment: was deleted

(was: I'm working on this)

> 32BitDecimalType and 64BitDecimalType support push down to the data sources
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-24549
>                 URL: https://issues.apache.org/jira/browse/SPARK-24549
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Commented] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511034#comment-16511034 ]

Yuming Wang commented on SPARK-24549:
-------------------------------------

I'm working on this.

> 32BitDecimalType and 64BitDecimalType support push down to the data sources
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-24549
>                 URL: https://issues.apache.org/jira/browse/SPARK-24549
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Created] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources
Yuming Wang created SPARK-24549:
-----------------------------------

             Summary: 32BitDecimalType and 64BitDecimalType support push down to the data sources
                 Key: SPARK-24549
                 URL: https://issues.apache.org/jira/browse/SPARK-24549
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang
[jira] [Updated] (SPARK-24538) ByteArrayDecimalType support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24538:
--------------------------------
    Summary: ByteArrayDecimalType support push down to the data sources  (was: Decimal type support push down to the data sources)

> ByteArrayDecimalType support push down to the data sources
> -----------------------------------------------------------
>
>                 Key: SPARK-24538
>                 URL: https://issues.apache.org/jira/browse/SPARK-24538
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Updated] (SPARK-24549) 32BitDecimalType and 64BitDecimalType support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24549:
--------------------------------
    Issue Type: Improvement  (was: New Feature)

> 32BitDecimalType and 64BitDecimalType support push down to the data sources
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-24549
>                 URL: https://issues.apache.org/jira/browse/SPARK-24549
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Updated] (SPARK-24538) ByteArrayDecimalType support push down to the data sources
[ https://issues.apache.org/jira/browse/SPARK-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24538:
--------------------------------
    Issue Type: Improvement  (was: New Feature)

> ByteArrayDecimalType support push down to the data sources
> -----------------------------------------------------------
>
>                 Key: SPARK-24538
>                 URL: https://issues.apache.org/jira/browse/SPARK-24538
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529013#comment-16529013 ]

Yuming Wang commented on SPARK-20427:
-------------------------------------

[~ORichard] Please try using {{customSchema}} to specify custom data types for the read schema:
https://github.com/apache/spark/blob/v2.3.1/examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala#L197

> Issue with Spark interpreting Oracle datatype NUMBER
> ----------------------------------------------------
>
>                 Key: SPARK-20427
>                 URL: https://issues.apache.org/jira/browse/SPARK-20427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Alexander Andrushenko
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 2.3.0
>
> In Oracle there exists the data type NUMBER. When defining a field of type NUMBER in a table, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s.
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 38. In our case it has created fields with precision 44,
> calculated as the sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 44 exceeds max precision 38...".
> The result was that a data frame was read from a table on one schema but could not be inserted into the identical table on another schema.
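A minimal usage sketch of the {{customSchema}} option (the connection details, table, and column types below are placeholders, not taken from this issue):
{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-read").getOrCreate()

// Override the precision/scale Spark infers from Oracle NUMBER columns
// with explicit types, so reads stay within decimal(38, x).
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/service") // placeholder
  .option("dbtable", "SOME_SCHEMA.SOME_TABLE")              // placeholder
  .option("customSchema", "ID DECIMAL(38, 10), NAME STRING")
  .load()
{code}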
[jira] [Created] (SPARK-24716) Refactor ParquetFilters
Yuming Wang created SPARK-24716:
-----------------------------------

             Summary: Refactor ParquetFilters
                 Key: SPARK-24716
                 URL: https://issues.apache.org/jira/browse/SPARK-24716
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang
[jira] [Commented] (SPARK-24716) Refactor ParquetFilters
[ https://issues.apache.org/jira/browse/SPARK-24716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529603#comment-16529603 ]

Yuming Wang commented on SPARK-24716:
-------------------------------------

I'm working on this.

> Refactor ParquetFilters
> ------------------------
>
>                 Key: SPARK-24716
>                 URL: https://issues.apache.org/jira/browse/SPARK-24716
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Issue Comment Deleted] (SPARK-24692) Improvement FilterPushdownBenchmark
[ https://issues.apache.org/jira/browse/SPARK-24692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24692:
--------------------------------
    Comment: was deleted

(was: I'm working on it.)

> Improvement FilterPushdownBenchmark
> -----------------------------------
>
>                 Key: SPARK-24692
>                 URL: https://issues.apache.org/jira/browse/SPARK-24692
>             Project: Spark
>          Issue Type: Improvement
>          Components: Tests
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Created] (SPARK-24658) Remove workaround for ANTLR bug
Yuming Wang created SPARK-24658:
-----------------------------------

             Summary: Remove workaround for ANTLR bug
                 Key: SPARK-24658
                 URL: https://issues.apache.org/jira/browse/SPARK-24658
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang


Issue [antlr/antlr4#781|https://github.com/antlr/antlr4/issues/781] has already been fixed, so the workaround of extracting the pattern into a separate rule is no longer needed. Presto has already removed it: https://github.com/prestodb/presto/pull/10744
[jira] [Created] (SPARK-24638) StringStartsWith support push down
Yuming Wang created SPARK-24638:
-----------------------------------

             Summary: StringStartsWith support push down
                 Key: SPARK-24638
                 URL: https://issues.apache.org/jira/browse/SPARK-24638
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang
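One possible shape for this, sketched under the assumption that Parquet's {{FilterApi.userDefined}} is used; this is a conservative illustration, not necessarily the eventual implementation:
{code:java}
// Assumed imports: org.apache.parquet.filter2.predicate.{FilterApi, Statistics, UserDefinedPredicate}
// and org.apache.parquet.io.api.Binary.
// Evaluate startsWith per value; the row-group pruning hooks are left
// conservative (never drop) to keep the sketch simple.
case sources.StringStartsWith(name, prefix) =>
  FilterApi.userDefined(binaryColumn(name),
    new UserDefinedPredicate[Binary] with Serializable {
      override def keep(value: Binary): Boolean =
        value != null && value.toStringUsingUTF8.startsWith(prefix)
      // A real implementation could compare the prefix against the
      // column's min/max statistics to drop whole row groups.
      override def canDrop(statistics: Statistics[Binary]): Boolean = false
      override def inverseCanDrop(statistics: Statistics[Binary]): Boolean = false
    })
{code}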
[jira] [Created] (SPARK-24706) Support ByteType and ShortType pushdown to parquet
Yuming Wang created SPARK-24706:
-----------------------------------

             Summary: Support ByteType and ShortType pushdown to parquet
                 Key: SPARK-24706
                 URL: https://issues.apache.org/jira/browse/SPARK-24706
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang


Benchmark result:
{noformat}
###[ Pushdown benchmark for tinyint ]###

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------
Parquet Vectorized                                        4307 / 4575          3.7         273.8       1.0X
Parquet Vectorized (Pushdown)                              227 /  241         69.4          14.4      19.0X
Native ORC Vectorized                                     3646 / 3727          4.3         231.8       1.2X
Native ORC Vectorized (Pushdown)                           736 /  744         21.4          46.8       5.9X

Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
Parquet Vectorized                             5209 / 5843          3.0         331.2       1.0X
Parquet Vectorized (Pushdown)                  1296 / 1759         12.1          82.4       4.0X
Native ORC Vectorized                          4455 / 4594          3.5         283.2       1.2X
Native ORC Vectorized (Pushdown)               1736 / 1813          9.1         110.4       3.0X

Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
Parquet Vectorized                             8362 / 8394          1.9         531.7       1.0X
Parquet Vectorized (Pushdown)                  6303 / 6530          2.5         400.7       1.3X
Native ORC Vectorized                          7962 / 8113          2.0         506.2       1.1X
Native ORC Vectorized (Pushdown)               6680 / 7556          2.4         424.7       1.3X

Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
Parquet Vectorized                           11572 / 11715          1.4         735.7       1.0X
Parquet Vectorized (Pushdown)                11198 / 11326          1.4         712.0       1.0X
Native ORC Vectorized                        11041 / 11209          1.4         702.0       1.0X
Native ORC Vectorized (Pushdown)             11104 / 11472          1.4         706.0       1.0X
{noformat}
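Since Parquet stores ByteType and ShortType values as INT32, the pushdown itself can plausibly reuse the integer column path; a minimal sketch in the style of the other snippets in this digest (the pattern shapes are assumptions, not merged code):
{code:java}
// Hypothetical sketch: widen byte/short filter values to java.lang.Integer
// so they can be compared against Parquet INT32 columns.
case ParquetSchemaType(INT_8, INT32, null) | ParquetSchemaType(INT_16, INT32, null) =>
  (n: String, v: Any) => FilterApi.eq(
    intColumn(n),
    Option(v).map(_.asInstanceOf[Number].intValue.asInstanceOf[java.lang.Integer])
      .orNull)
{code}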
[jira] [Commented] (SPARK-24706) Support ByteType and ShortType pushdown to parquet
[ https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528878#comment-16528878 ]

Yuming Wang commented on SPARK-24706:
-------------------------------------

Benchmark result:
{noformat}
###[ Pushdown benchmark for tinyint ]###

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------
Parquet Vectorized                                        4307 / 4575          3.7         273.8       1.0X
Parquet Vectorized (Pushdown)                              227 /  241         69.4          14.4      19.0X
Native ORC Vectorized                                     3646 / 3727          4.3         231.8       1.2X
Native ORC Vectorized (Pushdown)                           736 /  744         21.4          46.8       5.9X

Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
Parquet Vectorized                             5209 / 5843          3.0         331.2       1.0X
Parquet Vectorized (Pushdown)                  1296 / 1759         12.1          82.4       4.0X
Native ORC Vectorized                          4455 / 4594          3.5         283.2       1.2X
Native ORC Vectorized (Pushdown)               1736 / 1813          9.1         110.4       3.0X

Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
Parquet Vectorized                             8362 / 8394          1.9         531.7       1.0X
Parquet Vectorized (Pushdown)                  6303 / 6530          2.5         400.7       1.3X
Native ORC Vectorized                          7962 / 8113          2.0         506.2       1.1X
Native ORC Vectorized (Pushdown)               6680 / 7556          2.4         424.7       1.3X

Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------
Parquet Vectorized                           11572 / 11715          1.4         735.7       1.0X
Parquet Vectorized (Pushdown)                11198 / 11326          1.4         712.0       1.0X
Native ORC Vectorized                        11041 / 11209          1.4         702.0       1.0X
Native ORC Vectorized (Pushdown)             11104 / 11472          1.4         706.0       1.0X

###[ Pushdown benchmark for smallint ]###

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 smallint row (value = CAST(63 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------
Parquet Vectorized                                          2939 / 2966          5.4         186.9       1.0X
Parquet Vectorized (Pushdown)                                 85 /   91        184.9           5.4      34.6X
Native ORC Vectorized                                       2927 / 3026          5.4         186.1       1.0X
Native ORC Vectorized (Pushdown)                             418 /  432         37.7          26.6       7.0X

Select 10% smallint rows (value < CAST(3276 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------
Parquet Vectorized                                               3735 / 3897          4.2         237.5       1.0X
Parquet Vectorized (Pushdown)                                    1204 / 1222         13.1          76.6       3.1X
Native ORC Vectorized                                            3796 / 3831          4.1         241.4       1.0X
Native ORC Vectorized (Pushdown)                                 1570 / 1581         10.0          99.8       2.4X

Select 50% smallint rows (value < CAST(16383 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------
Parquet Vectorized                                                7194 / 8522          2.2         457.4       1.0X
Parquet Vectorized (Pushdown)                                     5758 / 5806          2.7         366.1       1.2X
Native ORC Vectorized                                             7311 / 7585          2.2         464.8       1.0X
Native ORC Vectorized (Pushdown)                                  6123 / 6342          2.6         389.3       1.2X

Select 90% smallint rows (value < CAST(29490 AS smallint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
{noformat}
[jira] [Updated] (SPARK-24706) Support ByteType and ShortType pushdown to parquet
[ https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24706:
--------------------------------
    Description: Benchmark result: (the same benchmark output as in the comment above)
[jira] [Updated] (SPARK-24706) Support ByteType and ShortType pushdown to parquet
[ https://issues.apache.org/jira/browse/SPARK-24706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24706:
--------------------------------
    Description:     (was: Benchmark result: the same benchmark output as in the comment above)
[jira] [Updated] (SPARK-24718) Timestamp support pushdown to parquet data source
[ https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24718:
--------------------------------
    Description:
Something like this:
{code:java}
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, null)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
      .asInstanceOf[java.lang.Long]).orNull)

case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, null)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
      .asInstanceOf[java.lang.Long]).orNull)
{code}

  was:
Something like this:
{code:java}
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
      .asInstanceOf[java.lang.Long]).orNull)

case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
      .asInstanceOf[java.lang.Long]).orNull)
{code}

> Timestamp support pushdown to parquet data source
> --------------------------------------------------
>
>                 Key: SPARK-24718
>                 URL: https://issues.apache.org/jira/browse/SPARK-24718
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
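The {{* 1000}} in the snippet above is the milliseconds-to-microseconds conversion: {{java.sql.Timestamp.getTime}} returns epoch milliseconds, while {{TIMESTAMP_MICROS}} stores epoch microseconds. For example:
{code:java}
val ts = java.sql.Timestamp.valueOf("2018-07-01 00:00:00")
val millis: Long = ts.getTime        // epoch milliseconds, for TIMESTAMP_MILLIS
val micros: Long = ts.getTime * 1000 // epoch microseconds, for TIMESTAMP_MICROS
{code}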
[jira] [Updated] (SPARK-24718) Timestamp support pushdown to parquet data source
[ https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24718:
--------------------------------
    Description:
Something like this:
{code:java}
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
      .asInstanceOf[java.lang.Long]).orNull)

case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
      .asInstanceOf[java.lang.Long]).orNull)
{code}

  was:
Something like this:
{code:java}
// INT96 deprecated, doesn't support pushdown, see: PARQUET-323
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
      .asInstanceOf[java.lang.Long]).orNull)

case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
      .asInstanceOf[java.lang.Long]).orNull)
{code}

> Timestamp support pushdown to parquet data source
> --------------------------------------------------
>
>                 Key: SPARK-24718
>                 URL: https://issues.apache.org/jira/browse/SPARK-24718
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Issue Comment Deleted] (SPARK-24716) Refactor ParquetFilters
[ https://issues.apache.org/jira/browse/SPARK-24716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24716:
--------------------------------
    Comment: was deleted

(was: I'm working on this.)

> Refactor ParquetFilters
> ------------------------
>
>                 Key: SPARK-24716
>                 URL: https://issues.apache.org/jira/browse/SPARK-24716
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Updated] (SPARK-24718) Timestamp support pushdown to parquet data source
[ https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-24718:
--------------------------------
    Description:
Something like this:
{code:java}
// INT96 deprecated, doesn't support pushdown, see: PARQUET-323
case ParquetSchemaType(TIMESTAMP_MICROS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(t => (t.asInstanceOf[java.sql.Timestamp].getTime * 1000)
      .asInstanceOf[java.lang.Long]).orNull)

case ParquetSchemaType(TIMESTAMP_MILLIS, INT64, decimal)
    if pushDownDecimal =>
  (n: String, v: Any) => FilterApi.eq(
    longColumn(n),
    Option(v).map(_.asInstanceOf[java.sql.Timestamp].getTime
      .asInstanceOf[java.lang.Long]).orNull)
{code}

> Timestamp support pushdown to parquet data source
> --------------------------------------------------
>
>                 Key: SPARK-24718
>                 URL: https://issues.apache.org/jira/browse/SPARK-24718
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Commented] (SPARK-24718) Timestamp support pushdown to parquet data source
[ https://issues.apache.org/jira/browse/SPARK-24718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530095#comment-16530095 ]

Yuming Wang commented on SPARK-24718:
-------------------------------------

I'm working on this.

> Timestamp support pushdown to parquet data source
> --------------------------------------------------
>
>                 Key: SPARK-24718
>                 URL: https://issues.apache.org/jira/browse/SPARK-24718
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
[jira] [Created] (SPARK-24718) Timestamp support pushdown to parquet data source
Yuming Wang created SPARK-24718:
-----------------------------------

             Summary: Timestamp support pushdown to parquet data source
                 Key: SPARK-24718
                 URL: https://issues.apache.org/jira/browse/SPARK-24718
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang
[jira] [Commented] (SPARK-24096) create table as select not using hive.default.fileformat
[ https://issues.apache.org/jira/browse/SPARK-24096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454463#comment-16454463 ]

Yuming Wang commented on SPARK-24096:
-------------------------------------

Another related PR: https://github.com/apache/spark/pull/14430

> create table as select not using hive.default.fileformat
> ---------------------------------------------------------
>
>                 Key: SPARK-24096
>                 URL: https://issues.apache.org/jira/browse/SPARK-24096
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: StephenZou
>            Priority: Major
>
> In my spark conf directory, hive-site.xml has an item indicating ORC is the default file format:
>   <property>
>     <name>hive.default.fileformat</name>
>     <value>orc</value>
>   </property>
> But when I use "create table as select ..." to create a table, the output format is plain text.
> It works only if I use "set hive.default.fileformat=orc".
> Then I walked through the Spark code and found in SparkSqlParser.visitCreateHiveTable(),
> val defaultStorage = HiveSerDe.getDefaultStorage(conf), the conf is SQLConf.
> That explains the above observation: "set hive.default.fileformat=orc" is put into the conf map, while hive-site.xml is not.
> It's quite misleading. How can the settings be unified?
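Based on the observation in the report, a workaround sketch (run in spark-shell): a runtime {{SET}} puts the value into SQLConf, which is what {{visitCreateHiveTable()}} reads, unlike hive-site.xml:
{code:java}
spark.sql("SET hive.default.fileformat=orc")
spark.sql("CREATE TABLE t_orc AS SELECT 1 AS col1") // now stored as ORC
{code}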
[jira] [Resolved] (SPARK-24096) create table as select not using hive.default.fileformat
[ https://issues.apache.org/jira/browse/SPARK-24096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-24096.
---------------------------------
    Resolution: Duplicate

> create table as select not using hive.default.fileformat
> ---------------------------------------------------------
>
>                 Key: SPARK-24096
>                 URL: https://issues.apache.org/jira/browse/SPARK-24096
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: StephenZou
>            Priority: Major
[jira] [Created] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
Yuming Wang created SPARK-22977:
-----------------------------------

             Summary: DataFrameWriter operations do not show details in SQL tab
                 Key: SPARK-22977
                 URL: https://issues.apache.org/jira/browse/SPARK-22977
             Project: Spark
          Issue Type: Bug
          Components: SQL, Web UI
    Affects Versions: 2.3.0
            Reporter: Yuming Wang
[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22977:
--------------------------------
    Attachment: after.png
                before.png

> DataFrameWriter operations do not show details in SQL tab
> ----------------------------------------------------------
>
>                 Key: SPARK-22977
>                 URL: https://issues.apache.org/jira/browse/SPARK-22977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>         Attachments: after.png, before.png
[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22977:
--------------------------------
    Attachment: after.png
                before.png

> DataFrameWriter operations do not show details in SQL tab
> ----------------------------------------------------------
>
>                 Key: SPARK-22977
>                 URL: https://issues.apache.org/jira/browse/SPARK-22977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>         Attachments: after.png, before.png
>
> When create
> !before.png!
> !after.png!
[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22977:
--------------------------------
    Description:
When create
!before.png!
!after.png!

> DataFrameWriter operations do not show details in SQL tab
> ----------------------------------------------------------
>
>                 Key: SPARK-22977
>                 URL: https://issues.apache.org/jira/browse/SPARK-22977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>         Attachments: after.png, before.png
[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22977:
--------------------------------
    Attachment:     (was: before.png)

> DataFrameWriter operations do not show details in SQL tab
> ----------------------------------------------------------
>
>                 Key: SPARK-22977
>                 URL: https://issues.apache.org/jira/browse/SPARK-22977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>         Attachments: after.png, before.png
[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22977:
--------------------------------
    Attachment:     (was: after.png)

> DataFrameWriter operations do not show details in SQL tab
> ----------------------------------------------------------
>
>                 Key: SPARK-22977
>                 URL: https://issues.apache.org/jira/browse/SPARK-22977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>         Attachments: after.png, before.png
[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22977:
--------------------------------
    Description:
When CreateHiveTableAsSelectCommand or InsertIntoHiveTable,
!before.png!
!after.png!

  was:
When create
!before.png!
!after.png!

> DataFrameWriter operations do not show details in SQL tab
> ----------------------------------------------------------
>
>                 Key: SPARK-22977
>                 URL: https://issues.apache.org/jira/browse/SPARK-22977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>         Attachments: after.png, before.png
[jira] [Updated] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22977:
--------------------------------
    Description:
For CreateHiveTableAsSelectCommand or InsertIntoHiveTable, the SQL tab doesn't show details after [SPARK-20213|https://issues.apache.org/jira/browse/SPARK-20213].

*Before*:
!before.png!

*After*:
!after.png!

  was:
When CreateHiveTableAsSelectCommand or InsertIntoHiveTable,
!before.png!
!after.png!

> DataFrameWriter operations do not show details in SQL tab
> ----------------------------------------------------------
>
>                 Key: SPARK-22977
>                 URL: https://issues.apache.org/jira/browse/SPARK-22977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>         Attachments: after.png, before.png
[jira] [Created] (SPARK-22894) DateTimeOperations should accept SQL like string type
Yuming Wang created SPARK-22894:
-----------------------------------

             Summary: DateTimeOperations should accept SQL like string type
                 Key: SPARK-22894
                 URL: https://issues.apache.org/jira/browse/SPARK-22894
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Yuming Wang


{noformat}
spark-sql> SELECT '2017-12-24' + interval 2 months 2 seconds;
Error in query: cannot resolve '(CAST('2017-12-24' AS DOUBLE) + interval 2 months 2 seconds)' due to data type mismatch: differing types in '(CAST('2017-12-24' AS DOUBLE) + interval 2 months 2 seconds)' (double and calendarinterval).; line 1 pos 7;
'Project [unresolvedalias((cast(2017-12-24 as double) + interval 2 months 2 seconds), None)]
+- OneRowRelation

spark-sql>
{noformat}
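Until string operands are accepted here, an explicit cast should sidestep the string-to-double coercion shown above (a sketch, run in spark-shell; the cast target just needs to be a date/time type):
{code:java}
// Workaround sketch: cast the literal so the + resolves against a
// timestamp rather than a double.
spark.sql("SELECT CAST('2017-12-24' AS timestamp) + interval 2 months 2 seconds").show()
{code}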
[jira] [Created] (SPARK-22890) Basic tests for DateTimeOperations
Yuming Wang created SPARK-22890: --- Summary: Basic tests for DateTimeOperations Key: SPARK-22890 URL: https://issues.apache.org/jira/browse/SPARK-22890 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22893) Unified the data type mismatch message
Yuming Wang created SPARK-22893: --- Summary: Unified the data type mismatch message Key: SPARK-22893 URL: https://issues.apache.org/jira/browse/SPARK-22893 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Yuming Wang {noformat} spark-sql> select cast(1 as binary); Error in query: cannot resolve 'CAST(1 AS BINARY)' due to data type mismatch: cannot cast IntegerType to BinaryType; line 1 pos 7; {noformat} We should use {{dataType.simpleString}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
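To illustrate the proposal, {{simpleString}} yields the lower-case SQL-style type name, so the message would read "cannot cast int to binary" instead of "cannot cast IntegerType to BinaryType"; a minimal sketch:
{code:scala}
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> IntegerType.simpleString
res0: String = int

scala> BinaryType.simpleString
res1: String = binary
{code}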
[jira] [Resolved] (SPARK-23175) Type conversion does not make sense under case like select ’0.1’ = 0
[ https://issues.apache.org/jira/browse/SPARK-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-23175. - Resolution: Duplicate > Type conversion does not make sense under case like select ’0.1’ = 0 > > > Key: SPARK-23175 > URL: https://issues.apache.org/jira/browse/SPARK-23175 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Shaoquan Zhang >Priority: Major > > SQL select '0.1' = 0 returns true. The result seems unreasonable. > From the logical plan, the sql is parsed as 'Project [(cast(cast(0.1 as > decimal(20,0)) as int) = 0) AS #6]'. The type conversion converts the string > to integer, which leads to the unreasonable result. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23058) Show create table can't show non printable field delim
[ https://issues.apache.org/jira/browse/SPARK-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23058: Summary: Show create table can't show non printable field delim (was: Show create table conn't show non printable field delim) > Show create table can't show non printable field delim > -- > > Key: SPARK-23058 > URL: https://issues.apache.org/jira/browse/SPARK-23058 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Yuming Wang > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23058) Show create table can't show non printable field delim
[ https://issues.apache.org/jira/browse/SPARK-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23058: Description: # create table t1: {code:sql} CREATE EXTERNAL TABLE `t1`(`col1` bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'field.delim' = '\177', 'serialization.format' = '\003' ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' LOCATION 'file:/tmp/t1'; {code} # show create table t1: {code:java} spark-sql> show create table t1; CREATE EXTERNAL TABLE `t1`(`col1` bigint) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'field.delim' = '', 'serialization.format' = '' ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' LOCATION 'file:/tmp/t1' TBLPROPERTIES ( 'transient_lastDdlTime' = '1515766958' ) {code} {{'\177'}} and {{'\003'}} are not shown correctly. > Show create table can't show non printable field delim > -- > > Key: SPARK-23058 > URL: https://issues.apache.org/jira/browse/SPARK-23058 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Yuming Wang > > # create table t1: > {code:sql} > CREATE EXTERNAL TABLE `t1`(`col1` bigint) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > WITH SERDEPROPERTIES ( > 'field.delim' = '\177', > 'serialization.format' = '\003' > ) > STORED AS > INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' > LOCATION 'file:/tmp/t1'; > {code} > # show create table t1: > {code:java} > spark-sql> show create table t1; > CREATE EXTERNAL TABLE `t1`(`col1` bigint) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > WITH SERDEPROPERTIES ( > 'field.delim' = '', > 'serialization.format' = '' > ) > STORED AS > INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat' > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' > LOCATION 'file:/tmp/t1' > TBLPROPERTIES ( > 'transient_lastDdlTime' = '1515766958' > ) > {code} > {{'\177'}} and {{'\003'}} are not shown correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
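One plausible shape of a fix is to escape non-printable characters when rendering the SERDEPROPERTIES; a minimal sketch with a hypothetical helper (not the actual patch):
{code:scala}
// render control characters as octal escapes so SHOW CREATE TABLE
// emits '\177' and '\003' instead of the raw bytes
def escapeNonPrintable(s: String): String =
  s.flatMap(c => if (c < 32 || c == 127) f"\\${c.toInt}%03o" else c.toString)

escapeNonPrintable("\u007f") // "\177"
escapeNonPrintable("\u0003") // "\003"
{code}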
[jira] [Created] (SPARK-23058) Show create table conn't show non printable field delim
Yuming Wang created SPARK-23058: --- Summary: Show create table conn't show non printable field delim Key: SPARK-23058 URL: https://issues.apache.org/jira/browse/SPARK-23058 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-23297) Spark job is finished but the stage process is error
[ https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351791#comment-16351791 ] Yuming Wang edited comment on SPARK-23297 at 2/4/18 1:52 PM: - [~KaiXinXIaoLei] Try to increase {{spark.ui.retainedTasks}}. was (Author: q79969786): [~KaiXinXIaoLei] Try to increase {{spark.scheduler.listenerbus.eventqueue.capacity}}. > Spark job is finished but the stage process is error > > > Key: SPARK-23297 > URL: https://issues.apache.org/jira/browse/SPARK-23297 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.2, 2.2.1 >Reporter: KaiXinXIaoLei >Priority: Major > Attachments: job finished but stage process is error.png > > > I set the log level is WARN, and run spark job using spark-sql. My job is > finished but the stage process display the running state, !job finished but > stage process is error.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
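For example (the value is illustrative; the point is to raise the cap above the job's task count):
{noformat}
bin/spark-sql --conf spark.ui.retainedTasks=200000
{noformat}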
[jira] [Commented] (SPARK-23297) Spark job is finished but the stage process is error
[ https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351791#comment-16351791 ] Yuming Wang commented on SPARK-23297: - [~KaiXinXIaoLei] Try to increase {{spark.scheduler.listenerbus.eventqueue.capacity}}. > Spark job is finished but the stage process is error > > > Key: SPARK-23297 > URL: https://issues.apache.org/jira/browse/SPARK-23297 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.2, 2.2.1 >Reporter: KaiXinXIaoLei >Priority: Major > Attachments: job finished but stage process is error.png > > > I set the log level is WARN, and run spark job using spark-sql. My job is > finished but the stage process display the running state, !job finished but > stage process is error.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23336) Upgrade snappy-java to 1.1.4
Yuming Wang created SPARK-23336: --- Summary: Upgrade snappy-java to 1.1.4 Key: SPARK-23336 URL: https://issues.apache.org/jira/browse/SPARK-23336 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.0 Reporter: Yuming Wang We should upgrade the snappy-java version to improve compression performance (5%) and decompression performance (20%). Details: [https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-114-2017-05-22] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23332) Update SQLQueryTestSuite to support test both default mode and hive mode for a typeCoercion TestCase
[ https://issues.apache.org/jira/browse/SPARK-23332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23332: Summary: Update SQLQueryTestSuite to support test both default mode and hive mode for a typeCoercion TestCase (was: Update SQLQueryTestSuite to support test hive mode) > Update SQLQueryTestSuite to support test both default mode and hive mode for > a typeCoercion TestCase > > > Key: SPARK-23332 > URL: https://issues.apache.org/jira/browse/SPARK-23332 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23332) Update SQLQueryTestSuite to support test hive mode
Yuming Wang created SPARK-23332: --- Summary: Update SQLQueryTestSuite to support test hive mode Key: SPARK-23332 URL: https://issues.apache.org/jira/browse/SPARK-23332 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23297) Spark job is finished but the stage process is error
[ https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348332#comment-16348332 ] Yuming Wang commented on SPARK-23297: - It seems that the {{SparkListenerTaskEnd}} events are not consumed in time; this is not a bug. > Spark job is finished but the stage process is error > > > Key: SPARK-23297 > URL: https://issues.apache.org/jira/browse/SPARK-23297 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: KaiXinXIaoLei >Priority: Major > Attachments: job finished but stage process is error.png > > > I set the log level is WARN, and run spark job using spark-sql. My job is > finished but the stage process display the running state, !job finished but > stage process is error.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23263) create table stored as parquet should update table size if automatic update table size is enabled
Yuming Wang created SPARK-23263: --- Summary: create table stored as parquet should update table size if automatic update table size is enabled Key: SPARK-23263 URL: https://issues.apache.org/jira/browse/SPARK-23263 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Yuming Wang How to reproduce: {noformat} bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true {noformat} {code:sql} spark-sql> create table test_create_parquet stored as parquet as select 1; spark-sql> desc extended test_create_parquet; {code} The table statistics will not exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
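Until that is fixed, a possible workaround is to compute the size statistics explicitly after the CTAS; a sketch:
{code:sql}
ANALYZE TABLE test_create_parquet COMPUTE STATISTICS NOSCAN;
-- the Statistics line should now appear
DESC EXTENDED test_create_parquet;
{code}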
[jira] [Updated] (SPARK-23336) Upgrade snappy-java to 1.1.7.1
[ https://issues.apache.org/jira/browse/SPARK-23336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23336: Summary: Upgrade snappy-java to 1.1.7.1 (was: Upgrade snappy-java to 1.1.4) > Upgrade snappy-java to 1.1.7.1 > -- > > Key: SPARK-23336 > URL: https://issues.apache.org/jira/browse/SPARK-23336 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Minor > > We should upgrade the snappy-java version to improve compression performance > (5%) and decompression performance (20%). > Details: > > [https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-114-2017-05-22] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23405) The task will hang up when a small table left semi join a big table
[ https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362162#comment-16362162 ] Yuming Wang commented on SPARK-23405: - I think it's data skew; you should broadcast the small table. > The task will hang up when a small table left semi join a big table > --- > > Key: SPARK-23405 > URL: https://issues.apache.org/jira/browse/SPARK-23405 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 >Reporter: KaiXinXIaoLei >Priority: Major > Attachments: SQL.png, taskhang up.png > > > I run a sql: `select ls.cs_order_number from ls left semi join catalog_sales > cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table is a small > table, and the number is one. The `catalog_sales` table is a big table, and > the number is 10 billion. The task will hang up: > !taskhang up.png! > And the sql page is: > !SQL.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
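For reference, one way to force that from SQL is the broadcast hint available since Spark 2.2; a sketch against the reporter's query:
{code:sql}
SELECT /*+ BROADCAST(ls) */ ls.cs_order_number
FROM ls LEFT SEMI JOIN catalog_sales cs
  ON ls.cs_order_number = cs.cs_order_number;
{code}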
[jira] [Commented] (SPARK-23354) spark jdbc does not maintain length of data type when I move data from MS sql server to Oracle using spark jdbc
[ https://issues.apache.org/jira/browse/SPARK-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357966#comment-16357966 ] Yuming Wang commented on SPARK-23354: - Do you mean a custom column type? You can find more details [here|https://github.com/apache/spark/pull/18266]. > spark jdbc does not maintain length of data type when I move data from MS sql > server to Oracle using spark jdbc > --- > > Key: SPARK-23354 > URL: https://issues.apache.org/jira/browse/SPARK-23354 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.2.1 >Reporter: Lav Patel >Priority: Major > > spark jdbc does not maintain length of data type when I move data from MS sql > server to Oracle using spark jdbc > > To fix this, I have written code so it will figure out length of column and > it does the conversion. > > I can put more details with a code sample if the community is interested. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
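If the goal is preserving column lengths on the write side, the {{createTableColumnTypes}} option (available since Spark 2.2) may already cover it; a sketch where {{df}}, {{oracleUrl}} and the column list are illustrative assumptions:
{code:scala}
import java.util.Properties

// override the default DDL types Spark would otherwise generate for Oracle
// (df and oracleUrl are hypothetical names, not from the ticket)
val props = new Properties()
df.write
  .option("createTableColumnTypes", "name VARCHAR(128), city VARCHAR(64)")
  .jdbc(oracleUrl, "target_table", props)
{code}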
[jira] [Commented] (SPARK-23373) Can not execute "count distinct" queries on parquet formatted table
[ https://issues.apache.org/jira/browse/SPARK-23373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358392#comment-16358392 ] Yuming Wang commented on SPARK-23373: - I cannot reproduce this on current master either, as you mentioned. > Can not execute "count distinct" queries on parquet formatted table > --- > > Key: SPARK-23373 > URL: https://issues.apache.org/jira/browse/SPARK-23373 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wang, Gang >Priority: Major > > I failed to run sql "select count(distinct n_name) from nation", table nation > is formatted in Parquet, error trace is as following. > _spark-sql> select count(distinct n_name) from nation;_ > _18/02/09 03:55:28 INFO main SparkSqlParser:54 Parsing command: select > count(distinct n_name) from nation_ > _Error in query: Table or view not found: nation; line 1 pos 35_ > _spark-sql> select count(distinct n_name) from nation_parquet;_ > _18/02/09 03:55:36 INFO main SparkSqlParser:54 Parsing command: select > count(distinct n_name) from nation_parquet_ > _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: int_ > _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: string_ > _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: int_ > _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: string_ > _18/02/09 03:55:36 INFO main CatalystSqlParser:54 Parsing command: > array_ > _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Pruning directories with:_ > _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Data Filters:_ > _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Post-Scan Filters:_ > _18/02/09 03:55:38 INFO main FileSourceStrategy:54 Output Data Schema: > struct_ > _18/02/09 03:55:38 INFO main FileSourceScanExec:54 Pushed Filters:_ > _18/02/09 03:55:39 INFO main CodeGenerator:54 Code generated in 295.88685 ms_ > _18/02/09 03:55:39 INFO main HashAggregateExec:54 > spark.sql.codegen.aggregate.map.twolevel.enable is set to true, but current > version of codegened fast hashmap does not support this aggregate._ > _18/02/09 03:55:39 INFO main CodeGenerator:54 Code generated in 51.075394 ms_ > _18/02/09 03:55:39 INFO main HashAggregateExec:54 > spark.sql.codegen.aggregate.map.twolevel.enable is set to true, but current > version of codegened fast hashmap does not support this aggregate._ > _18/02/09 03:55:39 INFO main CodeGenerator:54 Code generated in 42.819226 ms_ > _18/02/09 03:55:39 INFO main ParquetFileFormat:54 parquetFilterPushDown is > true_ > _18/02/09 03:55:39 INFO main ParquetFileFormat:54 start filter class_ > _18/02/09 03:55:39 INFO main ParquetFileFormat:54 Pushed not defined_ > _18/02/09 03:55:39 INFO main ParquetFileFormat:54 end filter class_ > _18/02/09 03:55:39 INFO main MemoryStore:54 Block broadcast_0 stored as > values in memory (estimated size 305.0 KB, free 366.0 MB)_ > _18/02/09 03:55:39 INFO main MemoryStore:54 Block broadcast_0_piece0 stored > as bytes in memory (estimated size 27.6 KB, free 366.0 MB)_ > _18/02/09 03:55:39 INFO dispatcher-event-loop-7 BlockManagerInfo:54 Added > broadcast_0_piece0 in memory on 10.64.205.170:45616 (size: 27.6 KB, free: > 366.3 MB)_ > _18/02/09 03:55:39 INFO main SparkContext:54 Created broadcast 0 from > processCmd at CliDriver.java:376_ > _18/02/09 03:55:39 INFO main InMemoryFileIndex:54 Selected files after > partition pruning:_ > _PartitionDirectory([empty > row],ArrayBuffer(LocatedFileStatus\{path=hdfs://**.com:8020/apps/hive/warehouse/nation_parquet/00_0; > isDirectory=false; length=3216; replication=3; blocksize=134217728; > modification_time=1516619879024; access_time=0; owner=; group=; > permission=rw-rw-rw-; isSymlink=false}))_ > _18/02/09 03:55:39 INFO main FileSourceScanExec:54 Planning scan with bin > packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 > bytes._ > _18/02/09 03:55:39 ERROR main SparkSQLDriver:91 Failed in [select > count(distinct n_name) from nation_parquet]_ > {color:#ff}*_org.apache.spark.SparkException: Task not > serializable_*{color} > _at > org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:340)_ > _at > org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:330)_ > _at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:156)_ > _at org.apache.spark.SparkContext.clean(SparkContext.scala:2294)_ > _at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:841)_ > _at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:840)_ > _at >
[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale
[ https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358401#comment-16358401 ] Yuming Wang commented on SPARK-23370: - Users can now configure the column type like below: {code:scala} import java.util.Properties val props = new Properties() props.put("customSchema", "ID decimal(38, 0), N1 int, N2 boolean") val dfRead = spark.read.jdbc(jdbcUrl, "tableWithCustomSchema", props) dfRead.show() {code} More details: https://github.com/apache/spark/pull/18266 > Spark receives a size of 0 for an Oracle Number field and defaults the field > type to be BigDecimal(30,10) instead of the actual precision and scale > --- > > Key: SPARK-23370 > URL: https://issues.apache.org/jira/browse/SPARK-23370 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1 > Environment: Spark 2.2 > Oracle 11g > JDBC ojdbc6.jar >Reporter: Harleen Singh Mann >Priority: Major > Attachments: Oracle KB Document 1266785.pdf > > > Currently, on jdbc read spark obtains the schema of a table from using > {color:#654982} resultSet.getMetaData.getColumnType{color} > This works 99.99% of the times except when the column of Number type is added > on an Oracle table using the alter statement. This is essentially an Oracle > DB + JDBC bug that has been documented on Oracle KB and patches exist. > [oracle > KB|https://support.oracle.com/knowledge/Oracle%20Database%20Products/1266785_1.html] > {color:#ff}As a result of the above mentioned issue, Spark receives a > size of 0 for the field and defaults the field type to be BigDecimal(30,10) > instead of what it actually should be. This is done in OracleDialect.scala. > This may cause issues in the downstream application where relevant > information may be missed to the changed precision and scale.{color} > _The versions that are affected are:_ > _JDBC - Version: 11.2.0.1 and later [Release: 11.2 and later ]_ > _Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.2.0.1_ > _[Release: 11.1 to 11.2]_ > +Proposed approach:+ > There is another way of fetching the schema information in Oracle: Which is > through the all_tab_columns table. If we use this table to fetch the > precision and scale of Number time, the above issue is mitigated. > > {color:#14892c}{color:#f6c342}I can implement the changes, but require some > inputs on the approach from the gatekeepers here{color}.{color} > {color:#14892c}PS. This is also my first Jira issue and my first fork for > Spark, so I will need some guidance along the way. (yes, I am a newbee to > this) Thanks...{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore
Yuming Wang created SPARK-23510: --- Summary: Support read data from Hive 2.2 and Hive 2.3 metastore Key: SPARK-23510 URL: https://issues.apache.org/jira/browse/SPARK-23510 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore
[ https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375660#comment-16375660 ] Yuming Wang commented on SPARK-23510: - [~JPMoresmau] Can you try https://github.com/apache/spark/pull/20668? > Support read data from Hive 2.2 and Hive 2.3 metastore > -- > > Key: SPARK-23510 > URL: https://issues.apache.org/jira/browse/SPARK-23510 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
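With that patch applied, something like the following configuration should exercise the new metastore client (the version value is illustrative):
{noformat}
bin/spark-sql \
  --conf spark.sql.hive.metastore.version=2.3.2 \
  --conf spark.sql.hive.metastore.jars=maven
{noformat}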
[jira] [Commented] (SPARK-22722) Test Coverage for Type Coercion Compatibility
[ https://issues.apache.org/jira/browse/SPARK-22722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305979#comment-16305979 ] Yuming Wang commented on SPARK-22722: - [~smilegator] All tests are added, except [FunctionArgumentConversion|https://github.com/apache/spark/pull/20008#issuecomment-352670852] and [StackCoercion|https://github.com/apache/spark/pull/20006#pullrequestreview-84366891]. > Test Coverage for Type Coercion Compatibility > - > > Key: SPARK-22722 > URL: https://issues.apache.org/jira/browse/SPARK-22722 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Yuming Wang > > Hive compatibility is pretty important for the users who run or migrate both > Hive and Spark SQL. > We plan to add a SQLConf for type coercion compatibility > (spark.sql.typeCoercion.mode). Users can choose Spark's native mode (default) > or Hive mode (hive). > Before we deliver the Hive compatibility mode, we plan to write a set of test > cases that can be easily run in both Spark and Hive sides. We can easily > compare whether they are the same or not. When new typeCoercion rules are > added, we also can easily track the changes. These test cases can also be > backported to the previous Spark versions for determining the changes we > made. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20295) when spark.sql.adaptive.enabled is enabled, have conflict with Exchange Resue
[ https://issues.apache.org/jira/browse/SPARK-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520148#comment-16520148 ] Yuming Wang commented on SPARK-20295: - [~KevinZwx] Can you try [https://github.com/Intel-bigdata/spark-adaptive]? > when spark.sql.adaptive.enabled is enabled, have conflict with Exchange Resue > -- > > Key: SPARK-20295 > URL: https://issues.apache.org/jira/browse/SPARK-20295 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL >Affects Versions: 2.1.0 >Reporter: Ruhui Wang >Priority: Major > > when run tpcds-q95, and set spark.sql.adaptive.enabled = true the physical > plan firstly: > Sort > : +- Exchange(coordinator id: 1) > : +- Project*** > ::-Sort ** > :: +- Exchange(coordinator id: 2) > :: :- Project *** > :+- Sort > :: +- Exchange(coordinator id: 3) > spark.sql.exchange.reuse is opened, then physical plan will become below: > Sort > : +- Exchange(coordinator id: 1) > : +- Project*** > ::-Sort ** > :: +- Exchange(coordinator id: 2) > :: :- Project *** > :+- Sort > :: +- ReusedExchange Exchange(coordinator id: 2) > If spark.sql.adaptive.enabled = true, the code stack is : > ShuffleExchange#doExecute --> postShuffleRDD function --> > doEstimationIfNecessary . In this function, > assert(exchanges.length == numExchanges) will be error, as left side has only > one element, but right is equal to 2. > If this is a bug of spark.sql.adaptive.enabled and exchange resue? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
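Assuming the assertion really trips because reused exchanges are deduplicated before the coordinator counts them, one mitigation worth testing (an assumption, not a verified fix) is to keep adaptive execution on but disable exchange reuse:
{noformat}
--conf spark.sql.adaptive.enabled=true --conf spark.sql.exchange.reuse=false
{noformat}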
[jira] [Updated] (SPARK-24937) Datasource partition table should load empty static partitions
[ https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-24937: Summary: Datasource partition table should load empty static partitions (was: Datasource partition table should load empty partitions) > Datasource partition table should load empty static partitions > -- > > Key: SPARK-24937 > URL: https://issues.apache.org/jira/browse/SPARK-24937 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > spark-sql> CREATE TABLE tbl AS SELECT 1; > spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > > USING parquet > > PARTITIONED BY (day, hour); > spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > spark-sql> SHOW PARTITIONS tbl1; > spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > > PARTITIONED BY (day STRING, hour STRING); > 18/07/26 22:49:20 WARN HiveMetaStore: Location: > file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external > table:tbl2 > spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 > 18/07/26 22:49:36 WARN log: Updated size to 0 > spark-sql> SHOW PARTITIONS tbl2; > day=2018-07-25/hour=01 > spark-sql> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-24937) Datasource partition table should load empty partitions
[ https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-24937: Comment: was deleted (was: I'm working on.) > Datasource partition table should load empty partitions > --- > > Key: SPARK-24937 > URL: https://issues.apache.org/jira/browse/SPARK-24937 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > spark-sql> CREATE TABLE tbl AS SELECT 1; > spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > > USING parquet > > PARTITIONED BY (day, hour); > spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > spark-sql> SHOW PARTITIONS tbl1; > spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > > PARTITIONED BY (day STRING, hour STRING); > 18/07/26 22:49:20 WARN HiveMetaStore: Location: > file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external > table:tbl2 > spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 > 18/07/26 22:49:36 WARN log: Updated size to 0 > spark-sql> SHOW PARTITIONS tbl2; > day=2018-07-25/hour=01 > spark-sql> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24916) Fix type coercion for IN expression with subquery
[ https://issues.apache.org/jira/browse/SPARK-24916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-24916. - Resolution: Duplicate > Fix type coercion for IN expression with subquery > - > > Key: SPARK-24916 > URL: https://issues.apache.org/jira/browse/SPARK-24916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.3.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > CREATE TEMPORARY VIEW t4 AS SELECT * FROM VALUES > (CAST(1 AS DOUBLE), CAST(2 AS STRING), CAST(3 AS STRING)) > AS t1(t4a, t4b, t4c); > CREATE TEMPORARY VIEW t5 AS SELECT * FROM VALUES > (CAST(1 AS DECIMAL(18, 0)), CAST(2 AS STRING), CAST(3 AS BIGINT)) > AS t1(t5a, t5b, t5c); > SELECT * FROM t4 > WHERE > (t4a, t4b, t4c) IN (SELECT t5a, >t5b, >t5c > FROM t5); > {code} > Will throw exception: > {noformat} > org.apache.spark.sql.AnalysisException > cannot resolve '(named_struct('t4a', t4.`t4a`, 't4b', t4.`t4b`, 't4c', > t4.`t4c`) IN (listquery()))' due to data type mismatch: > The data type of one or more elements in the left hand side of an IN subquery > is not compatible with the data type of the output of the subquery > Mismatched columns: > [(t4.`t4a`:double, t5.`t5a`:decimal(18,0)), (t4.`t4c`:string, > t5.`t5c`:bigint)] > Left side: > [double, string, string]. > Right side: > [decimal(18,0), string, bigint].; > {noformat} > But it success on Spark 2.1.x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24816) SQL interface support repartitionByRange
[ https://issues.apache.org/jira/browse/SPARK-24816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-24816. - Resolution: Won't Fix {{Order by}} is implemented by {{rangepartitioning}}. > SQL interface support repartitionByRange > > > Key: SPARK-24816 > URL: https://issues.apache.org/jira/browse/SPARK-24816 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > Attachments: DISTRIBUTE_BY_SORT_BY.png, > RANGE_DISTRIBUTE_BY_SORT_BY.png > > > SQL interface support {{repartitionByRange}} to improve data pushdown. I > have tested this feature with a big table (data size: 1.1 T, row count: > 282,001,954,428). > The test sql is: > {code:sql} > select * from table where id=401564838907 > {code} > The test result: > |Mode|Input Size|Records|Total Time|Duration|Prepare data Resource Allocation > MB-seconds| > |default|959.2 GB|237624395522|11.2 h|1.3 min|6496280086| > |DISTRIBUTE BY|970.8 GB|244642791213|11.4 h|1.3 min|10536069846| > |SORT BY|456.3 GB|101587838784|5.4 h|31 s|8965158620| > |DISTRIBUTE BY + SORT BY |219.0 GB |51723521593|3.3 h|54 s|12552656774| > |RANGE PARTITION BY |38.5 GB|75355144|45 min|13 s|14525275297| > |RANGE PARTITION BY + SORT BY|17.4 GB|14334724|45 min|12 s|16255296698| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
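For reference, the same range partitioning is already reachable from the DataFrame API (and from SQL via {{ORDER BY}}, per the resolution above); a sketch with an illustrative table name and partition count:
{code:scala}
import org.apache.spark.sql.functions.col

// DataFrame equivalent of the proposed RANGE PARTITION BY
spark.table("src")
  .repartitionByRange(200, col("id"))
  .write
  .mode("overwrite")
  .saveAsTable("src_by_range")
{code}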
[jira] [Commented] (SPARK-19394) "assertion failed: Expected hostname" on macOS when self-assigned IP contains a percent sign
[ https://issues.apache.org/jira/browse/SPARK-19394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564532#comment-16564532 ] Yuming Wang commented on SPARK-19394: - Try to add {{::1 localhost}} to /etc/hosts. > "assertion failed: Expected hostname" on macOS when self-assigned IP contains > a percent sign > > > Key: SPARK-19394 > URL: https://issues.apache.org/jira/browse/SPARK-19394 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: Jacek Laskowski >Priority: Minor > > See [this question on > StackOverflow|http://stackoverflow.com/q/41914586/1305344]. > {quote} > So when I am not connected to internet, spark shell fails to load in local > mode. I am running Apache Spark 2.1.0 downloaded from internet, running on my > Mac. So I run ./bin/spark-shell and it gives me the error below. > So I have read the Spark code and it is using Java's > InetAddress.getLocalHost() to find the localhost's IP address. So when I am > connected to internet, I get back an IPv4 with my local hostname. > scala> InetAddress.getLocalHost > res9: java.net.InetAddress = AliKheyrollahis-MacBook-Pro.local/192.168.1.26 > but the key is, when disconnected, I get an IPv6 with a percentage in the > values (it is scoped): > scala> InetAddress.getLocalHost > res10: java.net.InetAddress = > AliKheyrollahis-MacBook-Pro.local/fe80:0:0:0:2b9a:4521:a301:e9a5%10 > And this IP is the same as the one you see in the error message. I feel my > problem is that it throws Spark since it cannot handle %10 in the result. > ... > 17/01/28 22:03:28 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at > http://fe80:0:0:0:2b9a:4521:a301:e9a5%10:4040 > 17/01/28 22:03:28 INFO Executor: Starting executor ID driver on host localhost > 17/01/28 22:03:28 INFO Executor: Using REPL class URI: > spark://fe80:0:0:0:2b9a:4521:a301:e9a5%10:56107/classes > 17/01/28 22:03:28 ERROR SparkContext: Error initializing SparkContext. > java.lang.AssertionError: assertion failed: Expected hostname > at scala.Predef$.assert(Predef.scala:170) > at org.apache.spark.util.Utils$.checkHost(Utils.scala:931) > at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:31) > at org.apache.spark.executor.Executor.(Executor.scala:121) > at > org.apache.spark.scheduler.local.LocalEndpoint.(LocalSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156) > at org.apache.spark.SparkContext.(SparkContext.scala:509) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95) > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
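The hosts entry would look like the first line below; pinning {{SPARK_LOCAL_IP}} is another commonly reported workaround (whether it helps in this exact case is an assumption):
{noformat}
# /etc/hosts
::1 localhost

# alternative: pin the bind address before starting the shell
SPARK_LOCAL_IP=127.0.0.1 ./bin/spark-shell
{noformat}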
[jira] [Updated] (SPARK-24937) Datasource partition table should load empty partitions
[ https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-24937: Description: How to reproduce: {code:sql} spark-sql> CREATE TABLE tbl AS SELECT 1; 18/07/26 22:48:11 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external table:tbl 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > USING parquet > PARTITIONED BY (day, hour); spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; spark-sql> SHOW PARTITIONS tbl1; spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > PARTITIONED BY (day STRING, hour STRING); 18/07/26 22:49:20 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external table:tbl2 spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 18/07/26 22:49:36 WARN log: Updated size to 0 spark-sql> SHOW PARTITIONS tbl2; day=2018-07-25/hour=01 spark-sql> {code} was: {code:sql} spark-sql> CREATE TABLE tbl AS SELECT 1; 18/07/26 22:48:11 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external table:tbl 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > USING parquet > PARTITIONED BY (day, hour); spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; spark-sql> SHOW PARTITIONS tbl1; spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > PARTITIONED BY (day STRING, hour STRING); 18/07/26 22:49:20 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external table:tbl2 spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 18/07/26 22:49:36 WARN log: Updated size to 0 spark-sql> SHOW PARTITIONS tbl2; day=2018-07-25/hour=01 spark-sql> {code} > Datasource partition table should load empty partitions > --- > > Key: SPARK-24937 > URL: https://issues.apache.org/jira/browse/SPARK-24937 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > spark-sql> CREATE TABLE tbl AS SELECT 1; > 18/07/26 22:48:11 WARN HiveMetaStore: Location: > file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external > table:tbl > 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, > returning NoSuchObjectException > spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > > USING parquet > > PARTITIONED BY (day, hour); > spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > spark-sql> SHOW PARTITIONS tbl1; > spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > > PARTITIONED BY (day STRING, hour STRING); > 18/07/26 22:49:20 WARN HiveMetaStore: Location: > file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external > table:tbl2 > spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 > 18/07/26 22:49:36 WARN log: Updated size to 0 > spark-sql> SHOW PARTITIONS tbl2; > day=2018-07-25/hour=01 > spark-sql> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24937) Datasource partition table should load empty partitions
[ https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-24937: Description: How to reproduce: {code:sql} spark-sql> CREATE TABLE tbl AS SELECT 1; spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > USING parquet > PARTITIONED BY (day, hour); spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; spark-sql> SHOW PARTITIONS tbl1; spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > PARTITIONED BY (day STRING, hour STRING); 18/07/26 22:49:20 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external table:tbl2 spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 18/07/26 22:49:36 WARN log: Updated size to 0 spark-sql> SHOW PARTITIONS tbl2; day=2018-07-25/hour=01 spark-sql> {code} was: How to reproduce: {code:sql} spark-sql> CREATE TABLE tbl AS SELECT 1; 18/07/26 22:48:11 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external table:tbl 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > USING parquet > PARTITIONED BY (day, hour); spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; spark-sql> SHOW PARTITIONS tbl1; spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > PARTITIONED BY (day STRING, hour STRING); 18/07/26 22:49:20 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external table:tbl2 spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 18/07/26 22:49:36 WARN log: Updated size to 0 spark-sql> SHOW PARTITIONS tbl2; day=2018-07-25/hour=01 spark-sql> {code} > Datasource partition table should load empty partitions > --- > > Key: SPARK-24937 > URL: https://issues.apache.org/jira/browse/SPARK-24937 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > spark-sql> CREATE TABLE tbl AS SELECT 1; > spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > > USING parquet > > PARTITIONED BY (day, hour); > spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > spark-sql> SHOW PARTITIONS tbl1; > spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > > PARTITIONED BY (day STRING, hour STRING); > 18/07/26 22:49:20 WARN HiveMetaStore: Location: > file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external > table:tbl2 > spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 > 18/07/26 22:49:36 WARN log: Updated size to 0 > spark-sql> SHOW PARTITIONS tbl2; > day=2018-07-25/hour=01 > spark-sql> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24937) Datasource partition table should load empty partitions
Yuming Wang created SPARK-24937: --- Summary: Datasource partition table should load empty partitions Key: SPARK-24937 URL: https://issues.apache.org/jira/browse/SPARK-24937 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Yuming Wang {code:sql} spark-sql> CREATE TABLE tbl AS SELECT 1; 18/07/26 22:48:11 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external table:tbl 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > USING parquet > PARTITIONED BY (day, hour); spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; spark-sql> SHOW PARTITIONS tbl1; spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > PARTITIONED BY (day STRING, hour STRING); 18/07/26 22:49:20 WARN HiveMetaStore: Location: file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external table:tbl2 spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') SELECT * FROM tbl where 1=0; 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 18/07/26 22:49:36 WARN log: Updated size to 0 spark-sql> SHOW PARTITIONS tbl2; day=2018-07-25/hour=01 spark-sql> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
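Until the INSERT path registers empty static partitions for datasource tables, a possible workaround is to add the partition explicitly; a sketch against the repro above:
{code:sql}
ALTER TABLE tbl1 ADD IF NOT EXISTS PARTITION (day = '2018-07-25', hour = '01');
SHOW PARTITIONS tbl1;
{code}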
[jira] [Commented] (SPARK-24937) Datasource partition table should load empty partitions
[ https://issues.apache.org/jira/browse/SPARK-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558386#comment-16558386 ] Yuming Wang commented on SPARK-24937: - I'm working on. > Datasource partition table should load empty partitions > --- > > Key: SPARK-24937 > URL: https://issues.apache.org/jira/browse/SPARK-24937 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > spark-sql> CREATE TABLE tbl AS SELECT 1; > 18/07/26 22:48:11 WARN HiveMetaStore: Location: > file:/Users/yumwang/tmp/spark/spark-warehouse/tbl specified for non-external > table:tbl > 18/07/26 22:48:15 WARN ObjectStore: Failed to get database global_temp, > returning NoSuchObjectException > spark-sql> CREATE TABLE tbl1 (c1 BIGINT, day STRING, hour STRING) > > USING parquet > > PARTITIONED BY (day, hour); > spark-sql> INSERT INTO TABLE tbl1 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > spark-sql> SHOW PARTITIONS tbl1; > spark-sql> CREATE TABLE tbl2 (c1 BIGINT) > > PARTITIONED BY (day STRING, hour STRING); > 18/07/26 22:49:20 WARN HiveMetaStore: Location: > file:/Users/yumwang/tmp/spark/spark-warehouse/tbl2 specified for non-external > table:tbl2 > spark-sql> INSERT INTO TABLE tbl2 PARTITION (day = '2018-07-25', hour='01') > SELECT * FROM tbl where 1=0; > 18/07/26 22:49:36 WARN log: Updating partition stats fast for: tbl2 > 18/07/26 22:49:36 WARN log: Updated size to 0 > spark-sql> SHOW PARTITIONS tbl2; > day=2018-07-25/hour=01 > spark-sql> > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20592) Alter table concatenate is not working as expected.
[ https://issues.apache.org/jira/browse/SPARK-20592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569673#comment-16569673 ] Yuming Wang commented on SPARK-20592: - Spark doesn't support this command: [https://github.com/apache/spark/blob/73dd6cf9b558f9d752e1f3c13584344257ad7863/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L217] > Alter table concatenate is not working as expected. > --- > > Key: SPARK-20592 > URL: https://issues.apache.org/jira/browse/SPARK-20592 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.0, 2.2.1, 2.3.1 >Reporter: Guru Prabhakar Reddy Marthala >Priority: Major > Labels: hive, pyspark > > Created a table using CTAS from CSV to Parquet. The Parquet table generated > numerous small files. Tried ALTER TABLE CONCATENATE, but it's not working as > expected. > spark.sql("CREATE TABLE flight.flight_data(year INT, month INT, day INT, > day_of_week INT, dep_time INT, crs_dep_time INT, arr_time INT, > crs_arr_time INT, unique_carrier STRING, flight_num INT, tail_num > STRING, actual_elapsed_time INT, crs_elapsed_time INT, air_time INT, > arr_delay INT, dep_delay INT, origin STRING, dest STRING, distance > INT, taxi_in INT, taxi_out INT, cancelled INT, cancellation_code > STRING, diverted INT, carrier_delay STRING, weather_delay STRING, > nas_delay STRING, security_delay STRING, late_aircraft_delay STRING) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile") > spark.sql("load data local INPATH 'i:/2008/2008.csv' INTO TABLE > flight.flight_data") > spark.sql("create table flight.flight_data_pq stored as parquet as select * > from flight.flight_data") > spark.sql("create table flight.flight_data_orc stored as orc as select * from > flight.flight_data") > pyspark.sql.utils.ParseException: u'\nOperation not allowed: alter table > concatenate(line 1, pos 0)\n\n== SQL ==\nalter table > flight_data.flight_data_pq concatenate\n^^^\n' > Tried on both ORC and Parquet formats. It's not working. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
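Since the parser rejects the command, the usual alternative is to rewrite the table with fewer output files; a hedged sketch (the target file count is illustrative):
{code:scala}
// compact the many small parquet files by rewriting to a new table
spark.table("flight.flight_data_pq")
  .coalesce(8)
  .write
  .mode("overwrite")
  .saveAsTable("flight.flight_data_pq_compacted")
{code}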
[jira] [Commented] (SPARK-25085) Insert overwrite a non-partitioned table can delete table folder
[ https://issues.apache.org/jira/browse/SPARK-25085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576711#comment-16576711 ] Yuming Wang commented on SPARK-25085: - I'm working on this. > Insert overwrite a non-partitioned table can delete table folder > > > Key: SPARK-25085 > URL: https://issues.apache.org/jira/browse/SPARK-25085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Rui Li >Priority: Major > > When inserting overwrite a data source table, Spark firstly deletes all the > partitions. For non-partitioned table, it will delete the table folder, which > is wrong because table folder may contain information like ACL entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25039) Binary comparison behavior should refer to Teradata
Yuming Wang created SPARK-25039: --- Summary: Binary comparison behavior should refer to Teradata Key: SPARK-25039 URL: https://issues.apache.org/jira/browse/SPARK-25039 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Yuming Wang The main differences are: # When comparing a {{StringType}} value with a {{NumericType}} value, Spark converts the {{StringType}} data to a {{NumericType}} value. But Teradata converts the {{StringType}} data to a {{DoubleType}} value. # When comparing a {{StringType}} value with a {{DateType}} value, Spark converts the {{DateType}} data to a {{StringType}} value. But Teradata converts the {{StringType}} data to a {{DateType}} value. More details: https://github.com/apache/spark/blob/65a4bc143ab5dc2ced589dc107bbafa8a7290931/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L120-L149 https://www.info.teradata.com/HTMLPubs/DB_TTU_16_00/index.html#page/SQL_Reference/B035-1145-160K/lrn1472241011038.html -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
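SPARK-23175 above is a concrete instance of difference 1; illustratively:
{code:sql}
SELECT '0.1' = 0;
-- Spark:    true  (the string is cast to the integral type, losing the fraction)
-- Teradata: false (the string is cast to DOUBLE, and 0.1 <> 0.0)
{code}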
[jira] [Created] (SPARK-25056) Unify the InConversion and BinaryComparison behaviour when InConversion's list only contains one datatype
Yuming Wang created SPARK-25056: --- Summary: Unify the InConversion and BinaryComparison behaviour when InConversion's list only contains one datatype Key: SPARK-25056 URL: https://issues.apache.org/jira/browse/SPARK-25056 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Yuming Wang {code:scala} scala> val df = spark.range(4).toDF().selectExpr("cast(id as decimal(9, 2)) as id") df: org.apache.spark.sql.DataFrame = [id: decimal(9,2)] scala> df.filter("id in('1', '3')").show +---+ | id| +---+ +---+ scala> df.filter("id = '1' or id ='3'").show +----+ |  id| +----+ |1.00| |3.00| +----+ {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
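Until the two paths are unified, casting the literals to the column's type sidesteps {{InConversion}}; a sketch against the example above:
{code:scala}
// both literals already match the column type, so no lossy narrowing occurs
df.filter("id IN (CAST('1' AS DECIMAL(9,2)), CAST('3' AS DECIMAL(9,2)))").show()
// returns the rows 1.00 and 3.00
{code}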
[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579181#comment-16579181 ] Yuming Wang edited comment on SPARK-25051 at 8/14/18 3:13 AM: -- Yes. The bug only exists in branch-2.3. It can be reproduced with: {code} val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name") val df2 = spark.range(3).selectExpr("id") df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).show {code} was (Author: q79969786): Yes. The bug still exists. I can reproduced by: {code:scala} val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name") val df2 = spark.range(3).selectExpr("id") df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).show {code} > where clause on dataset gives AnalysisException > --- > > Key: SPARK-25051 > URL: https://issues.apache.org/jira/browse/SPARK-25051 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0 >Reporter: MIK >Priority: Major > > *schemas :* > df1 > => id ts > df2 > => id name country > *code:* > val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull) > *error*: > org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing > from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in > operator !Filter isnull(id#0). Attribute(s) with the same name appear in the > operation: id. Please check if the right attribute(s) are used.;; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47) > at org.apache.spark.sql.Dataset.(Dataset.scala:172) > at org.apache.spark.sql.Dataset.(Dataset.scala:178) > at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65) > at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300) > at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458) > at org.apache.spark.sql.Dataset.where(Dataset.scala:1486) > This works fine in spark 2.2.2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579181#comment-16579181 ] Yuming Wang commented on SPARK-25051: - Yes. The bug still exists. I can reproduce it with: {code:scala} val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name") val df2 = spark.range(3).selectExpr("id") df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).show {code} > where clause on dataset gives AnalysisException > --- > > Key: SPARK-25051 > URL: https://issues.apache.org/jira/browse/SPARK-25051 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0 >Reporter: MIK >Priority: Major > > *schemas :* > df1 > => id ts > df2 > => id name country > *code:* > val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull) > *error*: > org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing > from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in > operator !Filter isnull(id#0). Attribute(s) with the same name appear in the > operation: id. Please check if the right attribute(s) are used.;; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47) > at org.apache.spark.sql.Dataset.(Dataset.scala:172) > at org.apache.spark.sql.Dataset.(Dataset.scala:178) > at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65) > at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300) > at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458) > at org.apache.spark.sql.Dataset.where(Dataset.scala:1486) > This works fine in spark 2.2.2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577401#comment-16577401 ] Yuming Wang edited comment on SPARK-25051 at 8/12/18 10:51 AM: --- Can you verify it with Spark [2.3.2-rc4 |https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc4-bin/]? was (Author: q79969786): Can you it with Spark [2.3.2-rc4 |https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc4-bin/]? > where clause on dataset gives AnalysisException > --- > > Key: SPARK-25051 > URL: https://issues.apache.org/jira/browse/SPARK-25051 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0 >Reporter: MIK >Priority: Major > > *schemas :* > df1 > => id ts > df2 > => id name country > *code:* > val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull) > *error*: > org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing > from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in > operator !Filter isnull(id#0). Attribute(s) with the same name appear in the > operation: id. Please check if the right attribute(s) are used.;; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47) > at org.apache.spark.sql.Dataset.(Dataset.scala:172) > at org.apache.spark.sql.Dataset.(Dataset.scala:178) > at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65) > at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300) > at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458) > at org.apache.spark.sql.Dataset.where(Dataset.scala:1486) > This works fine in spark 2.2.2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24631) Cannot up cast column from bigint to smallint as it may truncate
[ https://issues.apache.org/jira/browse/SPARK-24631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577502#comment-16577502 ] Yuming Wang edited comment on SPARK-24631 at 8/12/18 11:18 AM: --- I hit this issue and fixed it by recreating the table/view. Can you execute the two SQL statements below: {code:sql} desc testtable; {code} {code:sql} show create table testtable; {code} was (Author: q79969786): I hit this issue and fixed it by recreating the table/view. Can you execute the two SQL statements below: {code:java} desc testtable; {code} {code:java} show create table testtable; {code} > Cannot up cast column from bigint to smallint as it may truncate > > > Key: SPARK-24631 > URL: https://issues.apache.org/jira/browse/SPARK-24631 > Project: Spark > Issue Type: New JIRA Project > Components: Spark Core, Spark Submit >Affects Versions: 2.2.1 >Reporter: Sivakumar >Priority: Major > > Getting the below error when executing the simple select query, > Sample: > Table Description: > name: String, id: BigInt > val df=spark.sql("select name,id from testtable") > ERROR: {color:#ff}Cannot up cast column "id" from bigint to smallint as > it may truncate.{color} > I am not doing any transformation's, I am just trying to query a table ,But > still I am getting the error. > I am getting this error only on production cluster and only for a single > table, other tables are running fine. > + more data, > val df=spark.sql("select* from table_name") > I am just trying this query a table. But with other tables it is running fine. > {color:#d04437}18/06/22 01:36:29 ERROR Driver1: [] [main] Exception occurred: > org.apache.spark.sql.AnalysisException: Cannot up cast `column_name` from > bigint to column_name#2525: smallint as it may truncate.{color} > that specific column is having Bigint datatype, But there were other table's > that ran fine with Bigint columns. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
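A hedged sketch of why those two statements help (the stale-schema explanation is an inference from the error, not confirmed in the ticket): {{desc}} shows the schema the analyzer currently resolves, while {{show create table}} shows the definition recorded at creation time. If a view or table definition still records {{smallint}} for a column that is now {{bigint}} underneath, the analyzer refuses the narrowing up-cast, and recreating the table/view refreshes the stored schema.
{code:scala}
// Compare the resolved schema with the recorded definition; a type that
// appears only in the recorded definition points at a stale table/view.
spark.sql("desc testtable").show(false)
spark.sql("show create table testtable").show(false)
{code}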
[jira] [Commented] (SPARK-24631) Cannot up cast column from bigint to smallint as it may truncate
[ https://issues.apache.org/jira/browse/SPARK-24631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577502#comment-16577502 ] Yuming Wang commented on SPARK-24631: - I hit this issue and fixed it by recreating the table/view. Can you execute the two SQL statements below: {code:java} desc testtable; {code} {code:java} show create table testtable; {code} > Cannot up cast column from bigint to smallint as it may truncate > > > Key: SPARK-24631 > URL: https://issues.apache.org/jira/browse/SPARK-24631 > Project: Spark > Issue Type: New JIRA Project > Components: Spark Core, Spark Submit >Affects Versions: 2.2.1 >Reporter: Sivakumar >Priority: Major > > Getting the below error when executing the simple select query, > Sample: > Table Description: > name: String, id: BigInt > val df=spark.sql("select name,id from testtable") > ERROR: {color:#ff}Cannot up cast column "id" from bigint to smallint as > it may truncate.{color} > I am not doing any transformation's, I am just trying to query a table ,But > still I am getting the error. > I am getting this error only on production cluster and only for a single > table, other tables are running fine. > + more data, > val df=spark.sql("select* from table_name") > I am just trying this query a table. But with other tables it is running fine. > {color:#d04437}18/06/22 01:36:29 ERROR Driver1: [] [main] Exception occurred: > org.apache.spark.sql.AnalysisException: Cannot up cast `column_name` from > bigint to column_name#2525: smallint as it may truncate.{color} > that specific column is having Bigint datatype, But there were other table's > that ran fine with Bigint columns. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25135) insert datasource table may all null when select from view
[ https://issues.apache.org/jira/browse/SPARK-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25135: Affects Version/s: (was: 2.4.0) 2.3.0 2.3.1 > insert datasource table may all null when select from view > -- > > Key: SPARK-25135 > URL: https://issues.apache.org/jira/browse/SPARK-25135 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:scala} > val path = "/tmp/spark/parquet" > val cnt = 30 > spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as bigint) > as col2").write.mode("overwrite").parquet(path) > spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet > location '$path'") > spark.sql("create view view1 as select col1, col2 from table1 where col1 > > -20") > spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using parquet") > spark.sql("insert overwrite table table2 select COL1, COL2 from view1") > spark.table("table2").show > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25135) insert datasource table may all null when select from view
Yuming Wang created SPARK-25135: --- Summary: insert datasource table may all null when select from view Key: SPARK-25135 URL: https://issues.apache.org/jira/browse/SPARK-25135 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Yuming Wang How to reproduce: {code:scala} val path = "/tmp/spark/parquet" val cnt = 30 spark.range(cnt).selectExpr("cast(id as bigint) as col1", "cast(id as bigint) as col2").write.mode("overwrite").parquet(path) spark.sql(s"CREATE TABLE table1(col1 bigint, col2 bigint) using parquet location '$path'") spark.sql("create view view1 as select col1, col2 from table1 where col1 > -20") spark.sql("create table table2 (COL1 BIGINT, COL2 BIGINT) using parquet") spark.sql("insert overwrite table table2 select COL1, COL2 from view1") spark.table("table2").show {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
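A hedged guess at a workaround for the reproduction above (not verified in this ticket): keep the letter case of the projected columns identical to the view's schema, since the all-NULL result appears tied to case-mismatched column resolution against the Parquet files.
{code:scala}
// Same insert as above but projecting col1/col2 in the view's own case;
// under the case-mismatch hypothesis, table2 should now show real values.
spark.sql("insert overwrite table table2 select col1, col2 from view1")
spark.table("table2").show()
{code}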
[jira] [Commented] (SPARK-25132) Spark returns NULL for a column whose Hive metastore schema and Parquet schema are in different letter cases
[ https://issues.apache.org/jira/browse/SPARK-25132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583688#comment-16583688 ] Yuming Wang commented on SPARK-25132: - This question has also been asked on Stack Overflow: [Spark SQL returns null for a column in HIVE table while HIVE query returns non null values|https://stackoverflow.com/questions/50298909/spark-sql-returns-null-for-a-column-in-hive-table-while-hive-query-returns-non-n]. > Spark returns NULL for a column whose Hive metastore schema and Parquet > schema are in different letter cases > > > Key: SPARK-25132 > URL: https://issues.apache.org/jira/browse/SPARK-25132 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Chenxiao Mao >Priority: Major > > Spark SQL returns NULL for a column whose Hive metastore schema and Parquet > schema are in different letter cases, regardless of spark.sql.caseSensitive > set to true or false. > Here is a simple example to reproduce this issue: > scala> spark.range(5).toDF.write.mode("overwrite").saveAsTable("t1") > spark-sql> show create table t1; > CREATE TABLE `t1` (`id` BIGINT) > USING parquet > OPTIONS ( > `serialization.format` '1' > ) > spark-sql> CREATE TABLE `t2` (`ID` BIGINT) > > USING parquet > > LOCATION 'hdfs://localhost/user/hive/warehouse/t1'; > spark-sql> select * from t1; > 0 > 1 > 2 > 3 > 4 > spark-sql> select * from t2; > NULL > NULL > NULL > NULL > NULL > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
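For convenience, the same mismatch as a single spark-shell sketch (the warehouse path is copied from the quoted report and may differ per deployment):
{code:scala}
// t1 is written with a lowercase `id` column; t2 declares uppercase `ID`
// over the same Parquet files, and every read of t2 returns NULL,
// whatever spark.sql.caseSensitive is set to.
spark.range(5).toDF().write.mode("overwrite").saveAsTable("t1")
spark.sql("CREATE TABLE t2 (ID BIGINT) USING parquet LOCATION 'hdfs://localhost/user/hive/warehouse/t1'")
spark.table("t1").show()  // 0..4
spark.table("t2").show()  // all NULL, per the report
{code}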
[jira] [Issue Comment Deleted] (SPARK-25085) Insert overwrite a non-partitioned table can delete table folder
[ https://issues.apache.org/jira/browse/SPARK-25085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25085: Comment: was deleted (was: I'm working on this.) > Insert overwrite a non-partitioned table can delete table folder > > > Key: SPARK-25085 > URL: https://issues.apache.org/jira/browse/SPARK-25085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Rui Li >Priority: Major > > When inserting overwrite a data source table, Spark firstly deletes all the > partitions. For non-partitioned table, it will delete the table folder, which > is wrong because table folder may contain information like ACL entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25085) Insert overwrite a non-partitioned table can delete table folder
[ https://issues.apache.org/jira/browse/SPARK-25085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577221#comment-16577221 ] Yuming Wang commented on SPARK-25085: - [~lirui] Another issue that may interest you: [SPARK-24937|https://issues.apache.org/jira/browse/SPARK-24937]. > Insert overwrite a non-partitioned table can delete table folder > > > Key: SPARK-25085 > URL: https://issues.apache.org/jira/browse/SPARK-25085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Rui Li >Priority: Major > > When inserting overwrite a data source table, Spark firstly deletes all the > partitions. For non-partitioned table, it will delete the table folder, which > is wrong because table folder may contain information like ACL entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575771#comment-16575771 ] Yuming Wang commented on SPARK-25084: - [~smilegator], [~jerryshao] I think it should target 2.3.2. > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575777#comment-16575777 ] Yuming Wang commented on SPARK-25084: - It's a regression. > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577385#comment-16577385 ] Yuming Wang commented on SPARK-25071: - I think it's correct. CBO is based on row count. {code:scala} def getOutputSize( attributes: Seq[Attribute], outputRowCount: BigInt, attrStats: AttributeMap[ColumnStat] = AttributeMap(Nil)): BigInt = { // Output size can't be zero, or sizeInBytes of BinaryNode will also be zero // (simple computation of statistics returns product of children). if (outputRowCount > 0) outputRowCount * getSizePerRow(attributes, attrStats) else 1 } {code} {code:scala} def getSizePerRow( attributes: Seq[Attribute], attrStats: AttributeMap[ColumnStat] = AttributeMap(Nil)): BigInt = { // We assign a generic overhead for a Row object, the actual overhead is different for different // Row format. 8 + attributes.map { attr => if (attrStats.get(attr).map(_.avgLen.isDefined).getOrElse(false)) { attr.dataType match { case StringType => // UTF8String: base + offset + numBytes attrStats(attr).avgLen.get + 8 + 4 case _ => attrStats(attr).avgLen.get } } else { attr.dataType.defaultSize } }.sum } {code} So for Scenario 2: right.stats.sizeInBytes=32 left.stats.sizeInBytes=32 {code:scala} private def broadcastSide( canBuildLeft: Boolean, canBuildRight: Boolean, left: LogicalPlan, right: LogicalPlan): BuildSide = { def smallerSide = if (right.stats.sizeInBytes <= left.stats.sizeInBytes) BuildRight else BuildLeft if (canBuildRight && canBuildLeft) { // Broadcast smaller side base on its estimated physical size // if both sides have broadcast hint smallerSide } else if (canBuildRight) { BuildRight } else if (canBuildLeft) { BuildLeft } else { // for the last default broadcast nested loop join smallerSide } } {code} you can verify it by: {code:scala} spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600','totalSize'='80')") spark.sql("CREATE TABLE big4 (c1 string) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')") val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide println(buildSide) or spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600','totalSize'='80')") spark.sql("CREATE TABLE big4 (c1 bigint, c2 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')") val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide println(buildSide) {code} > BuildSide is coming not as expected with join queries > - > > Key: SPARK-25071 > URL: https://issues.apache.org/jira/browse/SPARK-25071 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: Spark 2.3.1 > Hadoop 2.7.3 >Reporter: Ayush Anubhava >Priority: Major > > *BuildSide is not coming as expected.* > Pre-requisites: > *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.* > *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec* > *Steps:* > *Scenario 1:* > spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='800')") > spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > 
t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > > *Result 1:* > scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#0L) > : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = >
[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577401#comment-16577401 ] Yuming Wang commented on SPARK-25051: - Can you it with Spark [2.3.2-rc4 |https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc4-bin/]? > where clause on dataset gives AnalysisException > --- > > Key: SPARK-25051 > URL: https://issues.apache.org/jira/browse/SPARK-25051 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0 >Reporter: MIK >Priority: Major > > *schemas :* > df1 > => id ts > df2 > => id name country > *code:* > val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull) > *error*: > org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing > from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in > operator !Filter isnull(id#0). Attribute(s) with the same name appear in the > operation: id. Please check if the right attribute(s) are used.;; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47) > at org.apache.spark.sql.Dataset.(Dataset.scala:172) > at org.apache.spark.sql.Dataset.(Dataset.scala:178) > at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65) > at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300) > at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458) > at org.apache.spark.sql.Dataset.where(Dataset.scala:1486) > This works fine in spark 2.2.2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Description: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. !MySQL.png! was: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. !Teradata.jpeg! > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Oracle.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases return {{HAßLER}}. > !MySQL.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Description: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. !Teradata.jpeg! was: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Oracle.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases return {{HAßLER}}. > !Teradata.jpeg! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Attachment: MySQL.png > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases returns {{HAßLER}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25230) Upper behaves incorrect for string contains "ß"
Yuming Wang created SPARK-25230: --- Summary: Upper behaves incorrect for string contains "ß" Key: SPARK-25230 URL: https://issues.apache.org/jira/browse/SPARK-25230 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.1 Reporter: Yuming Wang How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
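As background on the HASSLER output, a hedged spark-shell sketch: Spark upper-cases strings through Java, whose Unicode full case mapping expands 'ß' to "SS", while the databases shown in the attachments evidently apply a simple one-to-one mapping that leaves 'ß' untouched.
{code:scala}
import java.util.Locale
// Java's full case mapping is what produces HASSLER in the report above.
assert("Haßler".toUpperCase(Locale.ROOT) == "HASSLER")
{code}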
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Attachment: WechatIMG511.jpeg > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases returns {{HAßLER}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Attachment: (was: WechatIMG511.jpeg) > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases returns {{HAßLER}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Attachment: Teradata.jpeg > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases returns {{HAßLER}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Description: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases returns {{HAßLER}}. !MySQL.png! This was: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases returns {{HAßLER}}. !MySQL.png! > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Oracle.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases returns {{HAßLER}}. > !MySQL.png! > > This -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Description: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. !MySQL.png! This behavior may lead to data inconsistency: {code:sql} create temporary view SPARK_25230 as select * from values ("Hassler"), ("Haßler") as EMPLOYEE(name); select UPPER(name) from SPARK_25230 group by 1; {code} was: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. !MySQL.png! This behave {code:sql} create temporary view SPARK_25230 as select * from values ("Hassler"), ("Haßler") as EMPLOYEE(name); select UPPER(name) from SPARK_25230 group by 1; {code} > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Oracle.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases return {{HAßLER}}. > !MySQL.png! > > This behavior may lead to data inconsistency: > {code:sql} > create temporary view SPARK_25230 as select * from values > ("Hassler"), > ("Haßler") > as EMPLOYEE(name); > select UPPER(name) from SPARK_25230 group by 1; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behavior incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Summary: Upper behavior incorrect for string contains "ß" (was: Upper behaves incorrect for string contains "ß") > Upper behavior incorrect for string contains "ß" > > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Oracle.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases returns {{HAßLER}}. > !MySQL.png! > > This behavior may lead to data inconsistency: > {code:sql} > create temporary view SPARK_25230 as select * from values > ("Hassler"), > ("Haßler") > as EMPLOYEE(name); > select UPPER(name) from SPARK_25230 group by 1; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behavior incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Description: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. !MySQL.png! This behavior may lead to data inconsistency: {code:sql} create temporary view SPARK_25230 as select * from values ("Hassler"), ("Haßler") as EMPLOYEE(name); select UPPER(name) from SPARK_25230 group by 1; -- result HASSLER{code} was: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases return {{HAßLER}}. !MySQL.png! This behavior may lead to data inconsistency: {code:sql} create temporary view SPARK_25230 as select * from values ("Hassler"), ("Haßler") as EMPLOYEE(name); select UPPER(name) from SPARK_25230 group by 1; {code} > Upper behavior incorrect for string contains "ß" > > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Oracle.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases return {{HAßLER}}. > !MySQL.png! > > This behavior may lead to data inconsistency: > {code:sql} > create temporary view SPARK_25230 as select * from values > ("Hassler"), > ("Haßler") > as EMPLOYEE(name); > select UPPER(name) from SPARK_25230 group by 1; > -- result > HASSLER{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25230) Upper behaves incorrect for string contains "ß"
[ https://issues.apache.org/jira/browse/SPARK-25230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25230: Description: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases returns {{HAßLER}}. !MySQL.png! This behave was: How to reproduce: {code:sql} spark-sql> SELECT upper('Haßler'); HASSLER {code} Mainstream databases returns {{HAßLER}}. !MySQL.png! This > Upper behaves incorrect for string contains "ß" > --- > > Key: SPARK-25230 > URL: https://issues.apache.org/jira/browse/SPARK-25230 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: Yuming Wang >Priority: Major > Attachments: MySQL.png, Oracle.png, Teradata.jpeg > > > How to reproduce: > {code:sql} > spark-sql> SELECT upper('Haßler'); > HASSLER > {code} > Mainstream databases returns {{HAßLER}}. > !MySQL.png! > > This behave -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org