[jira] [Created] (SPARK-12637) Print stage info of finished stages properly
Navis created SPARK-12637: - Summary: Print stage info of finished stages properly Key: SPARK-12637 URL: https://issues.apache.org/jira/browse/SPARK-12637 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Navis Priority: Trivial Currently it prints the hashcode of the stage info, which is not that useful. {noformat} INFO scheduler.StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@2eb47d79 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
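The `StageInfo@2eb47d79` in the log line above is `java.lang.Object`'s default `toString()` (class name plus hashcode). A minimal, hypothetical sketch of the problem and the fix - format the interesting fields instead of relying on the default - using a stand-in class, not Spark's actual `StageInfo`:

```java
// Stand-in for org.apache.spark.scheduler.StageInfo; it defines no toString(),
// so logging it falls back to ClassName@hashcode, as in the report.
class StageInfo {
    final int stageId;
    final String name;
    StageInfo(int stageId, String name) { this.stageId = stageId; this.name = name; }
}

class StageLogDemo {
    public static void main(String[] args) {
        StageInfo info = new StageInfo(3, "count at Example.scala:10");
        // Default toString(): prints something like "StageInfo@2eb47d79" - not useful.
        System.out.println("Finished stage: " + info);
        // Formatting the relevant fields instead yields a readable line:
        // "Finished stage: (3, count at Example.scala:10)"
        System.out.println("Finished stage: (" + info.stageId + ", " + info.name + ")");
    }
}
```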
[jira] [Created] (SPARK-12619) Combine small files in a hadoop directory into single split
Navis created SPARK-12619: - Summary: Combine small files in a hadoop directory into single split Key: SPARK-12619 URL: https://issues.apache.org/jira/browse/SPARK-12619 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Navis Priority: Trivial When a directory contains too many (small) files, the whole Spark cluster is exhausted scheduling the tasks created for each file. A custom input format can handle that, but if you're using the Hive metastore, that is hardly an option.
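Hadoop's `CombineFileInputFormat` addresses this by packing many small files into one split, so the scheduler sees a handful of combined splits rather than one task per file. The core idea can be sketched as a hypothetical pure-Java routine (the names and the greedy strategy here are illustrative, not Hadoop's actual implementation, which also considers node/rack locality):

```java
import java.util.ArrayList;
import java.util.List;

class CombineSplits {
    // Greedily pack file sizes into groups, flushing a group once adding the
    // next file would exceed maxSplitBytes. Each inner list is one "split".
    static List<List<Long>> combine(long[] fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) splits.add(current);
        return splits;
    }

    public static void main(String[] args) {
        // Six 40-byte files with a 128-byte cap: 2 combined splits instead of 6 tasks.
        long[] sizes = {40, 40, 40, 40, 40, 40};
        System.out.println(combine(sizes, 128).size()); // prints 2
    }
}
```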
[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010244#comment-15010244 ] Navis commented on SPARK-9686: -- @Cheng Lian Sorry, I've confused "remote metastore" with "remote database". I'm using a local metastore, without the hive.metastore.uris setting. bq. We should override corresponding methods in SparkSQLCLIService and dispatch these JDBC calls to the metastore Hive client. The attached patch is exactly for that, with configuration replacement for asserting a valid metastore configuration (without this, Hive.get() destroys the connection and makes a new one with a dummy Derby). > Spark Thrift server doesn't return correct JDBC metadata > - > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 >Reporter: pin_zhang >Assignee: Cheng Lian > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3
[jira] [Comment Edited] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010244#comment-15010244 ] Navis edited comment on SPARK-9686 at 11/18/15 5:06 AM: [~lian cheng] Sorry, I've confused "remote metastore" with "remote database". I'm using a local metastore, without the hive.metastore.uris setting. And, bq. We should override corresponding methods in SparkSQLCLIService and dispatch these JDBC calls to the metastore Hive client. The attached patch is exactly for that, with configuration replacement for asserting a valid metastore configuration (without this, Hive.get() destroys the connection and makes a new one with a dummy Derby). was (Author: navis): @Cheng Lian Sorry, I've confused "remote metastore" with "remote database". I'm using a local metastore, without the hive.metastore.uris setting. bq. We should override corresponding methods in SparkSQLCLIService and dispatch these JDBC calls to the metastore Hive client. The attached patch is exactly for that, with configuration replacement for asserting a valid metastore configuration (without this, Hive.get() destroys the connection and makes a new one with a dummy Derby). > Spark Thrift server doesn't return correct JDBC metadata > - > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 >Reporter: pin_zhang >Assignee: Cheng Lian > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. 
> Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3
[jira] [Commented] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006352#comment-15006352 ] Navis commented on SPARK-9686: -- [~lian cheng] It's configured with a remote database (MariaDB, in my case), of course. But those values are overwritten with Derby values when SparkSQLEnv is initialized. I've just overwritten them again with the values in metadataHive before running JDBC commands. I cannot imagine why it's so badly twisted around the Spark thriftserver. > Spark hive jdbc client cannot get table from metadata store > --- > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1 >Reporter: pin_zhang >Assignee: Cheng Lian > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3
[jira] [Created] (SPARK-11614) serde parameters should be set only when all params are ready
Navis created SPARK-11614: - Summary: serde parameters should be set only when all params are ready Key: SPARK-11614 URL: https://issues.apache.org/jira/browse/SPARK-11614 Project: Spark Issue Type: Bug Components: SQL Reporter: Navis Priority: Minor See HIVE-7975 and HIVE-12373. With the changed semantics of setters in Hive's thrift objects, a setter should be called only after all parameters are set. It's not a problem in the current state, but it will become one some day.
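The hazard can be shown with a stand-in for such a thrift object (hypothetical `SerDeInfo` class; the defensive copy in the setter is an assumption standing in for whatever normalization the changed Hive setters perform): once the setter snapshots its argument, mutations made to the map afterwards are silently lost, so the map should be fully populated before the single setter call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hive thrift object whose setter copies (or
// otherwise normalizes) its argument - not Hive's actual class.
class SerDeInfo {
    private Map<String, String> parameters;
    void setParameters(Map<String, String> params) {
        this.parameters = new HashMap<>(params); // snapshot, as changed setters may take
    }
    Map<String, String> getParameters() { return parameters; }
}

class SerdeParams {
    public static void main(String[] args) {
        // Unsafe pattern: call the setter first, then keep filling the map.
        Map<String, String> params = new HashMap<>();
        SerDeInfo bad = new SerDeInfo();
        bad.setParameters(params);
        params.put("field.delim", ",");           // lost: the setter already copied
        System.out.println(bad.getParameters());  // prints {}

        // Safe pattern: populate everything, then call the setter once.
        Map<String, String> ready = new HashMap<>();
        ready.put("field.delim", ",");
        SerDeInfo good = new SerDeInfo();
        good.setParameters(ready);
        System.out.println(good.getParameters()); // prints {field.delim=,}
    }
}
```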
[jira] [Updated] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-9686: - Attachment: SPARK-9686.1.patch.txt > Spark hive jdbc client cannot get table from metadata store > --- > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1 >Reporter: pin_zhang >Assignee: Cheng Lian > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3
[jira] [Commented] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992767#comment-14992767 ] Navis commented on SPARK-9686: -- [~pin_zhang] I met the same problem, and the attached patch is what I'm using (rebased on the master branch). It's not implemented as cleanly as Databricks would want for inclusion, but it works for me. > Spark hive jdbc client cannot get table from metadata store > --- > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1 >Reporter: pin_zhang >Assignee: Cheng Lian > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3
[jira] [Updated] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store
[ https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-9686: - Affects Version/s: 1.5.0 1.5.1 > Spark hive jdbc client cannot get table from metadata store > --- > > Key: SPARK-9686 > URL: https://issues.apache.org/jira/browse/SPARK-9686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1 >Reporter: pin_zhang >Assignee: Cheng Lian > Attachments: SPARK-9686.1.patch.txt > > > 1. Start start-thriftserver.sh > 2. connect with beeline > 3. create table > 4.show tables, the new created table returned > 5. > Class.forName("org.apache.hive.jdbc.HiveDriver"); > String URL = "jdbc:hive2://localhost:1/default"; >Properties info = new Properties(); > Connection conn = DriverManager.getConnection(URL, info); > ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), >null, null, null); > Problem: >No tables with returned this API, that work in spark1.3
[jira] [Created] (SPARK-11546) Thrift server makes too many logs about result schema
Navis created SPARK-11546: - Summary: Thrift server makes too many logs about result schema Key: SPARK-11546 URL: https://issues.apache.org/jira/browse/SPARK-11546 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Trivial SparkExecuteStatementOperation logs the result schema for each getNextRowSet() call, which by default happens every 1000 rows, overwhelming the whole log file. For example (this is just 3 rows): 15/11/06 11:41:00 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(city_id#297, gu_id#298, dong_id#299, branch_id#300, company_id#301, team_id#302, part_id#303, enb_id#304, cell_id#305, du_grp_no#306, ru_id#307, pci#308, freq_typ_cd#309, city_name#310, gu_name#311, dong_name#312, branch_name#313, company_name#314, team_name#315, part_name#316, enb_name#317, du_grp_name#318, ru_name#319, tot_et#320, tot_calculated_et#321, tot_cei_lv#322, tot_cei_value#323, tot_user_cnt#324, hdv_et#325, hdv_cei_lv#326, hdv_cei_value#327, hdv_user_cnt#328, hdv_qoe1_value#329, hdv_qoe1_kpi1#330, hdv_qoe1_kpi2#331, hdv_qoe1_kpi3#332, hdv_qoe1_qos1#333, hdv_attempt_cnt#334, hdv_success_cnt#335, hdv_complete_cnt#336, hdv_drop_cnt#337, hdv_loss_cnt#338, hdv_jitter_cnt#339, hdv_calculated_et#340, hdv_bad_et#341, hdv_barring_rate#342, hdv_barring_time#343, hdv_npr#344, lte_et#345, lte_calculated_et#346, lte_cei_lv#347, lte_cei_value#348, lte_user_cnt#349, lte_qoe1_calculated_et#350, lte_qoe1#351, lte_qoe1_kpi1#352, lte_qoe1_kpi2#353, lte_qoe1_kpi3#354, lte_qoe1_kpi4#355, lte_qoe1_kpi5#356, lte_qoe1_qos1#357, lte_attempt_cnt#358, lte_success_cnt#359, lte_data_attempt_cnt#360, lte_data_success_cnt#361, lte_ims_attempt_cnt#362, lte_ims_success_cnt#363, lte_drop_cnt#364, lte_dns_attempt_cnt#365, lte_dns_success_cnt#366, lte_bad_et#367, lte_barring_rate#368, lte_barring_time#369, lte_npr#370, total_play_time#371, total_buffering_time#372, wcdr_et#373, wcdr_cei_lv#374, wcdr_cei_value#375, wcdr_user_cnt#376, wcdr_qoe1_value#377, wcdr_qoe1_kpi1#378, 
wcdr_qoe1_kpi2#379, wcdr_qoe1_qos1#380, wcdr_attempt_cnt#381, wcdr_success_cnt#382, wcdr_complete_cnt#383, wcdr_drop_cnt#384, wcdr_reattempt_cnt#385, wcdr_calculated_et#386, wcdr_bad_et#387, wcdr_outgoing_cnt#388, wcdr_incoming_cnt#389, wcdr_npr_cnt#390, sgsn_mobility_cnt#391, cdate#392, total_valid_user_cnt#393, hdv_valid_user_cnt#394, lte_valid_user_cnt#395, wcdr_valid_user_cnt#396, lfas_cnt#397, lte_qoe3_qos5#398, page_loading_cnt#399, page_loading_time#400, rrc_attc_cnt#401, rrc_sussc_cnt#402, rre_attc_cnt#403, rre_sussc_cnt#404, erab_attc_cnt#405, erab_sussc_cnt#406, erab_add_attc_cnt#407, erab_add_sussc_cnt#408, cf_nfalt_num#409, hdov_x2_in_attc_cnt#410, hdov_x2_in_sussc_cnt#411, hdov_x2_out_attc_cnt#412, hdov_x2_out_sussc_cnt#413, pdcpsdulossrateultot#414, pdcpsdulossrateulcnt#415, totprbdltot#416, totprbdlcnt#417, totprbultot#418, totprbulcnt#419, rssipath0_tot#420, rssipath0_cnt#421, rssipath0_tot_night#422, rssipath0_cnt_night#423, rssipath1_tot#424, rssipath1_cnt#425, rssipath1_tot_night#426, rssipath1_cnt_night#427, loslofdiff#428, gtpsnenbpeakloss#429, gtpsnenbdlcnt#430, dt#296) 15/11/06 11:41:00 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(city_id#297, gu_id#298, dong_id#299, branch_id#300, company_id#301, team_id#302, part_id#303, enb_id#304, cell_id#305, du_grp_no#306, ru_id#307, pci#308, freq_typ_cd#309, city_name#310, gu_name#311, dong_name#312, branch_name#313, company_name#314, team_name#315, part_name#316, enb_name#317, du_grp_name#318, ru_name#319, tot_et#320, tot_calculated_et#321, tot_cei_lv#322, tot_cei_value#323, tot_user_cnt#324, hdv_et#325, hdv_cei_lv#326, hdv_cei_value#327, hdv_user_cnt#328, hdv_qoe1_value#329, hdv_qoe1_kpi1#330, hdv_qoe1_kpi2#331, hdv_qoe1_kpi3#332, hdv_qoe1_qos1#333, hdv_attempt_cnt#334, hdv_success_cnt#335, hdv_complete_cnt#336, hdv_drop_cnt#337, hdv_loss_cnt#338, hdv_jitter_cnt#339, hdv_calculated_et#340, hdv_bad_et#341, hdv_barring_rate#342, hdv_barring_time#343, hdv_npr#344, lte_et#345, 
lte_calculated_et#346, lte_cei_lv#347, lte_cei_value#348, lte_user_cnt#349, lte_qoe1_calculated_et#350, lte_qoe1#351, lte_qoe1_kpi1#352, lte_qoe1_kpi2#353, lte_qoe1_kpi3#354, lte_qoe1_kpi4#355, lte_qoe1_kpi5#356, lte_qoe1_qos1#357, lte_attempt_cnt#358, lte_success_cnt#359, lte_data_attempt_cnt#360, lte_data_success_cnt#361, lte_ims_attempt_cnt#362, lte_ims_success_cnt#363, lte_drop_cnt#364, lte_dns_attempt_cnt#365, lte_dns_success_cnt#366, lte_bad_et#367, lte_barring_rate#368, lte_barring_time#369, lte_npr#370, total_play_time#371, total_buffering_time#372, wcdr_et#373, wcdr_cei_lv#374, wcdr_cei_value#375, wcdr_user_cnt#376, wcdr_qoe1_value#377, wcdr_qoe1_kpi1#378, wcdr_qoe1_kpi2#379, wcdr_qoe1_qos1#380, wcdr_attempt_cnt#381, wcdr_success_cnt#382, wcdr_complete_cnt#383, wcdr_drop_cnt#384,
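The flood above comes from re-logging the same schema on every fetch. One minimal fix, sketched hypothetically (these names are illustrative, not Spark's actual code), is to guard the log statement so it fires only on the first getNextRowSet() call of an operation:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class LogOnce {
    final AtomicBoolean schemaLogged = new AtomicBoolean(false);
    int logLines = 0; // counts emitted schema log lines, for demonstration

    void getNextRowSet(String resultSchema) {
        // compareAndSet succeeds exactly once, so the schema is logged once
        // per operation instead of once per 1000-row batch.
        if (schemaLogged.compareAndSet(false, true)) {
            System.out.println("Result Schema: " + resultSchema);
            logLines++;
        }
        // ... fetch the next batch of rows without re-logging the schema ...
    }

    public static void main(String[] args) {
        LogOnce op = new LogOnce();
        for (int i = 0; i < 3; i++) op.getNextRowSet("ArrayBuffer(city_id#297, ...)");
        System.out.println(op.logLines); // prints 1: one log line for three fetches
    }
}
```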
[jira] [Created] (SPARK-11124) JsonParser/Generator should be closed for resource reycle
Navis created SPARK-11124: - Summary: JsonParser/Generator should be closed for resource reycle Key: SPARK-11124 URL: https://issues.apache.org/jira/browse/SPARK-11124 Project: Spark Issue Type: Bug Reporter: Navis Priority: Trivial Some JSON parsers are not closed - the parser in JacksonParser#parseJson, for example.
[jira] [Updated] (SPARK-11124) JsonParser/Generator should be closed for resource recycle
[ https://issues.apache.org/jira/browse/SPARK-11124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-11124: -- Summary: JsonParser/Generator should be closed for resource recycle (was: JsonParser/Generator should be closed for resource reycle) > JsonParser/Generator should be closed for resource recycle > -- > > Key: SPARK-11124 > URL: https://issues.apache.org/jira/browse/SPARK-11124 > Project: Spark > Issue Type: Bug >Reporter: Navis >Priority: Trivial > > Some json parsers are not closed. parser in JacksonParser#parseJson, for > example. 
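Jackson's `JsonParser` and `JsonGenerator` implement `java.io.Closeable`, so the fix amounts to closing them deterministically - in Java, try-with-resources is the idiomatic shape. A self-contained sketch with a stand-in resource (so the example needs no Jackson dependency; `FakeParser` is hypothetical):

```java
class CloseParser {
    // Stand-in for a Closeable parser such as Jackson's JsonParser.
    static class FakeParser implements AutoCloseable {
        boolean closed = false;
        String nextToken() { return "{"; }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        FakeParser parser = new FakeParser();
        // try-with-resources guarantees close() runs even if parsing throws.
        try (FakeParser p = parser) {
            p.nextToken();
        }
        System.out.println(parser.closed); // prints true
    }
}
```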
[jira] [Commented] (SPARK-11067) Spark SQL thrift server fails to handle decimal value
[ https://issues.apache.org/jira/browse/SPARK-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956121#comment-14956121 ] Navis commented on SPARK-11067: --- [~alexliu68] Seeing that "RowBasedSet" is in the stack trace, this is an older version of the Hive JDBC driver. Anyway, with the attached patch, decimals are serialized to strings on the server via {code} HiveDecimal.create(from.getDecimal(ordinal)).bigDecimalValue().toPlainString() {code} and deserialized to BigDecimal on the client via {code} new BigDecimal(string) {code} First, this is a heavy calculation and could affect performance. Second, it's not exact to use toPlainString(), which removes trailing zeros; toString() should be used instead. > Spark SQL thrift server fails to handle decimal value > - > > Key: SPARK-11067 > URL: https://issues.apache.org/jira/browse/SPARK-11067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1 >Reporter: Alex Liu > Attachments: SPARK-11067.1.patch.txt > > > When executing the following query through beeline connecting to Spark sql > thrift server, it errors out for decimal column > {code} > Select decimal_column from table > WARN 2015-10-09 15:04:00 > org.apache.hive.service.cli.thrift.ThriftCLIService: Error fetching results: > java.lang.ClassCastException: java.math.BigDecimal cannot be cast to > org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:174) > ~[hive-service-0.13.1a.jar:0.13.1a] > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) > ~[hive-service-0.13.1a.jar:0.13.1a] > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(Shim13.scala:144) > ~[spark-hive-thriftserver_2.10-1.4.1.1.jar:1.4.1.1] > at > 
org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:192) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:471) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:405) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:530) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > [hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > [hive-service-0.13.1a.jar:0.13.1a] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > [libthrift-0.9.2.jar:0.9.2] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > [libthrift-0.9.2.jar:0.9.2] > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55) > [hive-service-0.13.1a.jar:4.8.1-SNAPSHOT] > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > [libthrift-0.9.2.jar:0.9.2] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_55] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_55] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55] > {code}
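The round-trip concern in the comment can be seen with plain `java.math.BigDecimal`, no Hive classes needed. This sketch shows one way `toPlainString()` is lossy (scale, in this case; `toString()` round-trips exactly because `BigDecimal.equals` compares scale as well as value):

```java
import java.math.BigDecimal;

class DecimalRoundTrip {
    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("1E+3"); // unscaled value 1, scale -3
        System.out.println(d.toString());      // prints 1E+3
        System.out.println(d.toPlainString()); // prints 1000

        // Round-tripping through toString() reproduces the exact value...
        System.out.println(new BigDecimal(d.toString()).equals(d));      // true
        // ...while toPlainString() yields a value with a different scale.
        System.out.println(new BigDecimal(d.toPlainString()).equals(d)); // false
    }
}
```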
[jira] [Created] (SPARK-11062) Thrift server does not support operationLog
Navis created SPARK-11062: - Summary: Thrift server does not support operationLog Key: SPARK-11062 URL: https://issues.apache.org/jira/browse/SPARK-11062 Project: Spark Issue Type: Bug Components: SQL Reporter: Navis Priority: Trivial Currently, SparkExecuteStatementOperation skips the beforeRun/afterRun methods.
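In Hive's operation lifecycle, beforeRun() registers the per-operation log before execution and afterRun() unregisters it afterwards; skipping them means no operation log is ever attached. A hypothetical sketch of the shape of the fix (names illustrative, not Spark's or Hive's actual code): run through the full lifecycle instead of calling the inner execution method directly.

```java
class Operation {
    final StringBuilder operationLog = new StringBuilder();

    void beforeRun() { operationLog.append("log registered; "); }
    void runInternal() { operationLog.append("statement executed; "); }
    void afterRun() { operationLog.append("log unregistered"); }

    // The fix amounts to this wrapper: beforeRun/afterRun always bracket
    // runInternal(), even if execution throws.
    void run() {
        beforeRun();
        try { runInternal(); } finally { afterRun(); }
    }

    public static void main(String[] args) {
        Operation op = new Operation();
        op.run();
        System.out.println(op.operationLog); // all three lifecycle steps ran
    }
}
```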
[jira] [Updated] (SPARK-11067) Spark SQL thrift server fails to handle decimal value
[ https://issues.apache.org/jira/browse/SPARK-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-11067: -- Attachment: SPARK-11067.1.patch.txt The exception goes away with this patch, but the handling of the big decimal type in Hive JDBC seems to need some improvement. > Spark SQL thrift server fails to handle decimal value > - > > Key: SPARK-11067 > URL: https://issues.apache.org/jira/browse/SPARK-11067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1 >Reporter: Alex Liu > Attachments: SPARK-11067.1.patch.txt > > > When executing the following query through beeline connecting to Spark sql > thrift server, it errors out for decimal column > {code} > Select decimal_column from table > WARN 2015-10-09 15:04:00 > org.apache.hive.service.cli.thrift.ThriftCLIService: Error fetching results: > java.lang.ClassCastException: java.math.BigDecimal cannot be cast to > org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:174) > ~[hive-service-0.13.1a.jar:0.13.1a] > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) > ~[hive-service-0.13.1a.jar:0.13.1a] > at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(Shim13.scala:144) > ~[spark-hive-thriftserver_2.10-1.4.1.1.jar:1.4.1.1] > at > org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:192) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:471) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:405) > ~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:530) > 
~[hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > [hive-service-0.13.1a.jar:0.13.1a] > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > [hive-service-0.13.1a.jar:0.13.1a] > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > [libthrift-0.9.2.jar:0.9.2] > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > [libthrift-0.9.2.jar:0.9.2] > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55) > [hive-service-0.13.1a.jar:4.8.1-SNAPSHOT] > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > [libthrift-0.9.2.jar:0.9.2] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_55] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_55] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55] > {code}
[jira] [Created] (SPARK-10684) StructType.interpretedOrdering need not be serialized
Navis created SPARK-10684: - Summary: StructType.interpretedOrdering need not be serialized Key: SPARK-10684 URL: https://issues.apache.org/jira/browse/SPARK-10684 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.0 Reporter: Navis Priority: Minor Kryo fails with buffer overflow even with the max value (2G). {noformat} org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 1 Serialization trace: containsChild (org.apache.spark.sql.catalyst.expressions.BoundReference) child (org.apache.spark.sql.catalyst.expressions.SortOrder) array (scala.collection.mutable.ArraySeq) ordering (org.apache.spark.sql.catalyst.expressions.InterpretedOrdering) interpretedOrdering (org.apache.spark.sql.types.StructType) schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema). To avoid this, increase spark.kryoserializer.buffer.max value. at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat}
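A sketch of the general technique behind the proposal: mark a cached, recomputable field `transient` so serializers skip it. Plain Java serialization is used here for a self-contained demo (the JIRA concerns Kryo, whose FieldSerializer also skips transient fields by default); the class below is a hypothetical stand-in, not Spark's `StructType`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientField {
    static class StructType implements Serializable {
        final String[] fieldNames;
        // Cached ordering, rebuilt on demand - marked transient so it is
        // never written into the serialized form.
        transient Object interpretedOrdering;
        StructType(String[] names, Object ordering) {
            this.fieldNames = names;
            this.interpretedOrdering = ordering;
        }
    }

    public static void main(String[] args) throws Exception {
        StructType schema = new StructType(new String[]{"a", "b"}, new Object[1000]);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(schema);
        oos.flush();
        StructType back = (StructType) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(back.fieldNames.length);   // prints 2: real data survives
        System.out.println(back.interpretedOrdering); // prints null: cache was skipped
    }
}
```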
[jira] [Created] (SPARK-10679) javax.jdo.JDOFatalUserException in executor
Navis created SPARK-10679: - Summary: javax.jdo.JDOFatalUserException in executor Key: SPARK-10679 URL: https://issues.apache.org/jira/browse/SPARK-10679 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Navis Priority: Minor HadoopRDD throws an exception in the executor, something like the one below. {noformat} 15/09/17 18:51:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 15/09/17 18:51:21 INFO metastore.ObjectStore: ObjectStore, initialize called 15/09/17 18:51:21 WARN metastore.HiveMetaStore: Retrying creating default database after error: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found. javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found. at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365) at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166) at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:803) at org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:782) at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:298) at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274) at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176) at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176) at scala.Option.map(Option.scala:145) at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220) at
[jira] [Commented] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)
[ https://issues.apache.org/jira/browse/SPARK-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712712#comment-14712712 ] Navis commented on SPARK-9032: -- Seemingly fixed by SPARK-8093 scala.MatchError in DataFrameReader.json(String path) - Key: SPARK-9032 URL: https://issues.apache.org/jira/browse/SPARK-9032 Project: Spark Issue Type: Bug Components: Java API, SQL Affects Versions: 1.4.0 Environment: Ubuntu 15.04 Reporter: Philipp Poetter Executing read().json() of SQLContext e.g. DataFrameReader raises a MatchError with a stacktrace as follows while trying to read JSON data: 15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$) at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58) at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139) at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137) at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137) at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104) at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213) at com.hp.sparkdemo.Example.main(Example.java:23) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:497) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook 15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040 15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to shut down 15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! Offending code snippet (around line 23): ... JavaSparkContext sctx = new JavaSparkContext(sparkConf); SQLContext ctx = new SQLContext(sctx); DataFrame frame = ctx.read().json(facebookJSON); frame.printSchema(); ... 
The exception is reproducible using the following JSON:
{code}
{
  "data": [
    {
      "id": "X999_Y999",
      "from": { "name": "Tom Brady", "id": "X12" },
      "message": "Looking forward to 2010!",
      "actions": [
        { "name": "Comment", "link": "http://www.facebook.com/X999/posts/Y999" },
        { "name": "Like", "link": "http://www.facebook.com/X999/posts/Y999" }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+",
      "updated_time": "2010-08-02T21:27:44+"
    },
    {
      "id": "X998_Y998",
      "from": { "name": "Peyton Manning", "id": "X18" },
      "message": "Where's my contract?",
      "actions": [
        { "name": "Comment", "link": "http://www.facebook.com/X998/posts/Y998" },
        { "name": "Like", "link": "http://www.facebook.com/X998/posts/Y998" }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+",
      "updated_time": "2010-08-02T21:27:44+"
    }
  ]
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9685) Unsupported dataType: char(X) in Hive
[ https://issues.apache.org/jira/browse/SPARK-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-9685: - Attachment: SPARK-9685.1.patch.txt I also ran into handling the char type in Spark a few months ago. The attached patch worked for me, but I'm not sure it is a complete solution. Unsupported dataType: char(X) in Hive --- Key: SPARK-9685 URL: https://issues.apache.org/jira/browse/SPARK-9685 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Ángel Álvarez Attachments: SPARK-9685.1.patch.txt I'm getting the following error when I try to read a Hive table with char(X) fields:
{code}
15/08/06 11:38:51 INFO parse.ParseDriver: Parse Completed
org.apache.spark.sql.types.DataTypeException: Unsupported dataType: char(8). If you have a struct and a field name of it has any special characters, please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.
	at org.apache.spark.sql.types.DataTypeParser$class.toDataType(DataTypeParser.scala:95)
	at org.apache.spark.sql.types.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:107)
	at org.apache.spark.sql.types.DataTypeParser$.parse(DataTypeParser.scala:111)
	at org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:769)
	at org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute(HiveMetastoreCatalog.scala:742)
	at org.apache.spark.sql.hive.MetastoreRelation$$anonfun$44.apply(HiveMetastoreCatalog.scala:752)
	at org.apache.spark.sql.hive.MetastoreRelation$$anonfun$44.apply(HiveMetastoreCatalog.scala:752)
{code}
It seems there is no char DataType defined in the DataTypeParser class:
{code}
protected lazy val primitiveType: Parser[DataType] =
  "(?i)string".r ^^^ StringType |
  "(?i)float".r ^^^ FloatType |
  "(?i)(?:int|integer)".r ^^^ IntegerType |
  "(?i)tinyint".r ^^^ ByteType |
  "(?i)smallint".r ^^^ ShortType |
  "(?i)double".r ^^^ DoubleType |
  "(?i)(?:bigint|long)".r ^^^ LongType |
  "(?i)binary".r ^^^ BinaryType |
  "(?i)boolean".r ^^^ BooleanType |
  fixedDecimalType |
  "(?i)decimal".r ^^^ DecimalType.USER_DEFAULT |
  "(?i)date".r ^^^ DateType |
  "(?i)timestamp".r ^^^ TimestampType |
  varchar
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
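A missing char rule could plausibly follow the existing varchar one, mapping char(n) to a plain string type. As a standalone illustration of that mapping, here is a sketch in Java rather than Spark's Scala parser; the method and type names are hypothetical, not Spark's actual API:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: accept char(n) the way the existing varchar rule
// does, mapping both to a string type instead of throwing.
public class TypeParserSketch {
    private static final Pattern CHAR_LIKE =
            Pattern.compile("(?i)(?:char|varchar)\\s*\\(\\s*\\d+\\s*\\)");

    static String toDataType(String hiveType) {
        String t = hiveType.trim();
        if (CHAR_LIKE.matcher(t).matches()) return "StringType"; // proposed char rule, existing varchar rule
        if (t.equalsIgnoreCase("string")) return "StringType";
        if (t.equalsIgnoreCase("int") || t.equalsIgnoreCase("integer")) return "IntegerType";
        // ... remaining primitive types elided ...
        throw new IllegalArgumentException("Unsupported dataType: " + hiveType);
    }

    public static void main(String[] args) {
        // With the proposed rule, char(8) no longer raises an exception.
        System.out.println(toDataType("char(8)"));
    }
}
```

In an actual patch the rule would presumably be one more parser-combinator alternative next to varchar in primitiveType; the sketch only shows the intended string mapping.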
[jira] [Created] (SPARK-10151) Support invocation of hive macro
Navis created SPARK-10151: - Summary: Support invocation of hive macro Key: SPARK-10151 URL: https://issues.apache.org/jira/browse/SPARK-10151 Project: Spark Issue Type: Bug Components: SQL Reporter: Navis Priority: Minor A macro in Hive (a GenericUDFMacro) contains the real function inside it, but that function is not conveyed to tasks, resulting in a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
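The failure mode can be illustrated outside Spark (the classes below are illustrative, not Spark's GenericUDFMacro): a wrapper whose function body lives in a field that is not serialized arrives on the task side with that field null, so invoking it throws a NullPointerException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Minimal sketch of the bug pattern: the "real function" is held in a
// transient field, so it is lost when the object is shipped to a task.
public class MacroSketch implements Serializable {
    transient Function<Integer, Integer> body = x -> x + 1; // not conveyed to tasks

    // Simulate shipping the object to a task via Java serialization.
    static MacroSketch roundTrip(MacroSketch m) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(m);
            oos.close();
            ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            return (MacroSketch) ois.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        MacroSketch onTask = roundTrip(new MacroSketch());
        // The transient field is null after deserialization; calling
        // onTask.body.apply(1) here would throw a NullPointerException.
        System.out.println(onTask.body == null); // true
    }
}
```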
[jira] [Created] (SPARK-10152) Support Init script for hive-thriftserver
Navis created SPARK-10152: - Summary: Support Init script for hive-thriftserver Key: SPARK-10152 URL: https://issues.apache.org/jira/browse/SPARK-10152 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Trivial If some queries could be executed on the Thrift server during its initialization stage (mostly for registering functions or macros), things would be much easier. Not a big feature to include in Spark, but hopefully someone can make use of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8826) Fix ClassCastException in GeneratedAggregate
Navis created SPARK-8826: Summary: Fix ClassCastException in GeneratedAggregate Key: SPARK-8826 URL: https://issues.apache.org/jira/browse/SPARK-8826 Project: Spark Issue Type: Bug Components: SQL Reporter: Navis Priority: Trivial When codegen is disabled, a ClassCastException is thrown in some cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8357) Memory leakage on unsafe aggregation path with empty input
Navis created SPARK-8357: Summary: Memory leakage on unsafe aggregation path with empty input Key: SPARK-8357 URL: https://issues.apache.org/jira/browse/SPARK-8357 Project: Spark Issue Type: Bug Components: SQL Reporter: Navis Priority: Minor Currently, the unsafe-based hash is released in the 'next' call, but if the input is empty, 'next' is never called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
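The leak pattern is easy to see in a minimal iterator sketch (illustrative Java, not Spark's actual classes): if the resource is freed only inside next(), an empty input never frees it, because next() is never invoked.

```java
import java.util.Iterator;

// Sketch of the leak: releasing a resource on the final next() call
// means an empty input (hasNext() false from the start) leaks it.
public class LeakSketch {
    static class HashedIterator implements Iterator<Integer> {
        boolean released = false; // stands in for freeing the hash's memory
        final int rows;
        int pos = 0;

        HashedIterator(int rows) { this.rows = rows; }

        public boolean hasNext() { return pos < rows; }

        public Integer next() {
            int v = pos++;
            if (pos == rows) released = true; // freed only on the last next()
            return v;
        }
    }

    public static void main(String[] args) {
        HashedIterator empty = new HashedIterator(0);
        while (empty.hasNext()) empty.next(); // loop body never runs
        System.out.println(empty.released);   // false: the resource leaked
    }
}
```

A fix along these lines would release the resource when hasNext() first returns false, or register an unconditional task-completion callback, so the empty-input path is covered too.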
[jira] [Created] (SPARK-8334) Binary logical plan should provide more realistic statistics
Navis created SPARK-8334: Summary: Binary logical plan should provide more realistic statistics Key: SPARK-8334 URL: https://issues.apache.org/jira/browse/SPARK-8334 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Minor Currently, spark-sql multiplies the sizes of the two children to estimate the output size of a binary plan, which can make a following join hash the wrong side of its input. In multi-way joins such as (A join B) join C, C will be marked as the 'buildHash' target whenever the size of C is not bigger than (size of A * size of B). Some TPC-H query results are greatly affected by this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
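A worked example of the estimate (the sizes are illustrative): once child sizes are multiplied, even a far larger C still compares as the cheaper side to build the hash table on.

```java
// Illustration of the multiplied size estimate described above.
// A join B is estimated at size(A) * size(B), so a 1 GB table C
// still looks "smaller" than the join of two 10 MB tables.
public class SizeEstimateSketch {
    public static void main(String[] args) {
        long a = 10L << 20;  // 10 MB
        long b = 10L << 20;  // 10 MB
        long c = 1L << 30;   // 1 GB

        long aJoinB = a * b; // multiplied estimate: ~100 TB, wildly unrealistic

        // C is chosen as the buildHash side because it is "not bigger"
        // than the estimate for (A join B), even though hashing 1 GB
        // is far more expensive than hashing either real input.
        System.out.println(c <= aJoinB); // true
    }
}
```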
[jira] [Created] (SPARK-8312) Populate statistics info of hive tables if it's needed to be
Navis created SPARK-8312: Summary: Populate statistics info of hive tables if it's needed to be Key: SPARK-8312 URL: https://issues.apache.org/jira/browse/SPARK-8312 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Minor Currently, spark-sql uses stats in the metastore for estimating the size of a Hive table, which means the analyze command should be executed before accessing the table to get better planning, especially for joins. But even with the stats, it cannot reflect the real input size of the query when a partition pruning predicate exists in it. Even worse, Hive cannot update metastore stats for external tables, which was fixed only recently in HIVE-6727. The issue details say the bug applies to all Hive versions between 0.13.0 and 1.2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8312) Populate statistics info of hive tables if it's needed to be
[ https://issues.apache.org/jira/browse/SPARK-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-8312: - Description: Currently, spark-sql uses stats in the metastore for estimating the size of a Hive table, which means the analyze command should be executed before accessing the table to get better planning, especially for joins. But even with the stats, it cannot reflect the real input size of the query when a partition pruning predicate exists in it. Even worse, Hive cannot update metastore stats for external tables, which was fixed only recently in HIVE-6727. The issue details say the bug applies to all Hive versions between 0.13.0 and 1.2.0. was: Currently, spark-sql uses stats in metastore for estimating size of hive table, which means analyze command should be executed before accessing the table for better planning especially for joins. But still with the stats, it cannot reflect real input size of the query when partition prunning predicate exists in it. Even worse is that hive cannot update megastore stats for external tables, which is fixed recently in HIVE-6727. The issue detail says the bug is applied to all hive version between 0.13.0 and 1.2.0 Populate statistics info of hive tables if it's needed to be Key: SPARK-8312 URL: https://issues.apache.org/jira/browse/SPARK-8312 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Minor Currently, spark-sql uses stats in the metastore for estimating the size of a Hive table, which means the analyze command should be executed before accessing the table to get better planning, especially for joins. But even with the stats, it cannot reflect the real input size of the query when a partition pruning predicate exists in it. Even worse, Hive cannot update metastore stats for external tables, which was fixed only recently in HIVE-6727. The issue details say the bug applies to all Hive versions between 0.13.0 and 1.2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8285) CombineSum should be calculated as unlimited decimal first
Navis created SPARK-8285: Summary: CombineSum should be calculated as unlimited decimal first Key: SPARK-8285 URL: https://issues.apache.org/jira/browse/SPARK-8285 Project: Spark Issue Type: Bug Components: SQL Reporter: Navis Priority: Trivial
{code:title=GeneratedAggregate.scala}
case cs @ CombineSum(expr) =>
  val calcType = expr.dataType
    expr.dataType match {
      case DecimalType.Fixed(_, _) =>
        DecimalType.Unlimited
      case _ =>
        expr.dataType
    }
{code}
calcType is always expr.dataType; the result of the match expression is discarded, so the DecimalType.Unlimited branch is dead code. Credit belongs entirely to IntelliJ. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
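Why the DecimalType.Unlimited branch matters can be shown with plain java.math.BigDecimal (an illustration, not Spark code): accumulating a sum while rounding every step to the input's fixed precision loses digits, whereas an unconstrained intermediate value stays exact.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Illustration: summing fixed-precision decimals needs an unlimited
// intermediate type, or the running total silently loses precision.
public class DecimalSumSketch {
    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("99999.99"); // fits a DECIMAL(7,2) input
        BigDecimal fixed = BigDecimal.ZERO;
        BigDecimal unlimited = BigDecimal.ZERO;
        for (int i = 0; i < 1000; i++) {
            // Rounding each partial sum to the input's 7 significant
            // digits models keeping calcType = expr.dataType.
            fixed = fixed.add(v, new MathContext(7));
            unlimited = unlimited.add(v); // unconstrained intermediate
        }
        System.out.println(unlimited);              // 99999990.00 (exact)
        System.out.println(fixed.equals(unlimited)); // false: precision was lost
    }
}
```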
[jira] [Created] (SPARK-8153) Add configuration for disabling partial aggregation in runtime
Navis created SPARK-8153: Summary: Add configuration for disabling partial aggregation in runtime Key: SPARK-8153 URL: https://issues.apache.org/jira/browse/SPARK-8153 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Trivial The same thing as hive.map.aggr.hash.min.reduction in Hive, which disables hash aggregation at runtime if it is not sufficiently decreasing the output size. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
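The runtime check Hive applies, and that this issue proposes for Spark, can be sketched as follows. The threshold 0.5 mirrors Hive's default for hive.map.aggr.hash.min.reduction; the class and method names are illustrative:

```java
// Sketch of the min-reduction heuristic: if the number of distinct keys
// accumulated so far is not sufficiently smaller than the number of rows
// processed, map-side (partial) aggregation is not reducing output size
// and can be disabled at runtime.
public class MinReductionSketch {
    static boolean keepPartialAgg(long hashEntries, long rowsProcessed, double minReduction) {
        // Keep aggregating only while entries/rows stays at or below the threshold.
        return (double) hashEntries / rowsProcessed <= minReduction;
    }

    public static void main(String[] args) {
        System.out.println(keepPartialAgg(900, 1000, 0.5)); // false: barely any grouping
        System.out.println(keepPartialAgg(100, 1000, 0.5)); // true: 10x reduction, keep it
    }
}
```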
[jira] [Updated] (SPARK-7936) Add configuration for initial size and limit of hash for aggregation
[ https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-7936: - Summary: Add configuration for initial size and limit of hash for aggregation (was: Add configuration for initial size of hash for aggregation and limit) Add configuration for initial size and limit of hash for aggregation Key: SPARK-7936 URL: https://issues.apache.org/jira/browse/SPARK-7936 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Minor Partial aggregation takes a lot of memory and mostly cannot complete unless the input is sliced into very small (and therefore very many) partitions. This patch limits the number of entries in the hash for partial aggregation; the configurable initial hash size is just a bonus. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit
[ https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564271#comment-14564271 ] Navis commented on SPARK-7936: -- Added two configurations:
1. spark.sql.aggregation.hash.initSize: initial size of the hash, applied to both final and partial aggregation
2. spark.sql.partial.aggregation.maxEntry: max size of the hash for partial aggregation; should not be used for final aggregation
Add configuration for initial size of hash for aggregation and limit Key: SPARK-7936 URL: https://issues.apache.org/jira/browse/SPARK-7936 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Minor Partial aggregation takes a lot of memory and mostly cannot complete unless the input is sliced into very small (and therefore very many) partitions. This patch limits the number of entries in the hash for partial aggregation; the configurable initial hash size is just a bonus. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit
Navis created SPARK-7936: Summary: Add configuration for initial size of hash for aggregation and limit Key: SPARK-7936 URL: https://issues.apache.org/jira/browse/SPARK-7936 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Minor Partial aggregation takes a lot of memory and mostly cannot complete unless the input is sliced into very small (and therefore very many) partitions. This patch limits the number of entries in the hash for partial aggregation; the configurable initial hash size is just a bonus. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org