[jira] [Created] (SPARK-12637) Print stage info of finished stages properly

2016-01-04 Thread Navis (JIRA)
Navis created SPARK-12637:
-

 Summary: Print stage info of finished stages properly
 Key: SPARK-12637
 URL: https://issues.apache.org/jira/browse/SPARK-12637
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Navis
Priority: Trivial


Currently it prints the hashcode of the StageInfo, which is not that useful:
{noformat}
INFO scheduler.StatsReportListener: Finished stage: 
org.apache.spark.scheduler.StageInfo@2eb47d79
{noformat}
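
For illustration, a minimal sketch of the kind of output this asks for (the listener name and the exact fields logged are assumptions, not the actual patch):
{code}
import org.apache.spark.Logging
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// sketch: identify the finished stage by its id, name and task count
// instead of relying on StageInfo's default Object.toString
class ReadableStatsListener extends SparkListener with Logging {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    logInfo(s"Finished stage: ${info.stageId} (${info.name}), ${info.numTasks} tasks")
  }
}
{code}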



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12619) Combine small files in a hadoop directory into single split

2016-01-04 Thread Navis (JIRA)
Navis created SPARK-12619:
-

 Summary: Combine small files in a hadoop directory into single 
split 
 Key: SPARK-12619
 URL: https://issues.apache.org/jira/browse/SPARK-12619
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Navis
Priority: Trivial


When a directory contains too many (small) files, the whole Spark cluster will 
be exhausted scheduling the tasks created for each file. A custom input format 
can handle that, but if you're using the Hive metastore, that is hardly an 
option.
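
For reference, plain RDD reads can already combine small files through Hadoop's CombineTextInputFormat; a sketch, assuming {{sc}} is the SparkContext and the path and split size are placeholders:
{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

// sketch: read many small files as combined splits, capped at 128MB each
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.split.maxsize", (128L * 1024 * 1024).toString)
val lines = sc.newAPIHadoopFile("/path/to/dir", classOf[CombineTextInputFormat],
  classOf[LongWritable], classOf[Text]).map(_._2.toString)
{code}
The point of the issue is that nothing equivalent kicks in automatically when 
the table is read through the Hive metastore.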



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata

2015-11-17 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010244#comment-15010244
 ] 

Navis commented on SPARK-9686:
--

@Cheng Lian  Sorry, I've confused "remote metastore" with "remote database". 
I'm using a local metastore, without the hive.metastore.uris setting.

bq. We should override corresponding methods in SparkSQLCLIService and dispatch 
these JDBC calls to the metastore Hive client.
The attached patch does exactly that, with a configuration replacement that 
asserts a valid metastore configuration (without it, Hive.get() destroys the 
connection and makes a new one with a dummy Derby).

> Spark Thrift server doesn't return correct JDBC metadata 
> -
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2
>Reporter: pin_zhang
>Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. show tables; the newly created table is returned
> 5.
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>   Properties info = new Properties();
>   Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
>       null, null, null);
> Problem:
>   No tables are returned by this API; it worked in Spark 1.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata

2015-11-17 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010244#comment-15010244
 ] 

Navis edited comment on SPARK-9686 at 11/18/15 5:06 AM:


[~lian cheng]  Sorry, I've confused "remote metastore" with "remote database". 
I'm using a local metastore, without the hive.metastore.uris setting.

And,
bq. We should override corresponding methods in SparkSQLCLIService and dispatch 
these JDBC calls to the metastore Hive client.
The attached patch does exactly that, with a configuration replacement that 
asserts a valid metastore configuration (without it, Hive.get() destroys the 
connection and makes a new one with a dummy Derby).


was (Author: navis):
@Cheng Lian  Sorry, I've confused "remote metastore" with "remote database". 
I'm using a local metastore, without the hive.metastore.uris setting.

bq. We should override corresponding methods in SparkSQLCLIService and dispatch 
these JDBC calls to the metastore Hive client.
The attached patch does exactly that, with a configuration replacement that 
asserts a valid metastore configuration (without it, Hive.get() destroys the 
connection and makes a new one with a dummy Derby).

> Spark Thrift server doesn't return correct JDBC metadata 
> -
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2
>Reporter: pin_zhang
>Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. show tables; the newly created table is returned
> 5.
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>   Properties info = new Properties();
>   Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
>       null, null, null);
> Problem:
>   No tables are returned by this API; it worked in Spark 1.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store

2015-11-16 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006352#comment-15006352
 ] 

Navis commented on SPARK-9686:
--

[~lian cheng] It's configured with a remote database (MariaDB in my case), of 
course. But those values are overwritten with Derby values when SparkSQLEnv is 
initialized. I've just overwritten them again with the values in metadataHive 
before running JDBC commands. I cannot imagine why it's so badly twisted around 
the Spark Thrift server.

> Spark hive jdbc client cannot get table from metadata store
> ---
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1
>Reporter: pin_zhang
>Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. show tables; the newly created table is returned
> 5.
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>   Properties info = new Properties();
>   Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
>       null, null, null);
> Problem:
>   No tables are returned by this API; it worked in Spark 1.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11614) serde parameters should be set only when all params are ready

2015-11-09 Thread Navis (JIRA)
Navis created SPARK-11614:
-

 Summary: serde parameters should be set only when all params are 
ready
 Key: SPARK-11614
 URL: https://issues.apache.org/jira/browse/SPARK-11614
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Navis
Priority: Minor


See HIVE-7975 and HIVE-12373.

With the changed semantics of setters on Hive's Thrift objects, the setter 
should be called only after all parameters are ready. It's not a problem in 
the current state, but it will become one some day.
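
A sketch of the safe ordering (the parameter keys here are illustrative):
{code}
import org.apache.hadoop.hive.metastore.api.SerDeInfo

// sketch: populate the parameter map fully, then hand it to the Thrift
// setter exactly once, after everything is ready
val serdeInfo = new SerDeInfo()
val params = new java.util.HashMap[String, String]()
params.put("serialization.format", "1")
params.put("field.delim", ",")
serdeInfo.setParameters(params)
{code}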



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store

2015-11-05 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated SPARK-9686:
-
Attachment: SPARK-9686.1.patch.txt

> Spark hive jdbc client cannot get table from metadata store
> ---
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1
>Reporter: pin_zhang
>Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. show tables; the newly created table is returned
> 5.
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>   Properties info = new Properties();
>   Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
>       null, null, null);
> Problem:
>   No tables are returned by this API; it worked in Spark 1.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store

2015-11-05 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992767#comment-14992767
 ] 

Navis commented on SPARK-9686:
--

[~pin_zhang] Met the same problem; the attached patch is what I'm using 
(rebased on the master branch). It's not implemented as cleanly as Databricks 
would want for inclusion, but it worked for me.

> Spark hive jdbc client cannot get table from metadata store
> ---
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1
>Reporter: pin_zhang
>Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. show tables; the newly created table is returned
> 5.
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>   Properties info = new Properties();
>   Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
>       null, null, null);
> Problem:
>   No tables are returned by this API; it worked in Spark 1.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store

2015-11-05 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated SPARK-9686:
-
Affects Version/s: 1.5.0
   1.5.1

> Spark hive jdbc client cannot get table from metadata store
> ---
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1
>Reporter: pin_zhang
>Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. show tables; the newly created table is returned
> 5.
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>   Properties info = new Properties();
>   Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
>       null, null, null);
> Problem:
>   No tables are returned by this API; it worked in Spark 1.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11546) Thrift server makes too many logs about result schema

2015-11-05 Thread Navis (JIRA)
Navis created SPARK-11546:
-

 Summary: Thrift server makes too many logs about result schema
 Key: SPARK-11546
 URL: https://issues.apache.org/jira/browse/SPARK-11546
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Trivial


SparkExecuteStatementOperation logs the result schema for each getNextRowSet() 
call, which by default happens every 1000 rows, overwhelming the whole log 
file. For example (and this is just 3 rows):

{noformat}
15/11/06 11:41:00 INFO SparkExecuteStatementOperation: Result Schema: 
ArrayBuffer(city_id#297, gu_id#298, dong_id#299, branch_id#300, company_id#301, 
team_id#302, part_id#303, enb_id#304, cell_id#305, du_grp_no#306, ru_id#307, 
pci#308, freq_typ_cd#309, city_name#310, gu_name#311, dong_name#312, 
branch_name#313, company_name#314, team_name#315, part_name#316, enb_name#317, 
du_grp_name#318, ru_name#319, tot_et#320, tot_calculated_et#321, 
tot_cei_lv#322, tot_cei_value#323, tot_user_cnt#324, hdv_et#325, 
hdv_cei_lv#326, hdv_cei_value#327, hdv_user_cnt#328, hdv_qoe1_value#329, 
hdv_qoe1_kpi1#330, hdv_qoe1_kpi2#331, hdv_qoe1_kpi3#332, hdv_qoe1_qos1#333, 
hdv_attempt_cnt#334, hdv_success_cnt#335, hdv_complete_cnt#336, 
hdv_drop_cnt#337, hdv_loss_cnt#338, hdv_jitter_cnt#339, hdv_calculated_et#340, 
hdv_bad_et#341, hdv_barring_rate#342, hdv_barring_time#343, hdv_npr#344, 
lte_et#345, lte_calculated_et#346, lte_cei_lv#347, lte_cei_value#348, 
lte_user_cnt#349, lte_qoe1_calculated_et#350, lte_qoe1#351, lte_qoe1_kpi1#352, 
lte_qoe1_kpi2#353, lte_qoe1_kpi3#354, lte_qoe1_kpi4#355, lte_qoe1_kpi5#356, 
lte_qoe1_qos1#357, lte_attempt_cnt#358, lte_success_cnt#359, 
lte_data_attempt_cnt#360, lte_data_success_cnt#361, lte_ims_attempt_cnt#362, 
lte_ims_success_cnt#363, lte_drop_cnt#364, lte_dns_attempt_cnt#365, 
lte_dns_success_cnt#366, lte_bad_et#367, lte_barring_rate#368, 
lte_barring_time#369, lte_npr#370, total_play_time#371, 
total_buffering_time#372, wcdr_et#373, wcdr_cei_lv#374, wcdr_cei_value#375, 
wcdr_user_cnt#376, wcdr_qoe1_value#377, wcdr_qoe1_kpi1#378, wcdr_qoe1_kpi2#379, 
wcdr_qoe1_qos1#380, wcdr_attempt_cnt#381, wcdr_success_cnt#382, 
wcdr_complete_cnt#383, wcdr_drop_cnt#384, wcdr_reattempt_cnt#385, 
wcdr_calculated_et#386, wcdr_bad_et#387, wcdr_outgoing_cnt#388, 
wcdr_incoming_cnt#389, wcdr_npr_cnt#390, sgsn_mobility_cnt#391, cdate#392, 
total_valid_user_cnt#393, hdv_valid_user_cnt#394, lte_valid_user_cnt#395, 
wcdr_valid_user_cnt#396, lfas_cnt#397, lte_qoe3_qos5#398, page_loading_cnt#399, 
page_loading_time#400, rrc_attc_cnt#401, rrc_sussc_cnt#402, rre_attc_cnt#403, 
rre_sussc_cnt#404, erab_attc_cnt#405, erab_sussc_cnt#406, 
erab_add_attc_cnt#407, erab_add_sussc_cnt#408, cf_nfalt_num#409, 
hdov_x2_in_attc_cnt#410, hdov_x2_in_sussc_cnt#411, hdov_x2_out_attc_cnt#412, 
hdov_x2_out_sussc_cnt#413, pdcpsdulossrateultot#414, pdcpsdulossrateulcnt#415, 
totprbdltot#416, totprbdlcnt#417, totprbultot#418, totprbulcnt#419, 
rssipath0_tot#420, rssipath0_cnt#421, rssipath0_tot_night#422, 
rssipath0_cnt_night#423, rssipath1_tot#424, rssipath1_cnt#425, 
rssipath1_tot_night#426, rssipath1_cnt_night#427, loslofdiff#428, 
gtpsnenbpeakloss#429, gtpsnenbdlcnt#430, dt#296)
15/11/06 11:41:00 INFO SparkExecuteStatementOperation: Result Schema: 
ArrayBuffer(city_id#297, gu_id#298, dong_id#299, ...the same schema repeated 
in full; the message is truncated here in the archive)
{noformat}
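
A minimal sketch of the kind of change this suggests (names are assumptions, 
not the actual patch):
{code}
// sketch: log the schema once when the statement starts executing, and
// keep the per-fetch message out of INFO (resultSchema assumed in scope)
logInfo(s"Result Schema: $resultSchema")  // once, at statement start
logDebug("getNextRowSet() called")        // per fetch, if needed at all
{code}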

[jira] [Created] (SPARK-11124) JsonParser/Generator should be closed for resource reycle

2015-10-14 Thread Navis (JIRA)
Navis created SPARK-11124:
-

 Summary: JsonParser/Generator should be closed for resource reycle
 Key: SPARK-11124
 URL: https://issues.apache.org/jira/browse/SPARK-11124
 Project: Spark
  Issue Type: Bug
Reporter: Navis
Priority: Trivial


Some JSON parsers are not closed; the parser in JacksonParser#parseJson, for 
example.
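
A sketch of the fix, given that Jackson's JsonParser implements Closeable:
{code}
import com.fasterxml.jackson.core.JsonFactory

// sketch: release the parser deterministically once parsing completes
val factory = new JsonFactory()
val parser = factory.createParser("""{"a": 1}""")
try {
  while (parser.nextToken() != null) {
    // ... consume tokens, as JacksonParser does ...
  }
} finally {
  parser.close()
}
{code}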



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11124) JsonParser/Generator should be closed for resource recycle

2015-10-14 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated SPARK-11124:
--
Summary: JsonParser/Generator should be closed for resource recycle  (was: 
JsonParser/Generator should be closed for resource reycle)

> JsonParser/Generator should be closed for resource recycle
> --
>
> Key: SPARK-11124
> URL: https://issues.apache.org/jira/browse/SPARK-11124
> Project: Spark
>  Issue Type: Bug
>Reporter: Navis
>Priority: Trivial
>
> Some JSON parsers are not closed; the parser in JacksonParser#parseJson, for 
> example.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11067) Spark SQL thrift server fails to handle decimal value

2015-10-13 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956121#comment-14956121
 ] 

Navis commented on SPARK-11067:
---

[~alexliu68] Seeing that "RowBasedSet" is in the stack trace, it's an older 
version of the Hive JDBC driver. Anyway, with the attached patch, decimals are 
serialized to strings on the server via 
{code}
HiveDecimal.create(from.getDecimal(ordinal)).bigDecimalValue().toPlainString()
{code}
and deserialized to BigDecimal on the client via
{code}
new BigDecimal(string)
{code}

First, it's a heavy calculation that could affect performance. Second, it 
seems inexact to use toPlainString(), which removes trailing zeros; toString() 
should be used instead.

> Spark SQL thrift server fails to handle decimal value
> -
>
> Key: SPARK-11067
> URL: https://issues.apache.org/jira/browse/SPARK-11067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1
>Reporter: Alex Liu
> Attachments: SPARK-11067.1.patch.txt
>
>
> When executing the following query through beeline connected to the Spark 
> SQL Thrift server, it errors out for a decimal column:
> {code}
> Select decimal_column from table
> WARN  2015-10-09 15:04:00 
> org.apache.hive.service.cli.thrift.ThriftCLIService: Error fetching results: 
> java.lang.ClassCastException: java.math.BigDecimal cannot be cast to 
> org.apache.hadoop.hive.common.type.HiveDecimal
>   at 
> org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:174) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(Shim13.scala:144)
>  ~[spark-hive-thriftserver_2.10-1.4.1.1.jar:1.4.1.1]
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:192)
>  ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:471)
>  ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:405) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:530)
>  ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>  [hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>  [hive-service-0.13.1a.jar:0.13.1a]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> [libthrift-0.9.2.jar:0.9.2]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> [libthrift-0.9.2.jar:0.9.2]
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
>  [hive-service-0.13.1a.jar:4.8.1-SNAPSHOT]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>  [libthrift-0.9.2.jar:0.9.2]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_55]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_55]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11062) Thrift server does not support operationLog

2015-10-12 Thread Navis (JIRA)
Navis created SPARK-11062:
-

 Summary: Thrift server does not support operationLog
 Key: SPARK-11062
 URL: https://issues.apache.org/jira/browse/SPARK-11062
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Navis
Priority: Trivial


Currently, SparkExecuteStatementOperation skips the beforeRun/afterRun methods.
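
For context, a rough Scala rendering of the pattern the HiveServer2 Operation 
base class expects (beforeRun/afterRun manage the per-operation log):
{code}
// sketch: the wrapping that SparkExecuteStatementOperation skips
override def run(): Unit = {
  beforeRun()
  try {
    runInternal()
  } finally {
    afterRun()
  }
}
{code}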



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11067) Spark SQL thrift server fails to handle decimal value

2015-10-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated SPARK-11067:
--
Attachment: SPARK-11067.1.patch.txt

The exception would be gone with this patch, but the handling of the big 
decimal type in the Hive JDBC driver seems to need some improvement.

> Spark SQL thrift server fails to handle decimal value
> -
>
> Key: SPARK-11067
> URL: https://issues.apache.org/jira/browse/SPARK-11067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1
>Reporter: Alex Liu
> Attachments: SPARK-11067.1.patch.txt
>
>
> When executing the following query through beeline connected to the Spark 
> SQL Thrift server, it errors out for a decimal column:
> {code}
> Select decimal_column from table
> WARN  2015-10-09 15:04:00 
> org.apache.hive.service.cli.thrift.ThriftCLIService: Error fetching results: 
> java.lang.ClassCastException: java.math.BigDecimal cannot be cast to 
> org.apache.hadoop.hive.common.type.HiveDecimal
>   at 
> org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:174) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(Shim13.scala:144)
>  ~[spark-hive-thriftserver_2.10-1.4.1.1.jar:1.4.1.1]
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:192)
>  ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:471)
>  ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:405) 
> ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:530)
>  ~[hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>  [hive-service-0.13.1a.jar:0.13.1a]
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>  [hive-service-0.13.1a.jar:0.13.1a]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> [libthrift-0.9.2.jar:0.9.2]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> [libthrift-0.9.2.jar:0.9.2]
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
>  [hive-service-0.13.1a.jar:4.8.1-SNAPSHOT]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>  [libthrift-0.9.2.jar:0.9.2]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_55]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_55]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10684) StructType.interpretedOrdering need not be serialized

2015-09-17 Thread Navis (JIRA)
Navis created SPARK-10684:
-

 Summary: StructType.interpretedOrdering need not be serialized
 Key: SPARK-10684
 URL: https://issues.apache.org/jira/browse/SPARK-10684
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0
Reporter: Navis
Priority: Minor


Kryo fails with buffer overflow even with max value (2G).

{noformat}
org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. 
Available: 0, required: 1
Serialization trace:
containsChild (org.apache.spark.sql.catalyst.expressions.BoundReference)
child (org.apache.spark.sql.catalyst.expressions.SortOrder)
array (scala.collection.mutable.ArraySeq)
ordering (org.apache.spark.sql.catalyst.expressions.InterpretedOrdering)
interpretedOrdering (org.apache.spark.sql.types.StructType)
schema (org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema). To 
avoid this, increase spark.kryoserializer.buffer.max value.
at 
org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:263)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
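
A sketch of the likely fix inside StructType (assuming the field is defined as 
in Spark 1.5):
{code}
// sketch: mark the ordering @transient so serializing a row's schema no
// longer drags the InterpretedOrdering expression tree along with it
@transient
private[sql] lazy val interpretedOrdering =
  InterpretedOrdering.forSchema(this.fields.map(_.dataType))
{code}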



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10679) javax.jdo.JDOFatalUserException in executor

2015-09-17 Thread Navis (JIRA)
Navis created SPARK-10679:
-

 Summary: javax.jdo.JDOFatalUserException in executor
 Key: SPARK-10679
 URL: https://issues.apache.org/jira/browse/SPARK-10679
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Navis
Priority: Minor


HadoopRDD throws an exception in the executor, something like below.
{noformat}
15/09/17 18:51:21 INFO metastore.HiveMetaStore: 0: Opening raw store with 
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/09/17 18:51:21 INFO metastore.ObjectStore: ObjectStore, initialize called
15/09/17 18:51:21 WARN metastore.HiveMetaStore: Retrying creating default 
database after error: Class 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
javax.jdo.JDOFatalUserException: Class 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
at 
org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
at 
org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:803)
at 
org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:782)
at 
org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:298)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
at 

[jira] [Commented] (SPARK-9032) scala.MatchError in DataFrameReader.json(String path)

2015-08-26 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712712#comment-14712712
 ] 

Navis commented on SPARK-9032:
--

Seemingly fixed by SPARK-8093

 scala.MatchError in DataFrameReader.json(String path)
 -

 Key: SPARK-9032
 URL: https://issues.apache.org/jira/browse/SPARK-9032
 Project: Spark
  Issue Type: Bug
  Components: Java API, SQL
Affects Versions: 1.4.0
 Environment: Ubuntu 15.04
Reporter: Philipp Poetter

 Executing read().json() on SQLContext (i.e. the DataFrameReader) raises a 
 MatchError with a stack trace as follows while trying to read JSON data:
 15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
 have all completed, from pool 
 15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, 
 took 6.981330 s
 Exception in thread "main" scala.MatchError: StringType (of class 
 org.apache.spark.sql.types.StringType$)
   at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
   at 
 org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
   at 
 org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
   at scala.Option.getOrElse(Option.scala:120)
   at 
 org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
   at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
   at 
 org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
   at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
   at com.hp.sparkdemo.Example.main(Example.java:23)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:497)
   at 
 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
   at 
 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
 15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
 15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all 
 executors
 15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to 
 shut down
 15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: 
 MapOutputTrackerMasterEndpoint stopped!
 Offending code snippet (around line 23):
 ...
 JavaSparkContext sctx = new JavaSparkContext(sparkConf);
 SQLContext ctx = new SQLContext(sctx);
 DataFrame frame = ctx.read().json(facebookJSON);
 frame.printSchema();
 ...
 The exception is reproducible using the following JSON:
 {
   "data": [
     {
       "id": "X999_Y999",
       "from": {
         "name": "Tom Brady", "id": "X12"
       },
       "message": "Looking forward to 2010!",
       "actions": [
         {
           "name": "Comment",
           "link": "http://www.facebook.com/X999/posts/Y999"
         },
         {
           "name": "Like",
           "link": "http://www.facebook.com/X999/posts/Y999"
         }
       ],
       "type": "status",
       "created_time": "2010-08-02T21:27:44+0000",
       "updated_time": "2010-08-02T21:27:44+0000"
     },
     {
       "id": "X998_Y998",
       "from": {
         "name": "Peyton Manning", "id": "X18"
       },
       "message": "Where's my contract?",
       "actions": [
         {
           "name": "Comment",
           "link": "http://www.facebook.com/X998/posts/Y998"
         },
         {
           "name": "Like",
           "link": "http://www.facebook.com/X998/posts/Y998"
         }
       ],
       "type": "status",
       "created_time": "2010-08-02T21:27:44+0000",
       "updated_time": "2010-08-02T21:27:44+0000"
     }
   ]
 }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9685) Unsupported dataType: char(X) in Hive

2015-08-24 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated SPARK-9685:
-
Attachment: SPARK-9685.1.patch.txt

I've also met the situation of handling the char type in Spark a few months 
ago. The attached patch worked for me, but I'm not sure it's a complete 
package.

 Unsupported dataType: char(X) in Hive
 ---

 Key: SPARK-9685
 URL: https://issues.apache.org/jira/browse/SPARK-9685
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Ángel Álvarez
 Attachments: SPARK-9685.1.patch.txt


 I'm getting the following error when I try to read a Hive table with char(X) 
 fields:
 {code}
 15/08/06 11:38:51 INFO parse.ParseDriver: Parse Completed
 org.apache.spark.sql.types.DataTypeException: Unsupported dataType: char(8). 
 If you have a struct and a field name of it has any special characters, 
 please use backticks (`) to quote that field name, e.g. `x+y`. Please note 
 that backtick itself is not supported in a field name.
 at 
 org.apache.spark.sql.types.DataTypeParser$class.toDataType(DataTypeParser.scala:95)
 at 
 org.apache.spark.sql.types.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:107)
 at 
 org.apache.spark.sql.types.DataTypeParser$.parse(DataTypeParser.scala:111)
 at 
 org.apache.spark.sql.hive.HiveMetastoreTypes$.toDataType(HiveMetastoreCatalog.scala:769)
 at 
 org.apache.spark.sql.hive.MetastoreRelation$SchemaAttribute.toAttribute(HiveMetastoreCatalog.scala:742)
 at 
 org.apache.spark.sql.hive.MetastoreRelation$$anonfun$44.apply(HiveMetastoreCatalog.scala:752)
 at 
 org.apache.spark.sql.hive.MetastoreRelation$$anonfun$44.apply(HiveMetastoreCatalog.scala:752)
 {code}
 It seems there is no char DataType defined in the DataTypeParser class
 {code}
   protected lazy val primitiveType: Parser[DataType] =
     "(?i)string".r ^^^ StringType |
     "(?i)float".r ^^^ FloatType |
     "(?i)(?:int|integer)".r ^^^ IntegerType |
     "(?i)tinyint".r ^^^ ByteType |
     "(?i)smallint".r ^^^ ShortType |
     "(?i)double".r ^^^ DoubleType |
     "(?i)(?:bigint|long)".r ^^^ LongType |
     "(?i)binary".r ^^^ BinaryType |
     "(?i)boolean".r ^^^ BooleanType |
     fixedDecimalType |
     "(?i)decimal".r ^^^ DecimalType.USER_DEFAULT |
     "(?i)date".r ^^^ DateType |
     "(?i)timestamp".r ^^^ TimestampType |
     varchar
 {code}
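 A hypothetical sketch of the missing rule, mirroring the varchar parser that 
 the class already has:
 {code}
 // hypothetical: accept char(n) and map it to StringType, like varchar
 protected lazy val char: Parser[DataType] =
   "(?i)char".r ~> "(" ~> "[0-9]+".r <~ ")" ^^^ StringType
 {code}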



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10151) Support invocation of hive macro

2015-08-21 Thread Navis (JIRA)
Navis created SPARK-10151:
-

 Summary: Support invocation of hive macro
 Key: SPARK-10151
 URL: https://issues.apache.org/jira/browse/SPARK-10151
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Navis
Priority: Minor


A macro in Hive (GenericUDFMacro) contains the real function inside it, but 
the function is not conveyed to tasks, resulting in a NullPointerException.
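
An illustrative repro, assuming a HiveContext named {{hiveContext}}:
{code}
// the macro is created fine, but evaluating it inside tasks hits the
// missing function, producing a NullPointerException
hiveContext.sql("CREATE TEMPORARY MACRO plus_one(x INT) x + 1")
hiveContext.sql("SELECT plus_one(key) FROM src").show()
{code}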



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10152) Support Init script for hive-thriftserver

2015-08-21 Thread Navis (JIRA)
Navis created SPARK-10152:
-

 Summary: Support Init script for hive-thriftserver
 Key: SPARK-10152
 URL: https://issues.apache.org/jira/browse/SPARK-10152
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Trivial


If some queries could be executed on the Thrift server in the initialization 
stage (mostly for registering functions or macros), things would be much 
easier.

Not a big feature to include in Spark, but hopefully someone can make use of 
this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8826) Fix ClassCastException in GeneratedAggregate

2015-07-04 Thread Navis (JIRA)
Navis created SPARK-8826:


 Summary: Fix ClassCastException in GeneratedAggregate
 Key: SPARK-8826
 URL: https://issues.apache.org/jira/browse/SPARK-8826
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Navis
Priority: Trivial


When codegen is disabled, a ClassCastException is thrown in some cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8357) Memory leakage on unsafe aggregation path with empty input

2015-06-13 Thread Navis (JIRA)
Navis created SPARK-8357:


 Summary: Memory leakage on unsafe aggregation path with empty input
 Key: SPARK-8357
 URL: https://issues.apache.org/jira/browse/SPARK-8357
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Navis
Priority: Minor


Currently, the unsafe-based hash map is released on the 'next' call, but if 
the input is empty, it is never called.
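
A sketch of one defensive fix (here {{hashMap}} stands in for the unsafe 
aggregation map):
{code}
import org.apache.spark.TaskContext

// sketch: tie the map's lifetime to the task, so an empty input (where
// next() never runs) still releases the memory
TaskContext.get().addTaskCompletionListener { _ =>
  hashMap.free()
}
{code}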



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8334) Binary logical plan should provide more realistic statistics

2015-06-12 Thread Navis (JIRA)
Navis created SPARK-8334:


 Summary: Binary logical plan should provide more realistic 
statistics
 Key: SPARK-8334
 URL: https://issues.apache.org/jira/browse/SPARK-8334
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Minor


Currently, spark-sql multiplies the sizes of the two children for the output 
size, which makes the following join hash the other side of the input. In 
multi-way joins like (A join B) join C, C will be marked as the 'buildHash' 
target if the size of C is not bigger than (size of A * size of B). Some 
TPC-H query results are greatly affected by this.
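
For contrast, a sketch of a more realistic default for a binary node such as a 
join (not Spark's actual code, which takes the product of the children's 
sizes):
{code}
// sketch: sum the children's sizes instead of multiplying them
override def statistics: Statistics = Statistics(
  sizeInBytes = left.statistics.sizeInBytes + right.statistics.sizeInBytes)
{code}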



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8312) Populate statistics info of hive tables if it's needed to be

2015-06-11 Thread Navis (JIRA)
Navis created SPARK-8312:


 Summary: Populate statistics info of hive tables if it's needed to 
be
 Key: SPARK-8312
 URL: https://issues.apache.org/jira/browse/SPARK-8312
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Minor


Currently, spark-sql uses stats in the metastore for estimating the size of a 
hive table, which means the analyze command should be executed before 
accessing the table for better planning, especially for joins. But even with 
the stats, it cannot reflect the real input size of the query when a partition 
pruning predicate exists in it.

Even worse is that hive cannot update megastore stats for external tables, 
which was fixed recently in HIVE-6727. The issue detail says the bug applies 
to all hive versions between 0.13.0 and 1.2.0.
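
For reference, the analyze step mentioned above looks like this from a 
HiveContext (the table name is a placeholder):
{code}
// collect size stats so the planner can pick the smaller join side
hiveContext.sql("ANALYZE TABLE my_table COMPUTE STATISTICS noscan")
{code}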



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8312) Populate statistics info of hive tables if it's needed to be

2015-06-11 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated SPARK-8312:
-
Description: 
Currently, spark-sql uses stats in the metastore for estimating the size of a 
hive table, which means the analyze command should be executed before 
accessing the table for better planning, especially for joins. But even with 
the stats, it cannot reflect the real input size of the query when a partition 
pruning predicate exists in it.

Even worse is that hive cannot update metastore stats for external tables, 
which was fixed recently in HIVE-6727. The issue detail says the bug applies 
to all hive versions between 0.13.0 and 1.2.0.

  was:
Currently, spark-sql uses stats in the metastore for estimating the size of a 
hive table, which means the analyze command should be executed before 
accessing the table for better planning, especially for joins. But even with 
the stats, it cannot reflect the real input size of the query when a partition 
pruning predicate exists in it.

Even worse is that hive cannot update megastore stats for external tables, 
which was fixed recently in HIVE-6727. The issue detail says the bug applies 
to all hive versions between 0.13.0 and 1.2.0.


 Populate statistics info of hive tables if it's needed to be
 

 Key: SPARK-8312
 URL: https://issues.apache.org/jira/browse/SPARK-8312
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Minor

 Currently, spark-sql uses stats in the metastore for estimating the size of a 
 hive table, which means the analyze command should be executed before 
 accessing the table for better planning, especially for joins. But even with 
 the stats, it cannot reflect the real input size of the query when a 
 partition pruning predicate exists in it.
 Even worse is that hive cannot update metastore stats for external tables, 
 which was fixed recently in HIVE-6727. The issue detail says the bug applies 
 to all hive versions between 0.13.0 and 1.2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8285) CombineSum should be calculated as unlimited decimal first

2015-06-09 Thread Navis (JIRA)
Navis created SPARK-8285:


 Summary: CombineSum should be calculated as unlimited decimal first
 Key: SPARK-8285
 URL: https://issues.apache.org/jira/browse/SPARK-8285
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Navis
Priority: Trivial


{code:title=GeneratedAggregate.scala}
case cs @ CombineSum(expr) =>
  val calcType = expr.dataType
    expr.dataType match {
      case DecimalType.Fixed(_, _) =>
        DecimalType.Unlimited
      case _ =>
        expr.dataType
    }
{code}
calcType is always expr.dataType. Credits all belong to IntelliJ.
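
Presumably the intended version, with the match result actually feeding 
calcType:
{code}
case cs @ CombineSum(expr) =>
  val calcType = expr.dataType match {
    case DecimalType.Fixed(_, _) =>
      DecimalType.Unlimited
    case _ =>
      expr.dataType
  }
{code}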



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8153) Add configuration for disabling partial aggregation in runtime

2015-06-07 Thread Navis (JIRA)
Navis created SPARK-8153:


 Summary: Add configuration for disabling partial aggregation in 
runtime
 Key: SPARK-8153
 URL: https://issues.apache.org/jira/browse/SPARK-8153
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Trivial


The same thing as hive.map.aggr.hash.min.reduction in Hive, which disables 
hash aggregation at runtime if it does not sufficiently reduce the output size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7936) Add configuration for initial size and limit of hash for aggregation

2015-05-29 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated SPARK-7936:
-
Summary: Add configuration for initial size and limit of hash for 
aggregation  (was: Add configuration for initial size of hash for aggregation 
and limit)

 Add configuration for initial size and limit of hash for aggregation
 

 Key: SPARK-7936
 URL: https://issues.apache.org/jira/browse/SPARK-7936
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Minor

 Partial aggregation takes a lot of memory and mostly cannot be completed if 
 the input is not sliced into very small (and therefore very many) partitions. 
 This patch limits the number of entries in the hash for partial aggregation. 
 The initial size for the hash is just a bonus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit

2015-05-29 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564271#comment-14564271
 ] 

Navis commented on SPARK-7936:
--

Added two configurations:
1. spark.sql.aggregation.hash.initSize : the initial size of the hash, applied 
to both final and partial aggregation
2. spark.sql.partial.aggregation.maxEntry : the max number of entries in the 
hash for partial aggregation; should not be used for final aggregation
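
A usage sketch for the two settings named above (they exist only with the 
attached patch applied, not in stock Spark):
{code}
sqlContext.setConf("spark.sql.aggregation.hash.initSize", "1024")
sqlContext.setConf("spark.sql.partial.aggregation.maxEntry", "100000")
{code}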

 Add configuration for initial size of hash for aggregation and limit
 

 Key: SPARK-7936
 URL: https://issues.apache.org/jira/browse/SPARK-7936
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Minor

 Partial aggregation takes a lot of memory and mostly cannot be completed if 
 the input is not sliced into very small (and therefore very many) partitions. 
 This patch limits the number of entries in the hash for partial aggregation. 
 The initial size for the hash is just a bonus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit

2015-05-28 Thread Navis (JIRA)
Navis created SPARK-7936:


 Summary: Add configuration for initial size of hash for 
aggregation and limit
 Key: SPARK-7936
 URL: https://issues.apache.org/jira/browse/SPARK-7936
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Navis
Priority: Minor


Partial aggregation takes a lot of memory and mostly cannot be completed if 
the input is not sliced into very small (and therefore very many) partitions. 
This patch limits the number of entries in the hash for partial aggregation. 
The initial size for the hash is just a bonus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org