[jira] [Commented] (SPARK-43865) spark cluster deploy mode cannot initialize metastore java.sql.SQLException: No suitable driver found for jdbc:mysql

2023-06-01 Thread pin_zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728234#comment-17728234
 ] 

pin_zhang commented on SPARK-43865:
---

It's not convenient to upload the jar to all worker nodes, because the jars are 
distributed dynamically according to configuration.
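A possible workaround (untested here; it assumes the driver jar can be pre-staged at a 
fixed local path on every node that may run the driver, which is exactly the inconvenient 
part) is to put the MySQL driver on the JVM system classpath via extraClassPath instead of 
spark.jars, for example (the /opt/jars path is only a placeholder):

spark.driver.extraClassPath=/opt/jars/mysql-connector-java-5.1.47.jar
spark.executor.extraClassPath=/opt/jars/mysql-connector-java-5.1.47.jar

Classes on the system classpath are visible to the metastore code, so DriverManager can 
find the driver there.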

> spark cluster deploy mode cannot initialize metastore java.sql.SQLException: 
> No suitable driver found for jdbc:mysql
> 
>
> Key: SPARK-43865
> URL: https://issues.apache.org/jira/browse/SPARK-43865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: pin_zhang
>Priority: Major
>
> 1. Test with JDK 11 + SPARK340
> object BugHS {
>   def main(args: Array[String]): Unit = {
> val conf = new SparkConf()
> 
> conf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false")
> conf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver")
> conf.set("javax.jdo.option.ConnectionUserName","**")
> conf.set("javax.jdo.option.ConnectionPassword","**")
> conf.set("spark.sql.hive.thriftServer.singleSession","false")
> conf.set("spark.sql.warehouse.dir","hdfs://hadoop/warehouse_spark3")
> import org.apache.spark.sql.SparkSession
> val spark = SparkSession
>   .builder()
>   .appName("Test").config(conf).enableHiveSupport()
>   .getOrCreate()
> HiveThriftServer2.startWithContext(spark.sqlContext)
> spark.sql("create table IF NOT EXISTS test2 (id int) USING parquet")
>   }
> }
> 2. Submit in cluster mode
>a. spark_config.properties 
> spark.master=spark://master:6066
> 
> spark.jars=hdfs://hadoop/tmp/test_bug/mysql-connector-java-5.1.47.jar
> spark.master.rest.enabled=true
>b. spark-submit2.cmd --deploy-mode cluster  --properties-file 
> spark_config.properties  --class com.test.BugHS 
> "hdfs://hadoop/tmp/test_bug/bug_classloader.jar"  
> 3.  Meet "No suitable driver found" exception, caused by classloader is 
> different for driver in spark.jars and metastore jar in JDK 11
> java.sql.SQLException: Unable to open a test connection to the given 
> database. JDBC url = jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false, 
> username = root. Terminating connection pool (set lazyInit to true if you 
> expect to start your database after your app). Original Exception: --
> java.sql.SQLException: No suitable driver found for 
> jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false
>   at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
>   at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
>   at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
>   at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
>   at 
> com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:483)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:297)
>   at 
> jdk.internal.reflect.GeneratedConstructorAccessor77.newInstance(Unknown 
> Source)






[jira] [Updated] (SPARK-43865) spark cluster deploy mode cannot initialize metastore java.sql.SQLException: No suitable driver found for jdbc:mysql

2023-05-29 Thread pin_zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-43865:
--
Description: 
1. Test with JDK 11 + Spark 3.4.0
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object BugHS {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.set("javax.jdo.option.ConnectionURL", "jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false")
    conf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver")
    conf.set("javax.jdo.option.ConnectionUserName", "**")
    conf.set("javax.jdo.option.ConnectionPassword", "**")
    conf.set("spark.sql.hive.thriftServer.singleSession", "false")
    conf.set("spark.sql.warehouse.dir", "hdfs://hadoop/warehouse_spark3")
    val spark = SparkSession
      .builder()
      .appName("Test").config(conf).enableHiveSupport()
      .getOrCreate()
    HiveThriftServer2.startWithContext(spark.sqlContext)
    spark.sql("create table IF NOT EXISTS test2 (id int) USING parquet")
  }
}
2. Submit in cluster mode
   a. spark_config.properties 
spark.master=spark://master:6066

spark.jars=hdfs://hadoop/tmp/test_bug/mysql-connector-java-5.1.47.jar
spark.master.rest.enabled=true
   b. spark-submit2.cmd --deploy-mode cluster  --properties-file 
spark_config.properties  --class com.test.BugHS 
"hdfs://hadoop/tmp/test_bug/bug_classloader.jar"  

3.  Meet "No suitable driver found" exception, caused by classloader is 
different for driver in spark.jars and metastore jar in JDK 11
java.sql.SQLException: Unable to open a test connection to the given database. 
JDBC url = jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false, username = 
root. Terminating connection pool (set lazyInit to true if you expect to start 
your database after your app). Original Exception: --
java.sql.SQLException: No suitable driver found for 
jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:189)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
at 
com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:483)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:297)
at 
jdk.internal.reflect.GeneratedConstructorAccessor77.newInstance(Unknown Source)
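For reference, a minimal sketch (not part of the original report; the object name is made 
up) of why the lookup fails: DriverManager only hands back a driver whose class is 
resolvable through the caller's classloader, so a driver that lives only on the spark.jars 
classloader stays invisible to the metastore/BoneCP code that opens the connection.

object DriverVisibility {
  // Returns true if className can be resolved from cl; this mirrors the check
  // DriverManager performs against its caller's classloader before it will
  // return a registered driver.
  def visibleFrom(className: String, cl: ClassLoader): Boolean =
    try { Class.forName(className, false, cl); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    val driver = "com.mysql.jdbc.Driver"
    // classloader that loaded the application / spark.jars in cluster mode
    val appCl = getClass.getClassLoader
    // JVM system classpath (spark.*.extraClassPath, $SPARK_HOME/jars)
    val sysCl = ClassLoader.getSystemClassLoader
    println(s"visible from application classloader: ${visibleFrom(driver, appCl)}")
    println(s"visible from system classloader:      ${visibleFrom(driver, sysCl)}")
  }
}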


  was:
1. Test with JDK 11 + SPARK340

object BugHS {

  def main(args: Array[String]): Unit = {

val conf = new SparkConf()

conf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false")
conf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver")
conf.set("javax.jdo.option.ConnectionUserName","**")
conf.set("javax.jdo.option.ConnectionPassword","**")
conf.set("spark.sql.hive.thriftServer.singleSession","false")
conf.set("spark.sql.warehouse.dir","hdfs://hadoop/warehouse_spark3")
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("Test").config(conf).enableHiveSupport()
  .getOrCreate()
HiveThriftServer2.startWithContext(spark.sqlContext)
spark.sql("create table IF NOT EXISTS test2 (id int) USING parquet")
  }

}
3. Submit in cluster mode
spark.master=spark\://10.111.7.150\:6066
spark.jars=hdfs\://10.111.7.150\:8020/tmp/test_bug/mysql-connector-java-5.1.47.jar
spark.master.rest.enabled=true




> spark cluster deploy mode cannot initialize metastore java.sql.SQLException: 
> No suitable driver found for jdbc:mysql
> 
>
> Key: SPARK-43865
> URL: https://issues.apache.org/jira/browse/SPARK-43865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: pin_zhang
>Priority: Major
>
> 1. Test with JDK 11 + SPARK340
> object BugHS {
>   def main(args: Array[String]): Unit = {
> val conf = new SparkConf()
> 
> conf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false")
> conf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver")
> conf.set("javax.jdo.option.ConnectionUserName","**")
> conf.set("javax.jdo.option.ConnectionPassword","**")
> conf.set("spark.sql.hive.thriftServer.singleSession","false")
> conf.set("spark.sql.warehouse.dir","hdfs://hadoop/warehouse_spark3")
> import org.apache.spark.sql.SparkSession
> val spark = SparkSession
>   .builder()
>   

[jira] [Updated] (SPARK-43865) spark cluster deploy mode cannot initialize metastore java.sql.SQLException: No suitable driver found for jdbc:mysql

2023-05-29 Thread pin_zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-43865:
--
Description: 
1. Test with JDK 11 + SPARK340

object BugHS {

  def main(args: Array[String]): Unit = {

val conf = new SparkConf()

conf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false")
conf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver")
conf.set("javax.jdo.option.ConnectionUserName","**")
conf.set("javax.jdo.option.ConnectionPassword","**")
conf.set("spark.sql.hive.thriftServer.singleSession","false")
conf.set("spark.sql.warehouse.dir","hdfs://hadoop/warehouse_spark3")
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("Test").config(conf).enableHiveSupport()
  .getOrCreate()
HiveThriftServer2.startWithContext(spark.sqlContext)
spark.sql("create table IF NOT EXISTS test2 (id int) USING parquet")
  }

}
3. Submit in cluster mode
spark.master=spark\://10.111.7.150\:6066
spark.jars=hdfs\://10.111.7.150\:8020/tmp/test_bug/mysql-connector-java-5.1.47.jar
spark.master.rest.enabled=true



> spark cluster deploy mode cannot initialize metastore java.sql.SQLException: 
> No suitable driver found for jdbc:mysql
> 
>
> Key: SPARK-43865
> URL: https://issues.apache.org/jira/browse/SPARK-43865
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: pin_zhang
>Priority: Major
>
> 1. Test with JDK 11 + SPARK340
> object BugHS {
>   def main(args: Array[String]): Unit = {
> val conf = new SparkConf()
> 
> conf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://mysql:3306/hive_ms_spark3?useSSL=false")
> conf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver")
> conf.set("javax.jdo.option.ConnectionUserName","**")
> conf.set("javax.jdo.option.ConnectionPassword","**")
> conf.set("spark.sql.hive.thriftServer.singleSession","false")
> conf.set("spark.sql.warehouse.dir","hdfs://hadoop/warehouse_spark3")
> import org.apache.spark.sql.SparkSession
> val spark = SparkSession
>   .builder()
>   .appName("Test").config(conf).enableHiveSupport()
>   .getOrCreate()
> HiveThriftServer2.startWithContext(spark.sqlContext)
> spark.sql("create table IF NOT EXISTS test2 (id int) USING parquet")
>   }
> }
> 3. Submit in cluster mode
> spark.master=spark\://10.111.7.150\:6066
> spark.jars=hdfs\://10.111.7.150\:8020/tmp/test_bug/mysql-connector-java-5.1.47.jar
> spark.master.rest.enabled=true






[jira] [Created] (SPARK-43865) spark cluster deploy mode cannot initialize metastore java.sql.SQLException: No suitable driver found for jdbc:mysql

2023-05-29 Thread pin_zhang (Jira)
pin_zhang created SPARK-43865:
-

 Summary: spark cluster deploy mode cannot initialize metastore 
java.sql.SQLException: No suitable driver found for jdbc:mysql
 Key: SPARK-43865
 URL: https://issues.apache.org/jira/browse/SPARK-43865
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: pin_zhang









[jira] [Updated] (SPARK-41168) Spark Master OOM when Worker No space left on device

2022-11-16 Thread pin_zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-41168:
--
Description: 
Spark master + Spark workers (one worker has "No space left on device")
1.  Submit an app with 2 instances
2.  Stop the good worker 
3.  The Spark master keeps launching executors continuously
 The bad Spark worker cannot create the executor work folder.
 This results in a large number of executors kept in Spark master memory (see the note 
below the log)
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED
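For what it's worth, standalone mode has a spark.deploy.maxExecutorRetries setting (default 
10 according to the docs) that is supposed to make the master give up on an application 
after repeated consecutive executor failures; in the run above it clearly did not stop the 
loop, which may be part of this bug. A config sketch (the value is only an example):

spark.deploy.maxExecutorRetries=10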


  was:
Spark master + 1 Spark workers (One is No space left on device)
1.  Submit a app with 2 instances
2.  Stop the good worker 
3.  Spark master launch executor continously
 Result in large number of executors kept  in spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED



> Spark Master OOM when Worker No space left on device
> 
>
> Key: SPARK-41168
> URL: https://issues.apache.org/jira/browse/SPARK-41168
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> Spark master + Spark workers (one worker has "No space left on device")
> 1.  Submit an app with 2 instances
> 2.  Stop the good worker 
> 3.  The Spark master keeps launching executors continuously
>  The bad Spark worker cannot create the executor work folder.
>  This results in a large number of executors kept in Spark master memory
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93441 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93441 because it is FAILED
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93442 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93442 because it is FAILED






[jira] [Updated] (SPARK-41168) Spark Master OOM when Worker No space left on device

2022-11-16 Thread pin_zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-41168:
--
Description: 
Spark master + Spark workers (one worker has "No space left on device")
1.  Submit an app with 2 instances
2.  Stop the good worker 
3.  The Spark master keeps launching executors continuously
 This results in a large number of executors kept in Spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED


  was:
Spark master + 1 Spark workers (One is )
Submit a app with 2 instance, 
1. Spark worker SPARK-41168
Caused by large number of executors kept  in spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED



> Spark Master OOM when Worker No space left on device
> 
>
> Key: SPARK-41168
> URL: https://issues.apache.org/jira/browse/SPARK-41168
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> Spark master + Spark workers (one worker has "No space left on device")
> 1.  Submit an app with 2 instances
> 2.  Stop the good worker 
> 3.  The Spark master keeps launching executors continuously
>  This results in a large number of executors kept in Spark master memory
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93441 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93441 because it is FAILED
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93442 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93442 because it is FAILED






[jira] [Updated] (SPARK-41168) Spark Master OOM when Worker No space left on device

2022-11-16 Thread pin_zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-41168:
--
Description: 
Spark master + 1 Spark workers (One is )
Submit a app with 2 instance, 
1. Spark worker SPARK-41168
Caused by large number of executors kept  in spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED


  was:
1. Spark worker 
Caused by large number of executors kept  in spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED



> Spark Master OOM when Worker No space left on device
> 
>
> Key: SPARK-41168
> URL: https://issues.apache.org/jira/browse/SPARK-41168
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> Spark master + 1 Spark workers (One is )
> Submit a app with 2 instance, 
> 1. Spark worker SPARK-41168
> Caused by large number of executors kept  in spark master memory
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93441 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93441 because it is FAILED
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93442 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93442 because it is FAILED






[jira] [Updated] (SPARK-41168) Spark Master OOM when Worker No space left on device

2022-11-16 Thread pin_zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-41168:
--
Description: 
1. Spark worker 
Caused by large number of executors kept  in spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED


  was:
Caused by large number of executors kept  in spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED



> Spark Master OOM when Worker No space left on device
> 
>
> Key: SPARK-41168
> URL: https://issues.apache.org/jira/browse/SPARK-41168
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> 1. Spark worker 
> Caused by large number of executors kept  in spark master memory
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93441 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93441 because it is FAILED
> 2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
> app-20221115221952-0016/93442 on worker 
> worker-20221115202400-10.111.1.10-40011
> 2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
> app-20221115221952-0016/93442 because it is FAILED






[jira] [Created] (SPARK-41168) Spark Master OOM when Worker No space left on device

2022-11-16 Thread pin_zhang (Jira)
pin_zhang created SPARK-41168:
-

 Summary: Spark Master OOM when Worker No space left on device
 Key: SPARK-41168
 URL: https://issues.apache.org/jira/browse/SPARK-41168
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 2.3.1
Reporter: pin_zhang


Caused by a large number of executors kept in Spark master memory
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93441 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93441 because it is FAILED
2022-11-15 22:21:43 INFO  Master:54 - Launching executor 
app-20221115221952-0016/93442 on worker worker-20221115202400-10.111.1.10-40011
2022-11-15 22:21:43 INFO  Master:54 - Removing executor 
app-20221115221952-0016/93442 because it is FAILED







[jira] [Updated] (SPARK-33946) Cannot connect to spark hive after session timeout

2020-12-30 Thread pin_zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-33946:
--
Description: 
Test with hive setting:
* hive.server2.idle.session.timeout=6
* hive.server2.session.check.interval=1
* hive.server2.thrift.max.worker.threads=2
* hive.server2.thrift.min.worker.threads=1

1. Connect with user test1 and hold the connection.
2. Wait 1 minute, then connect with user test2.
3. On the web UI http://localhost:4040/sqlserver/, wait until the session for test1 
is shown as finished.
4. Connect with user test3; the connection is refused with:
  Task has been rejected by ExecutorService 10 times till timedout, reason: 
java.util.concurrent.RejectedExecutionException: Task 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess@71b1ea28 rejected from 
java.util.concurrent.ThreadPoolExecutor@716d9872[Running, pool size = 2, active 
threads = 2, queued tasks = 0, completed tasks = 2]
It seems the session was not removed, even though the UI shows a finished time for it 
(a JDBC sketch of these steps follows below).
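A rough JDBC sketch of the four steps above (not from the original report; the port and 
the wait time are placeholders, since the real values are elided here):

import java.sql.DriverManager

object SessionTimeoutRepro {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val url = "jdbc:hive2://localhost:10000/default"          // placeholder port
    val c1 = DriverManager.getConnection(url, "test1", "")    // 1. hold this connection
    Thread.sleep(70 * 1000)                                    // 2. wait past the idle timeout
    val c2 = DriverManager.getConnection(url, "test2", "")    //    then connect as test2
    // 3. check http://localhost:4040/sqlserver/ until the test1 session shows as finished
    val c3 = DriverManager.getConnection(url, "test3", "")    // 4. this one gets refused
    Seq(c1, c2, c3).foreach(_.close())
  }
}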

 




  was:
Test with hive setting:
* hive.server2.idle.session.timeout=6
* hive.server2.session.check.interval=1
* hive.server2.thrift.max.worker.threads=2
* hive.server2.thrift.min.worker.threads=1

1. Connect with user test1 and hold the connection.
2. Wait  1 minute, Connect to user test2
3. Seen from web ui  http://localhost:4040/sqlserver/, session for  test1 
finished 
4. Connect with user test3, connect was refused 
  Task has been rejected by ExecutorService 10 times till timedout, reason: 
java.util.concurrent.RejectedExecutionException: Task 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess@71b1ea28 rejected from 
java.util.concurrent.ThreadPoolExecutor@716d9872[Running, pool size = 2, active 
threads = 2, queued tasks = 0, completed tasks = 2]
Seems the session was not removed, while UI show finished time for session 

 





> Cannot connect to spark hive after session timeout
> --
>
> Key: SPARK-33946
> URL: https://issues.apache.org/jira/browse/SPARK-33946
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> Test with hive setting:
> * hive.server2.idle.session.timeout=6
> * hive.server2.session.check.interval=1
> * hive.server2.thrift.max.worker.threads=2
> * hive.server2.thrift.min.worker.threads=1
> 1. Connect with user test1 and hold the connection.
> 2. Wait 1 minute, then connect with user test2.
> 3. On the web UI http://localhost:4040/sqlserver/, wait until the session for 
> test1 is shown as finished.
> 4. Connect with user test3; the connection is refused with:
>   Task has been rejected by ExecutorService 10 times till timedout, reason: 
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess@71b1ea28 rejected 
> from java.util.concurrent.ThreadPoolExecutor@716d9872[Running, pool size = 2, 
> active threads = 2, queued tasks = 0, completed tasks = 2]
> It seems the session was not removed, even though the UI shows a finished time for it 
>






[jira] [Created] (SPARK-33946) Cannot connect to spark hive after session timeout

2020-12-30 Thread pin_zhang (Jira)
pin_zhang created SPARK-33946:
-

 Summary: Cannot connect to spark hive after session timeout
 Key: SPARK-33946
 URL: https://issues.apache.org/jira/browse/SPARK-33946
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: pin_zhang


Test with hive setting:
* hive.server2.idle.session.timeout=6
* hive.server2.session.check.interval=1
* hive.server2.thrift.max.worker.threads=2
* hive.server2.thrift.min.worker.threads=1

1. Connect with user test1 and hold the connection.
2. Wait 1 minute, then connect with user test2.
3. On the web UI http://localhost:4040/sqlserver/, the session for test1 shows as 
finished.
4. Connect with user test3; the connection is refused with:
  Task has been rejected by ExecutorService 10 times till timedout, reason: 
java.util.concurrent.RejectedExecutionException: Task 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess@71b1ea28 rejected from 
java.util.concurrent.ThreadPoolExecutor@716d9872[Running, pool size = 2, active 
threads = 2, queued tasks = 0, completed tasks = 2]
It seems the session was not removed, even though the UI shows a finished time for it 

 









[jira] [Commented] (SPARK-25804) JDOPersistenceManager leak when query via JDBC

2020-03-30 Thread pin_zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070938#comment-17070938
 ] 

pin_zhang commented on SPARK-25804:
---

Any comments on this issue?

> JDOPersistenceManager leak when query via JDBC
> --
>
> Key: SPARK-25804
> URL: https://issues.apache.org/jira/browse/SPARK-25804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
> Attachments: image-2018-10-27-01-44-07-972.png
>
>
> 1. start-thriftserver.sh under SPARK2.3.1
> 2. Create Table and insert values
>      create table test_leak (id string, index int);
>      insert into test_leak values('id1',1)
> 3. Create JDBC Client query the table
> import java.sql.*;
> public class HiveClient {
> public static void main(String[] args) throws Exception {
> String driverName = "org.apache.hive.jdbc.HiveDriver";
>  Class.forName(driverName);
>  Connection con = DriverManager.getConnection( 
> "jdbc:hive2://localhost:1/default", "test", "test");
>  Statement stmt = con.createStatement();
>  String sql = "select * from test_leak";
>  int loop = 100;
>  while (loop-- > 0) {
>     ResultSet rs = stmt.executeQuery(sql);
>     rs.next();
>     System.out.println(new java.sql.Timestamp(System.currentTimeMillis()) +" 
> : " +    rs.getString(1));
>    rs.close();
>   if( loop % 100 ==0){
>      Thread.sleep(1);
>   }
> }
> con.close(); 
>  }
>  }
> 4. Dump HS2 heap org.datanucleus.api.jdo.JDOPersistenceManager instances keep 
> increasing.






[jira] [Commented] (SPARK-29423) leak on org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus

2019-10-11 Thread pin_zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949187#comment-16949187
 ] 

pin_zhang commented on SPARK-29423:
---

The same result on Spark 2.4.3.

> leak on  org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
> ---
>
> Key: SPARK-29423
> URL: https://issues.apache.org/jira/browse/SPARK-29423
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> 1.  start server with start-thriftserver.sh
>  2.  JDBC client connect and disconnect to hiveserver2
>  for (int i = 0; i < 1; i++) {
>Connection conn = 
> DriverManager.getConnection("jdbc:hive2://localhost:1", "test", "");
>conn.close();
>  }
> 3.  instance of  
> org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus keep 
> increasing






[jira] [Created] (SPARK-29423) leak on org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus

2019-10-10 Thread pin_zhang (Jira)
pin_zhang created SPARK-29423:
-

 Summary: leak on  
org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
 Key: SPARK-29423
 URL: https://issues.apache.org/jira/browse/SPARK-29423
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: pin_zhang


1.  start server with start-thriftserver.sh
 2.  A JDBC client connects and disconnects to HiveServer2 repeatedly
 for (int i = 0; i < 1; i++) {
   Connection conn = 
DriverManager.getConnection("jdbc:hive2://localhost:1", "test", "");
   conn.close();
 }
3.  instances of 
org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus keep 
increasing






[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-06-14 Thread pin_zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863796#comment-16863796
 ] 

pin_zhang commented on SPARK-21067:
---

We also encountered this issue. Is there any plan to fix this bug?

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0, 2.4.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (Sometime, it fails right away, sometime it 
> work for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
> at 

[jira] [Reopened] (SPARK-27600) Unable to start Spark Hive Thrift Server when multiple hive server server share the same metastore

2019-05-09 Thread pin_zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang reopened SPARK-27600:
---

The issue is not resolved

> Unable to start Spark Hive Thrift Server when multiple hive server server 
> share the same metastore
> --
>
> Key: SPARK-27600
> URL: https://issues.apache.org/jira/browse/SPARK-27600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> When starting ten or more Spark Hive Thrift Servers at the same time, more than 
> one version is saved to the VERSION table, together with the warning: WARN 
> [DataNucleus.Query] (main:) Query for candidates of 
> org.apache.hadoop.hive.metastore.model.MVersionTable and subclasses resulted 
> in no possible candidates
> Exception thrown obtaining schema column information from datastore
> org.datanucleus.exceptions.NucleusDataStoreException: Exception thrown 
> obtaining schema column information from datastore
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 
> 'via_ms.deleteme1556239494724' doesn't exist
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3978)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3914)
>  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
>  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2491)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2449)
>  at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1381)
>  at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2441)
>  at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2339)
>  at com.mysql.jdbc.IterateBlock.doForAll(IterateBlock.java:50)
>  at com.mysql.jdbc.DatabaseMetaData.getColumns(DatabaseMetaData.java:2337)
>  at 
> org.apache.commons.dbcp.DelegatingDatabaseMetaData.getColumns(DelegatingDatabaseMetaData.java:218)
>  at 
> org.datanucleus.store.rdbms.adapter.BaseDatastoreAdapter.getColumns(BaseDatastoreAdapter.java:1532)
>  at 
> org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.refreshTableData(RDBMSSchemaHandler.java:921)
> After that the Hive server cannot start any more because of 
> MetaException(message:Metastore contains multiple versions (2) 






[jira] [Comment Edited] (SPARK-27600) Unable to start Spark Hive Thrift Server when multiple hive server server share the same metastore

2019-05-08 Thread pin_zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835437#comment-16835437
 ] 

pin_zhang edited comment on SPARK-27600 at 5/8/19 9:02 AM:
---

[~hyukjin.kwon] I think this is related to a Hive bug: 
https://issues.apache.org/jira/browse/HIVE-6113

It says "The exception appears when there are several processes working with 
Hive concurrently." Hive's fix was to upgrade the third-party DataNucleus dependency.

Is it a Spark bug if Spark uses Hive 1.2.1?

 


was (Author: pin_zhang):
I think this is relate to a hive bug 
https://issues.apache.org/jira/browse/HIVE-6113

It shows "The exception appears when there are several processes working with 
Hive concurrently." In hive's fix upgrade third-party datanucleus.

Is it a spark's bug if spark use the hive 1.2.1?

 

> Unable to start Spark Hive Thrift Server when multiple hive server server 
> share the same metastore
> --
>
> Key: SPARK-27600
> URL: https://issues.apache.org/jira/browse/SPARK-27600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> When starting ten or more Spark Hive Thrift Servers at the same time, more than 
> one version is saved to the VERSION table, together with the warning: WARN 
> [DataNucleus.Query] (main:) Query for candidates of 
> org.apache.hadoop.hive.metastore.model.MVersionTable and subclasses resulted 
> in no possible candidates
> Exception thrown obtaining schema column information from datastore
> org.datanucleus.exceptions.NucleusDataStoreException: Exception thrown 
> obtaining schema column information from datastore
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 
> 'via_ms.deleteme1556239494724' doesn't exist
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3978)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3914)
>  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
>  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2491)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2449)
>  at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1381)
>  at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2441)
>  at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2339)
>  at com.mysql.jdbc.IterateBlock.doForAll(IterateBlock.java:50)
>  at com.mysql.jdbc.DatabaseMetaData.getColumns(DatabaseMetaData.java:2337)
>  at 
> org.apache.commons.dbcp.DelegatingDatabaseMetaData.getColumns(DelegatingDatabaseMetaData.java:218)
>  at 
> org.datanucleus.store.rdbms.adapter.BaseDatastoreAdapter.getColumns(BaseDatastoreAdapter.java:1532)
>  at 
> org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.refreshTableData(RDBMSSchemaHandler.java:921)
> After that the Hive server cannot start any more because of 
> MetaException(message:Metastore contains multiple versions (2) 






[jira] [Commented] (SPARK-27600) Unable to start Spark Hive Thrift Server when multiple hive server server share the same metastore

2019-05-08 Thread pin_zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835437#comment-16835437
 ] 

pin_zhang commented on SPARK-27600:
---

I think this is related to a Hive bug: 
https://issues.apache.org/jira/browse/HIVE-6113

It says "The exception appears when there are several processes working with 
Hive concurrently." Hive's fix was to upgrade the third-party DataNucleus dependency.

Is it a Spark bug if Spark uses Hive 1.2.1?

 

> Unable to start Spark Hive Thrift Server when multiple hive server server 
> share the same metastore
> --
>
> Key: SPARK-27600
> URL: https://issues.apache.org/jira/browse/SPARK-27600
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> When starting ten or more Spark Hive Thrift Servers at the same time, more than 
> one version is saved to the VERSION table, together with the warning: WARN 
> [DataNucleus.Query] (main:) Query for candidates of 
> org.apache.hadoop.hive.metastore.model.MVersionTable and subclasses resulted 
> in no possible candidates
> Exception thrown obtaining schema column information from datastore
> org.datanucleus.exceptions.NucleusDataStoreException: Exception thrown 
> obtaining schema column information from datastore
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 
> 'via_ms.deleteme1556239494724' doesn't exist
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3978)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3914)
>  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
>  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2491)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2449)
>  at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1381)
>  at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2441)
>  at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2339)
>  at com.mysql.jdbc.IterateBlock.doForAll(IterateBlock.java:50)
>  at com.mysql.jdbc.DatabaseMetaData.getColumns(DatabaseMetaData.java:2337)
>  at 
> org.apache.commons.dbcp.DelegatingDatabaseMetaData.getColumns(DelegatingDatabaseMetaData.java:218)
>  at 
> org.datanucleus.store.rdbms.adapter.BaseDatastoreAdapter.getColumns(BaseDatastoreAdapter.java:1532)
>  at 
> org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.refreshTableData(RDBMSSchemaHandler.java:921)
> After that the Hive server cannot start any more because of 
> MetaException(message:Metastore contains multiple versions (2) 






[jira] [Commented] (SPARK-27553) Operation log is not closed when close session

2019-05-04 Thread pin_zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833044#comment-16833044
 ] 

pin_zhang commented on SPARK-27553:
---

The operation log is not closed when the session is closed.
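A minimal illustration (the file name is made up) of why the cleanup then fails on 
Windows: a file with an open handle cannot be deleted, so as long as the operation log 
stream is still open, FileUtils.forceDelete on the session log dir throws:

import java.io.{File, FileWriter}

object OpenHandleDelete {
  def main(args: Array[String]): Unit = {
    val f = new File("operation_log_example.txt")
    val w = new FileWriter(f)        // handle left open, like the un-closed operation log
    w.write("some log line")
    println(s"delete while handle is open: ${f.delete()}")   // false on Windows
    w.close()
    println(s"delete after closing:        ${f.delete()}")   // true
  }
}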

> Operation log is not closed when close session
> --
>
> Key: SPARK-27553
> URL: https://issues.apache.org/jira/browse/SPARK-27553
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> On Windows
> 1. start spark-shell
> 2. start hive server in shell by 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(spark.sqlContext)
> 3. beeline connect to hive server
>     3.1 connect 
>           beeline -u jdbc:hive2://localhost:1
>     3.2 Run SQL
>           show tables;
>     3.3 quit beeline
>           !quit
> Get exception log
> {code}
>  19/04/24 11:38:22 ERROR HiveSessionImpl: Failed to cleanup ses
> sion log dir: SessionHandle [5827428b-d140-4fc0-8ad4-721c39b3ead0]
> java.io.IOException: Unable to delete file: 
> C:\Users\test\AppData\Local\Temp\test\operation_logs\5827428b-d140-4fc0-8ad4-721c39b3ead0\df9cd631-66e7-4303-9a4
> 1-a09bdefcf888
>  at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
>  at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>  at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>  at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>  at 
> org.apache.hive.service.cli.session.HiveSessionImpl.cleanupSessionLogDir(HiveSessionImpl.java:671)
>  at 
> org.apache.hive.service.cli.session.HiveSessionImpl.close(HiveSessionImpl.java:643)
>  at 
> org.apache.hive.service.cli.session.HiveSessionImplwithUGI.close(HiveSessionImplwithUGI.java:109)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:497)
>  at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>  at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  at com.sun.proxy.$Proxy19.close(Unknown Source)
>  at 
> org.apache.hive.service.cli.session.SessionManager.closeSession(SessionManager.java:280)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.closeSession(SparkSQLSessionManager.scala:76)
>  at org.apache.hive.service.cli.CLIService.closeSession(CLIService.java:237)
>  at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseSession(ThriftCLIService.java:397)
>  at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1273)
>  at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1258)
>  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>  at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>  at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Created] (SPARK-27600) Unable to start Spark Hive Thrift Server when multiple hive server server share the same metastore

2019-04-29 Thread pin_zhang (JIRA)
pin_zhang created SPARK-27600:
-

 Summary: Unable to start Spark Hive Thrift Server when multiple 
hive server server share the same metastore
 Key: SPARK-27600
 URL: https://issues.apache.org/jira/browse/SPARK-27600
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: pin_zhang


When starting ten or more Spark Hive Thrift Servers at the same time, more than 
one version is saved to the VERSION table, together with the warning: WARN [DataNucleus.Query] 
(main:) Query for candidates of 
org.apache.hadoop.hive.metastore.model.MVersionTable and subclasses resulted in 
no possible candidates
Exception thrown obtaining schema column information from datastore
org.datanucleus.exceptions.NucleusDataStoreException: Exception thrown 
obtaining schema column information from datastore

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 
'via_ms.deleteme1556239494724' doesn't exist
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
 at com.mysql.jdbc.Util.getInstance(Util.java:408)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3978)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3914)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2491)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2449)
 at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1381)
 at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2441)
 at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2339)
 at com.mysql.jdbc.IterateBlock.doForAll(IterateBlock.java:50)
 at com.mysql.jdbc.DatabaseMetaData.getColumns(DatabaseMetaData.java:2337)
 at 
org.apache.commons.dbcp.DelegatingDatabaseMetaData.getColumns(DelegatingDatabaseMetaData.java:218)
 at 
org.datanucleus.store.rdbms.adapter.BaseDatastoreAdapter.getColumns(BaseDatastoreAdapter.java:1532)
 at 
org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.refreshTableData(RDBMSSchemaHandler.java:921)

After that the Hive server cannot start any more because of 
MetaException(message:Metastore contains multiple versions (2) 
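A possible way to avoid the race (not verified against this setup; the property names are 
the usual Hive metastore ones, and the exact DataNucleus key depends on the Hive/DataNucleus 
version) is to initialize the metastore schema once up front with Hive's schematool and turn 
off automatic schema creation, so concurrently starting servers never try to create the 
VERSION row themselves:

# run once, before starting any thrift server
schematool -dbType mysql -initSchema

# hive-site.xml / metastore settings
hive.metastore.schema.verification=true
datanucleus.autoCreateSchema=false   (datanucleus.schema.autoCreateAll=false on newer versions)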






[jira] [Created] (SPARK-27553) Operation log is not closed when close session

2019-04-23 Thread pin_zhang (JIRA)
pin_zhang created SPARK-27553:
-

 Summary: Operation log is not closed when close session
 Key: SPARK-27553
 URL: https://issues.apache.org/jira/browse/SPARK-27553
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: pin_zhang


On Windows
1. start spark-shell
2. start hive server in shell by 
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(spark.sqlContext)
3. beeline connect to hive server
    3.1 connect 
          beeline -u jdbc:hive2://localhost:1
    3.2 Run SQL
          show tables;
    3.3 quit beeline
          !quit
Get exception log
 19/04/24 11:38:22 ERROR HiveSessionImpl: Failed to cleanup ses
sion log dir: SessionHandle [5827428b-d140-4fc0-8ad4-721c39b3ead0]
java.io.IOException: Unable to delete file: 
C:\Users\test\AppData\Local\Temp\test\operation_logs\5827428b-d140-4fc0-8ad4-721c39b3ead0\df9cd631-66e7-4303-9a4
1-a09bdefcf888
 at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
 at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
 at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
 at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
 at 
org.apache.hive.service.cli.session.HiveSessionImpl.cleanupSessionLogDir(HiveSessionImpl.java:671)
 at 
org.apache.hive.service.cli.session.HiveSessionImpl.close(HiveSessionImpl.java:643)
 at 
org.apache.hive.service.cli.session.HiveSessionImplwithUGI.close(HiveSessionImplwithUGI.java:109)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
 at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
 at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
 at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
 at com.sun.proxy.$Proxy19.close(Unknown Source)
 at 
org.apache.hive.service.cli.session.SessionManager.closeSession(SessionManager.java:280)
 at 
org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.closeSession(SparkSQLSessionManager.scala:76)
 at org.apache.hive.service.cli.CLIService.closeSession(CLIService.java:237)
 at 
org.apache.hive.service.cli.thrift.ThriftCLIService.CloseSession(ThriftCLIService.java:397)
 at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1273)
 at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1258)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
 at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
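For reference, the beeline steps above can also be driven from a small JDBC client; a minimal sketch, assuming the Hive JDBC driver is on the classpath and the Thrift server listens on the default port 10000 (host/port are placeholders):

import java.sql.DriverManager

object CloseSessionRepro {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val con = DriverManager.getConnection("jdbc:hive2://localhost:10000", "test", "")
    val stmt = con.createStatement()
    val rs = stmt.executeQuery("show tables")
    while (rs.next()) println(rs.getString(1))
    rs.close()
    stmt.close()
    // Closing the connection closes the session; cleaning up the session's
    // operation log dir in HiveSessionImpl.close is what fails above.
    con.close()
  }
}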






[jira] [Updated] (SPARK-25804) JDOPersistenceManager leak when query via JDBC

2018-10-23 Thread pin_zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-25804:
--
Description: 
1. start-thriftserver.sh under SPARK2.3.1

2. Create Table and insert values

     create table test_leak (id string, index int);

     insert into test_leak values('id1',1)

3. Create a JDBC client that queries the table

import java.sql.*;

public class HiveClient {

    public static void main(String[] args) throws Exception {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driverName);
        Connection con = DriverManager.getConnection(
            "jdbc:hive2://localhost:1/default", "test", "test");
        Statement stmt = con.createStatement();
        String sql = "select * from test_leak";
        int loop = 100;
        while (loop-- > 0) {
            ResultSet rs = stmt.executeQuery(sql);
            rs.next();
            System.out.println(new java.sql.Timestamp(System.currentTimeMillis()) + " : " + rs.getString(1));
            rs.close();
            if (loop % 100 == 0) {
                Thread.sleep(1);
            }
        }
        con.close();
    }
}

4. Dump the HS2 heap: org.datanucleus.api.jdo.JDOPersistenceManager instances keep 
increasing.
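A small sketch (not part of the report) of how the instance count can be watched, assuming JDK tools are on the PATH and the HiveServer2 pid is passed as the first argument:

import scala.sys.process._

object CountPersistenceManagers {
  def main(args: Array[String]): Unit = {
    val pid = args(0)
    // jmap -histo prints a heap histogram; filter for the DataNucleus class to
    // see whether the instance count keeps growing between runs of the client.
    val histo = Seq("jmap", "-histo", pid).!!
    histo.split("\n")
      .filter(_.contains("org.datanucleus.api.jdo.JDOPersistenceManager"))
      .foreach(println)
  }
}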

  was:
1. start-thriftserver.sh under SPARK2.3.1

2. Create Table and insert values

     create table test_leak (id string, index int);

     insert into test_leak values('id1',1)

3. Create JDBC Client query the table

import java.sql.*;

public class HiveClient {

public static void main(String[] args) throws Exception {

String driverName = "org.apache.hive.jdbc.HiveDriver";
 Class.forName(driverName);
 Connection con = DriverManager.getConnection( 
"jdbc:hive2://localhost:1/default", "test", "test");
 Statement stmt = con.createStatement();
 String sql = "select * from test_leak";
 int loop = 100;
 while ( loop -- > 0) {
 ResultSet rs = stmt.executeQuery(sql);
 rs.next(); 
 System.out.println(new java.sql.Timestamp(System.currentTimeMillis()) +" : " + 
rs.getString(1));
 rs.close();
 }
 con.close();
 }
}

4. Dump HS2 heap org.datanucleus.api.jdo.JDOPersistenceManager instances keep 
increasing.


> JDOPersistenceManager leak when query via JDBC
> --
>
> Key: SPARK-25804
> URL: https://issues.apache.org/jira/browse/SPARK-25804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
> 1. start-thriftserver.sh under SPARK2.3.1
> 2. Create Table and insert values
>      create table test_leak (id string, index int);
>      insert into test_leak values('id1',1)
> 3. Create JDBC Client query the table
> import java.sql.*;
> public class HiveClient {
> public static void main(String[] args) throws Exception {
> String driverName = "org.apache.hive.jdbc.HiveDriver";
>  Class.forName(driverName);
>  Connection con = DriverManager.getConnection( 
> "jdbc:hive2://localhost:1/default", "test", "test");
>  Statement stmt = con.createStatement();
>  String sql = "select * from test_leak";
>  int loop = 100;
>  while ( loop – > 0) {
>     ResultSet rs = stmt.executeQuery(sql);
>     rs.next();
>     System.out.println(new java.sql.Timestamp(System.currentTimeMillis()) +" 
> : " +    rs.getString(1));
>    rs.close();
>   if( loop % 100 ==0){
>      Thread.sleep(1);
>   }
> }
> con.close(); 
>  }
>  }
> 4. Dump HS2 heap org.datanucleus.api.jdo.JDOPersistenceManager instances keep 
> increasing.






[jira] [Created] (SPARK-25804) JDOPersistenceManager leak when query via JDBC

2018-10-22 Thread pin_zhang (JIRA)
pin_zhang created SPARK-25804:
-

 Summary: JDOPersistenceManager leak when query via JDBC
 Key: SPARK-25804
 URL: https://issues.apache.org/jira/browse/SPARK-25804
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: pin_zhang


1. start-thriftserver.sh under SPARK2.3.1

2. Create Table and insert values

     create table test_leak (id string, index int);

     insert into test_leak values('id1',1)

3. Create a JDBC client that queries the table

import java.sql.*;

public class HiveClient {

public static void main(String[] args) throws Exception {

String driverName = "org.apache.hive.jdbc.HiveDriver";
 Class.forName(driverName);
 Connection con = DriverManager.getConnection( 
"jdbc:hive2://localhost:1/default", "test", "test");
 Statement stmt = con.createStatement();
 String sql = "select * from test_leak";
 int loop = 100;
 while ( loop -- > 0) {
 ResultSet rs = stmt.executeQuery(sql);
 rs.next(); 
 System.out.println(new java.sql.Timestamp(System.currentTimeMillis()) +" : " + 
rs.getString(1));
 rs.close();
 }
 con.close();
 }
}

4. Dump the HS2 heap: org.datanucleus.api.jdo.JDOPersistenceManager instances keep 
increasing.






[jira] [Updated] (SPARK-25169) Multiple DataFrames cannot write to the same folder concurrently

2018-08-21 Thread pin_zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-25169:
--
Component/s: (was: Spark Core)
 SQL

> Multiple DataFrames cannot write to the same folder concurrently
> 
>
> Key: SPARK-25169
> URL: https://issues.apache.org/jira/browse/SPARK-25169
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
>
>  
> Seems DataFrame writer cannot support write to the same folder concurrently.
> Steps to reproduce
> val sc = new SparkContext(conf)
> val hiveContext = new HiveContext(sc)
> val source="file:///G:/home/json"
> val target ="file:///G:/home/oad"
> new Thread(new Runnable {
>  override def run(): Unit = {
>  hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
>  Thread.sleep(1000L)
>  }
> }).start()
> new Thread(new Runnable {
>  override def run(): Unit = {
>  hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
>  Thread.sleep(1000L)
>  }
> }).start()
> new Thread(new Runnable {
>  override def run(): Unit = {
>  hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
>  Thread.sleep(1000L)
>  }
> }).start()
>  
> Meet exceptions
> java.io.FileNotFoundException: File 
> file:/G:/home/oad/_temporary/0/task_20180821151921_0004_m_01/.part-1-463ee671-0ef0-42ff-8968-1d960bc87996-c000.json.crc
>  does not exist
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
>  
>  






[jira] [Created] (SPARK-25169) Multiple DataFrames cannot write to the same folder concurrently

2018-08-21 Thread pin_zhang (JIRA)
pin_zhang created SPARK-25169:
-

 Summary: Multiple DataFrames cannot write to the same folder 
concurrently
 Key: SPARK-25169
 URL: https://issues.apache.org/jira/browse/SPARK-25169
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.1
Reporter: pin_zhang


 

It seems the DataFrame writer cannot support writing to the same folder concurrently.

Steps to reproduce
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

// SparkConf setup assumed (local mode), following the reporter's other snippets
val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
val source = "file:///G:/home/json"
val target = "file:///G:/home/oad"
new Thread(new Runnable {
  override def run(): Unit = {
    hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
    Thread.sleep(1000L)
  }
}).start()
new Thread(new Runnable {
  override def run(): Unit = {
    hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
    Thread.sleep(1000L)
  }
}).start()
new Thread(new Runnable {
  override def run(): Unit = {
    hiveContext.jsonFile(source).write.mode(SaveMode.Append).json(target)
    Thread.sleep(1000L)
  }
}).start()

 

The following exceptions occur:

java.io.FileNotFoundException: File 
file:/G:/home/oad/_temporary/0/task_20180821151921_0004_m_01/.part-1-463ee671-0ef0-42ff-8968-1d960bc87996-c000.json.crc
 does not exist
 at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
 at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
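The three jobs above write into the same target folder and therefore share its _temporary staging directory, which appears to be what produces the FileNotFoundException on the .crc file. One possible workaround (a sketch only, not a fix from the report; paths are illustrative) is to give each concurrent writer its own directory and join the threads:

import org.apache.spark.sql.{SaveMode, SparkSession}

object SeparateTargets {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SeparateTargets").getOrCreate()
    val source = "file:///G:/home/json"
    // Each job writes to its own sub-directory, so no _temporary area is shared.
    val threads = (1 to 3).map { i =>
      new Thread(new Runnable {
        override def run(): Unit = {
          spark.read.json(source)
            .write.mode(SaveMode.Append)
            .json(s"file:///G:/home/oad/part_$i")
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}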

 

 






[jira] [Created] (SPARK-24749) Cannot filter array with named_struct

2018-07-05 Thread pin_zhang (JIRA)
pin_zhang created SPARK-24749:
-

 Summary: Cannot filter array with named_struct
 Key: SPARK-24749
 URL: https://issues.apache.org/jira/browse/SPARK-24749
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: pin_zhang


1. Create Table

create table arr__int( arr array<struct<a:int>> ) stored as parquet;

2. Insert data

insert into arr__int values( array(named_struct('a', 1)));

3. Filter with struct data

select * from arr__int where array_contains (arr, named_struct('a', 1));
Error: org.apache.spark.sql.AnalysisException: cannot resolve 
'array_contains(arr__int.`arr`, named_struct('a', 1))' due to data type 
mismatch: Arguments must be an array followed by a value of same type as the 
array members; line 1 pos 29;
'Project [*]
+- 'Filter array_contains(arr#6, named_struct(a, 1))
 +- SubqueryAlias arr__int
 +- Relation[arr#6] parquet (state=,code=0)

This appears to be caused by the nullable flag in the schema produced by named_struct always being false.
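A possible interim workaround (not from the report) is to filter by exploding the array instead of calling array_contains; a sketch, assuming a SparkSession named spark and the table above (note it returns one row per matching element):

// Sketch: avoid array_contains by exploding the array and filtering the struct field.
spark.sql(
  """SELECT *
    |FROM arr__int
    |LATERAL VIEW explode(arr) t AS elem
    |WHERE elem.a = 1""".stripMargin).show()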

 






[jira] [Commented] (SPARK-23371) Parquet Footer data is wrong on window in parquet format partition table

2018-02-22 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373862#comment-16373862
 ] 

pin_zhang commented on SPARK-23371:
---

# It is Spark that bundles two versions (1.6 and 1.8) of the Parquet jars in the classpath.
 # With the steps above, data is written with Parquet 1.6 and read with Parquet 1.8.
 # Parquet 1.6 writes a wrong footer in Spark because it cannot load its version info on Windows.
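A small sketch (assuming parquet-hadoop on the classpath, as bundled with Spark) of how the footer can be checked directly: read one written part file and print its created_by string, which per the comment above is just "parquet-mr" with no version on Windows.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader

object PrintCreatedBy {
  def main(args: Array[String]): Unit = {
    // args(0): path to a part-*.parquet file under the part_test table location
    val footer = ParquetFileReader.readFooter(new Configuration(), new Path(args(0)))
    println(footer.getFileMetaData.getCreatedBy)
  }
}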

 

 

 

> Parquet Footer data is wrong on window in parquet format partition table 
> -
>
> Key: SPARK-23371
> URL: https://issues.apache.org/jira/browse/SPARK-23371
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.1.2
>Reporter: pin_zhang
>Priority: Major
>
> On window
> Run SQL in spark shell
>  spark.sql("create table part_test (id string )partitioned by( index int) 
> stored as parquet")
>  spark.sql("insert into part_test partition (index =1) values ('1')")
> Get exception when query spark.sql("select * from part_test ").show()
> For the parquet.Version in parquet-hadoop-bundle-1.6.0.jar cannot load the 
> version info in spark on window. Classloader try to get version in the 
> parquet-format-2.3.0-incubating.jar
> 18/02/09 16:58:48 WARN CorruptStatistics: Ignoring statistics because 
> created_by
>  could not be parsed (see PARQUET-251): parquet-mr
>  org.apache.parquet.VersionParser$VersionParseException: Could not parse 
> created_
>  by: parquet-mr using format: (.+) version ((.*) )?(build ?(.*))
>  at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
>  at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptSt
>  atistics.java:60)
>  at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParq
>  uetStatistics(ParquetMetadataConverter.java:263)
>  at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(Parque
>  tFileReader.java:583)
>  at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetF
>  ileReader.java:513)
>  at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetR
>  ecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
>  at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetR
>  ecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
>  at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetR
>  ecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
>  at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNe
>  xt(RecordReaderIterator.scala:39)
>  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNex
>  t(FileScanRDD.scala:109)
>  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIt
>  erator(FileScanRDD.scala:184)
>  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNex
>  t(FileScanRDD.scala:109)
>  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte
>  rator.scan_nextBatch$(Unknown Source)
>  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte
>  rator.processNext(Unknown Source)
>  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRo
>  wIterator.java:43)
>  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon
>  $1.hasNext(WholeStageCodegenExec.scala:377)
>  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s
>  cala:231)
>  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s
>  cala:225)
>  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap
>  ply$25.apply(RDD.scala:827)
>  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap
>  ply$25.apply(RDD.scala:827)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:
>  38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
>  java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
>  .java:617)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Created] (SPARK-23371) Parquet Footer data is wrong on window in parquet format partition table

2018-02-09 Thread pin_zhang (JIRA)
pin_zhang created SPARK-23371:
-

 Summary: Parquet Footer data is wrong on window in parquet format 
partition table 
 Key: SPARK-23371
 URL: https://issues.apache.org/jira/browse/SPARK-23371
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.2, 2.1.1
Reporter: pin_zhang


On Windows

Run SQL in spark shell
 spark.sql("create table part_test (id string )partitioned by( index int) 
stored as parquet")
 spark.sql("insert into part_test partition (index =1) values ('1')")

An exception occurs when querying with spark.sql("select * from part_test ").show()

The parquet.Version class in parquet-hadoop-bundle-1.6.0.jar cannot load the version info in Spark on Windows, so the classloader tries to read the version from parquet-format-2.3.0-incubating.jar

18/02/09 16:58:48 WARN CorruptStatistics: Ignoring statistics because created_by
 could not be parsed (see PARQUET-251): parquet-mr
 org.apache.parquet.VersionParser$VersionParseException: Could not parse 
created_
 by: parquet-mr using format: (.+) version ((.*) )?(build ?(.*))
 at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
 at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptSt
 atistics.java:60)
 at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParq
 uetStatistics(ParquetMetadataConverter.java:263)
 at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(Parque
 tFileReader.java:583)
 at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetF
 ileReader.java:513)
 at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetR
 ecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
 at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetR
 ecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
 at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetR
 ecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
 at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNe
 xt(RecordReaderIterator.scala:39)
 at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNex
 t(FileScanRDD.scala:109)
 at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIt
 erator(FileScanRDD.scala:184)
 at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNex
 t(FileScanRDD.scala:109)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte
 rator.scan_nextBatch$(Unknown Source)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte
 rator.processNext(Unknown Source)
 at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRo
 wIterator.java:43)
 at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon
 $1.hasNext(WholeStageCodegenExec.scala:377)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s
 cala:231)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s
 cala:225)
 at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap
 ply$25.apply(RDD.scala:827)
 at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap
 ply$25.apply(RDD.scala:827)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:
 38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
 at org.apache.spark.scheduler.Task.run(Task.scala:99)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
 java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
 .java:617)
 at java.lang.Thread.run(Thread.java:745)






[jira] [Updated] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2018-01-16 Thread pin_zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-23086:
--
Description: 
* Hive metastore is MySQL
* Set hive.server2.thrift.max.worker.threads=500
create table test (id string) partitioned by (index int) stored as parquet;
insert into test partition (index=1) values('id1');
 * 100 clients run the SQL "select * from test" against the table (a repro sketch follows below)
 * Most clients (97%) are blocked at HiveExternalCatalog.withClient
 * Is this synchronization expected when the clients only run queries against tables?
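A minimal sketch of that load (assumes the Hive JDBC driver on the classpath; host/port and thread count are placeholders):

import java.sql.DriverManager

object ConcurrentClients {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val threads = (1 to 100).map { _ =>
      new Thread(new Runnable {
        override def run(): Unit = {
          val con = DriverManager.getConnection("jdbc:hive2://localhost:10000", "test", "")
          val stmt = con.createStatement()
          while (true) {
            // Each query triggers a metastore lookup that goes through the
            // synchronized HiveExternalCatalog.withClient block.
            val rs = stmt.executeQuery("select * from test")
            while (rs.next()) {}
            rs.close()
          }
        }
      })
    }
    threads.foreach(_.start())
  }
}

A thread dump taken during the run shows most workers blocked like this: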

"pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
waiting for monitor entry [0x4e19a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
- waiting to lock <0xc06a3ba8> (a 
org.apache.spark.sql.hive.HiveExternalCatalog)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
- locked <0xc41ab748> (a 
org.apache.spark.sql.hive.HiveSessionCatalog)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
- locked <0xff491c48> (a 
org.apache.spark.sql.execution.QueryExecution)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:231)
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at 

[jira] [Created] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2018-01-16 Thread pin_zhang (JIRA)
pin_zhang created SPARK-23086:
-

 Summary: Spark SQL cannot support high concurrency for lock in 
HiveMetastoreCatalog
 Key: SPARK-23086
 URL: https://issues.apache.org/jira/browse/SPARK-23086
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.1.1
 Environment: * Spark 2.2.1
Reporter: pin_zhang


* Hive metastore is MySQL
* Set hive.server2.thrift.max.worker.threads=500
create table test (id string) partitioned by (index int) stored as parquet;
insert into pz_tb partition (index=1) values('id1');
 * 100 clients run the SQL "select * from test" on the cached table
 * Most clients (97%) are blocked at HiveExternalCatalog.withClient
 * Is this synchronization expected when the clients only run queries against tables?

"pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
waiting for monitor entry [0x4e19a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
- waiting to lock <0xc06a3ba8> (a 
org.apache.spark.sql.hive.HiveExternalCatalog)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
- locked <0xc41ab748> (a 
org.apache.spark.sql.hive.HiveSessionCatalog)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
at 
scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
- locked <0xff491c48> (a 
org.apache.spark.sql.execution.QueryExecution)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:231)
at 

[jira] [Created] (SPARK-22420) Spark SQL return invalid json string for struct with date/datetime field

2017-11-01 Thread pin_zhang (JIRA)
pin_zhang created SPARK-22420:
-

 Summary: Spark SQL return invalid json string for struct with 
date/datetime field
 Key: SPARK-22420
 URL: https://issues.apache.org/jira/browse/SPARK-22420
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1
Reporter: pin_zhang
Priority: Normal


Run SQL with a JDBC client against the Spark HiveServer2:

select  named_struct ( 'b',current_timestamp) from test;
+---+--+
| named_struct(b, current_timestamp())  |
+---+--+
| {"b":2017-11-01 23:18:40.988} |
The JSON string is invalid; the date/time value should be quoted.

If the SQL is run in Apache HiveServer2, the expected JSON string is returned:
select  named_struct ( 'b',current_timestamp) from dummy_table ;
+--+--+
|   _c0|
+--+--+
| {"b":"2017-11-01 23:21:24.168"}  |
+--+--+
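For comparison, generating the JSON on the Spark side with functions.to_json (available since Spark 2.1) does quote the timestamp; a sketch, assuming a SparkSession named spark (the exact timestamp format may differ):

import org.apache.spark.sql.functions.{current_timestamp, struct, to_json}

// to_json renders the timestamp as a quoted string inside the JSON object.
val df = spark.range(1)
  .select(to_json(struct(current_timestamp().as("b"))).as("json"))
df.show(false)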







[jira] [Commented] (SPARK-21437) Java Keyword cannot be used in table schema

2017-07-17 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091100#comment-16091100
 ] 

pin_zhang commented on SPARK-21437:
---

Hive doesn't have such a limitation; we can create a table with the SQL "create table `long` ( `long` long)".
Isn't this a Spark bug?

> Java Keyword cannot be used in table schema
> ---
>
> Key: SPARK-21437
> URL: https://issues.apache.org/jira/browse/SPARK-21437
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: pin_zhang
>
> Java keywords doesn't work in spark211 that works in spark 201
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.SparkSession
> case class a(`const`: Int)
> case class b(aa: a)
> object KeyworkdsTest {
>   def main(args: Array[String]): Unit = {
> val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
> val sc = new SparkContext(conf)
> val spark = 
> SparkSession.builder().enableHiveSupport().config(conf).getOrCreate()
> val q = Seq(b(a(1)))
> val rdd = sc.makeRDD(q)
> val d = spark.createDataFrame(rdd)
>   }
> }






[jira] [Created] (SPARK-21437) Java Keyword cannot be used in table schema

2017-07-17 Thread pin_zhang (JIRA)
pin_zhang created SPARK-21437:
-

 Summary: Java Keyword cannot be used in table schema
 Key: SPARK-21437
 URL: https://issues.apache.org/jira/browse/SPARK-21437
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1
Reporter: pin_zhang


Java keywords that work in Spark 2.0.1 no longer work in Spark 2.1.1

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

case class a(`const`: Int)
case class b(aa: a)
object KeyworkdsTest {

  def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = 
SparkSession.builder().enableHiveSupport().config(conf).getOrCreate()
val q = Seq(b(a(1)))
val rdd = sc.makeRDD(q)
val d = spark.createDataFrame(rdd)
  }
}






[jira] [Created] (SPARK-21105) Useless empty files in hive table

2017-06-15 Thread pin_zhang (JIRA)
pin_zhang created SPARK-21105:
-

 Summary: Useless empty files in hive table
 Key: SPARK-21105
 URL: https://issues.apache.org/jira/browse/SPARK-21105
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.1
Reporter: pin_zhang


import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SaveMode, SparkSession}

case class Base(v: Option[Double])

object EmptyFiles {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("scala").setMaster("local[12]")
    val ctx = new SparkContext(conf)
    val spark =
      SparkSession.builder().enableHiveSupport().config(conf).getOrCreate()
    val seq = Seq(Base(Some(1D)), Base(Some(1D)))
    val rdd = ctx.makeRDD[Base](seq)
    import spark.implicits._

    rdd.toDS().write.format("json").mode(SaveMode.Append).saveAsTable("EmptyFiles")
  }
}

// The Dataset write creates many useless empty files for the empty partitions.
// Inserting a small RDD into the table many times results in too many empty files, which slows down queries.
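A common mitigation (a sketch only, reusing the names from the snippet above) is to coalesce or repartition the Dataset before the write so that empty partitions do not become empty output files:

// Sketch: one non-empty partition produces one file instead of many mostly-empty ones.
rdd.toDS()
  .coalesce(1) // or repartition(n) sized to the data volume
  .write.format("json").mode(SaveMode.Append).saveAsTable("EmptyFiles")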






[jira] [Created] (SPARK-18536) Failed to save to hive table when case class with empty field

2016-11-21 Thread pin_zhang (JIRA)
pin_zhang created SPARK-18536:
-

 Summary: Failed to save to hive table when case class with empty 
field
 Key: SPARK-18536
 URL: https://issues.apache.org/jira/browse/SPARK-18536
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.1
Reporter: pin_zhang



1. Test code

import scala.collection.mutable.Queue

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext

case class EmptyC()
case class EmptyCTable(dimensions: EmptyC, timebin: java.lang.Long)

object EmptyTest {

  def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
val ctx = new SparkContext(conf)
val spark = 
SparkSession.builder().enableHiveSupport().config(conf).getOrCreate()
val seq = Seq(EmptyCTable(EmptyC(), 100L))
val rdd = ctx.makeRDD[EmptyCTable](seq)
val ssc = new StreamingContext(ctx, Seconds(1))

val queue = Queue(rdd)
val s = ssc.queueStream(queue, false);
s.foreachRDD((rdd, time) => {
  if (!rdd.isEmpty) {
import spark.sqlContext.implicits._
rdd.toDF.write.mode(SaveMode.Overwrite).saveAsTable("empty_table")
  }
})

ssc.start()
ssc.awaitTermination()

  }

}

2. Exception
Caused by: java.lang.IllegalStateException: Cannot build an empty group
at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
at org.apache.parquet.schema.Types$GroupBuilder.build(Types.java:554)
at org.apache.parquet.schema.Types$GroupBuilder.build(Types.java:426)
at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:527)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetFileFormat.scala:562)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
... 3 more
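Parquet cannot build a group with no fields, which matches the "Cannot build an empty group" message, so the empty case class has no representable schema. One possible workaround sketch (not from the report, reusing the names from the test code above) is to drop the empty struct column before saving:

s.foreachRDD((rdd, time) => {
  if (!rdd.isEmpty) {
    import spark.sqlContext.implicits._
    rdd.toDF
      .drop("dimensions") // EmptyC has no fields, so Parquet cannot store this column
      .write.mode(SaveMode.Overwrite).saveAsTable("empty_table")
  }
})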
 






[jira] [Closed] (SPARK-17398) Failed to query on external JSon Partitioned table

2016-11-21 Thread pin_zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang closed SPARK-17398.
-
   Resolution: Fixed
Fix Version/s: 2.0.1

> Failed to query on external JSon Partitioned table
> --
>
> Key: SPARK-17398
> URL: https://issues.apache.org/jira/browse/SPARK-17398
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: pin_zhang
> Fix For: 2.0.1
>
>
> 1. Create External Json partitioned table 
> with SerDe in hive-hcatalog-core-1.2.1.jar, download fom
> https://mvnrepository.com/artifact/org.apache.hive.hcatalog/hive-hcatalog-core/1.2.1
> 2. Query table meet exception, which works in spark1.5.2
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due 
> to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: 
> Lost task
>  0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> java.util.ArrayList cannot be cast to org.apache.hive.hcatalog.data.HCatRecord
> at 
> org.apache.hive.hcatalog.data.HCatRecordObjectInspector.getStructFieldData(HCatRecordObjectInspector.java:45)
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:430)
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)
>  
> 3. Test Code
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
> object JsonBugs {
>   def main(args: Array[String]): Unit = {
> val table = "test_json"
> val location = "file:///g:/home/test/json"
> val create = s"""CREATE   EXTERNAL  TABLE  ${table}
>  (id string,  seq string )
>   PARTITIONED BY(index int)
>   ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>   LOCATION "${location}" 
>   """
> val add_part = s"""
>  ALTER TABLE ${table} ADD 
>  PARTITION (index=1)LOCATION '${location}/index=1'
> """
> val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
> conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
> val ctx = new SparkContext(conf)
> val hctx = new HiveContext(ctx)
> val exist = hctx.tableNames().map { x => x.toLowerCase() }.contains(table)
> if (!exist) {
>   hctx.sql(create)
>   hctx.sql(add_part)
> } else {
>   hctx.sql("show partitions " + table).show()
> }
> hctx.sql("select * from test_json").show()
>   }
> }






[jira] [Updated] (SPARK-17932) Failed to run SQL "show table extended like table_name" in Spark2.0.0

2016-10-14 Thread pin_zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-17932:
--
Description: 
SQL "show table extended  like table_name " doesn't work in spark 2.0.0
that works in spark1.5.2

Error: org.apache.spark.sql.catalyst.parser.ParseException: 
missing 'FUNCTIONS' at 'extended'(line 1, pos 11)

== SQL ==
show table extended  like test
---^^^ (state=,code=0)
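Until the parser accepts the statement again, roughly equivalent information can be read through the catalog API or DESCRIBE; a sketch, assuming a SparkSession named spark (the output layout differs from the 1.5.2 command):

// Possible stop-gaps for "show table extended like test" in 2.0.0.
spark.catalog.listTables().filter("name = 'test'").show(false)
spark.sql("DESCRIBE EXTENDED test").show(100, false)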




  was:
SQL "show table extended  like table_name " doesn't work in spark 2.0.0
that works in spark1.5.2





> Failed to run SQL "show table extended  like table_name"  in Spark2.0.0
> ---
>
> Key: SPARK-17932
> URL: https://issues.apache.org/jira/browse/SPARK-17932
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: pin_zhang
>
> SQL "show table extended  like table_name " doesn't work in spark 2.0.0
> that works in spark1.5.2
> Error: org.apache.spark.sql.catalyst.parser.ParseException: 
> missing 'FUNCTIONS' at 'extended'(line 1, pos 11)
> == SQL ==
> show table extended  like test
> ---^^^ (state=,code=0)






[jira] [Created] (SPARK-17932) Failed to run SQL "show table extended like table_name" in Spark2.0.0

2016-10-13 Thread pin_zhang (JIRA)
pin_zhang created SPARK-17932:
-

 Summary: Failed to run SQL "show table extended  like table_name"  
in Spark2.0.0
 Key: SPARK-17932
 URL: https://issues.apache.org/jira/browse/SPARK-17932
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: pin_zhang


SQL "show table extended  like table_name " doesn't work in spark 2.0.0
that works in spark1.5.2









[jira] [Commented] (SPARK-12008) Spark hive security authorization doesn't work as Apache hive's

2016-09-11 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482828#comment-15482828
 ] 

pin_zhang commented on SPARK-12008:
---

Does Spark SQL have any plan to support authorization in the near future?

> Spark hive security authorization doesn't work as Apache hive's
> ---
>
> Key: SPARK-12008
> URL: https://issues.apache.org/jira/browse/SPARK-12008
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: pin_zhang
>
> Spark hive security authorization doesn't consistent with apache hive
> The same hive-site.xml
>  
>  hive.security.authorization.enabled
>  true
> 
>
> hive.security.authorization.manager
> org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory
> 
> 
> hive.security.authenticator.manager
> org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator
>   
>
> hive.server2.enable.doAs
> true
> 
> 1. Run spark start-thriftserver.sh, Will meet exception when run sql.
>SQL standards based authorization should not be enabled from hive 
> cliInstead the use of storage based authorization in hive metastore is 
> reccomended. 
>Set hive.security.authorization.enabled=false to disable authz within cli
> 2. Change to start start-thriftserver.sh with hive configurations
> ./start-thriftserver.sh --conf 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory
>  --conf 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator
>  
> 3. Beeline connect with userA and create table tableA.
> 4. Beeline connect with userB to truncate tableA
>   A) In Apache hive, truncate table get exception
>   Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: Principal [name=userB, type=USER] does not have following 
> privileges for operation TRUNCATETABLE [[OBJECT OWNERSHIP] on Object 
> [type=TABLE_OR_VIEW, name=default.tablea]] (state=42000,code=4)
>   B) In Spark hive, any user that can connect to the hive, can truncate, as 
> long as the spark user has privileges.






[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table

2016-09-05 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466180#comment-15466180
 ] 

pin_zhang commented on SPARK-17396:
---

"Thread-1902" daemon prio=6 tid=0x14078800 nid=0x3a6c runnable 
[0x38d5e000]
"Thread-1901" daemon prio=6 tid=0x0c64f800 nid=0x32fc runnable 
[0x191ef000]
"Thread-1900" daemon prio=6 tid=0x14249800 nid=0x263c runnable 
[0x4c73e000]
"Thread-1899" daemon prio=6 tid=0x14244000 nid=0x189c runnable 
[0x17c7e000]
"Thread-1898" daemon prio=6 tid=0x0d96a800 nid=0x3e54 runnable 
[0x4c5ef000]
"ForkJoinPool-120-worker-1" daemon prio=6 tid=0x1407d000 nid=0x2234 
waiting for monitor entry [0x4c31e000]
"ForkJoinPool-120-worker-3" daemon prio=6 tid=0x13a64000 nid=0x1f0c 
waiting for monitor entry [0x4c0de000]
"ForkJoinPool-120-worker-5" daemon prio=6 tid=0x13a75800 nid=0x1660 
waiting for monitor entry [0x4241e000]
"ForkJoinPool-120-worker-7" daemon prio=6 tid=0x13d6c000 nid=0x117c 
waiting for monitor entry [0x4bece000]
"ForkJoinPool-120-worker-9" daemon prio=6 tid=0x14233800 nid=0x2a20 
waiting for monitor entry [0x4bd3e000]
"ForkJoinPool-120-worker-11" daemon prio=6 tid=0x1423f800 nid=0x3568 
waiting for monitor entry [0x4afae000]
"ForkJoinPool-120-worker-13" daemon prio=6 tid=0x1424e000 nid=0x378c 
waiting for monitor entry [0x4bc0e000]
"ForkJoinPool-120-worker-15" daemon prio=6 tid=0x14238000 nid=0x1b8c 
waiting for monitor entry [0x18dfd000]
"ForkJoinPool-119-worker-1" daemon prio=6 tid=0x13d74800 nid=0x29a0 
waiting for monitor entry [0x4bade000]
"ForkJoinPool-119-worker-3" daemon prio=6 tid=0x12cd4000 nid=0x18a0 in 
Object.wait() [0x4b9ae000]
"ForkJoinPool-119-worker-7" daemon prio=6 tid=0x12cd3000 nid=0x15ec 
waiting for monitor entry [0x4b87d000]
"ForkJoinPool-119-worker-5" daemon prio=6 tid=0x13bbd800 nid=0x2c24 
waiting for monitor entry [0x4b76d000]
"ForkJoinPool-119-worker-9" daemon prio=6 tid=0x13bc9800 nid=0x3d78 
waiting for monitor entry [0x2acae000]
"ForkJoinPool-119-worker-11" daemon prio=6 tid=0x0d9eb000 nid=0x3f40 
waiting for monitor entry [0x4b57e000]
"ForkJoinPool-119-worker-13" daemon prio=6 tid=0x0d9e4800 nid=0x286c 
waiting for monitor entry [0x4b40e000]
"ForkJoinPool-119-worker-15" daemon prio=6 tid=0x0d9e9000 nid=0x2304 in 
Object.wait() [0x194de000]
"ForkJoinPool-118-worker-1" daemon prio=6 tid=0x14077000 nid=0x3a50 
runnable [0x393dd000]
"ForkJoinPool-118-worker-3" daemon prio=6 tid=0x1407a000 nid=0x1dc0 
runnable [0x2331d000]
"ForkJoinPool-118-worker-5" daemon prio=6 tid=0x0d2f9000 nid=0x2990 
runnable [0x1b6fd000]
"ForkJoinPool-118-worker-7" daemon prio=6 tid=0x0d2df800 nid=0x3bb4 
runnable [0x4a9dd000]
"ForkJoinPool-118-worker-9" daemon prio=6 tid=0x0d2f7800 nid=0x37e4 
waiting for monitor entry [0x2bf5e000]
"ForkJoinPool-118-worker-11" daemon prio=6 tid=0x12648000 nid=0x2878 
runnable [0x2b26d000]
"ForkJoinPool-118-worker-13" daemon prio=6 tid=0x12646000 nid=0x4cc 
waiting for monitor entry [0x183de000]
"ForkJoinPool-118-worker-15" daemon prio=6 tid=0x12647800 nid=0x30c8 
waiting for monitor entry [0x2bd3d000]
"ForkJoinPool-117-worker-5" daemon prio=6 tid=0x12b5c800 nid=0x3510 
waiting for monitor entry [0x4b2be000]
"ForkJoinPool-117-worker-1" daemon prio=6 tid=0x12b5d000 nid=0x36b8 
waiting for monitor entry [0x4b11e000]
"ForkJoinPool-117-worker-3" daemon prio=6 tid=0x12eac800 nid=0x32d4 in 
Object.wait() [0x4acae000]
"ForkJoinPool-117-worker-7" daemon prio=6 tid=0x12ea9800 nid=0x16c4 
waiting for monitor entry [0x4ab1e000]
"ForkJoinPool-117-worker-9" daemon prio=6 tid=0x12e9b000 nid=0x1e44 
waiting for monitor entry [0x2162e000]
"ForkJoinPool-117-worker-11" daemon prio=6 tid=0x13bcc000 nid=0x37f4 
waiting for monitor entry [0x40dee000]
"ForkJoinPool-117-worker-13" daemon prio=6 tid=0x13bcb000 nid=0x361c in 
Object.wait() [0x35dbe000]
"ForkJoinPool-117-worker-15" daemon prio=6 tid=0x13bca800 nid=0x3344 in 
Object.wait() [0x2c0ce000]
"ForkJoinPool-116-worker-1" daemon prio=6 tid=0x13bc9000 nid=0x3a34 
runnable [0x4867d000]
"ForkJoinPool-116-worker-3" daemon prio=6 tid=0x13bc8000 nid=0x1c10 in 
Object.wait() [0x4a8be000]
"ForkJoinPool-116-worker-7" daemon prio=6 tid=0x13bc7800 nid=0x2910 
waiting on condition [0x45e7f000]
"ForkJoinPool-116-worker-5" daemon prio=6 tid=0x13bc6800 nid=0x3b1c 
waiting for monitor entry 

[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table

2016-09-05 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464602#comment-15464602
 ] 

pin_zhang commented on SPARK-17396:
---

1. Thousands of threads are created, looking like:
"ForkJoinPool-20-worker-9" #329 daemon prio=5 os_prio=0 tid=0x0ac87000 nid=0x3d43 waiting on condition [0x5069f000]
"ForkJoinPool-19-worker-3" #324 daemon prio=5 os_prio=0 tid=0x0ae6 nid=0x3c2a waiting on condition [0x5039c000]

2. The threads appear to be created by UnionRDD
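A tiny watcher (sketch only, not part of the report) that can be started in the driver before the query threads to make the growth visible:

object ThreadWatcher extends Thread {
  setDaemon(true)
  override def run(): Unit = {
    import scala.collection.JavaConverters._
    while (true) {
      // Count live ForkJoinPool worker threads in this JVM every 5 seconds.
      val n = Thread.getAllStackTraces.keySet.asScala
        .count(_.getName.startsWith("ForkJoinPool"))
      println(s"ForkJoinPool worker threads: $n")
      Thread.sleep(5000)
    }
  }
}

// Usage: call ThreadWatcher.start() before spawning the Query threads.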


> Threads number keep increasing when query on external CSV partitioned table
> ---
>
> Key: SPARK-17396
> URL: https://issues.apache.org/jira/browse/SPARK-17396
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: pin_zhang
>
> 1. Create a external partitioned table row format CSV
> 2. Add 16 partitions to the table
> 3. Run SQL "select count(*) from test_csv"
> 4. ForkJoinThread number keep increasing 
> This happend when table partitions number greater than 10.
> 5. Test Code
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
> object Bugs {
>   def main(args: Array[String]): Unit = {
> val location = "file:///g:/home/test/csv"
> val create = s"""CREATE   EXTERNAL  TABLE  test_csv
>  (ID string,  SEQ string )
>   PARTITIONED BY(index int)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
>   LOCATION "${location}" 
>   """
> val add_part = s"""
>   ALTER TABLE test_csv ADD 
>   PARTITION (index=1)LOCATION '${location}/index=1'
>   PARTITION (index=2)LOCATION '${location}/index=2'
>   PARTITION (index=3)LOCATION '${location}/index=3'
>   PARTITION (index=4)LOCATION '${location}/index=4'
>   PARTITION (index=5)LOCATION '${location}/index=5'
>   PARTITION (index=6)LOCATION '${location}/index=6'
>   PARTITION (index=7)LOCATION '${location}/index=7'
>   PARTITION (index=8)LOCATION '${location}/index=8'
>   PARTITION (index=9)LOCATION '${location}/index=9'
>   PARTITION (index=10)LOCATION '${location}/index=10'
>   PARTITION (index=11)LOCATION '${location}/index=11'
>   PARTITION (index=12)LOCATION '${location}/index=12'
>   PARTITION (index=13)LOCATION '${location}/index=13'
>   PARTITION (index=14)LOCATION '${location}/index=14'
>   PARTITION (index=15)LOCATION '${location}/index=15'
>   PARTITION (index=16)LOCATION '${location}/index=16'
> """
> val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
> conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
> val ctx = new SparkContext(conf)
> val hctx = new HiveContext(ctx)
> hctx.sql(create)
> hctx.sql(add_part)
>  for (i <- 1 to 6) {
>   new Query(hctx).start()
> }
>   }
>   class Query(htcx: HiveContext) extends Thread {
> setName("Query-Thread")
> override def run = {
>   while (true) {
> htcx.sql("select count(*) from test_csv").show()
> Thread.sleep(100)
>   }
> }
>   }
> }






[jira] [Created] (SPARK-17398) Failed to query on external JSon Partitioned table

2016-09-04 Thread pin_zhang (JIRA)
pin_zhang created SPARK-17398:
-

 Summary: Failed to query on external JSon Partitioned table
 Key: SPARK-17398
 URL: https://issues.apache.org/jira/browse/SPARK-17398
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: pin_zhang


1. Create External Json partitioned table 
with SerDe in hive-hcatalog-core-1.2.1.jar, download fom
https://mvnrepository.com/artifact/org.apache.hive.hcatalog/hive-hcatalog-core/1.2.1
2. Query table meet exception, which works in spark1.5.2
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost 
task
 0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
java.util.ArrayList cannot be cast to org.apache.hive.hcatalog.data.HCatRecord
at 
org.apache.hive.hcatalog.data.HCatRecordObjectInspector.getStructFieldData(HCatRecordObjectInspector.java:45)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:430)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)
 

3. Test Code

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object JsonBugs {

  def main(args: Array[String]): Unit = {
val table = "test_json"
val location = "file:///g:/home/test/json"
val create = s"""CREATE   EXTERNAL  TABLE  ${table}
 (id string,  seq string )
  PARTITIONED BY(index int)
  ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  LOCATION "${location}" 
  """
val add_part = s"""
 ALTER TABLE ${table} ADD 
 PARTITION (index=1)LOCATION '${location}/index=1'
"""

val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
val ctx = new SparkContext(conf)

val hctx = new HiveContext(ctx)
val exist = hctx.tableNames().map { x => x.toLowerCase() }.contains(table)
if (!exist) {
  hctx.sql(create)
  hctx.sql(add_part)
} else {
  hctx.sql("show partitions " + table).show()
}
hctx.sql("select * from test_json").show()
  }
}
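
A hedged data-preparation sketch (not part of the original report; the file name and the
sample record are assumptions) that drops one JSON line into the partition directory added
above, so the final select has something to read:

import java.io.{File, PrintWriter}

object PrepareJsonData {
  def main(args: Array[String]): Unit = {
    // matches PARTITION (index=1) LOCATION '${location}/index=1' from the repro above
    val dir = new File("g:/home/test/json/index=1")
    dir.mkdirs()
    val out = new PrintWriter(new File(dir, "part-00000.json"))
    // one record per line, with the id/seq columns declared in the CREATE TABLE
    try out.println("""{"id": "id_1", "seq": "1"}""")
    finally out.close()
  }
}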




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table

2016-09-04 Thread pin_zhang (JIRA)
pin_zhang created SPARK-17396:
-

 Summary: Threads number keep increasing when query on external CSV 
partitioned table
 Key: SPARK-17396
 URL: https://issues.apache.org/jira/browse/SPARK-17396
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0
Reporter: pin_zhang


1. Create an external partitioned table with CSV row format (OpenCSVSerde)
2. Add 16 partitions to the table
3. Run the SQL "select count(*) from test_csv" repeatedly
4. The number of ForkJoin worker threads keeps increasing.
This happens when the number of table partitions is greater than 10.
5. Test code (a thread-count monitoring sketch follows the code)
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object Bugs {

  def main(args: Array[String]): Unit = {

val location = "file:///g:/home/test/csv"
val create = s"""CREATE   EXTERNAL  TABLE  test_csv
 (ID string,  SEQ string )
  PARTITIONED BY(index int)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  LOCATION "${location}" 
  """
val add_part = s"""
  ALTER TABLE test_csv ADD 
  PARTITION (index=1)LOCATION '${location}/index=1'
  PARTITION (index=2)LOCATION '${location}/index=2'
  PARTITION (index=3)LOCATION '${location}/index=3'
  PARTITION (index=4)LOCATION '${location}/index=4'
  PARTITION (index=5)LOCATION '${location}/index=5'
  PARTITION (index=6)LOCATION '${location}/index=6'
  PARTITION (index=7)LOCATION '${location}/index=7'
  PARTITION (index=8)LOCATION '${location}/index=8'
  PARTITION (index=9)LOCATION '${location}/index=9'
  PARTITION (index=10)LOCATION '${location}/index=10'
  PARTITION (index=11)LOCATION '${location}/index=11'
  PARTITION (index=12)LOCATION '${location}/index=12'
  PARTITION (index=13)LOCATION '${location}/index=13'
  PARTITION (index=14)LOCATION '${location}/index=14'
  PARTITION (index=15)LOCATION '${location}/index=15'
  PARTITION (index=16)LOCATION '${location}/index=16'
"""

val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
val ctx = new SparkContext(conf)
val hctx = new HiveContext(ctx)
hctx.sql(create)
hctx.sql(add_part)
 for (i <- 1 to 6) {
  new Query(hctx).start()
}
  }

  class Query(htcx: HiveContext) extends Thread {

setName("Query-Thread")

override def run = {
  while (true) {
htcx.sql("select count(*) from test_csv").show()
Thread.sleep(100)
  }

}
  }
}
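
A hedged monitoring sketch (not part of the original report) that can be started next to the
query threads above; it counts live JVM threads whose names contain "ForkJoinPool", which is
one way to observe the growth described in step 4:

import scala.collection.JavaConverters._

object ThreadMonitor extends Thread {
  setDaemon(true)
  setName("ForkJoin-Monitor")
  override def run(): Unit = {
    while (true) {
      // ForkJoinPool workers are named like "ForkJoinPool-N-worker-M"
      val n = Thread.getAllStackTraces.keySet.asScala.count(_.getName.contains("ForkJoinPool"))
      println(s"ForkJoinPool threads: $n")
      Thread.sleep(5000)
    }
  }
}

Calling ThreadMonitor.start() right after the Query threads are started prints a steadily
growing count when the leak is present.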



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17395) Queries on CSV partition table result in frequent GC

2016-09-04 Thread pin_zhang (JIRA)
pin_zhang created SPARK-17395:
-

 Summary: Queries on CSV partition table result in frequent GC 
 Key: SPARK-17395
 URL: https://issues.apache.org/jira/browse/SPARK-17395
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0, 1.6.2, 1.5.2
Reporter: pin_zhang


1. Create an external partitioned table and run SQL queries against it
2. After the queries run for a while, the driver JVM does frequent GC;
increasing the heap size does not resolve the issue.
3. Test code (a GC-observation sketch follows the code)
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object Bugs {

  def main(args: Array[String]): Unit = {

val location = "file:///g:/home/test/csv"
val create = s"""CREATE   EXTERNAL  TABLE  test_csv
 (ID string,  SEQ string )
  PARTITIONED BY(index int)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  LOCATION "${location}" 
  """
val add_part = s"""
  ALTER TABLE test_csv ADD 
  PARTITION (index=1)LOCATION '${location}/index=1'
  PARTITION (index=2)LOCATION '${location}/index=2'
  PARTITION (index=3)LOCATION '${location}/index=3'
  PARTITION (index=4)LOCATION '${location}/index=4'
  PARTITION (index=5)LOCATION '${location}/index=5'
  PARTITION (index=6)LOCATION '${location}/index=6'
  PARTITION (index=7)LOCATION '${location}/index=7'
  PARTITION (index=8)LOCATION '${location}/index=8'
  PARTITION (index=9)LOCATION '${location}/index=9'
  PARTITION (index=10)LOCATION '${location}/index=10'
  PARTITION (index=11)LOCATION '${location}/index=11'
  PARTITION (index=12)LOCATION '${location}/index=12'
  PARTITION (index=13)LOCATION '${location}/index=13'
  PARTITION (index=14)LOCATION '${location}/index=14'
  PARTITION (index=15)LOCATION '${location}/index=15'
  PARTITION (index=16)LOCATION '${location}/index=16'
"""

val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
val ctx = new SparkContext(conf)

val hctx = new HiveContext(ctx)
   
hctx.sql(create)
hctx.sql(add_part)

for (i <- 1 to 6) {
  new Query(hctx).start()
}
  }

  class Query(htcx: HiveContext) extends Thread {
 setName("Query-Thread")
 override def run = {
  while (true) {
htcx.sql("select count(*) from test_csv").show()
Thread.sleep(100)
  }

}
  }
}
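
A hedged GC observer (not part of the original report) that quantifies the frequent GC from
step 2 through the standard JMX beans while the query threads run:

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

object GcMonitor extends Thread {
  setDaemon(true)
  setName("GC-Monitor")
  override def run(): Unit = {
    while (true) {
      // cumulative collection counts and times for every collector in the driver JVM
      ManagementFactory.getGarbageCollectorMXBeans.asScala.foreach { gc =>
        println(s"${gc.getName}: count=${gc.getCollectionCount} timeMs=${gc.getCollectionTime}")
      }
      Thread.sleep(5000)
    }
  }
}

Starting GcMonitor before the query loop shows the collection counts climbing quickly;
-verbose:gc on the driver JVM gives the same picture.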



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata

2016-07-01 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358916#comment-15358916
 ] 

pin_zhang commented on SPARK-9686:
--

Any plan to fix this bug?

> Spark Thrift server doesn't return correct JDBC metadata 
> -
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2
>Reporter: pin_zhang
>Assignee: Cheng Lian
>Priority: Critical
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start  start-thriftserver.sh
> 2. connect with beeline
> 3. create table
> 4.show tables, the new created table returned
> 5.
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>Properties info = new Properties();
> Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
>null, null, null);
> Problem:
>   No tables are returned by this API; the same call works in Spark 1.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12262) describe extended doesn't return table on detail info tabled stored as PARQUET format

2015-12-10 Thread pin_zhang (JIRA)
pin_zhang created SPARK-12262:
-

 Summary: describe extended doesn't return table on detail info 
tabled stored as PARQUET format
 Key: SPARK-12262
 URL: https://issues.apache.org/jira/browse/SPARK-12262
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.2
Reporter: pin_zhang


1. Start the hive server with start-thriftserver.sh
2. create table table1 (id  int) ;
create table table2 (id  int) STORED AS PARQUET;
3. describe extended table1 ;
returns the detailed info
4. describe extended table2 ;
the result has no detailed info (a JDBC sketch of this check follows)
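
A hedged JDBC sketch (not in the original report; the port, empty credentials and driver class
are assumptions based on HiveServer2 defaults) that runs the two DESCRIBE statements from
steps 3-4 against the Thrift server started in step 1:

import java.sql.DriverManager

object DescribeCheck {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val stmt = conn.createStatement()
    for (table <- Seq("table1", "table2")) {
      println(s"== describe extended $table ==")
      val rs = stmt.executeQuery(s"describe extended $table")
      while (rs.next()) println(s"${rs.getString(1)} | ${rs.getString(2)}")
      rs.close()
    }
    conn.close()
  }
}

For table1 the output ends with a Detailed Table Information row; for the Parquet-backed
table2 that row is missing.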





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-10290) Spark can register temp table and hive table with the same table name

2015-12-07 Thread pin_zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang closed SPARK-10290.
-

not a bug

> Spark can register temp table and hive table with the same table name
> -
>
> Key: SPARK-10290
> URL: https://issues.apache.org/jira/browse/SPARK-10290
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1
>Reporter: pin_zhang
>
> Spark SQL allows creating a hive table and registering a temp table with the same 
> name;
> there is then no way to run a query on the hive table with the following code
> // register hive table
> DataFrame df = hctx_.read().json("test.json");
> df.write().mode(SaveMode.Overwrite).saveAsTable("test");
>  // register temp table
> hctx_.registerDataFrameAsTable(hctx_.sql("select id from test"), "test");
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12008) Spark hive security authorization doesn't work as Apache hive's

2015-12-01 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035159#comment-15035159
 ] 

pin_zhang commented on SPARK-12008:
---

Any comments?

> Spark hive security authorization doesn't work as Apache hive's
> ---
>
> Key: SPARK-12008
> URL: https://issues.apache.org/jira/browse/SPARK-12008
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: pin_zhang
>
> Spark's hive security authorization isn't consistent with Apache Hive's.
> With the same hive-site.xml:
> <property>
>   <name>hive.security.authorization.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hive.security.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
> </property>
> <property>
>   <name>hive.security.authenticator.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
> </property>
> <property>
>   <name>hive.server2.enable.doAs</name>
>   <value>true</value>
> </property>
> 1. Run Spark's start-thriftserver.sh; the following error occurs when running sql:
>    SQL standards based authorization should not be enabled from hive 
>    cli. Instead the use of storage based authorization in hive metastore is 
>    reccomended. 
>    Set hive.security.authorization.enabled=false to disable authz within cli
> 2. Change to start start-thriftserver.sh with hive configurations
> ./start-thriftserver.sh --conf 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory
>  --conf 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator
>  
> 3. Beeline connect with userA and create table tableA.
> 4. Beeline connect with userB to truncate tableA
>   A) In Apache hive, truncate table get exception
>   Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: Principal [name=userB, type=USER] does not have following 
> privileges for operation TRUNCATETABLE [[OBJECT OWNERSHIP] on Object 
> [type=TABLE_OR_VIEW, name=default.tablea]] (state=42000,code=4)
>   B) In Spark hive, any user that can connect to the hive, can truncate, as 
> long as the spark user has privileges.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12008) Spark hive security authorization doesn't work as Apache hive's

2015-11-25 Thread pin_zhang (JIRA)
pin_zhang created SPARK-12008:
-

 Summary: Spark hive security authorization doesn't work as Apache 
hive's
 Key: SPARK-12008
 URL: https://issues.apache.org/jira/browse/SPARK-12008
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.2
Reporter: pin_zhang


Spark's hive security authorization isn't consistent with Apache Hive's.
With the same hive-site.xml:
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>


1. Run Spark's start-thriftserver.sh; the following error occurs when running sql:
   SQL standards based authorization should not be enabled from hive cli. Instead 
the use of storage based authorization in hive metastore is reccomended. 
   Set hive.security.authorization.enabled=false to disable authz within cli
2. Start start-thriftserver.sh with the hive configurations instead:
./start-thriftserver.sh --conf 
hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory
 --conf 
hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator

3. Connect with beeline as userA and create table tableA.
4. Connect with beeline as userB and truncate tableA (a JDBC sketch of these two steps follows below).
  A) In Apache Hive, the truncate fails with an exception:
  Error while compiling statement: FAILED: HiveAccessControlException 
Permission denied: Principal [name=userB, type=USER] does not have following 
privileges for operation TRUNCATETABLE [[OBJECT OWNERSHIP] on Object 
[type=TABLE_OR_VIEW, name=default.tablea]] (state=42000,code=4)
  B) In Spark's Hive support, any user who can connect can truncate the table, as 
long as the Spark process user has the privileges.
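
A hedged JDBC repro sketch for steps 3-4 (not part of the original report; the port, user
names and table name are assumptions). Against Apache Hive the truncate issued as userB is
rejected with the HiveAccessControlException quoted above; against the Spark Thrift server it
goes through:

import java.sql.DriverManager

object AuthzCheck {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // step 3: userA creates the table
    val asUserA = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "userA", "")
    asUserA.createStatement().execute("create table if not exists tablea (id int)")
    asUserA.close()
    // step 4: userB tries to truncate it
    val asUserB = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "userB", "")
    asUserB.createStatement().execute("truncate table tablea")
    asUserB.close()
  }
}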





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11748) Result is null after alter column name of table stored as Parquet

2015-11-18 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013030#comment-15013030
 ] 

pin_zhang commented on SPARK-11748:
---

Apache Hive 0.14 added support for Parquet column rename:
https://issues.apache.org/jira/browse/HIVE-6938
That doesn't work in Spark's Hive support.


> Result is null after alter column name of table stored as Parquet 
> --
>
> Key: SPARK-11748
> URL: https://issues.apache.org/jira/browse/SPARK-11748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: pin_zhang
>
> 1. Test with the following code
> hctx.sql(" create table " + table + " (id int, str string) STORED AS 
> PARQUET ")
> val df = hctx.jsonFile("g:/vip.json")
> df.write.format("parquet").mode(SaveMode.Append).saveAsTable(table)
> hctx.sql(" select * from " + table).show()
> // alter table
> val alter = "alter table " + table + " CHANGE id i_d int "
> hctx.sql(alter)
>  
> hctx.sql(" select * from " + table).show()
> 2. Result
> after changing the table column name, the data is null for the changed column
> Result before alter table
> +---+---+
> | id|str|
> +---+---+
> |  1| s1|
> |  2| s2|
> +---+---+
> Result after alter table
> +----+---+
> | i_d|str|
> +----+---+
> |null| s1|
> |null| s2|
> +----+---+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11748) Result is null after alter column name of table stored as Parquet

2015-11-15 Thread pin_zhang (JIRA)
pin_zhang created SPARK-11748:
-

 Summary: Result is null after alter column name of table stored as 
Parquet 
 Key: SPARK-11748
 URL: https://issues.apache.org/jira/browse/SPARK-11748
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.1
Reporter: pin_zhang


1. Test with the following code
hctx.sql(" create table " + table + " (id int, str string) STORED AS 
PARQUET ")
val df = hctx.jsonFile("g:/vip.json")
df.write.format("parquet").mode(SaveMode.Append).saveAsTable(table)
hctx.sql(" select * from " + table).show()

// alter table
val alter = "alter table " + table + " CHANGE id i_d int "
hctx.sql(alter)
 
hctx.sql(" select * from " + table).show()

2. Result
after changing the table column name, the data is null for the changed column
Result before alter table
+---+---+
| id|str|
+---+---+
|  1| s1|
|  2| s2|
+---+---+
Result after alter table
+----+---+
| i_d|str|
+----+---+
|null| s1|
|null| s2|
+----+---+
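
A hedged setup sketch (not in the original report; the master, table name and the content of
g:/vip.json are assumptions) that supplies the hctx and the JSON file used by the snippet in
step 1, so the repro can be run standalone:

import java.io.{File, PrintWriter}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object AlterColumnReproSetup {
  def main(args: Array[String]): Unit = {
    // sample data matching the id/str columns and the result tables shown above
    val out = new PrintWriter(new File("g:/vip.json"))
    try {
      out.println("""{"id": 1, "str": "s1"}""")
      out.println("""{"id": 2, "str": "s2"}""")
    } finally out.close()

    val conf = new SparkConf().setAppName("alter-column-repro").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val hctx = new HiveContext(sc)
    val table = "alter_test"
    // ... continue with the statements from step 1 above, using this hctx and table
  }
}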






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10290) Spark can register temp table and hive table with the same table name

2015-08-26 Thread pin_zhang (JIRA)
pin_zhang created SPARK-10290:
-

 Summary: Spark can register temp table and hive table with the 
same table name
 Key: SPARK-10290
 URL: https://issues.apache.org/jira/browse/SPARK-10290
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
Reporter: pin_zhang


Spark SQL allows creating a hive table and registering a temp table with the same name;
there is then no way to run a query on the hive table with the following code:

// register hive table
DataFrame df = hctx_.read().json("test.json");
df.write().mode(SaveMode.Overwrite).saveAsTable("test");
// register temp table
hctx_.registerDataFrameAsTable(hctx_.sql("select id from test"), "test");





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store

2015-08-20 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14704374#comment-14704374
 ] 

pin_zhang commented on SPARK-9686:
--

What's the status of this bug? Will it be fixed in 1.4.x?

 Spark hive jdbc client cannot get table from metadata store
 ---

 Key: SPARK-9686
 URL: https://issues.apache.org/jira/browse/SPARK-9686
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0, 1.4.1
Reporter: pin_zhang
Assignee: Cheng Lian

 1. Start  start-thriftserver.sh
 2. connect with beeline
 3. create table
 4.show tables, the new created table returned
 5.
   Class.forName("org.apache.hive.jdbc.HiveDriver");
   String URL = "jdbc:hive2://localhost:1/default";
   Properties info = new Properties();
   Connection conn = DriverManager.getConnection(URL, info);
   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
       null, null, null);
 Problem:
   No tables are returned by this API; the same call works in Spark 1.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9686) Spark hive jdbc client cannot get table from metadata

2015-08-06 Thread pin_zhang (JIRA)
pin_zhang created SPARK-9686:


 Summary: Spark hive jdbc client cannot get table from metadata
 Key: SPARK-9686
 URL: https://issues.apache.org/jira/browse/SPARK-9686
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1, 1.4.0
Reporter: pin_zhang


1. Start  start-thriftserver.sh
2. connect with beeline
3. create table
4.show tables, the new created table returned
5.
Class.forName("org.apache.hive.jdbc.HiveDriver");
String URL = "jdbc:hive2://localhost:1/default";
Properties info = new Properties();
Connection conn = DriverManager.getConnection(URL, info);
ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
    null, null, null);

Problem:
   No tables are returned by this API; the same call works in Spark 1.3
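
A hedged cross-check in Scala (not part of the original report; the URL is an assumption using
the default HiveServer2 port, since the port in the snippet above appears truncated in this
archive) showing that a plain "show tables" over the same kind of connection does list the
table, which is what makes the empty getTables() result surprising:

import java.sql.DriverManager

object ShowTablesCheck {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // assumed connection URL
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
    val rs = conn.createStatement().executeQuery("show tables")
    while (rs.next()) println(rs.getString(1))
    conn.close()
  }
}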




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9686) Spark hive jdbc client cannot get table from metadata store

2015-08-06 Thread pin_zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-9686:
-
Summary: Spark hive jdbc client cannot get table from metadata store  (was: 
Spark hive jdbc client cannot get table from metadata)

 Spark hive jdbc client cannot get table from metadata store
 ---

 Key: SPARK-9686
 URL: https://issues.apache.org/jira/browse/SPARK-9686
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0, 1.4.1
Reporter: pin_zhang

 1. Start  start-thriftserver.sh
 2. connect with beeline
 3. create table
 4.show tables, the new created table returned
 5.
   Class.forName("org.apache.hive.jdbc.HiveDriver");
   String URL = "jdbc:hive2://localhost:1/default";
   Properties info = new Properties();
   Connection conn = DriverManager.getConnection(URL, info);
   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
       null, null, null);
 Problem:
   No tables are returned by this API; the same call works in Spark 1.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7480) Get exception when DataFrame saveAsTable and run sql on the same table at the same time

2015-05-08 Thread pin_zhang (JIRA)
pin_zhang created SPARK-7480:


 Summary: Get exception when DataFrame saveAsTable and run sql on 
the same table at the same time
 Key: SPARK-7480
 URL: https://issues.apache.org/jira/browse/SPARK-7480
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1, 1.3.0
Reporter: pin_zhang


There is a case:
1) The main thread calls DataFrame.saveAsTable(table,
SaveMode.Overwrite) to save a json RDD to a hive table.
2) Another thread runs SQL against the same table simultaneously.
You then see many exceptions indicating that the table does not exist or is not 
complete.
Does Spark SQL support such usage?

Thanks

[Main Thread]
DataFrame df = hiveContext_.jsonFile("test.json");
String table = "UNIT_TEST";
while (true) {
    df = hiveContext_.jsonFile("test.json");
    df.saveAsTable(table, SaveMode.Overwrite);
    System.out.println(new Timestamp(System.currentTimeMillis()) + " ["
        + Thread.currentThread().getName()
        + "] override table");
    try {
        Thread.sleep(3000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

[Query Thread]
DataFrame query = hiveContext_.sql("select * from UNIT_TEST");
Row[] rows = query.collect();
System.out.println(new Timestamp(System.currentTimeMillis())
    + " [" + Thread.currentThread().getName()
    + "] [query result count:] " + rows.length);


[Exceptions in log]

15/05/08 16:05:49 ERROR Hive: NoSuchObjectException(message:default.unit_test 
table not found)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1560)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy20.get_table(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy21.getTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:976)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:201)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:262)
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:161)
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:161)
at scala.Option.getOrElse(Option.scala:120)
at 
org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
at 
org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:262)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:174)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:186)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$6.applyOrElse(Analyzer.scala:181)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:188)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:188)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:187)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:208)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

[jira] [Commented] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-30 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521277#comment-14521277
 ] 

pin_zhang commented on SPARK-6923:
--

Hi, Cheng Hao
   Thanks for your reply!
   Do you mean that if a wrapper is provided for the datasource API, the Hive Storage 
Handler can expose the data source table schema correctly to external 
applications via the Hive API?

If so, can it be fixed in Spark 1.3.x?



 Spark SQL CLI does not read Data Source schema correctly
 

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang
Priority: Blocker

 {code:java}
 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 {code}
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-30 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521280#comment-14521280
 ] 

pin_zhang commented on SPARK-6923:
--

Hi, Cheng Hao
   Thanks for your reply!
   Do you mean that if a wrapper is provided for the datasource API, the Hive Storage 
Handler can expose the data source table schema correctly to external 
applications via the Hive API?

If so, can it be fixed in Spark 1.3.x?



 Spark SQL CLI does not read Data Source schema correctly
 

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang
Priority: Blocker

 {code:java}
 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 {code}
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-30 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521278#comment-14521278
 ] 

pin_zhang commented on SPARK-6923:
--

Hi, Cheng Hao
   Thanks for your reply!
   Do you mean that if a wrapper is provided for the datasource API, the Hive Storage 
Handler can expose the data source table schema correctly to external 
applications via the Hive API?

If so, can it be fixed in Spark 1.3.x?



 Spark SQL CLI does not read Data Source schema correctly
 

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang
Priority: Blocker

 {code:java}
 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 {code}
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-30 Thread pin_zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-6923:
-
Comment: was deleted

(was: Hi, Cheng Hao
   Thanks for your reply!
   Do you mean if provide a wrapper for datasource api, the Hive Storage 
Handler can get the data sourced table schema correctly for the external 
application via Hive API?

If so, can it be fixed in Spark 1.3.x?

)

 Spark SQL CLI does not read Data Source schema correctly
 

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang
Priority: Blocker

 {code:java}
 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 {code}
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-30 Thread pin_zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pin_zhang updated SPARK-6923:
-
Comment: was deleted

(was: Hi, Cheng Hao
   Thanks for your reply!
   Do you mean if provide a wrapper for datasource api, the Hive Storage 
Handler can get the data sourced table schema correctly for the external 
application via Hive API?

If so, can it be fixed in Spark 1.3.x?

)

 Spark SQL CLI does not read Data Source schema correctly
 

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang
Priority: Blocker

 {code:java}
 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 {code}
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-27 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510381#comment-14510381
 ] 

pin_zhang edited comment on SPARK-6923 at 4/27/15 9:43 AM:
---

Hi, Michael
Is possible this CLI bug be fixed in Spark1.3?

Please help to comment.
Thanks


was (Author: pin_zhang):
Hi, Michael
Can this CLI bug be fixed in Spark1.3?


 Spark SQL CLI does not read Data Source schema correctly
 

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang

 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6923) Spark SQL CLI does not read Data Source schema correctly

2015-04-23 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510381#comment-14510381
 ] 

pin_zhang commented on SPARK-6923:
--

Hi, Michael
Can this CLI bug be fixed in Spark1.3?


 Spark SQL CLI does not read Data Source schema correctly
 

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang

 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6923) Get invalid hive table columns after save DataFrame to hive table

2015-04-22 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507182#comment-14507182
 ] 

pin_zhang commented on SPARK-6923:
--

Hi, Michael
Can you help to comment? We have a usage where we query a hive table and the table 
is generated by a DataFrame.

 Get invalid hive table columns after save DataFrame to hive table
 -

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang

 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6923) Get invalid hive table columns after save DataFrame to hive table

2015-04-21 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504409#comment-14504409
 ] 

pin_zhang commented on SPARK-6923:
--

Hi, Michael
  We run the spark app on Spark 1.3 and use the CLIService in HiveServer2 to get 
the table schema; the call stack to get the schema is below.
HiveMetaStore$HMSHandler.get_fields(String, String) line: 2873  
HiveMetaStore$HMSHandler.get_schema(String, String) line: 2946  
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not 
available [native method]  
NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57  
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
Method.invoke(Object, Object...) line: 606  
RetryingHMSHandler.invoke(Object, Method, Object[]) line: 105   
$Proxy9.get_schema(String, String) line: not available  
HiveMetaStoreClient.getSchema(String, String) line: 1269
GetColumnsOperation.run() line: 139 
HiveSessionImplwithUGI(HiveSessionImpl).getColumns(String, String, 
String, String) line: 359
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not 
available [native method]  
NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57  
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
Method.invoke(Object, Object...) line: 606  
HiveSessionProxy.invoke(Method, Object[]) line: 79  
HiveSessionProxy.access$000(HiveSessionProxy, Method, Object[]) line: 
37
HiveSessionProxy$1.run() line: 64   
AccessController.doPrivileged(PrivilegedExceptionActionT, 
AccessControlContext) line: not available [native method]   
Subject.doAs(Subject, PrivilegedExceptionActionT) line: 415   
UserGroupInformation.doAs(PrivilegedExceptionActionT) line: 1548  
Hadoop23Shims(HadoopShimsSecure).doAs(UserGroupInformation, 
PrivilegedExceptionActionT) line: 493 
HiveSessionProxy.invoke(Object, Method, Object[]) line: 60  
$Proxy17.getColumns(String, String, String, String) line: not available 
SparkSQLCLIService(CLIService).getColumns(SessionHandle, String, 
String, String, String) line: 309  
ThriftBinaryCLIService(ThriftCLIService).GetColumns(TGetColumnsReq) 
line: 433   
TCLIService$Processor$GetColumnsI.getResult(I, GetColumns_args) line: 
1433
TCLIService$Processor$GetColumnsI.getResult(Object, TBase) line: 1418 
TCLIService$Processor$GetColumnsI(ProcessFunctionI,T).process(int, 
TProtocol, TProtocol, I) line: 39
TSetIpAddressProcessorI(TBaseProcessorI).process(TProtocol, 
TProtocol) line: 39 
TSetIpAddressProcessorI.process(TProtocol, TProtocol) line: 55
TThreadPoolServer$WorkerProcess.run() line: 206 
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145  
ThreadPoolExecutor$Worker.run() line: 615   
Thread.run() line: 745  

   Don't you think this method should return the same table schema as the one you 
mentioned, hctx.table(tableName).schema?
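
For reference, a minimal sketch (my own illustration, assuming an existing HiveContext named
hctx and the table created by the snippet quoted below) of reading the schema through
hctx.table instead of the metastore FieldSchema list:

val schema = hctx.table("test").schema
// prints id and age with their real types, unlike getTable("test").getCols()
schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))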

 Get invalid hive table columns after save DataFrame to hive table
 -

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang

 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6923) Get invalid hive table columns after save DataFrame to hive table

2015-04-16 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497701#comment-14497701
 ] 

pin_zhang commented on SPARK-6923:
--

With a Spark 1.1.0 client, the jdbc API below returns the table schema
age(bigint), id(string),
while with Spark 1.3.0 it returns {name=col, type=array<string>}.
That's not expected.

ArrayList<Map> results = new ArrayList<Map>();
DatabaseMetaData meta = cnn.getMetaData();
ResultSet rsColumns = meta.getColumns(database, null, table, null);
while (rsColumns.next()) {
    Map col = new HashMap();
    col.put("name", rsColumns.getString("COLUMN_NAME"));
    String typeName = rsColumns.getString("TYPE_NAME");
    col.put("type", typeName);
    results.add(col);
}
rsColumns.close();


 Get invalid hive table columns after save DataFrame to hive table
 -

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang

 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6923) Get invalid hive table columns after save DataFrame to hive table

2015-04-16 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499141#comment-14499141
 ] 

pin_zhang commented on SPARK-6923:
--

Do you mean that if the data frame is saved to a table created with the new datasource API, 
the hive table won't support the jdbc API DatabaseMetaData
.getColumns(database, null, table, null) for getting the table columns that 
correspond to the data frame fields?

 Get invalid hive table columns after save DataFrame to hive table
 -

 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang

 HiveContext hctx = new HiveContext(sc);
 List<String> sample = new ArrayList<String>();
 sample.add("{\"id\": \"id_1\", \"age\":1}");
 RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
 DataFrame df = hctx.jsonRDD(sampleRDD);
 String table = "test";
 df.saveAsTable(table, "json", SaveMode.Overwrite);
 Table t = hctx.catalog().client().getTable(table);
 System.out.println(t.getCols());
 --
 With the code above to save a DataFrame to a hive table,
 getting the table cols returns one column named 'col':
 [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
 The expected return is the fields schema id, age.
 This means the jdbc API cannot retrieve the table columns via ResultSet
 DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
 tableNamePattern, String columnNamePattern),
 but the resultset metadata for the query "select * from test" contains the fields id,
 age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6923) Get invalid hive table columns after save DataFrame to hive table

2015-04-14 Thread pin_zhang (JIRA)
pin_zhang created SPARK-6923:


 Summary: Get invalid hive table columns after save DataFrame to 
hive table
 Key: SPARK-6923
 URL: https://issues.apache.org/jira/browse/SPARK-6923
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: pin_zhang


HiveContext hctx = new HiveContext(sc);
List<String> sample = new ArrayList<String>();
sample.add("{\"id\": \"id_1\", \"age\":1}");
RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
DataFrame df = hctx.jsonRDD(sampleRDD);
String table = "test";
df.saveAsTable(table, "json", SaveMode.Overwrite);
Table t = hctx.catalog().client().getTable(table);
System.out.println(t.getCols());
--
With the code above to save a DataFrame to a hive table,
getting the table cols returns one column named 'col':
[FieldSchema(name:col, type:array<string>, comment:from deserializer)]
The expected return is the fields schema id, age.

This means the jdbc API cannot retrieve the table columns via ResultSet
DatabaseMetaData.getColumns(String catalog, String schemaPattern, String
tableNamePattern, String columnNamePattern),
but the resultset metadata for the query "select * from test" contains the fields id,
age.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org