[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656640#comment-16656640 ]

Yuming Wang commented on SPARK-21725:
-------------------------------------

[~ste...@apache.org] Thanks a lot!

> spark thriftserver insert overwrite table partition select
> ----------------------------------------------------------
>
>                 Key: SPARK-21725
>                 URL: https://issues.apache.org/jira/browse/SPARK-21725
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: centos 6.7 spark 2.1 jdk8
>            Reporter: xinzhang
>            Priority: Major
>              Labels: spark-sql
>
> Use the thriftserver to create tables with partitions.
> session 1:
>  SET hive.default.fileformat=Parquet;create table tmp_10(count bigint)
> partitioned by (pt string) stored as parquet;
>  --ok
>  !exit
> session 2:
>  SET hive.default.fileformat=Parquet;create table tmp_11(count bigint)
> partitioned by (pt string) stored as parquet;
>  --ok
>  !exit
> session 3:
>  --connect the thriftserver
>  SET hive.default.fileformat=Parquet;insert overwrite table tmp_10
> partition(pt='1') select count(1) count from tmp_11;
>  --ok
>  !exit
> session 4 (do it again):
>  --connect the thriftserver
>  SET hive.default.fileformat=Parquet;insert overwrite table tmp_10
> partition(pt='1') select count(1) count from tmp_11;
>  --error
>  !exit
> -------------------------------------------------------------------
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing
> query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ..
> ..
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move
> source
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053
> 512282-2/-ext-1/part-0 to destination
> hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-0
>         at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
>         ... 45 more
> Caused by: java.io.IOException: Filesystem closed
>
> -------------------------------------------------------------------
> The documentation describing Parquet tables is here:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> Hive metastore Parquet table conversion
> When reading from and writing to Hive metastore Parquet tables, Spark SQL
> will try to use its own Parquet support instead of Hive SerDe for better
> performance. This behavior is controlled by the
> spark.sql.hive.convertMetastoreParquet configuration, and is turned on by
> default.
> I am confused: the problem appears with the partitioned table, but everything
> works with an unpartitioned table. Does this mean Spark does not use its own
> Parquet support here?
> Could someone suggest how I can avoid the issue?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656631#comment-16656631 ]

Steve Loughran commented on SPARK-21725:
----------------------------------------

bq. can we fix it on the Hadoop side?

Fix what? The only way to handle close() of > 1 FS would be to move to reference-counted filesystems everywhere. Otherwise:
* Applications which know they get a unique version of an FS instance need to call close() on it. This matters especially for those connectors (object stores, etc.) which create thread pools, HTTP connection pools, etc.
* Applications which don't set up a unique FS version must not call close().

Ref-counted FS clients would be the ultimate way to do this, but I suspect it is too late for that; see HADOOP-10792, HADOOP-4655, etc.

The general assumption is: if you want to manage the lifespan of your FS instance, create a unique one yourself using {{FileSystem.newInstance()}}. The method has been there since 0.21, so there's no reason not to adopt it.
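The shared-instance hazard described in the comment above can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions: ToyFs and ToyFsCache are hypothetical names invented here, not Hadoop classes, and the model does not try to reproduce Hadoop's real cache bookkeeping. It shows only why closing a shared instance breaks every other holder of the same reference, while a caller-owned instance can be closed safely.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Demo entry point: two "sessions" sharing one cached client, then two private clients. */
final class SharedFsDemo {
    public static void main(String[] args) throws IOException {
        // Both sessions receive the SAME object from the cache.
        ToyFs session1 = ToyFsCache.get("hdfs://nn");
        ToyFs session2 = ToyFsCache.get("hdfs://nn");
        session1.close(); // session 1 disconnects and closes "its" filesystem
        try {
            session2.rename("staging/part-0", "pt=1/part-0");
        } catch (IOException e) {
            System.out.println("shared: " + e.getMessage());
        }

        // Caller-owned instances: closing one does not affect the other.
        ToyFs own = ToyFsCache.newInstance("hdfs://nn");
        ToyFs other = ToyFsCache.newInstance("hdfs://nn");
        own.close();
        System.out.println("private: " + other.rename("staging/part-0", "pt=1/part-0"));
    }
}

/** Hypothetical stand-in for a filesystem client; NOT org.apache.hadoop.fs.FileSystem. */
final class ToyFs {
    private boolean closed = false;

    void close() {
        closed = true;
    }

    /** Pretends to move a file; fails once this instance has been closed. */
    String rename(String src, String dst) throws IOException {
        if (closed) {
            throw new IOException("Filesystem closed");
        }
        return src + " -> " + dst;
    }
}

/** Hypothetical process-wide cache keyed by URI string. */
final class ToyFsCache {
    private static final Map<String, ToyFs> CACHE = new HashMap<>();

    /** Returns the shared instance for this URI, creating it on first use. */
    static synchronized ToyFs get(String uri) {
        return CACHE.computeIfAbsent(uri, k -> new ToyFs());
    }

    /** Always returns a fresh instance that the caller owns and may close. */
    static ToyFs newInstance(String uri) {
        return new ToyFs();
    }
}
```

In this model the rule from the comment falls out directly: code that obtained its client through the shared lookup must not call close(), and code that wants to control the client's lifespan should ask for its own instance.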
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655442#comment-16655442 ]

Yuming Wang commented on SPARK-21725:
-------------------------------------

[~owen.omalley] [~ste...@apache.org] I found lots of related issues. Can we fix it on the Hadoop side?

[https://stackoverflow.com/questions/17421218/multiples-hadoop-filesystem-instances/]
[https://stackoverflow.com/questions/48592337/hive-hadoop-intermittent-failure-unable-to-move-source-to-destination]
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235119#comment-16235119 ]

xinzhang commented on SPARK-21725:
----------------------------------

[~mgaido] I finally found where the problem is.

Add this setting to hdfs-site.xml:
fs.hdfs.impl.disable.cache = true

Reason: Spark and HDFS use the same API (underneath, they use the same instance). When a beeline session closes a filesystem instance, it closes the thriftserver's filesystem instance too. When a second beeline session then tries to get the instance, it always reports "Caused by: java.io.IOException: Filesystem closed".
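For reference, the setting named in this comment would normally be written as a Hadoop configuration property. The fragment below is a sketch that assumes the usual property layout of hdfs-site.xml and an existing enclosing configuration element; only the property name and value come from the comment itself.

```xml
<!-- Goes inside the existing <configuration> element of hdfs-site.xml. -->
<property>
  <!-- Name and value are taken from the comment above. -->
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
```

With the cache disabled, lookups for hdfs:// URIs should no longer hand out a shared client, so one session closing its filesystem would not affect another. That also means more client instances are created, so this is probably best treated as a workaround rather than a fix.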
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235039#comment-16235039 ]

xinzhang commented on SPARK-21725:
----------------------------------

Could you tell me which Hadoop version and distribution you have in your environment: CDH, Ambari, MapR, Databricks, or plain community Hadoop?
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234332#comment-16234332 ]

Marco Gaido commented on SPARK-21725:
-------------------------------------

I have no idea what the difference is. Please try setting hive.exec.stagingdir as suggested in SPARK-21067. I don't know what else to say.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234149#comment-16234149 ]

xinzhang commented on SPARK-21725:
----------------------------------

I can't believe it. I built Hadoop 2.8 last night and the error still appears. I think these issues are related:

[https://issues.apache.org/jira/browse/SPARK-21067]
[https://stackoverflow.com/questions/44233523/spark-sql-2-1-1-thrift-server-unable-to-move-source-hdfs-to-target]
[https://issues.apache.org/jira/browse/SPARK-11083]

My environment is CentOS 6.5 with JVM 8. To be honest, I still cannot believe you could not reproduce it! We currently use thriftserver 1.6, which works fine. I have tried all the 2.x versions.
I am curious what the difference is between your environment and mine. Could you suggest what I should check in my environment?
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234105#comment-16234105 ]

Marco Gaido commented on SPARK-21725:
-------------------------------------

I tried using a MySQL metastore and the target package, on CentOS 6.9 with Java 8. I am sorry, but I am still unable to reproduce the problem. This looks to me like a problem with your specific environment.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233875#comment-16233875 ]

xinzhang commented on SPARK-21725:
----------------------------------

This is the log from my target package (with MySQL):
[https://github.com/zhangxin0112/java/blob/zxis/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-node3.out]

This is the log from my source code build (with MySQL):
[https://github.com/zhangxin0112/java/blob/zxis/src/2.out]
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233858#comment-16233858 ]

Marco Gaido commented on SPARK-21725:
-------------------------------------

[~zhangxin0112zx] Can you share the spark-thriftserver logs?
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233578#comment-16233578 ] xinzhang commented on SPARK-21725: -- My setup:
1. Hive 1.2.1: I downloaded a fresh tarball and changed only hive-site.xml, so that the Hive metastore uses MySQL (metastore running locally on port 9083).
2. spark-sql: I copied the same hive-site.xml.
3. Started the Spark thriftserver.
4. Connected to the thriftserver with beeline.
The metastore has been changed from Derby to MySQL. My suggestion: could you try it in a fresh environment rather than your existing one? As you say, it might be related to the metastore. I tested this on CDH 5.7 (Hadoop 2.6) and on Hadoop 2.8 (a new environment), and the problem always appears, no matter what I do. I hope you can help. Thanks.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226720#comment-16226720 ] Marco Gaido commented on SPARK-21725: - [~zhangxin0112zx] I am sorry, but I am still unable to reproduce it locally. Here are the steps I performed. It might be related to the metastore. Could you provide more details about your installation and the logs of the Spark thriftserver?
{code:java}
➜ spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:1"
Connecting to jdbc:hive2://localhost:1
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:1> set hive.default.fileformat=Parquet;
+--+--+--+
| key| value |
+--+--+--+
| hive.default.fileformat | Parquet |
+--+--+--+
1 row selected (0.434 seconds)
0: jdbc:hive2://localhost:1> create table default.test_e(name string) partitioned by (pt string);
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (0.472 seconds)
0: jdbc:hive2://localhost:1> create table default.test_f(name string) partitioned by (pt string);
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (0.067 seconds)
0: jdbc:hive2://localhost:1> !quit
Closing: 0: jdbc:hive2://localhost:1
➜ spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:1"
Connecting to jdbc:hive2://localhost:1
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:1> insert overwrite table default.test_e partition(pt="1") select count(1) from default.test_f;
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (2.351 seconds)
0: jdbc:hive2://localhost:1> !quit
Closing: 0: jdbc:hive2://localhost:1
➜ spark git:(SPARK-21725) ✗ ./bin/beeline -u "jdbc:hive2://localhost:1"
Connecting to jdbc:hive2://localhost:1
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.3.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:1> insert overwrite table default.test_e partition(pt="1") select count(1) from default.test_f;
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (0.612 seconds)
0: jdbc:hive2://localhost:1>
{code}
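Both the issue description and the transcript above follow the same pattern: each step runs in its own beeline connection, and the failure is reported only when the insert overwrite is repeated in a later connection. A minimal sketch of how that sequence could be scripted is shown below. It is not taken from the thread: the `replay` helper and the `connect` parameter are hypothetical names, and the statements are copied from the four sessions in the issue description. `connect` is assumed to be any zero-argument callable that returns a DB-API style connection to the thriftserver.

```python
"""Replay the four-session reproduction, one fresh connection per session."""

SET_FORMAT = "SET hive.default.fileformat=Parquet"
INSERT = ("insert overwrite table tmp_10 partition(pt='1') "
          "select count(1) count from tmp_11")

# One inner list per beeline session in the issue description.
SESSIONS = [
    [SET_FORMAT,
     "create table tmp_10(count bigint) partitioned by (pt string) stored as parquet"],
    [SET_FORMAT,
     "create table tmp_11(count bigint) partitioned by (pt string) stored as parquet"],
    [SET_FORMAT, INSERT],  # session 3: reported as OK
    [SET_FORMAT, INSERT],  # session 4: reported as failing
]


def replay(connect, sessions=SESSIONS):
    """Run each session on its own connection.

    `connect` is a hypothetical zero-argument factory returning an object
    with cursor() and close(), for example a JDBC or Thrift client wrapper.
    Returns a list of (session_number, error) pairs; error is None when
    every statement in that session succeeded.
    """
    results = []
    for number, statements in enumerate(sessions, start=1):
        conn = connect()
        error = None
        try:
            cursor = conn.cursor()
            for sql in statements:
                cursor.execute(sql)
        except Exception as exc:
            # Record the failure and continue, so later sessions still run.
            error = exc
        finally:
            conn.close()  # plays the role of `!exit` in beeline
        results.append((number, error))
    return results
```

With a client library such as PyHive (an assumption, not something used in this thread), the factory could be `lambda: hive.connect(host=..., port=...)`, with the host and port of your own thriftserver. Running `replay` against an affected setup would be expected to return an error for session 4 only. Opening a new connection per session is deliberate, because the report says the second insert fails after reconnecting.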
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226337#comment-16226337 ] xinzhang commented on SPARK-21725: -- I have now tried with the master branch. The problem is still there. Steps:
1. Download, install, and run the SQL in Hive (hive-1.2.1; this shows that my Hive setup is OK). !https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!
2. Download, install, and run the SQL in spark-sql (built from master at the latest commit, 44c4003155c1d243ffe0f73d5537b4c8b3f3b564). First run, spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png! Second run, spark-sql result: GOOD !https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!
3. Use the Spark SQL thriftserver. First run, result: GOOD. Second run, result: BAD !https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220239#comment-16220239 ] xinzhang commented on SPARK-21725: -- I tried Spark (master) on 21 Aug 2017, and the problem still appeared. I will try it again now and reply with the result. Thanks for your reply. [~mgaido] [~srowen]
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220228#comment-16220228 ] Marco Gaido commented on SPARK-21725: - Please try with the master branch, not with Spark 2.1.2. I used master and was unable to reproduce the issue. If you manage to reproduce it on the current master, then maybe I am doing something wrong when trying to reproduce it, even though the steps you posted are pretty precise; in that case, I'd ask you to give more information about your configuration and to check the exact steps to reproduce it. Otherwise, the only suggestion I can give is to upgrade to 2.3.0 as soon as it is available.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220225#comment-16220225 ] Sean Owen commented on SPARK-21725: --- [~zhangxin0112zx] there's no reason to expect 2.1.2 was different. He's asking you to try the current master branch.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220216#comment-16220216 ] xinzhang commented on SPARK-21725: -- I downloaded Spark 2.1.2. The problem still appears. Could you give me any suggestions for avoiding it? [~mgaido]
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134642#comment-16134642 ] xinzhang commented on SPARK-21725: -- OK. I will retry with the current master.
[jira] [Commented] (SPARK-21725) spark thriftserver insert overwrite table partition select
[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134424#comment-16134424 ] Marco Gaido commented on SPARK-21725: - [~zhangxin0112zx] I followed your instructions, but I am unable to reproduce the problem on the current master. Could you please check whether it is still present in the current code?