[jira] [Updated] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3
[ https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Ashish updated SPARK-31605:
Description:
When inserting data with dynamic partitions, the operation fails if not all of the partitions are dynamic. For example:
{code:sql} create external table test_insert(a int) partitioned by (part_a string, part_b string) stored as parquet location ''; {code}
The query
{code:sql} insert into table test_insert partition(part_a='a', part_b) values (3, 'b'); {code}
fails with the errors
{code:xml} Cannot create partition spec from hdfs:/// ; missing keys [part_a] Ignoring invalid DP directory {code}
On the other hand, if I remove the static value of part_a to make the insert fully dynamic, the following query succeeds. Note that the query below is not the issue; the issue is the one above, where the query logs the invalid DP directory warning.
{code:sql} insert into table test_insert partition(part_a, part_b) values (1,'a','b'); {code}
was:
When inserting data with dynamic partitions, the operation fails if not all of the partitions are dynamic. For example: The query
{code:sql} insert into table test_insert partition(part_a='a', part_b) values (3, 'b'); {code}
fails with the errors
{code:xml} Cannot create partition spec from hdfs:/// ; missing keys [part_a] Ignoring invalid DP directory {code}
On the other hand, if I remove the static value of part_a to make the insert fully dynamic, the following query succeeds.
{code:sql} insert overwrite table t1 (part_a, part_b) select * from t2 {code}

> Unable to insert data with partial dynamic partition with Spark & Hive 3
> Key: SPARK-31605
> URL: https://issues.apache.org/jira/browse/SPARK-31605
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2
> Environment: Hortonworks HDP 3.1.0, Spark 2.3.2, Hive 3
> Reporter: Amit Ashish
> Priority: Major

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
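For readers trying to reproduce this, a minimal sketch of the Hive session settings that govern mixed static/dynamic partition inserts follows. The table name matches the example above; whether these settings actually help on Spark 2.3.2 with Hive 3 is exactly what this ticket disputes, so treat this as context, not a fix.

{code:sql}
-- Sketch: Hive's dynamic-partition switches. In "strict" mode at least one
-- static partition key is required, so a partial insert (static part_a plus
-- dynamic part_b) is the intended use case; "nonstrict" additionally allows
-- fully dynamic inserts.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
{code}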
[jira] [Commented] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3
[ https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095462#comment-17095462 ] Amit Ashish commented on SPARK-31605:
-
The previously closed ticket does not show the actual insert statement working. Below is the query that is not working:
insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
I am getting the error below:
WARN FileOperations: Ignoring invalid DP directory hdfs://HDP3/warehouse/tablespace/external/hive/dw_analyst.db/test_insert/.hive-staging_hive_2020-04-29_13-28-46_360_4646016571504464856-1/-ext-1/part_b=b
20/04/29 13:28:52 INFO Hive: Loaded 0 partitions
As mentioned in the previous ticket, setting the following does not make any difference:
set hive.exec.dynamic.partition.mode=nonstrict;
Nor does setting spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict as a Spark config solve this.
[jira] [Updated] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3
[ https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Ashish updated SPARK-31605:
Description:
When inserting data with dynamic partitions, the operation fails if not all of the partitions are dynamic. For example: The query
{code:sql} insert into table test_insert partition(part_a='a', part_b) values (3, 'b'); {code}
fails with the errors
{code:xml} Cannot create partition spec from hdfs:/// ; missing keys [part_a] Ignoring invalid DP directory {code}
On the other hand, if I remove the static value of part_a to make the insert fully dynamic, the following query succeeds.
{code:sql} insert overwrite table t1 (part_a, part_b) select * from t2 {code}
was:
When inserting data with dynamic partitions, the operation fails if not all of the partitions are dynamic. For example: The query
{code:sql} insert overwrite table t1 (part_a='a', part_b) select * from t2 {code}
fails with the errors
{code:xml} Cannot create partition spec from hdfs:/// ; missing keys [part_a] Ignoring invalid DP directory {code}
On the other hand, if I remove the static value of part_a to make the insert fully dynamic, the following query succeeds.
{code:sql} insert overwrite table t1 (part_a, part_b) select * from t2 {code}
[jira] [Comment Edited] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3
[ https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095462#comment-17095462 ] Amit Ashish edited comment on SPARK-31605 at 4/29/20, 1:42 PM:
---
The previously closed ticket does not show the actual insert statement working. Below is the query that is not working:
insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
I am getting the warning below:
WARN FileOperations: Ignoring invalid DP directory hdfs://HDP3/warehouse/tablespace/external/hive/dw_analyst.db/test_insert/.hive-staging_hive_2020-04-29_13-28-46_360_4646016571504464856-1/-ext-1/part_b=b
20/04/29 13:28:52 INFO Hive: Loaded 0 partitions
As mentioned in the previous ticket, setting the following does not make any difference:
set hive.exec.dynamic.partition.mode=nonstrict;
Nor does setting spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict as a Spark config solve this.
The worst part is that the data does not get inserted and the return code is still 0. Kindly either suggest a fix for this or return a non-zero exit code so the failure can be caught in automated data pipelines.
was (Author: dreamaaj):
The previously closed ticket does not show the actual insert statement working. Below is the query that is not working:
insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
I am getting the error below:
WARN FileOperations: Ignoring invalid DP directory hdfs://HDP3/warehouse/tablespace/external/hive/dw_analyst.db/test_insert/.hive-staging_hive_2020-04-29_13-28-46_360_4646016571504464856-1/-ext-1/part_b=b
20/04/29 13:28:52 INFO Hive: Loaded 0 partitions
As mentioned in the previous ticket, setting the following does not make any difference:
set hive.exec.dynamic.partition.mode=nonstrict;
Nor does setting spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict as a Spark config solve this.
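Since the job exits with code 0 even when nothing is loaded, here is a hedged sketch of a post-insert check a pipeline could run to detect the silent failure; the table and partition values are taken from the repro above, and the check itself is an illustrative suggestion, not part of the ticket.

{code:sql}
-- Sketch: make the failure explicit. "Loaded 0 partitions" in the log is
-- otherwise the only signal that the insert was dropped.
select count(*) from test_insert where part_a = 'a' and part_b = 'b';
-- a result of 0 immediately after the insert means no rows landed
{code}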
[jira] [Created] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3
Amit Ashish created SPARK-31605:
---
Summary: Unable to insert data with partial dynamic partition with Spark & Hive 3
Key: SPARK-31605
URL: https://issues.apache.org/jira/browse/SPARK-31605
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.2
Environment: Hortonworks HDP 3.1.0, Spark 2.3.2, Hive 3
Reporter: Amit Ashish
When inserting data with dynamic partitions, the operation fails if not all of the partitions are dynamic. For example: The query
{code:sql} insert overwrite table t1 (part_a='a', part_b) select * from t2 {code}
fails with the errors
{code:xml} Cannot create partition spec from hdfs:/// ; missing keys [part_a] Ignoring invalid DP directory {code}
On the other hand, if I remove the static value of part_a to make the insert fully dynamic, the following query succeeds.
{code:sql} insert overwrite table t1 (part_a, part_b) select * from t2 {code}
[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934744#comment-16934744 ] Ashish commented on SPARK-19984:
I faced a similar error on Spark 2.4. I use a large number of DataFrames and was calling dropDuplicates() on almost all of them. Part of the log I got:
{code:xml}
2019-09-20 17:10:14,130 [Driver] ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 686, Column 28: Redefinition of parameter "agg_expr_11"
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 686, Column 28: Redefinition of parameter "agg_expr_11"
 at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
 at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3174)
 at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009)
 at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336)
 at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309)
 at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799)
 at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958)
 at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212)
 at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393)
 at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385)
 at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286)
 at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
 at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1285)
 at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:825)
 at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:411)
 at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:212)
 at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:390)
 at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:385)
 at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1405)
 at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
 at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:357)
 at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
 at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
 at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
 at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
 at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
 at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
 at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1420)
 at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1496)
 at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
 at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
 at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
 at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
 at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
 at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
 at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1368)
 at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:579)
 at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
 at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at
{code}
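A commonly used mitigation when generated code fails to compile, offered as a hedged sketch rather than a fix for the underlying codegen bug: disable whole-stage code generation so Spark falls back to the interpreted execution path, at some performance cost.

{code:sql}
-- Sketch: spark.sql.codegen.wholeStage controls whole-stage codegen.
-- Setting it to false avoids the Janino compilation step entirely,
-- which sidesteps "Redefinition of parameter ..." compile failures.
set spark.sql.codegen.wholeStage=false;
{code}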
[jira] [Updated] (SPARK-24081) Spark SQL drops the table while writing into table in "overwrite" mode.
[ https://issues.apache.org/jira/browse/SPARK-24081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish updated SPARK-24081: --- Priority: Blocker (was: Major)

> Spark SQL drops the table while writing into table in "overwrite" mode.
> Key: SPARK-24081
> URL: https://issues.apache.org/jira/browse/SPARK-24081
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.3.0
> Reporter: Ashish
> Priority: Blocker
>
> I read data from a table, modify it, and write it back to the same table in overwrite mode; doing so deletes all the records.
> Expectation: the table is updated with the modified data.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24081) Spark SQL drops the table while writing into table in "overwrite" mode.
Ashish created SPARK-24081:
--
Summary: Spark SQL drops the table while writing into table in "overwrite" mode.
Key: SPARK-24081
URL: https://issues.apache.org/jira/browse/SPARK-24081
Project: Spark
Issue Type: Bug
Components: Java API
Affects Versions: 2.3.0
Reporter: Ashish
I read data from a table, modify it, and write it back to the same table in overwrite mode; doing so deletes all the records.
Expectation: the table is updated with the modified data.
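As a hedged sketch of the usual workaround (table names here are illustrative, not from the ticket): the SQL INSERT OVERWRITE statement replaces a table's data while keeping the table definition, unlike DataFrameWriter's overwrite mode with saveAsTable, which drops and recreates the table.

{code:sql}
-- Sketch: replaces the rows of tgt_table without dropping the table itself.
-- updated_rows stands in for whatever table or view holds the modified data.
insert overwrite table tgt_table select * from updated_rows;
{code}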
[jira] [Commented] (SPARK-13699) Spark SQL drops the table in "overwrite" mode while writing into table
[ https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449661#comment-16449661 ] Ashish commented on SPARK-13699:
Is this issue going to be resolved? I am facing the same issue: while writing to a table in overwrite mode, it truncates the table.

> Spark SQL drops the table in "overwrite" mode while writing into table
> Key: SPARK-13699
> URL: https://issues.apache.org/jira/browse/SPARK-13699
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Dhaval Modi
> Priority: Major
> Attachments: stackTrace.txt
>
> Hi,
> While writing the dataframe to a HIVE table with the "SaveMode.Overwrite" option, e.g.
> tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
> sqlContext drops the table instead of truncating it. This causes an error while overwriting.
> Adding the stack trace and commands to reproduce the issue.
> Thanks & Regards,
> Dhaval
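The hazard described in both SPARK-13699 and SPARK-24081 is reading from a table and overwriting that same table in one step: the overwrite removes the source data while it is still being read. A hedged sketch of the staging-table workaround follows; all names are illustrative.

{code:sql}
-- Sketch: materialize the modified rows first, then overwrite, so the source
-- is never read and destroyed in the same operation.
create table tgt_table_stage as select * from tgt_table;  -- apply modifications here
insert overwrite table tgt_table select * from tgt_table_stage;
drop table tgt_table_stage;
{code}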