[jira] [Updated] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3

2020-04-29 Thread Amit Ashish (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Ashish updated SPARK-31605:

Description: 
When inserting data with dynamic partitions, the operation fails if not all 
partition columns are dynamic. For example:

 
{code:sql}
create external table test_insert(a int) partitioned by (part_a string, part_b string) stored as parquet location '';
{code}
The query
{code:sql}
insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
{code}
will fail with the following errors:
{code:xml}
Cannot create partition spec from hdfs:/// ; missing keys [part_a]
Ignoring invalid DP directory 
{code}
 

 

 

On the other hand, if I remove the static value of part_a to make the insert 
fully dynamic, the following query will succeed. Please note that the query 
below is not the issue; the issue is the one above, where the query throws the 
invalid DP directory warning.
{code:sql}
insert into table test_insert partition(part_a, part_b) values (1,'a','b');
{code}
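
For completeness, a hedged workaround sketch (the partition value 'b' is hypothetical): giving every partition column a static value bypasses the dynamic-partition path entirely, so no DP directory is ever resolved.
{code:scala}
// Hypothetical workaround, not a confirmed fix: with all partition columns
// static, only the data column is supplied in VALUES.
spark.sql(
  "insert into table test_insert partition(part_a='a', part_b='b') values (3)")
{code}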

  was:
When inserting data with dynamic partitions, the operation fails if not all 
partition columns are dynamic. For example:

The query
{code:sql}
insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
{code}
will fail with the following errors:
{code:xml}
Cannot create partition spec from hdfs:/// ; missing keys [part_a]
Ignoring invalid DP directory 
{code}
On the other hand, if I remove the static value of part_a to make the insert 
fully dynamic, the following query will succeed.
{code:sql}
insert overwrite table t1 (part_a, part_b) select * from t2
{code}


> Unable to insert data with partial dynamic partition with Spark & Hive 3
> 
>
> Key: SPARK-31605
> URL: https://issues.apache.org/jira/browse/SPARK-31605
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Hortonworks HDP 3.1.0
> Spark 2.3.2
> Hive 3
>Reporter: Amit Ashish
>Priority: Major
>
> When inserting data with dynamic partitions, the operation fails if not all 
> partition columns are dynamic. For example:
>  
> {code:sql}
> create external table test_insert(a int) partitioned by (part_a string, part_b string) stored as parquet location '';
> {code}
> The query
> {code:sql}
> insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
> {code}
> will fail with the following errors:
> {code:xml}
> Cannot create partition spec from hdfs:/// ; missing keys [part_a]
> Ignoring invalid DP directory 
> {code}
>  
>  
>  
> On the other hand, if I remove the static value of part_a to make the insert 
> fully dynamic, the following query will succeed. Please note that the query 
> below is not the issue; the issue is the one above, where the query throws 
> the invalid DP directory warning.
> {code:sql}
> insert into table test_insert partition(part_a, part_b) values (1,'a','b');
> {code}






[jira] [Commented] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3

2020-04-29 Thread Amit Ashish (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095462#comment-17095462
 ] 

Amit Ashish commented on SPARK-31605:
-

The previously closed ticket does not show the actual insert statement working.

 

Below is the query that is not working:

insert into table test_insert partition(part_a='a', part_b) values (3, 'b');

 

Getting the warning below:

 

WARN FileOperations: Ignoring invalid DP directory 
hdfs://HDP3/warehouse/tablespace/external/hive/dw_analyst.db/test_insert/.hive-staging_hive_2020-04-29_13-28-46_360_4646016571504464856-1/-ext-1/part_b=b
20/04/29 13:28:52 INFO Hive: Loaded 0 partitions

 

As mentioned in the previous ticket, setting the following does not make any 
difference:

 

set hive.exec.dynamic.partition.mode=nonstrict;

 

Setting spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict in the Spark 
config does not solve this either.
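
For reference, a sketch of how these settings would typically be applied (assuming a spark-shell/SparkSession setup; per the report, neither avoids the warning):
{code:scala}
import org.apache.spark.sql.SparkSession

// Hive-enabled session with the Hadoop-side property set via Spark config.
val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()

// Session-level Hive setting, followed by the failing partial-dynamic insert.
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("insert into table test_insert partition(part_a='a', part_b) values (3, 'b')")
{code}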

 

 

 

 

 

 

> Unable to insert data with partial dynamic partition with Spark & Hive 3
> 
>
> Key: SPARK-31605
> URL: https://issues.apache.org/jira/browse/SPARK-31605
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Hortonworks HDP 3.1.0
> Spark 2.3.2
> Hive 3
>Reporter: Amit Ashish
>Priority: Major
>
> When inserting data with dynamic partitions, the operation fails if not all 
> partition columns are dynamic. For example:
> The query
> {code:sql}
> insert overwrite table t1 (part_a='a', part_b) select * from t2
> {code}
> will fail with the following errors:
> {code:xml}
> Cannot create partition spec from hdfs:/// ; missing keys [part_a]
> Ignoring invalid DP directory 
> {code}
> On the other hand, if I remove the static value of part_a to make the insert 
> fully dynamic, the following query will succeed.
> {code:sql}
> insert overwrite table t1 (part_a, part_b) select * from t2
> {code}






[jira] [Updated] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3

2020-04-29 Thread Amit Ashish (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Ashish updated SPARK-31605:

Description: 
When inserting data with dynamic partitions, the operation fails if not all 
partition columns are dynamic. For example:

The query
{code:sql}
insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
{code}
will fail with the following errors:
{code:xml}
Cannot create partition spec from hdfs:/// ; missing keys [part_a]
Ignoring invalid DP directory 
{code}
On the other hand, if I remove the static value of part_a to make the insert 
fully dynamic, the following query will succeed.
{code:sql}
insert overwrite table t1 (part_a, part_b) select * from t2
{code}

  was:
When inserting data with dynamic partitions, the operation fails if not all 
partition columns are dynamic. For example:

The query
{code:sql}
insert overwrite table t1 (part_a='a', part_b) select * from t2
{code}
will fail with the following errors:
{code:xml}
Cannot create partition spec from hdfs:/// ; missing keys [part_a]
Ignoring invalid DP directory 
{code}
On the other hand, if I remove the static value of part_a to make the insert 
fully dynamic, the following query will succeed.
{code:sql}
insert overwrite table t1 (part_a, part_b) select * from t2
{code}


> Unable to insert data with partial dynamic partition with Spark & Hive 3
> 
>
> Key: SPARK-31605
> URL: https://issues.apache.org/jira/browse/SPARK-31605
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Hortonworks HDP 3.1.0
> Spark 2.3.2
> Hive 3
>Reporter: Amit Ashish
>Priority: Major
>
> When inserting data with dynamic partitions, the operation fails if not all 
> partition columns are dynamic. For example:
> The query
> {code:sql}
> insert into table test_insert partition(part_a='a', part_b) values (3, 'b');
> {code}
> will fail with the following errors:
> {code:xml}
> Cannot create partition spec from hdfs:/// ; missing keys [part_a]
> Ignoring invalid DP directory 
> {code}
> On the other hand, if I remove the static value of part_a to make the insert 
> fully dynamic, the following query will succeed.
> {code:sql}
> insert overwrite table t1 (part_a, part_b) select * from t2
> {code}






[jira] [Comment Edited] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3

2020-04-29 Thread Amit Ashish (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095462#comment-17095462
 ] 

Amit Ashish edited comment on SPARK-31605 at 4/29/20, 1:42 PM:
---

The previously closed ticket does not show the actual insert statement working.

 

Below is the query that is not working:

insert into table test_insert partition(part_a='a', part_b) values (3, 'b');

 

Getting the warning below:

 

WARN FileOperations: Ignoring invalid DP directory 
hdfs://HDP3/warehouse/tablespace/external/hive/dw_analyst.db/test_insert/.hive-staging_hive_2020-04-29_13-28-46_360_4646016571504464856-1/-ext-1/part_b=b
 20/04/29 13:28:52 INFO Hive: Loaded 0 partitions

 

As mentioned in the previous ticket, setting the following does not make any 
difference:

 

set hive.exec.dynamic.partition.mode=nonstrict;

 

Setting spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict in the Spark 
config does not solve this either.

 

 

The worst part is that the data does not get inserted, yet the return code is 
still 0. Kindly either suggest a fix or make the return code non-zero so that 
the failure can be tracked in automated data pipelines.
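
Until then, a hypothetical guard for such pipelines (table and partition values taken from the example above; SHOW PARTITIONS with a partition spec is assumed to be available in this Spark version): verify the partition actually exists after the insert and fail the job explicitly.
{code:scala}
// Hypothetical post-insert check: the insert itself exits 0 even when the
// "Loaded 0 partitions" path is hit, so count the expected partition instead.
val loaded = spark.sql(
  "show partitions test_insert partition(part_a='a', part_b='b')").count()
require(loaded > 0, "insert silently loaded 0 partitions; failing the job")
{code}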

 

 

 

 

 

 


was (Author: dreamaaj):
The previously closed ticket does not show the actual insert statement working.

 

Below is the query that is not working:

insert into table test_insert partition(part_a='a', part_b) values (3, 'b');

 

Getting the error below:

 

WARN FileOperations: Ignoring invalid DP directory 
hdfs://HDP3/warehouse/tablespace/external/hive/dw_analyst.db/test_insert/.hive-staging_hive_2020-04-29_13-28-46_360_4646016571504464856-1/-ext-1/part_b=b
20/04/29 13:28:52 INFO Hive: Loaded 0 partitions

 

As mentioned in the previous ticket, setting the following does not make any 
difference:

 

set hive.exec.dynamic.partition.mode=nonstrict;

 

Setting spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict in the Spark 
config does not solve this either.

 

 

 

 

 

 

> Unable to insert data with partial dynamic partition with Spark & Hive 3
> 
>
> Key: SPARK-31605
> URL: https://issues.apache.org/jira/browse/SPARK-31605
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
> Environment: Hortonworks HDP 3.1.0
> Spark 2.3.2
> Hive 3
>Reporter: Amit Ashish
>Priority: Major
>
> When inserting data with dynamic partitions, the operation fails if not all 
> partition columns are dynamic. For example:
> The query
> {code:sql}
> insert overwrite table t1 (part_a='a', part_b) select * from t2
> {code}
> will fail with the following errors:
> {code:xml}
> Cannot create partition spec from hdfs:/// ; missing keys [part_a]
> Ignoring invalid DP directory 
> {code}
> On the other hand, if I remove the static value of part_a to make the insert 
> fully dynamic, the following query will succeed.
> {code:sql}
> insert overwrite table t1 (part_a, part_b) select * from t2
> {code}






[jira] [Created] (SPARK-31605) Unable to insert data with partial dynamic partition with Spark & Hive 3

2020-04-29 Thread Amit Ashish (Jira)
Amit Ashish created SPARK-31605:
---

 Summary: Unable to insert data with partial dynamic partition with 
Spark & Hive 3
 Key: SPARK-31605
 URL: https://issues.apache.org/jira/browse/SPARK-31605
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.2
 Environment: Hortonworks HDP 3.1.0

Spark 2.3.2

Hive 3
Reporter: Amit Ashish


When inserting data with dynamic partitions, the operation fails if not all 
partition columns are dynamic. For example:

The query
{code:sql}
insert overwrite table t1 (part_a='a', part_b) select * from t2
{code}
will fail with the following errors:
{code:xml}
Cannot create partition spec from hdfs:/// ; missing keys [part_a]
Ignoring invalid DP directory 
{code}
On the other hand, if I remove the static value of part_a to make the insert 
fully dynamic, the following query will succeed.
{code:sql}
insert overwrite table t1 (part_a, part_b) select * from t2
{code}






[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2019-09-20 Thread Ashish (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934744#comment-16934744
 ] 

Ashish commented on SPARK-19984:


I faced a similar error. I am using Spark 2.4 with a lot of DataFrames and was 
calling dropDuplicates() on almost all of them. Part of the log I got is 
below; a sketch of the usage pattern follows it.

 

{code}
2019-09-20 17:10:14,130 [Driver] ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 686, Column 28: Redefinition of parameter "agg_expr_11"
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 686, Column 28: Redefinition of parameter "agg_expr_11"
  at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
  at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3174)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3009)
  at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336)
  at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958)
  at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212)
  at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:393)
  at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:385)
  at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1286)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
  at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1285)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:825)
  at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:411)
  at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:212)
  at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:390)
  at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:385)
  at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1405)
  at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:385)
  at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:357)
  at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
  at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
  at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
  at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
  at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
  at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1420)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1496)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
  at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
  at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
  at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
  at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
  at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
  at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
  at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1368)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:579)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at
{code}
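
As mentioned above, a minimal sketch (table and column names hypothetical) of the usage pattern that was in play; this is illustrative only, not a confirmed reproduction of the codegen failure:
{code:scala}
// Many DataFrames, each deduplicated before being combined. Wide plans built
// this way are the kind that whole-stage codegen compiles into generated.java.
val a = spark.table("t_a").dropDuplicates()
val b = spark.table("t_b").dropDuplicates(Seq("id"))
val joined = a.join(b, Seq("id")).dropDuplicates()
joined.count()
{code}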

[jira] [Updated] (SPARK-24081) Spark SQL drops the table while writing into table in "overwrite" mode.

2018-04-25 Thread Ashish (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish updated SPARK-24081:
---
Priority: Blocker  (was: Major)

> Spark SQL drops the table  while writing into table in "overwrite" mode.
> 
>
> Key: SPARK-24081
> URL: https://issues.apache.org/jira/browse/SPARK-24081
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.0
>Reporter: Ashish
>Priority: Blocker
>
> I am taking data from a table and modifying it; when I write it back to the 
> table in overwrite mode, it deletes all the records.
> Expectation: the table should be updated with the modified data.






[jira] [Created] (SPARK-24081) Spark SQL drops the table while writing into table in "overwrite" mode.

2018-04-24 Thread Ashish (JIRA)
Ashish created SPARK-24081:
--

 Summary: Spark SQL drops the table  while writing into table in 
"overwrite" mode.
 Key: SPARK-24081
 URL: https://issues.apache.org/jira/browse/SPARK-24081
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 2.3.0
Reporter: Ashish


I am taking data from a table and modifying it; when I write it back to the 
table in overwrite mode, it deletes all the records.

Expectation: the table should be updated with the modified data.
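
For illustration, a minimal sketch (table and column names hypothetical) of the read-modify-overwrite pattern described; because the DataFrame read is lazy, overwriting the same table it reads from can clear the source before the new rows are written:
{code:scala}
import org.apache.spark.sql.functions.upper

// Hypothetical reproduction of the reported pattern.
val df = spark.table("my_table")                        // lazy read
val updated = df.withColumn("name", upper(df("name")))  // some modification
// Overwriting the table that is still being read: the target is cleared
// first, which is consistent with "all records deleted" as reported.
updated.write.mode("overwrite").saveAsTable("my_table")
{code}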






[jira] [Commented] (SPARK-13699) Spark SQL drops the table in "overwrite" mode while writing into table

2018-04-24 Thread Ashish (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449661#comment-16449661
 ] 

Ashish commented on SPARK-13699:


Is this issue going to be resolved?

I am facing the same issue: while writing to a table in overwrite mode, it 
truncates the table. A workaround sketch is below.
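
A hypothetical workaround sketch (the staging-table name and transformation are invented here): write to a staging table first, so the target is never overwritten while it is still being read.
{code:scala}
import org.apache.spark.sql.functions.col

// Hypothetical two-step overwrite: materialize the result to a staging
// table, then overwrite the target from the staging copy.
val transformed = spark.table("tgt_table")
  .withColumn("amount", col("amount") * 2)
transformed.write.mode("overwrite").saveAsTable("tgt_table_staging")
spark.table("tgt_table_staging").write.mode("overwrite").saveAsTable("tgt_table")
{code}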

> Spark SQL drops the table in "overwrite" mode while writing into table
> --
>
> Key: SPARK-13699
> URL: https://issues.apache.org/jira/browse/SPARK-13699
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Dhaval Modi
>Priority: Major
> Attachments: stackTrace.txt
>
>
> Hi,
> While writing the DataFrame to a Hive table with the "SaveMode.Overwrite" 
> option, e.g.
> tgtFinal.write.mode(SaveMode.Overwrite).saveAsTable("tgt_table")
> sqlContext drops the table instead of truncating it.
> This causes an error while overwriting.
> Stack trace and commands to reproduce the issue are attached.
> Thanks & Regards,
> Dhaval


