Re: Advanced GC Tuning

2021-07-20 Thread Sean Owen
You're right, I think storageFraction is somewhat better to control this, although some things 'counted' in spark.memory.fraction will also be long-lived and in the OldGen. You can also increase the OldGen size if you're pretty sure that's the issue - 'old' objects in the YoungGen. I'm not sure ho
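For context on how the two settings mentioned above relate, here is a minimal arithmetic sketch. It assumes the sizing rules stated in the Spark tuning guide: spark.memory.fraction (default 0.6) applies to the JVM heap minus a 300 MiB reserve, and spark.memory.storageFraction (default 0.5) applies to that result. The function name and the returned keys are hypothetical, chosen only for illustration.

```python
# Sketch only: reproduces the documented sizing arithmetic, not Spark's code.
RESERVED_MIB = 300  # reserve subtracted from the heap, per the tuning guide


def unified_memory_regions(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Return approximate region sizes in MiB for a given executor heap."""
    usable = heap_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MiB reserve")
    unified = usable * memory_fraction    # execution + storage (region M)
    storage = unified * storage_fraction  # storage share protected from eviction (region R)
    return {
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,
        "user": usable - unified,         # remainder left for user data structures
    }


if __name__ == "__main__":
    for name, size in unified_memory_regions(4096).items():
        print(f"{name}: {size:.1f} MiB")
```

Lowering storage_fraction shrinks only the protected storage share; the unified region itself changes only with memory_fraction.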

Advanced GC Tuning

2021-07-20 Thread Kuznetsov, Oleksandr
Hello, I was reading the Garbage Collection Tuning guide here: Tuning - Spark 3.1.2 Documentation (apache.org), specifically the section on "Advanced GC Tuning". It is stated that if the OldGen region is getting full, it is rec

Unpacking and using external modules with PySpark inside k8s

2021-07-20 Thread Mich Talebzadeh
I have been struggling with this. Kubernetes (not that it matters, minikube) is working fine. In one of the modules, called configure.py, I am importing the yaml module: import yaml. This throws an error: import yaml ModuleNotFoundError: No module named 'yaml'. I have been through a number of loo
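A quick way to narrow down an error like this is to ask the interpreter which modules it can actually locate. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and it checks only the Python process it runs in, so it would need to be run inside the container image in question to say anything about that environment.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    # find_spec returns None when a top-level module is not on sys.path.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(["yaml", "json"]))
```

Printing sys.executable alongside the result shows which interpreter did the lookup, which helps when several Python installations exist in the same image.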

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Mich Talebzadeh
BTW what assumption is there that the thread owner is writing to the cluster? The thrift server is running locally on localhost:1. I concur that JDBC to remote Hive is needed. However, this is not the impression I get here. df.write .format("jdbc") .option("url", "jdbc:hive2://localhost:10

unsubscribe

2021-07-20 Thread Du Li

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Daniel de Oliveira Mantovani
From the Cloudera Documentation: https://docs.cloudera.com/documentation/other/connectors/hive-jdbc/latest/Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf UseNativeQuery 1: The driver does not transform the queries emitted by applications, so the native query is used. 0: The driver trans
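As an illustration of where a property such as UseNativeQuery would go, here is a small sketch that assembles a connection URL. It assumes the driver accepts properties appended to the URL as semicolon-separated key=value pairs; the helper name, host, and port are hypothetical placeholders and are not taken from the thread.

```python
def hive_jdbc_url(host, port, properties):
    """Build a jdbc:hive2 URL with semicolon-separated driver properties."""
    parts = [f"jdbc:hive2://{host}:{port}"]
    # Assumption: each property is appended as ;key=value after host:port.
    parts += [f"{key}={value}" for key, value in properties.items()]
    return ";".join(parts)


if __name__ == "__main__":
    # hive.example.com and 10000 are placeholders for illustration only.
    print(hive_jdbc_url("hive.example.com", 10000, {"UseNativeQuery": 1}))
```

The resulting string would be passed as the "url" option of a JDBC write; any authentication properties the cluster requires would be added to the same dictionary.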

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Daniel de Oliveira Mantovani
Insert mode is "overwrite", so it shouldn't matter whether the table already exists or not. The JDBC driver should be based on the Cloudera Hive version, and we can't know the CDH version he's using. On Tue, Jul 20, 2021 at 1:21 PM Mich Talebzadeh wrote: > The driver is fine and latest and it shoul

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Mich Talebzadeh
The driver is fine and is the latest, so it should work. I have asked the thread owner to send the DDL of the table and how the table is created. In this case JDBC from Spark expects the table to be there. The error below java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR processing query

Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED

2021-07-20 Thread Daniel de Oliveira Mantovani
Badrinath is trying to write to Hive in a cluster where he has no permission to submit Spark jobs and no access to the Hive/Spark metadata. The only way to communicate with this third-party Hive cluster is through the JDBC protocol. [ Cloudera Data Hub - Hive Server] <-> [Spark Standalone]