RE: [EXTERNAL] Re: spark ETL and spark thrift server running together

2022-03-30 Thread Alex Kosberg
Hi Christophe,
Thank you for the explanation!

Regards,
Alex


From: Christophe Préaud 
Sent: Wednesday, March 30, 2022 3:43 PM
To: Alex Kosberg ; user@spark.apache.org
Subject: [EXTERNAL] Re: spark ETL and spark thrift server running together

Hi Alex,

As stated in the Hive documentation 
(https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration):

An embedded metastore database is mainly used for unit tests. Only one process 
can connect to the metastore database at a time, so it is not really a 
practical solution but works well for unit tests.

You need to set up a remote metastore database (e.g. MariaDB / MySQL) for 
production use.

Regards,
Christophe.

On 3/30/22 13:31, Alex Kosberg wrote:
Hi,
Some details:
1.   Spark SQL (version 3.2.1)
2.   Driver: Hive JDBC (version 2.3.9)
3.   ThriftCLIService: Starting ThriftBinaryCLIService on port 1 with 
5...500 worker threads
4.   BI tool is connected via an ODBC driver
After activating Spark Thrift Server I'm unable to run a PySpark script with 
spark-submit, as both use the same metastore_db.
Error:
Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3acaa384, see the next exception for details.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
    ... 140 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /tmp/metastore_db.

I need to be able to run PySpark (Spark ETL) while keeping the Spark Thrift 
Server up for BI tool queries. Is there a workaround?
Thanks!


Notice: This e-mail together with any attachments may contain information of 
Ribbon Communications Inc. and its Affiliates that is confidential and/or 
proprietary for the sole use of the intended recipient. Any review, disclosure, 
reliance or distribution by others or forwarding without express permission is 
strictly prohibited. If you are not the intended recipient, please notify the 
sender immediately and then delete all copies, including any attachments.





Re: spark ETL and spark thrift server running together

2022-03-30 Thread Christophe Préaud
Hi Alex,

As stated in the Hive documentation 
(https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration):

*An embedded metastore database is mainly used for unit tests. Only one process 
can connect to the metastore database at a time, so it is not really a 
practical solution but works well for unit tests.*


You need to set up a remote metastore database (e.g. MariaDB / MySQL) for 
production use.
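
As a rough illustration of that setup, the sketch below generates a 
hive-site.xml that points the metastore at an external MySQL / MariaDB 
database instead of the embedded Derby one. The javax.jdo.option.* property 
names are the standard Hive metastore settings; the host name, database name, 
user, and password are placeholders (assumptions, not values from this 
thread), and build_hive_site is a hypothetical helper written only for this 
example.

```python
import xml.etree.ElementTree as ET


def build_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; pretty-printing is optional.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder connection details: replace with your own database settings.
METASTORE_PROPERTIES = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://metastore-db.example.com:3306/hive_metastore",
    # For MariaDB Connector/J the driver class would be org.mariadb.jdbc.Driver.
    "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "change-me",
}

if __name__ == "__main__":
    # Print the file content; save it as hive-site.xml in Spark's conf directory.
    print(build_hive_site(METASTORE_PROPERTIES))
```

Spark reads hive-site.xml from its conf/ directory, so with the same file in 
place the Thrift Server and the spark-submit jobs should share one metastore 
rather than each trying to boot /tmp/metastore_db. The matching JDBC driver 
jar also has to be on the classpath of both processes, and the metastore 
schema has to exist in the database before first use.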

Regards,
Christophe.

On 3/30/22 13:31, Alex Kosberg wrote:
>
> Hi,
>
> Some details:
>
> · Spark SQL (version 3.2.1)
>
> · Driver: Hive JDBC (version 2.3.9)
>
> · ThriftCLIService: Starting ThriftBinaryCLIService on port 1 
> with 5...500 worker threads
>
> · BI tool is connected via an ODBC driver
>
> After activating Spark Thrift Server I'm unable to run a PySpark script with 
> spark-submit, as both use the same metastore_db.
>
> error:
>
> Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3acaa384, see the next exception for details.
>     at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>     at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
>     ... 140 more
> Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /tmp/metastore_db.
>
>  
>
> I need to be able to run PySpark (Spark ETL) while keeping the Spark Thrift 
> Server up for BI tool queries. Is there a workaround?
>
> Thanks!



spark ETL and spark thrift server running together

2022-03-30 Thread Alex Kosberg
Hi,
Some details:
* Spark SQL (version 3.2.1)
* Driver: Hive JDBC (version 2.3.9)
* ThriftCLIService: Starting ThriftBinaryCLIService on port 1 with 
5...500 worker threads
* BI tool is connected via an ODBC driver
After activating Spark Thrift Server I'm unable to run a PySpark script with 
spark-submit, as both use the same metastore_db.
Error:
Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3acaa384, see the next exception for details.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
    ... 140 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /tmp/metastore_db.

I need to be able to run PySpark (Spark ETL) while keeping the Spark Thrift 
Server up for BI tool queries. Is there a workaround?
Thanks!

