Prometheus with spark

2022-10-21 Thread Raj ks
Hi Team,


We want to query Prometheus data with Spark. Any suggestions would
be appreciated.

I searched for documentation but did not find anything suitable.
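
One possible approach, sketched here as an illustration only, is to pull a range query from the Prometheus HTTP API (/api/v1/query_range) and flatten the JSON response into rows that Spark can load. The function name flatten_matrix, the host name, and the example query in the comments are assumptions made for this sketch; nothing here is a dedicated Spark connector.

```python
from datetime import datetime, timezone


def flatten_matrix(payload):
    """Flatten a Prometheus range-query ("matrix") response into rows.

    Each row is (labels, timestamp, value), one per sample.
    """
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    rows = []
    for series in payload["data"]["result"]:
        labels = dict(series["metric"])
        # Each sample is [unix_seconds, "value-as-string"].
        for ts, value in series["values"]:
            when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
            rows.append((labels, when, float(value)))
    return rows


# Hypothetical usage (host, query, and time range are placeholders):
#
#   import requests
#   from pyspark.sql import SparkSession
#
#   resp = requests.get(
#       "http://prometheus.example:9090/api/v1/query_range",
#       params={"query": "up", "start": 1666310400, "end": 1666314000, "step": "60s"},
#   )
#   spark = SparkSession.builder.getOrCreate()
#   df = spark.createDataFrame(
#       flatten_matrix(resp.json()),
#       "labels map<string,string>, ts timestamp, value double",
#   )
```

This collects the data on the driver first, so it only suits result sets that fit in driver memory.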


[PySpark, Spark Streaming] Bug in timestamp handling in Structured Streaming?

2022-10-21 Thread kai-michael.roes...@sap.com.INVALID
Hi,

I suspect I may have come across a bug in the handling of data containing 
timestamps in PySpark "Structured Streaming" when using the foreach option. I'm 
"just" a user of PySpark, not a Spark community member, so I don't know how to 
properly report the issue. I have posted a question about this on 
StackOverflow, but it has not received any attention yet. Could someone please 
have a look at it to check whether it really is a bug? If a Jira ticket is 
created, could you please send me the link?
Thanks and best regards
Kai Roesner.
Dr. Kai-Michael Roesner
Development Architect
Technology & Innovation, Common Data Services
SAP SE
Robert-Bosch-Strasse 30/34
69190 Walldorf, Germany
T +49 6227 7-64216
F +49 6227 78-28459
E kai-michael.roes...@sap.com
www.sap.com




Re: pyspark connect to spark thrift server port

2022-10-21 Thread Artemis User
I think there is some confusion here between the metastore and the 
actual Hive database.  Spark (as well as Apache Hive) requires two 
databases for Hive DB operations.  The metastore stores metadata 
only (e.g., schema info), whereas the actual Hive database, 
accessible through the Thrift server, is used by applications.  Hive 
keeps its metadata in a separate server because it has to support 
distributed database operations.


My previous message was about securing the metastore database, 
not the actual Hive tables.  It looks like you want to secure access 
to Hive rather than to the metastore (the metastore isn't used by 
general users), and your current configuration wasn't set up with the 
right user access control.  Hive supports a role-based access model, 
just like other RDBMSs.  You may refer to the Hive admin guide for more 
details 
(https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization). 
You can use beeline, or SQL scripts run via beeline, to set user 
privileges and roles.
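
As a rough illustration of the role-based setup described above, the following sketch generates a small SQL script that could be run through beeline. The role, user, and table names are hypothetical, the helper role_grant_script is invented for this example, and the statements assume SQL Standard Based Authorization is enabled and that the script is run by a user who belongs to the admin role.

```python
def role_grant_script(role, users, tables, privilege="SELECT"):
    """Build a SQL script that creates a role and grants it to users.

    Identifiers are inserted as-is, so they must come from a trusted source.
    """
    statements = [
        "SET ROLE ADMIN;",  # role management requires the admin role
        f"CREATE ROLE {role};",
    ]
    # Give the role the requested privilege on each table.
    statements += [
        f"GRANT {privilege} ON TABLE {table} TO ROLE {role};" for table in tables
    ]
    # Make each user a member of the role.
    statements += [f"GRANT {role} TO USER {user};" for user in users]
    return "\n".join(statements) + "\n"


# Hypothetical names, for illustration only.
script = role_grant_script("analyst", ["alice"], ["sales.orders"])
print(script)
```

The output can be saved to a file (for example grants.sql) and run with beeline -u <jdbc-url> -f grants.sql, where <jdbc-url> stands for your own server's connection string.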


On 10/21/22 1:27 AM, second_co...@yahoo.com.INVALID wrote:


Hello Artemis,
   Understood. If I give the Hive metastore URI to anyone so that they 
can connect using PySpark, port 9083 is open to everyone, with no 
authentication. The only way PySpark is able to connect to Hive is 
through 9083 and not through port 1.
On Friday, October 21, 2022 at 04:06:38 AM GMT+8, Artemis User wrote:



By default, Spark uses Apache Derby (running in embedded mode, with 
its content stored in local files) to host the Hive metastore.  
You can externalize the metastore to a JDBC-compliant database (e.g., 
PostgreSQL) and use the authentication provided by that 
database.  The JDBC configuration must be defined in a hive-site.xml 
file in the Spark conf directory.  Please see the metastore admin 
guide for more details, including an init script for setting up your 
metastore 
(https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration). 
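
For illustration, a minimal hive-site.xml with the JDBC settings could be generated as in the sketch below. The four javax.jdo.option.* property names are the standard metastore connection settings; the helper hive_site_xml, the JDBC URL, the user name, and the password are placeholders assumed for this example.

```python
import xml.etree.ElementTree as ET


def hive_site_xml(jdbc_url, driver, user, password):
    """Render a minimal hive-site.xml holding the metastore JDBC settings."""
    props = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder connection details, for illustration only.
xml_text = hive_site_xml(
    "jdbc:postgresql://db.example:5432/metastore",
    "org.postgresql.Driver",
    "hive",
    "change-me",
)
print(xml_text)
```

The resulting text would be saved as hive-site.xml in the Spark conf directory, and the matching JDBC driver jar has to be on Spark's classpath.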



On 10/20/22 4:31 AM, second_co...@yahoo.com.INVALID wrote:
Currently my PySpark code is able to connect to the Hive metastore on 
port 9083. However, with this approach I can't put in place any security 
mechanism such as LDAP or SQL authentication control. Is there any way to 
connect from PySpark to the Spark Thrift server on port 1 without 
exposing the Hive metastore URL to PySpark? I would like to 
authenticate users before allowing them to execute Spark SQL, and users 
should only be allowed to query the databases and tables that they have 
access to.




Thank you,
comet