Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-10-01 Thread Jon Rodríguez Aranguren
Dear Jörn Franke, Jayabindu Singh and Spark Community members,

Thank you profoundly for your initial insights. I feel it's necessary to
provide more precision on our setup to facilitate a deeper understanding.

We're interfacing with S3 Compatible storages, but our operational context
is somewhat distinct. Our infrastructure doesn't lean on conventional cloud
providers like AWS. Instead, we've architected our environment on
on-premise Kubernetes distributions, specifically k0s and OpenShift.

Our objective extends beyond just handling S3 keys. We're orchestrating a
solution that integrates Azure SPNs, API Credentials, and other sensitive
credentials, intending to make Kubernetes' native secrets our central
management hub. The aspiration is to have a universally deployable JAR, one
that can function unmodified across different ecosystems like EMR,
Databricks (on both AWS and Azure), etc. Platforms like Databricks have
already made strides in this direction, allowing secrets to be woven
directly into the Spark Conf through mechanisms like
{{secret_scope/secret_name}}, which are resolved dynamically.
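For reference, that pattern can also be driven from application code; a
minimal sketch (the scope and key names here are invented):

    # Databricks: fetch a secret at runtime and hand it to S3A.
    access_key = dbutils.secrets.get(scope="my_scope", key="s3_access_key")
    spark.sparkContext._jsc.hadoopConfiguration().set(
        "fs.s3a.access.key", access_key)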

The spark-on-k8s-operator's user guide suggests the feasibility of mounting
secrets. However, a gap exists in our understanding of how to subsequently
access these mounted secret values within the Spark application's context.
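To make the question concrete, my reading of the operator's user guide is
that something like the following mounts a pre-created secret (an untested
sketch; secret and path names are invented):

    # SparkApplication fragment (spark-on-k8s-operator).
    spec:
      driver:
        secrets:
          - name: s3-creds
            path: /mnt/secrets
            secretType: Generic
      executor:
        secrets:
          - name: s3-creds
            path: /mnt/secrets
            secretType: Generic

Presumably the keys of the secret then appear as plain files inside the
pods, readable from the application:

    # Inside the Spark application (Python), assuming the mount above.
    with open("/mnt/secrets/access-key") as f:
        access_key = f.read().strip()

It is this last step - getting such values into the Spark configuration
cleanly - that we have not found documented.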

Here lies my inquiry: is the spark-on-k8s-operator currently equipped to
support this level of integration? If it does, any elucidation on the
method or best practices would be pivotal for our project. Alternatively,
if you could point me to resources or community experts who have tackled
similar challenges, it would be of immense assistance.

Thank you for bearing with the intricacies of our query, and I appreciate
your continued guidance in this endeavor.

Warm regards,

Jon Rodríguez Aranguren.

On Sat, 30 Sep 2023 at 23:19, Jayabindu Singh ()
wrote:

> Hi Jon,
>
> Using IAM as suggested by Jörn is the best approach.
> We recently moved our Spark workload from HDP to Spark on K8s and are
> utilizing IAM.
> It will save you from secret management headaches and also allows a lot
> more flexibility in access control, including the option to allow access
> to multiple S3 buckets from the same pod.
> We have implemented this across Azure, Google and AWS. Azure does require
> some extra work to get it working.
>
> On Sat, Sep 30, 2023 at 12:05 PM Jörn Franke  wrote:
>
>> Don’t use static IAM (S3) credentials. It is an outdated, insecure
>> method - even AWS recommends against using it for anything (cf. e.g.
>> https://docs.aws.amazon.com/cli/latest/userguide/cli-authentication-user.html
>> ).
>> It is almost a guarantee to get your data stolen and your account
>> manipulated.
>>
>> If you need to use Kubernetes (which has its own very problematic
>> security issues), then assign AWS IAM roles with minimal permissions to
>> the pods (for EKS this means using OIDC, cf.
>> https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html).
>>
>> On 30.09.2023 at 03:41, Jon Rodríguez Aranguren <
>> jon.r.arangu...@gmail.com> wrote:
>>
>> Dear Spark Community Members,
>>
>> I trust this message finds you all in good health and spirits.
>>
>> I'm reaching out to the collective expertise of this esteemed community
>> with a query regarding Spark on Kubernetes. As a newcomer, I have always
>> admired the depth and breadth of knowledge shared within this forum, and it
>> is my hope that some of you might have insights on a specific challenge I'm
>> facing.
>>
>> I am currently trying to configure multiple Kubernetes secrets, notably
>> multiple S3 keys, at the SparkConf level for a Spark application. My
>> objective is to understand the best approach or methods to ensure that
>> these secrets can be smoothly accessed by the Spark application.
>>
>> If any of you have previously encountered this scenario or possess
>> relevant insights on the matter, your guidance would be highly beneficial.
>>
>> Thank you for your time and consideration. I'm eager to learn from the
>> experiences and knowledge present within this community.
>>
>> Warm regards,
>> Jon
>>
>>


Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-10-01 Thread Jörn Franke
There is nowadays a growing trend to move away from static
credentials/certificates stored in a secret vault: rotating them is
complex, once they are leaked they can be abused, and making minimal
permissions feasible is cumbersome. That is why keyless approaches are
used for A2A (application-to-application) access - workload identity
federation was mentioned. E.g. in AWS EKS you would build this on OIDC
(https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html)
and configure this instead of using secrets. Similar approaches exist in
other clouds and even on-premise (e.g. SPIFFE, https://spiffe.io/).

Whether this will become the standard is difficult to say - for sure these
approaches seem easier to manage. Since you have a Kubernetes setup per
cloud/data centre, which means a lot of extra work, infrastructure cost
and security issues, workload identity federation may ease this compared
to a secret store.
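As a minimal sketch of the EKS side of this (the names and role ARN below
are placeholders, not a tested setup):

    # ServiceAccount annotated for IAM Roles for Service Accounts (IRSA).
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: spark-sa
      namespace: spark-jobs
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/spark-s3-role

Pods running under such a service account receive a projected OIDC token,
and recent AWS SDKs pick it up through the default credentials chain - no
stored secret involved.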




Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-10-01 Thread Jörn Franke
With OIDC something comparable is possible on AWS:
https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html





Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-10-01 Thread Mich Talebzadeh
It seems that Workload Identity is not available on AWS. Workload
Identity replaces the need to use metadata concealment on exposed storage
such as S3 and GCS. The sensitive metadata protected by metadata
concealment is also protected by Workload Identity.

Both Google Kubernetes Engine (GKE) and Azure Kubernetes Service support
Workload Identity. Taking notes from Google Cloud: "Workload Identity is
the recommended way for your workloads running on Google Kubernetes Engine
(GKE) to access Google Cloud services in a secure and manageable way."
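As an illustrative sketch of the GKE side (project, namespace and account
names below are invented):

    # Kubernetes service account bound to a Google service account via
    # Workload Identity.
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: spark-ksa
      namespace: spark-jobs
      annotations:
        iam.gke.io/gcp-service-account: spark-gsa@my-project.iam.gserviceaccount.com

The Google service account additionally needs an IAM binding that grants
roles/iam.workloadIdentityUser to the member
serviceAccount:my-project.svc.id.goog[spark-jobs/spark-ksa].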


HTH


Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom


view my Linkedin profile: https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sun, 1 Oct 2023 at 06:36, Jayabindu Singh  wrote:

> Hi Jon,
>
> Using IAM as suggested by Jörn is the best approach.
> We recently moved our Spark workload from HDP to Spark on K8s and are
> utilizing IAM.
> It will save you from secret management headaches and also allows a lot
> more flexibility in access control, including the option to allow access
> to multiple S3 buckets from the same pod.
> We have implemented this across Azure, Google and AWS. Azure does require
> some extra work to get it working.
>
> On Sat, Sep 30, 2023 at 12:05 PM Jörn Franke  wrote:
>
>> Don’t use static IAM (S3) credentials. It is an outdated, insecure
>> method - even AWS recommends against using it for anything (cf. e.g.
>> https://docs.aws.amazon.com/cli/latest/userguide/cli-authentication-user.html
>> ).
>> It is almost a guarantee to get your data stolen and your account
>> manipulated.
>>
>> If you need to use Kubernetes (which has its own very problematic
>> security issues), then assign AWS IAM roles with minimal permissions to
>> the pods (for EKS this means using OIDC, cf.
>> https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html).
>>
>> On 30.09.2023 at 03:41, Jon Rodríguez Aranguren <
>> jon.r.arangu...@gmail.com> wrote:
>>
>> Dear Spark Community Members,
>>
>> I trust this message finds you all in good health and spirits.
>>
>> I'm reaching out to the collective expertise of this esteemed community
>> with a query regarding Spark on Kubernetes. As a newcomer, I have always
>> admired the depth and breadth of knowledge shared within this forum, and it
>> is my hope that some of you might have insights on a specific challenge I'm
>> facing.
>>
>> I am currently trying to configure multiple Kubernetes secrets, notably
>> multiple S3 keys, at the SparkConf level for a Spark application. My
>> objective is to understand the best approach or methods to ensure that
>> these secrets can be smoothly accessed by the Spark application.
>>
>> If any of you have previously encountered this scenario or possess
>> relevant insights on the matter, your guidance would be highly beneficial.
>>
>> Thank you for your time and consideration. I'm eager to learn from the
>> experiences and knowledge present within this community.
>>
>> Warm regards,
>> Jon
>>
>>


Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-09-30 Thread Jayabindu Singh
Hi Jon,

Using IAM as suggested by Jörn is the best approach.
We recently moved our Spark workload from HDP to Spark on K8s and are
utilizing IAM.
It will save you from secret management headaches and also allows a lot
more flexibility in access control, including the option to allow access
to multiple S3 buckets from the same pod.
We have implemented this across Azure, Google and AWS. Azure does require
some extra work to get it working.
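On the Spark side the main wiring is to run the driver pod under the
service account that carries the IAM role, e.g. (API server address, image
and names are placeholders):

    spark-submit \
      --master k8s://https://<k8s-apiserver>:6443 \
      --deploy-mode cluster \
      --conf spark.kubernetes.namespace=spark-jobs \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
      --conf spark.kubernetes.container.image=<your-spark-image> \
      local:///opt/spark/jars/your-app.jar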

On Sat, Sep 30, 2023 at 12:05 PM Jörn Franke  wrote:

> Don’t use static IAM (S3) credentials. It is an outdated, insecure method
> - even AWS recommends against using it for anything (cf. e.g.
> https://docs.aws.amazon.com/cli/latest/userguide/cli-authentication-user.html
> ).
> It is almost a guarantee to get your data stolen and your account
> manipulated.
>
> If you need to use Kubernetes (which has its own very problematic security
> issues), then assign AWS IAM roles with minimal permissions to the pods
> (for EKS this means using OIDC, cf.
> https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html).
>
> On 30.09.2023 at 03:41, Jon Rodríguez Aranguren <
> jon.r.arangu...@gmail.com> wrote:
>
> Dear Spark Community Members,
>
> I trust this message finds you all in good health and spirits.
>
> I'm reaching out to the collective expertise of this esteemed community
> with a query regarding Spark on Kubernetes. As a newcomer, I have always
> admired the depth and breadth of knowledge shared within this forum, and it
> is my hope that some of you might have insights on a specific challenge I'm
> facing.
>
> I am currently trying to configure multiple Kubernetes secrets, notably
> multiple S3 keys, at the SparkConf level for a Spark application. My
> objective is to understand the best approach or methods to ensure that
> these secrets can be smoothly accessed by the Spark application.
>
> If any of you have previously encountered this scenario or possess
> relevant insights on the matter, your guidance would be highly beneficial.
>
> Thank you for your time and consideration. I'm eager to learn from the
> experiences and knowledge present within this community.
>
> Warm regards,
> Jon
>
>


Re: Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-09-30 Thread Jörn Franke
Don’t use static IAM (S3) credentials. It is an outdated, insecure method -
even AWS recommends against using it for anything (cf. e.g.
https://docs.aws.amazon.com/cli/latest/userguide/cli-authentication-user.html).
It is almost a guarantee to get your data stolen and your account
manipulated.

If you need to use Kubernetes (which has its own very problematic security
issues), then assign AWS IAM roles with minimal permissions to the pods
(for EKS this means using OIDC, cf.
https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html).
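On the Spark side this typically reduces to a single S3A setting - a
sketch, assuming hadoop-aws with the v1 AWS SDK and an IRSA-annotated
service account on the pod:

    --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider

in place of any fs.s3a.access.key / fs.s3a.secret.key entries.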

> On 30.09.2023 at 03:41, Jon Rodríguez Aranguren wrote:
> 
> Dear Spark Community Members,
> 
> I trust this message finds you all in good health and spirits.
> 
> I'm reaching out to the collective expertise of this esteemed community with 
> a query regarding Spark on Kubernetes. As a newcomer, I have always admired 
> the depth and breadth of knowledge shared within this forum, and it is my 
> hope that some of you might have insights on a specific challenge I'm facing.
> 
> I am currently trying to configure multiple Kubernetes secrets, notably 
> multiple S3 keys, at the SparkConf level for a Spark application. My 
> objective is to understand the best approach or methods to ensure that these 
> secrets can be smoothly accessed by the Spark application.
> 
> If any of you have previously encountered this scenario or possess relevant 
> insights on the matter, your guidance would be highly beneficial.
> 
> Thank you for your time and consideration. I'm eager to learn from the 
> experiences and knowledge present within this community.
> 
> Warm regards,
> Jon


Seeking Guidance on Spark on Kubernetes Secrets Configuration

2023-09-29 Thread Jon Rodríguez Aranguren
Dear Spark Community Members,

I trust this message finds you all in good health and spirits.

I'm reaching out to the collective expertise of this esteemed community
with a query regarding Spark on Kubernetes. As a newcomer, I have always
admired the depth and breadth of knowledge shared within this forum, and it
is my hope that some of you might have insights on a specific challenge I'm
facing.

I am currently trying to configure multiple Kubernetes secrets, notably
multiple S3 keys, at the SparkConf level for a Spark application. My
objective is to understand the best approach or methods to ensure that
these secrets can be smoothly accessed by the Spark application.
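For concreteness, the kind of configuration I have been attempting looks
roughly like this (untested; bucket names, endpoints and secret names are
invented). Hadoop's S3A connector supports per-bucket settings, so several
credential sets can coexist in one SparkConf:

    spark.hadoop.fs.s3a.bucket.bucket-a.access.key=...
    spark.hadoop.fs.s3a.bucket.bucket-a.secret.key=...
    spark.hadoop.fs.s3a.bucket.bucket-b.endpoint=https://minio.example.internal:9000
    spark.hadoop.fs.s3a.bucket.bucket-b.access.key=...
    spark.hadoop.fs.s3a.bucket.bucket-b.secret.key=...

and the raw values can apparently be injected from Kubernetes secrets as
environment variables:

    spark.kubernetes.driver.secretKeyRef.BUCKET_A_ACCESS_KEY=s3-creds:bucket-a-access-key
    spark.kubernetes.executor.secretKeyRef.BUCKET_A_ACCESS_KEY=s3-creds:bucket-a-access-key

What I have not found is a clean way to close the gap between the injected
environment variables and the SparkConf entries.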

If any of you have previously encountered this scenario or possess relevant
insights on the matter, your guidance would be highly beneficial.

Thank you for your time and consideration. I'm eager to learn from the
experiences and knowledge present within this community.

Warm regards,
Jon