Re: AWS Glue PySpark Job

2025-01-04 Thread Perez
Hi Team, I would appreciate any help with this. https://stackoverflow.com/questions/79324390/aws-glue-pyspark-job-is-not-ending/79324917#79324917 On Fri, Jan 3, 2025 at 3:53 PM Perez wrote: > Hi Team, > > I would need your help in understanding the below problem.

AWS Glue PySpark Job

2025-01-03 Thread Perez
Hi Team, I would need your help in understanding the below problem. https://stackoverflow.com/questions/79324390/aws-glue-pyspark-job-is-not-ending/79324917#79324917

AWS Glue and Python

2024-06-26 Thread Perez
Hi Team, I am facing an issue here: https://stackoverflow.com/questions/78673228/unable-to-read-text-file-in-glue-job TIA

Re: [EXTERNAL] Re: [Spark]: Spark / Iceberg / hadoop-aws compatibility matrix

2024-04-03 Thread Oxlade, Dan
I don't really understand how Iceberg and the hadoop libraries can coexist in a deployment. The latest Spark (3.5.1) base image contains the hadoop-client*-3.3.4.jar. The AWS v2 SDK is only supported in hadoop*-3.4.0.jar and onward. Iceberg AWS integration states the AWS v2 SDK is required

Re: [EXTERNAL] Re: [Spark]: Spark / Iceberg / hadoop-aws compatibility matrix

2024-04-03 Thread Oxlade, Dan
Swapping out the iceberg-aws-bundle for the very latest aws provided sdk ('software.amazon.awssdk:bundle:2.25.23') produces an incompatibility from a slightly different code path: java.lang.NoSuchMethodError: 'void org.apache.hadoop.util.SemaphoredDel

Re: [EXTERNAL] Re: [Spark]: Spark / Iceberg / hadoop-aws compatibility matrix

2024-04-03 Thread Oxlade, Dan
[sorry; replying all this time] With hadoop-*-3.3.6 in place of the 3.4.0 below I get java.lang.NoClassDefFoundError: com/amazonaws/AmazonClientException I think that the below iceberg-aws-bundle version supplies the v2 sdk. Dan From: Aaron Grubb Sent: 03

Re: [Spark]: Spark / Iceberg / hadoop-aws compatibility matrix

2024-04-03 Thread Aaron Grubb
Downgrade to hadoop-*:3.3.x; Hadoop 3.4.x is based on the AWS SDK v2 and should probably be considered breaking for tools that build on < 3.4.0 while using AWS. From: Oxlade, Dan Sent: Wednesday, April 3, 2024 2:41:11 PM To: user@spark.apache.org Subj

[Spark]: Spark / Iceberg / hadoop-aws compatibility matrix

2024-04-03 Thread Oxlade, Dan
Hi all, I've struggled with this for quite some time. My requirement is to read a parquet file from s3 to a Dataframe then append to an existing iceberg table. In order to read the parquet I need the hadoop-aws dependency for s3a:// . In order to write to iceberg I need the iceberg depen
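A sketch of the combination Dan describes — read Parquet from s3a://, append to an existing Iceberg table. All version numbers, catalog, bucket, and table names here are illustrative assumptions, not a verified compatibility matrix:

```python
# Hypothetical wiring for "read Parquet from s3a, append to Iceberg".
# Versions and names are placeholders; the thread is precisely about
# which combinations are actually compatible.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("parquet-to-iceberg")
    # hadoop-aws should match the hadoop-client jars in the Spark image
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:3.3.4,"
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3")
    .config("spark.sql.catalog.my_catalog",
            "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "hadoop")
    .config("spark.sql.catalog.my_catalog.warehouse",
            "s3a://my-bucket/warehouse")
    .getOrCreate()
)

df = spark.read.parquet("s3a://my-bucket/input/data.parquet")
df.writeTo("my_catalog.db.events").append()
```

The replies in this thread suggest keeping hadoop-aws aligned with the hadoop-client version baked into the Spark image (3.3.x for Spark 3.5.1) and letting the Iceberg bundle supply its own AWS SDK.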

Re: automatically/dynamically renew aws temporary token

2023-10-24 Thread Carlos Aguni
I'm trying to argue on that path. By now even requesting an increase on the session duration is a struggle. But at the moment, since I was only allowed the AssumeRole approach, I'm figuring out a way through this path. > https://github.com/zillow/aws-custom-credential-provider thank you

Re: automatically/dynamically renew aws temporary token

2023-10-23 Thread Pol Santamaria
Hi Carlos! Take a look at this project, it's 6 years old but the approach is still valid: https://github.com/zillow/aws-custom-credential-provider The credential provider gets called each time an S3 or Glue Catalog is accessed, and then you can decide whether to use a cached token or
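To make the moving parts concrete, a hedged sketch of what the AssumeRole route produces and where it plugs into s3a. The role ARN and session name are placeholders, and the boto3 call is shown commented out with a fake response standing in for it; the custom credential provider linked above is what automates refreshing these short-lived values:

```python
# Sketch: mapping an STS AssumeRole response onto the s3a
# temporary-credential settings Spark understands.

def sts_response_to_spark_conf(response):
    """Turn an sts.assume_role() response into hadoop/s3a conf entries."""
    creds = response["Credentials"]
    return {
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "spark.hadoop.fs.s3a.access.key": creds["AccessKeyId"],
        "spark.hadoop.fs.s3a.secret.key": creds["SecretAccessKey"],
        "spark.hadoop.fs.s3a.session.token": creds["SessionToken"],
    }

# In a real job you would call:
#   import boto3
#   response = boto3.client("sts").assume_role(
#       RoleArn="arn:aws:iam::123456789012:role/cross-account-role",
#       RoleSessionName="glue-job")
fake_response = {"Credentials": {
    "AccessKeyId": "AKIAEXAMPLE",
    "SecretAccessKey": "secret",
    "SessionToken": "token",
}}
conf = sts_response_to_spark_conf(fake_response)
```

Because these values expire, a static conf like this needs re-submission or a refreshing provider such as the one linked above.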

Re: automatically/dynamically renew aws temporary token

2023-10-22 Thread Jörn Franke
Can’t you attach the cross-account permission to the Glue job role? Why the detour via AssumeRole? AssumeRole can make sense if you use an AWS IAM user and STS authentication, but this would make no sense within AWS for cross-account access, as attaching the permissions to the Glue job role is

automatically/dynamically renew aws temporary token

2023-10-22 Thread Carlos Aguni
each node? I'm currently using Spark on AWS Glue and wonder what options I have. Regards, c.

Re: Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-14 Thread Mich Talebzadeh
OK I managed to load the zipped Python file and the runner .py file onto S3 for AWS EKS to work. It is a bit of a nightmare compared to the same on the Google SDK, which is simpler. Anyhow, you will require additional jar files to be added to $SPARK_HOME/jars. These two files will be picked up after you build

Re: Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Mich Talebzadeh
Thanks! I will have a look. Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom https://en.everybodywiki.com/Mich_Talebzadeh

Re: Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Bjørn Jørgensen
Yes, it looks inside the docker container's folder. It will work if you are using s3 or gs. On Wed, 12 Apr 2023, 18:02 Mich Talebzadeh wrote: > Hi, > > In my spark-submit to eks cluster, I use the standard code to submit to > the cluster as below: > > spark-submit --verbose \ >--master k8s://$

Accessing python runner file in AWS EKS kubernetes cluster as in local://

2023-04-12 Thread Mich Talebzadeh
Hi, In my spark-submit to eks cluster, I use the standard code to submit to the cluster as below: spark-submit --verbose \ --master k8s://$KUBERNETES_MASTER_IP:443 \ --deploy-mode cluster \ --name sparkOnEks \ --py-files local://$CODE_DIRECTORY/spark_on_eks.zip \ local:///home/hduse

Re: [Spark Structured Streaming] Does Spark Structured Streaming currently support sinking to AWS Kinesis, and how to handle hitting Kinesis quotas?

2023-03-06 Thread Mich Talebzadeh
is. But it seems like does not have corresponding > connector can use. I would confirm whether have another method in addition > to this solution > <https://repost.aws/questions/QUP_OJomilTO6oIgvK00VHEA/writing-data-to-kinesis-stream-from-py-spark> > 2. Because aws kinesis have quota l

[Spark Structured Streaming] Does Spark Structured Streaming currently support sinking to AWS Kinesis, and how to handle hitting Kinesis quotas?

2023-03-05 Thread hueiyuan su
whether have another method in addition to this solution <https://repost.aws/questions/QUP_OJomilTO6oIgvK00VHEA/writing-data-to-kinesis-stream-from-py-spark> 2. Because aws kinesis have quota limitation (like 1MB/s and 1000 records/s), if spark structured streaming micro batch size too large, h

Re: [Spark Structured Streaming] Does Spark Structured Streaming currently support sinking to AWS Kinesis?

2023-02-16 Thread Vikas Kumar
treaming > *Level*: Advanced > *Scenario*: How-to > > > *Problems Description* > I would like to implement writeStream data to AWS Kinesis with Spark > structured Streaming, but I do not find related connector jar can be used. > I want to check whether f

[Spark Structured Streaming] Does Spark Structured Streaming currently support sinking to AWS Kinesis?

2023-02-16 Thread hueiyuan su
*Component*: Spark Structured Streaming *Level*: Advanced *Scenario*: How-to *Problems Description* I would like to write streaming data (writeStream) to AWS Kinesis with Spark Structured Streaming, but I cannot find a related connector jar that can be used. I want to check whether fully
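In the absence of a built-in Kinesis sink, a commonly used workaround (not something confirmed in this thread) is `foreachBatch` plus boto3 on the workers. A hedged sketch — the stream name and the `key` column are placeholders, and the chunking keeps each request under the PutRecords limit of 500 records per call:

```python
# Sketch of a foreachBatch Kinesis sink. Not a supported connector;
# quotas (1 MB/s and 1000 records/s per shard) still apply per shard.
import json

def chunk(records, max_per_call=500):
    """Split records into PutRecords-sized batches (max 500 per call)."""
    for i in range(0, len(records), max_per_call):
        yield records[i:i + max_per_call]

def write_batch_to_kinesis(df, epoch_id):
    import boto3  # assumed available on Glue/EMR workers
    client = boto3.client("kinesis")
    records = [
        {"Data": json.dumps(row.asDict()).encode("utf-8"),
         "PartitionKey": str(row["key"])}   # "key" is a placeholder column
        for row in df.toLocalIterator()
    ]
    for batch in chunk(records):
        client.put_records(StreamName="my-stream", Records=batch)

# query = (df.writeStream.foreachBatch(write_batch_to_kinesis)
#            .option("checkpointLocation", "s3a://bucket/chk")
#            .start())
```

Keeping micro-batches small (trigger interval, `maxOffsetsPerTrigger`-style source options) is the usual lever for staying under the quota mentioned in the question.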

Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Sid
Where can I find information on the size of the datasets supported by AWS Glue? I didn't see it in the documentation. Also, if I want to process TBs of data, e.g. 1 TB, what should be the ideal EMR cluster configuration? Could you please guide me on this? Thanks, Sid. On Thu, 23 Jun 2022,

Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Gourav Sengupta
Please use EMR; Glue is not made for heavy processing jobs. On Thu, Jun 23, 2022 at 6:36 AM Sid wrote: > Hi Team, > > Could anyone help me in the below problem: > > > https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-p

Need help with the configuration for AWS glue jobs

2022-06-22 Thread Sid
Hi Team, Could anyone help me in the below problem: https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-processing-1tb-data Thanks, Sid

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-29 Thread Gourav Sengupta
arquet false > > see > https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/issues/45 > > On Tue Aug 24, 2021 at 9:18 AM CEST, Gourav Sengupta wrote: > > Hi, > > > > I received a response from AWS, this is an issue with EMR, and they are

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-29 Thread Nicolas Paris
As a workaround, turn off pruning: spark.sql.hive.metastorePartitionPruning false spark.sql.hive.convertMetastoreParquet false see https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/issues/45 On Tue Aug 24, 2021 at 9:18 AM CEST, Gourav Sengupta wrote: > Hi,
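The two settings just quoted, applied when building the session — a sketch only; they trade metastore-side pruning performance for compatibility with the Glue catalog issue linked:

```python
# Workaround configuration from the thread: disable metastore partition
# pruning and the native Parquet conversion so the Glue Data Catalog
# client is not asked to evaluate the failing date predicate.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.hive.metastorePartitionPruning", "false")
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    .getOrCreate()
)
```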

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-24 Thread Gourav Sengupta
Hi, I received a response from AWS, this is an issue with EMR, and they are working on resolving the issue I believe. Thanks and Regards, Gourav Sengupta On Mon, Aug 23, 2021 at 1:35 PM Gourav Sengupta < gourav.sengupta.develo...@gmail.com> wrote: > Hi, > > the query still gives

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Gourav Sengupta
Hi, the query still gives the same error if we write "SELECT * FROM table_name WHERE data_partition > CURRENT_DATE() - INTERVAL 10 DAYS". Also the queries work fine in SPARK 3.0.x, or in EMR 6.2.0. Thanks and Regards, Gourav Sengupta On Mon, Aug 23, 2021 at 1:16 PM Sean Owen wrote: > Date ha

Re: AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Sean Owen
Date handling was tightened up in Spark 3. I think you need to compare to a date literal, not a string literal. On Mon, Aug 23, 2021 at 5:12 AM Gourav Sengupta < gourav.sengupta.develo...@gmail.com> wrote: > Hi, > > while I am running in EMR 6.3.0 (SPARK 3.1.1) a simple query as "SELECT * > FROM
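Sean's suggestion, sketched as SQL. The table and column names are illustrative, not from the thread:

```python
# Spark 3 tightened date handling: compare a date partition column with a
# typed DATE literal rather than a bare string where coercion used to apply.
failing = "SELECT * FROM events WHERE dt > '2021-03-01'"        # string literal
working = "SELECT * FROM events WHERE dt > DATE '2021-03-01'"   # typed literal
also_ok = "SELECT * FROM events WHERE dt > to_date('2021-03-01')"
# spark.sql(working)
```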

AWS EMR SPARK 3.1.1 date issues

2021-08-23 Thread Gourav Sengupta
Hi, while I am running in EMR 6.3.0 (SPARK 3.1.1) a simple query such as "SELECT * FROM table_name WHERE data_partition > '2021-03-01'", the query is failing with error: --- pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.metastore.api.InvalidO

Bursting Your On-Premises Data Lake Analytics and AI Workloads on AWS

2021-02-18 Thread Bin Fan
Hi everyone! I am sharing this article about running Spark / Presto workloads on AWS: Bursting On-Premise Datalake Analytics and AI Workloads on AWS <https://bit.ly/3qA1Tom> published on AWS blog. Hope you enjoy it. Feel free to discuss with me here <https://alluxio.io/slack>. - Bin

Spark in hybrid cloud in AWS & GCP

2020-12-07 Thread Bin Fan
Dear Spark users, are you interested in running Spark in a hybrid cloud? Check out talks from AWS & GCP at the virtual Data Orchestration Summit <https://www.alluxio.io/data-orchestration-summit-2020/> on Dec. 8-9, 2020; register for free <https://www.alluxio.io/data-orchestratio

Re: Spark Job Fails with Unknown Error writing to S3 from AWS EMR

2020-07-22 Thread Shriraj Bhardwaj
We faced a similar situation with JRE 8u262; try reverting back... On Thu, Jul 23, 2020, 5:18 AM koti reddy wrote: > Hi, > > Can someone help to resolve this issue? > Thank you in advance. > > Error logs : > > java.io.EOFException: Unexpected EOF while trying to read response from server >

Spark Job Fails with Unknown Error writing to S3 from AWS EMR

2020-07-22 Thread koti reddy
Hi, Can someone help to resolve this issue? Thank you in advance. Error logs : java.io.EOFException: Unexpected EOF while trying to read response from server at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:402) at org.apache.hadoop.hdfs.prot

AWS EMR slow write to HDFS

2019-06-11 Thread Femi Anthony
I'm writing a large dataset in Parquet format to HDFS using Spark and it runs rather slowly in EMR vs say Databricks. I realize that if I was able to use Hadoop 3.1, it would be much more performant because it has a high performance output committer. Is this the case, and if so - when will the
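A commonly suggested interim knob while the Hadoop 3.1 committers are unavailable is the v2 FileOutputCommitter algorithm, which moves task output into place directly instead of through a second serial rename pass. A sketch, not a benchmark claim:

```python
# Interim setting often suggested for slow commit phases on EMR before
# the Hadoop 3.1 high-performance committers are available.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version",
            "2")
    .getOrCreate()
)
# df.write.mode("overwrite").parquet("hdfs:///path/out")
```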

Re: Aws

2019-02-08 Thread Pedro Tuero
>> In the same link, it says that dynamic allocation is true by default. I >>> thought it would do the trick but reading again I think it is related to >>> the number of executors rather than the number of cores. >>> >>> But the jobs are still taking more than

Re: Aws

2019-02-07 Thread Noritaka Sekiyama
thought it would do the trick but reading again I think it is related to >> the number of executors rather than the number of cores. >> >> But the jobs are still taking more than before. >> Watching application history, I see these differences: >> For the same job, the sam

Re: Aws

2019-02-07 Thread Hiroyuki Nagata
gt; the number of executors rather than the number of cores. > > But the jobs are still taking more than before. > Watching application history, I see these differences: > For the same job, the same kind of instances types, default (aws managed) > configuration for executors, cores,

Re: Aws

2019-02-01 Thread Pedro Tuero
ching application history, I see these differences: For the same job, the same kind of instances types, default (aws managed) configuration for executors, cores, and memory: Instances: 6 r5.xlarge : 4 vCpu , 32gb of mem. (So there is 24 cores: 6 instances * 4 cores). With 5.16: - 24 executors (4 in

Re: Aws

2019-01-31 Thread Hiroyuki Nagata
Hi, Pedro. I also started using AWS EMR, with Spark 2.4.0, and I'm seeking methods for performance tuning. Do you configure dynamic allocation? FYI: https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation I've not tested it yet. I guess spark-submit needs
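The settings behind the linked docs section, as they might look when building a session on EMR. The executor bounds are illustrative assumptions:

```python
# Dynamic allocation sketch: executors are added and removed with load.
# The external shuffle service is required so removed executors' shuffle
# files remain readable.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "24")
    .getOrCreate()
)
```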

Aws

2019-01-31 Thread Pedro Tuero
Hi guys, I usually run Spark jobs on AWS EMR. Recently I switched from EMR label 5.16 to 5.20 (which uses Spark 2.4.0). I've noticed that a lot of steps are taking longer than before. I think it is related to the automatic configuration of cores per executor. In version 5.16, some executors

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Riccardo Ferrari
Best, On Fri, Dec 21, 2018 at 1:18 PM Aakash Basu wrote: > Any help, anyone? > > On Fri, Dec 21, 2018 at 2:21 PM Aakash Basu > wrote: > >> Hey Shuporno, >> >> With the updated config too, I am getting the same error. While trying to >> figure that out, I fo

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Aakash Basu
Any help, anyone? On Fri, Dec 21, 2018 at 2:21 PM Aakash Basu wrote: > Hey Shuporno, > > With the updated config too, I am getting the same error. While trying to > figure that out, I found this link which says I need aws-java-sdk (which I > already have): > https://github.c

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Aakash Basu
Hey Shuporno, With the updated config too, I am getting the same error. While trying to figure that out, I found this link which says I need aws-java-sdk (which I already have): https://github.com/amazon-archives/kinesis-storm-spout/issues/8 Now, this is my java details: java version "1.8.

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Shuporno Choudhury
8 more > > > > Thanks, > Aakash. > > On Fri, Dec 21, 2018 at 12:51 PM Shuporno Choudhury wrote: > >> >> >> On Fri, 21 Dec 2018 at 12:47, Shuporno Choudhury

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Aakash Basu
ec 21, 2018 at 12:51 PM Shuporno Choudhury < shuporno.choudh...@gmail.com> wrote: > > > On Fri, 21 Dec 2018 at 12:47, Shuporno Choudhury < > shuporno.choudh...@gmail.com> wrote: > >> Hi, >> Your connection config uses 's3n' but your read command

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Shuporno Choudhury
> > I feel this should solve the problem. > > On Fri, 21 Dec 2018 at 12:09, Aakash Basu-2 [via Apache Spark User List] < > ml+s1001560n34215...@n3.nabble.com> wrote: > >> Hi, >> >> I am trying to connect to AWS S3 and read a csv file (running POC) from a

Connection issue with AWS S3 from PySpark 2.3.1

2018-12-20 Thread Aakash Basu
Hi, I am trying to connect to AWS S3 and read a csv file (running a POC) from a bucket. I have s3cmd and am able to run ls and other operations from the CLI. *Present Configuration:* Python 3.7 Spark 2.3.1 *JARs added:* hadoop-aws-2.7.3.jar (in sync with the hadoop version used with spark) aws
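For reference, a minimal s3a sketch for this setup (PySpark 2.3.1 + hadoop-aws 2.7.3). Bucket, path, and keys are placeholders — never hard-code real credentials — and the configuration scheme must match the read-path scheme (s3a in both places, the mismatch the thread turns on):

```python
# Minimal s3a read sketch for PySpark 2.3.x with hadoop-aws 2.7.3.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "ACCESS_KEY")   # placeholder
hadoop_conf.set("fs.s3a.secret.key", "SECRET_KEY")   # placeholder

df = spark.read.csv("s3a://my-bucket/path/file.csv", header=True)
```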

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-11-15 Thread Holden Karau
chments. Thank you. > > Please consider the environment before printing. > > > > > > > > *From: *Li Gao > *Date: *Thursday, November 1, 2018 4:56 > *To: *"Zhang, Yuqi" > *Cc: *Gourav Sengupta , "user@spark.apache.org" > , "Nogami, Ma

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Zhang, Yuqi
From: Li Gao Date: Thursday, November 1, 2018 4:56 To: "Zhang, Yuqi" Cc: Gourav Sengupta , "user@spark.apache.org" , "Nogami, Masa

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Li Gao
*From: *Li Gao *Date: *Thursday, November 1, 2018 0:07 *To: *"Zhang, Yuqi" >

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Zhang, Yuqi
From: Li Gao Date: Thursday, November 1, 2018 0:07 To: "Zhang, Yuqi" Cc: "gourav.sengu...@gmail.com" , "user@spark.apache.org" , "Nogami, Masatsugu" Subject: Re: [Spark Shell on AWS K8s Clu

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Li Gao
the driver client. -Li On Wed, Oct 31, 2018 at 7:30 AM Zhang, Yuqi wrote: > Hi Gourav, > > > > Thank you for your reply. > > > > I haven’t try glue or EMK, but I guess it’s integrating kubernetes on aws > instances? > > I could set up the k8s cluster on AWS,

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Zhang, Yuqi
Hi Gourav, Thank you for your reply. I haven’t tried glue or EMK, but I guess it’s integrating Kubernetes on AWS instances? I could set up the k8s cluster on AWS, but my problem is I don’t know how to run spark-shell on Kubernetes… Since Spark only supports client mode on k8s from the 2.4 version, which

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Biplob Biswas
n on kubernetes cluster, so I >> would like to ask if there is some solution to my problem. >> >> >> >> The problem is when I am trying to run spark-shell on kubernetes v1.11.3 >> cluster on AWS environment, I couldn’t successfully run stateful set using >>

Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-31 Thread Gourav Sengupta
g spark 2.4 client mode function on kubernetes cluster, so I > would like to ask if there is some solution to my problem. > > > > The problem is when I am trying to run spark-shell on kubernetes v1.11.3 > cluster on AWS environment, I couldn’t successfully run stateful set using >

[Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?

2018-10-28 Thread Zhang, Yuqi
cluster on AWS environment, I couldn’t successfully run stateful set using the docker image built from spark 2.4. The error message is showing below. The version I am using is spark v2.4.0-rc3. Also, I wonder if there is more documentation on how to use client-mode or integrate spark-shell on

Re: AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Srinath C
You could use IAM roles in AWS to access the data in S3 without credentials. See this link <https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_s3.html> and this link <http://parthicloud.com/how-to-access-s3-bucket-from-application-on-amazon-ec2-without-access-credentials
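Srinath's IAM-role approach, sketched: on an EC2/EMR node with an instance profile attached, point s3a at the instance-profile provider and pass no keys at all. The model path is a placeholder:

```python
# Credential-free S3 access via the EC2 instance profile (IAM role).
# Works only when the process runs on an instance with a role attached.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.InstanceProfileCredentialsProvider")
    .getOrCreate()
)
# from pyspark.ml import PipelineModel
# model = PipelineModel.load("s3a://my-bucket/models/my-model")  # no creds in path
```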

AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Mina Aslani
Hi, I am trying to load an ML model from AWS S3 in my spark app running in a docker container; however, I need to pass the AWS credentials. My question is, why do I need to pass the credentials in the path? And what is the workaround? Best regards, Mina

Spark Structured Streaming how to read data from AWS SQS

2017-12-11 Thread Bogdan Cojocar
For spark streaming there are connectors that can achieve this functionality. Unfortunately for spark structured streaming I couldn't find any as it's a newer technology. Is there a way to connect to a source using a spark streaming connector? Or is th

Re: Quick one... AWS SDK version?

2017-10-08 Thread Jonathan Kelly
Tushar, Yes, the hadoop-aws jar installed on an emr-5.8.0 cluster was built with AWS Java SDK 1.11.160, if that’s what you mean. ~ Jonathan On Sun, Oct 8, 2017 at 8:42 AM Tushar Sudake wrote: > Hi Jonathan, > > Does that mean Hadoop-AWS 2.7.3 too is built against AWS SDK 1.11.160

Re: Quick one... AWS SDK version?

2017-10-08 Thread Tushar Sudake
Hi Jonathan, Does that mean Hadoop-AWS 2.7.3 too is built against AWS SDK 1.11.160 and not 1.7.4? Thanks. On Oct 7, 2017 3:50 PM, "Jean Georges Perrin" wrote: Hey Marco, I am actually reading from S3 and I use 2.7.3, but I inherited the project and they use some AWS API from

Re: Quick one... AWS SDK version?

2017-10-07 Thread Jean Georges Perrin
Hey Marco, I am actually reading from S3 and I use 2.7.3, but I inherited the project and they use some AWS API from the Amazon SDK, whose version is like from yesterday :) so it’s confusing, and AMZ is changing its version like crazy so it’s a little difficult to follow. Right now I went back to

Re: Quick one... AWS SDK version?

2017-10-07 Thread Marco Mistroni
Hi JG, out of curiosity, what's your use case? Are you writing to S3? You could use Spark to do that, e.g. using hadoop package org.apache.hadoop:hadoop-aws:2.7.1 ..that will download the aws client which is in line with hadoop 2.7.1? hth marco On Fri, Oct 6, 2017 at 10:58 PM, Jonathan Kelly

Re: Quick one... AWS SDK version?

2017-10-06 Thread Jonathan Kelly
Note: EMR builds Hadoop, Spark, et al, from source against specific versions of certain packages like the AWS Java SDK, httpclient/core, Jackson, etc., sometimes requiring some patches in these applications in order to work with versions of these dependencies that differ from what the applications

Re: Quick one... AWS SDK version?

2017-10-04 Thread Steve Loughran
On 3 Oct 2017, at 21:37, JG Perrin mailto:jper...@lumeris.com>> wrote: Sorry Steve – I may not have been very clear: thinking about aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled with Spark. I know, but if you are talking to s3 via the s3a client, you will ne

RE: Quick one... AWS SDK version?

2017-10-03 Thread JG Perrin
Sorry Steve - I may not have been very clear: thinking about aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled with Spark. From: Steve Loughran [mailto:ste...@hortonworks.com] Sent: Tuesday, October 03, 2017 2:20 PM To: JG Perrin Cc: user@spark.apache.org Subject: Re

RE: Quick one... AWS SDK version?

2017-10-03 Thread JG Perrin
Thanks Yash… this is helpful! From: Yash Sharma [mailto:yash...@gmail.com] Sent: Tuesday, October 03, 2017 1:02 AM To: JG Perrin ; user@spark.apache.org Subject: Re: Quick one... AWS SDK version? Hi JG, Here are my cluster configs if it helps. Cheers. EMR: emr-5.8.0 Hadoop distribution

Re: Quick one... AWS SDK version?

2017-10-03 Thread Steve Loughran
On 3 Oct 2017, at 02:28, JG Perrin mailto:jper...@lumeris.com>> wrote: Hey Sparkians, What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs? You generally have to stick with the version which hadoop was built with, I'm afraid...v

Re: Quick one... AWS SDK version?

2017-10-02 Thread Yash Sharma
Hi JG, Here are my cluster configs if it helps. Cheers. EMR: emr-5.8.0 Hadoop distribution: Amazon 2.7.3 AWS sdk: /usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.160.jar Applications: Hive 2.3.0 Spark 2.2.0 Tez 0.8.4 On Tue, 3 Oct 2017 at 12:29 JG Perrin wrote: > Hey Sparkians, > >

Quick one... AWS SDK version?

2017-10-02 Thread JG Perrin
Hey Sparkians, What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs? Thanks! jg
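One way of pinning matching versions, per the replies: stick with the SDK your hadoop-aws build was compiled against. A sketch assuming hadoop-aws 2.7.3's companion aws-java-sdk 1.7.4 (verify against your distribution before relying on it):

```python
# Pull matched hadoop-aws / aws-java-sdk versions via spark.jars.packages.
# The 2.7.3 <-> 1.7.4 pairing is an assumption based on the hadoop-aws
# build, not something stated in this thread.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:2.7.3,"
            "com.amazonaws:aws-java-sdk:1.7.4")
    .getOrCreate()
)
```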

Spark ES Connector -- AWS Managed ElasticSearch Services

2017-08-01 Thread Deepak Sharma
I am trying to connect to the AWS managed ES service using the Spark ES Connector, but am not able to. I am passing es.nodes and es.port along with es.nodes.wan.only set to true. But it fails with the below error: 34 ERROR NetworkClient: Node [x.x.x.x:443] failed (The server x.x.x.x failed to respond); no

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Takashi Sasaki
> wrote: >> >> Hi Pascal, >> >> The error also occurred frequently in our project. >> >> As a solution, it was effective to specify the memory size directly >> with spark-submit command. >> >> eg. spark-submit ex

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Josh Holbrook
cify the memory size directly > with spark-submit command. > > eg. spark-submit --executor-memory 2g > > > Regards, > > Takashi > > > 2017-07-18 5:18 GMT+09:00 Pascal Stammer : > >> Hi, > >> > >> I am running a Spark 2.1.x Application on AWS EMR

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Pascal Stammer
> eg. spark-submit --executor-memory 2g > > > Regards, > > Takashi > >> 2017-07-18 5:18 GMT+09:00 Pascal Stammer : >>> Hi, >>> >>> I am running a Spark 2.1.x Application on AWS EMR with YARN and get >>> following error that kill

Re: Running Spark und YARN on AWS EMR

2017-07-17 Thread Takashi Sasaki
I am running a Spark 2.1.x Application on AWS EMR with YARN and get >> following error that kill my application: >> >> AM Container for appattempt_1500320286695_0001_01 exited with exitCode: >> -104 >> For more detailed output, check application tracking >> p

Running Spark und YARN on AWS EMR

2017-07-17 Thread Pascal Stammer
Hi, I am running a Spark 2.1.x application on AWS EMR with YARN and get the following error that kills my application: AM Container for appattempt_1500320286695_0001_01 exited with exitCode: -104 For more detailed output, check application tracking page:http://ip-172-31-35-192.eu-central-1
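Exit code -104 typically means YARN killed the container for exceeding its memory allocation. A hedged sketch of the two knobs the replies point at — executor memory itself plus the off-heap overhead; the values are illustrative:

```python
# Memory settings sketch for the YARN kill described above. On Spark 2.1.x
# the overhead key is spark.yarn.executor.memoryOverhead; newer releases
# use spark.executor.memoryOverhead.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.executor.memory", "2g")
    .config("spark.yarn.executor.memoryOverhead", "512")  # MB, Spark 2.1.x key
    .getOrCreate()
)
```

The equivalent spark-submit form is `--executor-memory 2g` plus the matching `--conf` for the overhead.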

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread lucas.g...@gmail.com
"Building data products is a very different discipline from that of building software." That is a fundamentally incorrect assumption. There will always be a need for figuring out how to apply said principles, but saying 'we're different' has always turned out to be incorrect and I have seen no re

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread Steve Loughran
On 12 Apr 2017, at 17:25, Gourav Sengupta mailto:gourav.sengu...@gmail.com>> wrote: Hi, Your answer is like saying, I know how to code in assembly level language and I am going to build the next GUI in assembly level code and I think that there is a genuine functional requirement to see a col

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread Gourav Sengupta
Hi, Your answer is like saying, I know how to code in assembly level language and I am going to build the next GUI in assembly level code and I think that there is a genuine functional requirement to see a color of a button in green on the screen. Perhaps it may be pertinent to read the first pre

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-12 Thread Steve Loughran
On 11 Apr 2017, at 20:46, Gourav Sengupta mailto:gourav.sengu...@gmail.com>> wrote: And once again JAVA programmers are trying to solve a data analytics and data warehousing problem using programming paradigms. It genuinely a pain to see this happen. While I'm happy to be faulted for treati

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Sumona Routh
; Regards, > Gourav Sengupta > > On Fri, Apr 7, 2017 at 4:07 PM, Steve Loughran > wrote: > > If you have Jenkins set up for some CI workflow, that can do scheduled > builds and tests. Works well if you can do some build test before even > submitting it to a remote cluster >

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Gourav Sengupta
tration engine. >>> >>> Regards, >>> Gourav Sengupta >>> >>> On Fri, Apr 7, 2017 at 4:07 PM, Steve Loughran >>> wrote: >>> >>>> If you have Jenkins set up for some CI workflow, that can do scheduled >>>> builds

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Sam Elamin
multiple options really some of which have been already listed >>> but let me try and clarify >>> >>> Assuming you have a spark application in a jar you have a variety of >>> options >>> >>> You have to have an existing spark cluster that is either run

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-11 Thread Steve Loughran
Super simple / hacky: Cron job on EC2 that calls a simple shell script that does a spark-submit to a Spark Cluster OR create or add step to an EMR cluster More Elegant: Airflow/Luigi/AWS Data Pipeline (Which is just CRON in the UI ) that will do the above step but have scheduling and potential

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread shyla deshpande
y listed >>> but let me try and clarify >>> >>> Assuming you have a spark application in a jar you have a variety of >>> options >>> >>> You have to have an existing spark cluster that is either running on EMR >>> or somewhere else. >>&

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Sam Elamin
t;> Cron job on EC2 that calls a simple shell script that does a spark-submit >> to a Spark Cluster OR create or add step to an EMR cluster >> >> *More Elegant* >> Airflow/Luigi/AWS Data Pipeline (Which is just CRON in the UI ) that will >> do the above step but have

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Gourav Sengupta
; to a Spark Cluster OR create or add step to an EMR cluster > > *More Elegant* > Airflow/Luigi/AWS Data Pipeline (Which is just CRON in the UI ) that will > do the above step but have scheduling and potential backfilling and error > handling(retries,alerts etc) > > AWS are coming

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Steve Loughran
simple shell script that does a spark-submit to a Spark Cluster OR create or add step to an EMR cluster More Elegant Airflow/Luigi/AWS Data Pipeline (Which is just CRON in the UI ) that will do the above step but have scheduling and potential backfilling and error handling(retries,alerts etc) A

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-07 Thread Sam Elamin
/ hacky* Cron job on EC2 that calls a simple shell script that does a spark-submit to a Spark Cluster OR create or add step to an EMR cluster *More Elegant* Airflow/Luigi/AWS Data Pipeline (Which is just CRON in the UI ) that will do the above step but have scheduling and potential backfilling and

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread Gourav Sengupta
Hi Shyla, why would you want to schedule a spark job in EC2 instead of EMR? Regards, Gourav On Fri, Apr 7, 2017 at 1:04 AM, shyla deshpande wrote: > I want to run a spark batch job maybe hourly on AWS EC2 . What is the > easiest way to do this. Thanks >

Re: What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread Yash Sharma
On Fri, 7 Apr 2017 at 10:04 shyla deshpande wrote: > I want to run a spark batch job maybe hourly on AWS EC2 . What is the > easiest way to do this. Thanks >

What is the best way to run a scheduled spark batch job on AWS EC2 ?

2017-04-06 Thread shyla deshpande
I want to run a spark batch job, maybe hourly, on AWS EC2. What is the easiest way to do this? Thanks

Consuming AWS Cloudwatch logs from Kinesis into Spark

2017-04-05 Thread Tim Smith
I am sharing this code snippet since I spent quite some time figuring it out and couldn't find any examples online. Between the Kinesis documentation, the tutorial on the AWS site, and other code snippets on the Internet, I was confused about the structure/format of the messages that Spark fetches
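The decode step such a job needs, sketched: CloudWatch Logs subscription records arrive on Kinesis as gzip-compressed JSON, sometimes additionally base64-wrapped depending on how the record reaches you. A self-contained round trip with a synthetic payload:

```python
# Decode one CloudWatch Logs payload delivered via Kinesis: base64-decode,
# gunzip, parse JSON. The payload below is synthetic, for demonstration.
import base64
import gzip
import json

def decode_cloudwatch_record(data_b64):
    """Base64-decode, gunzip and parse one CloudWatch Logs payload."""
    raw = gzip.decompress(base64.b64decode(data_b64))
    return json.loads(raw)

# Round-trip demo with a synthetic payload:
payload = {"logGroup": "/aws/lambda/demo",
           "logEvents": [{"message": "hello"}]}
encoded = base64.b64encode(gzip.compress(json.dumps(payload).encode()))
decoded = decode_cloudwatch_record(encoded)
```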

Spark is inventing its own AWS secret key

2017-03-08 Thread Jonhy Stack
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY") hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") logs = spark_context.textFile("s3a://mybucket/logs/*") Spark was saying Invalid Access key [ACCESSKEY] However with the same ACCESSKEY and SECRETKEY this was w

Re: Custom log4j.properties on AWS EMR

2017-02-28 Thread Prithish
See http://stackoverflow.com/questions/42452622/custom-log4j-properties-on-aws-emr/42516161#42516161 In short, an hdfs:// path can't be used to configure log4j because log4j knows nothing about hdfs. Instead, since you are using EMR, you should use the Configuration A

Re: Custom log4j.properties on AWS EMR

2017-02-28 Thread Jonathan Kelly
Prithish, I saw you posted this on SO, so I responded there just now. See http://stackoverflow.com/questions/42452622/custom-log4j-properties-on-aws-emr/42516161#42516161 In short, an hdfs:// path can't be used to configure log4j because log4j knows nothing about hdfs. Instead, since yo

Re: Custom log4j.properties on AWS EMR

2017-02-26 Thread Prithish
ideas? I have also posted on Stackoverflow (link below) http://stackoverflow.com/questions/42452622/custom-log4j-properties-on-aws-emr

Re: Custom log4j.properties on AWS EMR

2017-02-26 Thread Steve Loughran
his seem to work. However, I can get this working when running on my local Yarn setup. Any ideas? I have also posted on Stackoverflow (link below) http://stackoverflow.com/questions/42452622/custom-log4j-properties-on-aws-emr

Custom log4j.properties on AWS EMR

2017-02-26 Thread Prithish
ever, I can get this working when running on my local Yarn setup. Any ideas? I have also posted on Stackoverflow (link below) http://stackoverflow.com/questions/42452622/custom-log4j-properties-on-aws-emr
