Help needed optimizing Spark History Server performance

2024-05-03 Thread Vikas Tharyani
of our SHS and prevent these timeouts. Here are some areas we're particularly interested in exploring: - Are there additional configuration options we should consider for handling large event logs? - Could Nginx configuration adjustments help with timeouts? - Are there best

Help needed with SPARK-45598 and SPARK-45769

2023-11-09 Thread Maksym M
Greetings, tl;dr there must have been a regression in spark *connect*'s ability to retrieve data, more details in linked issues https://issues.apache.org/jira/browse/SPARK-45598 https://issues.apache.org/jira/browse/SPARK-45769 we have projects that depend on spark connect 3.5 and we'd

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Mich Talebzadeh
2023 at 13:12, Khalid Mammadov wrote: Hey AN-TRUONG, I have got some articles about this subject that should help. E.g. https://khalidmammadov.github.io/spark/spark_internals_rdd.html Also check other Spark Internals on the web. Regards Khalid

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Khalid Mammadov
Hey AN-TRUONG I have got some articles about this subject that should help. E.g. https://khalidmammadov.github.io/spark/spark_internals_rdd.html Also check other Spark Internals on web. Regards Khalid On Fri, 31 Mar 2023, 16:29 AN-TRUONG Tran Phan, wrote: > Thank you for your informat

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
Yes, history refers to completed jobs; 4040 is for running jobs. You should have screenshots for executors and stages as well. HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread AN-TRUONG Tran Phan
Thank you for your information. I have tracked the Spark History Server on port 18080 and the Spark UI on port 4040; the results of these two tools look similar, right? I want to know what each Task ID (e.g. Task ID 0, 1, 3, 4, 5, ...) in the images does. Is that possible?

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
Are you familiar with the Spark GUI, on port 4040 by default? Have a look. HTH Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile

Re: Kind help request

2023-03-25 Thread Sean Owen
It is telling you that the UI can't bind to any port. I presume that's because of container restrictions? If you don't want the UI at all, just set spark.ui.enabled to false On Sat, Mar 25, 2023 at 8:28 AM Lorenzo Ferrando < lorenzo.ferra...@edu.unige.it> wrote: > Dear Spark team, > > I am
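For a container where no port can be bound, the fix Sean describes is a one-line configuration. A minimal PySpark sketch (the app name is hypothetical; `spark.ui.enabled` is the only setting that matters here):

```python
from pyspark.sql import SparkSession

# With spark.ui.enabled=false, Spark never attempts to bind a UI port,
# which sidesteps bind failures inside restricted containers.
spark = (
    SparkSession.builder
    .appName("no-ui-example")              # hypothetical app name
    .config("spark.ui.enabled", "false")
    .getOrCreate()
)
```

The same setting can equally be passed on the command line as `--conf spark.ui.enabled=false`.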

Kind help request

2023-03-25 Thread Lorenzo Ferrando
Dear Spark team, I am Lorenzo from the University of Genoa. I am currently using (Ubuntu 18.04) the nextflow/sarek pipeline to analyse genomic data through a Singularity container. One of the steps of the pipeline uses GATK4, and it implements Spark. However, after some time I get the following error:

Re: Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast

2023-01-30 Thread Artemis User
ote: I am not sure if this is the intended DL for reaching out for help. Please redirect to the right DL From: Jain, Sanchi Date: Monday, January 30, 2023 at 10:10 AM To: priv...@spark.apache.org Subject: Request for access to create a jira account- Comcast Hello there I am a principal engineer at

Re: Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast

2023-01-30 Thread Mich Talebzadeh
liable for any monetary damages arising from such loss, damage or destruction. On Mon, 30 Jan 2023 at 15:15, Jain, Sanchi wrote: I am not sure if this is the intended DL for reaching out for help. Please redirect to the right DL From: Jain, Sanchi Date

Help needed regarding error with 5 node Spark cluster (shuffle error)- Comcast

2023-01-30 Thread Jain, Sanchi
I am not sure if this is the intended DL for reaching out for help. Please redirect to the right DL From: Jain, Sanchi Date: Monday, January 30, 2023 at 10:10 AM To: priv...@spark.apache.org Subject: Request for access to create a jira account- Comcast Hello there I am a principal engineer

Help with ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol

2022-12-30 Thread Meharji Arumilli
-spark-internal-io-cloud Could you kindly help to solve this. Regards Mehar

Re: Help with Shuffle Read performance

2022-09-30 Thread Igor Calabria
alues ranging from 25s to several minutes (the task sizes are really close, they aren't skewed). I've tried increasing "spark.reducer.maxSizeInFlight" and "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by a little, but not enough to sat

Re: Help with Shuffle Read performance

2022-09-30 Thread Artemis User
"spark.reducer.maxSizeInFlight" and "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by a little, but not enough to saturate the cluster resources. Did I miss some more tuning parameters that could help? One obvious thing would be to vertically increase

Re: Help with Shuffle Read performance

2022-09-30 Thread Leszek Reimus
sized shuffle of almost 4TB. The relevant cluster config is as follows: 30 Executors; 16 physical cores, configured with 32 cores for Spark; 128 GB RAM; shuffle.partitions is 18k, which gives me tasks of around 150~18

Re: Help with Shuffle Read performance

2022-09-30 Thread Sungwoo Park
data from s3 and writing the shuffle data) CPU usage, disk throughput and network usage is as expected, but during the reduce phase it gets really low. It seems the main bottleneck is reading shuffle data from other nodes, task stati

Re: Help with Shuffle Read performance

2022-09-29 Thread Gourav Sengupta
The job runs fine but I'm bothered by how underutilized the cluster gets during the reduce phase. During the map (reading data from s3 and writing the shuffle data) CPU usage, disk throughput and network usage is as expected, but during the reduce phase it g

Re: Help with Shuffle Read performance

2022-09-29 Thread Leszek Reimus
ng the reduce phase it gets really low. It seems the main bottleneck is reading shuffle data from other nodes; task statistics reports values ranging from 25s to several minutes (the task sizes are really close, they aren't skewed). I've tried increasing

Re: Help with Shuffle Read performance

2022-09-29 Thread Gourav Sengupta
ffle data from other nodes, task statistics reports values ranging from 25s to several minutes (the task sizes are really close, they aren't skewed). I've tried increasing "spark.reducer.maxSizeInFlight" and "spark.shuffle.io.numConnectionsPer

Re: Help with Shuffle Read performance

2022-09-29 Thread Igor Calabria
lues ranging from 25s to several minutes (the task sizes are really close, they aren't skewed). I've tried increasing "spark.reducer.maxSizeInFlight" and "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by a little, but not enough to saturate the cluster resou

Re: Help with Shuffle Read performance

2022-09-29 Thread Vladimir Prus
e, but not enough to saturate the cluster resources. Did I miss some more tuning parameters that could help? One obvious thing would be to vertically increase the machines and use fewer nodes to minimize traffic, but 30 nodes doesn't seem like much even considering 30x30 connections. Thanks in advance! -- Vladimir Prus http://vladimirprus.com

Re: Help with Shuffle Read performance

2022-09-29 Thread Tufan Rakshit
That's total nonsense; EMR is total crap. Use Kubernetes, I will help you. Can you please provide the size of the shuffle file that is generated in each task? What's the total number of partitions that you have? What machines are you using? Are you using an SSD? Best Tufan

Re: Help with Shuffle Read performance

2022-09-29 Thread Gourav Sengupta
really close, they aren't skewed). I've tried increasing "spark.reducer.maxSizeInFlight" and "spark.shuffle.io.numConnectionsPerPeer" and it did improve performance by a little, but not enough to saturate the cluster resources. Did I miss some more tun

Help with Shuffle Read performance

2022-09-29 Thread Igor Calabria
and it did improve performance by a little, but not enough to saturate the cluster resources. Did I miss some more tuning parameters that could help? One obvious thing would be to vertically increase the machines and use fewer nodes to minimize traffic, but 30 nodes doesn't seem like much even consideri

HELP, Populating an empty pyspark dataframe with auto-generated dates

2022-09-22 Thread Jamie Arodi
I need help populating an empty dataframe in pyspark with auto-generated dates in a column in the format yyyy-mm-dd, from 1900-01-01 to 2030-12-31. Kindly help.
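The date column itself can be generated without Spark. A plain-Python sketch (the `date_range` helper is ours, not a PySpark API) producing the yyyy-mm-dd strings that could then be handed to `spark.createDataFrame`:

```python
from datetime import date, timedelta

def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)

# Strings already in yyyy-mm-dd form, ready for e.g.
# spark.createDataFrame([(d,) for d in dates], ["dt"]) on the PySpark side.
dates = [d.isoformat() for d in date_range(date(1900, 1, 1), date(2030, 12, 31))]
print(dates[0], dates[-1], len(dates))
```

On Spark 2.4+, roughly the same column can also be built natively with the SQL `sequence(to_date('1900-01-01'), to_date('2030-12-31'), interval 1 day)` function plus `explode`.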

Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Sid
Gourav Sengupta, wrote: Please use EMR, Glue is not made for heavy processing jobs. On Thu, Jun 23, 2022 at 6:36 AM Sid wrote: Hi Team, Could anyone help me with the below problem: https://stackoverflow.com/questions/727249

Re: Need help with the configuration for AWS glue jobs

2022-06-23 Thread Gourav Sengupta
Please use EMR, Glue is not made for heavy processing jobs. On Thu, Jun 23, 2022 at 6:36 AM Sid wrote: Hi Team, Could anyone help me with the below problem: https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-p

Need help with the configuration for AWS glue jobs

2022-06-22 Thread Sid
Hi Team, Could anyone help me with the below problem: https://stackoverflow.com/questions/72724999/how-to-calculate-number-of-g-1-workers-in-aws-glue-for-processing-1tb-data Thanks, Sid

Structured streaming help on releasing memory

2022-05-09 Thread Xavi Gervilla
milar. Is there something wrong with the declaration of the window/watermark? What could be causing the data to keep accumulating even after the 10-minute watermark and after the batch is processed? If there's any additional information you might need, or think might be helpful to understand the problem better, I'll be happy to provide it. You all have been able to help in the past, so thank you in advance.

Need help on migrating Spark on Hortonworks to Kubernetes Cluster

2022-05-08 Thread Chetan Khatri
Hi Everyone, I need help with my Airflow DAG, which has a Spark Submit, and now I have a Kubernetes cluster instead of the Hortonworks Linux distributed Spark cluster. My existing spark-submit is through BashOperator as below: calculation1 = '/usr/hdp/2.6.5.0-292/spark2/bin/spark-submit --conf

Unusual bug,please help me,i can do nothing!!!

2022-03-30 Thread spark User
ailed to initialize Spark session.org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@x.168.137.41:49963". When I try to add "x.168.137.41" in 'etc/hosts' it works fine, then use "ctrl+c" again. The result is that it cannot start normally. Please help me

error bug,please help me!!!

2022-03-20 Thread spark User
ailed to initialize Spark session.org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@x.168.137.41:49963". When I try to add "x.168.137.41" in 'etc/hosts' it works fine, then use "ctrl+c" again. The result is that it cannot start normally. Please help me

Re: Help With unstructured text file with spark scala

2022-02-25 Thread Danilo Sousa
IONAL ...| 65751353| Jose Silva| |58693 - NACIONAL ...| 65751388| Joana Silva| |58693 - NACIONAL ...| 65751353| Felipe Silva| |58693 - NACIONAL ...| 65751388| Julia Silva| ++-

Re: Help With unstructured text file with spark scala

2022-02-21 Thread Danilo Sousa
5751388| Julia Silva| ++---+-+ cat csv_file: Plano#Código Beneficiário#Nome Beneficiário 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva 58693 - NACIONAL R COPART PJ

Re: Help With unstructured text file with spark scala

2022-02-13 Thread Rafael Mendes
Jose Silva| |58693 - NACIONAL ...| 65751388| Joana Silva| |58693 - NACIONAL ...| 65751353| Felipe Silva| |58693 - NACIONAL ...| 65751388| Julia Silva| ++---+-+

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Bitfox
Plano#Código Beneficiário#Nome Beneficiário 58693 - NACIONAL R COPART PJCE#065751353#Jose Silva 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
open attachments unless you can confirm the sender and know the content is safe. Hi, I have to transform unstructured text to a dataframe. Could anyone please help with Scala code? Dataframe needed as: operadora filial un

Re: Help With unstructured text file with spark scala

2022-02-09 Thread Danilo Sousa
#065751353#Jose Silva 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva Regards On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa <mail

Re: Help With unstructured text file with spark scala

2022-02-08 Thread Bitfox
va 58693 - NACIONAL R COPART PJCE#065751388#Joana Silva 58693 - NACIONAL R COPART PJCE#065751353#Felipe Silva 58693 - NACIONAL R COPART PJCE#065751388#Julia Silva Regards On Wed, Feb 9, 2022 at 12:50 AM Danilo Sousa wrote: Hi, I have to transform unstructured text to dataframe.

Re: Help With unstructured text file with spark scala

2022-02-08 Thread Lalwani, Jayesh
, 11:50 AM, "Danilo Sousa" wrote: CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi I have to transform unstructured text to dataframe. Coul

Help With unstructured text file with spark scala

2022-02-08 Thread Danilo Sousa
Hi, I have to transform unstructured text to a dataframe. Could anyone please help with Scala code? Dataframe needed as: operadora filial unidade contrato empresa plano codigo_beneficiario nome_beneficiario Relação de Beneficiários Ativos e Excluídos Carteira em#27/12/2019##Todos os Beneficiários

Re: help check my simple job

2022-02-06 Thread capitnfrakass
That did resolve my issue. Thanks a lot. frakass On 06/02/2022 17:25, Hannes Bibel wrote: Hi, looks like you're packaging your application for Scala 2.13 (should be specified in your build.sbt) while your Spark installation is built for Scala 2.12. Go to

Re: help check my simple job

2022-02-06 Thread Hannes Bibel
Hi, looks like you're packaging your application for Scala 2.13 (should be specified in your build.sbt) while your Spark installation is built for Scala 2.12. Go to https://spark.apache.org/downloads.html, select under "Choose a package type" the package type that says "Scala 2.13". With that

help check my simple job

2022-02-06 Thread capitnfrakass
Hello I wrote this simple job in scala: $ cat Myjob.scala import org.apache.spark.sql.SparkSession object Myjob { def main(args: Array[String]): Unit = { val sparkSession = SparkSession.builder.appName("Simple Application").getOrCreate() val sparkContext =

Re: About some Spark technical help

2021-12-24 Thread sam smith
i Sam Can you tell us more? What is the algorithm? Can you send us the URL of the publication? Kind regards

Re: About some Spark technical help

2021-12-24 Thread Andrew Davidson
Hi Sam Can you tell us more? What is the algorithm? Can you send us the URL of the publication? Kind regards

Re: About some Spark technical help

2021-12-24 Thread sam smith
Kind regards Andy From: sam smith Date: Wednesday, December 22, 2021 at 10:59 AM To: "user@spark.apache.org"

Re: About some Spark technical help

2021-12-24 Thread Gourav Sengupta
Andy From: sam smith Date: Wednesday, December 22, 2021 at 10:59 AM To: "user@spark.apache.org" Subject: About some Spark technical help Hello guys,

Re: About some Spark technical help

2021-12-23 Thread sam smith
send us the URL of the publication? Kind regards Andy From: sam smith Date: Wednesday, December 22, 2021 at 10:59 AM To: "user@spark.apache.org" Subject: About some Spark technical help

Re: About some Spark technical help

2021-12-23 Thread Andrew Davidson
Hi Sam Can you tell us more? What is the algorithm? Can you send us the URL of the publication? Kind regards Andy From: sam smith Date: Wednesday, December 22, 2021 at 10:59 AM To: "user@spark.apache.org" Subject: About some Spark technical help Hello guys, I am replicating

dataset partitioning algorithm implementation help

2021-12-23 Thread sam smith
Hello All, I am replicating a paper's algorithm about a partitioning approach to anonymize datasets with Spark / Java, and want to ask you for some help to review my 150 lines of code. My github repo, attached below, contains both my java class and the related paper: https://github.com

About some Spark technical help

2021-12-22 Thread sam smith
Hello guys, I am replicating a paper's algorithm in Spark / Java, and want to ask you guys for some assistance to validate / review about 150 lines of code. My github repo contains both my java class and the related paper, Any interested reviewer here ? Thanks.

Spark usage help

2021-09-01 Thread yinghua...@163.com
Hi: I found that the following methods are used when setting parameters to create a sparksession access hive table 1) hive.execution.engine:spark spark = SparkSession.builder() .appName("get data from hive") .config("hive.execution.engine", "spark") .enableHiveSupport() .getOrCreate()

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-06 Thread Mich Talebzadeh
echnical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Sun, 4 Jul 2021 at 14:13, Nick Grigoriev wrote: I ha

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-05 Thread Nick Grigoriev
4 Jul 2021 at 14:13, Nick Grigoriev wrote: I have asked this question on Stack Overflow, but it looks too complex for a Q/A resource. https://stackoverflow.com/questions/68236323/spark-aqe-post-shuffle-partitions-coalesce-dont-work-as-expected-and-even-

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-04 Thread Mich Talebzadeh
/stackoverflow.com/questions/68236323/spark-aqe-post-shuffle-partitions-coalesce-dont-work-as-expected-and-even-make So I want to ask for help here. I use a global sort on my Spark DF, and when I enable AQE and post-shuffle coalesce, my partitions after the sort operation bec

Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-04 Thread Nick Grigoriev
I have asked this question on Stack Overflow, but it looks too complex for a Q/A resource. https://stackoverflow.com/questions/68236323/spark-aqe-post-shuffle-partitions-coalesce-dont-work-as-expected-and-even-make So I want to ask for help here. I use a global sort on my Spark DF, and when I enable AQE

Need help to create database and integration woth Spark App in local machine

2021-06-12 Thread Himanshu Soni
Hi Team, Could you please help with below : 1. Want to create a database (Oracle) with some tables in local machine 2. Integrate the database tables so i can query them from Spark App in local machine Thanks & Regards- Himanshu Soni Mobile: +91 8411000279

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Mich Talebzadeh
This is an interesting one. I have never tried to add --files ... spark-submit --master yarn --deploy-mode client --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml Rather, under $SPARK_HOME/conf, I create soft links to the needed XML files as

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread KhajaAsmath Mohammed
Thanks everyone. I was able to resolve this. Here is what I did: just passed the conf file using the --files option. The mistake I made was reading the JSON conf file before creating the Spark session; reading it after creating the session fixed it. Thanks once again for your valuable suggestions
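A sketch of the working order of operations (this assumes the job is launched with `--files /appl/common/ftp/conf.json`; `SparkFiles.get` resolving the shipped copy is the standard PySpark mechanism, though the original poster may have read the file by a plain relative path instead):

```python
import json

from pyspark import SparkFiles
from pyspark.sql import SparkSession

# 1. Create the session FIRST; only then has --files shipped the conf
#    file to the driver/executor working directories (cluster mode).
spark = SparkSession.builder.appName("conf-after-session").getOrCreate()

# 2. Now resolve the shipped copy by its base name and read it.
with open(SparkFiles.get("conf.json")) as fh:
    conf = json.load(fh)
```

Reading the file before `getOrCreate()`, as in the original failure, works in client mode (the path exists locally) but fails in cluster mode, where the driver runs on a node that only sees the `--files` copy.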

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Sean Owen
If code running on the executors need some local file like a config file, then it does have to be passed this way. That much is normal. On Sat, May 15, 2021 at 1:41 AM Gourav Sengupta wrote: > Hi, > > once again lets start with the requirement. Why are you trying to pass xml > and json files to

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-15 Thread Gourav Sengupta
executors. On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <longjiang.y...@target.com> wrote: Could you check whether this file is accessible in executors? (is it in HDFS or in the client local FS) /appl/common/ftp/conf.json From: KhajaAsmath Mohammed Date: Friday, May 14, 2021 at 4:50 PM To: "user @spark" Subject: [EXTERNAL] Urgent Help - Py Spark submit error /appl/common/ftp/conf.json

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread Amit Joshi
Could you check whether this file is accessible in executors? (is it in HDFS or in the client local FS) /appl/common/ftp/conf.json From: KhajaAsmath Mohammed Date: Friday, May 14, 2021 at 4:50 PM To: "user @spark" Subject: [EXTERNAL] Urgent Help - Py Spark submit error /appl/common/ftp/conf.json

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <longjiang.y...@target.com> wrote: Could you check whether this file is accessible in executors? (is it in HDFS or in the client local FS) /appl/common/ftp

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
From: KhajaAsmath Mohammed Date: Friday, May 14, 2021 at 4:50 PM To: "user @spark" Subject: [EXTERNAL] Urgent Help - Py Spark submit error /appl/common/ftp/conf.json

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
From: KhajaAsmath Mohammed Date: Friday, May 14, 2021 at 4:50 PM To: "user @spark" Subject: [EXTERNAL] Urgent Help - Py Spark submit error /appl/common/ftp/conf.json

Re: [EXTERNAL] Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
cal FS) /appl/common/ftp/conf.json From: KhajaAsmath Mohammed Date: Friday, May 14, 2021 at 4:50 PM To: "user @spark" Subject: [EXTERNAL] Urgent Help - Py Spark submit error /appl/common/ftp/conf.json

Urgent Help - Py Spark submit error

2021-05-14 Thread KhajaAsmath Mohammed
Hi, I am having a weird situation where the below command works when the deploy mode is a client and fails if it is a cluster. spark-submit --master yarn --deploy-mode client --files /etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml --driver-memory 70g

Re: Installation Error - Please Help!

2021-05-11 Thread Sean Owen
is error "'spark-shell' is not recognized as an internal or external command, operable program or batch file." I am sharing the screenshots of my environment variables. Please help me. I am stuck now

Installation Error - Please Help!

2021-05-11 Thread Talha Javed
va HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode). WHEN I ENTER THE COMMAND spark-shell in cmd it gives me this error: "'spark-shell' is not recognized as an internal or external command, operable program or batch file." I am sharing the screenshots of my environment variables.

Need help on Calling Pyspark code using Wheel

2020-10-23 Thread Sachit Murarka
Hi Users, I have created a wheel file using Poetry. I tried running the following commands to run spark job using wheel , but it is not working. Can anyone please let me know about the invocation step for the wheel file? spark-submit --py-files /path/to/wheel spark-submit --files /path/to/wheel

Re: Spark : Very simple query failing [Needed help please]

2020-09-26 Thread Gourav Sengupta
Hi, How did you set up your environment? And can you print the schema of your table as well? It looks like you are using Hive tables? Regards Gourav On Fri, 18 Sep 2020, 14:11 Debabrata Ghosh, wrote: Hi, I needed some help from you on the attached Spark problem ple

Spark : Very simple query failing [Needed help please]

2020-09-18 Thread Debabrata Ghosh
Hi, I needed some help from you on the attached Spark problem please. I am running the following query: >>> df_location = spark.sql("""select dt from ql_raw_zone.ext_ql_location where ( lat between 41.67 and 45.82) and (lon between -86.74 and -82.42 ) a

Re: help on use case - spark parquet processing

2020-08-13 Thread Amit Sharma
Can you keep an Option field in your case class? Thanks Amit On Thu, Aug 13, 2020 at 12:47 PM manjay kumar wrote: Hi, I have a use case where I need to merge three data sets and build one wherever data is available. And my dataset is a complex object. Customer - name -

help on use case - spark parquet processing

2020-08-13 Thread manjay kumar
Hi, I have a use case where I need to merge three data sets and build one wherever data is available. And my dataset is a complex object. Customer - name - string - accounts - List Account - type - String - Adressess - List Address - name - String --- And it goes on. These file

Re: Apache Spark- Help with email library

2020-07-27 Thread Suat Toksöz
Why am I not able to send my question to the Spark email list? Thanks On Mon, Jul 27, 2020 at 10:31 AM tianlangstudio wrote: I use SimpleJavaEmail http://www.simplejavamail.org/#/features for sending email and parsing email files. It is awesome and may help you.

Re: Apache Spark- Help with email library

2020-07-27 Thread tianlangstudio
I use SimpleJavaEmail http://www.simplejavamail.org/#/features for sending email and parsing email files. It is awesome and may help you. TianlangStudio Some of the biggest lies: I will start tomorrow/Others are better than me/I am not good enough/I don't have time/This is the way I am

Apache Spark- Help with email library

2020-07-26 Thread sn . noufal
Hi, I am looking to send a dataframe as an email. How do I do that? Do you have any library with a sample? Appreciate your response. Regards, Mohamed

Re: Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-23 Thread murat migdisoglu
A potential reason might be that you are getting a ClassNotFoundException when you run on the cluster (due to a missing jar in your uber jar) and you are possibly silently eating up exceptions in your code. 1- you can check if there are any failed tasks 2- you can check if there are any failed

Re: Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-21 Thread Pasha Finkelshteyn
Hi Rachana, Could you please provide us with more details: minimal repro, Spark version, Java version, Scala version. On 20/07/21 08:27AM, Rachana Srivastava wrote: I am unable to identify the root cause of why my code is missing data when I run as spark-submit but the code works fine when I

Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-21 Thread Rachana Srivastava
I am unable to identify the root cause of why my code is missing data when I run as spark-submit but the code works fine when I run as java main. Any idea

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Hi all, Many thanks for your contribution. A number of colleagues have proposed existing sites devoted to analyzing this virus mainly in US and North America. Also donating computers etc. although worthy gestures will not be enough. What I had in mind (and apologies if this looks pedantic) is

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Ruijing Li
- COVID-19 Open Research Dataset Challenge (CORD-19) <https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge#> BTW, I had co-presented in a recent tech talk on Analyzing COVID-19: Can the Data Community Help? <https://www.youtube.com/watch?

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Denny Lee
ISandData/COVID-19> - COVID-19 Open Research Dataset Challenge (CORD-19) <https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge#> BTW, I had co-presented in a recent tech talk on Analyzing COVID-19: Can the Data Community Help? <https://www.youtube.com/watch?v=A0

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Rajev Agarwal
Actually, I thought these sites exist; look at Johns Hopkins and Worldometers. On Thu, Mar 26, 2020, 2:27 PM Zahid Rahman wrote: "We can then donate this to WHO or others and we can make it very modular though microservices etc." I have no interest because there are 8 million muslims

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thank you for your remarks. Points taken. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordpress.com *Disclaimer:*

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Zahid Rahman
"We can then donate this to WHO or others and we can make it very modular though microservices etc." I have no interest because there are 8 million muslims locked up in their home for 8 months by the Hindutwa (Indians) You didn't take any notice of them. Now you are locked up in your home and you

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thanks, but nobody claimed we can fix it. However, we can all contribute to it. When it utilizes the cloud, it becomes a global digitization issue. HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Laurent Bastien Corbeil
People in tech should be more humble and admit this is not something they can fix. There's already plenty of visualizations, dashboards etc showing the spread of the virus. This is not even a big data problem, so Spark would have limited use. On Thu, Mar 26, 2020 at 10:37 AM Sol Rodriguez wrote:

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Sol Rodriguez
IMO it's not about technology, it's about data... if we don't have access to the data there's no point throwing "microservices" and "kafka" at the problem. You might find that the most effective analysis might be delivered through an excel sheet ;) So before technology I'd suggest to get access to

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Chenguang He
Have you taken a look at this ( https://coronavirus.1point3acres.com/en/test )? They have a visualizer with a very basic analysis of the outbreak. On Thu, Mar 26, 2020 at 8:54 AM Mich Talebzadeh wrote: > Thanks. > > Agreed, computers are not the end but means to an end. We all have to > start

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Thanks. Agreed, computers are not the end but means to an end. We all have to start from somewhere. It all helps. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread hemant singh
Hello Mich, I will be more than happy to contribute to this. Thanks, Hemant On Thu, Mar 26, 2020 at 7:11 PM Mich Talebzadeh wrote: > Hi all, > > Do you think we can create a global solution in the cloud using > volunteers like us and third party employees. What I have in mind is to > create

can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Mich Talebzadeh
Hi all, Do you think we can create a global solution in the cloud using volunteers like us and third party employees. What I have in mind is to create a comprehensive real time solution to get data from various countries, universities pushed into a fast database through Kafka and Spark and used

Need help with Application Detail UI link

2019-12-05 Thread Sunil Patil
Hello, I am running Standalone spark server in EC2. My cluster has 1 master and 16 worker nodes. I have a jenkins server that calls spark-submit command like this /mnt/services/spark/bin/spark-submit --master spark://172.22.6.181:6066 --deploy-mode cluster --conf spark.driver.maxResultSize=1g

Re: Need help regarding logging / log4j.properties

2019-10-31 Thread Roland Johann
Hi Debu, you need to define Spark config properties before the jar file path at spark-submit. Everything after the jar path will be passed as arguments to your application. Best Regards Debabrata Ghosh wrote on Thu, 31 Oct 2019 at 03:26: Greetings All! I needed
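Roland's ordering rule can be made concrete by assembling the argv explicitly (the jar name, file names and paths below are hypothetical, chosen only to illustrate where log4j settings must go):

```python
# spark-submit parses options that appear BEFORE the application jar;
# everything AFTER the jar path is passed untouched to the app's main().
spark_options = [
    "--master", "yarn",
    "--files", "log4j.properties",   # config options must precede the jar
    "--conf",
    "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties",
]
app_jar = "myapp.jar"                 # hypothetical application jar
app_args = ["--input", "/data/in"]    # seen only by the application itself

argv = ["spark-submit", *spark_options, app_jar, *app_args]
print(" ".join(argv))
```

Had `--conf` been placed after `myapp.jar`, spark-submit would hand it to the application as an ordinary argument and the log4j setting would silently never take effect.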

Need help regarding logging / log4j.properties

2019-10-30 Thread Debabrata Ghosh
Greetings All! I needed some help in obtaining the application logs, but I am really confused about where they are currently located. Please allow me to explain my problem: 1. I am running the Spark application (written in Java) in a Hortonworks Data Platform Hadoop cluster 2. My spark-submit command

[Ask for help] How to manually submit offsetRanges

2019-09-20 Thread Fangyuan Liu
Hello Sir/Madam, I am using the Spark Streaming and Kafka Java API, and I want to know if there is any method to commit OffsetRanges other than `commitAsync`. The problem is: I made some modifications to the OffsetRanges and committed them using the `commitAsync` method; however, the modification
