Re: Subject: [Spark SQL] [Debug] Spark Memory Issue with DataFrame Processing

2024-05-27 Thread Shay Elbaz
Few ideas off the top of my head for how to go about solving the problem: 1. Try with subsets: try reproducing the issue with smaller subsets of your data

Re: Subject: [Spark SQL] [Debug] Spark Memory Issue with DataFrame Processing

2024-05-27 Thread Mich Talebzadeh
Few ideas off the top of my head for how to go about solving the problem: 1. Try with subsets: try reproducing the issue with smaller subsets of your data to pinpoint the specific operation causing the memory problems. 2. Explode or flatten nested structures: if your DataFrame schema
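
A minimal Scala sketch of both suggestions, assuming an illustrative DataFrame df with a struct column payload and an array column items (none of these names are from the original thread):

    import org.apache.spark.sql.functions.{col, explode}

    // 1. Reproduce on a small sample first to isolate the offending operation
    val subset = df.sample(withReplacement = false, fraction = 0.01, seed = 42L)

    // 2. Flatten the nested struct and explode the array so each row is flat
    val flattened = subset
      .select("id", "items", "payload.*")            // struct fields become top-level columns
      .withColumn("item", explode(col("items")))     // one row per array element
      .drop("items")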

Subject: [Spark SQL] [Debug] Spark Memory Issue with DataFrame Processing

2024-05-27 Thread Gaurav Madan
Dear Community, I'm reaching out to seek your assistance with a memory issue we've been facing while processing certain large and nested DataFrames using Apache Spark. We have encountered a scenario where the driver runs out of memory when applying the `withColumn` method on specific DataFrames
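
One frequent driver-side culprit in this kind of scenario, offered as an assumption rather than a diagnosis of this exact case, is chaining withColumn many times: each call builds and re-analyzes a new plan on the driver, and folding the expressions into a single select is usually much lighter. A sketch with illustrative column names:

    import org.apache.spark.sql.functions.{col, upper}

    // Instead of hundreds of chained calls like
    //   df.withColumn("a_up", upper(col("a"))).withColumn("b_up", upper(col("b"))) // ...
    // project everything in one pass
    val projected = df.select(
      col("*"),
      upper(col("a")).as("a_up"),
      upper(col("b")).as("b_up")
    )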

[Advanced][Spark Core][Debug] Spark incorrectly deserializes object during Dataset.map

2021-09-24 Thread Eddie
I am calling the Spark Dataset API (map method) and getting exceptions on deserialization of task results. I am calling this API from Clojure using standard JVM interop syntax. This gist has a tiny Clojure program that shows the problem, as well as the corresponding (working) Scala

[Debug] [Spark Core 2.4.4] org.apache.spark.storage.BlockException: Negative block size -9223372036854775808

2020-06-29 Thread Adam Tobey
Hi, I'm encountering a strange exception in spark 2.4.4 (on AWS EMR 5.29): org.apache.spark.storage.BlockException: Negative block size -9223372036854775808. I've seen this mostly from this line (for remote blocks)

Re: How to debug Spark job

2018-09-08 Thread Marco Mistroni
Hi. Might sound like dumb advice, but try to break apart your process. Sounds like you are doing ETL; start basic with just E and T, and make the changes that result in issues. If there is no problem, add the load step. Enable Spark logging so that you can post the error message to the list. I think you can have a look

Re: [External Sender] How to debug Spark job

2018-09-08 Thread Sonal Goyal
You could also try to profile your program on the executor or driver by using jvisualvm or yourkit to see if there is any memory/cpu optimization you could do. Thanks, Sonal Nube Technologies

Re: [External Sender] How to debug Spark job

2018-09-07 Thread James Starks
Got the root cause eventually as it throws java.lang.OutOfMemoryError: Java heap space. Increasing --driver-memory temporarily fixes the problem. Thanks.

Re: [External Sender] How to debug Spark job

2018-09-07 Thread Femi Anthony
One way I would go about this would be to try running newdf.show(numRows, truncate=False) on a few rows before you try writing to parquet, to force computation of newdf and see whether the hanging occurs at that point or during the write. You may also try doing a newdf.count() as well.
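
In code, that suggestion looks roughly like this (the output path is illustrative):

    newdf.show(20, truncate = false)        // materializes only a handful of rows
    val n = newdf.count()                   // forces a full pass over the data
    println(s"newdf rows: $n")
    newdf.write.parquet("/tmp/newdf-debug") // if the hang only appears here, suspect the write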

How to debug Spark job

2018-09-07 Thread James Starks
I have a Spark job that reads from a PostgreSQL (v9.5) table and writes the result to parquet. The code flow is not complicated, basically: case class MyCaseClass(field1: String, field2: String) val df = spark.read.format("jdbc")...load() df.createOrReplaceTempView(...) val newdf =
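
For readers wanting to try the same flow, a self-contained sketch with placeholder connection options (the original post elides them):

    import org.apache.spark.sql.SparkSession

    case class MyCaseClass(field1: String, field2: String)

    val spark = SparkSession.builder().appName("pg-to-parquet").getOrCreate()

    val df = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://host:5432/mydb") // placeholder URL
      .option("dbtable", "source_table")                 // placeholder table
      .option("user", "user")
      .option("password", "password")
      .load()

    df.createOrReplaceTempView("source_table")
    val newdf = spark.sql("SELECT field1, field2 FROM source_table")
    newdf.write.parquet("/tmp/output") // placeholder path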

Re: how to debug spark app?

2016-08-04 Thread Ben Teeuwen
Related question: what are good profiling tools other than watching the application master alongside the running code? Are there things that can be logged during the run? If I have, say, 2 ways of accomplishing the same thing and I want to learn about the time/memory/general resource blocking
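
For the wall-clock part of the question, one crude option is a timing wrapper that logs during the run; memory and CPU still call for the UI, event logs, or a profiler. A sketch, with dfA and dfB as illustrative stand-ins for the two alternatives:

    def timed[T](label: String)(body: => T): T = {
      val start = System.nanoTime()
      val result = body
      println(f"$label took ${(System.nanoTime() - start) / 1e9}%.2f s")
      result
    }

    // compare two candidate implementations of the same step
    val a = timed("variant A") { dfA.count() }
    val b = timed("variant B") { dfB.count() }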

Re: how to debug spark app?

2016-08-03 Thread Sumit Khanna
Am not really sure of the best practices on this, but I either consult localhost:4040/jobs/ etc., or better, this: val customSparkListener: CustomSparkListener = new CustomSparkListener() sc.addSparkListener(customSparkListener) class CustomSparkListener extends SparkListener { override def
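
The snippet above is cut off by the archive; a fuller, self-contained sketch of the same listener idea, using callbacks from the public SparkListener API (the printed fields are illustrative choices):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

    class CustomSparkListener extends SparkListener {
      override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
        val info = stageCompleted.stageInfo
        // stage duration, when both timestamps are present
        val millis = for {
          end   <- info.completionTime
          start <- info.submissionTime
        } yield end - start
        println(s"Stage ${info.stageId} (${info.name}) took ${millis.getOrElse(-1L)} ms")
      }

      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
        println(s"Task finished in stage ${taskEnd.stageId}: ${taskEnd.reason}")
    }

    sc.addSparkListener(new CustomSparkListener())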

Re: how to debug spark app?

2016-08-03 Thread Ted Yu
Have you looked at: https://spark.apache.org/docs/latest/running-on-yarn.html#debugging-your-application If you use Mesos: https://spark.apache.org/docs/latest/running-on-mesos.html#troubleshooting-and-debugging On Wed, Aug 3, 2016 at 6:13 PM, glen wrote: > Any tool like gdb?

how to debug spark app?

2016-08-03 Thread glen
Any tool like gdb that supports breakpoints at some line or some function?

Re: Debug spark jobs on Intellij

2016-05-31 Thread Marcelo Oikawa
> Is this python right? I'm not used to it, I'm used to scala
No. It is Java.
> val toDebug = rdd.foreachPartition(partition -> { // breakpoint stop here
> *// by val toDebug I mean to assign the result of foreachPartition to a variable*
> partition.forEachRemaining(message -> {

Re: Debug spark jobs on Intellij

2016-05-31 Thread Dirceu Semighini Filho
Try this (is this Python? I'm not used to it, I'm used to Scala): val toDebug = rdd.foreachPartition(partition -> { // breakpoint stop here *// by val toDebug I mean to assign the result of foreachPartition to a variable* partition.forEachRemaining(message -> { // breakpoint

Re: Debug spark jobs on Intellij

2016-05-31 Thread Marcelo Oikawa
> Hi Marcelo, this is because the operations in rdd are lazy, you will only
> stop at this inside-foreach breakpoint when you call a first, a collect or
> a reduce operation.
Isn't forEachRemaining a terminal method like first, collect or reduce? Anyway, I guess this is not the problem itself

Re: Debug spark jobs on Intellij

2016-05-31 Thread Dirceu Semighini Filho
Hi Marcelo, this is because the operations in RDDs are lazy: you will only stop at this inside-foreach breakpoint when you call a first, a collect or a reduce operation. That is when Spark will run the operations. Have you tried that? Cheers.
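
The laziness point in miniature (a generic sketch, not the code from this thread): transformations only describe the computation, so a breakpoint inside the closure is reached only once an action fires:

    val doubled = rdd.map { x =>
      x * 2 // a breakpoint here is not hit yet: map is lazy
    }
    val materialized = doubled.collect() // the closure actually runs here, on the action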

Debug spark jobs on Intellij

2016-05-31 Thread Marcelo Oikawa
Hello, list. I'm trying to debug my Spark application in the IntelliJ IDE. Before I submit my job, I run: export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=4000 and after that: bin/spark-submit app-jar-with-dependencies.jar. The IDE connects with

Re: Debug spark core and streaming programs in scala

2016-05-16 Thread Ted Yu
From https://spark.apache.org/docs/latest/monitoring.html#metrics : - JmxSink: Registers metrics for viewing in a JMX console. FYI
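
For reference, enabling the sink is a one-line entry in conf/metrics.properties (it ships commented out in Spark's metrics.properties.template): *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink. With that in place, driver and executor metrics appear in any JMX console such as jvisualvm.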

Re: Debug spark core and streaming programs in scala

2016-05-16 Thread Mich Talebzadeh
Have you tried the Spark GUI on 4040? This will show jobs being executed by executors in each stage, and the line of code as well. Also command-line tools like jps and jmonitor. HTH, Dr Mich Talebzadeh

Debug spark core and streaming programs in scala

2016-05-15 Thread Deepak Sharma
Hi, I have a Scala program consisting of Spark core and Spark Streaming APIs. Is there any open source tool that I can use to debug the program for performance reasons? My primary interest is to find the blocks of code that would be executed on the driver and what would go to the executors. Is there JMX
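
No specific tool is named here, but the driver/executor split itself can be made visible with a toy example: code outside closures runs on the driver, while closure bodies are shipped to executors (so their println output goes to executor stdout logs):

    println("this prints on the driver")

    rdd.map { x =>
      // this closure runs on the executors; println output from here lands in
      // the executor stdout logs, not the driver console
      x + 1
    }.count() // the action, coordinated by the driver, triggers the distributed work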

Fwd: [Help]:Strange Issue :Debug Spark Dataframe code

2016-04-17 Thread Divya Gehlot
Reposting again, as I am unable to find the root cause of where things are going wrong. Experts, please help. -- Forwarded message -- From: Divya Gehlot <divya.htco...@gmail.com> Date: 15 April 2016 at 19:13 Subject: [Help]:Strange Issue :Debug Spark Dataframe code

[Help]:Strange Issue :Debug Spark Dataframe code

2016-04-15 Thread Divya Gehlot
Hi, I am using Spark 1.5.2 with Scala 2.10. Is there any other option apart from "explain(true)" to debug Spark DataFrame code? I am facing a strange issue: I have a lookup dataframe and am using it to join another dataframe on different columns. I am getting an *Analysis exception* in the third
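
Without the stack trace this is only a guess, but a classic cause of an AnalysisException when the same lookup DataFrame feeds several joins is duplicate attributes; aliasing each use and qualifying the join columns usually resolves it. A sketch with illustrative names:

    import org.apache.spark.sql.functions.col

    val f  = fact.as("f")
    val l1 = lookup.as("l1")
    val l2 = lookup.as("l2")

    val joined = f
      .join(l1, col("f.key1") === col("l1.key")) // each use of `lookup` gets its own alias
      .join(l2, col("f.key2") === col("l2.key"))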

How to debug spark-core with function call stack?

2016-02-16 Thread DaeJin Jung
Hello everyone, I would like to draw the call stack of Spark core by analyzing the source code, but I'm not sure how to apply a debugging tool like gdb that supports the backtrace command. Please let me know if you have any suggestions. Best Regards, Daejin Jung

How to debug Spark source using IntelliJ/ Eclipse

2015-12-05 Thread jatinganhotra
Hi, I am trying to understand Spark internal code and wanted to debug the Spark source, to add a new feature. I have tried the steps outlined on the Spark wiki page for IDE setup <https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESe

Re: Debug Spark

2015-12-02 Thread Masf
>> This doc will get you started https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IntelliJ >> Thanks, Best Regards >>> Hi

Re: Debug Spark

2015-12-02 Thread Sudhanshu Janghel
>> Is it possible to debug spark locally with IntelliJ or another IDE? >> Thanks >> -- Regards. Miguel Ángel

Re: Debug Spark

2015-12-02 Thread Akhil Das
This doc will get you started https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IntelliJ Thanks Best Regards On Sun, Nov 29, 2015 at 9:48 PM, Masf <masfwo...@gmail.com> wrote: > Hi > > Is it possible to debug spark locally with IntelliJ

Re: Debug Spark

2015-11-30 Thread Jacek Laskowski
wrote: > Hi > > Is it possible to debug spark locally with IntelliJ or another IDE? > > Thanks > > -- > Regards. > Miguel Ángel

Re: Debug Spark

2015-11-29 Thread Ndjido Ardo BAR
>>> ...debug with Intellij? Thanks. Regards, Miguel. >>>> hi,

Re: Debug Spark

2015-11-29 Thread Նարեկ Գալստեան
>>> hi, IntelliJ is just great for that! cheers, Ardo. >>>> Is it possible to debug spark locally with IntelliJ or another IDE? Thanks. -- Regards. Miguel Ángel >> -- Saludos. Miguel Ángel

Re: Debug Spark

2015-11-29 Thread Ndjido Ardo BAR
>> cheers, Ardo. >>> Is it possible to debug spark locally with IntelliJ or another IDE? Thanks. -- Regards. Miguel Ángel > -- Saludos. Miguel Ángel

Re: Debug Spark

2015-11-29 Thread Danny Stephan
> IntelliJ is just great for that! cheers, Ardo. >> Is it possible to debug spark locally with IntelliJ or another IDE? Thanks. -- Regards. Miguel Ángel

Debug Spark

2015-11-29 Thread Masf
Hi Is it possible to debug spark locally with IntelliJ or another IDE? Thanks -- Regards. Miguel Ángel

Re: Debug Spark

2015-11-29 Thread Ndjido Ardo BAR
hi, IntelliJ is just great for that! cheers, Ardo. On Sun, Nov 29, 2015 at 5:18 PM, Masf <masfwo...@gmail.com> wrote: > Hi > > Is it possible to debug spark locally with IntelliJ or another IDE? > > Thanks > > -- > Regards. > Miguel Ángel >
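
What makes the IDE route work smoothly is running in local mode, so everything stays in one debuggable JVM. A minimal example one can run straight from IntelliJ (names illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object DebugLocally {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[*]").setAppName("debug-locally")
        val sc = new SparkContext(conf)
        val doubled = sc.parallelize(1 to 10).map(_ * 2) // breakpoints in this closure hit in-process
        println(doubled.collect().mkString(", "))
        sc.stop()
      }
    }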

Re: Debug Spark

2015-11-29 Thread Masf
>> Is it possible to debug spark locally with IntelliJ or another IDE? >> Thanks >> -- Regards. Miguel Ángel -- Saludos. Miguel Ángel

Re: Debug Spark Streaming in PyCharm

2015-07-10 Thread Tathagata Das
if that will work with the debugger. Thoughts? Cheers! Brandon Bradley

Debug Spark Streaming in PyCharm

2015-07-10 Thread blbradley
/home/brandon/src/coins/coinspark/streaming.py I might be able to use spark-submit as the command PyCharm runs, but I'm not sure if that will work with the debugger. Thoughts? Cheers! Brandon Bradley

How to debug spark in IntelliJ Idea

2015-05-18 Thread Yi.Zhang
to the remote actor (spark master), the breakpoint would be enabled. I don't know how to debug it in IntelliJ IDEA. I need help. Thanks. Regards, Yi

Re: How to debug Spark on Yarn?

2015-04-28 Thread Steve Loughran
On 27 Apr 2015, at 07:51, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote: Spark 1.3 1. View stderr/stdout from executor from Web UI: when the job is running I figured out the executor that I am supposed to see, and those two links show 4 special characters in the browser. 2.

Re: How to debug Spark on Yarn?

2015-04-27 Thread ๏̯͡๏
Spark 1.3 1. View stderr/stdout from executor from Web UI: when the job is running I figured out the executor that I am supposed to see, and those two links show 4 special characters in the browser. 2. Tail on Yarn logs: /apache/hadoop/bin/yarn logs -applicationId application_1429087638744_151059 |

Re: How to debug Spark on Yarn?

2015-04-27 Thread ๏̯͡๏
1) Application container logs from the Web RM UI never load in the browser; I eventually have to kill the browser. 2) /apache/hadoop/bin/yarn logs -applicationId application_1429087638744_151059 | less emits logs only after the application has completed. Are there no better ways to see the logs as they

Re: How to debug Spark on Yarn?

2015-04-27 Thread Zoltán Zvara
You can check container logs from the RM web UI or, when log aggregation is enabled, with the yarn command. There are other, but less convenient, options. On Mon, Apr 27, 2015 at 8:53 AM ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: Spark 1.3 1. View stderr/stdout from executor from Web UI: when the job

Re: How to debug Spark on Yarn?

2015-04-24 Thread Marcelo Vanzin
On top of what's been said... On Wed, Apr 22, 2015 at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: 1) I can go to the Spark UI and see the status of the app but cannot see the logs as the job progresses. How can I see logs of executors as they progress? Spark 1.3 should have links to the

Re: How to debug Spark on Yarn?

2015-04-24 Thread Sven Krasser
For #1, click on a worker node in the YARN dashboard. From there, Tools > Local logs > Userlogs has the logs for each application, and you can view them by executor even while an application is running. (This is for Hadoop 2.4; things may have changed in 2.6.) -Sven

Re: How to debug Spark on Yarn?

2015-04-24 Thread Sven Krasser
On Fri, Apr 24, 2015 at 11:31 AM, Marcelo Vanzin van...@cloudera.com wrote: Spark 1.3 should have links to the executor logs in the UI while the application is running. Not yet in the history server, though. You're absolutely correct -- didn't notice it until now. This is a great addition!

Re: How to debug Spark on Yarn?

2015-04-23 Thread Ted Yu
For step 2, you can pipe the application log to a file instead of copy-pasting. Cheers. On Apr 22, 2015, at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: I submit a spark app to YARN and I get these messages 15/04/22 22:45:04 INFO yarn.Client: Application report for

How to debug Spark on Yarn?

2015-04-22 Thread ๏̯͡๏
I submit a Spark app to YARN and I get these messages: 15/04/22 22:45:04 INFO yarn.Client: Application report for application_1429087638744_101363 (state: RUNNING) 15/04/22 22:45:04 INFO yarn.Client: Application report for application_1429087638744_101363 (state: RUNNING). ... 1) I can go to

How to properly debug spark streaming?

2014-10-28 Thread kpeng1
I am still fairly new to Spark and Spark Streaming. I have been struggling with how to properly debug Spark Streaming and I was wondering what the best approach is. I have basically been putting println statements everywhere, but sometimes they show up when I run the job and sometimes they don't
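
A likely explanation, though an assumption about this particular setup: printlns inside transformations run on the executors, so their output lands in executor stdout rather than the driver console. Pulling a few records back to the driver makes them show up where you are looking (stream is an illustrative DStream):

    stream.foreachRDD { rdd =>
      rdd.take(5).foreach(println) // take() collects to the driver, so this prints locally
    }
    // or simply stream.print(), which prints the first elements of each batch on the driver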

Re: Debug Spark in Cluster Mode

2014-10-10 Thread Ilya Ganelin
Pujari rpuj...@hortonworks.com wrote: Hello Folks: What're some best practices to debug Spark in cluster mode? Thanks, Rohit

Debug Spark in Cluster Mode

2014-10-09 Thread Rohit Pujari
Hello Folks: What're some best practices to debug Spark in cluster mode? Thanks, Rohit

Re: best practice: write and debug Spark application in scala-ide and maven

2014-06-07 Thread Gerard Maas
...@us.ibm.com wrote: Hi, I am trying to write and debug Spark applications in scala-ide and maven, and in my code I target a Spark instance at spark://xxx: object App { def main(args: Array[String]) { println("Hello World!") val sparkConf = new SparkConf().setMaster("spark://xxx

Re: best practice: write and debug Spark application in scala-ide and maven

2014-06-07 Thread Madhu
that should be sufficient for your example. - Madhu https://www.linkedin.com/in/msiddalingaiah

best practice: write and debug Spark application in scala-ide and maven

2014-06-06 Thread Wei Tan
Hi, I am trying to write and debug Spark applications in scala-ide and maven, and in my code I target a Spark instance at spark://xxx: object App { def main(args: Array[String]) { println("Hello World!") val sparkConf = new SparkConf().setMaster("spark://xxx:7077").setAppName