Dear Dirceu,
Below is our test code. As you can see, we used the "reduce" action to
trigger evaluation. However, when debugging it still did not stop at
breakpoint-1 (as marked in the code snippet).
We are using IntelliJ IDEA 14.0.3 to debug. This is very strange to us. Please
help us (me and my colleagues).
// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark._
import scala.util.logging.Logged

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("local")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1 // breakpoint-1 set in this line
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / (n - 1))
    spark.stop()
  }
}
________________________________
From: Dirceu Semighini Filho <[email protected]>
Sent: September 16, 2016 22:27
To: chen yong
Cc: [email protected]
Subject: Re: RE: it does not stop at breakpoints which is in an anonymous function
No, that's not the right way of doing it.
Remember that RDD operations are lazy, for performance reasons. Whenever you
call one of those action methods (count, reduce, collect, ...), Spark executes
all the functions you chained to build that RDD.
It would help if you could post your code here, along with how you are
executing it and trying to debug it.
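The lazy behaviour described above can be reproduced without Spark at all: a plain Scala collection view defers its mapped function the same way an RDD transformation does. A minimal sketch (an analogy only, no Spark involved; `sum` plays the role of an action like reduce):

```scala
object LazyDemo {
  def main(args: Array[String]): Unit = {
    var mapRuns = 0 // counts how many times the mapped closure actually runs

    // Like rdd.map, view.map only records the function; it does not run it.
    val squares = (1 to 5).view.map { i =>
      mapRuns += 1 // a breakpoint here would not be hit yet
      i * i
    }
    println(s"after map: mapRuns = $mapRuns") // the closure has not run yet

    // Like reduce/count/collect on an RDD, sum forces the evaluation.
    val total = squares.sum
    println(s"after sum: mapRuns = $mapRuns, total = $total")
  }
}
```

This is why a breakpoint inside the closure is only reached once some action forces the pipeline.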
2016-09-16 11:23 GMT-03:00 chen yong <[email protected]>:
Also, I wonder what is the right way to debug a Spark program. If I use ten
anonymous functions in one Spark program, then to debug each of them I have to
place a COUNT action in advance and remove it after debugging. Is that the
right way?
________________________________
From: Dirceu Semighini Filho <[email protected]>
Sent: September 16, 2016 21:07
To: chen yong
Cc: [email protected]
Subject: Re: RE: RE: RE: RE: it does not stop at breakpoints which is in an
anonymous function
Hello Felix,
No, this line isn't the one that triggers the execution of the function; the
count does that, unless your count val is a lazy val.
The count method is the one that retrieves the information from the RDD: it has
to go through all of its data to determine how many records the RDD has.
Regards,
2016-09-15 22:23 GMT-03:00 chen yong <[email protected]>:
Dear Dirceu,
Thanks for your kind help.
I cannot see any code line corresponding to "..... retrieve the data from your
DataFrame/RDDs...." which you suggested in your previous replies.
Later, I guessed that the line
val test = count
is the key point. Without it, it would not stop at breakpoint-1, right?
________________________________
From: Dirceu Semighini Filho <[email protected]>
Sent: September 16, 2016 0:39
To: chen yong
Cc: [email protected]
Subject: Re: RE: RE: RE: it does not stop at breakpoints which is in an anonymous
function
Hi Felix,
Are you sure your n is greater than 0?
Here it stops first at breakpoint-1; image attached.
Have you checked the count to see if it's also greater than 0?
2016-09-15 11:41 GMT-03:00 chen yong <[email protected]>:
Dear Dirceu,
Thank you for your help.
Actually, I use IntelliJ IDEA to debug the Spark code.
Let me use the following code snippet to illustrate my problem. In the code
lines below, I've set two breakpoints, breakpoint-1 and breakpoint-2. When I
debugged the code, it did not stop at breakpoint-1; it seems that the map
function was skipped and execution directly reached and stopped at breakpoint-2.
Additionally, I found the following two posts:
(1) http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala
(2) https://www.mail-archive.com/[email protected]/msg29010.html
I am wondering whether logging is an alternative approach to debugging Spark
anonymous functions.
val count = spark.parallelize(1 to n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1 // breakpoint-1 set in this line
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
val test = count // breakpoint-2 set in this line
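On the logging question: a println (or a real logger) inside the anonymous function does fire for every element, even when the debugger appears to skip the closure, so it is a workable alternative. A minimal sketch of the same sampling logic on a plain Scala range (no Spark; in an actual Spark closure the logger would need to be serializable or obtained inside the closure on each executor):

```scala
object ClosureLoggingDemo {
  def main(args: Array[String]): Unit = {
    val n = 10
    val hits = (1 until n).map { i =>
      val x = scala.util.Random.nextDouble() * 2 - 1
      val y = scala.util.Random.nextDouble() * 2 - 1
      val hit = if (x * x + y * y < 1) 1 else 0
      // This line runs once per element, even if a breakpoint here is missed:
      println(s"i=$i x=$x y=$y hit=$hit")
      hit
    }.sum
    println(s"hits: $hits of ${n - 1}")
  }
}
```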
________________________________
From: Dirceu Semighini Filho <[email protected]>
Sent: September 14, 2016 23:32
To: chen yong
Subject: Re: RE: RE: it does not stop at breakpoints which is in an anonymous
function
I don't know which IDE you use. I use IntelliJ, and it has an Evaluate
Expression dialog where I can execute code whenever execution has stopped at
a breakpoint.
In Eclipse you have Watch and Inspect, where you can do the same.
Probably you are not seeing the debugger stop in your functions because you
never retrieve the data from your DataFrame/RDDs.
What are you doing with this function? Are you getting the result of this
RDD/DataFrame somewhere?
You can add a count after the function that you want to debug, just for
debugging, but don't forget to remove it after testing.
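The temporary-count trick can also be tried on plain Scala views: forcing one lazy stage on its own shows exactly when each closure runs, which is what the debug-only count does for an RDD. A sketch (an assumed analogy, not Spark itself; `toList` and `sum` stand in for RDD actions):

```scala
object StagedForceDemo {
  def main(args: Array[String]): Unit = {
    val stage1 = (1 to 4).view.map(_ * 2) // lazy, like rdd.map
    val stage2 = stage1.map(_ + 1)        // still lazy

    // Debug-only forcing step, analogous to inserting rdd.count:
    // breakpoints inside stage1's closure are hit only at this point.
    val forced = stage1.toList
    println(s"stage1 materialized: $forced")

    // The real action; remove the forcing step above after debugging.
    val result = stage2.sum
    println(s"result: $result")
  }
}
```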
2016-09-14 12:20 GMT-03:00 chen yong <[email protected]>:
Dear Dirceu,
Thank you again.
Actually, I never saw it stop at the breakpoints, no matter how long I waited.
It just skipped the whole anonymous function and directly reached the first
breakpoint after the anonymous function body. Is that normal? I suspect
something is wrong in my debugging operations or settings. I am very new to
Spark and Scala.
Additionally, please give me some detailed instructions about "....Some ides
provide you a place where you can execute the code to see it's results....".
Where is that place?
Your help is badly needed!
________________________________
From: Dirceu Semighini Filho <[email protected]>
Sent: September 14, 2016 23:07
To: chen yong
Subject: Re: RE: it does not stop at breakpoints which is in an anonymous function
You can call a count in the IDE just to debug, or you can wait until the
execution reaches the code, so you can debug.
Some IDEs provide a place where you can execute code and see its results.
Be careful not to add these operations to your production code, because they
can slow down its execution.
2016-09-14 11:43 GMT-03:00 chen yong <[email protected]>:
Thanks for your reply.
You mean I have to insert some code, such as xxxxx.count or xxxxx.collect,
between the original Spark code lines to invoke some operations, right?
But where are the right places to put those lines?
Felix
________________________________
From: Dirceu Semighini Filho <[email protected]>
Sent: September 14, 2016 22:33
To: chen yong
Cc: [email protected]
Subject: Re: it does not stop at breakpoints which is in an anonymous function
Hello Felix,
Spark functions run lazily, and that's why it doesn't stop at those breakpoints.
They will be executed only when you call certain methods of your DataFrame/RDD,
like count, collect, ...
Regards,
Dirceu
2016-09-14 11:26 GMT-03:00 chen yong <[email protected]>:
Hi all,
I am a newbie to Spark. I am learning Spark by debugging the Spark code. It is
strange to me that it does not stop at breakpoints which are in an anonymous
function; it is normal in an ordinary function, though. Is that normal? How can
I observe variables in an anonymous function?
Please help me. Thanks in advance!
Felix