Re: Re: Re: it does not stop at breakpoints which is in an anonymous function

2016-09-23 Thread Dirceu Semighini Filho
Hi Felix,
I just ran your code and it prints

Pi is roughly 4.0

Here is the code that I used. As you didn't show what random is, I used
nextInt():

val n = math.min(10L * slices, Int.MaxValue).toInt // avoid overflow
val count = context.sparkContext.parallelize(1 until n, slices).map { i =>
  val random = new scala.util.Random(1000).nextInt()
  val x = random * 2 - 1  // (breakpoint-1)
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / (n - 1))
context.sparkContext.stop()

Also, when debugging, it stops inside the map (breakpoint-1) before going on to
the print.
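
As an aside, the original SparkPi example draws x and y from scala.math.random,
which returns a Double in [0, 1), so with a fixed-seed nextInt() every element
maps to the same point and the estimate collapses to exactly 0.0 or 4.0. Below
is a sketch closer to the original sampling; it assumes the same context, n and
slices as above and is illustrative only:

// Sketch of the original SparkPi-style sampling (assumes context, n, slices as above)
val count = context.sparkContext.parallelize(1 until n, slices).map { i =>
  val rnd = new scala.util.Random()   // unseeded generator, created per element for simplicity
  val x = rnd.nextDouble() * 2 - 1    // Double in [-1, 1)   // (breakpoint-1)
  val y = rnd.nextDouble() * 2 - 1
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / (n - 1))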

2016-09-18 6:47 GMT-03:00 chen yong :

> Dear Dirceu,
>
>
> Below is our test code. As you can see, we have used the "reduce" action
> to trigger evaluation. However, it still did not stop at breakpoint-1 (as
> shown in the code snippet) when debugging.
>
>
>
> We are using IntelliJ IDEA version 14.0.3 to debug. It is very strange to us.
> Please help us (me and my colleagues).
>
>
> // scalastyle:off println
> package org.apache.spark.examples
> import scala.math.random
> import org.apache.spark._
> import scala.util.logging.Logged
>
> /** Computes an approximation to pi */
> object SparkPi{
>   def main(args: Array[String]) {
>
> val conf = new SparkConf().setAppName("Spark Pi").setMaster("local")
> val spark = new SparkContext(conf)
> val slices = if (args.length > 0) args(0).toInt else 2
> val n = math.min(10L * slices, Int.MaxValue).toInt // avoid overflow
> val count = spark.parallelize(1 until n, slices).map { i =>
>   val x = random * 2 - 1  // (breakpoint-1)
>   val y = random * 2 - 1
>   if (x*x + y*y < 1) 1 else 0
> }.reduce(_ + _)
> println("Pi is roughly " + 4.0 * count / (n - 1))
> spark.stop()
>   }
> }
>
>
>
>
> --
> *From:* Dirceu Semighini Filho
> *Sent:* September 16, 2016 22:27
> *To:* chen yong
> *Cc:* user@spark.apache.org
> *Subject:* Re: Re: it does not stop at breakpoints which is in an anonymous
> function
>
> No, that's not the right way of doing it.
> Remember that RDD operations are lazy, for performance reasons.
> Whenever you call one of those action methods (count, reduce, collect,
> ...) they will execute all of the functions that you applied to create that
> RDD.
> It would help if you could post your code here, along with the way you are
> executing it and trying to debug it.
>
>
> 2016-09-16 11:23 GMT-03:00 chen yong :
>
>> Also, I wonder what the right way to debug a Spark program is. If I use
>> ten anonymous functions in one Spark program, then to debug each of them I
>> have to place a COUNT action in advance and remove it after debugging.
>> Is that the right way?
>>
>>
>> --
>> *From:* Dirceu Semighini Filho
>> *Sent:* September 16, 2016 21:07
>> *To:* chen yong
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Re: Re: Re: Re: it does not stop at breakpoints which is in
>> an anonymous function
>>
>> Hello Felix,
>> No, this line isn't the one that is triggering the execution of the
>> function; the count does that, unless your count val is a lazy val.
>> The count method is the one that retrieves the information of the RDD; it
>> has to go through all of its data to determine how many records the RDD
>> has.
>>
>> Regards,
>>
>> 2016-09-15 22:23 GMT-03:00 chen yong :
>>
>>>
>>> Dear Dirceu,
>>>
>>> Thanks for your kind help.
>>> I cannot see any code line corresponding to "... retrieve the data
>>> from your DataFrame/RDDs", which you suggested in the previous replies.
>>>
>>> Later, I guess
>>>
>>> the line
>>>
>>> val test = count
>>>
>>> is the key point. Without it, it would not stop at breakpoint-1,
>>> right?
>>>
>>>
>>>
>>> --
>>> *From:* Dirceu Semighini Filho
>>> *Sent:* September 16, 2016 0:39
>>> *To:* chen yong
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Re: Re: Re: it does not stop at breakpoints which is in an
>>> anonymous function
>>>
>>> Hi Felix,
>>> Are you sure your n is greater than 0?
>>> Here it stops first at breakpoint 1; image attached.
>>> Have you checked the count to see if it is also greater than 0?
>>>
>>> 2016-09-15 11:41 GMT-03:00 chen yong :
>>>
 Dear Dirceu


 Thank you for your help.


 Actually, I use IntelliJ IDEA to debug the Spark code.


 Let me use the following code snippet to illustrate my problem. In the
 code lines below, I've set two breakpoints, breakpoint-1 and breakpoint-2.
 When I debugged the code, it did not stop at breakpoint-1; it seems
 that the map function was skipped and it directly reached and stopped at
 breakpoint-2.

 Additionally, I found the following two posts
 (1) http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala
 (2) https://www.mail-archive.com/user@spark.apache.org/msg29010.html

 I am wondering whether logging is an alternative approach to debugging Spark
 anonymous functions.

Re: Re: it does not stop at breakpoints which is in an anonymous function

2016-09-18 Thread chen yong
Dear Dirceu,


Below is our test code. As you can see, we have used the "reduce" action to
trigger evaluation. However, it still did not stop at breakpoint-1 (as shown in
the code snippet) when debugging.



We are using IntelliJ IDEA version 14.0.3 to debug. It is very strange to us. Please
help us (me and my colleagues).


// scalastyle:off println
package org.apache.spark.examples
import scala.math.random
import org.apache.spark._
import scala.util.logging.Logged

/** Computes an approximation to pi */
object SparkPi{
  def main(args: Array[String]) {

val conf = new SparkConf().setAppName("Spark Pi").setMaster("local")
val spark = new SparkContext(conf)
val slices = if (args.length > 0) args(0).toInt else 2
val n = math.min(10L * slices, Int.MaxValue).toInt // avoid overflow
val count = spark.parallelize(1 until n, slices).map { i =>
  val x = random * 2 - 1  // (breakpoint-1)
  val y = random * 2 - 1
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / (n - 1))
spark.stop()
  }
}





From: Dirceu Semighini Filho
Sent: September 16, 2016 22:27
To: chen yong
Cc: user@spark.apache.org
Subject: Re: Re: it does not stop at breakpoints which is in an anonymous function

No, that's not the right way of doing it.
Remember that RDD operations are lazy, for performance reasons. Whenever you
call one of those action methods (count, reduce, collect, ...) they will
execute all of the functions that you applied to create that RDD.
It would help if you could post your code here, along with the way you are
executing it and trying to debug it.
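
A minimal illustration of that laziness (not from the thread; it assumes spark
is a SparkContext, as in the SparkPi snippet below):

val doubled = spark.parallelize(1 to 3).map { i =>
  println(s"mapping $i")   // not executed yet; the transformation is only recorded
  i * 2
}
// nothing has been printed so far, and no breakpoint inside the map is hit
val total = doubled.reduce(_ + _)   // the action runs the closure: "mapping 1/2/3" appears here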


2016-09-16 11:23 GMT-03:00 chen yong <cy...@hotmail.com>:

Also, I wonder what the right way to debug a Spark program is. If I use ten
anonymous functions in one Spark program, then to debug each of them I have to
place a COUNT action in advance and remove it after debugging. Is that the
right way?
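
One possible alternative, sketched below (a suggestion only, not something
discussed in the thread): extract the anonymous function into a named method,
so its body can be stepped through or unit-tested directly, without adding and
later removing a COUNT action. The sketch assumes spark, n, slices and
import scala.math.random from the SparkPi code further down.

def sample(i: Int): Int = {
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x * x + y * y < 1) 1 else 0
}

sample(1)   // can be called directly in a test or in the debugger's Evaluate Expression

val count = spark.parallelize(1 until n, slices)
  .map(sample)        // a breakpoint inside sample is still only hit once an action runs
  .reduce(_ + _)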



From: Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
Sent: September 16, 2016 21:07
To: chen yong
Cc: user@spark.apache.org
Subject: Re: Re: Re: Re: Re: it does not stop at breakpoints which is in an
anonymous function

Hello Felix,
No, this line isn't the one that is triggering the execution of the function;
the count does that, unless your count val is a lazy val.
The count method is the one that retrieves the information of the RDD; it has
to go through all of its data to determine how many records the RDD has.
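
To illustrate the lazy val caveat (a sketch only; rdd stands for any RDD):

val eager = rdd.count()           // the job runs here, so breakpoints in the RDD's closures are hit now
lazy val deferred = rdd.count()   // nothing runs yet
println(deferred)                 // the job (and any breakpoint in the closures) only fires on first access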

Regards,

2016-09-15 22:23 GMT-03:00 chen yong <cy...@hotmail.com>:


Dear Dirceu,

Thanks for your kind help.
I cannot see any code line corresponding to "... retrieve the data from your
DataFrame/RDDs", which you suggested in the previous replies.

Later, I guess

the line

val test = count

is the key point. Without it, it would not stop at breakpoint-1, right?




From: Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
Sent: September 16, 2016 0:39
To: chen yong
Cc: user@spark.apache.org
Subject: Re: Re: Re: Re: it does not stop at breakpoints which is in an anonymous
function

Hi Felix,
Are you sure your n is greater than 0?
Here it stops first at breakpoint 1; image attached.
Have you checked the count to see if it is also greater than 0?

2016-09-15 11:41 GMT-03:00 chen yong <cy...@hotmail.com>:

Dear Dirceu


Thank you for your help.


Actually, I use IntelliJ IDEA to debug the Spark code.


Let me use the following code snippet to illustrate my problem. In the code
lines below, I've set two breakpoints, breakpoint-1 and breakpoint-2. When I
debugged the code, it did not stop at breakpoint-1; it seems that the map
function was skipped and it directly reached and stopped at breakpoint-2.

Additionally, I found the following two posts
(1)http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala
(2)https://www.mail-archive.com/user@spark.apache.org/msg29010.html

I am wondering whether logging is an alternative approach to debugging Spark
anonymous functions.
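
For what it's worth, below is a minimal sketch of logging from inside a Spark
closure (an illustration only; it assumes log4j 1.x, which Spark bundled at the
time, and reuses spark, n, slices and random from the snippet that follows; the
logger is created inside mapPartitions because log4j Loggers are not
serializable):

import org.apache.log4j.LogManager

val count = spark.parallelize(1 until n, slices).mapPartitions { iter =>
  val log = LogManager.getLogger("SparkPiDebug")   // created on the executor, once per partition
  iter.map { i =>
    val x = random * 2 - 1
    val y = random * 2 - 1
    log.info(s"i=$i x=$x y=$y")   // shows up in the executor logs (driver console when master is local)
    if (x * x + y * y < 1) 1 else 0
  }
}.reduce(_ + _)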


val count = spark.parallelize(1 to n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1  // (breakpoint-1 set in this line)
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
val test = x  // (breakpoint-2 set in this line)




From: Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
Sent: September 14, 2016 23:32
To: chen yong
Subject: Re: Re: Re: it does not stop at breakpoints which is in an anonymous
function

I don't know which IDE you use. I use IntelliJ, and there is an Evaluate
Expression dialog where I can execute code whenever it has stopped at a
breakpoint.
In Eclipse you have Watch and Inspect, where you can do the same.
Probably you are not seeing the debugger stop in your functions because you
never retrieve the data from your DataFrame/RDD.
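
For example (illustrative only, assuming the RDD has been assigned to a value
named rdd), while stopped at a breakpoint one could evaluate expressions such as:

rdd.take(5)    // pulls a few records to the driver, forcing the recorded transformations to run
rdd.count()    // forces a full pass over the RDD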