"apply at Option.scala:120" callback in Spark 1.1, but no user code involved?

2014-09-15 Thread John Salvatier
In Spark 1.1, I'm seeing tasks whose call sites don't involve my code at
all! I'd seen something like this before in 1.0.0, but the behavior seems
to be back:

apply at Option.scala:120


org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
scala.Option.getOrElse(Option.scala:120)
org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
org.apache.spark.rdd.FilteredRDD.getPartitions(FilteredRDD.scala:29)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
scala.Option.getOrElse(Option.scala:120)
org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
scala.Option.getOrElse(Option.scala:120)
org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
org.apache.spark.rdd.FilteredRDD.getPartitions(FilteredRDD.scala:29)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
scala.Option.getOrElse(Option.scala:120)
org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)

Ideas on what might be going on?
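
For reference, even a plain chain of map/filter calls, something like the
hypothetical sketch below (not our real code; `sc` is just a SparkContext),
produces this alternating MappedRDD/FilteredRDD pattern when the partitions
of the final RDD are resolved:

// Hypothetical sketch only: an alternating map/filter lineage.
// Resolving partitions on the last RDD walks the whole parent chain,
// which is the MappedRDD/FilteredRDD recursion in the stack above.
val chained = sc.parallelize(1 to 1000)
  .map(_ + 1)           // MappedRDD
  .filter(_ % 2 == 0)   // FilteredRDD
  .map(_ * 3)           // MappedRDD
  .filter(_ > 10)       // FilteredRDD
chained.count()         // action; partitions are resolved for the whole lineage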


setCallSite for API backtraces not showing up in logs?

2014-08-18 Thread John Salvatier
What's the correct way to use setCallSite to get the custom call site to
show up in the Spark logs?

I have something like this:

class RichRDD(rdd: RDD[MyThing]) {
  def mySpecialOperation() = {
    rdd.context.setCallSite("bubbles and candy!")
    rdd.map()                      // (actual map function elided)
    val result = rdd.groupBy()     // (actual groupBy key elided)
    rdd.context.clearCallSite()
    result
  }
}


But when I use .mySpecialOperation, "bubbles and candy!" doesn't seem to
show up anywhere in the logs. Is this not the right way to use
.setCallSite?
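
For completeness, a variant I could switch to wraps the work in try/finally
so clearCallSite always runs even if a transformation throws (just a sketch;
the groupBy key function is a placeholder):

def mySpecialOperation(): RDD[(String, Iterable[MyThing])] = {
  rdd.context.setCallSite("bubbles and candy!")
  try {
    rdd.groupBy(_.toString)   // placeholder key function
  } finally {
    rdd.context.clearCallSite()
  }
}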


Re: Better line number hints for logging?

2014-06-03 Thread John Salvatier
OK, I will probably open a JIRA issue.
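
In the meantime RDD.setName looks like it covers my case, roughly (sketch
only; `records` and `enrich` are placeholders on my side):

// Name the RDD so the DAGScheduler log line is recognizable, since the
// call-site line number points at the helper file rather than the caller.
val enriched = records.map(enrich).setName("RichRecordRDD.enrich")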


On Tue, Jun 3, 2014 at 5:29 PM, Matei Zaharia wrote:

> You can use RDD.setName to give it a name. There’s also a creationSite
> field that is private[spark] — we may want to add a public setter for that
> later. If the name isn’t enough and you’d like this, please open a JIRA
> issue for it.
>
> Matei
>
> On Jun 3, 2014, at 5:22 PM, John Salvatier  wrote:
>
> I have created some extension methods for RDDs in RichRecordRDD and these
> are working exceptionally well for me.
>
> However, when looking at the logs, it's impossible to tell what's going on
> because all the line number hints point to RichRecordRDD.scala rather than
> the code that uses it. For example:
>
>> INFO scheduler.DAGScheduler: Submitting Stage 122 (MappedRDD[1223] at map
>> at RichRecordRDD.scala:633), which is now runnable
>
> Is there any way to set up my extension methods class so that the logs will
> print a more useful line number?
>
>
>


Better line number hints for logging?

2014-06-03 Thread John Salvatier
I have created some extension methods for RDDs in RichRecordRDD and these
are working exceptionally well for me.

However, when looking at the logs, it's impossible to tell what's going on
because all the line number hints point to RichRecordRDD.scala rather than
the code that uses it. For example:

> INFO scheduler.DAGScheduler: Submitting Stage 122 (MappedRDD[1223] at map
> at RichRecordRDD.scala:633), which is now runnable

Is there any way to set up my extension methods class so that the logs will
print a more useful line number?
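
The extension methods look roughly like the sketch below (the record type
and the method body are placeholders), which is why every RDD the helper
creates records a line in RichRecordRDD.scala as its call site:

import org.apache.spark.rdd.RDD

case class Record(raw: String)   // placeholder record type

class RichRecordRDD(rdd: RDD[Record]) {
  def enriched(): RDD[Record] =
    rdd.map(r => r)   // the MappedRDD built here is attributed to this line,
                      // not to the caller of .enriched()
}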


Re: How are exceptions in map functions handled in Spark?

2014-04-04 Thread John Salvatier
Btw, thank you for your help.


On Fri, Apr 4, 2014 at 11:49 AM, John Salvatier wrote:

> Is there a way to log exceptions inside a mapping function? logError and
> logInfo seem to freeze things.
>
>
> On Fri, Apr 4, 2014 at 11:02 AM, Matei Zaharia wrote:
>
>> Exceptions should be sent back to the driver program and logged there
>> (with a SparkException thrown if a task fails more than 4 times), but there
>> were some bugs before where this did not happen for non-Serializable
>> exceptions. We changed it to pass back the stack traces only (as text),
>> which should always work. I'd recommend trying a newer Spark version; 0.8
>> should be easy to upgrade to from 0.7.
>>
>> Matei
>>
>> On Apr 4, 2014, at 10:40 AM, John Salvatier  wrote:
>>
>> > I'm trying to get a clear idea about how exceptions are handled in
>> > Spark. Is there somewhere where I can read about this? I'm on Spark 0.7.
>> >
>> > For some reason I was under the impression that such exceptions are
>> > swallowed and the value that produced them ignored, but the exception is
>> > logged. However, right now we're seeing the task just retried over and
>> > over again in an infinite loop because there's a value that always
>> > generates an exception.
>> >
>> > John
>>
>>
>


Re: How are exceptions in map functions handled in Spark?

2014-04-04 Thread John Salvatier
Is there a way to log exceptions inside a mapping function? logError and
logInfo seem to freeze things.
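
What I'm after is roughly this sketch (all names here are placeholders, not
our real job), keeping per-record failures as data instead of letting one
bad value kill the task:

val lines = sc.parallelize(Seq("1", "2", "oops", "4"))   // placeholder input
def parse(line: String): Int = line.trim.toInt           // placeholder parser

// Wrap each record so a failure becomes a Left, not a task failure.
val attempts = lines.map { line =>
  try Right(parse(line))
  catch { case e: Exception => Left(line + " -> " + e) }
}
val parsed   = attempts.flatMap(_.right.toOption)   // values that parsed
val failures = attempts.flatMap(_.left.toOption)    // error messages kept as data
failures.take(5).foreach(println)                   // sample a few on the driver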


On Fri, Apr 4, 2014 at 11:02 AM, Matei Zaharia wrote:

> Exceptions should be sent back to the driver program and logged there
> (with a SparkException thrown if a task fails more than 4 times), but there
> were some bugs before where this did not happen for non-Serializable
> exceptions. We changed it to pass back the stack traces only (as text),
> which should always work. I'd recommend trying a newer Spark version; 0.8
> should be easy to upgrade to from 0.7.
>
> Matei
>
> On Apr 4, 2014, at 10:40 AM, John Salvatier  wrote:
>
> > I'm trying to get a clear idea about how exceptions are handled in
> > Spark. Is there somewhere where I can read about this? I'm on Spark 0.7.
> >
> > For some reason I was under the impression that such exceptions are
> > swallowed and the value that produced them ignored, but the exception is
> > logged. However, right now we're seeing the task just retried over and
> > over again in an infinite loop because there's a value that always
> > generates an exception.
> >
> > John
>
>


How are exceptions in map functions handled in Spark?

2014-04-04 Thread John Salvatier
I'm trying to get a clear idea about how exceptions are handled in Spark.
Is there somewhere where I can read about this? I'm on Spark 0.7.

For some reason I was under the impression that such exceptions are
swallowed and the value that produced them ignored, but the exception is
logged. However, right now we're seeing the task just retried over and
over again in an infinite loop because there's a value that always
generates an exception.
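
A minimal example of the kind of situation I mean (sketch, not our actual
job; `sc` is the SparkContext):

val nums = sc.parallelize(Seq("1", "2", "oops", "4"))
// "oops" throws NumberFormatException on every attempt, so the task that
// owns its partition keeps failing and being retried.
nums.map(_.toInt).count()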

John