Hi Andrew,
Thanks for your note. Yes, I see a stack trace now. It seems to be an
issue with Python handling a function I want to apply to an RDD. The
function is a simple recursive factorial:
def f(n):
    if n == 1: return 1
    return n * f(n-1)
and I'm trying to use it like this:
tf = sc.textFile(...)
tf.map(lambda line: line and len(line)).map(f).collect()
I get the following error, which does not occur if I use a library
function such as math.sqrt:
TypeError: __import__() argument 1 must be string, not X#
The stack trace follows:
WARN TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "<ipython-input-39-0f0dafaf1ed4>", line 2, in f
TypeError: __import__() argument 1 must be string, not X#
at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
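
If the self-reference in f is what trips up the pickler, an iterative
version might sidestep it. Untested sketch, just dropping fact in for f
in the same pipeline:

def fact(n):
    # iterative, so the function object never refers to itself
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

tf.map(lambda line: line and len(line)).map(fact).collect()

math.factorial from the standard library would also do the job without
shipping my own function at all.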
On Wed, Jul 23, 2014 at 11:07 AM, Andrew Or <[email protected]> wrote:
> Hi Eric,
>
> Have you checked the executor logs? It is possible they died because of
> some exception, and the message you see is just a side effect.
>
> Andrew
>
>
> 2014-07-23 8:27 GMT-07:00 Eric Friedman <[email protected]>:
>
>> I'm using Spark 1.0.1 on a quite large cluster with gobs of memory, etc.
>> Cluster resources are available to me via YARN and I am seeing these
>> errors quite often.
>>
>> ERROR YarnClientClusterScheduler: Lost executor 63 on <host>: remote Akka
>> client disassociated
>>
>>
>> This is in an interactive shell session. I don't know a lot about YARN
>> plumbing and am wondering if there's some constraint in play, such as
>> executors being cleared out if they sit idle for too long.
>>
>>
>> Any insights here?
>>
>
>