Re: Better way to debug serializable issues

2020-02-20 Thread Ruijing Li
Thanks all for the answers. Although I wasn’t able to get the needed
information out of the extra parameters, I did solve my issue. It turned
out I was using pureconfig to read a certain config from Hadoop before the
Spark session was initialized, so pureconfig would error out while
deserializing the config class before Spark could configure properly.
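
For anyone who hits something similar, here is a minimal sketch of the fix
(the config class and field names are made up for illustration, and our real
config is read from Hadoop rather than the classpath default):

import org.apache.spark.sql.SparkSession
import pureconfig.ConfigSource
import pureconfig.generic.auto._

// Hypothetical config class, purely for illustration.
case class JobConfig(inputPath: String, outputPath: String)

object Main {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession first, so Spark and its Hadoop
    // configuration are fully initialized...
    val spark = SparkSession.builder().appName("my-job").getOrCreate()

    // ...and only then let pureconfig deserialize the config class.
    val jobConf = ConfigSource.default.loadOrThrow[JobConfig]

    // ... rest of the job ...
  }
}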


On Tue, Feb 18, 2020 at 10:24 AM Maxim Gekk 
wrote:

> Hi Ruijing,
>
> Spark uses SerializationDebugger (
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/serializer/SerializationDebugger.html)
> as its default debugger to detect serialization issues. You can get more
> detailed serialization exception information by setting the following when
> creating a cluster:
> spark.driver.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
> spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Tue, Feb 18, 2020 at 1:02 PM Ruijing Li  wrote:
>
>> Hi all,
>>
>> When working with Spark jobs, I sometimes have to tackle serialization
>> issues, and I have a difficult time trying to fix those. A lot of the
>> time, the serialization issues happen only in cluster mode, across the
>> network in a Mesos container, so I can’t debug locally. And the exception
>> thrown by Spark is not very helpful for finding the cause.
>>
>> I’d love to hear some tips on how to debug in the right places. I’d also
>> be interested to know whether future releases could point out which class
>> or function is causing the serialization issue (right now I find it’s
>> either Java generic classes or the class Spark is running itself).
>> Thanks!
>> --
>> Cheers,
>> Ruijing Li
>>
--
Cheers,
Ruijing Li


Re: Better way to debug serializable issues

2020-02-18 Thread Maxim Gekk
Hi Ruijing,

Spark uses SerializationDebugger (
https://spark.apache.org/docs/latest/api/java/org/apache/spark/serializer/SerializationDebugger.html)
as its default debugger to detect serialization issues. You can get more
detailed serialization exception information by setting the following when
creating a cluster:
spark.driver.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
spark.executor.extraJavaOptions -Dsun.io.serialization.extendedDebugInfo=true
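
For example, passed on spark-submit it would look like this (the class and
jar names below are placeholders):

spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
  --conf "spark.executor.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
  --class com.example.MyJob \
  my-job.jar

Note that these options must be in place before the JVMs start; in client
mode in particular, setting spark.driver.extraJavaOptions from inside the
application has no effect because the driver JVM is already running.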

Maxim Gekk

Software Engineer

Databricks, Inc.


On Tue, Feb 18, 2020 at 1:02 PM Ruijing Li  wrote:

> Hi all,
>
> When working with Spark jobs, I sometimes have to tackle serialization
> issues, and I have a difficult time trying to fix those. A lot of the
> time, the serialization issues happen only in cluster mode, across the
> network in a Mesos container, so I can’t debug locally. And the exception
> thrown by Spark is not very helpful for finding the cause.
>
> I’d love to hear some tips on how to debug in the right places. I’d also
> be interested to know whether future releases could point out which class
> or function is causing the serialization issue (right now I find it’s
> either Java generic classes or the class Spark is running itself).
> Thanks!
> --
> Cheers,
> Ruijing Li
>