Hi, I am new to Spark and I want to ask what is wrong with checkpoint in PySpark 1.6.0.
I don't understand what happens when I try to use it on a DataFrame:

    dfTotaleNormalize24 = dfTotaleNormalize23.select(
        [i if i not in listrapcot else udf_Grappra(F.col(i)).alias(i)
         for i in dfTotaleNormalize23.columns])
    dfTotaleNormalize24.cache()           # cache in memory
    dfTotaleNormalize24.count()           # materialize the DataFrame (the RDD too??)
    dfTotaleNormalize24.rdd.checkpoint()  # cut the DAG; the RDD is not saved yet
    dfTotaleNormalize24.rdd.count()       # materialize to file

Why do I get the following error?

    java.lang.UnsupportedOperationException: Cannot evaluate expression: PythonUDF#Grappra(input[410, StringType])

Thanks for explaining all the details and steps needed to save and checkpoint my DataFrame. It is huge: more than 5 million rows and 1000 columns, and the UDF above is applied to more than 150 columns. All it does is replace ' ' with 0.0.

Regards
