Hi
I am new to Spark and I would like to ask what is wrong with my use of
checkpoint on PySpark 1.6.0.

I don't understand what happens when I try to use it on a DataFrame:
dfTotaleNormalize24 = dfTotaleNormalize23.select([
    i if i not in listrapcot else udf_Grappra(F.col(i)).alias(i)
    for i in dfTotaleNormalize23.columns
])

dfTotaleNormalize24.cache()           # cache in memory
dfTotaleNormalize24.count()           # materialize the DataFrame (the RDD too?)
dfTotaleNormalize24.rdd.checkpoint()  # cut the DAG; the RDD is not saved yet
dfTotaleNormalize24.rdd.count()       # materialize to the checkpoint files

But why do I get the following error:

 java.lang.UnsupportedOperationException: Cannot evaluate expression:
 PythonUDF#Grappra(input[410, StringType])


Thanks in advance for explaining the details and the steps needed to save and
checkpoint the DataFrame.

My DataFrame is huge, with more than 5 million rows and 1000 columns, and the
UDF above is applied to more than 150 columns. It replaces ' ' with 0.0,
that is all.
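
To make the UDF concrete, here is a minimal sketch of the replacement logic in plain Python. The name blank_to_zero is only illustrative (it is not the real body of udf_Grappra). It also assumes that None and whitespace-only strings count as blank, and that the result stays a string, since the input column is StringType.

```python
def blank_to_zero(value):
    """Replace a blank string with "0.0" and leave other values unchanged."""
    # Assumption: None and whitespace-only strings both count as "blank".
    if value is None or value.strip() == "":
        return "0.0"
    # Any other value is passed through as-is.
    return value


# Assumption: with PySpark available, the function would be wrapped roughly as
#   udf_Grappra = F.udf(blank_to_zero, StringType())
# where F is pyspark.sql.functions and StringType comes from pyspark.sql.types.

print(blank_to_zero(" "))    # prints 0.0
print(blank_to_zero("3.5"))  # prints 3.5
```

Keeping the logic in an ordinary function like this makes it possible to check it without starting Spark.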

Regards
