Hi,

I don't know why I receive the message

 WARN KMeans: The input data is not directly cached, which may hurt
performance if its parent RDDs are also uncached.

when I try to use Spark KMeans:

df_Part = assembler.transform(df_Part)
df_Part.cache()
while (k <= max_cluster) and (wssse > seuilStop):
    kmeans = KMeans().setK(k)
    model = kmeans.fit(df_Part)
    wssse = model.computeCost(df_Part)
    k = k + 1

It says that my input (a DataFrame) is not cached!

I printed df_Part.is_cached and received True, which means that my
DataFrame is cached. So why is Spark still warning me about this?

Thank you in advance.

