Hi, I have a DataFrame with 1000 columns that I want to convert to indexed (dummy) columns with StringIndexer. When I apply the pipeline, it takes a long time, and I then want to merge the result with another DataFrame.
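I mean: the original DataFrame + the columns indexed by the StringIndexers. The problem is the save stage: it takes a very long time. Why?

Code:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer

    # l is the list of column names to index
    indexers = [StringIndexer(inputCol=i, outputCol=i + "_index").fit(df) for i in l]
    li = [i + "_index" for i in l]  # names of the indexed columns
    pipeline = Pipeline(stages=indexers)
    df_r = pipeline.fit(df).transform(df)
    df_r = df_r.repartition(500)
    df_r.persist()
    df_r.write.parquet(paths)  # write is a property, not a method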
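For comparison, here is a minimal sketch of a variant I could try, assuming Spark 3.0 or later, where StringIndexer accepts `inputCols`/`outputCols` and can index all columns in one stage instead of 1000 separately fitted ones. The helper names `index_names` and `index_and_save` are hypothetical, made up for this illustration. I have not measured whether this is actually faster on my data.

```python
def index_names(cols, suffix="_index"):
    """Return the output column name for each input column."""
    return [c + suffix for c in cols]


def index_and_save(df, cols, path):
    """Index `cols` with one multi-column StringIndexer and write to Parquet."""
    # Imported lazily so index_names() stays usable without Spark installed.
    # Assumption: Spark >= 3.0 (inputCols/outputCols on StringIndexer).
    from pyspark.ml.feature import StringIndexer

    indexer = StringIndexer(inputCols=cols, outputCols=index_names(cols))
    df_r = indexer.fit(df).transform(df)  # original columns + *_index columns
    # Written directly: persist() is skipped because df_r is used only once here.
    df_r.write.parquet(path)
    return df_r
```

With the names from my code above, the call would be `index_and_save(df, l, paths)`.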