Hello, I am using PySpark to train a logistic regression model with cross-validation via Spark ML. My dataset is, for testing purposes, very small: no more than 50 training records. On the other hand, my "feature" column is very wide, i.e. 1500+ features per vector.
I am running on YARN with 3 executors, each with 4 GB of memory and 4 cores. I use cache() to store the DataFrames. Unfortunately, the process never finishes: it hangs during cross-validation. Any clues? Thanks, Simone