Srinivasan created SPARK-23969:
----------------------------------

             Summary: Using FPGrowth with PipelinedRDD gives EOF Error
                 Key: SPARK-23969
                 URL: https://issues.apache.org/jira/browse/SPARK-23969
             Project: Spark
          Issue Type: Bug
          Components: MLlib, Spark Submit
    Affects Versions: 2.0.2
         Environment: Spark 2.0.2
Python 2.6 (This is due to a slight issue with the Altiscale environment we are using. We will be moving to 3.5)
Driver Memory = 32GB
5 nodes with 4 cores each and 32GB memory
Shuffle Service = False
Dynamic Allocation = False
Cross Join = True
            Reporter: Srinivasan


I am trying to find association rules in a data set with 27 million rows and 9 columns. The data is stored in a Hive table and loaded into an RDD. I am not using the collect function because I keep getting an out-of-memory error when I do.

I understand that FPGrowth needs a list of transactions, which means I need to convert my DataFrame into a list of lists. But I keep getting an EOF error from serializer.py.
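
For reference, here is a minimal sketch of building the transactions on the executors instead of collecting them to the driver. It does not diagnose the EOF error. It only illustrates that FPGrowth.train in pyspark.mllib.fpm takes an RDD of item lists, so a PipelinedRDD produced by map can be passed in as-is. The helper names row_to_transaction and mine_frequent_itemsets, the "column=value" item tagging, and the default min_support value are assumptions made for illustration and are not part of the original report.

```python
def row_to_transaction(values, columns):
    """Turn one table row into a transaction: a list of unique items.

    Each item is tagged with its column name so that equal values in
    different columns stay distinct. None (NULL) cells are skipped.
    """
    items = []
    seen = set()
    for column, value in zip(columns, values):
        if value is None:
            continue
        item = "{0}={1}".format(column, value)
        # Guard against duplicate column names, which would otherwise
        # put the same item into one transaction twice.
        if item not in seen:
            seen.add(item)
            items.append(item)
    return items


def mine_frequent_itemsets(df, min_support=0.01, num_partitions=-1):
    """Run FPGrowth on a DataFrame without collecting it to the driver."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.mllib.fpm import FPGrowth

    columns = df.columns  # plain list of strings, cheap to ship to executors
    # A Row behaves like a tuple, so it can be zipped with the column names.
    transactions = df.rdd.map(lambda row: row_to_transaction(row, columns))
    model = FPGrowth.train(transactions, minSupport=min_support,
                           numPartitions=num_partitions)
    return model.freqItemsets()  # RDD of FreqItemset(items, freq)
```

With a DataFrame df loaded from the Hive table, mine_frequent_itemsets(df).take(10) would pull only a handful of results back to the driver. The min_support value is a placeholder and would need tuning for 27 million rows.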