[ https://issues.apache.org/jira/browse/SPARK-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275711#comment-14275711 ]
Apache Spark commented on SPARK-5224:
-------------------------------------

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/4024

> parallelize list/ndarray is really slow
> ---------------------------------------
>
>                 Key: SPARK-5224
>                 URL: https://issues.apache.org/jira/browse/SPARK-5224
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>            Reporter: Davies Liu
>            Priority: Blocker
>
> The default batchSize was changed to 0 (batches are sized based on the
> size of the objects), but parallelize() still uses BatchedSerializer
> with batchSize=1.
> Also, BatchedSerializer does not work well with list and numpy.ndarray.
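
To illustrate why a batch size of 1 can be costly, here is a minimal sketch in plain Python. It is not PySpark's BatchedSerializer; the helper names `batched` and `serialize_in_batches` are hypothetical, and the batch size of 1024 is an arbitrary value chosen for comparison. It only shows that pickling one object per batch produces one payload per element, each carrying its own framing overhead, while larger batches amortize that overhead.

```python
import pickle
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists holding at most batch_size items (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def serialize_in_batches(items, batch_size):
    """Pickle each batch separately and return the list of payloads (hypothetical helper)."""
    return [pickle.dumps(batch, protocol=2) for batch in batched(items, batch_size)]


data = list(range(10000))

# One object per batch: as many payloads as there are elements.
one_per_batch = serialize_in_batches(data, 1)

# Larger batches: far fewer payloads and less total framing overhead.
large_batches = serialize_in_batches(data, 1024)

print("payloads:", len(one_per_batch), "vs", len(large_batches))
print("bytes:   ", sum(map(len, one_per_batch)), "vs", sum(map(len, large_batches)))
```

The sketch covers only the batch-size point from the description; it does not attempt to reproduce the separate list/numpy.ndarray problem.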