[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray
     [ https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu updated SPARK-6728:
------------------------------
    Target Version/s:     (was: 1.6.0)

> Improve performance of py4j for large bytearray
> -----------------------------------------------
>
>                 Key: SPARK-6728
>                 URL: https://issues.apache.org/jira/browse/SPARK-6728
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 1.3.0
>            Reporter: Davies Liu
>            Priority: Critical
>
> PySpark relies on py4j to transfer function arguments and return values
> between Python and the JVM, and passing a large bytearray (larger than 10 MB)
> is very slow.
> In MLlib, a Vector can be larger than 100 MB; transferring it can require
> a few GB of memory and may crash the process.
> The reason is that py4j uses a text protocol: it encodes the bytearray as
> base64 and performs multiple string concatenations.
> A binary protocol would help a lot; I created an issue for py4j:
> https://github.com/bartdag/py4j/issues/159

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
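The issue description in the first notification above attributes the slowdown to py4j's text protocol base64-encoding the bytearray. A minimal Python sketch of why that costs more than a binary transfer, assuming standard padded base64 (RFC 4648): the encoded text is about 4/3 the size of the raw bytes, whereas a length-prefixed binary frame adds only a fixed header. The function names and the 4-byte length-prefix framing are hypothetical illustrations, not py4j's actual wire format or API.

```python
import base64
import struct


def base64_size(n: int) -> int:
    """Length of the standard padded base64 encoding of n bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


def encode_as_text(payload: bytes) -> str:
    """Text-protocol style (illustrative only, NOT py4j's real wire format):
    raw bytes are turned into printable base64 characters."""
    return base64.b64encode(payload).decode("ascii")


def decode_from_text(text: str) -> bytes:
    """Inverse of encode_as_text."""
    return base64.b64decode(text)


def encode_as_binary(payload: bytes) -> bytes:
    """Binary style (hypothetical framing): 4-byte big-endian length prefix, then raw bytes."""
    return struct.pack(">I", len(payload)) + payload


def decode_from_binary(frame: bytes) -> bytes:
    """Inverse of encode_as_binary: read the length prefix, then slice out the payload."""
    (length,) = struct.unpack(">I", frame[:4])
    return frame[4:4 + length]


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB (1,048,576 bytes) of sample data
    text = encode_as_text(payload)
    frame = encode_as_binary(payload)
    print("raw bytes:   ", len(payload))
    print("base64 chars:", len(text))   # about 4/3 of the raw size
    print("binary frame:", len(frame))  # raw size plus a 4-byte header
```

This only models payload size; it does not reproduce the extra copies made by the string concatenations the description mentions.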
[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray
     [ https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-6728:
-------------------------------
    Target Version/s: 1.6.0  (was: 1.5.0)
[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray
     [ https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-6728:
-----------------------------
    Target Version/s: 1.5.0  (was: 1.4.0)
[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray
     [ https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-6728:
----------------------------
    Affects Version/s: 1.3.0
[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray
     [ https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-6728:
----------------------------
            Priority: Critical  (was: Major)
    Target Version/s: 1.4.0