[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray

2015-11-09 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-6728:
--
Target Version/s:   (was: 1.6.0)

> Improve performance of py4j for large bytearray
> ---
>
> Key: SPARK-6728
> URL: https://issues.apache.org/jira/browse/SPARK-6728
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.3.0
>Reporter: Davies Liu
>Priority: Critical
>
> PySpark relies on py4j to transfer function arguments and return values 
> between Python and the JVM, and it is very slow to pass a large bytearray 
> (larger than 10 MB). In MLlib, a Vector can exceed 100 MB, which may 
> require a few GB of memory and can crash the process.
> The reason is that py4j uses a text protocol: it encodes the bytearray as 
> base64 and performs multiple string concatenations.
> A binary protocol would help a lot. Created an issue for py4j: 
> https://github.com/bartdag/py4j/issues/159
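
The overhead described above can be illustrated with a small sketch (plain Python, not Spark or py4j code; the 10 MB payload size is an assumption matching the issue's example): base64 inflates the payload by roughly one third and burns CPU time before any of the string concatenation even starts.

```python
# Sketch of the text-protocol overhead: base64-encoding a large
# bytearray inflates it by ~33% and costs extra CPU time, whereas a
# binary protocol could send the raw bytes unchanged.
import base64
import time

# 10 MB of zeros, standing in for a large MLlib vector's bytes
payload = bytes(10 * 1024 * 1024)

start = time.perf_counter()
encoded = base64.b64encode(payload)
elapsed = time.perf_counter() - start

print(f"raw size:    {len(payload)} bytes")
print(f"base64 size: {len(encoded)} bytes (~{len(encoded) / len(payload):.2f}x)")
print(f"encode time: {elapsed:.3f}s (before any string concatenation)")
```

On top of this inflation, each string concatenation in the text protocol copies the growing buffer again, so the cost compounds with payload size.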



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray

2015-08-25 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-6728:
---
Target Version/s: 1.6.0  (was: 1.5.0)




[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray

2015-06-19 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-6728:
-
Target Version/s: 1.5.0  (was: 1.4.0)




[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray

2015-04-06 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-6728:

Affects Version/s: 1.3.0




[jira] [Updated] (SPARK-6728) Improve performance of py4j for large bytearray

2015-04-06 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-6728:

Priority: Critical  (was: Major)
Target Version/s: 1.4.0
