GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/5303
[SPARK-6638] [SQL] Improve performance of StringType in SQL
This PR changes the internal representation of StringType from
java.lang.String to UTF8String, which is implemented using an Array[Byte]
(encoded in UTF-8).
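To illustrate the idea of a string backed by UTF-8 bytes, here is a minimal Scala sketch. It is not the UTF8String from this PR: the class name Utf8Str and its methods are hypothetical. It shows three properties of a UTF-8 byte representation. The byte length is just the array length. The character count can be derived by skipping UTF-8 continuation bytes. Equality and hashing can compare raw bytes without decoding to java.lang.String.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Hypothetical, minimal sketch of a string type backed by UTF-8 bytes.
// This is NOT Spark's UTF8String; names and methods are illustrative only.
final class Utf8Str private (val bytes: Array[Byte]) {

  // Number of encoded bytes: simply the array length.
  def numBytes: Int = bytes.length

  // Number of Unicode code points: count every byte that is not a
  // UTF-8 continuation byte (continuation bytes look like 10xxxxxx).
  def numChars: Int = bytes.count(b => (b & 0xC0) != 0x80)

  // Decode back to java.lang.String only when a caller needs one.
  override def toString(): String = new String(bytes, StandardCharsets.UTF_8)

  // Equality and hashing operate directly on the bytes, with no decoding.
  override def equals(other: Any): Boolean = other match {
    case that: Utf8Str => Arrays.equals(bytes, that.bytes)
    case _             => false
  }
  override def hashCode(): Int = Arrays.hashCode(bytes)
}

object Utf8Str {
  // Encode a java.lang.String to UTF-8 once, at construction time.
  def apply(s: String): Utf8Str = new Utf8Str(s.getBytes(StandardCharsets.UTF_8))
}
```

For example, Utf8Str("héllo") occupies 6 bytes but has 5 characters, because "é" encodes to two bytes in UTF-8.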
This PR should not break any public API: Row.getString() will still return
java.lang.String.
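A hypothetical sketch of how the public API can stay unchanged while the internal representation moves to bytes. The row below keeps string columns as UTF-8 byte arrays internally and decodes them only in getString(). SimpleRow, fromValues and getUtf8Bytes are illustrative names assumed for this example; they are not Spark APIs.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical sketch: a row that stores string columns internally as
// UTF-8 byte arrays but still returns java.lang.String from its public getter.
// SimpleRow, fromValues and getUtf8Bytes are illustrative, not Spark APIs.
final class SimpleRow private (values: Array[Any]) {

  // Internal accessor: hands out the raw UTF-8 bytes with no conversion.
  def getUtf8Bytes(i: Int): Array[Byte] = values(i).asInstanceOf[Array[Byte]]

  // Public accessor: decodes to java.lang.String, so callers see no change.
  def getString(i: Int): String =
    new String(getUtf8Bytes(i), StandardCharsets.UTF_8)

  // Non-string columns are stored and returned unchanged.
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
}

object SimpleRow {
  // Encode incoming java.lang.String values to UTF-8 once, on the way in.
  def fromValues(vs: Any*): SimpleRow = new SimpleRow(vs.map {
    case s: String => s.getBytes(StandardCharsets.UTF_8)
    case other     => other
  }.toArray)
}
```

In this sketch the decoding cost is paid only at the public boundary, so internal operations can work on the bytes directly.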
This is the first step toward improving the performance of String in SQL.
cc @rxin
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark string
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5303.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5303
----
commit 685fd071ce453cc6b956f98c897c869ad31702a9
Author: Davies Liu <[email protected]>
Date: 2015-03-31T05:42:07Z
use UTF8String instead of String for StringType
commit 21f67c6fda3504caa0b13524d4e498c6e4c9c701
Author: Davies Liu <[email protected]>
Date: 2015-03-31T07:50:11Z
cleanup
commit 4699c3ae1dab6482b26dd3d3739193e68cd77ca3
Author: Davies Liu <[email protected]>
Date: 2015-03-31T20:46:42Z
use Array[Byte] in UTF8String
commit d32abd1e8e6b7b5ef92a34a5d3a42919db58a43c
Author: Davies Liu <[email protected]>
Date: 2015-03-31T20:57:17Z
fix utf8 for python api
commit a85fb275d742dd9384e15f22878b545e9a77a106
Author: Davies Liu <[email protected]>
Date: 2015-03-31T23:42:18Z
refactor
----