GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/6444
[WIP] Binary processing external sort for SQL's sort-merge join
This is a WIP commit towards implementing a binary-processing, cache-aware
external record sort for use in Spark SQL's sort-merge join. The code here is
modeled after an early design of #5868, which supported pluggable functions for
comparing key prefixes and comparing serialized records.
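As a rough illustration of what "pluggable functions for comparing key prefixes and comparing serialized records" can mean, here is a minimal sketch. It is not the code in this PR or in #5868. The class name, both interfaces, and the 8-byte big-endian prefix encoding are assumptions made for the example. The idea shown is that the sort first compares a fixed-width prefix stored next to each record pointer, and only falls back to comparing the full serialized records when the prefixes tie.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixSortSketch {
    /** Hypothetical plug-in: orders two fixed-width key prefixes. */
    interface PrefixComparator { int compare(long a, long b); }

    /** Hypothetical plug-in: orders two full serialized records. */
    interface RecordComparator { int compare(byte[] a, byte[] b); }

    /** Packs the first 8 bytes of a record (zero-padded) into a big-endian long. */
    static long prefixOf(byte[] record) {
        long p = 0;
        for (int i = 0; i < 8; i++) {
            p <<= 8;
            if (i < record.length) p |= (record[i] & 0xFFL);
        }
        return p;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    /** Sorts records by prefix first; the record comparator only breaks prefix ties. */
    static byte[][] sort(byte[][] records, PrefixComparator pc, RecordComparator rc) {
        final long[] prefixes = new long[records.length];
        Integer[] order = new Integer[records.length];
        for (int i = 0; i < records.length; i++) {
            prefixes[i] = prefixOf(records[i]);
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> {
            int c = pc.compare(prefixes[x], prefixes[y]);
            return c != 0 ? c : rc.compare(records[x], records[y]);
        });
        byte[][] out = new byte[records.length][];
        for (int i = 0; i < records.length; i++) out[i] = records[order[i]];
        return out;
    }

    /** Convenience wrapper: sorts strings by their UTF-8 bytes. */
    static String[] sortStrings(String[] in) {
        byte[][] recs = new byte[in.length][];
        for (int i = 0; i < in.length; i++) recs[i] = in[i].getBytes(StandardCharsets.UTF_8);
        byte[][] sorted = sort(recs, Long::compareUnsigned, PrefixSortSketch::compareBytes);
        String[] out = new String[in.length];
        for (int i = 0; i < in.length; i++) out[i] = new String(sorted[i], StandardCharsets.UTF_8);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortStrings(new String[] {"pear", "apple", "peach"})));
    }
}
```

In this sketch most comparisons touch only the `long` prefixes, which sit in one contiguous array; the serialized bytes are read only when two prefixes are equal. The prefix comparator must agree with the record comparator for the result to be correct, which is why the example pairs an unsigned big-endian prefix with an unsigned byte-wise record comparison.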
I'll update this PR with a detailed design description later, similar to
the detailed descriptions and comments posted at #5868; I'm only opening this
now so that I can run some things through Jenkins and track some code review
comments.
This patch incorporates the changes in #6222. After that patch is merged,
I'll rebase to exclude those commits.
This will address the following JIRAs (not putting the links in the title
yet because I don't want to send an email blast): `[SPARK-7078] [SPARK-7079]
[SPARK-7082]`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark sql-external-sort
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6444.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6444
----
commit a67678c81a76aba72a2aa75a3488506531b310ff
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T18:45:12Z
WIP refactoring of CatalystTypeConverters
commit 640ff1c7178041f5699b7af236766e488191287b
Author: Josh Rosen <[email protected]>
Date: 2015-05-17T23:14:10Z
Comments and cleanup
commit 6477fbd588655f2b28624f1223b7a2a3bde833f5
Author: Josh Rosen <[email protected]>
Date: 2015-05-25T00:57:26Z
Throw ClassCastException errors during inbound conversions.
commit 7f46d9a6930d5917929a26d1d8dee7c3a025332c
Author: Josh Rosen <[email protected]>
Date: 2015-05-25T01:36:04Z
Remove last use of convertToScala().
commit fec87a0adf096b2582039a5530e3f2cc6f9c090f
Author: Josh Rosen <[email protected]>
Date: 2015-05-25T01:40:45Z
Fix wrong input data in InMemoryColumnarQuerySuite
The schema declares an array of booleans, but we
passed an array of integers instead.
commit fd81c599e2b42e14547e21ec7fafc33b2ded2e3c
Author: Josh Rosen <[email protected]>
Date: 2015-05-25T01:58:53Z
Fix serialization error in UserDefinedGenerator.
commit 9543a87e066e5216cbc6e3848ec9f16c45d868b7
Author: Josh Rosen <[email protected]>
Date: 2015-05-25T01:59:07Z
Fix null handling bug; add tests.
commit 51acd8f220c8fe05141c48eea7e235c6f7efdd52
Author: Josh Rosen <[email protected]>
Date: 2015-05-25T23:23:13Z
Fix JavaHashingTFSuite ClassCastException
commit 81a9ecd0c47244afe9a966306e68cc87b8ef3dd2
Author: Josh Rosen <[email protected]>
Date: 2015-05-26T23:03:52Z
Initialize converters lazily so that the attributes are resolved first
commit ee25e8d269a7ac5d36b362e36d7e6395bd9e71f4
Author: Josh Rosen <[email protected]>
Date: 2015-05-27T19:04:01Z
Re-add convertToScala(), since a Hive test still needs it
commit 1df1c2cccbfc68c6aaebdb0f63efd0555b3cbed6
Author: Josh Rosen <[email protected]>
Date: 2015-05-15T05:55:29Z
WIP towards external sorter for Spark SQL.
This is based on an early version of my shuffle sort patch; the
implementation will undergo significant refactoring based on
improvements made as part of the shuffle patch. Stay tuned.
commit 356a28eaa0c1a53d187faf96e87a633283ea58f3
Author: Josh Rosen <[email protected]>
Date: 2015-05-15T21:11:32Z
Import my original tests and get them to pass.
commit 2c6a3899602e7fb83491f59c680a106aa853b477
Author: Josh Rosen <[email protected]>
Date: 2015-05-26T03:47:07Z
Merge in a sketch of a unit test for the new sorter (now failing).
----