GitHub user chenghao-intel opened a pull request:
https://github.com/apache/spark/pull/446
[Spark-1461] Deferred Expression Evaluation (short-circuit evaluation)
Short-circuit evaluation can significantly improve the performance of expression
evaluation. However, we cannot fold a non-deterministic UDF (like Rand()),
and a stateful UDF must not be skipped during short-circuit
evaluation (e.g. in the expression col1 > 0 and row_sequence() < 1000,
row_sequence() cannot be skipped).
I borrowed the concept of DeferredObject from Hive. It has 2 child
types (EagerResult / DeferredResult): the former requires the evaluation
to be triggered before it is created, while the latter triggers the evaluation
on the first call to its get() method.
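As an illustration only, the two result types and the stateful case described above could be sketched roughly as follows. This is not the code in the patch: the trait and class shapes, the RowSequence stand-in for row_sequence(), and the rightStateful flag on the and() helper are all assumptions made for the example.

```scala
object DeferredSketch {
  // Hypothetical interface modelled on the description above, not the patch itself.
  trait DeferredObject[T] {
    def get(): T
  }

  // The value has already been computed before the wrapper is created.
  final class EagerResult[T](value: T) extends DeferredObject[T] {
    def get(): T = value
  }

  // The computation runs on the first call to get() and is cached afterwards.
  final class DeferredResult[T](compute: () => T) extends DeferredObject[T] {
    private lazy val value: T = compute()
    def get(): T = value
  }

  // Hypothetical stateful function standing in for row_sequence().
  final class RowSequence {
    private var counter = 0L
    def next(): Long = { counter += 1; counter }
    def current: Long = counter
  }

  // Short-circuit AND over deferred operands. If the right operand is
  // stateful it is forced for every row so its state still advances;
  // otherwise it is only forced when the left operand is true.
  def and(left: DeferredObject[Boolean],
          right: DeferredObject[Boolean],
          rightStateful: Boolean): Boolean = {
    if (rightStateful) {
      val r = right.get()
      left.get() && r
    } else {
      left.get() && right.get()
    }
  }
}
```

With this sketch, evaluating col1 > 0 and row_sequence() < 1000 over three rows advances the sequence three times even when col1 > 0 is false, whereas a non-stateful right operand is never computed for rows where the left operand is false.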
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chenghao-intel/spark expression_deferred_evaluation
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/446.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #446
----
commit 6e1e99b5f157e18a713717d9fd2f643fb2576f85
Author: Cheng Hao <[email protected]>
Date: 2014-04-11T05:49:45Z
unify the nullable/stateful/foldable interface
commit 6f6776bb860d9cbf5d7a23de7ff2376f1123c384
Author: Cheng Hao <[email protected]>
Date: 2014-04-11T07:31:40Z
Append some missing types for HiveUDF
commit 9d827c00e684084661bd763c3849ef81bae84023
Author: Cheng Hao <[email protected]>
Date: 2014-04-15T05:34:42Z
refactor the expression with deferred evaluation
commit 5d9febfa610106201ae4c8703b7a01a5c4b21a0d
Author: Cheng Hao <[email protected]>
Date: 2014-04-16T03:25:35Z
fix cann't be compiled issues
commit ded73054e3c0d75b5c0df01c02c33e5816973191
Author: Cheng Hao <[email protected]>
Date: 2014-04-16T07:38:45Z
Fix bug that un-initialized Array of DeferredObjectAdatper
commit 850aec31310aaf3aab4609b9fb8262c0ef688aa1
Author: Cheng Hao <[email protected]>
Date: 2014-04-17T08:05:20Z
Fix bug of ClassCastedException
commit 53f1fb5343815a58d31ee5d0fb262586211c2a86
Author: Cheng Hao <[email protected]>
Date: 2014-04-17T08:24:32Z
rename the Expression.compute => Expression.deferCompute
commit 14e16cf527b6f5aea487720b90c3a3d3ba4764a3
Author: Cheng Hao <[email protected]>
Date: 2014-04-18T02:09:05Z
using lazy val instead of val in Cast
commit 640df9aab2939a5ff601deeccc87d04b11e0a4a7
Author: Cheng Hao <[email protected]>
Date: 2014-04-18T06:01:10Z
make the Expression.eval as public method
commit 9b2ebbaca628291606ba9979c4566588c769403b
Author: Cheng Hao <[email protected]>
Date: 2014-04-18T06:03:52Z
rename Expression.deferCompute => Expression.deferEval
----