Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/758#issuecomment-43242728
> @marmbrus I worked around the test failure by adding a SortedOperation
> pattern that conservatively matches some definitely sorted operations (false
> negative rather than false positive). This may slow down the test suite a bit.
> Since most test output are empty or very small, this shouldn't be an issue
> right now.
I think false negatives are the wrong direction to go here. A false
negative means we treat the query as unordered when it is in fact ordered,
so we disregard the row order when we should be checking it.
Maybe it would be better to recursively walk the tree looking explicitly
for nodes that do not preserve order (aggregation, join, base relations) and
then return false. Sorts would return true. Thoughts?
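A minimal sketch of that recursive walk, to make the idea concrete. The
plan node classes below are hypothetical stand-ins, not Catalyst's real
operators, and treating `Filter`, `Project`, and `Limit` as order-preserving
is an assumption of this sketch:

```scala
// Hypothetical, simplified plan nodes; NOT Catalyst's real classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(child: Plan) extends Plan
case class Filter(child: Plan) extends Plan
case class Project(child: Plan) extends Plan
case class Limit(child: Plan) extends Plan
case class Aggregate(child: Plan) extends Plan
case class Join(left: Plan, right: Plan) extends Plan

object OrderCheck {
  // Returns true only if the output order of `plan` is determined by a Sort.
  def isSorted(plan: Plan): Boolean = plan match {
    // A sort defines the output order, whatever sits below it.
    case Sort(_) => true
    // Nodes that do not preserve (or never had) an order stop the walk.
    case Relation(_)  => false
    case Aggregate(_) => false
    case Join(_, _)   => false
    // Assumed order-preserving nodes: keep walking down the tree.
    case Filter(child)  => isSorted(child)
    case Project(child) => isSorted(child)
    case Limit(child)   => isSorted(child)
  }
}
```

With this shape, a sort sitting below an aggregate or join is not counted,
so the answer only comes back true when the ordering survives to the top of
the plan.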
> New micro benchmark data:
Sweet, looks like we shaved off a little bit more, so these optimizations
were worth it! It would be good to make notes on which changes led to what
kind of speedup here. That way, we can better focus our efforts when we
optimize in the future.
> As for Hive data unwrapping, I couldn't find a "static" method to
> eliminate right now. Any hints?
My thought was that you would create an `Array` of `Any => Any` functions,
one per column. This way you only match on the data type once, at the
beginning, and then simply index into this array instead of matching for
each data item.
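Roughly something like the following sketch. `ColType`, `WrappedString`,
`unwrapperFor`, and `unwrapRow` are all hypothetical names made up for
illustration; the real code would match on the actual column types and Hive
wrapper classes instead:

```scala
// Hypothetical stand-ins for column data types; not Spark's or Hive's types.
sealed trait ColType
case object IntCol extends ColType
case object StringCol extends ColType
case object DoubleCol extends ColType

object Unwrap {
  // Hypothetical wrapper standing in for a Hive-style boxed string value.
  final case class WrappedString(value: String)

  // The match on the data type happens here, once per column.
  def unwrapperFor(t: ColType): Any => Any = t match {
    case StringCol =>
      (v: Any) => v match {
        case WrappedString(s) => s
        case other            => other
      }
    case IntCol    => (v: Any) => v
    case DoubleCol => (v: Any) => v
  }

  // Per row, only index into the prebuilt array; no match on the data type.
  def unwrapRow(unwrappers: Array[Any => Any], raw: Array[Any]): Array[Any] = {
    val out = new Array[Any](raw.length)
    var i = 0
    while (i < raw.length) {
      out(i) = unwrappers(i)(raw(i))
      i += 1
    }
    out
  }
}
```

The array would be built once from the schema, for example
`schema.map(Unwrap.unwrapperFor)`, and then reused for every row.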