Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/2514#issuecomment-56789130
Hi Matei, I modified the unit test to reproduce the exception. It seems
difficult to reproduce this exception manually with a small dataset, because
the exception is thrown unreliably.
Here is the explanation I found on [Stack
Overflow](http://stackoverflow.com/questions/24951257/when-does-timsort-complain-about-broken-comparator):
>the exception behavior is unreliable: As long as you have small data sets
(so small that a generated run may never gallop, as MIN_GALLOP is 7) or the
generated runs always coincidentally generate a merge that never gallops, you
will never receive the exception. Thus, without further reviewing the
gallopRight method, we can come to the conclusion that you cannot rely on the
exception: It may never be thrown no matter how wrong your comparator is.
So here I generate 1m random integer values to reproduce the exception.
In my local testing, over more than 1000 rounds, the exception was thrown
every time. But this cannot be proved logically, so there is still a chance
that the exception is not thrown.
I also tested with 1k random integers plus some extreme values such as
Integer.MaxValue and Integer.MinValue, and could hardly reproduce the
exception. With a 10k dataset, there is roughly a 1/3 chance of getting it.
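
The experiment can be sketched in plain Java roughly as follows. This is not the actual Spark unit test: the class `BrokenComparatorDemo`, the helpers `sortThrows` and `failureRate`, and the subtraction-based comparator are hypothetical names and choices made for illustration. It assumes the comparator is broken by integer overflow, and it relies on `Arrays.sort` with a `Comparator` using TimSort, which is the default in current JDKs.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

class BrokenComparatorDemo {
    // Hypothetical broken comparator: the subtraction overflows when the
    // operands are far apart, so the ordering it reports is inconsistent.
    static final Comparator<Integer> BROKEN = (a, b) -> a - b;

    // Sorts `size` random integers and reports whether TimSort detected
    // the contract violation (IllegalArgumentException).
    static boolean sortThrows(int size, long seed) {
        Random rnd = new Random(seed);
        Integer[] data = new Integer[size];
        for (int i = 0; i < size; i++) {
            data[i] = rnd.nextInt();
        }
        try {
            Arrays.sort(data, BROKEN);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Fraction of rounds (one seed per round) in which the exception was thrown.
    static double failureRate(int size, int rounds) {
        int thrown = 0;
        for (int seed = 0; seed < rounds; seed++) {
            if (sortThrows(size, seed)) {
                thrown++;
            }
        }
        return (double) thrown / rounds;
    }

    public static void main(String[] args) {
        // Sizes mirror the ones mentioned above; the rates are whatever
        // the local JDK produces and are not guaranteed.
        for (int size : new int[] {1_000, 10_000, 1_000_000}) {
            System.out.printf("size=%d rate=%.2f%n", size, failureRate(size, 5));
        }
    }
}
```

The only deterministic parts are that the comparator itself is wrong (for example, it reports `Integer.MIN_VALUE` as greater than `1`) and that very small arrays are sorted without any merge step, so they can never trigger the exception. Everything in between depends on the data.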
I think that unless someone familiar with TimSort can manually construct a
small dataset that reliably triggers the exception, this unit test may
occasionally fail.
Could you give me some suggestions? Thanks a lot.