Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/2514#issuecomment-56789130
  
    Hi Matei, I modified the unit test to reproduce the exception. It seems 
difficult to reproduce this exception manually with a small dataset, as the 
exception is unreliable.
    
    Here is the reason, found on [Stack Overflow](http://stackoverflow.com/questions/24951257/when-does-timsort-complain-about-broken-comparator):
    
    >the exception behavior is unreliable: As long as you have small data sets 
(so small that a generated run may never gallop, as MIN_GALLOP is 7) or the 
generated runs always coincidentally generate a merge that never gallops, you 
will never receive the exception. Thus, without further reviewing the 
gallopRight method, we can come to the conclusion that you cannot rely on the 
exception: It may never be thrown no matter how wrong your comparator is.
    
    So here I generate 1M random integer values to reproduce the exception. 
In my local testing, over more than 1000 rounds, the exception was always 
thrown. But this cannot be proved logically, and there is still a chance that 
the exception is not thrown.
    
    I also tested with 1k random integers plus some extreme values such as 
Integer.MaxValue and Integer.MinValue, and could hardly reproduce the 
exception. With a 10k dataset, there is about a 1/3 chance of getting the 
exception.
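    
    A minimal sketch of this kind of experiment, for illustration only. It 
assumes a comparator broken by integer overflow in subtraction; the comment 
above does not say which comparator the unit test uses, and the class and 
method names below are hypothetical. It counts how often sorting random 
integers throws for a given dataset size.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

class BrokenComparatorDemo {
    // Assumed broken comparator: a - b overflows when the operands are far
    // apart, e.g. Integer.MIN_VALUE - 1 wraps around to Integer.MAX_VALUE.
    static final Comparator<Integer> BROKEN = (a, b) -> a - b;

    // Overflow-safe comparator, for contrast.
    static final Comparator<Integer> SAFE = Integer::compare;

    // Sorts n random ints with cmp; returns true if the sort throws
    // IllegalArgumentException ("Comparison method violates its general
    // contract!"). Arrays.sort on an object array uses TimSort unless
    // -Djava.util.Arrays.useLegacyMergeSort=true is set.
    static boolean sortThrows(int n, long seed, Comparator<Integer> cmp) {
        Random rnd = new Random(seed);
        Integer[] data = new Integer[n];
        for (int i = 0; i < n; i++) {
            data[i] = rnd.nextInt();
        }
        try {
            Arrays.sort(data, cmp);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Runs several rounds with different seeds and counts the rounds that threw.
    static int countFailures(int n, int rounds, Comparator<Integer> cmp) {
        int failures = 0;
        for (int r = 0; r < rounds; r++) {
            if (sortThrows(n, r, cmp)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int rounds = 3; // kept small so the 1M case finishes quickly
        for (int n : new int[] {1_000, 10_000, 1_000_000}) {
            System.out.println(n + " elements: "
                + countFailures(n, rounds, BROKEN) + "/" + rounds
                + " rounds threw with the broken comparator");
        }
    }
}
```

    The counts depend on the data and the JVM, so the sketch prints them 
rather than asserting them. The safe comparator should never throw, whatever 
the dataset size.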
    
    I think that unless someone familiar with TimSort can manually construct 
a small dataset that reliably triggers the exception, this unit test may 
occasionally fail.
    
    So could you give me some suggestions? Thanks a lot.

