[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account
[ https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480310#comment-16480310 ] Stephan Ewen commented on FLINK-9393: - The leaking of the hostnames array make {{equals()}} vulnerable, it still does not violate the equals/hashcode contract. In some sense it is almost an argument to not change {{hashCode()}}, unless the argument is "equals is partly broken, so hash code should be broken as well". The way that "splitNumber" gets initialized, it is a perfect hashcode, no collisions at all. That is admittedly not guaranteed from the class itself, but it is the effect of the way it is currently used. Adding the hostnames to the hash code can introduce some collisions that were not there before. Its not going to be a problem, but looking at the bigger picture, things may always be a bit different than an individual hashCode function tells. I am not opposed to fixing this issue. What I am trying to do is make contributors aware of the mechanics and implications of contributing to such a large project. We can probably spend two years of everyone's time to just rewrite parts of the code that are already fine, but could be done differently to satisfy some checkstyle, default pattern, common guideline, etc. Every time we do that, we have to be really careful, because changing something that is not wrong cannot really make anything better, but could make things worse (again, implications can be subtle), so great care is needed, and often more time (from reviewing committers) than it may initially seem. Please understand that I am only writing this in order to raise some awareness. I have seen projects that came into a state that they mainly worked on "moving things around" internally, and slowed down a lot on adding user/developer benefit. So in my opinion we should. In that sense, yes, please go ahead, this is fine to be changed and merged. Just take the above into account when looking for parts of Flink you want to change. > LocatableInputSplit#hashCode should take hostnames into account > --- > > Key: FLINK-9393 > URL: https://issues.apache.org/jira/browse/FLINK-9393 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: vinoyang >Priority: Major > > Currently: > {code} > public int hashCode() { > return this.splitNumber; > {code} > This is not symmetrical with {{equals}} method where hostnames are compared. > LocatableInputSplit#hashCode should take hostnames into account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account
[ https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480079#comment-16480079 ] vinoyang commented on FLINK-9393: - the reason and opinion provided by [~yuzhih...@gmail.com] looks good. So, [~StephanEwen] any idea about this issue? > LocatableInputSplit#hashCode should take hostnames into account > --- > > Key: FLINK-9393 > URL: https://issues.apache.org/jira/browse/FLINK-9393 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: vinoyang >Priority: Major > > Currently: > {code} > public int hashCode() { > return this.splitNumber; > {code} > This is not symmetrical with {{equals}} method where hostnames are compared. > LocatableInputSplit#hashCode should take hostnames into account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account
[ https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479565#comment-16479565 ] Ted Yu commented on FLINK-9393: --- The tricky part is about the hostnames member which has an accessor: {code} public String[] getHostnames() { return this.hostnames; {code} In the context of instance of this class used as key to some Map, if some user of some instance of this class happens to change the member of the array, thereby changing the hash code / equals, the Map entry would not be found on the subsequent lookup (by checking the wrong bin). Another downside is higher collision rate since instances with different host names may hash to the same value. It will be more error prone than abiding by the contract between hashCode and equals. We should fix the methods to follow the contract as there really isn't any harm in doing it correctly, and avoiding weird surprises. > LocatableInputSplit#hashCode should take hostnames into account > --- > > Key: FLINK-9393 > URL: https://issues.apache.org/jira/browse/FLINK-9393 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: vinoyang >Priority: Major > > Currently: > {code} > public int hashCode() { > return this.splitNumber; > {code} > This is not symmetrical with {{equals}} method where hostnames are compared. > LocatableInputSplit#hashCode should take hostnames into account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account
[ https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479459#comment-16479459 ] Stephan Ewen commented on FLINK-9393: - This is not a bug, right? hashCode() is allowed to be worse than equals(), this is not really wrong. It only needs to not be different when objects are equal. In how the splits get generated, splitNumber is a quite good hash code, and fast (compared to mixing in the host name string hashes). Not sure we need/should actually change this. This issue falls a bit under the category - might have a good reason to be like it is - is not a bug - any change might counter the original intention. I personally would favor to focus on other issues... > LocatableInputSplit#hashCode should take hostnames into account > --- > > Key: FLINK-9393 > URL: https://issues.apache.org/jira/browse/FLINK-9393 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: vinoyang >Priority: Major > > Currently: > {code} > public int hashCode() { > return this.splitNumber; > {code} > This is not symmetrical with {{equals}} method where hostnames are compared. > LocatableInputSplit#hashCode should take hostnames into account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)