[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account

2018-05-18 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480310#comment-16480310
 ] 

Stephan Ewen commented on FLINK-9393:
-

The leaking of the hostnames array make {{equals()}} vulnerable, it still does 
not violate the equals/hashcode contract. In some sense it is almost an 
argument to not change {{hashCode()}}, unless the argument is "equals is partly 
broken, so hash code should be broken as well".

The way that "splitNumber" gets initialized, it is a perfect hashcode, no 
collisions at all. That is admittedly not guaranteed from the class itself, but 
it is the effect of the way it is currently used. Adding the hostnames to the 
hash code can introduce some collisions that were not there before. Its not 
going to be a problem, but looking at the bigger picture, things may always be 
a bit different than an individual hashCode function tells.

I am not opposed to fixing this issue. What I am trying to do is make 
contributors aware of the mechanics and implications of contributing to such a 
large project. We can probably spend two years of everyone's time to just 
rewrite parts of the code that are already fine, but could be done differently 
to satisfy some checkstyle, default pattern, common guideline, etc.
Every time we do that, we have to be really careful, because changing something 
that is not wrong cannot really make anything better, but could make things 
worse (again, implications can be subtle), so great care is needed, and often 
more time (from reviewing committers) than it may initially seem.

Please understand that I am only writing this in order to raise some awareness. 
I have seen projects that came into a state that they mainly worked on "moving 
things around" internally, and slowed down a lot on adding user/developer 
benefit. So in my opinion we should.

In that sense, yes, please go ahead, this is fine to be changed and merged. 
Just take the above into account when looking for parts of Flink you want to 
change.


> LocatableInputSplit#hashCode should take hostnames into account
> ---
>
> Key: FLINK-9393
> URL: https://issues.apache.org/jira/browse/FLINK-9393
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Major
>
> Currently:
> {code}
>   public int hashCode() {
> return this.splitNumber;
> {code}
> This is not symmetrical with {{equals}} method where hostnames are compared.
> LocatableInputSplit#hashCode should take hostnames into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account

2018-05-17 Thread vinoyang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480079#comment-16480079
 ] 

vinoyang commented on FLINK-9393:
-

the reason and opinion provided by [~yuzhih...@gmail.com] looks good. So, 
[~StephanEwen] any idea about this issue?

> LocatableInputSplit#hashCode should take hostnames into account
> ---
>
> Key: FLINK-9393
> URL: https://issues.apache.org/jira/browse/FLINK-9393
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Major
>
> Currently:
> {code}
>   public int hashCode() {
> return this.splitNumber;
> {code}
> This is not symmetrical with {{equals}} method where hostnames are compared.
> LocatableInputSplit#hashCode should take hostnames into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account

2018-05-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479565#comment-16479565
 ] 

Ted Yu commented on FLINK-9393:
---

The tricky part is about the hostnames member which has an accessor:
{code}
  public String[] getHostnames() {
return this.hostnames;
{code}
In the context of instance of this class used as key to some Map, if some user 
of some instance of this class happens to change the member of the array, 
thereby changing the hash code / equals, the Map entry would not be found on 
the subsequent lookup (by checking the wrong bin).

Another downside is higher collision rate since instances with different host 
names may hash to the same value.

It will be more error prone than abiding by the contract between hashCode and 
equals. We should fix the methods to follow the contract as there really isn't 
any harm in doing it correctly, and avoiding weird surprises. 

> LocatableInputSplit#hashCode should take hostnames into account
> ---
>
> Key: FLINK-9393
> URL: https://issues.apache.org/jira/browse/FLINK-9393
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Major
>
> Currently:
> {code}
>   public int hashCode() {
> return this.splitNumber;
> {code}
> This is not symmetrical with {{equals}} method where hostnames are compared.
> LocatableInputSplit#hashCode should take hostnames into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9393) LocatableInputSplit#hashCode should take hostnames into account

2018-05-17 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479459#comment-16479459
 ] 

Stephan Ewen commented on FLINK-9393:
-

This is not a bug, right?

hashCode() is allowed to be worse than equals(), this is not really wrong. It 
only needs to not be different when objects are equal.
In how the splits get generated, splitNumber is a quite good hash code, and 
fast (compared to mixing in the host name string hashes).

Not sure we need/should actually change this. This issue falls a bit under the 
category
  - might have a good reason to be like it is
  - is not a bug
  - any change might counter the original intention.

I personally would favor to focus on other issues...

> LocatableInputSplit#hashCode should take hostnames into account
> ---
>
> Key: FLINK-9393
> URL: https://issues.apache.org/jira/browse/FLINK-9393
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Major
>
> Currently:
> {code}
>   public int hashCode() {
> return this.splitNumber;
> {code}
> This is not symmetrical with {{equals}} method where hostnames are compared.
> LocatableInputSplit#hashCode should take hostnames into account.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)