Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-10 Thread justin coffey

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36644
---

Ship it!


go for r3 with the getClass (and no instanceof) check and {} formatting.

- justin coffey


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18925/
 ---
 
 (Updated March 8, 2014, 12:01 a.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
 used for one column of an overall parquet-table struct object inspector.  
 
 The problem lies in the ObjectInspectorFactory's cache for struct object 
 inspector.  For performance, there is a cache keyed on an array list, of all 
 object inspectors of columns.  The second time the query is run, it attempts 
 to lookup cached struct inspector.  But when the hashmap looks up the part of 
 the key consisting of the DeepParquetHiveMapInspector, java calls .equals 
 against the existing DeepParquetHivemapInspector.  This fails, as the .equals 
 method casted the other to a StandardParquetHiveInspector.
 
 Regenerating the .equals and .hashcode from eclipse.  
 
 Also adding one more check in .equals before casting, to handle the case if 
 another class of object inspector gets hashed to the same hashcode in the 
 cache.  Then java would call .equals against the other, which in this case is 
 not of the same class.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
  1d72747 
 
 Diff: https://reviews.apache.org/r/18925/diff/
 
 
 Testing
 ---
 
 Manual testing.
 
 
 Thanks,
 
 Szehon Ho
 




Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-08 Thread Xuefu Zhang


 On March 8, 2014, 12:33 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java,
   line 154
  https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154
 
  I guess I wasn't clear. It's not inapproapriate, but goes beyond its 
  responsibility. Equality implementation is within a context which is the 
  class. The instance to be checked doesn't necessarily has the runtime class 
  info. In fact, the context shouldn't be aware the runtime class of these 
  instances, as child classes can be added any time. Doing getClass == 
  other.getClass() goes beyond the current context.
  
  What's more appropriate to check type compatibility by calling if 
  (other instanceof this.class). This is different from checking 
  this.getClass() == other.getClass().
  
  Take Java ArrayList.equals() method as an example. 
  (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29).
   This method doesn't do runtime class check. The implementation is saying, 
  other.getClass() doesn't have to be ArrayList, but has to be an instance of 
  ArrayList. It could be an instance of MyArrayList as long as MyArrayList is 
  inherited from ArrayList.
  
  If we think it's more protical to do such a check, we'd expect that 
  ArrayList.equals() would also check this.getClass() == other.getClass().
  
  Btw, I don't understand how it breaks transitivity by removing this 
  check.
  
  I understand this check was there before your change. I missed it in my 
  previous review.
 
 
 Szehon Ho wrote:
 Hm I actually did not realize that Java's code has that for collections, 
 thanks for pointing that out.  I suppose in list case, the semantic is the 
 user doesn't care about list implementation, but about the contents. 
 
 What I meant about breaking the transitive property if you allow each 
 class to decide:  Say we remove the check of RT class equality.  There is a 
 subclass called 'A' which choose to override equal to return true only if 
 'other' is A.  Another subclass 'B' doesn't override .equals, and by 
 inheritance can return true if 'other' is any subclass of parent (A or B).  
 A.equals(B) is false, B.equals(A) is true, breaking transitive.  Now that I 
 think about it, this argument doesn't justify having the parent one way or 
 another, all I meant is that a class cannot implement .equals just in its own 
 context as you mentioned, all subclass must choose the same way to be 
 consistent, and I felt that having this check at the parent would ensure that 
 all the children followed it.
 
 But coming back down to this particular issue, I still don't think its 
 safe to remove that check.  There are two subclass of 
 AbstractParquetMapInspector, the Deep and Standard one depending on the type 
 of map.  If we don't do this check, then Deep will be considered equal to 
 Standard, and perhaps the wrong one may be returned from cache and used in 
 the inspection, they are not interchangeable.  This is unlike java list,map, 
 here the actual class matters more than the content.  At least that is my 
 understanding looking at the code.

okay. Frankly, I don't know what's the difference between the two child class: 
the whole parquet code is very confusing. Since the code was there before this, 
it's fine to keep it as it is.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
---


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18925/
 ---
 
 (Updated March 8, 2014, 12:01 a.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
 used for one column of an overall parquet-table struct object inspector.  
 
 The problem lies in the ObjectInspectorFactory's cache for struct object 
 inspector.  For performance, there is a cache keyed on an array list, of all 
 object inspectors of columns.  The second time the query is run, it attempts 
 to lookup cached struct inspector.  But when the hashmap looks up the part of 
 the key consisting of the DeepParquetHiveMapInspector, java calls .equals 
 against the existing DeepParquetHivemapInspector.  This fails, as the .equals 
 method casted the other to a StandardParquetHiveInspector.
 
 Regenerating the .equals and .hashcode from eclipse.  
 
 Also adding one more check in .equals before casting, to handle the case if 
 another 

Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
---

Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Repository: hive-git


Description
---

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used 
for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object 
inspector.  For performance it caches based on an array list, of all object 
inspectors of columns.  The second time the query is run, it is attempted to 
lookup cached value.  When the DeepParquetHiveMapInspector is gotten to, java 
calls .equals but it fails as the .equals method casted it to a 
StandardParquetHiveInspector.

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, as if by luck another 
type of object inspector gets hashed to the same hashcode in the cache, java 
would call .equals against it to find it if it is the key.


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
---

Manual testing.


Thanks,

Szehon Ho



Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36576
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
https://reviews.apache.org/r/18925/#comment67582

nit: trim ws


- Brock Noland


On March 7, 2014, 10:12 p.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18925/
 ---
 
 (Updated March 7, 2014, 10:12 p.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
 used for one column of an overall parquet-table struct object inspector.  
 
 The problem lies in the ObjectInspectorFactory's cache for struct object 
 inspector.  For performance it caches based on an array list, of all object 
 inspectors of columns.  The second time the query is run, it is attempted to 
 lookup cached value.  When the DeepParquetHiveMapInspector is gotten to, java 
 calls .equals but it fails as the .equals method casted it to a 
 StandardParquetHiveInspector.
 
 Regenerating the .equals and .hashcode from eclipse.  
 
 Also adding one more check in .equals before casting, as if by luck another 
 type of object inspector gets hashed to the same hashcode in the cache, java 
 would call .equals against it to find it if it is the key.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
  1d72747 
 
 Diff: https://reviews.apache.org/r/18925/diff/
 
 
 Testing
 ---
 
 Manual testing.
 
 
 Thanks,
 
 Szehon Ho
 




Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
---

(Updated March 7, 2014, 10:17 p.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Repository: hive-git


Description (updated)
---

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used 
for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object 
inspector.  For performance, there is a cache keyed on an array list, of all 
object inspectors of columns.  The second time the query is run, it attempts to 
lookup cached struct inspector.  But when the hashmap looks up the part of the 
key consisting of the DeepParquetHiveMapInspector, java calls .equals against 
the existing DeepParquetHivemapInspector.  This fails, as the .equals method 
casted the other to a StandardParquetHiveInspector.

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, to handle the case if 
another class of object inspector gets hashed to the same hashcode in the 
cache.  Then java would call .equals against the other, which in this case is 
not of the same class.


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
---

Manual testing.


Thanks,

Szehon Ho



Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
---

(Updated March 7, 2014, 10:20 p.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Changes
---

Thanks for the review.  Trimmed the whitespace.


Repository: hive-git


Description
---

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used 
for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object 
inspector.  For performance, there is a cache keyed on an array list, of all 
object inspectors of columns.  The second time the query is run, it attempts to 
lookup cached struct inspector.  But when the hashmap looks up the part of the 
key consisting of the DeepParquetHiveMapInspector, java calls .equals against 
the existing DeepParquetHivemapInspector.  This fails, as the .equals method 
casted the other to a StandardParquetHiveInspector.

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, to handle the case if 
another class of object inspector gets hashed to the same hashcode in the 
cache.  Then java would call .equals against the other, which in this case is 
not of the same class.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
---

Manual testing.


Thanks,

Szehon Ho



Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36578
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
https://reviews.apache.org/r/18925/#comment67584

I don't think we should check the class equality here. A child class might 
inherit a parent class w/o any new identifier. In that case the child class can 
still be equal to the parent class. It's really up to the child class to decide 
the equality (by overriding the equal()).



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
https://reviews.apache.org/r/18925/#comment67585

1. code style: we may need {} for else if block

2. However, I think this might be refactored as follows for easier 
understanding:

boolean keyEq = (keyInspector == null  other.keyInspector == null) || 
keyInspector.equal(other.keyInspector);
boolean valEq = (ValueInspector == null  other.valueInspector == null) || 
valueInspector.equal(other.valueInspector);
return keyEq  valEq;


- Xuefu Zhang


On March 7, 2014, 10:20 p.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18925/
 ---
 
 (Updated March 7, 2014, 10:20 p.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
 used for one column of an overall parquet-table struct object inspector.  
 
 The problem lies in the ObjectInspectorFactory's cache for struct object 
 inspector.  For performance, there is a cache keyed on an array list, of all 
 object inspectors of columns.  The second time the query is run, it attempts 
 to lookup cached struct inspector.  But when the hashmap looks up the part of 
 the key consisting of the DeepParquetHiveMapInspector, java calls .equals 
 against the existing DeepParquetHivemapInspector.  This fails, as the .equals 
 method casted the other to a StandardParquetHiveInspector.
 
 Regenerating the .equals and .hashcode from eclipse.  
 
 Also adding one more check in .equals before casting, to handle the case if 
 another class of object inspector gets hashed to the same hashcode in the 
 cache.  Then java would call .equals against the other, which in this case is 
 not of the same class.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
  1d72747 
 
 Diff: https://reviews.apache.org/r/18925/diff/
 
 
 Testing
 ---
 
 Manual testing.
 
 
 Thanks,
 
 Szehon Ho
 




Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Szehon Ho


 On March 7, 2014, 11:06 p.m., Xuefu Zhang wrote:
 

Thanks for the review, put some response and will do the refactor.


 On March 7, 2014, 11:06 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java,
   line 154
  https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154
 
  I don't think we should check the class equality here. A child class 
  might inherit a parent class w/o any new identifier. In that case the child 
  class can still be equal to the parent class. It's really up to the child 
  class to decide the equality (by overriding the equal()).

Hm if you really wanted to achieve that, a child of this class would be able to 
override .equals.  So why would this check not be appropriate here?

Also I don't think its good practice to allow this choice  , as what you are 
suggesting would make parent return true for .equals(children), and you are 
giving a choice for children to choose whether .equals(parent) is true for 
them.  If its not consistent, it would break transitive property of .equals.

Imo, the standard for this should be two elements are equal only if they are 
same class, that is guaranteeing the transitive property of equals.


 On March 7, 2014, 11:06 p.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java,
   line 164
  https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line164
 
  1. code style: we may need {} for else if block
  
  2. However, I think this might be refactored as follows for easier 
  understanding:
  
  boolean keyEq = (keyInspector == null  other.keyInspector == null) || 
  keyInspector.equal(other.keyInspector);
  boolean valEq = (ValueInspector == null  other.valueInspector == 
  null) || valueInspector.equal(other.valueInspector);
  return keyEq  valEq;

Yea I'll refactor it.  This is the default generated one, which is not adhering 
to proper java-style.


- Szehon


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36578
---


On March 7, 2014, 10:20 p.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18925/
 ---
 
 (Updated March 7, 2014, 10:20 p.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
 used for one column of an overall parquet-table struct object inspector.  
 
 The problem lies in the ObjectInspectorFactory's cache for struct object 
 inspector.  For performance, there is a cache keyed on an array list, of all 
 object inspectors of columns.  The second time the query is run, it attempts 
 to lookup cached struct inspector.  But when the hashmap looks up the part of 
 the key consisting of the DeepParquetHiveMapInspector, java calls .equals 
 against the existing DeepParquetHivemapInspector.  This fails, as the .equals 
 method casted the other to a StandardParquetHiveInspector.
 
 Regenerating the .equals and .hashcode from eclipse.  
 
 Also adding one more check in .equals before casting, to handle the case if 
 another class of object inspector gets hashed to the same hashcode in the 
 cache.  Then java would call .equals against the other, which in this case is 
 not of the same class.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
  1d72747 
 
 Diff: https://reviews.apache.org/r/18925/diff/
 
 
 Testing
 ---
 
 Manual testing.
 
 
 Thanks,
 
 Szehon Ho
 




Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/
---

(Updated March 8, 2014, 12:01 a.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Changes
---

Fixed the styling of generated equals code.  I looked at the proposed refactor, 
I think the simplication you suggested would hit NPE if this.keyInspector is 
null and other.keyInspector != null.  Given that you would need a few more 
boolean statements in that line to make it correct, it seems that the current 
one is probably the cleanest to look at.


Repository: hive-git


Description
---

The issue is, as part of select * query, a DeepParquetHiveMapInspector is used 
for one column of an overall parquet-table struct object inspector.  

The problem lies in the ObjectInspectorFactory's cache for struct object 
inspector.  For performance, there is a cache keyed on an array list, of all 
object inspectors of columns.  The second time the query is run, it attempts to 
lookup cached struct inspector.  But when the hashmap looks up the part of the 
key consisting of the DeepParquetHiveMapInspector, java calls .equals against 
the existing DeepParquetHivemapInspector.  This fails, as the .equals method 
casted the other to a StandardParquetHiveInspector.

Regenerating the .equals and .hashcode from eclipse.  

Also adding one more check in .equals before casting, to handle the case if 
another class of object inspector gets hashed to the same hashcode in the 
cache.  Then java would call .equals against the other, which in this case is 
not of the same class.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
 1d72747 

Diff: https://reviews.apache.org/r/18925/diff/


Testing
---

Manual testing.


Thanks,

Szehon Ho



Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
https://reviews.apache.org/r/18925/#comment67592

I guess I wasn't clear. It's not inapproapriate, but goes beyond its 
responsibility. Equality implementation is within a context which is the class. 
The instance to be checked doesn't necessarily has the runtime class info. In 
fact, the context shouldn't be aware the runtime class of these instances, as 
child classes can be added any time. Doing getClass == other.getClass() goes 
beyond the current context.

What's more appropriate to check type compatibility by calling if (other 
instanceof this.class). This is different from checking this.getClass() == 
other.getClass().

Take Java ArrayList.equals() method as an example. 
(http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29).
 This method doesn't do runtime class check. The implementation is saying, 
other.getClass() doesn't have to be ArrayList, but has to be an instance of 
ArrayList. It could be an instance of MyArrayList as long as MyArrayList is 
inherited from ArrayList.

If we think it's more protical to do such a check, we'd expect that 
ArrayList.equals() would also check this.getClass() == other.getClass().

Btw, I don't understand how it breaks transitivity by removing this check.

I understand this check was there before your change. I missed it in my 
previous review.



- Xuefu Zhang


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18925/
 ---
 
 (Updated March 8, 2014, 12:01 a.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
 used for one column of an overall parquet-table struct object inspector.  
 
 The problem lies in the ObjectInspectorFactory's cache for struct object 
 inspector.  For performance, there is a cache keyed on an array list, of all 
 object inspectors of columns.  The second time the query is run, it attempts 
 to lookup cached struct inspector.  But when the hashmap looks up the part of 
 the key consisting of the DeepParquetHiveMapInspector, java calls .equals 
 against the existing DeepParquetHivemapInspector.  This fails, as the .equals 
 method casted the other to a StandardParquetHiveInspector.
 
 Regenerating the .equals and .hashcode from eclipse.  
 
 Also adding one more check in .equals before casting, to handle the case if 
 another class of object inspector gets hashed to the same hashcode in the 
 cache.  Then java would call .equals against the other, which in this case is 
 not of the same class.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
  1d72747 
 
 Diff: https://reviews.apache.org/r/18925/diff/
 
 
 Testing
 ---
 
 Manual testing.
 
 
 Thanks,
 
 Szehon Ho
 




Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype

2014-03-07 Thread Szehon Ho


 On March 8, 2014, 12:33 a.m., Xuefu Zhang wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java,
   line 154
  https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154
 
  I guess I wasn't clear. It's not inapproapriate, but goes beyond its 
  responsibility. Equality implementation is within a context which is the 
  class. The instance to be checked doesn't necessarily has the runtime class 
  info. In fact, the context shouldn't be aware the runtime class of these 
  instances, as child classes can be added any time. Doing getClass == 
  other.getClass() goes beyond the current context.
  
  What's more appropriate to check type compatibility by calling if 
  (other instanceof this.class). This is different from checking 
  this.getClass() == other.getClass().
  
  Take Java ArrayList.equals() method as an example. 
  (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29).
   This method doesn't do runtime class check. The implementation is saying, 
  other.getClass() doesn't have to be ArrayList, but has to be an instance of 
  ArrayList. It could be an instance of MyArrayList as long as MyArrayList is 
  inherited from ArrayList.
  
  If we think it's more protical to do such a check, we'd expect that 
  ArrayList.equals() would also check this.getClass() == other.getClass().
  
  Btw, I don't understand how it breaks transitivity by removing this 
  check.
  
  I understand this check was there before your change. I missed it in my 
  previous review.
 

Hm I actually did not realize that Java's code has that for collections, thanks 
for pointing that out.  I suppose in list case, the semantic is the user 
doesn't care about list implementation, but about the contents. 

What I meant about breaking the transitive property if you allow each class to 
decide:  Say we remove the check of RT class equality.  There is a subclass 
called 'A' which choose to override equal to return true only if 'other' is A.  
Another subclass 'B' doesn't override .equals, and by inheritance can return 
true if 'other' is any subclass of parent (A or B).  A.equals(B) is false, 
B.equals(A) is true, breaking transitive.  Now that I think about it, this 
argument doesn't justify having the parent one way or another, all I meant is 
that a class cannot implement .equals just in its own context as you mentioned, 
all subclass must choose the same way to be consistent, and I felt that having 
this check at the parent would ensure that all the children followed it.

But coming back down to this particular issue, I still don't think its safe to 
remove that check.  There are two subclass of AbstractParquetMapInspector, the 
Deep and Standard one depending on the type of map.  If we don't do this check, 
then Deep will be considered equal to Standard, and perhaps the wrong one may 
be returned from cache and used in the inspection, they are not 
interchangeable.  This is unlike java list,map, here the actual class matters 
more than the content.  At least that is my understanding looking at the code.


- Szehon


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
---


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/18925/
 ---
 
 (Updated March 8, 2014, 12:01 a.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
 used for one column of an overall parquet-table struct object inspector.  
 
 The problem lies in the ObjectInspectorFactory's cache for struct object 
 inspector.  For performance, there is a cache keyed on an array list, of all 
 object inspectors of columns.  The second time the query is run, it attempts 
 to lookup cached struct inspector.  But when the hashmap looks up the part of 
 the key consisting of the DeepParquetHiveMapInspector, java calls .equals 
 against the existing DeepParquetHivemapInspector.  This fails, as the .equals 
 method casted the other to a StandardParquetHiveInspector.
 
 Regenerating the .equals and .hashcode from eclipse.  
 
 Also adding one more check in .equals before casting, to handle the case if 
 another class of object inspector gets hashed to the same hashcode in the 
 cache.  Then java would call .equals against the other, which in this case is 
 not of the same class.
 
 
 Diffs
 -