Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/7421#issuecomment-122527391
  
    Investigated the following 3 build failure samples:
    
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37233/testReport/
    - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37232/testReport/
    - https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1083/testReport/
    
    First, this issue couldn't be reproduced reliably and only showed up on 
Jenkins occasionally.  An obvious guess is that it's a concurrency bug that 
only occurs within highly concurrent jobs (the Jenkins server has 32 cores, 
while our laptops have only 8 or fewer).
    
    Second, all 3 build failures behaved extremely consistently: 18 
`ParquetDataSourceOffMetastoreSuite` test cases involving partitioned Hive 
metastore Parquet tables failed together.  It seems that some **internal Hive 
state** got corrupted before this test suite was executed.  However, this PR 
only updates the read path and doesn't introduce any extra state.  So my guess 
is that this PR doesn't introduce the issue but somehow triggers an existing 
one.  The root cause probably lies in some initialization phase, e.g. 
`HiveContext` initialization, or the creation of the partitioned test tables in 
`ParquetDataSourceOffMetastoreSuite.beforeAll()`.
    
    I also got another interesting finding after single-step debugging a 
failed test case.  The following stack trace snippet appears in all 3 build 
failures:
    
    ```
    Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
          .----
          | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
          | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:452)
          | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
          | at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilter(ExpressionTree.java:279)
          | at org.apache.hadoop.hive.metastore.parser.ExpressionTree.generateJDOFilterFragment(ExpressionTree.java:590)
          | at org.apache.hadoop.hive.metastore.ObjectStore.makeQueryFilterString(ObjectStore.java:2417)
          | at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsViaOrmFilter(ObjectStore.java:2029)
          | at org.apache.hadoop.hive.metastore.ObjectStore.access$500(ObjectStore.java:146)
          | at org.apache.hadoop.hive.metastore.ObjectStore$4.getJdoResult(ObjectStore.java:2332)
          | at org.apache.hadoop.hive.metastore.ObjectStore$4.getJdoResult(ObjectStore.java:2317)
          `----
            at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2214)
    ```
    
    The marked code path shown above is actually NEVER executed in normal 
cases.  To be more specific, the `getJdoResult()` method in [the anonymous 
`GetListHelper` object] [1] is never called in [`GetHelper<T>.run()`] [2].  
Instead, only the `getSqlResult()` method is called.  And we can see that this 
behavior is controlled by `doUseDirectSql`, which is [partially decided] [3] by 
[`ObjectStore.directSql.isCompatibleDatastore`] [4].  Since `ObjectStore` is 
initialized while initializing `HiveContext`, 
`ObjectStore.directSql.isCompatibleDatastore` is probably the corrupted Hive 
internal state.
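    
    The dispatch described above can be sketched roughly as follows.  This is a 
simplified, hypothetical model rather than Hive's actual code: the class 
`GetHelperSketch`, the `otherConditions` parameter, and the string return 
values are invented for illustration.  Only the names `doUseDirectSql`, 
`getSqlResult()`, and `getJdoResult()` come from the analysis above.

```java
// Hypothetical, simplified model of the dispatch in GetHelper.run().
// NOT Hive's real implementation; it only illustrates how a single flag
// selects between the direct SQL path and the JDO path.
public class GetHelperSketch {
    private final boolean doUseDirectSql;

    // otherConditions: stand-in for whatever else feeds the decision (assumption).
    // compatibleDatastore: models ObjectStore.directSql.isCompatibleDatastore.
    public GetHelperSketch(boolean otherConditions, boolean compatibleDatastore) {
        this.doUseDirectSql = otherConditions && compatibleDatastore;
    }

    // Path taken in normal runs.
    public String getSqlResult() {
        return "direct SQL path";
    }

    // Path seen in the failing stack traces.
    public String getJdoResult() {
        return "JDO path";
    }

    // Exactly one of the two paths runs, depending on the flag.
    public String run() {
        return doUseDirectSql ? getSqlResult() : getJdoResult();
    }

    public static void main(String[] args) {
        System.out.println(new GetHelperSketch(true, true).run());
        System.out.println(new GetHelperSketch(true, false).run());
    }
}
```

    In this model, a compatible datastore means only `getSqlResult()` runs.  
Once the compatibility flag is false, every lookup goes through 
`getJdoResult()`, which is the path that threw the `MetaException` in the 
failing builds.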
    
    I haven't got any clue yet how this state gets corrupted.  My guess is that 
there is a race condition during `HiveContext` initialization.  For example, 
maybe the underlying Derby database is not fully created while `ObjectStore` is 
being initialized.
    
    [1]: https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L2317-L2334
    [2]: https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L2206-L2215
    [3]: https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L2195
    [4]: https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L116-L140

