Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 )
Change subject: IMPALA-5741: Support reading tiny RDBMS tables ...................................................................... Patch Set 9: (7 comments) Looked through most of the code (but didn't went to the dao parts deeply). Thanks for the great work! http://gerrit.cloudera.org:8080/#/c/17842/9/fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java File fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java: http://gerrit.cloudera.org:8080/#/c/17842/9/fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java@192 PS9, Line 192: debug This should probably have a higher log level than debug, e.g. warning http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java: http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@85 PS9, Line 85: cacheClass Can you use _ postfix for all members? http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@106 PS9, Line 106: prepare Can the planner call this more than once? If not, then a Precondition could be added to check this. A state could be also added, e.g. PREPARED or PLANNER_PREPARED. http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@153 PS9, Line 153: Recorder What does Recorder mean here? Shouldn't it be iterator? http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@190 PS9, Line 190: if (tableConfig == null) Can the tableConfig be non-null here? I think that this will be only called in CREATED state. http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@259 PS9, Line 259: totalNumberOfRecords = dbAccessor.getTotalNumberOfRecords(tableConfig); The role of totalNumberOfRecords is not clear to me. If I understand correctly getTotalNumberOfRecords() runs the count(*) for the query without applying LIMIT, and then totalNumberOfRecords is used in getNext() as an additional stop condition besides the iterator. Why is it not enough to use the iterator.hasNext() as stop condition? Or maybe also use LIMIT as a sanity check in case it was not successfully pushed down. I see two problem with calling count(*) first: - without limit the call is potentially more expensive than the original query - without transactions it is possible that the table is changed between the count(*) and the "real" query http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java: http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java@20 PS9, Line 20: public enum DatabaseType { The classes in conf and dao directories seem to come from Hive, e.g.: https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/conf/DatabaseType.java I think that this should be documented somehow, e.g. by adding README.md or a comment to the files that come from Hive. A commit message could also contain some info about the relationship to Hive. Separating code that is nearly the same as in Hive and Impala specific logic would also help with the review, as code from Hive doesn't need that deep inspection. -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 9 Gerrit-Owner: Fucun Chu <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Mon, 13 Mar 2023 09:45:26 +0000 Gerrit-HasComments: Yes
