Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Support reading tiny RDBMS tables
......................................................................


Patch Set 9:

(7 comments)

Looked through most of the code (but didn't went to the dao parts deeply). 
Thanks for the great work!

http://gerrit.cloudera.org:8080/#/c/17842/9/fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java
File 
fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/fe/src/main/java/org/apache/impala/extdatasource/ExternalDataSourceExecutor.java@192
PS9, Line 192: debug
This should probably have a higher log level than debug, e.g. warning


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@85
PS9, Line 85: cacheClass
Can you use _ postfix for all members?


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@106
PS9, Line 106: prepare
Can the planner call this more than once? If not, then a Precondition could be 
added to check this. A state could be also added, e.g. PREPARED or 
PLANNER_PREPARED.


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@153
PS9, Line 153: Recorder
What does Recorder mean here? Shouldn't it be iterator?


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@190
PS9, Line 190:     if (tableConfig == null)
Can the tableConfig be non-null here? I think that this will be only called in 
CREATED state.


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@259
PS9, Line 259:     totalNumberOfRecords = 
dbAccessor.getTotalNumberOfRecords(tableConfig);
The role of totalNumberOfRecords is not clear to me.
If I understand correctly getTotalNumberOfRecords() runs the count(*) for the 
query without applying LIMIT, and then totalNumberOfRecords is used in 
getNext() as an additional stop condition besides the iterator.

Why is it not enough to use the iterator.hasNext() as stop condition? Or maybe 
also use LIMIT as a sanity check in case it was not successfully pushed down.

I see two problem with calling count(*) first:
- without limit the call is potentially more expensive than the original query
- without transactions it is possible that the table is changed between the 
count(*) and the "real" query


http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java:

http://gerrit.cloudera.org:8080/#/c/17842/9/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/conf/DatabaseType.java@20
PS9, Line 20: public enum DatabaseType {
The classes in conf and dao directories seem to come from Hive, e.g.:

https://github.com/apache/hive/blob/master/jdbc-handler/src/main/java/org/apache/hive/storage/jdbc/conf/DatabaseType.java

I think that this should be documented somehow, e.g. by adding README.md or a 
comment to the files that come from Hive. A commit message could also contain 
some info about the relationship to Hive.

Separating code that is nearly the same as in Hive and Impala specific logic 
would also help with the review, as code from Hive doesn't need that deep 
inspection.



--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 9
Gerrit-Owner: Fucun Chu <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Fucun Chu <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Mon, 13 Mar 2023 09:45:26 +0000
Gerrit-HasComments: Yes

Reply via email to