Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
......................................................................


Patch Set 4:

(21 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15395/3//COMMIT_MSG@58
PS3, Line 58:  * specific queries about hidden columns (full-acid-rowid.test)
> Can you also add describe and describe formatted tests? I assume these show
Added tests to 'describe-path.test'.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@181
PS3, Line 181:   bool is_file_full_acid = 
reader_->hasMetadataValue("hive.acid.version") &&
> I assume you're going to remove the log lines?
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/hdfs-orc-scanner.cc@185
PS3, Line 185:     return Status(Substitute("Error: Table is in full ACID 
format, but file "
> This is going to be addressed at some point, right? So we can scan original
Yes, "original files" are tracked with IMPALA-9515. Added TODO comment here.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h
File be/src/exec/orc-metadata-utils.h:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.h@46
PS3, Line 46:  private:
> Maybe IsFileFullAcid()? Since it's going to get confusing in future about w
Removed this function.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@94
PS3, Line 94:         if (table_idx == num_part_cols + FILE_INDEX_OF_FIELD_ROW) 
{
> I'd find it clearer if it was a named constant instead of the magic number
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@101
PS3, Line 101: DCHECK
> DCHECK_GE
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@110
PS3, Line 110: else {
> nit: This and the same @115 could be moved outside the if.
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@118
PS3, Line 118:       // Hence, in the table metadata this is a top-level 
column, i.e. it is offsetted
> Could you explain the values here?
Added some comments about it.

Yes, file_idx should remain the same because only the first number is offsetted 
with 'num_part_cols'.


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/exec/orc-metadata-utils.cc@231
PS3, Line 231:       type.DebugString(), orc_type.toString(), filename_));
> Can you add a brief comment on the approach? It seems to be basically "if i
Looking at the ORC file metadata it turned out that the ACID version is also 
written there, so using that from now on instead of this duck approach.

Do you think it's worth to keep this for verifying the ACID schema?


http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/15395/3/be/src/runtime/descriptors.h@333
PS3, Line 333: IsTableFul
> IsFullAcidTable?
Renamed it to IsTableFullAcid() to make it similar to IsFileFullAcid() (though 
the latter was entirely removed since then)


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@94
PS3, Line 94:     // TODO: IMPALA-8831 will enable all ALTER TABLE statements 
on transactional tables.
> ?
Added a comment about it. We need ALTER TABLE ADD PARTITION and ALTER TABLE 
SORT BY during data loading so in the new PS I only enable those.

ALTER TABLE stmts are disabled because they fail when the table has column 
statistics (IMPALA-8831). The enabled ALTER TABLE statements should be safe in 
those regard. Hopefully there'll be a separate patch for IMPALA-8831 that does 
this for us so we can remove this from the final version of this change request.


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@a297
PS3, Line 297:
> Removing this mean that we support write operations on full ACID tables? Do
ALTER TABLE and DROP TABLE calls this, but they are supported operations.


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@224
PS3, Line 224: r, table.getFullName()
> There are now two ensureTableNotFullAcid methods, that only differ in the p
Yeah I think we can remove these.


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@237
PS3, Line 237:
> nit: maybe change this string to enum?
I rather keep it this way because it's quite flexible and I don't want to 
introduce a new enum that contains all possible operations.


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@238
PS3, Line 238:    * @throws AnalysisException If table is a bucketed table.
> nit: missing space
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/15395/3/fe/src/main/java/org/apache/impala/analysis/Path.java@473
PS3, Line 473: convertToFullAcid
> convertToFullAcidFilePath?
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/bin/generate-schema-statements.py@319
PS3, Line 319: '
> flake8: E129 visually indented line with same indent as next logical line
Is there a way to get rid of these? It's also not true, because our next 
logical line is not indendted with 4 spaces.


http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
File testdata/workloads/functional-query/queries/QueryTest/acid-negative.test:

http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test@1
PS3, Line 1: ====
> Why commented out?
In the new patch I only enable ADD PARTITION and SORT BY altering statements, 
so I enabled these tests again. See details at AlterTableStmt.java


http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
File testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test:

http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test@16
PS3, Line 16: INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, 
STRING, STRING, TIMESTAMP, INT, INT
> Can you also add a LABELS section to make sure the labels for the ROW__ID c
Done


http://gerrit.cloudera.org:8080/#/c/15395/3/testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test@17
PS3, Line 17: ====
> Do we expect the values to be deterministic across data loads?
TL;DR: Yes,

 * operation: always 0 (INSERT)
 * originalTransaction: it's actually the write id of the txn that inserted the 
row, it starts from 1
 * bucket: it's a bit-packed number:
 ** (bucket codec version) + (bucket ID) + (statement ID)
 ** bucket codec version is currently 1
 ** bucket ID is parsed from the filename. If the table is not bucketed then it 
will be the ordinal of the file in the transactional directory
 ** statment ID: ordinal of statement in multi-statement transactions.
 * rowid: ordinal of row (within transaction and bucket)
 * currentTransaction: current write id, it only differs from 
originalTransaction when the operation is 2 (DELETE)


http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/15395/4/tests/query_test/test_nested_types.py@211
PS4, Line 211:
too much indent



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Mar 2020 16:39:07 +0000
Gerrit-HasComments: Yes

Reply via email to