[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

2017-01-17 Thread Amos Bird (Code Review)
Amos Bird has posted comments on this change.

Change subject: IMPALA-1654: DDL for multiple partitions
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5726/3/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

Line 144:   For example, you might drop a group of partitions corresponding 
to a particular date
should we warn user about the scalability limit (multiple hms RPCs) of current 
bulk dropping?


http://gerrit.cloudera.org:8080/#/c/5726/3/docs/topics/impala_compute_stats.xml
File docs/topics/impala_compute_stats.xml:

Line 130:   The following COMPUTE INCREMENTAL STATS 
statements affect some but not all 
trailing space.


-- 
To view, visit http://gerrit.cloudera.org:8080/5726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Amos Bird 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add doc for MT DOP query option.

2017-01-17 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: Add doc for MT_DOP query option.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5652/1/docs/topics/impala_mt_dop.xml
File docs/topics/impala_mt_dop.xml:

Line 39:   Sets the degree of parallelism used for certain operations that
Should we mention the operations where mt_dop applies?
Compute stats and queries that have scan and aggregate only operators?


Line 42:   and increased memory and CPU usage during statement processing.
I would reword to "ideal balance between response time, memory and CPU". 
As some operations like "compute stats" consume less overall memory due to 
running with less scanner threads.


-- 
To view, visit http://gerrit.cloudera.org:8080/5652
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife2786532b425af6d230074f1c0b5c7dcb2b8a92
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-17 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 2:

(15 comments)

Thanks for the review, and apologies for the long delay. I ran into several 
issues when comparing statistics with Hive, until I found PARQUET-251, which 
seems to cause invalid stats to be written by Hive.

http://gerrit.cloudera.org:8080/#/c/5611/1//COMMIT_MSG
Commit Message:

Line 9: Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
> Do we have end-to-end testing that the column stats are correct? We really 
There is a simple test in test_insert_parquet.py but it only checks rowgroup 
statistics, not page statistics. I think it will be easier to write more 
extensive tests once the read path has been implemented.


http://gerrit.cloudera.org:8080/#/c/5611/1/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 90:   BaseColumnWriter(HdfsParquetTableWriter* parent, ExprContext* 
expr_ctx,
> I feel like this file is getting pretty large. Can we pull these classes ou
Done


Line 97:   num_values_(0),
> I think we call this Merge() in other places like aggregate functions.
Done


Line 101:   def_levels_(NULL),
> I think the memory management here needs a bit more explanation. I.e. when 
Done


Line 206: // levels data and the encoded values.  If compression is 
enabled, this is the
> Do we know what ordering the parquet specification defines for each type? T
I added a comment to the class header. To me it looks safe to rely on the 
overloaded operators. Should we try and explicitly stamp out the template class 
only for the supported types to prevent new types from being added without 
manual interference? Are there any particular classes that you think may break 
in the future?


Line 224:   scoped_ptr compressor_;
> I guess this is the explanation of Copy() I was looking for. I think we nee
Done


Line 288: new ColumnStats(parent_->per_file_mem_pool_.get(), 
encoded_value_size_));
> This can potentially use a lot of memory since the old strings are never fr
Done


PS1, Line 349: value to h
> Maybe this shouldn't be CopyValues, rather CopyMemory? or CopyReferencedMem
Done


Line 485: 
> Can't we initialise most of these using the initialiser list? We should do 
I moved the initialization to Reset() to pass the correct memory pool, similar 
to dict_encoder_.


Line 598: 
> Can't we initialise most of these using the initialiser list? We should do 
I made the *_stats_ inline members, so their default initialization is 
sufficient. The *_stats_base members are stored in the base class. I'm 
reluctant to move them into the c'tor, since for all other column writers they 
will not be initialized at construction time, but when calling Reset().


Line 628:   // TODO: copy repetition data when we support nested types.
> Do we need scoped_ptr? Can we just store these inline?
Done


http://gerrit.cloudera.org:8080/#/c/5611/1/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 214: 
> Extra line
Done


Line 281: 'hive_skip_col_idx' are excluded from the verification of the 
expected values.
> I think we need to find a way to test timestamp.
Done


Line 293: 
> We should also test the types that aren't in alltypes - char, varchar, deci
Done


Line 299: (qualified_table_name, source_table, num_lines)
> Does this exercise the string copying code? If the table is too small it se
I added a test using tpch_parquet.customer and manually verified that the 
resulting .parq file has multiple pages per column, thus exercising the string 
copy code. In hindsight it should already be sufficient to use a file with more 
multiple row batches, which functional.alltypes (7300 rows) should provide. I 
still think it makes sense to have a test with multiple pages per column so I'd 
opt to keep the new one, too.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-17 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#2).

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..

IMPALA-3909: Populate min/max statistics in Parquet writer

Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
A be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-common.h
M be/src/runtime/string-value.h
M be/src/util/dict-test.cc
M tests/query_test/test_insert_parquet.py
M tests/util/get_parquet_metadata.py
8 files changed, 697 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5611/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

2017-01-17 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#3).

Change subject: IMPALA-1654: DDL for multiple partitions
..

IMPALA-1654: DDL for multiple partitions

Syntax and usage notes for ALTER TABLE,
COMPUTE STATS, and SHOW FILES.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_show.xml
3 files changed, 158 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

2017-01-17 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#2).

Change subject: IMPALA-1654: DDL for multiple partitions
..

IMPALA-1654: DDL for multiple partitions

Syntax and usage notes for ALTER TABLE,
COMPUTE STATS, and SHOW FILES.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_show.xml
3 files changed, 139 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-1654: DDL for multiple partitions

2017-01-17 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5726

Change subject: IMPALA-1654: DDL for multiple partitions
..

IMPALA-1654: DDL for multiple partitions

Syntax and usage notes for both ALTER TABLE
and COMPUTE STATS.

Mixed in a little bit with new Kudu syntax for
ALTER TABLE. Didn't include all new Kudu info
in this CR, the better to minimize merge conflicts.

Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
---
M docs/topics/impala_alter_table.xml
M docs/topics/impala_compute_stats.xml
2 files changed, 125 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/5726/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5726
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2060552d5081e5f93b1b1f398414c52fa03f215b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

2017-01-17 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5710/1//COMMIT_MSG
Commit Message:

PS1, Line 29: getAllCatalogObjects
getCatalogObjects()?


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 926: while (true) {
Regarding supportability, should we log a debug message/expose a average wait 
time metric via JMX if its stuck in these loops for longer than a configured 
value? IMO that makes it easier to debug queries hung at Catalog operations. 
(Same comment for other methods that follow this pattern, just wanted to hear 
you thoughts)


Line 928:   if (tbl.getLock().writeLock().tryLock()) {
To avoid extra branch, may be we could refactor to something like

catalogLock_.writeLock().lock();
if (!tbl.getLock().writeLock().tryLock()) {
  catalogLock_.writeLock().unlock();
  continue;
}
.
.
.
(Same comment for other places that uses this pattern)


PS1, Line 950: continue;
redundant?


Line 1305:   hdfsTable.setCatalogVersion(newCatalogVersion);
- Just curious, is this an existing bug that we didn't set it if the partition 
exists?

- May be we can move this line after L1285 before the try block since we are 
setting for both cases anyway and remove it at L1296?


Line 1312: continue;
redundant?


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 190:  * }
I think we should document the exception of getCatalogObjects() here.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Initial commit of the blog section of the Impala ASF website.

2017-01-17 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: Initial commit of the blog section of the Impala ASF website.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5667/3/blog/assets/js/html5.js
File blog/assets/js/html5.js:

Line 2:  HTML5 Shiv v3.7.0 | @afarkas @jdalton @jon_neal @rem | MIT/GPL2 
Licensed
> https://issues.apache.org/jira/browse/LEGAL-285
FYI:

"IMO yes see [1] you can just select the MIT license. Given that your probably 
not bundling this in a release it probably doesn't matter."

https://issues.apache.org/jira/browse/LEGAL-285


-- 
To view, visit http://gerrit.cloudera.org:8080/5667
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If5810f04b56c587a5114953cae7e4f484a92be75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4670 / IMPALA-4672: Add RpcMgr and port Statestore services to KRPC

2017-01-17 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4670 / IMPALA-4672: Add RpcMgr and port Statestore 
services to KRPC
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5720/2/tests/statestore/test_statestore.py
File tests/statestore/test_statestore.py:

Line 18: # TODO(KRPC): Re-enable.
> Or theoretically (meaning I haven't tried this) you could add a pytest skip
I expect this file to get removed completely as part of this commit.


-- 
To view, visit http://gerrit.cloudera.org:8080/5720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4670 / IMPALA-4672: Add RpcMgr and port Statestore services to KRPC

2017-01-17 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-4670 / IMPALA-4672: Add RpcMgr and port Statestore 
services to KRPC
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5720/2/tests/statestore/test_statestore.py
File tests/statestore/test_statestore.py:

Line 18: # TODO(KRPC): Re-enable.
> Instead of adding a # on every line, I would suggest to add ''' (three sing
Or theoretically (meaning I haven't tried this) you could add a pytest skipif 
mark at the module (global) scope.

   pytestmark = pytest.mark.skipif(reason="All tests to be rewritten in C++")


-- 
To view, visit http://gerrit.cloudera.org:8080/5720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8dbf12b9ecd71d26d239d31c19b487175194c766
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes