[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1 .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/588/ -- To view, visit http://gerrit.cloudera.org:8080/6889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5246: Verify that a UDF test completely finishes before moving on .. IMPALA-5246: Verify that a UDF test completely finishes before moving on A memory intensive UDF test takes a while to completely finish and for the memory in Impala to be completely freed. This caused a problem in ASAN builds (and potentially in normal builds) because we would start the next test right away, before the memory is freed. We fix the issue by checking that all fragments finish executing before starting the next test. Testing: - Ran a private ASAN build which passed. Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe Reviewed-on: http://gerrit.cloudera.org:8080/6893 Reviewed-by: Alex BehmReviewed-by: Michael Ho Tested-by: Impala Public Jenkins --- M tests/query_test/test_udfs.py 1 file changed, 9 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved Alex Behm: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/6893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5246: Verify that a UDF test completely finishes before moving on .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Testing: - Ran a private build which passed. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_permanent_udfs.py 10 files changed, 290 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/2 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Testing: - Ran a private build which passed. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_permanent_udfs.py 10 files changed, 290 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/2 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. Patch Set 1: (12 comments) http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java File fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java: Line 30: * Representation of a REFRESH/INVALIDATE METADATA statement. > Can you spell out the different variants to make this clearer? Done Line 45: public ResetMetadataStmt(TableName name, boolean isRefresh, > This c'tor is starting to look scary. Please make it private and add static Done http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 616: List javaFns = > Let's move as much of the heavy work as possible outside of the global lock Done. Line 624: Db db = dbCache_.get().get(dbName); > need to handle db == null Done Line 627: Db tmpDb = new Db(dbName, this, msDb); > I understand this is convenient, but it would be preferable to not RPC to H The msDb is not actually needed, it can be null, I don't think it's used anywhere. Line 636: for (String fn_name: tmpDb.getAllFunctions().keySet()) { > easier to read if you iterate over the entrySet() Done Line 638: boolean result = db.addFunction(fn); > We need to handle the !result case. We could have a totally different funct It will only be false if an identical function exists (we don't compare just the name, we compare with IS_INDISTINGUISHABLE). So I don't think anything needs to be changed. Line 648: for (String fn_name: db.getAllFunctions().keySet()) { > easier to read if you iterate over the entrySet() Done Line 653: toRemove.add(fn); > why not remove immediately here? We can't modify the list as we are iterating over it. There will be some exception (something like concurrent modification). http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/catalog/Function.java File fe/src/main/java/org/apache/impala/catalog/Function.java: Line 318: result.setCatalog_version(getCatalogVersion()); > using catalogVersion_ is a little clearer to me Done http://gerrit.cloudera.org:8080/#/c/6878/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: Line 225: if (req.is_delta) { > single line and remove the else block Done http://gerrit.cloudera.org:8080/#/c/6878/1/tests/custom_cluster/test_permanent_udfs.py File tests/custom_cluster/test_permanent_udfs.py: Line 220: REFRESH_COMMANDS = ["INVALIDATE METADATA", > Let's think a little about how to test these scenarios: Done -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Testing: - Ran a private build which passed. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_permanent_udfs.py 10 files changed, 288 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/3 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M tests/custom_cluster/test_permanent_udfs.py 9 files changed, 282 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/2 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5319: Fix test hdfs scan node errors failures
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5319: Fix test_hdfs_scan_node_errors failures .. Patch Set 1: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/586/ -- To view, visit http://gerrit.cloudera.org:8080/6891 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5246: Verify that a UDF test completely finishes before moving on .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/587/ -- To view, visit http://gerrit.cloudera.org:8080/6893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5301: Set Kudu minicluster memory limit
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5301: Set Kudu minicluster memory limit .. Patch Set 4: http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/9740/ -- To view, visit http://gerrit.cloudera.org:8080/6844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5301: Set Kudu minicluster memory limit
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5301: Set Kudu minicluster memory limit .. Patch Set 4: passed a gvo dry run running a private exhaustive job now -- To view, visit http://gerrit.cloudera.org:8080/6844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5301: Set Kudu minicluster memory limit
Matthew Jacobs has uploaded a new patch set (#4). Change subject: IMPALA-5301: Set Kudu minicluster memory limit .. IMPALA-5301: Set Kudu minicluster memory limit By default, Kudu assumes it has 80% of system memory which is far too high for the minicluster. This sets a mem limit of 2gb and lowers the limit of the block cache. These values were tested on a gerrit-verify-dryrun job as well as an exhaustive run. This patch also simplifies TestKuduMemLimits which was unnecessarily creating a large table during test execution. Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 --- M testdata/cluster/node_templates/cdh5/etc/init.d/kudu-common M tests/query_test/test_kudu.py 2 files changed, 5 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6844/4 -- To view, visit http://gerrit.cloudera.org:8080/6844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on
Michael Ho has posted comments on this change. Change subject: IMPALA-5246: Verify that a UDF test completely finishes before moving on .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/6707/6//COMMIT_MSG Commit Message: PS6, Line 16: While the actual fix for this is to not hold the QES::lock_ during : metadata load, I disagree. what you are doing is the actual fix for the lock acquisition order. Not holding QES::lock_ while planning is a partial fix for a different issue (not being able to cancel while planning). PS6, Line 20: We make sure that QES::lock_ doesn't block query_exec_state_map_lock_ : in hot paths. After this change, isn't it the case that we'll never try to acquire another lock while holding the map lock? If so, this should say make query_exec_state_map_lock_ a terminal lock. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 192: DEFINE_int64(metadata_loading_pause_injection_ms, 0, "Simulates metadata loading for a " how about prefixing the flag with "stress_", like we do for the one in free-pool.h, to give more of a clue that this is just for stressing the system. PS6, Line 822: // Note: this acquires the exec_state lock *before* the : // query_exec_state_map_ lock. This is the opposite of : // GetQueryExecState(..., true), and therefore looks like a : // candidate for deadlock. The reason this works here is that : // GetQueryExecState cannot find exec_state (under the exec state : // map lock) and take it's lock until RegisterQuery has : // finished. By that point, the exec state map lock will have been : // given up, so the classic deadlock interleaving is not possible. this comment is outdated now. http://gerrit.cloudera.org:8080/#/c/6707/6/be/src/service/impala-server.h File be/src/service/impala-server.h: PS6, Line 85: /// 4. query_exec_state_map_lock_ i think this should be updated. with this change, do we ever hold this lock in conjunction with another lock? http://gerrit.cloudera.org:8080/#/c/6707/6/tests/custom_cluster/test_query_concurrency.py File tests/custom_cluster/test_query_concurrency.py: PS6, Line 55: GetRuntimeProfileStr This doesn't make sense. GetRuntimeProfileStr() doesn't hold the map lock while getting the QES::lock_. I think what's happening is that the call on line 76 to get_in_flight_queries() is hitting the QuerySummaryHandler() handler (please double check that), which then causes the lock cycle. Without the fix, does the test ever get past line 76? I would expect the webserver to block on the QES::lock_ at that point. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock contention
Dan Hecht has posted comments on this change. Change subject: IMPALA-1972/IMPALA-3882: Fix QueryExecState::lock_ contention .. Patch Set 6: > I haven't rebased this yet on to new QES changes so that you can > diff it from previous versions. Please let me know if you need a > rebased patch. Please go ahead and rebase. -- To view, visit http://gerrit.cloudera.org:8080/6707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1 .. Patch Set 1: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/585/ -- To view, visit http://gerrit.cloudera.org:8080/6889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5319: Fix test hdfs scan node errors failures
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5319: Fix test_hdfs_scan_node_errors failures .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/586/ -- To view, visit http://gerrit.cloudera.org:8080/6891 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[native-toolchain-CR] Remove redundant libevent flag when building Thrift
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/6904 Change subject: Remove redundant libevent flag when building Thrift .. Remove redundant libevent flag when building Thrift --with-libevent was specified twice, and overridden by the second option. Libevent is not required for Thrift if the non-blocking server is not used. Change-Id: Ibf6ef921f4389bd875578cd681f008b86f633af8 --- M source/thrift/build.sh 1 file changed, 0 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/04/6904/1 -- To view, visit http://gerrit.cloudera.org:8080/6904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ibf6ef921f4389bd875578cd681f008b86f633af8 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-5286: Query fails due to Kudu column name case
Alex Behm has posted comments on this change. Change subject: IMPALA-5286: Query fails due to Kudu column name case .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6902/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 946: String key = slotPath.toString().toLowerCase(); This works just fine for other tables. The slotPath and names in Columns are generally assumed to be in lower case already. I suspect the real bug might be in the path that creates Impala columns from Kudu columns. If that is correct, then there could still be other issues lurking due to casing. -- To view, visit http://gerrit.cloudera.org:8080/6902 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I14aba88510012174716691b9946e1c7d54d01b44 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5246: Verify that a UDF test completely finishes before moving on
Alex Behm has posted comments on this change. Change subject: IMPALA-5246: Verify that a UDF test completely finishes before moving on .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5286: Query fails due to Kudu column name case
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/6902 Change subject: IMPALA-5286: Query fails due to Kudu column name case .. IMPALA-5286: Query fails due to Kudu column name case Impala is case insensitive for column names and generally deals with them in all lower case. Kudu is case sensitive. This can lead to a problem when a table is created externally in Kudu with a column name with upper case letters. The particular bug here is that the Analyzer can end up creating two SlotDescriptors that point to the same column because (despite comment in Analyzer.java to the contrary) it does a case sensitive comparison of slot paths to determine if a SlotDescriptor for the path already exists. The fix is to convert slot paths to lower case before doing the comparison. Testing: - Added an e2e test in test_kudu.py. Change-Id: I14aba88510012174716691b9946e1c7d54d01b44 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M tests/query_test/test_kudu.py 2 files changed, 27 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6902/1 -- To view, visit http://gerrit.cloudera.org:8080/6902 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I14aba88510012174716691b9946e1c7d54d01b44 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers
Alex Behm has posted comments on this change. Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Add a script to test performance on a developer machine
Jim Apple has posted comments on this change. Change subject: Add a script to test performance on a developer machine .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6818/2//COMMIT_MSG Commit Message: PS2, Line 20: buildall.sh > Did you mean to add buildall.sh to this review? I only mean that the script single_node_perf_run.py calls buildall.sh and the others. It is the mortar. buildall.sh was one of the ricks that didn't need to change. -- To view, visit http://gerrit.cloudera.org:8080/6818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] Add a script to test performance on a developer machine
David Knupp has posted comments on this change. Change subject: Add a script to test performance on a developer machine .. Patch Set 2: (1 comment) Jim, this is mainly just to confirm that this review request has been noted. Will try to find time in the next couple of days to comment. http://gerrit.cloudera.org:8080/#/c/6818/2//COMMIT_MSG Commit Message: PS2, Line 20: buildall.sh Did you mean to add buildall.sh to this review? -- To view, visit http://gerrit.cloudera.org:8080/6818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I70ba7f3c28f612a370915615600bf8dcebcedbc9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers
Alex Behm has posted comments on this change. Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers .. Patch Set 4: Code-Review+1 Changed lgtm. I do wonder whether our tests are still covering both paths though. -- To view, visit http://gerrit.cloudera.org:8080/6876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Loads all TPC-DS tables
Mostafa Mokhtar has posted comments on this change. Change subject: Loads all TPC-DS tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6877/2/testdata/datasets/tpcds/tpcds_schema_template.sql File testdata/datasets/tpcds/tpcds_schema_template.sql: Line 659: WHERE ss_sold_date_sk IS NULL; Replace the multiple insert statements with insert overwrite table store_sales partition(ss_sold_date_sk) /*+ clustered,shuffle */ select -- To view, visit http://gerrit.cloudera.org:8080/6877 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5277245fd20827c9c09ce5c1a7a37266ca476b9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6876/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 1037: decompressed_data_pool_->FreeAll(); > Makes sense. We could selectively Clear() and FreeAll() depending on the un I'd prefer to defer it - I think a policy like that would be better implemented within Clear() since it's generally applicable. http://gerrit.cloudera.org:8080/#/c/6876/2/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: Line 476: && slot_desc_ != nullptr && slot_desc_->type().IsVarLenType(); > I think checking IsStringType() is more accurate because IsVarLenType() inc You're right, CHAR columns could reference the pages directly in some cases. -- To view, visit http://gerrit.cloudera.org:8080/6876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1 .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/585/ -- To view, visit http://gerrit.cloudera.org:8080/6889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5174: Bump gflags to 2.2.0-p1
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5174: Bump gflags to 2.2.0-p1 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6889 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5304: reduce transfer of Parquet decompression buffers
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-5304: reduce transfer of Parquet decompression buffers .. IMPALA-5304: reduce transfer of Parquet decompression buffers The buffers contain the Parquet DataPages, which need to be attached to the row batch if the rows point to var-len data stored directly in the page. Otherwise the buffers can be discarded once the values in the page have been materialized. This reduces the amount of memory transferred between threads, which is a known TCMalloc anti-pattern. It also allows us to free memory earlier, which may help reduce memory consumption slightly. Also fix a latent bug I noticed where needs_conversion_ is not always initialised in the constructor. Testing Ran exhaustive build. Most of the Parquet tests use compressed Parquet, which should exercise this code path. Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h 2 files changed, 18 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/6876/4 -- To view, visit http://gerrit.cloudera.org:8080/6876 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" .. Patch Set 2: > I think the more information the better. If we can determine that > one of those messages is always superseded by the > get-last-exception-root-cause > msg then we can elide it, but for now it seems ok to me to have > more verbose msgs that could help debugging. Ok, makes sense. I've added it for all types of errors now. Figuring out if both the messages are redundant is not trivial, so I'm not solving that for now. -- To view, visit http://gerrit.cloudera.org:8080/6894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
Sailesh Mukil has uploaded a new patch set (#2). Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" .. IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" We use the new libHDFS API hdfsGetLastExceptionRootCause() to return the last seen HDFS error on that thread. This patch depends on the recent HDFS commit: https://github.com/apache/hadoop/commit/fda86ef2a32026c02d9b5d4cca1ecb7b4decd872 Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d --- M be/src/util/hdfs-util.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/6894/2 -- To view, visit http://gerrit.cloudera.org:8080/6894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5297: Set Kudu minicluster memory limit
Matthew Jacobs has uploaded a new patch set (#3). Change subject: IMPALA-5297: Set Kudu minicluster memory limit .. IMPALA-5297: Set Kudu minicluster memory limit By default, Kudu assumes it has 80% of system memory which is far too high for the minicluster. This sets a mem limit of 1gb and lowers the limit of the block cache. Also moves TestKuduMemLimits to exhaustive since those tests put more memory pressure on Kudu than the gvo VMs can reliably handle, and the test was very slow anyway. Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 --- M testdata/cluster/node_templates/cdh5/etc/init.d/kudu-common M tests/query_test/test_kudu.py 2 files changed, 11 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6844/3 -- To view, visit http://gerrit.cloudera.org:8080/6844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
Henry Robinson has posted comments on this change. Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" .. Patch Set 1: I think the more information the better. If we can determine that one of those messages is always superseded by the get-last-exception-root-cause msg then we can elide it, but for now it seems ok to me to have more verbose msgs that could help debugging. -- To view, visit http://gerrit.cloudera.org:8080/6894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" .. Patch Set 1: > If hdfsGetLastExceptionRootCause() always contains relevant > information, I think we should include it in all cases. What's to > be gained by leaving it out? In many cases, we avoid redundant error information by leaving it out. For example, a file not found error with this change for all types of errors: [localhost:21000] > select * from s3_test; Query: select * from s3_test Query submitted at: 2017-05-16 11:42:13 (Coordinator: http://localhost:25000) Query progress can be monitored at: http://localhost:25000/query_plan?query_id=7b4fea9564c4c1d5:fd45dd87 ERROR: Failed to open HDFS file s3a://sailesh-impala-s3/s3_test_parts/asbasda Error(2): No such file or directory Root cause: FileNotFoundException: No such file or directory: s3a://sailesh-impala-s3/s3_test_parts/asbasda -- To view, visit http://gerrit.cloudera.org:8080/6894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo IMPALA-3794
Michael Brown has posted comments on this change. Change subject: IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo IMPALA-3794 .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
Henry Robinson has posted comments on this change. Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" .. Patch Set 1: If hdfsGetLastExceptionRootCause() always contains relevant information, I think we should include it in all cases. What's to be gained by leaving it out? -- To view, visit http://gerrit.cloudera.org:8080/6894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Dan Hecht has posted comments on this change. Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Patch Set 1: Was it a clean revert? If not, where were the conflicts? -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters
Dan Hecht has posted comments on this change. Change subject: IMPALA-5166: clean up BufferPool counters .. Patch Set 6: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6690/6/be/src/runtime/bufferpool/reservation-tracker.cc File be/src/runtime/bufferpool/reservation-tracker.cc: PS6, Line 104: "InitialReservation" does that counter still exist? or is this meant to check one of the counters being added below? http://gerrit.cloudera.org:8080/#/c/6690/6/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: PS6, Line 228: Reservation=5.00 MB is it not useful to indicate how much of the reservation is actually in use as well? don't need to add it to this change but wondering if that will help with tuning reservations in extreme cases. -- To view, visit http://gerrit.cloudera.org:8080/6690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. Patch Set 1: This passed tests in both debug and release mode. I haven't run the stress test yet. -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel KornackerGerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (FE)
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/6898 Change subject: IMPALA-5167: Reduce the number of Kudu clients created (FE) .. IMPALA-5167: Reduce the number of Kudu clients created (FE) Creating Kudu clients is very expensive as each will fetch metadata from the Kudu master, so we should minimize the number of Kudu clients that get created. This patch stores a map from Kudu master addressed to Kudu clients in KuduUtil to be used across the FE and catalog. Another patch will address the BE. This relies on a change on the Kudu side that clears non-covered range entries from the client's cache on table open (d07ecd6ded01201c912d2e336611a6a941f48d98). Testing: - Ran a stress test on a 10 node cluster: scan of a small Kudu table, 1000 concurrent queries, load on the Kudu master was reduced signficantly, from ~50% cpu to ~5%. (with the BE changes included) - Ran the Kudu e2e tests. - Manually ran a test with concurrent INSERTs and 'ALTER TABLE ADD PARTITION' (which is affected by the Kudu side change mentiond above) and verified correctness. Change-Id: I9b0b346f37ee43f7f0eefe34a093eddbbdcf2a5e --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java 4 files changed, 41 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/6898/1 -- To view, visit http://gerrit.cloudera.org:8080/6898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9b0b346f37ee43f7f0eefe34a093eddbbdcf2a5e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4890/5143: Coordinator race involving TearDown()
Marcel Kornacker has uploaded a new change for review. http://gerrit.cloudera.org:8080/6897 Change subject: IMPALA-4890/5143: Coordinator race involving TearDown() .. IMPALA-4890/5143: Coordinator race involving TearDown() TearDown() releases resources and destroys control structures (the QueryState reference), and it can be called while a concurrent thread executes Exec() or might call GetNext() in the future. The solution is not to destroy the control structures. This also releases resources automatically at the end of query execution. Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/service/client-request-state.cc 3 files changed, 29 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/6897/1 -- To view, visit http://gerrit.cloudera.org:8080/6897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I457a6424a0255c137336c4bc01a6e7ed830d18c7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker
[Impala-ASF-CR] Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet"
Attila Jeges has uploaded a new change for review. http://gerrit.cloudera.org:8080/6896 Change subject: Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" .. Revert "IMPALA-2716: Hive/Impala incompatibility for timestamp data in Parquet" Reverting IMPALA-2716 as SparkSQL does not agree with the approach taken. More details can be found at: https://issues.apache.org/jira/browse/SPARK-12297 Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd --- M be/src/benchmarks/CMakeLists.txt D be/src/benchmarks/convert-timestamp-benchmark.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exprs/timestamp-functions.cc M be/src/exprs/timezone_db.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/fe-support.cc M be/src/service/impala-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/PlanNodes.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/common/impala_test_suite.py M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py D tests/query_test/test_parquet_timestamp_compatibility.py 28 files changed, 84 insertions(+), 850 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/6896/1 -- To view, visit http://gerrit.cloudera.org:8080/6896 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges
[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5166: clean up BufferPool counters .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/6690/3/be/src/runtime/bufferpool/buffer-pool.cc File be/src/runtime/bufferpool/buffer-pool.cc: PS3, Line 279: profile->GetOrCreateChild > Right, but why not just add that (or use the existing AddChid() pattern)? :S forgot what was in my own patch. Switched to CreateChild() and using an existing API in buffer-pool-test. I think CreateChild() is a better API than AddChild() because it doesn't require threading the ObjectPool through - there's no logical reason I can see that the child profile would have a different lifetime to its parent. http://gerrit.cloudera.org:8080/#/c/6690/3/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: PS3, Line 237: BufferPoolUsed > update Done -- To view, visit http://gerrit.cloudera.org:8080/6690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-5166: clean up BufferPool counters .. IMPALA-5166: clean up BufferPool counters Misc changes to improve usability of the profiles. * Separate out detailed BufferPool metrics into a "Buffer pool" sub-profile. * Only create the limit counter if there is a limit * Show BufferPool using in query MemTracker (it was accidentally disabled before because there was no query-level profile). * Reduce clutter in MemTracker dump by only showing buffer pool reservation, not usage (the usage was misleading anyway because it didn't include child usage). * Remove TotalUnpinnedBytes, which had limited value - WriteIoBytes and PeakUnpinnedBytes can answer most of the same questions - i.e. did it unpin any pages, and how many did it need to write to disk. * Add buffer pool metrics to /memz (if buffer pool is enabled) and reorder /memz so more useful information is up the top. Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c --- M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-pool-counters.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/mem-tracker.cc M be/src/util/default-path-handlers.cc M be/src/util/memory-metrics.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/metrics.json M www/memz.tmpl 12 files changed, 145 insertions(+), 87 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/6690/6 -- To view, visit http://gerrit.cloudera.org:8080/6690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE) .. Patch Set 6: (3 comments) > So I just spoke with the Kudu folks and it sounds like there will > be an issue with the Java client that will prevent it from being > used for more than 7 days w/ Kerberos (some token issue), and I > don't think that'll get fixed ASAP. The issue does exist for the > C++ client as well but that's being fixed now. Can you separate out > the Java side so we can submit them independently? Done. http://gerrit.cloudera.org:8080/#/c/6792/5/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: Line 403: KuduClientPtr* kudu_client_ptr = new KuduClientPtr; > no std:: in .cc Done Line 404: RETURN_IF_ERROR(CreateKuduClient(master_addresses, _client_ptr->kudu_client)); > I suggested it, but we can take it out if you think it's superfluous. If I understand correctly, this list comes from a table property, which by default is populated from a start up flag. I assume the common case we expect is for people to mostly create tables with the default list, so they should generally be in the same order. Its also probably not a big deal if we end up creating a small number of Kudu clients for different orderings, its still a significant reduction in the total number created. http://gerrit.cloudera.org:8080/#/c/6792/5/fe/src/main/java/org/apache/impala/util/KuduUtil.java File fe/src/main/java/org/apache/impala/util/KuduUtil.java: Line 72: b.defaultOperationTimeoutMs(BackendConfig.INSTANCE.getKuduClientTimeoutMs()); > move above the function comment. Done -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6792 to look at the new patch set (#6). Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE) .. IMPALA-5167: Reduce the number of Kudu clients created (BE) Creating Kudu clients is very expensive as each will fetch metadata from the Kudu master, so we should minimize the number of Kudu clients that get created. This patch stores a map from Kudu master addressed to Kudu clients in the ExecEnv to be used across the BE for all queries. Another patch will address the FE. This relies on a change on the Kudu side that clears non-covered range entries from the client's cache on table open (fdc022fe6231af20e307012d98c35b16cbfa7b33) Testing: - Ran a stress test on a 10 node cluster: scan of a small Kudu table, 1000 concurrent queries, load on the Kudu master was reduced signficantly, from ~50% cpu to ~5%. (with the FE changes included) - Ran the Kudu e2e tests. - Manually ran a test with concurrent INSERTs and 'ALTER TABLE ADD PARTITION' (which is affected by the Kudu side change mentiond above) and verified correctness. Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 --- M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exprs/kudu-partition-expr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h 8 files changed, 73 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/6792/6 -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5166: clean up BufferPool counters
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-5166: clean up BufferPool counters .. IMPALA-5166: clean up BufferPool counters Misc changes to improve usability of the profiles. * Separate out detailed BufferPool metrics into a "Buffer pool" sub-profile. * Only create the limit counter if there is a limit * Show BufferPool using in query MemTracker (it was accidentally disabled before because there was no query-level profile). * Reduce clutter in MemTracker dump by only showing buffer pool reservation, not usage (the usage was misleading anyway because it didn't include child usage). * Remove TotalUnpinnedBytes, which had limited value - WriteIoBytes and PeakUnpinnedBytes can answer most of the same questions - i.e. did it unpin any pages, and how many did it need to write to disk. * Add buffer pool metrics to /memz (if buffer pool is enabled) and reorder /memz so more useful information is up the top. Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c --- M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-pool-counters.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/reservation-tracker.cc M be/src/runtime/mem-tracker.cc M be/src/util/default-path-handlers.cc M be/src/util/memory-metrics.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/metrics.json M www/memz.tmpl 12 files changed, 145 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/6690/5 -- To view, visit http://gerrit.cloudera.org:8080/6690 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables.
Jakub Kukul has uploaded a new patch set (#4). Change subject: IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables. .. IMPALA-2525: Treat parquet ENUMs as STRINGs when creating impala tables. Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475 --- M docs/topics/impala_parquet.xml M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M testdata/bin/create-load-data.sh A testdata/data/schemas/logicaltypes.parquet M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test 5 files changed, 21 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6550/4 -- To view, visit http://gerrit.cloudera.org:8080/6550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7a2e20c3ab83eb3fac422c3b33c117856fec475 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jakub KukulGerrit-Reviewer: Jakub Kukul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5144: Remove sortby() hint
Lars Volker has posted comments on this change. Change subject: IMPALA-5144: Remove sortby() hint .. Patch Set 2: (3 comments) Thank you for having a look. I added a test in PS3, and inline comments pointing to the remaining tests you asked for. http://gerrit.cloudera.org:8080/#/c/6885/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: PS2, Line 1792: > Do we have any tests that use shuffle and clustered? If not, maybe we shoul Tests for shuffle and clustered are above this removed block, starting in Line 1726. The tests here have been added for the sortby() hint in particular, and were also only executed for non-legacy syntax. The tests above run for all hint syntaxes. http://gerrit.cloudera.org:8080/#/c/6885/2/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: PS2, Line 508: > Since you're at it, maybe add a simply test with clustered as well. Didn't Good idea, done. http://gerrit.cloudera.org:8080/#/c/6885/2/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS2, Line 328: > Same comment as before. Just make sure we have some coverage for shuffle + They are covered in multiple combinations in insert.test. -- To view, visit http://gerrit.cloudera.org:8080/6885 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5144: Remove sortby() hint
Lars Volker has uploaded a new patch set (#3). Change subject: IMPALA-5144: Remove sortby() hint .. IMPALA-5144: Remove sortby() hint The sortby() hint is superseded by the SORT BY SQL clause, which has been introduced in IMPALA-4166. This changes removes the hint. Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594 --- M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/insert.test M tests/query_test/test_insert_parquet.py 14 files changed, 62 insertions(+), 279 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/6885/3 -- To view, visit http://gerrit.cloudera.org:8080/6885 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs. .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java File fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java: Line 68: if (randomSeed_ != null) { single line? http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 812: result.numRows_ = numRows_; this is a bit weird outside of the context of sampling (why should #rows stay the same for a different set of files?). instead of doing the sampling in the metadata, would it make more sense to do it during scan range computation? http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 1961: List> allFiles = this is an expensive list, and i don't think you need it. you can achieve the same effect with a list that mirrors orderedparts and contains the cumulative # of files up to the preceding partition (e.g., {0, 517, 1120, ...}). you then generate a random sequence of numbers between 0 and #totalfiles-1. this requires space proportional to the sample size, but not the total number of files. you could even return this as a map (the value being the indices into the per-partition list of file descriptors), and have hdfsscannode use that to access the sampled files in the original partitions. http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: Line 132: // Total number of files from the effective partitions (partitions_ or sample_). update comment Line 135: // Total number of bytes from the effective partitions (partitions_ or sample_). same Line 678: cardinality_ = addCardinalities(cardinality_, p.getNumRows()); shouldn't the cardinality reflect the reduction from sampling? if you're sampling, you reduce scan output, which should affect join order, no? http://gerrit.cloudera.org:8080/#/c/6868/3/tests/query_test/test_tablesample.py File tests/query_test/test_tablesample.py: Line 47: # 2. The results of queries without a repeatable clause could be change due to remove "be" -- To view, visit http://gerrit.cloudera.org:8080/6868 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Alex Behm has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: Line 113: boost::scoped_ptr materialized_row_batches_; > If I understood correctly transfer-of-ownership currently is based on uniqu I'll make the change or give a better explanation why we should not do it now. It will take some time. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Alex Behm has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/base-sequence-scanner.h File be/src/exec/base-sequence-scanner.h: Line 86: virtual Status GetNextInternal(RowBatch* row_batch); > Sounds fine - it would be good though to do the follow-on patch before comm No point in waiting then. Added WARN_UNUSED_RESULT to all scanners and the hdfs-relevant scan nodes. http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/hdfs-rcfile-scanner.h File be/src/exec/hdfs-rcfile-scanner.h: Line 404: uint8_t* row_group_buffer_; > I think maybe StringBuffer should be called MemoryBuffer or something like Agree about changing the name/comments of StringBuffer. Leaving as is for now, lmk if you still want to me to change it. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5319: Fix test hdfs scan node errors failures
Jim Apple has posted comments on this change. Change subject: IMPALA-5319: Fix test_hdfs_scan_node_errors failures .. Patch Set 1: > Good idea, thanks. I cancelled that job and started: > http://sandbox.jenkins.cloudera.com/view/Impala/view/Private-Utility/job/impala-private-build-and-test/5609/ People who do not work at Cloudera cannot see this. If you want, you can run exhaustive builds on jenkins.impala.io: http://jenkins.impala.io:8080/view/Utility/job/ubuntu-14.04-from-scratch/parambuild/?EXPLORATION_STRATEGY=exhaustive -- To view, visit http://gerrit.cloudera.org:8080/6891 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" .. Patch Set 1: Questions for reviewers: Do you think we should call hdfsGetLastExceptionRootCause() for all HDFS errors (and not only 255)? In some cases, there may be more useful information, while in most other cases, the error message will be redundant (eg: 'file not found' printed twice). Also, does anyone know of a test case that deterministically reproduces an "Unknown Error 255"? I spent a few hours trying to recreate one, but none seem simple enough to automate into our test suite. -- To view, visit http://gerrit.cloudera.org:8080/6894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5331: Use new libHDFS API to address "Unknown Error 255"
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/6894 Change subject: IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" .. IMPALA-5331: Use new libHDFS API to address "Unknown Error 255" We use the new libHDFS API hdfsGetLastExceptionRootCause() to return the last seen HDFS error on that thread. This patch depends on the recent HDFS commit: https://github.com/apache/hadoop/commit/fda86ef2a32026c02d9b5d4cca1ecb7b4decd872 Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d --- M be/src/util/hdfs-util.cc 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/6894/1 -- To view, visit http://gerrit.cloudera.org:8080/6894 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Alex Behm has posted comments on this change. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: Line 137: static_cast(scan_node_)->AddMaterializedRowBatch(row_batch); > Right, but the transfer of ownership is different between the two cases - I Done http://gerrit.cloudera.org:8080/#/c/6527/2/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: Line 113: boost::scoped_ptr materialized_row_batches_; > If I understood correctly transfer-of-ownership currently is based on uniqu I'll make the change or give a better explanation why we should not do it now. -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes