[jira] [Created] (IMPALA-6233) Document the column definitions list in CREATE VIEW
Alexander Behm created IMPALA-6233: -- Summary: Document the column definitions list in CREATE VIEW Key: IMPALA-6233 URL: https://issues.apache.org/jira/browse/IMPALA-6233 Project: IMPALA Issue Type: Improvement Components: Docs Affects Versions: Impala 2.10.0 Reporter: Alexander Behm Assignee: John Russell Looking at this page: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_create_view.html#create_view It appears we do not have an example for the "columns_list" that shows adding a comment to a column. We should add that. Example:
{code}
create table t1 (c1 int, c2 int);
create view v (x comment 'hello world', y) as select * from t1;
describe v;
+------+------+-------------+
| name | type | comment     |
+------+------+-------------+
| x    | int  | hello world |
| y    | int  |             |
+------+------+-------------+
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IMPALA-6232) Short circuit reads disabled when using Impala HDFS file handle cache
Joe McDonnell created IMPALA-6232: - Summary: Short circuit reads disabled when using Impala HDFS file handle cache Key: IMPALA-6232 URL: https://issues.apache.org/jira/browse/IMPALA-6232 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 2.10.0 Reporter: Joe McDonnell Assignee: Joe McDonnell Priority: Blocker In Impala 2.10, the HDFS file handle cache was enabled by default. However, testing has revealed that in cases where files are overwritten or appended, the file handle can encounter an error that causes HDFS to disable short circuit reads for 10 minutes. See [HDFS-12528|https://issues.apache.org/jira/browse/HDFS-12528]. Due to this performance impact and the associated unpredictability, Impala should disable the file handle cache by default until this issue is resolved. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (IMPALA-6172) KRPC w/ TLS doesn't work on remote clusters after rebase
[ https://issues.apache.org/jira/browse/IMPALA-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sailesh Mukil resolved IMPALA-6172. --- Resolution: Fixed Fix Version/s: Impala 2.11.0 Commit in: https://github.com/apache/incubator-impala/commit/32baa695f499a936b72c5a51ae3649c408aa5a85 > KRPC w/ TLS doesn't work on remote clusters after rebase > > > Key: IMPALA-6172 > URL: https://issues.apache.org/jira/browse/IMPALA-6172 > Project: IMPALA > Issue Type: Sub-task > Components: Security >Reporter: Sailesh Mukil >Assignee: Sailesh Mukil >Priority: Blocker > Labels: broken-build, security > Fix For: Impala 2.11.0 > > > It looks like depending on who initializes OpenSSL (KRPC or us), the behavior > changes. After some cherry-picks, we're unable to run Impala on remote > clusters with TLS with certain certificate types. > We get the following when we use intermediate CAs: > {code:java} > "F1108 10:47:36.532202 93303 impalad-main.cc:79] Could not build messenger: > Runtime error: certificate does not match private key: error:0B080074:x509 > certificate routines:X509_check_private_key:key values > mismatch:x509_cmp.c:331" > {code} > And we get the following when we use self-signed certificates: > "self signed certificate in certificate chain" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (IMPALA-5019) DECIMAL V2 add/sub result type
[ https://issues.apache.org/jira/browse/IMPALA-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taras Bobrovytsky resolved IMPALA-5019. --- Resolution: Fixed Fix Version/s: Impala 2.11.0
{code}
commit bc12a9eb35ff60d7a7e0f6732e9ab6a1d4538f2a
Author: Taras Bobrovytsky
Date: Tue Sep 19 16:23:24 2017 -0700

IMPALA-5019: Decimal V2 addition

In this patch, we implement the new decimal return type rules for
addition expressions. These rules become active when the query option
DECIMAL_V2 is enabled. The algorithm for determining the type of the
result is described in the JIRA.

DECIMAL V1:
+----------------------------------------------------------------+
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
+----------------------------------------------------------------+
| DECIMAL(38,38)                                                 |
+----------------------------------------------------------------+

DECIMAL V2:
+----------------------------------------------------------------+
| typeof(cast(1 as decimal(38,0)) + cast(0.1 as decimal(38,38))) |
+----------------------------------------------------------------+
| DECIMAL(38,6)                                                  |
+----------------------------------------------------------------+

This patch required backend changes. We implement an algorithm where we
handle the whole and fractional parts separately, and then combine them
to get the final result. This is more complex and slower. We try to
avoid this by first checking if the result would fit into an int128.

Testing:
- Added expr tests.
- Tested locally on my machine with a script that generates random
  decimal numbers and checks that Impala adds them correctly.

Performance:
For the common case, performance remains the same.
select cast(2.2 as decimal(18, 1)) + cast(2.2 as decimal(18, 1))
BEFORE: 4.74s AFTER: 4.73s
In this case, we check if it is necessary to do the complex addition,
and it turns out not to be necessary.

We see a slowdown because the result needs to be scaled down by dividing.
select cast(2.2 as decimal(38, 19)) + cast(2.2 as decimal(38, 19))
BEFORE: 1.63s AFTER: 13.57s

In the following case, we take the most complex path and see the most
significant performance hit.
select cast(7.5 as decimal(38,37)) + cast(2.2 as decimal(38,37))
BEFORE: 1.63s AFTER: 20.57s
{code}
> DECIMAL V2 add/sub result type > -- > > Key: IMPALA-5019 > URL: https://issues.apache.org/jira/browse/IMPALA-5019 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.0 >Reporter: Dan Hecht >Assignee: Taras Bobrovytsky > Fix For: Impala 2.11.0 > > > For decimal_v2=true, we should revisit the add/sub result type. Currently, we > set result scale to max(S1, S2) (potentially losing precision). Other > systems (e.g. SQL Server) seem to choose either S1 or S2 depending on whether > digits to the left of the decimal point would be lost. This would require > changes to the backend implementation of add/sub, however. > Currently we compute rP and rS as follows: > {code} > rS = max(s1, s2) > rP = max(s1, s2) + max(p1 - s1, p2 - s2) + 1 > {code} > We currently handle the case where rP > 38 as follows: > {code} > if (rP > 38): > rP = 38 > rS = min(38, rS) > {code} > This basically truncates the digits to the left of the decimal point. > The proposed result under V2 is: > {code} > if (rP > 38): > minS = min(rS, 6) > rS = rS - (rP - 38) > rS = max(minS, rS) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (IMPALA-6200) Flakiness in Planner Tests
[ https://issues.apache.org/jira/browse/IMPALA-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zach Amsden closed IMPALA-6200. --- Resolution: Fixed I'm gonna go ahead and close this as a dup. We can continue investigation in IMPALA-3887 > Flakiness in Planner Tests > -- > > Key: IMPALA-6200 > URL: https://issues.apache.org/jira/browse/IMPALA-6200 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.11.0 >Reporter: Taras Bobrovytsky >Assignee: Zach Amsden >Priority: Blocker > Labels: broken-build, flaky > > Sometimes we are seeing random small variations in Planner tests, which > causes builds to be flaky. > Actual: > {code} > F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | Per-Host Resources: mem-estimate=48.00MB mem-reservation=0B > ^^ > WRITE TO HDFS [functional.alltypes, OVERWRITE=false, PARTITION-KEYS=(CAST(3 + > year AS INT),CAST(month - -1 AS INT))] > | partitions=4 > | mem-estimate=1.56KB mem-reservation=0B > {code} > Expected: > {code} > F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B > WRITE TO HDFS [functional.alltypes, OVERWRITE=false, PARTITION-KEYS=(CAST(3 + > year AS INT),CAST(month - -1 AS INT))] > | partitions=4 > | mem-estimate=1.56KB mem-reservation=0B > {code} > Actual: > {code} > | F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2 > ^ > | Per-Host Resources: mem-estimate=16.00MB mem-reservation=0B > | 01:SCAN HDFS [functional_parquet.alltypestiny, RANDOM] > | partitions=4/4 files=4 size=9.75KB > | stats-rows=unavailable extrapolated-rows=disabled > | table stats: rows=unavailable size=unavailable > | column stats: unavailable > | mem-estimate=16.00MB mem-reservation=0B > | tuple-ids=1 row-size=88B cardinality=unavailable > {code} > Expected: > {code} > | F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 > | Per-Host Resources: mem-estimate=16.00MB mem-reservation=0B > | 01:SCAN HDFS 
[functional_parquet.alltypestiny, RANDOM] > | partitions=4/4 files=4 size=10.48KB > | stats-rows=unavailable extrapolated-rows=disabled > | table stats: rows=unavailable size=unavailable > | column stats: unavailable > | mem-estimate=16.00MB mem-reservation=0B > | tuple-ids=1 row-size=88B cardinality=unavailable > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (IMPALA-3436) Round(double, int) should return decimal
[ https://issues.apache.org/jira/browse/IMPALA-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taras Bobrovytsky closed IMPALA-3436. - Resolution: Won't Fix [~dhecht], I agree that doubles in general have problems (such as there being different ways to represent the same double). If customers want to avoid these problems, they should avoid using the double type. Also, as Tim mentioned, they can still get correct rounding behavior by casting to decimal first. I created IMPALA-6230 to keep track of the required round() changes. > Round(double, int) should return decimal > > > Key: IMPALA-3436 > URL: https://issues.apache.org/jira/browse/IMPALA-3436 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.5.0 >Reporter: Tim Armstrong >Assignee: Taras Bobrovytsky >Priority: Minor > Labels: compatibility, usability > > Impala has several versions of round: round(double a), round(double a, int > d), round(decimal a, int_type d) > round(double a) returns a BIGINT, which makes sense because it rounds to the > nearest int. > round(decimal a, int_type d) returns a DECIMAL, which makes sense because it > rounds to a decimal digit. > round(double a, int d) predates DECIMAL support, so it returns a DOUBLE. It > is specified to return the nearest double value. > E.g. round(cast(1 as DOUBLE) / 10, 1) returns the binary floating point value > closest to 0.1. This number has no exact decimal representation. Both > 0.100 and 0.1555 are valid decimal > representations of this floating point number. I.e. if you convert them back > to float, you will get the same number. > This is correct according to floating point conversion rules and the Impala > documentation, but it is confusing for two reasons: > * round() returning a double is a little surprising, because it can't > precisely represent the result > * Impala clients can display the floating-point result in multiple *valid* > ways. 
Different clients have different algorithms for converting > floating-point to decimal for display, so even if Impala returns the same > result it may appear as 0.1 in one client and 0.1555. We > don't specify that clients have to use a particular algorithm, so it's valid > as long as it converts back to the same float as part of a round-trip. > We should consider changing the spec of round() in Impala to always return a > decimal to avoid this confusion. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
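The "multiple valid representations" point is easy to reproduce outside Impala; here is a Python illustration (Python's own float/Decimal machinery, used only to show the round-trip behavior described above):

```python
from decimal import Decimal

# The double closest to 0.1 is not exactly 0.1; Decimal(0.1) shows
# its exact value, a 55-digit decimal expansion.
exact = Decimal(0.1)

# Both the short string "0.1" and the full expansion round-trip to the
# very same double, so a client may legitimately display either one.
assert float("0.1") == float(str(exact))
```

This is exactly why two clients can print the same returned double differently while both remain correct under round-trip conversion rules.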
[jira] [Resolved] (IMPALA-5624) ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
[ https://issues.apache.org/jira/browse/IMPALA-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-5624. - Resolution: Done Fix Version/s: Impala 2.11.0 IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo Running shell commands from impalad can be problematic, because using popen() leads to forking, which causes a spike in virtual memory. To avoid this, "ls" is replaced with POSIX API calls. FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so it was unnecessary work to initialize it. It is removed, and only the number of file descriptors is computed. The automatic test for this function is only a sanity check, because there is no way to know the "expected value" in advance, and the number of file descriptors can change at any time. Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0 Reviewed-on: http://gerrit.cloudera.org:8080/8546 Reviewed-by: Lars Volker Tested-by: Impala Public Jenkins > ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process > > > Key: IMPALA-5624 > URL: https://issues.apache.org/jira/browse/IMPALA-5624 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.10.0 >Reporter: Tim Armstrong >Assignee: Csaba Ringhofer > Fix For: Impala 2.11.0 > > > Forking processes from the Impala daemon after startup is problematic because > of the spike in virtual memory it causes (see IMPALA-2294). We should avoid > doing this in ProcessStateInfo::ReadProcFileDescriptorInfo(), which is > invoked from the web server debug pages. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
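The fork-free approach in the fix, sketched in Python (the function name and default path are illustrative; the actual change is in Impala's C++ backend using opendir()/readdir()):

```python
import os

def count_open_fds(fd_dir="/proc/self/fd"):
    """Count entries in a /proc fd directory with a plain directory
    read (the opendir()/readdir() route) instead of popen("ls -l ..."),
    which would fork and briefly double the process's virtual memory."""
    return len(os.listdir(fd_dir))
```

As the commit notes, a test of this can only be a sanity check: the number of open descriptors can change between the call and the assertion.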
[jira] [Created] (IMPALA-6231) Do some fuzz testing of decimal v2 operations
Taras Bobrovytsky created IMPALA-6231: - Summary: Do some fuzz testing of decimal v2 operations Key: IMPALA-6231 URL: https://issues.apache.org/jira/browse/IMPALA-6231 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 2.11.0 Reporter: Taras Bobrovytsky Assignee: Taras Bobrovytsky After all decimal v2 patches go in, we need to develop and run a fuzz tester that checks the correctness of decimal v2 operations. For example, the fuzzer could generate two random decimals, add them together, and verify that the result is correct. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
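One possible shape for such a fuzzer, sketched in Python with `decimal.Decimal` as the reference oracle (all names here are illustrative; a real harness would send each generated pair to Impala and compare the fetched result against the oracle):

```python
import random
from decimal import Decimal, getcontext

getcontext().prec = 50  # oracle precision comfortably above DECIMAL(38)

def random_decimal(precision=38, scale=6):
    """A random value that fits DECIMAL(precision, scale)."""
    digits = random.randint(1, precision)
    unscaled = random.randint(-(10 ** digits - 1), 10 ** digits - 1)
    return Decimal(unscaled).scaleb(-scale)

def check_addition(a, b, result_under_test):
    """Compare a result (e.g. one fetched from Impala) with the oracle."""
    return a + b == result_under_test

# In a real run, result_under_test would come from an Impala query;
# here we only sanity-check the oracle against itself.
a, b = random_decimal(), random_decimal()
assert check_addition(a, b, a + b)
```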
[jira] [Created] (IMPALA-6230) The output type of a round() function should match the input type
Taras Bobrovytsky created IMPALA-6230: - Summary: The output type of a round() function should match the input type Key: IMPALA-6230 URL: https://issues.apache.org/jira/browse/IMPALA-6230 Project: IMPALA Issue Type: Bug Components: Backend, Frontend Affects Versions: Impala 2.10.0 Reporter: Taras Bobrovytsky Assignee: Taras Bobrovytsky At the next compatibility-breaking version we should revisit the output types of the round() functions. To match the behavior of most other database systems, the output type of the round() functions should be the same as the input type. For example, today round(double) returns a bigint; we should return a double instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
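A tiny Python illustration of the proposed convention (`round_preserving_type` is a hypothetical helper used only to make the rule concrete, not an Impala function):

```python
def round_preserving_type(x, digits=0):
    """Round but hand back the same type that came in, mirroring the
    proposed rule that round(DOUBLE) yields DOUBLE rather than BIGINT."""
    return type(x)(round(x, digits))

assert isinstance(round_preserving_type(2.6), float)  # double in, double out
assert isinstance(round_preserving_type(7), int)      # integer in, integer out
```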
[jira] [Resolved] (IMPALA-6054) Parquet dictionary pages should be freed on dictionary construction
[ https://issues.apache.org/jira/browse/IMPALA-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-6054. - Resolution: Done Fix Version/s: Impala 2.11.0 IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary construction, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe to delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in the non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Reviewed-on: http://gerrit.cloudera.org:8080/8436 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Parquet dictionary pages should be freed on dictionary construction > --- > > Key: IMPALA-6054 > URL: https://issues.apache.org/jira/browse/IMPALA-6054 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.10.0 >Reporter: Joe McDonnell >Assignee: Csaba Ringhofer >Priority: Minor > Labels: resource-management > Fix For: Impala 2.11.0 > > > The Parquet scanner uses the dictionary_pool_ to allocate memory for the > dictionary page (see BaseScalarColumnReader::InitDictionary()). This > dictionary page is used to initialize the dictionary in > CreateDictionaryDecoder(). The resulting dictionary is a vector of values. > For some datatypes, such as strings, the resulting dictionary has an array of > StringValue's that contain pointers into the dictionary page (see the > StringValue specialization in ParquetPlainEncoder::Decode()). In this case, > the dictionary page must be kept and attached to the last row batch that > references it. 
However, for other datatypes, the values are copied into the > dictionary and the dictionary page is no longer needed after the dictionary > is constructed. > Currently, these dictionary pages remain in the dictionary_pool_ and are > attached to the last row batch to be passed to other ExecNodes (see > FlushRowGroupResources()). This should only pass StringValue dictionary pages > (or other types that point to data in the page) on the row batch. The other > types should be freed immediately once the dictionary has been constructed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
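The freeing decision described above boils down to one predicate, sketched here in Python (the type names are illustrative stand-ins, not Impala's actual classes; the assumption is that only string-like values keep pointers into the page):

```python
# Assumption: only string-like dictionary values point back into the page.
POINTS_INTO_PAGE = {"STRING", "VARCHAR", "CHAR"}

def can_free_dictionary_page(column_type):
    """True when dictionary construction copied the values out, so the
    page can be freed immediately instead of riding along attached to
    the last row batch that references it."""
    return column_type.upper() not in POINTS_INTO_PAGE

assert can_free_dictionary_page("int")
assert not can_free_dictionary_page("string")
```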
[jira] [Created] (IMPALA-6229) A different test results between test/run-tests.py and impala-py.test
Jinchul Kim created IMPALA-6229: --- Summary: A different test results between test/run-tests.py and impala-py.test Key: IMPALA-6229 URL: https://issues.apache.org/jira/browse/IMPALA-6229 Project: IMPALA Issue Type: Bug Components: Infrastructure Reporter: Jinchul Kim Priority: Minor I copied this from the dev mailing list because it might be an infra issue. I am trying to look into the build error: https://jenkins.impala.io/job/gerrit-verify-dryrun/1472/ (The relevant code change: https://gerrit.cloudera.org/#/c/8355/13). You can reproduce the issue using the patch set. There was a test failure in "TestAllocFail.test_alloc_fail_init". I ran the following command, but it always passed on my change: ./tests/run-tests.py tests/custom_cluster/test_alloc_fail.py The command below with "run-tests.py" looks fine to me because it shows the tests finishing successfully. I guess it shows a false positive. If "tests/custom_cluster/test_alloc_fail.py" cannot run with "run-tests.py", the test should finish with an error due to the incompatibility. Would you please check two things? 1. The false positive error 2. Does "tests/custom_cluster/test_alloc_fail.py" run with "run-tests.py"? $ ./tests/run-tests.py tests/custom_cluster/test_alloc_fail.py ... = test session starts == platform linux2 -- Python 2.7.12, pytest-2.9.2, py-1.4.32, pluggy-0.3.1 -- /home/jinchulkim/Impala/bin/../infra/python/env/bin/python cachedir: .cache rootdir: /home/jinchulkim/Impala/tests, inifile: pytest.ini plugins: xdist-1.15.0, random-0.2 collected 2 items verifiers/test_verify_metrics.py::TestValidateMetrics::test_metrics_are_zero PASSED verifiers/test_verify_metrics.py::TestValidateMetrics::test_num_unused_buffers PASSED === 2 passed in 0.11 seconds === -- This message was sent by Atlassian JIRA (v6.4.14#64029)