[jira] [Commented] (IMPALA-10308) Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with ASAN build

2020-11-02 Thread WangSheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225068#comment-17225068
 ] 

WangSheng commented on IMPALA-10308:


Hi [~sql_forever], if you want to verify a specific test, you need to create 
these test tables manually before executing it. Alternatively, you can run 
{{$IMPALA_HOME/bin/run-all-tests.sh}} to execute the whole Impala test suite. 
In that case the Impala server creates the test tables automatically: all DDL 
statements in functional_schema_template.sql are executed before the tests 
run. For more details about Impala tests, see: 
https://cwiki.apache.org/confluence/display/IMPALA/How+to+load%2C+run%2C+and+create+new+Impala+tests

> Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with 
> ASAN build
> 
>
> Key: IMPALA-10308
> URL: https://issues.apache.org/jira/browse/IMPALA-10308
> Project: IMPALA
> Issue Type: Bug
> Reporter: Qifan Chen
> Priority: Major
>
> The following error was seen when running the scanner test against the ASAN 
> build.
> {code:java}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: 
> E    MESSAGE: AnalysisException: Failed to load metadata for table: 'iceberg_partitioned'
> E   CAUSED BY: TableLoadingException: Error loading metadata for Iceberg table hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned
> E   CAUSED BY: IllegalArgumentException: Can not create a Path from a null string
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> [gw2] linux2 -- Python 2.7.16 
> /home/qchen/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scanners.py:357: in test_iceberg_query
> self.run_test_case('QueryTest/iceberg-query', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:920: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:363: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:357: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:520: in __do_rpc
> {code}
> To reproduce, apply the following steps.
> {code:java}
> 1. Build: ${IMPALA_HOME}/buildall.sh -skiptests -ninja -asan
> 2. Run test: 
> cd ${IMPALA_HOME}
> tests/run-tests.py --exploration_strategy=exhaustive tests/query_test/test_scanners.py
> {code}
> Branch info.
> The master branch of https://github.com/apache/impala.git. HEAD points 
> at 193c2e773fa9f6772e4a7c30ed3a4f75029863f1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9180) Remove legacy ImpalaInternalService

2020-11-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225053#comment-17225053
 ] 

ASF subversion and git services commented on IMPALA-9180:
-

Commit 1af60a15605463ab4ba00d5326d130d0a3165821 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1af60a1 ]

IMPALA-9180 (part 3): Remove legacy backend port

The legacy Thrift based Impala internal service has been removed so
the backend port 22000 can be freed up.

This patch marks the flag be_port as a REMOVED_FLAG and cleans up all the
infrastructure around it. StatestoreSubscriber::subscriber_id is now set
to hostname + krpc_port.

Testing:
 - Passed the exhaustive test.

Change-Id: Ic6909a8da449b4d25ee98037b3eb459af4850dc6
Reviewed-on: http://gerrit.cloudera.org:8080/16533
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 


> Remove legacy ImpalaInternalService
> ---
>
> Key: IMPALA-9180
> URL: https://issues.apache.org/jira/browse/IMPALA-9180
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.4.0
> Reporter: Michael Ho
> Assignee: Wenzhe Zhou
> Priority: Minor
>
> Now that IMPALA-7984 is done, the legacy Thrift based Impala internal service 
> can now be removed. The port 22000 can also be freed up. In addition to code 
> change, the doc probably needs to be updated to reflect the fact that 22000 
> is no longer in use.






[jira] [Commented] (IMPALA-9767) ASAN crash during coordinator runtime filter updates

2020-11-02 Thread Joe McDonnell (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225025#comment-17225025
 ] 

Joe McDonnell commented on IMPALA-9767:
---

[~fangyurao] My thought is that we may not need to clear the 
bloom_filter_directory_ there. If we are transitioning to a terminal state, 
then I think it will be freed by Coordinator::ReleaseExecResources(), which 
waits for publishing filters to complete.

[https://github.com/apache/impala/blob/master/be/src/runtime/coordinator.cc#L744]

[https://github.com/apache/impala/blob/master/be/src/runtime/coordinator.cc#L1259-L1260]

I'm not very familiar with this code, so take it with a grain of salt.

> ASAN crash during coordinator runtime filter updates
> 
>
> Key: IMPALA-9767
> URL: https://issues.apache.org/jira/browse/IMPALA-9767
> Project: IMPALA
> Issue Type: Bug
> Reporter: Sahil Takiar
> Assignee: Fang-Yu Rao
> Priority: Major
> Labels: asan, broken-build, crash
> Attachments: consoleFull_asan_939.txt
>
>
> ASAN crash output:
> {code:java}
> Error Message
> Address Sanitizer message detected in /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/ee_tests/impalad.ERROR
> Standard Error
> ==4808==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f6288cbe818 at pc 0x0199f6fe bp 0x7f63c1a8b270 sp 0x7f63c1a8aa20
> READ of size 1048576 at 0x7f6288cbe818 thread T73 (rpc reactor-552)
> #0 0x199f6fd in read_iovec(void*, __sanitizer::__sanitizer_iovec*, 
> unsigned long, unsigned long) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904
> #1 0x19a1f57 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, 
> long) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781
> #2 0x19a46c3 in __interceptor_sendmsg 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796
> #3 0x372034d in kudu::Socket::Writev(iovec const*, int, long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3
> #4 0x331c095 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26
> #5 0x3324da1 in kudu::rpc::Connection::WriteHandler(ev::io&, int) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31
> #6 0x52ca4e2 in ev_invoke_pending 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52ca4e2)
> #7 0x32aeadc in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3
> #8 0x52cdb03 in ev_run 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52cdb03)
> #9 0x32aecd1 in kudu::rpc::ReactorThread::RunThread() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9
> #10 0x32c08db in boost::_bi::bind_t kudu::rpc::ReactorThread>, 
> boost::_bi::list1 > 
> >::operator()() 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
> #11 0x2148c26 in boost::function0::operator()() const 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
> #12 0x2144b29 in kudu::Thread::SuperviseThread(void*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3
> #13 0x7f6c0bcf4e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
> #14 0x7f6c0885834c in __clone (/lib64/libc.so.6+0xf834c)
> 0x7f6288cbe818 is located 24 bytes inside of 1052640-byte region 
> [0x7f6288cbe800,0x7f6288dbf7e0)
> freed by thread T114 here:
> #0 0x1a773e0 in operator delete(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_new_delete.cc:137
> #1 0x7f6c090faed3 in __gnu_cxx::new_allocator::deallocate(char*, 
> unsigned long) 
> /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:110
> #2 0x7f6c090faed3 in std::string::_Rep::_M_destroy(std::allocator 
> const&) 
> /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:449
> #3 0x7f6c090faed3 in std::string::_Rep::_M_dispose(std::allocator 
> const&) 
> /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:249
> #4 0x7f6c090faed3 in 

[jira] [Commented] (IMPALA-10007) Impala development environment does not support Ubuntu 20.4

2020-11-02 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224962#comment-17224962
 ] 

Qifan Chen commented on IMPALA-10007:
-

Just changed the status. Sorry about it. 

> Impala development environment does not support Ubuntu 20.4
> ---
>
> Key: IMPALA-10007
> URL: https://issues.apache.org/jira/browse/IMPALA-10007
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Reporter: Qifan Chen
> Assignee: Qifan Chen
> Priority: Minor
> Fix For: Impala 4.0
>
>
> The Impala development environment supports Ubuntu up to 18.4.  When trying 
> the environment on Ubuntu 20.4, one can get the following errors.
>  
> From ${IMPALA_HOME}/buildall.sh:
> Exception: Could not find package label for OS version: ubuntu20.04.
>  
>  






[jira] [Comment Edited] (IMPALA-10007) Impala development environment does not support Ubuntu 20.4

2020-11-02 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224962#comment-17224962
 ] 

Qifan Chen edited comment on IMPALA-10007 at 11/2/20, 9:17 PM:
---

Just changed the status to resolved. Sorry about it. 


was (Author: sql_forever):
Just changed the status. Sorry about it. 

> Impala development environment does not support Ubuntu 20.4
> ---
>
> Key: IMPALA-10007
> URL: https://issues.apache.org/jira/browse/IMPALA-10007
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Reporter: Qifan Chen
> Assignee: Qifan Chen
> Priority: Minor
> Fix For: Impala 4.0
>
>
> The Impala development environment supports Ubuntu up to 18.4.  When trying 
> the environment on Ubuntu 20.4, one can get the following errors.
>  
> From ${IMPALA_HOME}/buildall.sh:
> Exception: Could not find package label for OS version: ubuntu20.04.
>  
>  






[jira] [Resolved] (IMPALA-10007) Impala development environment does not support Ubuntu 20.4

2020-11-02 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen resolved IMPALA-10007.
-
Resolution: Fixed

> Impala development environment does not support Ubuntu 20.4
> ---
>
> Key: IMPALA-10007
> URL: https://issues.apache.org/jira/browse/IMPALA-10007
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Reporter: Qifan Chen
> Assignee: Qifan Chen
> Priority: Minor
> Fix For: Impala 4.0
>
>
> The Impala development environment supports Ubuntu up to 18.4.  When trying 
> the environment on Ubuntu 20.4, one can get the following errors.
>  
> From ${IMPALA_HOME}/buildall.sh:
> Exception: Could not find package label for OS version: ubuntu20.04.
>  
>  






[jira] [Commented] (IMPALA-10007) Impala development environment does not support Ubuntu 20.4

2020-11-02 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224910#comment-17224910
 ] 

Tim Armstrong commented on IMPALA-10007:


[~sql_forever] can we resolve this now?

> Impala development environment does not support Ubuntu 20.4
> ---
>
> Key: IMPALA-10007
> URL: https://issues.apache.org/jira/browse/IMPALA-10007
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Reporter: Qifan Chen
> Assignee: Qifan Chen
> Priority: Minor
> Fix For: Impala 4.0
>
>
> The Impala development environment supports Ubuntu up to 18.4.  When trying 
> the environment on Ubuntu 20.4, one can get the following errors.
>  
> From ${IMPALA_HOME}/buildall.sh:
> Exception: Could not find package label for OS version: ubuntu20.04.
>  
>  






[jira] [Updated] (IMPALA-7572) Put remote filesystems in pre-merge testing

2020-11-02 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7572:
--
Priority: Minor  (was: Major)

> Put remote filesystems in pre-merge testing
> ---
>
> Key: IMPALA-7572
> URL: https://issues.apache.org/jira/browse/IMPALA-7572
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 3.0
> Reporter: Jim Apple
> Priority: Minor
>
> https://gerrit.cloudera.org/#/c/11435/ revealed that a patch can pass 
> pre-merge testing but fail on S3 or HDFS with erasure coding. We should have 
> fake versions of filesystems like these (and ADLS) that run during pre-merge 
> testing in order to find these types of failures earlier.






[jira] [Resolved] (IMPALA-3637) Merge codegen constant replacement mechanisms

2020-11-02 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3637.
---
Resolution: Later

> Merge codegen constant replacement mechanisms
> -
>
> Key: IMPALA-3637
> URL: https://issues.apache.org/jira/browse/IMPALA-3637
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.6.0
> Reporter: Tim Armstrong
> Priority: Minor
> Labels: codegen
>
> We currently have two similar ways to replace constants in codegen'd code: 
> Expr::GetConstant() and LlvmCodeGen::ReplaceCallSitesWithBoolConst(). We 
> should merge them so that we have a single mechanism with the functionality 
> of both.
> E.g.
> A version that takes a map where the key is a symbol and the value is a 
> constant, or a vector of constants:
> ReplaceCallSitesWithConstants(Function* map, *)
> We could then avoid the expensive Expr::GetConstant() call on the interpreted 
> path.






[jira] [Comment Edited] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-11-02 Thread Abhishek Rawat (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224873#comment-17224873
 ] 

Abhishek Rawat edited comment on IMPALA-7876 at 11/2/20, 6:33 PM:
--

The core issue here is that the child query computing num_rows (table 
stats) uses the ROUND function, which returns its result as a *DECIMAL* type. 
For example:
{code:java}
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10) 
REPEATABLE(1598511315168){code}
When setting the table stats, the CatalogOpExecutor expects the data type to be 
*BIGINT*.

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]

This used to work in the past because ROUND used to return results as type 
BIGINT.

This behavior was later changed for the better in this 
[commit|https://github.com/apache/impala/commit/8fec1911e52e40aff4cc1de17265bd6803cb13f5]

There are a couple of ways to fix this issue. I am leaning towards a fix that 
adds a *CAST AS BIGINT* to the generated SQL for the child query, since 
num_rows should be a BIGINT.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]

Also, it is probably best to fix this in the child query's SQL, rather than 
adding implicit casts elsewhere in the code.
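
The proposed fix can be sketched roughly as follows. This is only an illustration in Python, not the actual code in ComputeStatsStmt.java: the helper name build_row_count_query and its parameters are hypothetical. It shows how wrapping the ROUND expression in a cast would make the generated child query return a BIGINT instead of a DECIMAL.

```python
def build_row_count_query(table: str, sample_pct: int, seed: int,
                          effective_ratio: float) -> str:
    """Build a sampled row-count query whose result is cast to BIGINT.

    Hypothetical helper for illustration only. ROUND() returns DECIMAL for
    a DECIMAL argument, so the expression is wrapped in CAST(... AS BIGINT)
    to match the type the stats consumer expects.
    """
    # Extrapolate the sampled count to the full table, then force BIGINT.
    row_count_expr = f"CAST(ROUND(COUNT(*) / {effective_ratio}) AS BIGINT)"
    return (
        f"SELECT {row_count_expr} FROM {table} "
        f"TABLESAMPLE SYSTEM({sample_pct}) REPEATABLE({seed})"
    )


if __name__ == "__main__":
    # Values taken from the example query in the comment above.
    print(build_row_count_query("t1", 10, 1598511315168, 0.8935390115))
```

Keeping the cast in the generated SQL means the type is fixed at the point where the query is built, so nothing downstream has to convert a DECIMAL result.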

 


was (Author: arawat):
The core issue here is that the child query computing the num_rows (table 
stats) uses ROUND function which returns the results as a *DECIMAL* type. Eg. 
below.
{code:java}
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10) 
REPEATABLE(1598511315168){code}
The CatalogOpExecutor when setting the table stats expects the data type to be 
*BIGINT*.

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]

This used to work in the past because ROUND used to return results as type 
BIGINT.

This behavior was later changed for the better in this 
[commit|http://https//github.com/apache/impala/commit/8fec1911e52e40aff4cc1de17265bd6803cb13f5]

There are couple of ways to fix this issue. I am leaning towards a fix which 
will add a *CAST as BIGINT* in the generated SQL for the child query, since 
num_rows should be a BIGINT.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]

Also, probably best to fix this in the child query's sql, rather than adding 
implicit casts else where in the code.

 

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> --
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Andre Araujo
> Assignee: Abhishek Rawat
> Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
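> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+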
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table 
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
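> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+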
> Fetched 1 row(s) in 0.01s
> {code}






[jira] [Comment Edited] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-11-02 Thread Abhishek Rawat (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224873#comment-17224873
 ] 

Abhishek Rawat edited comment on IMPALA-7876 at 11/2/20, 6:29 PM:
--

The core issue here is that the child query computing the num_rows (table 
stats) uses ROUND function which returns the results as a *DECIMAL* type. Eg. 
below.
{code:java}
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10) 
REPEATABLE(1598511315168){code}
The CatalogOpExecutor when setting the table stats expects the data type to be 
*BIGINT*.

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]

This used to work in the past because ROUND used to return results as type 
BIGINT.

This behavior was later changed for the better in this 
[commit|http://https//github.com/apache/impala/commit/8fec1911e52e40aff4cc1de17265bd6803cb13f5]

There are couple of ways to fix this issue. I am leaning towards a fix which 
will add a *CAST as BIGINT* in the generated SQL for the child query, since 
num_rows should be a BIGINT.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]

Also, probably best to fix this in the child query's sql, rather than adding 
implicit casts else where in the code.

 


was (Author: arawat):
The core issue here is that the child query computing the num_rows (table 
stats) uses ROUND function which returns the results as a *DECIMAL* type. Eg. 
below.
{code:java}
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10) 
REPEATABLE(1598511315168){code}
The CatalogOpExecutor when setting the table stats expects the data type to be 
*BIGINT*.

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]

This used to work in the past because ROUND used to return results as type 
BIGINT.

This behavior was later changed for the better in this 
[commit|http://mpala-6230%2C%20impala-6468:%20Fix%20the%20output%20type%20of%20round()%20and%20related%20fns/].

There are couple of ways to fix this issue. I am leaning towards a fix which 
will add a *CAST as BIGINT* in the generated SQL for the child query, since 
num_rows should be a BIGINT.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]

Also, probably best to fix this in the child query's sql, rather than adding 
implicit casts else where in the code.

 

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> --
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Andre Araujo
> Assignee: Abhishek Rawat
> Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
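> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+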
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table 
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
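> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+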
> Fetched 1 row(s) in 0.01s
> {code}






[jira] [Comment Edited] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-11-02 Thread Abhishek Rawat (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224873#comment-17224873
 ] 

Abhishek Rawat edited comment on IMPALA-7876 at 11/2/20, 6:28 PM:
--

The core issue here is that the child query computing the num_rows (table 
stats) uses ROUND function which returns the results as a *DECIMAL* type. Eg. 
below.
{code:java}
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10) 
REPEATABLE(1598511315168){code}
The CatalogOpExecutor when setting the table stats expects the data type to be 
*BIGINT*.

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]

This used to work in the past because ROUND used to return results as type 
BIGINT.

This behavior was later changed for the better in this 
[commit|http://mpala-6230%2C%20impala-6468:%20Fix%20the%20output%20type%20of%20round()%20and%20related%20fns/].

There are couple of ways to fix this issue. I am leaning towards a fix which 
will add a *CAST as BIGINT* in the generated SQL for the child query, since 
num_rows should be a BIGINT.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]

Also, probably best to fix this in the child query's sql, rather than adding 
implicit casts else where in the code.

 


was (Author: arawat):
The core issue here is that the child query computing the num_rows (table 
stats) uses ROUND function which returns the results as a *DECIMAL* type. Eg. 
below.
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10) 
REPEATABLE(1598511315168)
The CatalogOpExecutor when setting the table stats expects the data type to be 
*BIGINT*.

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]

This used to work in the past because ROUND used to return results as type 
BIGINT.

This behavior was later changed for the better in this 
[commit|http://mpala-6230%2C%20impala-6468:%20Fix%20the%20output%20type%20of%20round()%20and%20related%20fns/].

There are couple of ways to fix this issue. I am leaning towards a fix which 
will add a *CAST as BIGINT* in the generated SQL for the child query, since 
num_rows should be a BIGINT.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]

Also, probably best to fix this in the child query's sql, rather than adding 
implicit casts else where in the code.

 

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> --
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Andre Araujo
> Assignee: Abhishek Rawat
> Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
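> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+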
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table 
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
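> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+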
> Fetched 1 row(s) in 0.01s
> {code}






[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-11-02 Thread Abhishek Rawat (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224873#comment-17224873
 ] 

Abhishek Rawat commented on IMPALA-7876:


The core issue here is that the child query computing the num_rows (table 
stats) uses ROUND function which returns the results as a *DECIMAL* type. Eg. 
below.
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10) 
REPEATABLE(1598511315168)
The CatalogOpExecutor when setting the table stats expects the data type to be 
*BIGINT*.

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]

[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]

This used to work in the past because ROUND used to return results as type 
BIGINT.

This behavior was later changed for the better in this 
[commit|http://mpala-6230%2C%20impala-6468:%20Fix%20the%20output%20type%20of%20round()%20and%20related%20fns/].

There are couple of ways to fix this issue. I am leaning towards a fix which 
will add a *CAST as BIGINT* in the generated SQL for the child query, since 
num_rows should be a BIGINT.

[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]

Also, probably best to fix this in the child query's sql, rather than adding 
implicit casts else where in the code.

 

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> --
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Andre Araujo
> Assignee: Abhishek Rawat
> Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
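> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+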
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table 
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
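> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+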
> Fetched 1 row(s) in 0.01s
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

2020-11-02 Thread Abhishek Rawat (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7876 started by Abhishek Rawat.
--



[jira] [Commented] (IMPALA-10306) [DOC] Extend FROM_UNIXTIME() doc with Timezone offset behaviour.

2020-11-02 Thread shajini thayasingh (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224841#comment-17224841
 ] 

shajini thayasingh commented on IMPALA-10306:
-

[~gaborkaszab] I assigned this ticket to myself and took care of the changes 
you requested.

> [DOC] Extend FROM_UNIXTIME() doc with Timezone offset behaviour.
> 
>
> Key: IMPALA-10306
> URL: https://issues.apache.org/jira/browse/IMPALA-10306
> Project: IMPALA
> Issue Type: Bug
> Components: Docs
> Reporter: Gabor Kaszab
> Assignee: shajini thayasingh
> Priority: Major
>
> FROM_UNIXTIME() accepts a format parameter, a string that specifies how the 
> function should format its output timestamp. This format parameter can 
> contain a timezone offset; however, even if we provide a TZ offset in the 
> format parameter, it won't be included in the result.
> The reason is that Impala stores Timestamp without timezone in UTC and has no 
> information about the timezone offset.
> I think it would be nice to clarify this in the docs so that users won't 
> expect to get specific timezone offsets from this function as a result.
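
The effect described in this ticket can be illustrated with an analogy outside Impala. The sketch below uses Python's datetime module, not Impala's FROM_UNIXTIME(), so it is an analogy under that assumption rather than a reproduction: a timestamp value that carries no timezone has nothing to substitute for an offset field in a format string.

```python
from datetime import datetime, timezone

# Build a timezone-less ("naive") timestamp holding the UTC wall-clock time
# for Unix time 0, loosely mirroring a timestamp stored without a timezone.
aware = datetime.fromtimestamp(0, tz=timezone.utc)
naive = aware.replace(tzinfo=None)

# %z asks for the UTC offset. The naive value has none, so the field is
# rendered as an empty string; the timezone-aware value renders +0000.
naive_text = naive.strftime("%Y-%m-%d %H:%M:%S%z")
aware_text = aware.strftime("%Y-%m-%d %H:%M:%S%z")

print(repr(naive_text))
print(repr(aware_text))
```

In both cases the format string is identical; only the presence of timezone information in the value decides whether an offset can appear in the output.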






[jira] [Assigned] (IMPALA-10306) [DOC] Extend FROM_UNIXTIME() doc with Timezone offset behaviour.

2020-11-02 Thread shajini thayasingh (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shajini thayasingh reassigned IMPALA-10306:
---

Assignee: shajini thayasingh




[jira] [Commented] (IMPALA-9767) ASAN crash during coordinator runtime filter updates

2020-11-02 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224806#comment-17224806
 ] 

Fang-Yu Rao commented on IMPALA-9767:
-

Thanks [~joemcdonnell]! I think the scenario you described is possible! I will 
take a much closer look at this loop and try to see if we need to use a lock to 
prevent a state change in this loop. Will get back to you if I have any 
questions.

> ASAN crash during coordinator runtime filter updates
> 
>
> Key: IMPALA-9767
> URL: https://issues.apache.org/jira/browse/IMPALA-9767
> Project: IMPALA
> Issue Type: Bug
> Reporter: Sahil Takiar
> Assignee: Fang-Yu Rao
> Priority: Major
> Labels: asan, broken-build, crash
> Attachments: consoleFull_asan_939.txt
>
>
> ASAN crash output:
> {code:java}
> Error Message
> Address Sanitizer message detected in 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/ee_tests/impalad.ERROR
> Standard Error
> ==4808==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x7f6288cbe818 at pc 0x0199f6fe bp 0x7f63c1a8b270 sp 0x7f63c1a8aa20
> READ of size 1048576 at 0x7f6288cbe818 thread T73 (rpc reactor-552)
> #0 0x199f6fd in read_iovec(void*, __sanitizer::__sanitizer_iovec*, 
> unsigned long, unsigned long) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904
> #1 0x19a1f57 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, 
> long) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781
> #2 0x19a46c3 in __interceptor_sendmsg 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796
> #3 0x372034d in kudu::Socket::Writev(iovec const*, int, long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3
> #4 0x331c095 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26
> #5 0x3324da1 in kudu::rpc::Connection::WriteHandler(ev::io&, int) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31
> #6 0x52ca4e2 in ev_invoke_pending 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52ca4e2)
> #7 0x32aeadc in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3
> #8 0x52cdb03 in ev_run 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x52cdb03)
> #9 0x32aecd1 in kudu::rpc::ReactorThread::RunThread() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9
> #10 0x32c08db in boost::_bi::bind_t kudu::rpc::ReactorThread>, 
> boost::_bi::list1 > 
> >::operator()() 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
> #11 0x2148c26 in boost::function0::operator()() const 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
> #12 0x2144b29 in kudu::Thread::SuperviseThread(void*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3
> #13 0x7f6c0bcf4e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
> #14 0x7f6c0885834c in __clone (/lib64/libc.so.6+0xf834c)
> 0x7f6288cbe818 is located 24 bytes inside of 1052640-byte region 
> [0x7f6288cbe800,0x7f6288dbf7e0)
> freed by thread T114 here:
> #0 0x1a773e0 in operator delete(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_new_delete.cc:137
> #1 0x7f6c090faed3 in __gnu_cxx::new_allocator::deallocate(char*, 
> unsigned long) 
> /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:110
> #2 0x7f6c090faed3 in std::string::_Rep::_M_destroy(std::allocator 
> const&) 
> /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:449
> #3 0x7f6c090faed3 in std::string::_Rep::_M_dispose(std::allocator 
> const&) 
> /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:249
> #4 0x7f6c090faed3 in std::string::reserve(unsigned long) 
> /mnt/source/gcc/build-4.9.2/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:511
> #5 0x2781865 in 
> impala::ClientRequestState::UpdateFilter(impala::UpdateFilterParamsPB const&, 
> kudu::rpc::RpcContext*) 

[jira] [Commented] (IMPALA-9879) ASAN use-after-free with KRPC thread and Coordinator::FilterState::ApplyUpdate()

2020-11-02 Thread Fang-Yu Rao (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224796#comment-17224796
 ] 

Fang-Yu Rao commented on IMPALA-9879:
-

Thanks [~joemcdonnell] for the detailed analysis here and at IMPALA-9767! I 
will read and try to understand your analysis this week and will get back to 
you if I have any other ideas, since I think I also need some time to refresh 
my memory of how our runtime filter aggregation and distribution work. :)

> ASAN use-after-free  with KRPC thread and 
> Coordinator::FilterState::ApplyUpdate()
> -
>
> Key: IMPALA-9879
> URL: https://issues.apache.org/jira/browse/IMPALA-9879
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: Joe McDonnell
> Assignee: Fang-Yu Rao
> Priority: Critical
> Labels: broken-build
>
> An ASAN core run failed with the following Impalad crash:
>  
> {noformat}
> ==4348==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x7fc144423800 at pc 0x01a50071 bp 0x7fc26d7daa40 sp 0x7fc26d7da1f0
> READ of size 1048576 at 0x7fc144423800 thread T81 (rpc reactor-464)
> #0 0x1a50070 in read_iovec(void*, __sanitizer::__sanitizer_iovec*, 
> unsigned long, unsigned long) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:904
> #1 0x1a666d1 in read_msghdr(void*, __sanitizer::__sanitizer_msghdr*, 
> long) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2781
> #2 0x1a68fb3 in __interceptor_sendmsg 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2796
> #3 0x38074dc in kudu::Socket::Writev(iovec const*, int, long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/net/socket.cc:447:3
> #4 0x3411fa5 in kudu::rpc::OutboundTransfer::SendBuffer(kudu::Socket&) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/transfer.cc:227:26
> #5 0x341aa60 in kudu::rpc::Connection::WriteHandler(ev::io&, int) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/connection.cc:802:31
> #6 0x55ef342 in ev_invoke_pending 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x55ef342)
> #7 0x33a4d8c in kudu::rpc::ReactorThread::InvokePendingCb(ev_loop*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:196:3
> #8 0x55f29ef in ev_run 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x55f29ef)
> #9 0x33a4f81 in kudu::rpc::ReactorThread::RunThread() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/rpc/reactor.cc:497:9
> #10 0x33b66bb in boost::_bi::bind_t kudu::rpc::ReactorThread>, 
> boost::_bi::list1 > 
> >::operator()() 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222:16
> #11 0x21ba196 in boost::function0::operator()() const 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
> #12 0x21b6089 in kudu::Thread::SuperviseThread(void*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/kudu/util/thread.cc:675:3
> #13 0x7fcabb86be24 in start_thread (/lib64/libpthread.so.0+0x7e24)
> #14 0x7fcab833f34c in __clone (/lib64/libc.so.6+0xf834c)
> 0x7fc144423800 is located 0 bytes inside of 1048577-byte region 
> [0x7fc144423800,0x7fc144523801)
> freed by thread T108 here:
> #0 0x1ad6050 in operator delete(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_new_delete.cc:137
> #1 0x7fcab8c425a9 in __gnu_cxx::new_allocator::deallocate(char*, 
> unsigned long) 
> /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/ext/new_allocator.h:125
> #2 0x7fcab8c425a9 in std::allocator_traits 
> >::deallocate(std::allocator&, char*, unsigned long) 
> /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/alloc_traits.h:462
> #3 0x7fcab8c425a9 in std::__cxx11::basic_string std::char_traits, std::allocator >::_M_destroy(unsigned long) 
> /mnt/source/gcc/build-7.5.0/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:226
> #4 0x7fcab8c425a9 in std::__cxx11::basic_string std::char_traits, std::allocator >::reserve(unsigned long) 
> 

[jira] [Commented] (IMPALA-10308) Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with ASAN build

2020-11-02 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224793#comment-17224793
 ] 

Qifan Chen commented on IMPALA-10308:
-

Hi [~skyyws], sorry, I have not tried copying the test files to HDFS, and 
thanks a lot for trying it out. 

Since the error was seen when running test_scanners.py, I wonder whether the 
queries executed prior to the DDL in question have an impact. 

> Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with 
> ASAN build
> 
>
> Key: IMPALA-10308
> URL: https://issues.apache.org/jira/browse/IMPALA-10308
> Project: IMPALA
> Issue Type: Bug
> Reporter: Qifan Chen
> Priority: Major
>
> The following error was seen when running the scanner test against the ASAN 
> build.
> {code:java}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   INNER EXCEPTION: 
> E   MESSAGE: AnalysisException: Failed to load metadata for table: 
> 'iceberg_partitioned'
> E   CAUSED BY: TableLoadingException: Error loading metadata for Iceberg 
> table hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned
> E   CAUSED BY: IllegalArgumentException: Can not create a Path from a null 
> string
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> [gw2] linux2 -- Python 2.7.16 
> /home/qchen/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scanners.py:357: in test_iceberg_query
> self.run_test_case('QueryTest/iceberg-query', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:920: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:363: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:357: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:520: in __do_rpc
> {code}
> To reproduce, apply the following steps.
> {code:java}
> 1. Build: ${IMPALA_HOME}/buildall.sh -skiptests -ninja -asan
> 2. Run test: 
> cd ${IMPALA_HOME} 
> tests/run-tests.py --exploration_strategy=exhaustive 
> tests/query_test/test_scanners.py
> {code}
> Branch info.
> The master branch of https://github.com/apache/impala.git. HEAD points 
> at 193c2e773fa9f6772e4a7c30ed3a4f75029863f1.






[jira] [Commented] (IMPALA-10308) Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with ASAN build

2020-11-02 Thread WangSheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224611#comment-17224611
 ] 

WangSheng commented on IMPALA-10308:


Hi [~sql_forever], thanks for reporting this bug. It seems that this test 
failed when loading iceberg_partitioned. Have you ever put the test files into 
HDFS manually, like this?
{code:java}
// testdata/datasets/functional/functional_schema_template.sql
`hadoop fs -mkdir -p /test-warehouse/iceberg_test && \
hadoop fs -put -f ${IMPALA_HOME}/testdata/data/iceberg_test/iceberg_partitioned 
/test-warehouse/iceberg_test/
{code}
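
The same two upload steps can be scripted. The following is a minimal Python sketch that only builds the two hadoop fs command lines shown above as argument lists; the helper name iceberg_upload_commands and the example checkout path /home/user/Impala are hypothetical, and nothing is executed here.

```python
import posixpath


def iceberg_upload_commands(impala_home):
    """Return the two `hadoop fs` invocations as argv lists.

    Hypothetical helper for illustration; the paths mirror the commands
    quoted from functional_schema_template.sql above.
    """
    src = posixpath.join(
        impala_home, "testdata", "data", "iceberg_test", "iceberg_partitioned")
    dest_dir = "/test-warehouse/iceberg_test"
    return [
        # Create the target directory (and parents) if it does not exist.
        ["hadoop", "fs", "-mkdir", "-p", dest_dir],
        # Copy the table directory, overwriting any existing copy.
        ["hadoop", "fs", "-put", "-f", src, dest_dir + "/"],
    ]


# "/home/user/Impala" is an assumed placeholder for ${IMPALA_HOME}.
commands = iceberg_upload_commands("/home/user/Impala")
for argv in commands:
    print(" ".join(argv))
```

On a machine where the hadoop CLI is available, each list could be passed to subprocess.check_call() in order, so that a failed mkdir stops the upload.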
I've already rebuilt the code in my own environment with ninja and ASAN, and I 
can create the external Iceberg table and query it normally, like this:
{code:java}
CREATE EXTERNAL TABLE functional_parquet.iceberg_partitioned (
  id INT,
  user STRING,
  action STRING,
  event_time TIMESTAMP
)
PARTITION BY SPEC
(
  event_time HOUR,
  action IDENTITY
)
STORED AS ICEBERG
LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned'
TBLPROPERTIES ('iceberg.catalog'='hadoop.tables', 'iceberg.file_format'='parquet');
select count(1) from functional_parquet.iceberg_partitioned;
{code}
 

> Fail to load metadata for table: 'iceberg_partitioned' in a scanner test with 
> ASAN build
> 
>
> Key: IMPALA-10308
> URL: https://issues.apache.org/jira/browse/IMPALA-10308
> Project: IMPALA
> Issue Type: Bug
> Reporter: Qifan Chen
> Priority: Major
>
> The following error was seen when running the scanner test against the ASAN 
> build.
> {code:java}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   INNER EXCEPTION: 
> E   MESSAGE: AnalysisException: Failed to load metadata for table: 
> 'iceberg_partitioned'
> E   CAUSED BY: TableLoadingException: Error loading metadata for Iceberg 
> table hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned
> E   CAUSED BY: IllegalArgumentException: Can not create a Path from a null 
> string
>  TestIceberg.test_iceberg_query[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> [gw2] linux2 -- Python 2.7.16 
> /home/qchen/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_scanners.py:357: in test_iceberg_query
> self.run_test_case('QueryTest/iceberg-query', vector)
> common/impala_test_suite.py:662: in run_test_case
> result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:600: in __exec_in_impala
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:920: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:363: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:357: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:520: in __do_rpc
> {code}
> To reproduce, apply the following