[jira] [Work started] (IMPALA-9936) Only send invalidations in DDL responses to LocalCatalog coordinators

2020-09-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9936 started by Quanlong Huang.
--
> Only send invalidations in DDL responses to LocalCatalog coordinators
> -
>
> Key: IMPALA-9936
> URL: https://issues.apache.org/jira/browse/IMPALA-9936
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Catalogd RPC requests (TDdlExecRequest, TUpdateCatalogRequest and 
> TResetMetadataRequest) should indicate whether the client (coordinator) is 
> running in LocalCatalog mode. For LocalCatalog coordinators, catalogd only 
> needs to send back invalidations instead of the full catalog objects.
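
A minimal sketch of the proposal above, written in Python for brevity rather than in Impala's actual Java/Thrift code. All names here (`DdlRequest`, `is_local_catalog`, `CatalogObject`, `DdlResponse`, `build_ddl_response`) are hypothetical and are not Impala identifiers. The only point it illustrates is that the request carries the coordinator's mode and the response content is chosen from it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogObject:
    """Hypothetical stand-in for a full catalog object (e.g. a table)."""
    name: str
    version: int
    metadata: dict = field(default_factory=dict)


@dataclass
class DdlRequest:
    """Hypothetical request; the flag tells catalogd the coordinator's mode."""
    statement: str
    is_local_catalog: bool = False


@dataclass
class DdlResponse:
    """Hypothetical response: either full objects or just invalidated names."""
    updated_objects: List[CatalogObject] = field(default_factory=list)
    invalidated_names: List[str] = field(default_factory=list)


def build_ddl_response(request: DdlRequest,
                       changed: List[CatalogObject]) -> DdlResponse:
    # LocalCatalog coordinators only need to know which cached entries to
    # drop; they fetch fresh metadata on demand afterwards.
    if request.is_local_catalog:
        return DdlResponse(invalidated_names=[obj.name for obj in changed])
    # Other coordinators receive the full objects so they can apply them
    # to their local copy of the catalog.
    return DdlResponse(updated_objects=list(changed))
```

With this shape, the response for a LocalCatalog coordinator stays small no matter how large the changed objects are.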



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10099) Push down DISTINCT aggregation for EXCEPT/INTERSECT

2020-09-09 Thread Shant Hovsepian (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shant Hovsepian resolved IMPALA-10099.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Push down DISTINCT aggregation for EXCEPT/INTERSECT
> ---
>
> Key: IMPALA-10099
> URL: https://issues.apache.org/jira/browse/IMPALA-10099
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Shant Hovsepian
>Assignee: Shant Hovsepian
>Priority: Major
> Fix For: Impala 4.0
>
>
> The implementation of SetOperations for EXCEPT/INTERSECT in IMPALA-9943 
> produced query rewrites that apply the DISTINCT aggregation after exchanges 
> in distributed plans. Where the query can be rewritten directly so that 
> DISTINCT is applied to the set operation's operands, performance should be 
> better for most large queries.
> This should help the performance of TPC-DS Q14, which does an INTERSECT of 
> queries with large result sets that contain many duplicates.
> In general it would be better to have a DISTINCT move-around optimization 
> phase during planning, which would handle this case as well as many others.
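
The effect described above can be illustrated with a toy model. This is a sketch in Python, not the Impala planner. The function names are hypothetical, and "rows processed" is only a rough proxy for the data that would flow through exchanges in a distributed plan.

```python
def intersect_distinct_late(left, right):
    """Apply DISTINCT only after the INTERSECT; every input row is processed."""
    right_values = set(right)
    matched = [row for row in left if row in right_values]  # duplicates survive
    rows_processed = len(left) + len(right)
    return set(matched), rows_processed


def intersect_distinct_pushed(left, right):
    """Apply DISTINCT to each operand first, then intersect the smaller inputs."""
    left_distinct, right_distinct = set(left), set(right)
    rows_processed = len(left_distinct) + len(right_distinct)
    return left_distinct & right_distinct, rows_processed


left = [1, 1, 1, 2, 2, 3]
right = [2, 2, 3, 3, 4]
late_result, late_rows = intersect_distinct_late(left, right)
pushed_result, pushed_rows = intersect_distinct_pushed(left, right)
print(late_result == pushed_result, late_rows, pushed_rows)
```

Both variants return the same set, but the pushed-down variant feeds 6 rows into the set operation instead of 11. The gap grows with the number of duplicates in the operands.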






[jira] [Created] (IMPALA-10162) Support additional LDAP filter options

2020-09-09 Thread Thomas Tauber-Marshall (Jira)
Thomas Tauber-Marshall created IMPALA-10162:
---

 Summary: Support additional LDAP filter options
 Key: IMPALA-10162
 URL: https://issues.apache.org/jira/browse/IMPALA-10162
 Project: IMPALA
  Issue Type: Task
  Components: Security
Affects Versions: Impala 4.0
Reporter: Thomas Tauber-Marshall
Assignee: Thomas Tauber-Marshall


IMPALA-2563 added support for user and group filters on LDAP, with options 
modeled after those in Hive, but they are somewhat restrictive: they only allow 
specifying particular parts of the LDAP search filter used.

There are additional, more general LDAP filter options that Impala should also 
support, which allow specifying arbitrary search filters. This would, for 
example, enable an LDAP configuration where the authenticated usernames are not 
part of the user's DN.

We should model these configs after the equivalent options in HDFS; see in 
particular 'hadoop.security.group.mapping.ldap.search.filter.user' and 
'hadoop.security.group.mapping.ldap.search.filter.group'.
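
As a rough illustration of what an "arbitrary search filter" option could look like, the Python sketch below fills a username into a filter template. The `{0}` placeholder follows the convention of the Hadoop option named above. The filter strings and the helper names `escape_filter_value` and `render_filter` are assumptions for illustration and do not describe an existing Impala flag.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special in an LDAP filter value (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def render_filter(template: str, username: str) -> str:
    """Substitute the escaped username for the {0} placeholder."""
    return template.replace("{0}", escape_filter_value(username))


# Illustrative template: match on an attribute that is not part of the DN.
user_filter = render_filter("(&(objectClass=user)(sAMAccountName={0}))", "alice")
print(user_filter)
```

Escaping the username before substitution matters because the value comes from the client and could otherwise change the structure of the filter.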






[jira] [Created] (IMPALA-10161) User LDAP search bind support

2020-09-09 Thread Tamas Mate (Jira)
Tamas Mate created IMPALA-10161:
---

 Summary: User LDAP search bind support
 Key: IMPALA-10161
 URL: https://issues.apache.org/jira/browse/IMPALA-10161
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend, Security
Affects Versions: Impala 3.4.0
Reporter: Tamas Mate
Assignee: Tamas Mate


Currently Impala only supports the simple direct bind mechanism to authenticate 
a user, while other components allow administrators to specify a user search 
base DN together with an administrator bind DN and bind password, and then 
search for the user under the user search base.

This method is especially useful for larger organizations where the directory 
structure is wide. Given the following two DNs:
{code:java}
uid=alice,ou=Engineering,ou=People,dc=mycompany,dc=com
uid=bob,ou=Accounting,ou=People,dc=mycompany,dc=com
{code}
If the administrator would like to allow both Engineering and Accounting 
users to authenticate, neither the ldap_baseDN nor the ldap_bind_pattern 
configuration gives enough flexibility to authenticate both correctly.
 * ldap_baseDN takes the configured baseDN and prefixes it with _uid=_ and the 
username
 * ldap_bind_pattern gives the option to specify a pattern with a parameter, 
such as _user=#UID,OU=foo,CN=bar_

A more convenient solution would be to specify a base DN and execute a search 
under it instead of prefixing it with uid, because the correct prefix depends 
on the LDAP directory structure.

LDAP search has already been implemented for groups; it should be implemented 
for users as well.
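
The difference between the existing options and a search-based lookup can be sketched with an in-memory directory. This is a Python illustration only. The helper names are hypothetical, and a real implementation would run the search over an LDAP connection bound with the administrator DN and then bind as the DN it found, using the user's password.

```python
# The two example entries from above, used as a stand-in for an LDAP server.
DIRECTORY = [
    "uid=alice,ou=Engineering,ou=People,dc=mycompany,dc=com",
    "uid=bob,ou=Accounting,ou=People,dc=mycompany,dc=com",
]


def dn_from_base_dn(user, base_dn):
    """ldap_baseDN style: a fixed prefix and suffix around the username."""
    return f"uid={user},{base_dn}"


def dn_from_bind_pattern(user, pattern):
    """ldap_bind_pattern style: replace the #UID parameter in one pattern."""
    return pattern.replace("#UID", user)


def dn_from_search(user, search_base, directory=DIRECTORY):
    """Search style: find the user's entry anywhere under search_base."""
    suffix = "," + search_base.lower()
    for dn in directory:
        lowered = dn.lower()
        if lowered.startswith(f"uid={user.lower()},") and lowered.endswith(suffix):
            return dn
    return None


engineering = "ou=Engineering,ou=People,dc=mycompany,dc=com"
people = "ou=People,dc=mycompany,dc=com"

# A single base DN or pattern can only match one organizational unit.
print(dn_from_base_dn("bob", engineering) in DIRECTORY)
# A search under the common parent finds users in either unit.
print(dn_from_search("alice", people), dn_from_search("bob", people))
```

A single base DN or bind pattern pins every user to one branch of the tree, while the search only needs the common parent of all branches.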
  






[jira] [Resolved] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true

2020-09-09 Thread Wenzhe Zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenzhe Zhou resolved IMPALA-10140.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Throw CatalogException for query "create database if not exist" with sync_ddl 
> as true
> -
>
> Key: IMPALA-10140
> URL: https://issues.apache.org/jira/browse/IMPALA-10140
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Critical
> Fix For: Impala 4.0
>
>
> A customer intermittently hit the following error message when running the 
> following query on impalad version 3.2.0-cdh6.3.2 RELEASE.
> set sync_ddl =true ; create database if not exists $dbname;
> I0715 11:52:28.496253 51943 client-request-state.cc:187] 
> a246b430fe450786:81647bd6] CatalogException: Couldn't retrieve the 
> catalog topic version for the SYNC_DDL operation after 5 attempts.The 
> operation has been successfully executed but its effects may have not been 
> broadcast to all the coordinators.
>  
> The Catalog server log shows the following error message as well.
> I0715 11:01:50.143303 220286 jni-util.cc:256] 
> org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog 
> topic version for the SYNC_DDL operation after 5 attempts.The operation has 
> been successfully executed but its effects may have not been broadcast to all 
> the coordinators.
>  at 
> org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2474)
>  at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:374)
>  at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:154)
> This looks to be another variation of the conditions described in 
> IMPALA-7961. The difference is that this case involves "CREATE 
> DATABASE ... IF NOT EXISTS", whereas the fix in IMPALA-7961 specifically 
> targets the "CREATE TABLE ... IF NOT EXISTS" use case.
> To fix the issue, we should port the change in patch 
> [https://gerrit.cloudera.org/#/c/12428/] to the createDatabase() function.
>  






[jira] [Created] (IMPALA-10160) kernel_stack_watchdog cannot print user stack

2020-09-09 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10160:
-

 Summary: kernel_stack_watchdog cannot print user stack
 Key: IMPALA-10160
 URL: https://issues.apache.org/jira/browse/IMPALA-10160
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Sahil Takiar


I've seen this a few times now. The kernel_stack_watchdog is used in a few 
places in the KRPC code, and it prints out the kernel + user stack whenever a 
thread is stuck in some method call for too long. The issue is that the user 
stack does not get printed:

{code}
W0908 17:15:00.365721  6605 kernel_stack_watchdog.cc:198] Thread 6612 stuck at 
outbound_call.cc:273 for 120ms:
Kernel stack:
[] futex_wait_queue_me+0xc6/0x130
[] futex_wait+0x17b/0x280
[] do_futex+0x106/0x5a0
[] SyS_futex+0x80/0x180
[] system_call_fastpath+0x16/0x1b
[] 0x

User stack:

{code}

The log says that the signal handler for taking the thread stack is unavailable.
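
For readers unfamiliar with the mechanism, the following Python sketch shows the general idea of a stuck-call watchdog: threads record when they enter a watched call, and a checker reports any that have stayed there beyond a threshold. It is a conceptual illustration only. The class and method names are hypothetical, the real watchdog is C++ code, and the stack capture step that fails in this report is not modeled here.

```python
import threading
import time
from typing import Dict, List, Optional, Tuple


class StuckCallWatchdog:
    """Tracks when threads enter a watched call and reports stuck ones."""

    def __init__(self, threshold_s: float) -> None:
        self._threshold_s = threshold_s
        self._lock = threading.Lock()
        # thread id -> (label of the watched call, time it was entered)
        self._entered: Dict[int, Tuple[str, float]] = {}

    def enter(self, label: str, now: Optional[float] = None) -> None:
        """Record that the calling thread has entered a watched call."""
        now = time.monotonic() if now is None else now
        with self._lock:
            self._entered[threading.get_ident()] = (label, now)

    def leave(self) -> None:
        """Record that the calling thread has left its watched call."""
        with self._lock:
            self._entered.pop(threading.get_ident(), None)

    def find_stuck(self, now: Optional[float] = None) -> List[Tuple[int, str, float]]:
        """Return (thread id, label, elapsed seconds) for calls over threshold."""
        now = time.monotonic() if now is None else now
        with self._lock:
            return [
                (tid, label, now - start)
                for tid, (label, start) in self._entered.items()
                if now - start >= self._threshold_s
            ]
```

The optional `now` argument exists only so the behavior can be checked deterministically. In normal use a background thread would call `find_stuck()` periodically and log each entry, which is the point where a real watchdog would also try to collect the kernel and user stacks.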






[jira] [Commented] (IMPALA-10155) Apparent data race in GetTopNQueriesAndUpdatePoolStats

2020-09-09 Thread Qifan Chen (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193007#comment-17193007
 ] 

Qifan Chen commented on IMPALA-10155:
-

This case is a duplicate of IMPALA-10129, which has been resolved. 

> Apparent data race in GetTopNQueriesAndUpdatePoolStats
> --
>
> Key: IMPALA-10155
> URL: https://issues.apache.org/jira/browse/IMPALA-10155
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Qifan Chen
>Priority: Blocker
>
> From a tsan build:
> {noformat}
> WARNING: ThreadSanitizer: data race (pid=6487)
>   Read of size 1 at 0x7b48001c2c28 by thread T320 (mutexes: write 
> M866233966607478128, write M867078391537609888, write M627824798772259232, 
> write M451058461859238408):
> #0 
> impala::MemTracker::GetTopNQueriesAndUpdatePoolStats(std::priority_queue  std::vector std::allocator >, 
> std::greater >&, int, impala::TPoolStats&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:453:19
>  (impalad+0x20b2e51)
> #1 impala::MemTracker::UpdatePoolStatsForQueries(int, 
> impala::TPoolStats&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:432:3
>  (impalad+0x20b2cdd)
> #2 impala::AdmissionController::PoolStats::UpdateMemTrackerStats() 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1642:14
>  (impalad+0x21cb7b0)
> #3 
> impala::AdmissionController::AddPoolUpdates(std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1662:18
>  (impalad+0x21c8af3)
> #4 
> impala::AdmissionController::UpdatePoolStats(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1355:5
>  (impalad+0x21c881d)
> #5 
> impala::AdmissionController::Init()::$_4::operator()(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) const 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:643:45
>  (impalad+0x21cfb81)
> #6 
> boost::detail::function::void_function_obj_invoker2  void, std::map, 
> std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator 
> >*>::invoke(boost::detail::function::function_buffer&, 
> std::map, 
> std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
>  (impalad+0x21cf9cc)
> #7 boost::function2 std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator 
> >*>::operator()(std::map std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) const 
> /data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x23fc400)
> #8 
> impala::StatestoreSubscriber::UpdateState(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, impala::TUniqueId const&, std::vector std::allocator >*, bool*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:471:7
>  (impalad+0x23f9339)
> #9 
> impala::StatestoreSubscriberThriftIf::UpdateState(impala::TUpdateStateResponse&,
>  impala::TUpdateStateRequest const&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:110:18
>  (impalad+0x23fc65f)
> #10 

[jira] [Resolved] (IMPALA-10155) Apparent data race in GetTopNQueriesAndUpdatePoolStats

2020-09-09 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen resolved IMPALA-10155.
-
Fix Version/s: Impala 4.0
   Resolution: Duplicate

> Apparent data race in GetTopNQueriesAndUpdatePoolStats
> --
>
> Key: IMPALA-10155
> URL: https://issues.apache.org/jira/browse/IMPALA-10155
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Qifan Chen
>Priority: Blocker
> Fix For: Impala 4.0
>
>

[jira] [Resolved] (IMPALA-10124) admission-controller-test fails with no such file or directory error

2020-09-09 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen resolved IMPALA-10124.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

> admission-controller-test fails with no such file or directory error
> 
>
> Key: IMPALA-10124
> URL: https://issues.apache.org/jira/browse/IMPALA-10124
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Qifan Chen
>Priority: Major
> Fix For: Impala 4.0
>
>
> In master-core-ubsan, the admission-controller-test fails:
> 03:12:04 
> /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/be/build/debug//scheduling/admission-controller-test:
>  line 10: 29380 Segmentation fault  (core dumped) 
> ${IMPALA_HOME}/bin/run-jvm-binary.sh 
> ${IMPALA_HOME}/be/build/latest/service/unifiedbetests 
> --gtest_filter=${GTEST_FILTER} 
> --gtest_output=xml:${IMPALA_BE_TEST_LOGS_DIR}/${TEST_EXEC_NAME}.xml 
> -log_filename="${TEST_EXEC_NAME}" "$@"
> 03:12:04 Traceback (most recent call last):
> 03:12:04   File 
> "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 71, in 
> 03:12:04 if __name__ == "__main__": main()
> 03:12:04   File 
> "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 68, in main
> 03:12:04 junitxml_prune_notrun(options.filename)
> 03:12:04   File 
> "/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 31, in junitxml_prune_notrun
> 03:12:04 root = tree.parse(junitxml_filename)
> 03:12:04   File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 647, in 
> parse
> 03:12:04 source = open(source, "rb")
> 03:12:04 IOError: [Errno 2] No such file or directory: 
> '/data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/logs/be_tests/admission-controller-test.xml'
> ...
> 03:18:30 The following tests FAILED:
> 03:18:30   57 - admission-controller-test (Failed)
> 03:18:30 Errors while running CTest
> 03:18:30 make: *** [test] Error 8
> 03:18:30 ERROR in 
> /data/jenkins/workspace/impala-asf-master-core-ubsan/repos/Impala/bin/run-backend-tests.sh
>  at line 43: "${MAKE_CMD:-make}" test ARGS="${BE_TEST_ARGS}"






[jira] [Resolved] (IMPALA-10129) Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats

2020-09-09 Thread Qifan Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qifan Chen resolved IMPALA-10129.
-
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats
> -
>
> Key: IMPALA-10129
> URL: https://issues.apache.org/jira/browse/IMPALA-10129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Qifan Chen
>Priority: Major
> Fix For: Impala 4.0
>
>
> TSAN is reporting a data race in 
> {{MemTracker::GetTopNQueriesAndUpdatePoolStats}}
> {code}
> WARNING: ThreadSanitizer: data race (pid=6436)
>   Read of size 1 at 0x7b480017aaa8 by thread T320 (mutexes: write 
> M861448892003377216, write M862574791910219632, write M623321199144890016, 
> write M1054540811927503496):
> #0 
> impala::MemTracker::GetTopNQueriesAndUpdatePoolStats(std::priority_queue  std::vector std::allocator >, 
> std::greater >&, int, impala::TPoolStats&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:453:19
>  (impalad+0x20b13b1)
> #1 impala::MemTracker::UpdatePoolStatsForQueries(int, 
> impala::TPoolStats&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:432:3
>  (impalad+0x20b123d)
> #2 impala::AdmissionController::PoolStats::UpdateMemTrackerStats() 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1642:14
>  (impalad+0x21c9d10)
> #3 
> impala::AdmissionController::AddPoolUpdates(std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1662:18
>  (impalad+0x21c7053)
> #4 
> impala::AdmissionController::UpdatePoolStats(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1355:5
>  (impalad+0x21c6d7d)
> #5 
> impala::AdmissionController::Init()::$_4::operator()(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) const 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:643:45
>  (impalad+0x21ce0e1)
> #6 
> boost::detail::function::void_function_obj_invoker2  void, std::map, 
> std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator 
> >*>::invoke(boost::detail::function::function_buffer&, 
> std::map, 
> std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
>  (impalad+0x21cdf2c)
> #7 boost::function2 std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator 
> >*>::operator()(std::map std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) const 
> /data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770:14
>  (impalad+0x23fa960)
> #8 
> impala::StatestoreSubscriber::UpdateState(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, impala::TUniqueId const&, std::vector std::allocator >*, bool*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:471:7
>  (impalad+0x23f7899)
> #9 
> impala::StatestoreSubscriberThriftIf::UpdateState(impala::TUpdateStateResponse&,
>  impala::TUpdateStateRequest const&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/statestore/statestore-subscriber.cc:110:18
>  

[jira] [Commented] (IMPALA-10140) Throw CatalogException for query "create database if not exist" with sync_ddl as true

2020-09-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192951#comment-17192951
 ] 

ASF subversion and git services commented on IMPALA-10140:
--

Commit 0c89a9d562c280507a6e842898bf3e41cadc3ff1 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0c89a9d ]

IMPALA-10140: Fix CatalogExeception for creating database with sync_ddl as true

IMPALA-7961 handled the case of the query "create table if not exists"
with sync_ddl as true. Customers reported a similar issue for the query
"create database if not exists" with sync_ddl as true.
This patch applies a fix similar to the one for IMPALA-7961 to the
function CatalogOpExecutor.createDatabase().

Testing:
 - Manual tests
   Since this is a racy bug, I could only reproduce it by forcing
   frequent topicUpdateLog GCs along with a specific sequence of
   actions, like: run some DDLs and REFRESHs to trigger a GC in
   topicUpdateLog, then run query "create database if not exists" with
   sync_ddl as true. Verified that the issue couldn't be reproduced
   after applying this patch.
 - Passed exhaustive test.

Change-Id: Id623118f8938f416414c45d93404fb70d036a9df
Reviewed-on: http://gerrit.cloudera.org:8080/16421
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Throw CatalogException for query "create database if not exist" with sync_ddl 
> as true
> -
>
> Key: IMPALA-10140
> URL: https://issues.apache.org/jira/browse/IMPALA-10140
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Wenzhe Zhou
>Assignee: Wenzhe Zhou
>Priority: Critical
>






[jira] [Commented] (IMPALA-7961) Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail fast

2020-09-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192952#comment-17192952
 ] 

ASF subversion and git services commented on IMPALA-7961:
-

Commit 0c89a9d562c280507a6e842898bf3e41cadc3ff1 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0c89a9d ]

IMPALA-10140: Fix CatalogExeception for creating database with sync_ddl as true

IMPALA-7961 handled the case of the query "create table if not exists"
with sync_ddl as true. Customers reported a similar issue for the query
"create database if not exists" with sync_ddl as true.
This patch applies a fix similar to the one for IMPALA-7961 to the
function CatalogOpExecutor.createDatabase().

Testing:
 - Manual tests
   Since this is a racy bug, I could only reproduce it by forcing
   frequent topicUpdateLog GCs along with a specific sequence of
   actions, like: run some DDLs and REFRESHs to trigger a GC in
   topicUpdateLog, then run query "create database if not exists" with
   sync_ddl as true. Verified that the issue couldn't be reproduced
   after applying this patch.
 - Passed exhaustive test.

Change-Id: Id623118f8938f416414c45d93404fb70d036a9df
Reviewed-on: http://gerrit.cloudera.org:8080/16421
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Concurrent catalog heavy workloads can cause queries with SYNC_DDL to fail 
> fast
> ---
>
> Key: IMPALA-7961
> URL: https://issues.apache.org/jira/browse/IMPALA-7961
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Bharath Vissapragada
>Assignee: Bharath Vissapragada
>Priority: Critical
> Fix For: Impala 3.2.0
>
> Attachments: 0001-Repro-of-IMPALA-7961.patch
>
>
> When catalog server is under heavy load with concurrent updates to objects, 
> queries with SYNC_DDL can fail with the following message.
> *User facing error message:*
> {noformat}
> ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
> SYNC_DDL operation after 3 attempts.The operation has been successfully 
> executed but its effects may have not been broadcast to all the coordinators.
> {noformat}
> *Exception from the catalog server log:*
> {noformat}
> I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 1088
> I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation 
> using SYNC_DDL is waiting for catalog topic version: 236535. Time to identify 
> topic version (msec): 12625
> I1031 00:00:49.168851 1131986 jni-util.cc:230] 
> org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog 
> topic version for the SYNC_DDL operation after 3 attempts.The operation has 
> been successfully executed but its effects may have not been broadcast to all 
> the coordinators.
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
> at 
> org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
> at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
> 
> {noformat}
> *What this means*
> The Catalog operation is actually successful (the change has been committed 
> to HMS and Catalog server cache) but the Catalog server noticed that it is 
> taking longer than expected time for it to broadcast the changes (for 
> whatever reason) and instead of hanging in there, it fails fast. The 
> coordinators are expected to eventually sync up in the background.
> *Problem*
>  - This violates the contract of the SYNC_DDL query option since the query 
> returns early.
>  - This is a behavioral regression from pre IMPALA-5058 state where the 
> queries would wait forever for SYNC_DDL based changes to propagate.
> *Notes*
>  - Introduced by IMPALA-5058
>  - Based on the occurrences of this issue, we narrowed it down to a specific 
> kind of DDLs (see Jira comments).
>  - My understanding is that this also applies to the Catalog V2 (or 
> LocalCatalog mode) since we still rely on the CatalogServer for DDL 
> orchestration and hence it takes this codepath.
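
The fail-fast behavior described above can be sketched as a bounded wait. This is only a minimal illustration in Java, not Impala's actual code: the class name, the exception type, and the lookup callback are hypothetical. Per the stack trace above, the real logic lives in CatalogServiceCatalog.waitForSyncDdlVersion().

```java
import java.util.Optional;
import java.util.function.Supplier;

class SyncDdlWaitSketch {
    /** Hypothetical stand-in for the exception reported to the user. */
    static class TopicVersionTimeoutException extends RuntimeException {
        TopicVersionTimeoutException(String msg) { super(msg); }
    }

    /**
     * Calls lookup up to maxAttempts times and returns the first version found.
     * Throws instead of blocking forever when every attempt comes back empty.
     */
    static long waitForVersion(Supplier<Optional<Long>> lookup, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<Long> version = lookup.get();
            if (version.isPresent()) return version.get();
        }
        throw new TopicVersionTimeoutException(
            "Couldn't retrieve the catalog topic version after "
                + maxAttempts + " attempts.");
    }

    public static void main(String[] args) {
        // The version becomes visible on the second lookup.
        int[] calls = {0};
        long v = waitForVersion(
            () -> ++calls[0] < 2 ? Optional.<Long>empty() : Optional.of(236535L), 3);
        System.out.println("found version " + v);

        // The version never becomes visible, so the wait fails fast.
        try {
            waitForVersion(() -> Optional.<Long>empty(), 3);
        } catch (TopicVersionTimeoutException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

The trade-off is the one the issue describes: a bounded wait never hangs, but a caller that relies on SYNC_DDL gets an error even though the DDL itself was committed.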






[jira] [Commented] (IMPALA-10129) Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats

2020-09-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192950#comment-17192950
 ] 

ASF subversion and git services commented on IMPALA-10129:
--

Commit 9f51673a40d61cf087dd72c6e50719ed522ac851 in impala's branch 
refs/heads/master from Qifan Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9f51673 ]

IMPALA-10129 Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats

This work addresses a data race in the admission controller by
initializing two data members (is_query_mem_tracker_ and query_id_)
in a constructor of the MemTracker class. Previously the two members
were set, without lock protection, after the object was constructed,
which allowed other threads to access either of them at the same time.

Testing:
1. Ran the python admission controller test successfully with a tsan
   build. The data race was observed without the fix and was not
   observed with it.
2. Ran the core test.

Change-Id: I9c4ffe8064d3e099a525cc48c218ef73112fb67b
Reviewed-on: http://gerrit.cloudera.org:8080/16408
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Data race in MemTracker::GetTopNQueriesAndUpdatePoolStats
> -
>
> Key: IMPALA-10129
> URL: https://issues.apache.org/jira/browse/IMPALA-10129
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Qifan Chen
>Priority: Major
>
> TSAN is reporting a data race in 
> {{MemTracker::GetTopNQueriesAndUpdatePoolStats}}
> {code}
> WARNING: ThreadSanitizer: data race (pid=6436)
>   Read of size 1 at 0x7b480017aaa8 by thread T320 (mutexes: write 
> M861448892003377216, write M862574791910219632, write M623321199144890016, 
> write M1054540811927503496):
> #0 
> impala::MemTracker::GetTopNQueriesAndUpdatePoolStats(std::priority_queue  std::vector std::allocator >, 
> std::greater >&, int, impala::TPoolStats&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:453:19
>  (impalad+0x20b13b1)
> #1 impala::MemTracker::UpdatePoolStatsForQueries(int, 
> impala::TPoolStats&) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/runtime/mem-tracker.cc:432:3
>  (impalad+0x20b123d)
> #2 impala::AdmissionController::PoolStats::UpdateMemTrackerStats() 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1642:14
>  (impalad+0x21c9d10)
> #3 
> impala::AdmissionController::AddPoolUpdates(std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1662:18
>  (impalad+0x21c7053)
> #4 
> impala::AdmissionController::UpdatePoolStats(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:1355:5
>  (impalad+0x21c6d7d)
> #5 
> impala::AdmissionController::Init()::$_4::operator()(std::map  std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) const 
> /data/jenkins/workspace/impala-asf-master-core-tsan/repos/Impala/be/src/scheduling/admission-controller.cc:643:45
>  (impalad+0x21ce0e1)
> #6 
> boost::detail::function::void_function_obj_invoker2  void, std::map, 
> std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator 
> >*>::invoke(boost::detail::function::function_buffer&, 
> std::map, 
> std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator >*) 
> /data/jenkins/workspace/impala-asf-master-core-tsan/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159:11
>  (impalad+0x21cdf2c)
> #7 boost::function2 std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, std::allocator > const, impala::TTopicDelta> > 
> > const&, std::vector std::allocator 
> >*>::operator()(std::map std::char_traits, std::allocator >, impala::TTopicDelta, 
> std::less, 
> std::allocator > >, 
> std::allocator std::char_traits, 

[jira] [Commented] (IMPALA-9741) Support query iceberg table by impala

2020-09-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192955#comment-17192955
 ] 

ASF subversion and git services commented on IMPALA-9741:
-

Commit efc627d050caeb9947af2dfd3fc8a02236c44d0e in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=efc627d ]

IMPALA-10158: Set timezone to UTC for Iceberg-related E2E tests

The tests test_iceberg_query and test_iceberg_profile have failed since
the patch for IMPALA-9741 was merged, because the default timezone of
Impala is not UTC. This patch fixes the issue by adding
"SET TIMEZONE=UTC;" before those test queries are run.

Testing:
 - Verified in a local development environment that the tests of
   test_iceberg_query and test_iceberg_profile could pass after applying
   this patch.

Change-Id: Ie985519e8ded04f90465e141488bd2dda78af6c3
Reviewed-on: http://gerrit.cloudera.org:8080/16425
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Support query iceberg table by impala
> -
>
> Key: IMPALA-9741
> URL: https://issues.apache.org/jira/browse/IMPALA-9741
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Major
>  Labels: impala-iceberg
> Attachments: select-iceberg.jpg
>
>
> Since we have submitted a patch supporting the creation of Iceberg tables by
> Impala in IMPALA-9688, we are preparing to implement Iceberg table queries in
> Impala. But we need to read the Impala and Iceberg code closely to determine
> how to do this.






[jira] [Commented] (IMPALA-10158) test_iceberg_query and test_iceberg_profile fail after IMPALA-9741

2020-09-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192954#comment-17192954
 ] 

ASF subversion and git services commented on IMPALA-10158:
--

Commit efc627d050caeb9947af2dfd3fc8a02236c44d0e in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=efc627d ]

IMPALA-10158: Set timezone to UTC for Iceberg-related E2E tests

The tests test_iceberg_query and test_iceberg_profile have failed since
the patch for IMPALA-9741 was merged, because the default timezone of
Impala is not UTC. This patch fixes the issue by adding
"SET TIMEZONE=UTC;" before those test queries are run.

Testing:
 - Verified in a local development environment that the tests of
   test_iceberg_query and test_iceberg_profile could pass after applying
   this patch.

Change-Id: Ie985519e8ded04f90465e141488bd2dda78af6c3
Reviewed-on: http://gerrit.cloudera.org:8080/16425
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> test_iceberg_query and test_iceberg_profile fail after IMPALA-9741
> --
>
> Key: IMPALA-10158
> URL: https://issues.apache.org/jira/browse/IMPALA-10158
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Fang-Yu Rao
>Assignee: Fang-Yu Rao
>Priority: Major
>
> We found that the tests of {{test_iceberg_query}} and 
> {{test_iceberg_profile}} fail after the patch for  IMPALA-9741 has been 
> merged.
> After some investigation with the help of [~boroknagyz] and [~csringhofer], 
> we found that it is a timezone-related issue and that we should add {{SET 
> TIMEZONE=UTC}} in the corresponding test files.
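
One plausible way a timezone-dependent test failure like this arises is that the same instant is rendered as a different wall-clock string depending on the session timezone, so expected output written for UTC no longer matches. The Java sketch below only illustrates that effect. It is not Impala code, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class TimezoneRenderSketch {
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats the instant as wall-clock time in the given zone. */
    static String render(Instant instant, String zone) {
        return FMT.withZone(ZoneId.of(zone)).format(instant);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2020-01-01T08:00:00Z");
        System.out.println(render(t, "UTC"));                 // 2020-01-01 08:00:00
        System.out.println(render(t, "America/Los_Angeles")); // 2020-01-01 00:00:00
    }
}
```

Pinning the zone, as the patch does with SET TIMEZONE=UTC, makes the rendered value independent of the machine the test runs on.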






[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread WangSheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192834#comment-17192834
 ] 

WangSheng commented on IMPALA-10159:


Hi [~boroknagyz], I use spark-shell to generate the test files. My Spark client 
version is 2.4.5 and the ORC jars in this client are 1.5.5. Even after replacing 
these ORC jars with 1.6.3, it doesn't work. Here is the code that generates the 
test files:

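{code:scala}
import java.sql.Timestamp
import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.SparkSchemaUtil
import org.apache.iceberg.types.Types
import org.apache.spark.sql.functions.to_timestamp
import org.apache.spark.sql.types._

val conf = new Configuration()
val tblLoc = "/test-warehouse/iceberg_test/iceberg_partitioned_orc"
val catalog = new HadoopTables(conf)
// event_time is declared as an Iceberg timestamp without time zone.
val sparkSchema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("user", StringType, false),
  StructField("action", StringType, false),
  StructField("event_time", SparkSchemaUtil.convert(Types.TimestampType.withoutZone()), false)))
val icebergSchema = SparkSchemaUtil.convert(sparkSchema)
// Partition by hour(event_time) and by action.
val spec = PartitionSpec.builderFor(icebergSchema).hour("event_time").identity("action").build
val table = catalog.create(icebergSchema, spec, tblLoc)
val data_df = spark.createDataFrame(
  Seq((1, "Alex", "view", Timestamp.valueOf("2020-01-01 08:00:00")))).toDF("id", "user", "action", "ts")
val array = data_df.select(data_df("id"), data_df("user"), data_df("action"),
  to_timestamp(data_df("ts"))).collect()
// Re-apply the target schema to the collected rows, then write them as ORC.
val df = spark.createDataFrame(sc.makeRDD(array), sparkSchema)
df.write.format("iceberg").option("write-format", "orc").mode("append").save(tblLoc)
spark.read.format("iceberg").load(tblLoc).show
{code}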
This code throws the exception "java.lang.UnsupportedOperationException: Spark 
does not support timestamp without time zone fields".
If we replace "SparkSchemaUtil.convert(Types.TimestampType.withoutZone())" with 
"TimestampType", the test files are generated normally, but querying them in 
Impala hits the problem described in IMPALA-9967.
Here is the create statement:

{code:sql}
CREATE EXTERNAL TABLE default.iceberg_partitioned_orc
STORED AS ICEBERG
LOCATION 'hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned_orc'
TBLPROPERTIES('iceberg_file_format'='orc');
{code}



> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables stored in the PARQUET file format. Since 
> some groundwork was already done in IMPALA-9741, we can continue with ORC file 
> format support in this Jira.






[jira] [Comment Edited] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192815#comment-17192815
 ] 

Zoltán Borók-Nagy edited comment on IMPALA-10159 at 9/9/20, 12:19 PM:
--

Hey [~skyyws], could you tell me which version of Spark/ORC you use? An 
alternative is to create the files with an older ORC library.

If that's too much trouble, then maybe we can omit timestamps as you propose, 
and -open a Jira only for the new timestamp types.- we have IMPALA-9967 to 
track the timestamp issue.


was (Author: boroknagyz):
Hey [~skyyws], could you tell me which version of Spark/ORC you use? An 
alternative is to create the files with an older ORC library.

If that's too much trouble, then maybe we can omit timestamps as you propose, 
and open a Jira only for the new timestamp types.

> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables stored in the PARQUET file format. Since 
> some groundwork was already done in IMPALA-9741, we can continue with ORC file 
> format support in this Jira.






[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192815#comment-17192815
 ] 

Zoltán Borók-Nagy commented on IMPALA-10159:


Hey [~skyyws], could you tell me which version of Spark/ORC you use? An 
alternative is to create the files with an older ORC library.

If that's too much trouble, then maybe we can omit timestamps as you propose, 
and open a Jira only for the new timestamp types.

> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables stored in the PARQUET file format. Since 
> some groundwork was already done in IMPALA-9741, we can continue with ORC file 
> format support in this Jira.






[jira] [Updated] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10159:
---
Labels: impala-iceberg  (was: )

> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala can now query Iceberg tables stored in the PARQUET file format. Since 
> some groundwork was already done in IMPALA-9741, we can continue with ORC file 
> format support in this Jira.






[jira] [Commented] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread WangSheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192751#comment-17192751
 ] 

WangSheng commented on IMPALA-10159:


Hi [~boroknagyz], [~tarmstrong], supporting the ORC file format for Iceberg 
tables is quite simple based on IMPALA-9741. The main work is constructing test 
cases, where we hit the problem in IMPALA-9967. My previous test file was 
generated by Spark, and I found that Spark does not support timestamp without 
time zone fields. So I think we could generate test files without the Timestamp 
type and explain this in the code. What do you think?

> Support ORC file format for Iceberg table
> -
>
> Key: IMPALA-10159
> URL: https://issues.apache.org/jira/browse/IMPALA-10159
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>
> Impala can now query Iceberg tables stored in the PARQUET file format. Since 
> some groundwork was already done in IMPALA-9741, we can continue with ORC file 
> format support in this Jira.






[jira] [Created] (IMPALA-10159) Support ORC file format for Iceberg table

2020-09-09 Thread WangSheng (Jira)
WangSheng created IMPALA-10159:
--

 Summary: Support ORC file format for Iceberg table
 Key: IMPALA-10159
 URL: https://issues.apache.org/jira/browse/IMPALA-10159
 Project: IMPALA
  Issue Type: Sub-task
Reporter: WangSheng
Assignee: WangSheng


Impala can now query Iceberg tables stored in the PARQUET file format. Since 
some groundwork was already done in IMPALA-9741, we can continue with ORC file 
format support in this Jira.


