[jira] [Resolved] (IMPALA-9246) Make crcutils building work on aarch64
[ https://issues.apache.org/jira/browse/IMPALA-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangtianhua resolved IMPALA-9246. -- Resolution: Fixed Fixed in: [https://gerrit.cloudera.org/#/c/14901/] > Make crcutils building work on aarch64 > -- > > Key: IMPALA-9246 > URL: https://issues.apache.org/jira/browse/IMPALA-9246 > Project: IMPALA > Issue Type: Sub-task >Reporter: huangtianhua >Assignee: huangtianhua >Priority: Major > > Building crcutil failed on the aarch64 platform: > g++: error: unrecognized command line option '-msse2' > g++: error: unrecognized command line option '-mcrc32' > g++: error: unrecognized command line option '-msse2' > g++: error: unrecognized command line option '-mcrc32' > Makefile:856: recipe for target > 'code/libcrcutil_la-multiword_64_64_cl_i386_mmx.lo' failed > make: *** [code/libcrcutil_la-multiword_64_64_cl_i386_mmx.lo] Error 1 > make: *** Waiting for unfinished jobs > Makefile:849: recipe for target > 'code/libcrcutil_la-multiword_128_64_gcc_amd64_sse2.lo' failed > make: *** [code/libcrcutil_la-multiword_128_64_gcc_amd64_sse2.lo] Error 1 > g++: error: unrecognized command line option '-msse2' > g++: error: unrecognized command line option '-mcrc32' > Makefile:842: recipe for target 'code/libcrcutil_la-crc32c_sse4.lo' failed > make: *** [code/libcrcutil_la-crc32c_sse4.lo] Error 1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
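The errors above come from passing x86-only flags (-msse2, -mcrc32) to g++ on an ARM host, so the build-level fix is to gate those flags on the target architecture. A minimal sketch of the same selection using predefined compiler macros; the function name and the aarch64 flag choice are illustrative assumptions, not the actual crcutil patch:

```cpp
#include <string>

// Hedged sketch: pick CRC-related compiler flags per target architecture
// instead of passing x86-only -msse2/-mcrc32 unconditionally.
std::string CrcArchFlags() {
#if defined(__x86_64__) || defined(__i386__)
  return "-msse2 -mcrc32";      // x86 SSE2 / CRC32 intrinsics
#elif defined(__aarch64__)
  return "-march=armv8-a+crc";  // ARMv8 CRC32 instructions (illustrative)
#else
  return "";                    // portable fallback: no arch-specific flags
#endif
}
```

The same idea applies in a configure script or Makefile: the flag list must be computed from the target triple, never hard-coded.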
[jira] [Resolved] (IMPALA-9278) error: expected primary-expression before ‘return’
[ https://issues.apache.org/jira/browse/IMPALA-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangtianhua resolved IMPALA-9278. -- Resolution: Fixed > error: expected primary-expression before ‘return’ > -- > > Key: IMPALA-9278 > URL: https://issues.apache.org/jira/browse/IMPALA-9278 > Project: IMPALA > Issue Type: Sub-task >Reporter: huangtianhua >Assignee: huangtianhua >Priority: Major > > An error is raised when executing ./buildall.sh on aarch64: > /home/jenkins/workspace/impala/be/src/gutil/atomicops-internals-x86.h:413:15: > error: expected primary-expression before ‘return’ new_val = return > impala::ArithmeticUtil::AsUnsigned(old_val, increment);
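The compile error above is a stray `return` used on the right-hand side of an assignment, which is not a valid expression. A minimal sketch of the broken pattern and its fix; the `ArithmeticUtil::AsUnsigned` body and the surrounding function are illustrative stand-ins (Impala's real helper is a template over the arithmetic operation):

```cpp
#include <cstdint>

namespace impala {
// Illustrative stand-in: route signed addition through unsigned types so
// overflow wraps instead of being undefined behavior.
struct ArithmeticUtil {
  static int32_t AsUnsigned(int32_t a, int32_t b) {
    return static_cast<int32_t>(static_cast<uint32_t>(a) +
                                static_cast<uint32_t>(b));
  }
};
}  // namespace impala

int32_t FetchAndAddSketch(int32_t old_val, int32_t increment) {
  // Broken (rejected by g++):
  //   new_val = return impala::ArithmeticUtil::AsUnsigned(old_val, increment);
  // Fixed: drop the stray 'return' so the call is an ordinary assignment.
  int32_t new_val = impala::ArithmeticUtil::AsUnsigned(old_val, increment);
  return new_val;
}
```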
[jira] [Updated] (IMPALA-9278) error: expected primary-expression before ‘return’
[ https://issues.apache.org/jira/browse/IMPALA-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangtianhua updated IMPALA-9278: - Parent: IMPALA-9236 Issue Type: Sub-task (was: Bug) > error: expected primary-expression before ‘return’ > -- > > Key: IMPALA-9278 > URL: https://issues.apache.org/jira/browse/IMPALA-9278 > Project: IMPALA > Issue Type: Sub-task >Reporter: huangtianhua >Assignee: huangtianhua >Priority: Major > > An error is raised when executing ./buildall.sh on aarch64: > /home/jenkins/workspace/impala/be/src/gutil/atomicops-internals-x86.h:413:15: > error: expected primary-expression before ‘return’ new_val = return > impala::ArithmeticUtil::AsUnsigned(old_val, increment);
[jira] [Created] (IMPALA-9303) Add time now for aarch64
huangtianhua created IMPALA-9303: Summary: Add time now for aarch64 Key: IMPALA-9303 URL: https://issues.apache.org/jira/browse/IMPALA-9303 Project: IMPALA Issue Type: Sub-task Reporter: huangtianhua The ARMv8 system timer runs at a different frequency than the CPU clock. Add a definition of CycleClock::Now to support aarch64.
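Since ARMv8 has no rdtsc, a cycle-clock read there uses the generic timer's virtual counter, which ticks at a fixed rate independent of the CPU clock. A hedged sketch of what such a `CycleClock::Now()` could look like; the non-ARM branch is only a portable stand-in so the sketch compiles anywhere, and the real gutil code differs:

```cpp
#include <chrono>
#include <cstdint>

// Sketch of a CycleClock::Now() that supports aarch64. The cntvct_el0
// virtual counter runs at the frequency reported by cntfrq_el0, not at the
// CPU clock rate, which is why aarch64 needs its own definition.
struct CycleClock {
  static int64_t Now() {
#if defined(__aarch64__)
    int64_t virtual_timer_value;
    // Read the ARMv8 virtual counter register.
    asm volatile("mrs %0, cntvct_el0" : "=r"(virtual_timer_value));
    return virtual_timer_value;
#else
    // Illustrative fallback for non-ARM hosts so the sketch is portable.
    return std::chrono::steady_clock::now().time_since_epoch().count();
#endif
  }
};
```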
[jira] [Commented] (IMPALA-9278) error: expected primary-expression before ‘return’
[ https://issues.apache.org/jira/browse/IMPALA-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017753#comment-17017753 ] huangtianhua commented on IMPALA-9278: -- [~tarmstrong], but this issue is in atomicops-internals-x86.h, not sure why x86 is ok:) > error: expected primary-expression before ‘return’ > -- > > Key: IMPALA-9278 > URL: https://issues.apache.org/jira/browse/IMPALA-9278 > Project: IMPALA > Issue Type: Bug >Reporter: huangtianhua >Assignee: huangtianhua >Priority: Major > > An error is raised when executing ./buildall.sh on aarch64: > /home/jenkins/workspace/impala/be/src/gutil/atomicops-internals-x86.h:413:15: > error: expected primary-expression before ‘return’ new_val = return > impala::ArithmeticUtil::AsUnsigned(old_val, increment);
[jira] [Assigned] (IMPALA-9302) Multithreaded scanners don't check for filter effectiveness
[ https://issues.apache.org/jira/browse/IMPALA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-9302: - Assignee: Tim Armstrong > Multithreaded scanners don't check for filter effectiveness > --- > > Key: IMPALA-9302 > URL: https://issues.apache.org/jira/browse/IMPALA-9302 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: multithreading, performance > > This can be reproduced for TPC-H Q9. I saw this on scale factor 30 locally, > where the mt_dop=4 version of the query uses a lot more CPU in the scan than > the mt_dop=0 version. > This turns out to be because none of the runtime filters are getting > disabled, not even the ineffective ones. > {noformat} > Filter 2 (16.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files total: 0 (0) > - RowGroups processed: 0 (0) > - RowGroups rejected: 0 (0) > - RowGroups total: 0 (0) > - Rows processed: 30.97M (30970695) > - Rows rejected: 0 (0) > - Rows total: 31.01M (31009074) > - Splits processed: 0 (0) > - Splits rejected: 0 (0) > - Splits total: 0 (0) > Filter 4 (8.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files total: 0 (0) > - RowGroups processed: 0 (0) > - RowGroups rejected: 0 (0) > - RowGroups total: 0 (0) > - Rows processed: 30.97M (30970695) > - Rows rejected: 0 (0) > - Rows total: 31.01M (31009074) > - Splits processed: 0 (0) > - Splits rejected: 0 (0) > - Splits total: 0 (0) > Filter 5 (8.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files total: 0 (0) > - RowGroups processed: 0 (0) > - RowGroups rejected: 0 (0) > - RowGroups total: 0 (0) > - Rows processed: 30.97M (30970695) > - Rows rejected: 0 (0) > - Rows total: 31.01M (31009074) > - Splits processed: 0 (0) > - Splits rejected: 0 (0) > - Splits total: 0 (0) > Filter 8 (1.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files total: 0 (0) > - 
RowGroups processed: 0 (0) > - RowGroups rejected: 0 (0) > - RowGroups total: 0 (0) > - Rows processed: 31.01M (31009074) > - Rows rejected: 0 (0) > - Rows total: 31.01M (31009074) > - Splits processed: 0 (0) > - Splits rejected: 0 (0) > - Splits total: 0 (0) > Filter 10 (1.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files total: 0 (0) > - RowGroups processed: 0 (0) > - RowGroups rejected: 0 (0) > - RowGroups total: 0 (0) > - Rows processed: 31.01M (31009074) > - Rows rejected: 29.32M (29317263) > - Rows total: 31.01M (31009074) > - Splits processed: 0 (0) > - Splits rejected: 0 (0) > - Splits total: 0 (0) > {noformat} > In contrast here are the filters for mt_dop=0, where not all the rows are > processed. > {noformat} > Filter 2 (16.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files total: 0 (0) > - RowGroups processed: 0 (0) > - RowGroups rejected: 0 (0) > - RowGroups total: 0 (0) > - Rows processed: 8.18M (8180257) > - Rows rejected: 0 (0) > - Rows total: 180.00M (179998372) > - Splits processed: 0 (0) > - Splits rejected: 0 (0) > - Splits total: 0 (0) > Filter 4 (8.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files total: 0 (0) > - RowGroups processed: 0 (0) > - RowGroups rejected: 0 (0) > - RowGroups total: 0 (0) > - Rows processed: 8.18M (8180257) > - Rows rejected: 0 (0) > - Rows total: 180.00M (179998372) > - Splits processed: 0 (0) > - Splits rejected: 0 (0) > - Splits total: 0 (0) > Filter 5 (8.00 MB): > - Files processed: 0 (0) > - Files rejected: 0 (0) > - Files
[jira] [Created] (IMPALA-9302) Multithreaded scanners don't check for filter effectiveness
Tim Armstrong created IMPALA-9302: - Summary: Multithreaded scanners don't check for filter effectiveness Key: IMPALA-9302 URL: https://issues.apache.org/jira/browse/IMPALA-9302 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Tim Armstrong This can be reproduced for TPC-H Q9. I saw this on scale factor 30 locally, where the mt_dop=4 version of the query uses a lot more CPU in the scan than the mt_dop=0 version. This turns out to be because none of the runtime filters are getting disabled, not even the ineffective ones. {noformat} Filter 2 (16.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 30.97M (30970695) - Rows rejected: 0 (0) - Rows total: 31.01M (31009074) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) Filter 4 (8.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 30.97M (30970695) - Rows rejected: 0 (0) - Rows total: 31.01M (31009074) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) Filter 5 (8.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 30.97M (30970695) - Rows rejected: 0 (0) - Rows total: 31.01M (31009074) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) Filter 8 (1.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 31.01M (31009074) - Rows rejected: 0 (0) - Rows total: 31.01M (31009074) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) Filter 10 (1.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - 
RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 31.01M (31009074) - Rows rejected: 29.32M (29317263) - Rows total: 31.01M (31009074) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) {noformat} In contrast here are the filters for mt_dop=0, where not all the rows are processed. {noformat} Filter 2 (16.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 8.18M (8180257) - Rows rejected: 0 (0) - Rows total: 180.00M (179998372) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) Filter 4 (8.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 8.18M (8180257) - Rows rejected: 0 (0) - Rows total: 180.00M (179998372) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) Filter 5 (8.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 0 (0) - Rows processed: 8.18M (8180257) - Rows rejected: 0 (0) - Rows total: 180.00M (179998372) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) Filter 8 (1.00 MB): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - RowGroups
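The profiles above show the symptom: in the mt_dop=4 run, filters that reject essentially nothing (Rows rejected: 0) keep processing all ~31M rows, while the mt_dop=0 path stops consulting them early. A sketch of the kind of effectiveness check the multithreaded path is missing; the struct, function name, and thresholds are illustrative assumptions, not Impala's actual heuristic:

```cpp
#include <cstdint>

// Illustrative stats mirroring the "Rows processed" / "Rows rejected"
// profile counters shown above.
struct FilterStats {
  int64_t rows_processed;
  int64_t rows_rejected;
};

// Disable a runtime filter once a large sample has been evaluated but almost
// no rows were rejected: evaluating it further only burns CPU.
bool ShouldDisableFilter(const FilterStats& s,
                         int64_t min_rows_sampled = 100000,
                         double min_reject_ratio = 0.05) {
  if (s.rows_processed < min_rows_sampled) return false;  // too little evidence
  double reject_ratio = static_cast<double>(s.rows_rejected) /
                        static_cast<double>(s.rows_processed);
  return reject_ratio < min_reject_ratio;
}
```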
[jira] [Created] (IMPALA-9301) Aux error info should detect multiple RPC failures
Sahil Takiar created IMPALA-9301: Summary: Aux error info should detect multiple RPC failures Key: IMPALA-9301 URL: https://issues.apache.org/jira/browse/IMPALA-9301 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar Suggested during the review of ([IMPALA-9296|http://issues.cloudera.org/browse/IMPALA-9296]) [https://gerrit.cloudera.org/#/c/15046/] {quote} I'm not sure that this is the right wa[y] to do it, since it means that if a backend sees multiple rpc failures in a single query only one will ever be reported to the coordinator. Of course, I've been advocating for being aggressive about blacklisting. Suppose there were two rpc failures; then there are two cases here: either both rpcs were to the same other executor, in which case the fact that there were two failures makes us more confident something is going on with that executor and we might actually want to blacklist the executor twice (which will just extend the amount of time that it stays blacklisted for), or the two rpcs were to different executors, in which case, if we only blacklist one of them and then retry the query, it may very well fail again. And even if we do want to stay more conservative about blacklisting, you've suggested before (and I agree) that it's generally preferable to report as much info about errors as we've got, and then centralize the logic for deciding how to act on those errors in the coordinator. {quote}
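The quoted review boils down to: a backend should accumulate every RPC failure it sees, not just the first, and let the coordinator decide how to act. A minimal sketch of that accumulation under assumed names (`AuxErrorInfo`, `RpcErrorInfo`, and their fields are illustrative, not Impala's actual types):

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative record of one failed RPC: where it was headed and the
// POSIX error code it failed with.
struct RpcErrorInfo {
  std::string dest_node;
  int posix_error_code;
};

// Sketch: keep a list of all RPC failures instead of overwriting the
// previous one, so the coordinator sees the full picture.
class AuxErrorInfo {
 public:
  void AddRpcErrorInfo(RpcErrorInfo info) {
    rpc_errors_.push_back(std::move(info));
  }
  const std::vector<RpcErrorInfo>& rpc_errors() const { return rpc_errors_; }

 private:
  std::vector<RpcErrorInfo> rpc_errors_;
};
```

With a list, the two-failures-same-executor and two-failures-different-executors cases from the review become distinguishable at the coordinator.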
[jira] [Created] (IMPALA-9300) Add a limit on the number of nodes that can be blacklisted per query
Sahil Takiar created IMPALA-9300: Summary: Add a limit on the number of nodes that can be blacklisted per query Key: IMPALA-9300 URL: https://issues.apache.org/jira/browse/IMPALA-9300 Project: IMPALA Issue Type: Sub-task Components: Backend Reporter: Sahil Takiar We currently have no limit on the number of nodes that can be blacklisted if an Exec() RPC fails. For data transfer (TransmitData()) RPC failures, we blacklist at most one node per status update (so typically one node per query). It would be nice to have a global limit on the number of nodes blacklisted to prevent a single query from blacklisting a large part of the cluster. This can help guard against intermittent, cluster-wide, hardware issues that might only last a few seconds. It would be nice if the max number of blacklist-able nodes is a function of the cluster size (e.g. a query cannot blacklist more than a third of the nodes in the cluster). TBD if the value should be configurable or not.
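The proposed cap (e.g. no more than a third of the cluster per query) can be sketched in a few lines; the function names, the minimum-of-one rule, and whether the fraction is configurable are all assumptions, since the issue leaves those open:

```cpp
#include <algorithm>

// Hedged sketch: limit how many nodes one query may blacklist to a fraction
// of the cluster (a third, per the suggestion above).
int MaxBlacklistableNodes(int cluster_size, double fraction = 1.0 / 3.0) {
  // Always allow at least one node so a single bad executor can be removed.
  return std::max(1, static_cast<int>(cluster_size * fraction));
}

bool CanBlacklistAnother(int already_blacklisted, int cluster_size) {
  return already_blacklisted < MaxBlacklistableNodes(cluster_size);
}
```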
[jira] [Updated] (IMPALA-9253) Blacklist additional posix error codes for failed DataStreamService RPCs
[ https://issues.apache.org/jira/browse/IMPALA-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-9253: - Parent: IMPALA-9299 Issue Type: Sub-task (was: Improvement) > Blacklist additional posix error codes for failed DataStreamService RPCs > > > Key: IMPALA-9253 > URL: https://issues.apache.org/jira/browse/IMPALA-9253 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Priority: Major > > Filing as a follow-up to > [IMPALA-9137|http://issues.cloudera.org/browse/IMPALA-9137], which blacklists a node > if an RPC fails with specific POSIX error codes: > * 107 = ENOTCONN: Transport endpoint is not connected > * 108 = ESHUTDOWN: Cannot send after transport endpoint shutdown > * 111 = ECONNREFUSED: Connection refused > These codes were produced by running a query, killing a node running that > query, and then seeing what error codes the query failed with. > There may be other error codes that are worth using for node blacklisting as > well. One way to come up with more error codes is to use iptables to > introduce network faults between Impala processes and see how RPCs fail.
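The three codes listed above map directly onto the standard errno constants, so the classification this issue proposes extending can be sketched as a switch; the function name is illustrative and the actual IMPALA-9137 code may be structured differently:

```cpp
#include <cerrno>

// Sketch: decide whether a failed DataStreamService RPC's POSIX error code
// suggests the peer process is gone (and the node should be blacklisted).
bool IsBlacklistablePosixError(int err) {
  switch (err) {
    case ENOTCONN:      // 107: transport endpoint is not connected
    case ESHUTDOWN:     // 108: cannot send after transport endpoint shutdown
    case ECONNREFUSED:  // 111: connection refused
      return true;
    default:
      // e.g. EACCES more likely indicates misconfiguration than a crash.
      return false;
  }
}
```

Fault injection with iptables, as suggested above, would reveal which additional codes (timeouts, resets) belong in the `true` branch.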
[jira] [Updated] (IMPALA-9137) Blacklist node if a DataStreamService RPC to the node fails
[ https://issues.apache.org/jira/browse/IMPALA-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-9137: - Parent: IMPALA-9299 Issue Type: Sub-task (was: Bug) > Blacklist node if a DataStreamService RPC to the node fails > --- > > Key: IMPALA-9137 > URL: https://issues.apache.org/jira/browse/IMPALA-9137 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > If a query fails because an RPC to a specific node failed, the query error > message will be similar to one of the following: > * {{ERROR: TransmitData() to 10.65.30.141:27000 failed: Network error: recv > got EOF from 10.65.30.141:27000 (error 108)}} > * {{ERROR: TransmitData() to 10.65.29.251:27000 failed: Network error: recv > error from 0.0.0.0:0: Transport endpoint is not connected (error 107)}} > * {{ERROR: TransmitData() to 10.65.26.254:27000 failed: Network error: Client > connection negotiation failed: client connection to 10.65.26.254:27000: > connect: Connection refused (error 111)}} > * {{ERROR: EndDataStream() to 127.0.0.1:27002 failed: Network error: recv > error from 0.0.0.0:0: Transport endpoint is not connected (error 107)}} > RPCs are already retried, so it is likely that something is wrong with the > target node. Perhaps it crashed or is so overloaded that it can't process RPC > requests. In any case, the Impala Coordinator should blacklist the target of > the failed RPC so that future queries don't fail with the same error. > If the node crashed, the statestore will eventually remove the failed node > from the cluster as well. However, the statestore can take a while to detect > a failed node because it has a long timeout. The issue is that queries can > still fail within the timeout window. 
> This is necessary for transparent query retries because if a node does crash, > it will take too long for the statestore to remove the crashed node from the > cluster. So any attempt at retrying a query will just fail.
[jira] [Updated] (IMPALA-9224) Blacklist nodes with faulty disks
[ https://issues.apache.org/jira/browse/IMPALA-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-9224: - Parent: IMPALA-9299 Issue Type: Sub-task (was: Improvement) > Blacklist nodes with faulty disks > - > > Key: IMPALA-9224 > URL: https://issues.apache.org/jira/browse/IMPALA-9224 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Sahil Takiar >Priority: Critical > > Similar to IMPALA-8339 and IMPALA-9137, Impala should blacklist nodes with > faulty disks. Specifically, if a query fails because of a disk error, the > node with that disk should be blacklisted and the query should be retried. > We shouldn't need to blacklist nodes that fail to read from HDFS / S3, since > they contain their own internal mechanisms for recovering from faulty disks. > We should only blacklist nodes when failing to read / write from *local* > disks. > The two main components of Impala that read / write from local disk are the > spill-to-disk and data caching features. Whenever a query fails because of a > disk failure during spill-to-disk, the node should be blacklisted. > Reads / writes from / to the data cache are a bit different. If a cache read > fails due to a disk error, the error will be printed out and the Lookup() > call to the cache will return 0 bytes read, which means it couldn't find the > data in the cache. This should cause the scan to fall back to a normal, > un-cached read. While this doesn't affect query correctness or the ability > for a query to complete, it can affect performance. Since cache failures > don't result in query failures, we might consider having a threshold of data > cache read / writes errors before blacklisting a node. > We need to be careful to only capture specific disk failures - e.g. disk > quota, permission denied, etc. errors shouldn't result in blacklisting as > they typically are a result of system misconfiguration. 
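The thresholding idea above (cache errors degrade performance, not correctness, so don't blacklist on the first one) can be sketched as a simple counter; the class name and threshold semantics are illustrative assumptions:

```cpp
#include <cstdint>

// Hedged sketch: count data-cache disk errors and only flag the node for
// blacklisting once they exceed a limit, since each individual failure just
// falls back to an un-cached read.
class CacheErrorTracker {
 public:
  explicit CacheErrorTracker(int64_t threshold) : threshold_(threshold) {}

  // Record one failed cache read or write; returns true once the node
  // should be reported for blacklisting.
  bool RecordErrorAndCheck() { return ++errors_ > threshold_; }

 private:
  int64_t threshold_;
  int64_t errors_ = 0;
};
```

Spill-to-disk failures, by contrast, fail the query outright, so per the description they would blacklist immediately rather than go through such a counter.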
[jira] [Updated] (IMPALA-9243) Coordinator Web UI should list which executors have been blacklisted
[ https://issues.apache.org/jira/browse/IMPALA-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-9243: - Parent: IMPALA-9299 Issue Type: Sub-task (was: Improvement) > Coordinator Web UI should list which executors have been blacklisted > > > Key: IMPALA-9243 > URL: https://issues.apache.org/jira/browse/IMPALA-9243 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Priority: Major > > Currently, information about which nodes are blacklisted only shows up in > runtime profiles and Coordinator logs. It would be nice to display > blacklisting information in the Web UI as well so that a user can view which > nodes are blacklisted at any given time. > One potential place to put the blacklisting information is the /backends > page, which already lists all the backends that are part of the cluster. A new > column called "Status" with values of either "Active" or > "Blacklisted" would be nice (perhaps we should re-factor the "Quiescing" > column to use the new "Status" column as well). This is similar to what the > Spark Web UI does for blacklisted nodes: > [https://ndu0e1pobsf1dobtvj5nls3q-wpengine.netdna-ssl.com/wp-content/uploads/2019/08/BLACKLIST-SCHEDULING.png]
[jira] [Updated] (IMPALA-9137) Blacklist node if a DataStreamService RPC to the node fails
[ https://issues.apache.org/jira/browse/IMPALA-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-9137: - Parent: (was: IMPALA-9124) Issue Type: Bug (was: Sub-task) > Blacklist node if a DataStreamService RPC to the node fails > --- > > Key: IMPALA-9137 > URL: https://issues.apache.org/jira/browse/IMPALA-9137 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: Impala 3.4.0 > > > If a query fails because an RPC to a specific node failed, the query error > message will be similar to one of the following: > * {{ERROR: TransmitData() to 10.65.30.141:27000 failed: Network error: recv > got EOF from 10.65.30.141:27000 (error 108)}} > * {{ERROR: TransmitData() to 10.65.29.251:27000 failed: Network error: recv > error from 0.0.0.0:0: Transport endpoint is not connected (error 107)}} > * {{ERROR: TransmitData() to 10.65.26.254:27000 failed: Network error: Client > connection negotiation failed: client connection to 10.65.26.254:27000: > connect: Connection refused (error 111)}} > * {{ERROR: EndDataStream() to 127.0.0.1:27002 failed: Network error: recv > error from 0.0.0.0:0: Transport endpoint is not connected (error 107)}} > RPCs are already retried, so it is likely that something is wrong with the > target node. Perhaps it crashed or is so overloaded that it can't process RPC > requests. In any case, the Impala Coordinator should blacklist the target of > the failed RPC so that future queries don't fail with the same error. > If the node crashed, the statestore will eventually remove the failed node > from the cluster as well. However, the statestore can take a while to detect > a failed node because it has a long timeout. The issue is that queries can > still fail within the timeout window. 
> This is necessary for transparent query retries because if a node does crash, > it will take too long for the statestore to remove the crashed node from the > cluster. So any attempt at retrying a query will just fail.
[jira] [Updated] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure
[ https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated IMPALA-8339: - Parent: IMPALA-9299 Issue Type: Sub-task (was: Improvement) > Coordinator should be more resilient to fragment instances startup failure > -- > > Key: IMPALA-8339 > URL: https://issues.apache.org/jira/browse/IMPALA-8339 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Reporter: Michael Ho >Assignee: Thomas Tauber-Marshall >Priority: Critical > Labels: Availability, resilience > Fix For: Impala 3.3.0 > > > Impala currently relies on the statestore for cluster membership. When an Impala > executor goes offline, it may take a while for the statestore to declare that > node unavailable and for that information to be propagated to all > coordinator nodes. Within this window, some coordinator nodes may still > attempt to issue RPCs to the faulty node, resulting in RPC failures and thus > query failures. In other words, many queries may fail to start within this > window until all coordinator nodes get the latest information on > cluster membership. > Going forward, the coordinator may need to fall back to using backup executors > for each fragment in case some of the executors are not available. Moreover, > *the coordinator should treat the cluster membership information from the statestore > (or any external source of truth, e.g. etcd) as hints instead of ground truth* > and adjust the scheduling of fragment instances based on the availability of > the executors from the coordinator's perspective.
[jira] [Created] (IMPALA-9299) Node Blacklisting: Coordinators should blacklist unhealthy nodes
Sahil Takiar created IMPALA-9299: Summary: Node Blacklisting: Coordinators should blacklist unhealthy nodes Key: IMPALA-9299 URL: https://issues.apache.org/jira/browse/IMPALA-9299 Project: IMPALA Issue Type: New Feature Components: Backend Reporter: Sahil Takiar Assignee: Thomas Tauber-Marshall Top level JIRA for Node Blacklisting. High level description of node blacklisting, from IMPALA-8339: {quote} This patch adds the concept of a blacklist of executors to the coordinator, which removes executors from consideration for query scheduling. Blacklisting decisions are local to a given coordinator and are not included in statestore updates. The intention is to allow coordinators to be more aggressive about deciding that an executor is unhealthy or unavailable, to minimize failed queries in environments where cluster membership may be more variable, rather than having to wait on the statestore heartbeat mechanism to decide that the executor is down. {quote}
[jira] [Commented] (IMPALA-9287) test_kudu_table_create_without_hms fails on Hive-3 environment
[ https://issues.apache.org/jira/browse/IMPALA-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017422#comment-17017422 ] Vihang Karajgaonkar commented on IMPALA-9287: - Thanks [~skyyws]. Let me know if you need any help. > test_kudu_table_create_without_hms fails on Hive-3 environment > -- > > Key: IMPALA-9287 > URL: https://issues.apache.org/jira/browse/IMPALA-9287 > Project: IMPALA > Issue Type: Test > Components: Infrastructure >Affects Versions: Impala 3.4.0 >Reporter: Vihang Karajgaonkar >Assignee: WangSheng >Priority: Blocker > Labels: broken-build > > {{test_kudu_table_create_without_hms}} which was added recently in > IMPALA-9266 fails when Hive-3 is used. To reproduce the issue build Impala > after setting {{USE_CDP_HIVE=true}} and then run the test.
[jira] [Comment Edited] (IMPALA-9281) Inferred predicates not assigned to scan nodes when views are involved
[ https://issues.apache.org/jira/browse/IMPALA-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017296#comment-17017296 ] Fang-Yu Rao edited comment on IMPALA-9281 at 1/16/20 4:55 PM: -- After some initial investigation, I found that the for-loop at [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1754-L1780] in the method {{getBoundPredicates()}} of {{Analyzer.java}} did not create an inferred predicate for the query in the following according to the log file {{impalad.INFO}}. This is the Query 1 described in the problem description above. Note that we need to change the log level of {{org.apache.impala.analysis.Analyzer}} to {{TRACE}} at [http://localhost:25000/log_level]. {code:java} select * from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b where a.table_source = 'ONE' and a.table_source = b.table_source_a; {code} On the other hand, we could find some inferred predicates after executing the following query. {code:java} select * from default.myview_1_on_2_parquet_tables a, myview_2_on_2_parquet_tables b where a.c2 = b.c2a and a.c2 = 'one'; {code} Specifically, we could find the following line in {{impalad.INFO}}. It turns out an inferred predicate {{pta1.c2a = 'one'}} was generated. {code:java} I0116 08:32:04.089465 21718 Analyzer.java:1750] ac4181dbf41da68e:c53b028e] new pred: default.pta1.c2a = 'one' BinaryPredicate{op==, SlotRef{label=default.pta1.c2a, path=c2a, type=STRING, id=11} StringLiteral{value=one}, isInferred=true} {code} According to [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1422] and [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L1428], the list of inferred predicates {{conjuncts}} will later be fed into Analyzer.createEquivConjuncts(). 
If the inferred predicates are not correctly generated, {{Analyzer.createEquivConjuncts()}} will not produce a plan that takes the expected inferred predicates, e.g., {{b.table_source_a = 'ONE'}}, into consideration. The only difference between the two queries above is that the column involved in the first query, i.e., {{a.table_source}}, is a constant-valued column produced by a string literal in the view definition, whereas the column involved in the second query is not. Hence, we may need to figure out how the planner performs predicate inference under these two scenarios. 
> Inferred predicates not assigned to scan nodes when views are involved > -- > > Key: IMPALA-9281 > URL: https://issues.apache.org/jira/browse/IMPALA-9281 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.4.0 >Reporter: Fang-Yu Rao >Assignee: Fang-Yu Rao >Priority: Major > Attachments: profile_query_1_parquet.txt, profile_query_2_parquet.txt > >
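The equivalence-class propagation that {{Analyzer.createEquivConjuncts()}} relies on can be illustrated with a minimal, self-contained sketch. This is not Impala's actual implementation: the class {{InferredPredicateSketch}} and its method names are hypothetical, and the union-find over slot names is a deliberate simplification of Impala's value-transfer bookkeeping. It only shows the core idea discussed above: when equality conjuncts connect two slots and a constant predicate is bound to one of them, the same constant predicate can be inferred for the other.

```java
import java.util.*;

// Hypothetical sketch (NOT Impala's actual code) of equivalence-class based
// predicate inference: slots connected by equality conjuncts form one class,
// and a constant bound to any slot propagates to every slot in that class.
public class InferredPredicateSketch {

    // Find the equivalence-class root of a slot, with path halving.
    private static String find(Map<String, String> parent, String s) {
        parent.putIfAbsent(s, s);
        while (!parent.get(s).equals(s)) {
            parent.put(s, parent.get(parent.get(s)));
            s = parent.get(s);
        }
        return s;
    }

    /**
     * equalities: pairs of slot names joined by equality conjuncts, e.g. {"a.c2", "b.c2a"}.
     * boundConsts: slot -> constant literal from predicates like a.c2 = 'one'.
     * Returns slot -> constant for every slot whose class has a bound constant.
     */
    public static Map<String, String> inferConstants(
            List<String[]> equalities, Map<String, String> boundConsts) {
        Map<String, String> parent = new HashMap<>();
        for (String[] eq : equalities) {
            // Union the two slots' equivalence classes.
            parent.put(find(parent, eq[0]), find(parent, eq[1]));
        }
        // Record the constant known for each class root.
        Map<String, String> rootConst = new HashMap<>();
        for (Map.Entry<String, String> e : boundConsts.entrySet()) {
            rootConst.put(find(parent, e.getKey()), e.getValue());
        }
        // Propagate each class's constant to every slot in that class.
        Map<String, String> inferred = new HashMap<>();
        for (String slot : new ArrayList<>(parent.keySet())) {
            String c = rootConst.get(find(parent, slot));
            if (c != null) inferred.put(slot, c);
        }
        return inferred;
    }

    public static void main(String[] args) {
        // Mirrors the second query above: a.c2 = b.c2a and a.c2 = 'one'
        // should let the planner infer b.c2a = 'one'.
        System.out.println(inferConstants(
                List.of(new String[] {"a.c2", "b.c2a"}),
                Map.of("a.c2", "'one'")));
    }
}
```

Fed the shape of Query 1 instead, i.e. {{inferConstants(List.of(new String[]{"a.table_source", "b.table_source_a"}), Map.of("a.table_source", "'ONE'"))}}, this naive sketch would happily produce {{b.table_source_a = 'ONE'}}; the comment above observes that the real analyzer does not, when the slot is a constant-valued column from a view.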
> When a query involves the join of views each created based on multiple tables, the inferred predicates are not assigned to the scan nodes. This issue seems related to https://issues.apache.org/jira/browse/IMPALA-4578.
> The following is a minimal example that reproduces the phenomenon.
> {code:java}
> CREATE TABLE default.pt1 (
>   c1 INT,
>   c2 STRING
> )
> STORED AS PARQUET;
> insert into pt1 values (1, 'one');
> CREATE TABLE default.pt2 (
>   c1 INT,
>   c2 STRING
> )
> STORED AS PARQUET;
> insert into pt2 values (2, 'two');
> CREATE TABLE default.pta1 (
>   c1a INT,
>   c2a STRING
> )
> STORED AS PARQUET;
> insert into pta1 values (1, 'one');
> CREATE TABLE default.pta2 (
>   c1a INT,
>   c2a STRING
> )
> STORED AS PARQUET;
> insert into pta2 values (2, 'two');
> CREATE VIEW myview_1_on_2_parquet_tables AS
> SELECT 'ONE' table_source, c1, c2 FROM `default`.pt1
> UNION ALL
> SELECT 'TWO' table_source, c1, c2 FROM `default`.pt2;
> CREATE VIEW myview_2_on_2_parquet_tables AS
> SELECT 'ONE' table_source_a, c1a, c2a FROM `default`.pta1
> UNION ALL
> SELECT 'TWO' table_source_a, c1a, c2a FROM `default`.pta2;
> {code}
> For easy reference, the contents of tables {{pt1}}, {{pt2}}, {{pta1}}, {{pta2}}, and views {{myview_1_on_2_parquet_tables}}, {{myview_2_on_2_parquet_tables}} are also given as follows. 
> Contents of table {{pt1}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 1  | one |
> +----+-----+
> {code}
> Contents of table {{pt2}} afterwards:
> {code:java}
> +----+-----+
> | c1 | c2  |
> +----+-----+
> | 2  | two |
> +----+-----+
> {code}
> Contents of table {{pta1}} afterwards:
> {code:java}
> +-----+-----+
> | c1a | c2a |
> +-----+-----+
> | 1   | one |
> +-----+-----+
> {code}
> Contents of table {{pta2}} afterwards:
> {code:java}
> +-----+-----+
> | c1a | c2a |
> +-----+-----+
> | 2   | two |
> +-----+-----+
> {code}
> Contents in {{myview_1_on_2_parquet_tables}} (union of tables {{pt1}} and {{pt2}}):
> {code:java}
> +--------------+----+-----+
> | table_source | c1 | c2  |
> +--------------+----+-----+
> | ONE          | 1  | one |
> | TWO          | 2  | two |
> +--------------+----+-----+
> {code}
> Contents in {{myview_2_on_2_parquet_tables}} (union of tables {{pta1}} and {{pta2}}):
> {code:java}
> +----------------+-----+-----+
> | table_source_a | c1a | c2a |
>